Configuring Network Proxy

Configuring Network Proxy

Setup of the Network Proxy 

General Information

INFO

Datameer X can be configured to use network proxy servers for both, Datameer X and the Hadoop Cluster, for outgoing http(s) connections, e.g., for web service connectors and import jobs.

  • You can configure either a HTTP or a SOCKS proxy for all outgoing HTTP and HTTPS traffic to the remote system.

  • Additionally, you can configure the Hadoop Cluster a HTTP proxy for outgoing FTP and SFTP traffic:

    • Datameer - HTTP (http/https) or SOCKS (http/https)  

    • Hadoop Cluster - HTTP (http/https/ftp/sftp) or SOCKS (for outgoing connection with http/https)
      INFO: This configuration method gives you more flexibility to specify exactly what each proxy is responsible for on the system network.

Setup

The setup can be configured in the 'das-common-properties' file as follows:

#proxy settings for outgoing connection from Datameer X web application das.network.proxy.type=[SOCKS|HTTP] das.network.proxy.host=<host ip> das.network.proxy.port=<port> das.network.proxy.username=<username> das.network.proxy.password=<password> das.network.proxy.blacklist=<url pattern with '|' separated> #proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocols http, https das.clusterjob.network.proxy.type=[SOCKS|HTTP] das.clusterjob.network.proxy.host=<host ip> das.clusterjob.network.proxy.port=<port> das.clusterjob.network.proxy.username=<username> das.clusterjob.network.proxy.password=<password> das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>   #automatically find and add all data nodes in the cluster to the blacklist das.clusterjob.network.proxy.blacklist.datanodes.autosetup=true #proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocol ftp das.clusterjob.network.proxy.ftp.type=[HTTP] das.clusterjob.network.proxy.ftp.host=<host ip> das.clusterjob.network.proxy.ftp.port=<port> das.clusterjob.network.proxy.ftp.username=<username> das.clusterjob.network.proxy.ftp.password=<password> #proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocol sftp das.clusterjob.network.proxy.sftp.type=[HTTP] das.clusterjob.network.proxy.sftp.host=<host ip> das.clusterjob.network.proxy.sftp.port=<port> das.clusterjob.network.proxy.sftp.username=<username> das.clusterjob.network.proxy.sftp.password=<password> #proxy setting for Datameer X web application. The configuration impacting traffic of S3, S3 Native, Snowflake and Redshift connectors. das.network.proxy.s3.protocol=[HTTP] das.network.proxy.s3.host=<host ip> das.network.proxy.s3.port=<port> das.network.proxy.s3.username=<username> das.network.proxy.s3.password=<password> #proxy settings for outgoing connection from cluster -> triggered by Datameer X job to fetch data from web services over the protocol s3. # This is impacting the traffic S3, S3 native, Snowflake and Redshift das.clusterjob.network.proxy.s3.type=[HTTP] das.clusterjob.network.proxy.s3.host=<host ip> das.clusterjob.network.proxy.s3.port=<port> das.clusterjob.network.proxy.s3.username=<username> das.clusterjob.network.proxy.s3.password=<password> #proxy settings to connect a SFTP server hidden by the proxy das.network.proxy.sftp.type=[HTTP] das.network.proxy.sftp.host=<host ip> das.network.proxy.sftp.port=<port> das.network.proxy.sftp.username=<username> das.network.proxy.sftp.password=<password> #proxy setting for Datameer X web application. The configuration is impacting the traffic of the Azure Blob Storage connector. das.network.proxy.azure.blob.host=<host ip> das.network.proxy.azure.blob.port=<port> das.network.proxy.azure.blob.username=<username> das.network.proxy.azure.blob.password=<password> #proxy settings for outgoing connection from cluster -> triggered by Datameer X job to fetch data from web services over the protocol wasb/wasbs. # This is impacting the traffic Azure Blob Storage das.clusterjob.network.proxy.azure.blob.host=<host ip> das.clusterjob.network.proxy.azure.blob.port=<port>

HDFS Transparent Encryption

In case you use HDFS Transparent encryption in the environment on which this Datameer X instance is installed, please ensure that you have added KMS server hostnames to the proxy blacklist for the Datameer X application server and cluster jobs.

Appropriate configuration properties:

das.network.proxy.blacklist=<url pattern with '|' separated> das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>

Troubleshooting

For troubleshooting configuration, you can configure verbose logging in the 'conf/log4j-production-properties' file:

log4j.category.datameer.dap.common.util.network=DEBUG

Find further information in the 'conductor-log' file and the Hadoop job log: The conductor.log and Hadoop job logs shows information such as this:

DEBUG [2013-10-01 12:14:53] (NetworkProxySelector.java:38) - Using proxy [SOCKS @ localhost/127.0.0.1:3128] for uri https://www.googleapis.com/adsense/v1.2/accounts. DEBUG [2013-10-01 12:14:54] (NetworkProxyAuthenticator.java:32) - Authentication requested for localhost/SOCKS5/SERVER