Configuring Network Proxy

Configuring a Network Proxy

Setup

Datameer X can be configured to use network proxy servers for both Datameer X and the Hadoop Cluster for outgoing http(s) connections. (e.g., for the web service connectors and import jobs.)

You can configure either a HTTP or a SOCKS proxy for all outgoing HTTP and HTTPS traffic to the remote system.

Additionally, you can configure the Hadoop Cluster a HTTP proxy for outgoing FTP and SFTP traffic.

  • Datameer - HTTP (http/https) or SOCKS (http/https) or   
  • Hadoop Cluster - HTTP (http/https/ftp/sftp) or SOCKS (for outgoing connection with http/https)

This configuration method gives you more flexibility to specify exactly what each proxy is responsible for on the system network.

Setup can be made in "das-common.properties" as follows:

#proxy settings for outgoing connection from Datameer X web application
das.network.proxy.type=[SOCKS|HTTP]
das.network.proxy.host=<host ip>
das.network.proxy.port=<port>
das.network.proxy.username=<username>
das.network.proxy.password=<password>
das.network.proxy.blacklist=<url pattern with '|' separated>

#proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocols http, https
das.clusterjob.network.proxy.type=[SOCKS|HTTP]
das.clusterjob.network.proxy.host=<host ip>
das.clusterjob.network.proxy.port=<port>
das.clusterjob.network.proxy.username=<username>
das.clusterjob.network.proxy.password=<password>
das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>
 
#automatically find and add all data nodes in the cluster to the blacklist
das.clusterjob.network.proxy.blacklist.datanodes.autosetup=true

#proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocol ftp
das.clusterjob.network.proxy.ftp.type=[HTTP]
das.clusterjob.network.proxy.ftp.host=<host ip>
das.clusterjob.network.proxy.ftp.port=<port>
das.clusterjob.network.proxy.ftp.username=<username>
das.clusterjob.network.proxy.ftp.password=<password>

#proxy settings for outgoing connection from cluster -> triggered by datameer job to fetch data from web services over the protocol sftp
das.clusterjob.network.proxy.sftp.type=[HTTP]
das.clusterjob.network.proxy.sftp.host=<host ip>
das.clusterjob.network.proxy.sftp.port=<port>
das.clusterjob.network.proxy.sftp.username=<username>
das.clusterjob.network.proxy.sftp.password=<password>


#proxy setting for Datameer X web application. The configuration impacting traffic of S3, S3 Native, Snowflake and Redshift connectors.
das.network.proxy.s3.protocol=[HTTP]
das.network.proxy.s3.host=<host ip>
das.network.proxy.s3.port=<port>
das.network.proxy.s3.username=<username>
das.network.proxy.s3.password=<password>

#proxy settings for outgoing connection from cluster -> triggered by Datameer X job to fetch data from web services over the protocol s3. 
# This is impacting the traffic S3, S3 native, Snowflake and Redshift 
das.clusterjob.network.proxy.s3.type=[HTTP]
das.clusterjob.network.proxy.s3.host=<host ip>
das.clusterjob.network.proxy.s3.port=<port>
das.clusterjob.network.proxy.s3.username=<username>
das.clusterjob.network.proxy.s3.password=<password>

#proxy setting for Datameer X web application. The configuration is impacting the traffic of the Azure Blob Storage connector.
das.network.proxy.azure.blob.host=<host ip>
das.network.proxy.azure.blob.port=<port>
das.network.proxy.azure.blob.username=<username>
das.network.proxy.azure.blob.password=<password>

#proxy settings for outgoing connection from cluster -> triggered by Datameer X job to fetch data from web services over the protocol wasb/wasbs. 
# This is impacting the traffic Azure Blob Storage  
das.clusterjob.network.proxy.azure.blob.host=<host ip>
das.clusterjob.network.proxy.azure.blob.port=<port>

HDFS transparent encryption

In case you use HDFS Transparent encryption in the environment on which this Datameer X instance is installed, please ensure that you have added KMS server hostnames to the proxy blacklist for the Datameer X application server and cluster jobs.

Appropriate configuration properties:

das.network.proxy.blacklist=<url pattern with '|' separated>
das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>


Troubleshooting

For troubleshooting configuration, you can configure verbose logging in conf/log4j-production.properties

log4j.category.datameer.dap.common.util.network=DEBUG

The conductor.log and Hadoop job logs shows information such as this:

DEBUG [2013-10-01 12:14:53] (NetworkProxySelector.java:38) - Using proxy [SOCKS @ localhost/127.0.0.1:3128] for uri https://www.googleapis.com/adsense/v1.2/accounts.
DEBUG [2013-10-01 12:14:54] (NetworkProxyAuthenticator.java:32) - Authentication requested for localhost/SOCKS5/SERVER