Configuring Network Proxy
Setup of the Network Proxy
General Information
INFO: Datameer X can be configured to use network proxy servers for both Datameer X and the Hadoop cluster for outgoing HTTP(S) connections, e.g., for web service connectors and import jobs.
- You can configure either an HTTP or a SOCKS proxy for all outgoing HTTP and HTTPS traffic to the remote system.
- Additionally, you can configure an HTTP proxy for the Hadoop cluster's outgoing FTP and SFTP traffic:
- Datameer - HTTP (http/https) or SOCKS (http/https)
- Hadoop Cluster - HTTP (http/https/ftp/sftp) or SOCKS (for outgoing connections via http/https)
INFO: This configuration method gives you more flexibility to specify exactly what each proxy is responsible for on the system network.
Setup
The setup is configured in the 'das-common.properties' file as follows:
#proxy settings for outgoing connections from the Datameer X web application
das.network.proxy.type=[SOCKS|HTTP]
das.network.proxy.host=<host ip>
das.network.proxy.port=<port>
das.network.proxy.username=<username>
das.network.proxy.password=<password>
das.network.proxy.blacklist=<url pattern with '|' separated>

#proxy settings for outgoing connections from the cluster -> triggered by a Datameer X job to fetch data from web services over the protocols http, https
das.clusterjob.network.proxy.type=[SOCKS|HTTP]
das.clusterjob.network.proxy.host=<host ip>
das.clusterjob.network.proxy.port=<port>
das.clusterjob.network.proxy.username=<username>
das.clusterjob.network.proxy.password=<password>
das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>

#automatically find and add all data nodes in the cluster to the blacklist
das.clusterjob.network.proxy.blacklist.datanodes.autosetup=true

#proxy settings for outgoing connections from the cluster -> triggered by a Datameer X job to fetch data from web services over the protocol ftp
das.clusterjob.network.proxy.ftp.type=[HTTP]
das.clusterjob.network.proxy.ftp.host=<host ip>
das.clusterjob.network.proxy.ftp.port=<port>
das.clusterjob.network.proxy.ftp.username=<username>
das.clusterjob.network.proxy.ftp.password=<password>

#proxy settings for outgoing connections from the cluster -> triggered by a Datameer X job to fetch data from web services over the protocol sftp
das.clusterjob.network.proxy.sftp.type=[HTTP]
das.clusterjob.network.proxy.sftp.host=<host ip>
das.clusterjob.network.proxy.sftp.port=<port>
das.clusterjob.network.proxy.sftp.username=<username>
das.clusterjob.network.proxy.sftp.password=<password>

#proxy settings for the Datameer X web application. This configuration impacts the traffic of the S3, S3 Native, Snowflake, and Redshift connectors.
das.network.proxy.s3.protocol=[HTTP]
das.network.proxy.s3.host=<host ip>
das.network.proxy.s3.port=<port>
das.network.proxy.s3.username=<username>
das.network.proxy.s3.password=<password>

#proxy settings for outgoing connections from the cluster -> triggered by a Datameer X job to fetch data from web services over the protocol s3.
#This impacts the traffic of the S3, S3 Native, Snowflake, and Redshift connectors.
das.clusterjob.network.proxy.s3.type=[HTTP]
das.clusterjob.network.proxy.s3.host=<host ip>
das.clusterjob.network.proxy.s3.port=<port>
das.clusterjob.network.proxy.s3.username=<username>
das.clusterjob.network.proxy.s3.password=<password>

#proxy settings to connect to an SFTP server hidden behind the proxy
das.network.proxy.sftp.type=[HTTP]
das.network.proxy.sftp.host=<host ip>
das.network.proxy.sftp.port=<port>
das.network.proxy.sftp.username=<username>
das.network.proxy.sftp.password=<password>

#proxy settings for the Datameer X web application. This configuration impacts the traffic of the Azure Blob Storage connector.
das.network.proxy.azure.blob.host=<host ip>
das.network.proxy.azure.blob.port=<port>
das.network.proxy.azure.blob.username=<username>
das.network.proxy.azure.blob.password=<password>

#proxy settings for outgoing connections from the cluster -> triggered by a Datameer X job to fetch data from web services over the protocol wasb/wasbs.
#This impacts the traffic of the Azure Blob Storage connector.
das.clusterjob.network.proxy.azure.blob.host=<host ip>
das.clusterjob.network.proxy.azure.blob.port=<port>
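For orientation, a minimal filled-in sketch might look like the following. The proxy hosts, ports, and blacklist entries are placeholder values and must be replaced with those of your environment; here the web application routes through a SOCKS proxy, cluster jobs route through an HTTP proxy, and local addresses bypass the proxies:

#example only - hypothetical hosts, ports, and blacklist entries
das.network.proxy.type=SOCKS
das.network.proxy.host=10.0.0.10
das.network.proxy.port=3128
das.network.proxy.blacklist=localhost|127.0.0.1

das.clusterjob.network.proxy.type=HTTP
das.clusterjob.network.proxy.host=10.0.0.11
das.clusterjob.network.proxy.port=8080
das.clusterjob.network.proxy.blacklist.datanodes.autosetup=true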
HDFS Transparent Encryption
If you use HDFS Transparent Encryption in the environment where this Datameer X instance is installed, ensure that you have added the KMS server hostnames to the proxy blacklist for both the Datameer X application server and cluster jobs.
Appropriate configuration properties:
das.network.proxy.blacklist=<url pattern with '|' separated>
das.clusterjob.network.proxy.blacklist=<url pattern with '|' separated>
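For example, assuming two hypothetical KMS hosts named kms01.example.com and kms02.example.com, the blacklist entries could look like this:

das.network.proxy.blacklist=localhost|127.0.0.1|kms01.example.com|kms02.example.com
das.clusterjob.network.proxy.blacklist=localhost|127.0.0.1|kms01.example.com|kms02.example.com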
Troubleshooting
To troubleshoot the configuration, you can enable verbose logging in the 'conf/log4j-production.properties' file:
log4j.category.datameer.dap.common.util.network=DEBUG
Further information can be found in the 'conductor.log' file and the Hadoop job logs, which show information such as this:
DEBUG [2013-10-01 12:14:53] (NetworkProxySelector.java:38) - Using proxy [SOCKS @ localhost/127.0.0.1:3128] for uri https://www.googleapis.com/adsense/v1.2/accounts.
DEBUG [2013-10-01 12:14:54] (NetworkProxyAuthenticator.java:32) - Authentication requested for localhost/SOCKS5/SERVER
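The DEBUG lines above come from Datameer's internal proxy selector, which decides per URI whether to route through the configured proxy or bypass it via the blacklist. As a rough illustration of that mechanism only (not Datameer's actual implementation), the following Java sketch shows how a blacklist-aware java.net.ProxySelector typically makes this decision; the proxy address and blacklist entries are hypothetical.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of a blacklist-aware proxy selector.
// Proxy host, port, and blacklist entries below are hypothetical.
public class BlacklistProxySelector extends ProxySelector {

    private final Proxy proxy =
            new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("localhost", 3128));
    private final List<String> blacklist = Arrays.asList("localhost", "127.0.0.1");

    @Override
    public List<Proxy> select(URI uri) {
        // Hosts on the blacklist bypass the proxy and connect directly.
        if (blacklist.contains(uri.getHost())) {
            return Arrays.asList(Proxy.NO_PROXY);
        }
        System.out.println("Using proxy " + proxy + " for uri " + uri);
        return Arrays.asList(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        System.err.println("Proxy connection failed for " + uri + ": " + ioe.getMessage());
    }

    public static void main(String[] args) {
        // Install the selector; subsequent JVM connections consult it per URI.
        ProxySelector.setDefault(new BlacklistProxySelector());
        System.out.println(ProxySelector.getDefault()
                .select(URI.create("https://www.googleapis.com/adsense/v1.2/accounts")));
    }
}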