Configuring Hive Server2 as a Connection
Configuring a HiveServer2 connection is similar to configuring Hive.
To create a HiveServer2 connector:
- Click the + (plus) button at the top left of the File Browser and select Connection, or right-click on a folder, select Create New then select Connection.
- From drop-down list, select Hive Server2 as the connection type. Click Next.
Enter the HiveServer2 connection address and port number.
Select TCP/Binary or HTTP as the transport mode. (Transport mode option is available as of Datameer v6.3)
- If selecting HTTP as the transport mode, the propertyhive.server2.thrift.http.path
has the hardcoded value ofcliservice
. Learn more at Apache Hive.
- HTTP mode using SSL is not supported.
- HTTP mode can only be used if Hive supports token based authentication. Hive versions >= 1.2 supports this feature.
Optional:
- Enter a database filter to limit the amount listed databases when importing with this connector.
- Enter an export path where exports from this connector are written.
- Enter any additional custom properties.If the database filter for a connection is updated, all import and export jobs using a prohibited database fail upon running.
Security
Kerberos
Select Kerberos Secured Hive and enter both the Hive and HDFS principals.
LDAP (As of Datameer v6.1)
Select LDAP/AD Authentication and select between having the HiveServer2 connector provide the authorization credentials or have the credentials provided by the individual import/export jobs.
Sentry
Datameer respects Sentry permissions from users running HiveServer2.
How Datameer and Sentry interact with users:
When you add an artifact though Datameer to Hive, it is added using the Datameer service account. The Datameer service account must has all permissions for Hive to authenticate with Sentry.
However, data is accessed on the Hive cluster using the Datameer user account. The Datameer user account must have permissions for the requested data in Hive as well as Datameer to authenticate with Sentry.
At the transport level, HiverServer2 has multiple connection methods available. Datameer currently supports the following binary connection methods:
SASL
- Non SSL
- NOSASL
- Kerberos
Plain
- Fill out a description and save the connector.
Importing Data with a HiveServer2 Connector
Configuring the import job wizard with a HiveServer2 connector follows the same procedure as importing from Hive.
Configuring the Hive Plug-in
The Hive plug-in is provided per default with the installation of Datameer to import from and export to Hive servers. It can be found under the Admin tab and selecting Plug-ins from the menu.
As of Datameer v6.1
Click the cog icon under actions to configure Datameer's Hive plug-in.
One of the plug-in's features is to cache Java objects that represent partitions on HiverServer2. These partitions contain locations, columns names, and other information. Datameer stores these objects on a local disk. When Datameer needs to read Hive partitions it increases performance by using the stored cache instead of having to pull the same information each time it is needed.
The cached Hive partition data is created for all registered import job models when the plug-in starts and then updates automatically every 6 hours. Registration occurs when the import job or data link is created using the wizard as well as when the job is processed.
From the Hive plug-in configuration settings, users have the ability to clear the current cache or start filling it immediately without having to wait for the automated renewal.
- The cache can be cleared to remove stored partition data that is no longer being used and is decreasing performance.
- The cache can be manually filled before the auto update to cache new/changed Hive partition data to increase performance.
The feature is unavailable for HiveServer1.