Datameer Implementation
Datameer allows administrators to leverage their underlying security setup by running in secure cluster mode with secure impersonation enabled. Enabling impersonation, and configuring Datameer to use a remote authenticator with access to the HDFS user community, allows Datameer to access HDFS as the user logged into Datameer, to run Hadoop jobs as the owner of the artifact, and to preview data. This ensures that Datameer respects the underlying HDFS permission setup:
- Data accessed via an HDFS connection is subject to HDFS permission checks for the user running the Datameer job.
- Datameer permissions set on import jobs, data links, and workbooks are pushed down to the HDFS layer for all imported data, job result data, and job artifacts.
- Export jobs to a secure HDFS connection also respect the permissions set on the export job.
The Datameer process is run by a member of the HDFS supergroup and is configured to proxy other users when submitting jobs or accessing HDFS. Datameer achieves this by using the secure impersonation feature, behaving similarly to the workflow manager, Oozie. For more information see secure impersonation.
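As a rough illustration of the proxy-user setup described above, the sketch below builds the standard Hadoop `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` properties that normally go into `core-site.xml`. The superuser name `datameer`, the host `dm-host.example.com`, and the group `it` are hypothetical placeholders, and the helper functions are not part of Datameer or Hadoop. The values your cluster needs may differ.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(superuser, hosts, groups):
    """Return the Hadoop proxy-user properties for one impersonating user."""
    prefix = f"hadoop.proxyuser.{superuser}"
    return {
        # Hosts from which the superuser may impersonate other users.
        f"{prefix}.hosts": ",".join(hosts),
        # Groups whose members the superuser may impersonate.
        f"{prefix}.groups": ",".join(groups),
    }


def to_property_xml(props):
    """Render the properties as <property> elements for core-site.xml."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


# Hypothetical values: replace with the account, host, and groups of your cluster.
props = proxyuser_properties("datameer", ["dm-host.example.com"], ["it"])
print(to_property_xml(props))
```

Restricting the hosts and groups to explicit values, rather than a wildcard, limits which users the Datameer process is allowed to impersonate.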
You can find more information about the approach of using proxy users in the following documents:
- Hadoop: The Definitive Guide, 4th Ed. by Tom White, Ch. 10: Setting Up a Hadoop Cluster - Security, p. 309 ff.
- Hadoop Security, 1st Ed. by Ben Spivey and Joey Echeverria, Ch. 5: Identity and Authentication - Impersonation, p. 82 ff.
You can also refer to this example.
This documentation explains how the Hadoop framework provides secure impersonation features and how the configuration must be set correctly so that Hadoop clients can leverage this functionality.
To understand permissions, you can also refer to the HDFS Permissions Guide - Group Mapping.
Supported versions
Secure impersonation is also supported for Hortonworks HDP 2.1.x, Hive Server 2, and Tez.
Single group mode
One key difference in behavior with secure impersonation enabled is that the Datameer entity permission system changes to single group mode. This means that everywhere in the system where you specify entity permissions, instead of configuring permissions for a set of groups, you must provide permissions for exactly one group in addition to "Others". This is because job artifacts are stored in HDFS using these same permissions, and HDFS permissions follow the POSIX model of owner, group, and others.
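To make the POSIX model concrete, here is a minimal sketch that combines one permission triplet each for owner, group, and others into a single mode and renders it the way a file listing would. The function `posix_mode` is a hypothetical helper written for this illustration. It is not a Datameer or Hadoop API and does not describe how Datameer derives the mode internally.

```python
import stat


def posix_mode(owner, group, others, is_dir=False):
    """Combine three rwx strings such as 'rw-' into a POSIX mode integer."""
    bits = 0
    for triplet in (owner, group, others):
        value = 0
        # r, w, and x carry the weights 4, 2, and 1 within each triplet.
        for flag, weight in zip(triplet, (4, 2, 1)):
            if flag != "-":
                value |= weight
        bits = (bits << 3) | value
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return file_type | bits


# Exactly one owner, one group, and "others": there is no slot for a second group.
mode = posix_mode("rw-", "rw-", "---")
print(stat.filemode(mode))  # -rw-rw----
```

Because the mode has room for only one group triplet, a permission setup that names several groups cannot be represented, which is why single group mode is required.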
As an example, with the following import job permission setup in Datameer:
The HDFS data associated with running this job are as follows:
drwxrwx--x - joe it 0 2012-01-16 18:40 /home/das/v1.4/importjobs/52/148/data
-rw-rw---- 3 joe it 438 2012-01-16 18:40 /home/das/v1.4/importjobs/52/148/data/part-00000
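The listing can be read field by field. The following sketch splits one listing entry into its owner, group, and the three permission triplets. It assumes the eight-column layout shown in the listing above, and `parse_ls_entry` is a hypothetical helper, not part of any Hadoop client library.

```python
def parse_ls_entry(entry):
    """Split one HDFS listing entry into owner, group, and permission triplets."""
    # Limit the split so that a path containing spaces stays in one piece.
    fields = entry.split(None, 7)
    mode, _replication, owner, group = fields[0], fields[1], fields[2], fields[3]
    return {
        "is_dir": mode[0] == "d",
        "owner": owner,
        "group": group,
        "owner_perms": mode[1:4],
        "group_perms": mode[4:7],
        "other_perms": mode[7:10],
        "path": fields[7],
    }


entry = parse_ls_entry(
    "-rw-rw---- 3 joe it 438 2012-01-16 18:40 "
    "/home/das/v1.4/importjobs/52/148/data/part-00000"
)
print(entry["owner"], entry["group"], entry["group_perms"], entry["other_perms"])
```

In this entry the owner is `joe`, the single group is `it` with read and write access, and others have no access to the data file.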
Ensure proper setup of group names
For permissions to be configured properly, each group must exist in both Datameer and HDFS.
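One way to check this is to compare the two sets of group names. The sketch below assumes you have already collected the group names from each side. How you obtain them is outside its scope, and the names used here (`it`, `finance`, `ops`) are hypothetical. The comparison is exact, so names that differ only in case are reported as missing.

```python
def compare_groups(datameer_groups, hdfs_groups):
    """Report group names that exist on only one side."""
    datameer_set = set(datameer_groups)
    hdfs_set = set(hdfs_groups)
    return {
        "missing_in_hdfs": sorted(datameer_set - hdfs_set),
        "missing_in_datameer": sorted(hdfs_set - datameer_set),
    }


# Hypothetical group lists gathered from each system.
report = compare_groups(["it", "finance"], ["it", "ops"])
print(report)
```

An empty list on both sides indicates that every group is known to both systems.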
Cloudera Sentry and Hortonworks Ranger integration
When Sentry or Ranger is installed on a Hadoop cluster that has the impersonation plug-in enabled, Datameer acts as a DFS client and respects the Sentry or Ranger permissions.
The Datameer private folder in HDFS, including its core directories, should be owned by datameer:<dasuser>, where <dasuser> is the group specified for impersonation. To ensure this ownership, run secure_hdfs_tool.sh.
If you want Datameer users to access resources that are controlled by Sentry, the appropriate privileges must be granted in Sentry.
Datameer doesn't have any Sentry-specific integration.