Prerequisites and Preparation
Before getting started with preparation, make sure the Datameer application is configured with the appropriate authenticator so that only valid HDFS users and groups exist in Datameer.
Authenticator integration
A key consideration for enabling the impersonation feature is that all users and groups available in Datameer must map directly to the HDFS user/group community. This is typically done by configuring Datameer to use an LDAP authenticator and employing group filtering to ensure that only valid HDFS groups are available within Datameer. See details on configuring the Datameer LDAP Authenticator.
Set up Authenticator First
For the best results, first configure the remote authenticator and then import users, to ensure that the group filters are working properly.
Configure secure cluster mode
At this point, the Datameer installation should be configured to run in secure cluster mode. Ensure that secure cluster mode is configured and working before continuing.
Don't enable secure impersonation yet!
HDFS group setup
It is recommended to create an HDFS group containing all Datameer users, for two reasons:
- To avoid having to configure any directories as world-writable.
- To tightly control which users the Datameer user can proxy.
This Datameer users group can be excluded from Datameer's LDAP authenticator if you don't want to expose it to end users.
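The group setup above can be sketched with standard OS user-management commands. This is only an illustration; the group name `dasusers` and user name `alice` are example values, and your environment may manage groups through LDAP or other tooling instead:

```shell
# Create a group containing all Datameer end users (example names).
sudo groupadd dasusers

# Add each Datameer user to the group; repeat per user.
sudo usermod -aG dasusers alice
```

If group resolution happens on the NameNode via LDAP, create the equivalent group there instead of on the local OS.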
Configuring Datameer as a super user and specifying allowed proxy users
Because secure impersonation in Datameer is based on native Hadoop mechanisms, the OS user that runs the Datameer application must be configured both as an HDFS superuser (a member of the HDFS supergroup) and as a user allowed to proxy Datameer users from the Datameer machine.
Add a Datameer user to the HDFS supergroup
The HDFS supergroup is named supergroup by default, and is configured in hdfs-site.xml by the setting:
dfs.permissions.supergroup = supergroup
Once you have determined the supergroup, add the Datameer user to this group through your normal OS user management tools.
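As a sketch, assuming the supergroup is the default `supergroup` and the Datameer service account is named `datameer`:

```shell
# Add the Datameer service account to the HDFS supergroup.
sudo usermod -aG supergroup datameer

# Verify the membership as HDFS sees it. Group resolution happens on the
# NameNode side, so run this after the change has propagated there.
hdfs groups datameer
```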
Configure proxy user
Two configuration settings related to the proxy user capability need to be set in core-site.xml on both the NameNode and the JobTracker:
hadoop.proxyuser.<USERNAME>.groups
hadoop.proxyuser.<USERNAME>.hosts
For example, assuming the Datameer user is datameer and that a group called dasusers exists containing all Datameer users, the groups setting is as follows:
hadoop.proxyuser.datameer.groups = dasusers
Next, assuming that the Datameer application is running on datameer.example.com, the hosts setting is configured as:
hadoop.proxyuser.datameer.hosts = datameer.example.com
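Once these settings are in place and the services have been restarted, a quick smoke test can confirm that proxying works end to end. This is a sketch using the example names above; the Hadoop client honors the `HADOOP_PROXY_USER` environment variable for impersonation:

```shell
# Run as the 'datameer' OS user on the Datameer host.
# Impersonate 'alice' (a member of 'dasusers') and list her home directory.
# An "is not allowed to impersonate" error here usually means the
# hadoop.proxyuser settings have not taken effect yet.
HADOOP_PROXY_USER=alice hdfs dfs -ls /user/alice
```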
If using Cloudera Manager, update the safety valve on the NameNode, Secondary NameNode, and JobTracker. You might need to reset any existing override for these settings to take effect.
If you are using a Kerberos-secured cluster with secure impersonation and HDFS transparent encryption, you also need to configure the proxy user for KMS.
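For KMS, the analogous proxy user settings live in kms-site.xml. The values below reuse the example names from this page:

```
hadoop.kms.proxyuser.datameer.groups = dasusers
hadoop.kms.proxyuser.datameer.hosts = datameer.example.com
```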
Preparing the Datameer application
Before finally enabling secure impersonation, you must prepare the Datameer application and HDFS by following the instructions here. When that task is complete, you can continue with enabling the feature.
Enabling Secure Impersonation
To enable secure impersonation, navigate to the secure grid mode settings and select Enable Impersonation:
After enabling secure impersonation, there is a message about cluster validation. In order to ensure best operation, Datameer can run a validation job to ensure that the cluster adheres to certain configuration guidelines. To run the set of assertions associated with secure impersonation, click Run Tests.
Kerberos principal name rules
Depending on your naming conventions for Kerberos principal names, you might need to override the hadoop.security.auth_to_local property; in fact, you might have already overridden it on the cluster. Datameer needs the rules from this property in the custom properties section of the cluster configuration. The custom properties section doesn't support property values spanning multiple lines, so the rules should be separated by a single space. As an example, the following can be useful when not all of the principals are from the default domain:
hadoop.security.auth_to_local=RULE:[1:$1](.*) RULE:[2:$1](.*) DEFAULT
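You can check how a given principal maps to a short user name under the currently configured rules with Hadoop's built-in name-mapping tool. The principal below is a placeholder; run this with the same Hadoop configuration the cluster uses:

```shell
# Print the short user name that the auth_to_local rules produce
# for a given Kerberos principal.
hadoop org.apache.hadoop.security.HadoopKerberosName alice@EXAMPLE.COM
```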
You can find more information about mapping Kerberos principals to user names in the following book:
- Hadoop Security, 1st Ed., by Ben Spivey and Joey Echeverria, Ch. 5: Identity and Authentication - Mapping Kerberos Principals to Usernames, p. 68 ff.
If you need to see how your AD/LDAP user names are submitted to the cluster after the rules are applied under secure impersonation, you can add additional logging.
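One way to get this logging is to raise the log level for Hadoop's UserGroupInformation class in Datameer's log4j configuration. This is a sketch; the exact configuration file location depends on your installation:

```
log4j.logger.org.apache.hadoop.security.UserGroupInformation=DEBUG
```

At DEBUG level, the log shows each privileged action together with the effective user and the proxying user, which makes the applied mapping visible.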
Expected impersonation behaviors
Refer to the following table to understand how secure impersonation affects the ownership of import jobs, file uploads, data links, workbooks, and export jobs. Note that the group permissions apply to the artifact, not the folders the artifacts are in.
| Scenario | Owner in HDFS | Group in HDFS | Permissions for owner in HDFS | Permissions for group in HDFS | Owner of YARN application (when job is triggered manually) | Owner of YARN application (when job is triggered by schedule) | Preview data accessed |
|---|---|---|---|---|---|---|---|
| Creating an artifact | Creator | Selected group; if none selected, the default Datameer group | Read and write | Read only | n/a | n/a | n/a |
| Running a job | Creator | n/a | Read and write | Read only | Creator | Creator | Logged-in user |
| Previewing data | Creator | Selected group; if none selected, the default Datameer group | Read and write | Read only | Creator | Creator | Logged-in user |
| Saving an edited artifact (not as creator) | Creator | Selected group; if none selected, the default Datameer group | Read and write | Read only | Creator | Creator | Logged-in user |
| Updating permissions | Creator | Newly selected group | Read and write | Read only | Creator | Creator | Logged-in user |