Preparing Datameer Installation for Secure Impersonation

To enable secure impersonation on an existing Datameer installation, you must ensure that all relevant Datameer entities conform to the constraints of this feature, namely that each entity has a single group permission mapping to an existing HDFS group. You also need to set up the Datameer core HDFS directories with appropriate permissions and synchronize HDFS artifacts with the desired ownership and permissions based on the stored Datameer entities.

Enabling Secure Grid Mode

Before you enable secure impersonation, Datameer must be able to connect to the secure cluster. This is normally already the case when enabling secure impersonation on an existing installation. On a fresh install, configure secure grid mode first and confirm that you can connect to the secure cluster.

Set up Authenticator First

At this point, the Datameer authenticator should be configured to provide the same set (or a subset) of users and groups as configured on HDFS. If this isn't the case, set up your authenticator for use with secure impersonation before continuing.

Checklist

Ensure you have the following before proceeding: 

  • A complete list of HDFS groups applicable to Datameer users. This list allows you to perform a deeper validation of existing entities. It isn't required, but is strongly recommended.
  • A one-to-one mapping of users and groups between hosts (Datameer and all HDFS nodes).
  • A core Datameer group name. An HDFS group which contains all Datameer users including the Datameer service account allows for a tighter access control policy on shared HDFS resources. This isn't required, but is strongly recommended.
  • Access to the Datameer application as an administrator.
  • Access to the machine running Datameer as the user running the Datameer application.
  • The Datameer service account added to the HDFS Super Group (making it a Hadoop admin).
  • Single group mode for Datameer.

If an environment isn't set up to take advantage of a single point of configuration for users and groups, the objects must be created manually and maintained consistently. This process involves the creation of all users and groups currently accessing Datameer via LDAP on each HDFS node.

Preparing Datameer Entities and HDFS Artifacts

Because secure impersonation mode enforces stricter requirements on the permissions associated with Datameer entities, you need to identify current invalid entries and convert them to the supported structure. A command line tool, secure_hdfs_tool.sh, located in DAS_HOME/bin, is provided to help with this process. See its full manual page for the complete list of options.

This tool also provides the ability to synchronize the HDFS artifacts associated with Datameer entities to match their access control setup. Again, see the full manual for details of working with this tool.

In preparation for enabling secure impersonation, you need to:

  1. Identify and fix invalid Datameer entities. Find these entities and update their permissions using the Datameer web application.
  2. Update core Datameer directories. Set up appropriate ownership and permissions for core Datameer HDFS directories.
  3. Synchronize HDFS artifacts. Apply existing Datameer ownership and permissions to the existing HDFS artifacts.

Find and fix invalid entries

To produce a list of invalid Datameer entries, run:

secure_hdfs_tool.sh -G foo,bar,baz

This command writes invalid entries to STDOUT, producing a work list of Datameer entities to fix. Work through this list and update each referenced entity from within the Datameer application so that each entity has exactly one group permission. To update an entity, select it from the appropriate list view and click Permissions.

Next, ensure that there is exactly one group permission selected.

Repeat this process for all offending entities. Re-run the command after each round of fixes until it returns no results.

The group list passed in via -G (--hdfs-groups) is optional, but it enables a deeper validation: in addition to checking that every entity has a single group permission, the tool checks that each of those groups is a valid HDFS group.
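If the list of HDFS groups is kept in a file, the comma-separated value for -G can be built from it. The following is a minimal sketch, not part of the tool: the helper name groups_csv and the file hdfs_groups.txt (one group per line) are assumptions made for illustration.

```shell
# Join one-group-per-line input into the comma-separated form used with -G.
# Blank lines are skipped. Reads from standard input.
groups_csv() {
  grep -v '^[[:space:]]*$' | paste -s -d, -
}

# Usage (hdfs_groups.txt is a hypothetical file with one HDFS group per line):
#   secure_hdfs_tool.sh -G "$(groups_csv < hdfs_groups.txt)"
```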

All salient entries start with the token INVALID_ENTITY so you can pipe the output to grep (and optionally then to a file):

secure_hdfs_tool.sh -G foo,bar,baz | grep INVALID_ENTITY > objects_to_fix.txt
# same as above but using the long form for the hdfs-groups argument
secure_hdfs_tool.sh --hdfs-groups foo,bar,baz | grep INVALID_ENTITY > objects_to_fix.txt
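To track progress between runs, it can help to count the remaining entries instead of reading the whole list. The sketch below relies only on the INVALID_ENTITY prefix described above and assumes nothing about the rest of each line. The helper names count_invalid and is_clean are hypothetical.

```shell
# Print the number of lines on standard input that start with INVALID_ENTITY.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_invalid() {
  grep -c '^INVALID_ENTITY' || true
}

# Succeed only if standard input contains no INVALID_ENTITY lines.
is_clean() {
  ! grep -q '^INVALID_ENTITY'
}

# Usage:
#   secure_hdfs_tool.sh -G foo,bar,baz | count_invalid
#   secure_hdfs_tool.sh -G foo,bar,baz | is_clean && echo "nothing left to fix"
```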

Update core Datameer directories

Once you have successfully resolved all invalid Datameer entities, you can prepare the core HDFS directories:

secure_hdfs_tool.sh -u -g das_users
# once again, with long-form arguments
secure_hdfs_tool.sh --update-core-directories --core-group das_users

This ensures that the core Datameer directories (those directly under the root) exist and have the appropriate ownership and permissions. If the optional -g (--core-group) argument is given, the core directories are chmod'ed to 770, with the user set to the Datameer secure principal's username and the group set to the value of the -g argument. If the argument isn't supplied, the group isn't changed and permissions are opened up to 777. It is strongly recommended that you create and maintain a core HDFS group containing all Datameer users when using secure impersonation.

Validation still runs when you update the core directories, so you might see INVALID_ENTITY messages if another user created an invalid entity in the meantime.
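The difference between the 770 and 777 modes mentioned above is easier to see when the octal digits are expanded. The following sketch only decodes a three-digit mode into owner, group, and other permissions. It does not touch HDFS, and the function names are chosen for illustration.

```shell
# Translate one octal digit (0-7) into its rwx triplet.
triplet() {
  case "$1" in
    0) printf '%s' '---' ;;
    1) printf '%s' '--x' ;;
    2) printf '%s' '-w-' ;;
    3) printf '%s' '-wx' ;;
    4) printf '%s' 'r--' ;;
    5) printf '%s' 'r-x' ;;
    6) printf '%s' 'rw-' ;;
    7) printf '%s' 'rwx' ;;
  esac
}

# Print owner, group, and other permissions for a three-digit octal mode.
describe_mode() {
  owner="$(triplet "$(printf '%s' "$1" | cut -c1)")"
  group="$(triplet "$(printf '%s' "$1" | cut -c2)")"
  other="$(triplet "$(printf '%s' "$1" | cut -c3)")"
  printf 'owner=%s group=%s other=%s\n' "$owner" "$group" "$other"
}

describe_mode 770   # owner=rwx group=rwx other=---
describe_mode 777   # owner=rwx group=rwx other=rwx
```

With 770, only the owner and members of the core group can access the core directories. With 777, every HDFS user can, which is why supplying a core group gives the tighter policy.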

Synchronize HDFS with the Datameer artifact ownership

Finally, you need to synchronize HDFS with all permissions stored in the Datameer database:

secure_hdfs_tool.sh -G foo,bar,baz -s
# once again, with long-form arguments
secure_hdfs_tool.sh --hdfs-groups foo,bar,baz --sync-hdfs

These commands also run validation and emit any currently invalid entities. All valid entities, however, have their HDFS artifacts' permissions and ownership updated to reflect what is in the Datameer database. Once the command runs with no INVALID_ENTITY messages, you are done and ready to enable secure impersonation.
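The three preparation steps can be chained so that a later step runs only when the earlier one reported no invalid entities. This is a sketch under stated assumptions: it judges success purely by the absence of INVALID_ENTITY lines on STDOUT, because the tool's exit codes aren't described here, and the function names run_clean and prepare_impersonation are hypothetical.

```shell
# Run a command, echo its output, and fail if any line starts with
# INVALID_ENTITY. The command's own exit status is deliberately ignored.
run_clean() {
  output="$("$@")"
  printf '%s\n' "$output"
  if printf '%s\n' "$output" | grep -q '^INVALID_ENTITY'; then
    echo "Invalid entities remain; fix them in Datameer and re-run." >&2
    return 1
  fi
}

# Validate, update core directories, then sync, stopping after the first
# step that reports an invalid entity.
# $1: path to secure_hdfs_tool.sh  $2: comma-separated HDFS groups  $3: core group
prepare_impersonation() {
  run_clean "$1" -G "$2" &&
  run_clean "$1" -u -g "$3" &&
  run_clean "$1" -G "$2" -s
}

# Usage:
#   prepare_impersonation "$DAS_HOME/bin/secure_hdfs_tool.sh" foo,bar,baz das_users
```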

Configure the Datameer properties file 

The Datameer config file needs to be updated to include a new parameter, das.system.user.name. Set the value of this property to the name of a user that has admin rights in Datameer and belongs to the superuser group in HDFS. Any health check job that runs as the system user impersonates this user at the HDFS level.

To determine which file to change in the $DAS_HOME/conf directory, check the DAS_DEPLOY_MODE setting in $DAS_HOME/etc/das-env.sh:


$DATAMEER_HOME/etc/das-env.sh
# Change this to DAS_DEPLOY_MODE=live when you want to run in live mode against a mysql db 
export DAS_DEPLOY_MODE=live


Edit live.properties to include this new property. In this example, "datameer" is the name of the admin user in Datameer that is also part of the supergroup in HDFS.


$DATAMEER_HOME/conf/live.properties
system.property.das.system.user.name=datameer
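Both lookups can be scripted as a quick check after editing. The sketch below assumes that the properties file is named after the deploy mode (as with live and live.properties above) and that the files use plain key=value lines. The function names deploy_mode and property_value are hypothetical.

```shell
# Print the value of the last "export DAS_DEPLOY_MODE=..." line in a
# das-env.sh file. Commented-out lines are ignored.
deploy_mode() {
  sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}DAS_DEPLOY_MODE=\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1
}

# Print the value stored for an exact key in a key=value properties file
# (first match only). Prints nothing if the key is absent.
property_value() {
  awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Usage:
#   mode="$(deploy_mode "$DAS_HOME/etc/das-env.sh")"
#   property_value "$DAS_HOME/conf/$mode.properties" system.property.das.system.user.name
```

An empty result from property_value suggests that the property hasn't been added to that file yet.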