Tutorial07 - Building Data Obfuscation on Import
Introduction
This tutorial shows how to build a tool that obfuscates data while it is being imported into Datameer. Obfuscating in a workbook is sometimes not sufficient, because some information should never reach the cluster in clear text.
Datameer does not currently ship this as a feature, but customers can provide it by adding a plug-in with custom code that performs the obfuscation.
How-to
- Create a new plug-in project.
- Extend the class datameer.dap.sdk.importjob.extensions.ImportFilterExtension to provide the functionality.
Example
This example obfuscates the columns whose names the user enters by applying a SHA1 hash to their values.
    /**
     * Adds another property "Obfuscated Columns" to the wizard page and obfuscates all configured
     * columns using SHA1 hash.
     */
    public class ObfuscatingImportFilterExtension extends ImportFilterExtension {

        private static final long serialVersionUID = ManifestMetaData.SERIAL_VERSION_UID;
        private static final String KEY = "ObfuscatedColumns";

        @Override
        public String getId() {
            return "EncryptingImportFilterExtension";
        }

        @Override
        public RawRecordCollector decorateRawRecordCollector(Field[] fields,
                ReadableGenericConfiguration configuration, RawRecordCollector recordCollector) {
            return DecoratingRawRecordCollector.decorate(recordCollector,
                    new ObfuscatingRawRecordDecorator(fields, configuration.getStringProperty(KEY, "")));
        }

        @Override
        public void populateWizardPageImpl(WizardPageDefinition page) {
            PropertyGroupDefinition encryption = page.addGroup("Encryption");
            encryption.addPropertyDefinition(new PropertyDefinition(KEY, "Obfuscated Columns", PropertyType.STRING));
        }
    }
The actual obfuscation logic looks like this:
    public class ObfuscatingRawRecordDecorator implements Consumer<RawRecord> {

        private final ImmutableSet<Integer> _columnsToEncrypt;

        public ObfuscatingRawRecordDecorator(Field[] fields, String columnsToEncryptString) {
            String[] columnsToEncypt = columnsToEncryptString.split(" ");
            ImmutableSet.Builder<Integer> columnsToEncrypt = ImmutableSet.builder();
            for (String columnToEncypt : columnsToEncypt) {
                int indexByName = Field.getIndexByName(Field.filterIncludedFields(fields), columnToEncypt.trim(), -1);
                if (indexByName != -1) {
                    columnsToEncrypt.add(indexByName);
                }
            }
            _columnsToEncrypt = columnsToEncrypt.build();
        }

        static String obfuscate(String string) {
            if (string == null) {
                return null;
            }
            return IoUtil.serializeBase64(Hashing.sha1().newHasher().putString(string, Charsets.UTF_8).hash().asBytes());
        }

        @Override
        public void accept(RawRecord rawRecord) {
            for (Integer columnToEncrypt : _columnsToEncrypt) {
                rawRecord.setValue(columnToEncrypt, obfuscate(StringUtil.toString(rawRecord.getValue(columnToEncrypt), null)));
            }
        }
    }
ObfuscatingRawRecordDecorator receives every raw record that is read from any kind of input stream and obfuscates all columns that were configured on the data details page of the wizard. The example implementation uses SHA1 hashing, which is only one possible approach.
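The hashing step can be tried outside Datameer. The following is a minimal sketch that uses only JDK classes and is not the plug-in's actual code. It assumes that IoUtil.serializeBase64 produces standard Base64, which this tutorial does not confirm, and the class name Sha1ObfuscationSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical stand-alone class; not part of the Datameer SDK.
class Sha1ObfuscationSketch {

    // Mirrors obfuscate(...) from the example: null stays null, any other
    // value becomes the Base64 form of its SHA-1 digest.
    // Assumption: standard Base64 stands in for IoUtil.serializeBase64.
    static String obfuscate(String value) {
        if (value == null) {
            return null;
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(obfuscate("abc")); // prints qZk+NkcGgWq6PiVxeFDCbJzQ2J0=
    }
}
```

Equal inputs always produce equal outputs, so obfuscated columns can still be grouped or joined. The same property means that an unsalted hash of values with little variety can be reversed by hashing candidate values, which may matter when choosing an algorithm.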
In the wizard, an additional Encryption section now appears, in which the columns to obfuscate can be configured as a space-separated list of column names.
On the next tab, all values of the name column are now obfuscated using the SHA1 hashing algorithm.
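The effect on a single record can be reproduced without a Datameer installation. The sketch below is an illustration under stated assumptions, not the SDK's behavior: a plain Object[] stands in for RawRecord, a list of header names stands in for Field[], standard Base64 stands in for IoUtil.serializeBase64, and all class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone class; plain Java types replace the SDK types.
class ColumnObfuscationSketch {

    // Resolves a space-separated list of column names to column indexes.
    // Names that do not occur in the header are skipped, as in the example.
    static Set<Integer> resolveColumns(List<String> header, String configured) {
        Set<Integer> indexes = new LinkedHashSet<>();
        for (String name : configured.split(" ")) {
            int index = header.indexOf(name.trim());
            if (index != -1) {
                indexes.add(index);
            }
        }
        return indexes;
    }

    // SHA-1 digest of the UTF-8 bytes, encoded as standard Base64 (assumption).
    static String sha1Base64(String value) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return Base64.getEncoder()
                    .encodeToString(sha1.digest(value.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Replaces the configured columns in place; null values stay null.
    static void obfuscate(Object[] record, Set<Integer> indexes) {
        for (int index : indexes) {
            Object value = record[index];
            record[index] = value == null ? null : sha1Base64(value.toString());
        }
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("id", "name", "city");
        Object[] record = {1, "abc", "Berlin"};
        obfuscate(record, resolveColumns(header, "name unknown"));
        System.out.println(Arrays.toString(record));
        // prints [1, qZk+NkcGgWq6PiVxeFDCbJzQ2J0=, Berlin]
    }
}
```

Only the name column changes. The unknown column name is ignored rather than reported, which matches the example decorator, so a misspelled column name would silently leave that column in clear text.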
Our customer services specialists can assist you with more information if required.