Tutorial06 - Building an Export Adaptor for Custom Connections
Introduction
Datameer X provides a pluggable architecture to export workbook sheets into a custom connection.
Building an Export Adaptor
You can implement your own export job type, providing a dynamic wizard page and your own output adapter.
To export a workbook sheet into a custom connection, you need a connection type that supports exports. Datameer X provides a prototype plug-in that demonstrates such a custom export:
- Add a dummy connection. (This connection type provides a dummy import job type and a dummy export job type.)
- Trigger the workbook you want to export.
- Add an export job and select the dummy-data-store. The underlying export job prints the records to the console instead of doing a real save.
- The details page of this export job type provides only an input text field with the label "test label" and the default value "hello".
- When you trigger this export, all records are printed to the console.
- Finally, you can run your new export and follow its execution status.
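The snippets below implement these steps: DummyDataStoreType registers the dummy import and export job types and defines the connection's details page, while DummyExportJobType contributes the "test label" wizard field (populateWizardPage) and the console-printing output adapter (write).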
Code Example Snippets for the Dummy Implementation
DummyDataStoreType.java

package datameer.das.plugin.tutorial06;

import datameer.dap.sdk.datastore.DataStoreModel;
import datameer.dap.sdk.datastore.DataStoreType;
import datameer.dap.sdk.entity.DataStore;
import datameer.dap.sdk.property.PropertyDefinition;
import datameer.dap.sdk.property.PropertyGroupDefinition;
import datameer.dap.sdk.property.PropertyType;
import datameer.dap.sdk.property.WizardPageDefinition;

public class DummyDataStoreType extends DataStoreType {

    public final static String ID = "das.DummyDataStore";

    public DummyDataStoreType() {
        // Wire the dummy import and export job types into this connection type.
        super(new DummyImportJobType(), new DummyExportJobType());
    }

    @Override
    public DataStoreModel createModel(DataStore dataStore) {
        return new DummyDataStoreModel(dataStore);
    }

    @Override
    public String getId() {
        return ID;
    }

    @Override
    public String getName() {
        return "Dummy";
    }

    @Override
    public WizardPageDefinition createDetailsWizardPage() {
        // Details page shown when creating the connection itself.
        WizardPageDefinition page = new WizardPageDefinition("Details");
        PropertyGroupDefinition group = page.addGroup("Dummy");
        PropertyDefinition propertyDefinition = new PropertyDefinition("dummyKey", "Dummy", PropertyType.STRING);
        propertyDefinition.setRequired(true);
        propertyDefinition.setHelpText("Some Help Text");
        group.addPropertyDefinition(propertyDefinition);
        return page;
    }
}
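Note that DummyImportJobType and DummyDataStoreModel, which are referenced above, are part of the tutorial sources and not reproduced in this snippet; see plugin-tutorials/tutorial06 for the complete classes.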
DummyExportJobType.java

package datameer.das.plugin.tutorial06;

import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

import datameer.dap.sdk.common.DasContext;
import datameer.dap.sdk.common.Record;
import datameer.dap.sdk.entity.DataSinkConfiguration;
import datameer.dap.sdk.exportjob.ExportJobType;
import datameer.dap.sdk.exportjob.OutputAdapter;
import datameer.dap.sdk.property.PropertyDefinition;
import datameer.dap.sdk.property.PropertyGroupDefinition;
import datameer.dap.sdk.property.PropertyType;
import datameer.dap.sdk.property.WizardPageDefinition;
import datameer.dap.sdk.schema.RecordType;
import datameer.dap.sdk.util.ManifestMetaData;

public class DummyExportJobType implements ExportJobType, Serializable {

    private static final long serialVersionUID = ManifestMetaData.SERIAL_VERSION_UID;

    @SuppressWarnings("serial")
    @Override
    public OutputAdapter createModel(DasContext dasContext, DataSinkConfiguration configuration, RecordType fieldTypes) {
        return new OutputAdapter() {

            @Override
            public void write(Record record) {
                // Dummy sink: print each record to the console instead of saving it.
                System.out.println(record);
            }

            @Override
            public void initializeExport(Configuration hadoopConf) {
                // Nothing to set up for the console dummy.
            }

            @Override
            public void finalizeExport(Configuration hadoopConf, boolean success) throws IOException {
                // Nothing to tear down for the console dummy.
            }

            @Override
            public void disconnectExportInstance(int adapterIndex) {
            }

            @Override
            public void connectExportInstance(Configuration hadoopConf, int adapterIndex) {
            }

            @Override
            public boolean canRunInParallel() {
                return false;
            }

            @Override
            public FieldTypeConversionStrategy getFieldTypeConversionStrategy() {
                // Export records using the sheet's field types.
                return FieldTypeConversionStrategy.SHEET_FIELD_TYPES;
            }
        };
    }

    @Override
    public void populateWizardPage(WizardPageDefinition page) {
        // Adds the "test label" text field with default value "hello"
        // to the export wizard's details page.
        PropertyGroupDefinition addGroup = page.addGroup("test");
        PropertyDefinition propertyDefinition = new PropertyDefinition("testkey", "test label", PropertyType.STRING, "hello");
        addGroup.addPropertyDefinition(propertyDefinition);
    }

    @Override
    public boolean isWritingToFileSystem() {
        return false;
    }
}
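The dummy adapter ignores the value entered into the "test label" field. A real adapter would typically read the configured value back in createModel and use it to parameterize the export. Below is a minimal sketch of that excerpt, assuming DataSinkConfiguration exposes a string accessor along the lines of getStringProperty(key, default); the accessor name is an assumption here, so check the SDK Javadoc of your version for the exact API.

@Override
public OutputAdapter createModel(DasContext dasContext, DataSinkConfiguration configuration, RecordType fieldTypes) {
    // Assumed accessor: read the wizard value, falling back to the default.
    final String testValue = configuration.getStringProperty("testkey", "hello");
    return new OutputAdapter() {
        @Override
        public void write(Record record) {
            // A real implementation would send the record to the custom
            // connection, parameterized by testValue (e.g. a target table).
            System.out.println(testValue + ": " + record);
        }
        // Remaining life cycle methods elided; identical to the dummy
        // implementation above.
    };
}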
Life Cycles of Different Execution Frameworks
Each execution framework used for export has a different life cycle and execution order. The machine on which a method is executed is given in parentheses: conductor means the method is executed on the machine where the Datameer X server is installed; mapper/cluster means the method is executed within the YARN cluster on a mapper. After starting an export job in Datameer X, the job is sent to the YARN cluster and executed there.
For Spark, MapReduce, or Tez, the following methods are executed in this order:
- #initializeExport (conductor)
- #connectExportInstance (mapper/cluster)
- #write (mapper/cluster)
- #disconnectExportInstance (mapper/cluster)
- #finalizeExport (conductor)
For SmallJob, the following methods are executed in this order:
- #initializeExport (mapper/cluster)
- #connectExportInstance (mapper/cluster)
- #write (mapper/cluster)
- #disconnectExportInstance (mapper/cluster)
- #finalizeExport (mapper/cluster)
The life cycle methods are called at the following frequencies:
- Initialize/finalize is called once per job.
- Connect/disconnect is called once per task.
- Write is called once per record.
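To observe this order on a concrete cluster, the anonymous adapter in DummyExportJobType#createModel can be instrumented so that each life cycle method logs when it is called. A minimal sketch using only the methods shown above; keep in mind that conductor-side and mapper-side output end up in different logs.

// Drop-in body for DummyExportJobType#createModel: every life cycle
// method prints a marker so the execution order becomes visible.
return new OutputAdapter() {
    @Override
    public void initializeExport(Configuration hadoopConf) {
        System.out.println("initializeExport (once per job)");
    }

    @Override
    public void connectExportInstance(Configuration hadoopConf, int adapterIndex) {
        System.out.println("connectExportInstance " + adapterIndex + " (once per task)");
    }

    @Override
    public void write(Record record) {
        System.out.println("write (once per record): " + record);
    }

    @Override
    public void disconnectExportInstance(int adapterIndex) {
        System.out.println("disconnectExportInstance " + adapterIndex + " (once per task)");
    }

    @Override
    public void finalizeExport(Configuration hadoopConf, boolean success) throws IOException {
        System.out.println("finalizeExport (once per job), success=" + success);
    }

    @Override
    public boolean canRunInParallel() {
        return false;
    }

    @Override
    public FieldTypeConversionStrategy getFieldTypeConversionStrategy() {
        return FieldTypeConversionStrategy.SHEET_FIELD_TYPES;
    }
};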
Source Code
This tutorial can be found in the Datameer X plug-in SDK under plugin-tutorials/tutorial06.