Importing XML Files and Using XPath

XML (Extensible Markup Language) is a language that is both machine and human readable, so it is very usable. 

XPath (XML Path Language) is a powerful language for addressing parts of an XML file.

Importing XML files and using XPath

  1. Open Datameer X and create a new Import Job or File Upload.
  2. Choose your connector where the XML file or files are stored and then choose the XML file data type from the drop-down list.
    \
  3. Select the XML file to import.
    In the XML Log Message section, enter the XML record tag name for the data you want to import. Don't include the opening and closing brackets (<,>).

    In the Field XPaths sections, enter  path expression(s) to specify what data to import. 

     

    The XML parser ignores namespaces .
    To parse the namespaced values, the local-name() function must be used.

    Example:
    local-name([node-set]//text)

    See additional examples of Parsing XML File Format .

  4. Review the sample data.
  5. Save and run the new import job or file upload.

Examples

Simple example

Example file: XML_Examples.xml

This example imports the employee data from the XML file. 

<DataSet1>
	<employee number="1">
    	<emp_id>PAJASI2345</emp_id>
    	<fname>Mathis</fname>
    	<minit>J</minit>
    	<lname>Parkers</lname>
    	<job_id>24</job_id>
    	<job_lvl>87</job_lvl>
    	<pub_id>55869</pub_id>
    	<hire_date>2012-08-14</hire_date>
  	</employee>
 	<employee number="2">
    	<emp_id>KAIJAF45354</emp_id>
    	<fname>Jeff</fname>
   		<minit>L</minit>
   		<lname>Johnson</lname>
    	<job_id>25</job_id>
    	<job_lvl>87</job_lvl>
    	<pub_id>55369</pub_id>
    	<hire_date>2012-09-03</hire_date>
  	</employee>
  	<employee number="3">
    	<emp_id>ASDFKI68464</emp_id>
    	<fname>Tess</fname>
    	<minit>A</minit>
    	<lname>Lane</lname>
    	<job_id>26</job_id>
    	<job_lvl>87</job_lvl>
    	<pub_id>55314</pub_id>
    	<hire_date>2012-10-18</hire_date>
  	</employee>
</DataSet1>

In the XML Record Tag Name field, enter employee.  

In order to import the data, you must use XPath expressions to specify what data to import. 

In this example, you have one attribute for the record tag name and eight different child nodes under the parent employee.

Record tag attribute: number

Child nodes: emp_id, fname, minit, lname, job_id, job_lvl, pub_id, and hire_date.

This example needs to use the expression // in front of each node so that it knows to match that node from anywhere within any employee node.

//employee/@number
//emp_id/text()
//fname/text()
//minit/text()
//lname/text()
//job_id/text()
//job_lvl/text()
//pub_id/text()
//hire_date/text()

 

Click next to review the data after all XPath expressions have been written.

XML tag and and XPath behavior

The XML data:

<employees>
    <emp number="1">
        <firstname>Scott</firstname>
        <lastname>Pilgrim</lastname>
        <location>
            <city>Atlanta</city>
            <country>USA</country>
            <state>Georgia</state>
        </location>
        <position>Sales</position>
        <comment></comment>
        <active />
    </emp>
    <emp number="2">
        <firstname>Kim</firstname>
        <lastname>Pine</lastname>
        <location>
            <city>San Francisco</city>
            <country>USA</country>
            <state>California</state>
        </location>
        <position>Developer</position>
        <comment></comment>
    </emp>
    <emp number="3">
        <firstname>Ramona</firstname>
        <lastname>Flowers</lastname>
        <location>
            <city>Berlin</city>
            <country>Germany</country>
            <state></state>
        </location>
        <position></position>
        <comment></comment>
        <active></active>
    </emp>
    <emp number="">
    </emp>
</employees>

Behavior for XML tags and the related XPath:

XML tag / tag attributeXPathResultComment

<emp number="1"></emp>

<emp number="2"></emp>

<emp number="3"></emp>

<emp number=""></emp>

/emp/@number

The empty attribute is handled as a NULL value.

<position>Sales</position>

<position>Developer</position>

<position></position>

tag missing

/emp/position/text()

If the XPath points to the text content of a tag that is empty or missing a tag, the value is NULL.


<position>Sales</position>

<position>Developer</position>

<position></position>

tag missing

/emp/position

If the XPath points to a tag and the tag is empty, the value is set as (Boolean) true.

If the XPath points to a missing tag, the value is NULL.

<comment></comment>

<comment></comment>

<comment></comment>

tag missing

/emp/comment/text()Data isn't imported in this example.

If the XPath points to the text content of a tag which is empty or missing in all target tags, the column doesn't import.


<location>
  <city>Atlanta</city>
  <country>USA</country>
  <state>Georgia</state>
</location>
<location>
  <city>San Francisco</city>
  <country>USA</country>
  <state>California</state>
</location>
<location>
  <city>Berlin</city>
  <country>Germany</country>
  <state></state>
</location>
/emp/location

If the XPath points to a tag which contains sub-tags, they are represented as a JSON string.