Importing XML Files and Using XPath
XML (Extensible Markup Language) is a language that is both machine and human readable, so it is very usable.
XPath (XML Path Language) is a powerful language for addressing parts of an XML file.
Importing XML files and using XPath
- Open Datameer and create a new Import Job or File Upload.
- Choose your connector where the XML file or files are stored and then choose the XML file data type from the drop-down list.
\ Select the XML file to import.
In the XML Log Message section, enter the XML record tag name for the data you want to import. Don't include the opening and closing brackets (<,>).
In the Field XPaths sections, enter path expression(s) to specify what data to import.
- Review the sample data.
- Save and run the new import job or file upload.
Examples
Simple example
Example file: XML_Examples.xml
This example imports the employee data from the XML file.
<DataSet1> <employee number="1"> <emp_id>PAJASI2345</emp_id> <fname>Mathis</fname> <minit>J</minit> <lname>Parkers</lname> <job_id>24</job_id> <job_lvl>87</job_lvl> <pub_id>55869</pub_id> <hire_date>2012-08-14</hire_date> </employee> <employee number="2"> <emp_id>KAIJAF45354</emp_id> <fname>Jeff</fname> <minit>L</minit> <lname>Johnson</lname> <job_id>25</job_id> <job_lvl>87</job_lvl> <pub_id>55369</pub_id> <hire_date>2012-09-03</hire_date> </employee> <employee number="3"> <emp_id>ASDFKI68464</emp_id> <fname>Tess</fname> <minit>A</minit> <lname>Lane</lname> <job_id>26</job_id> <job_lvl>87</job_lvl> <pub_id>55314</pub_id> <hire_date>2012-10-18</hire_date> </employee> </DataSet1>
In the XML Record Tag Name field, enter employee.
In order to import the data, you must use XPath expressions to specify what data to import.
In this example, you have one attribute for the record tag name and eight different child nodes under the parent employee.
Record tag attribute: number
Child nodes: emp_id, fname, minit, lname, job_id, job_lvl, pub_id, and hire_date.
This example needs to use the expression // in front of each node so that it knows to match that node from anywhere within any employee node.
//employee/@number //emp_id/text() //fname/text() //minit/text() //lname/text() //job_id/text() //job_lvl/text() //pub_id/text() //hire_date/text()
Click next to review the data after all XPath expressions have been written.
XML tag and and XPath behavior
The XML data:
<employees> <emp number="1"> <firstname>Scott</firstname> <lastname>Pilgrim</lastname> <location> <city>Atlanta</city> <country>USA</country> <state>Georgia</state> </location> <position>Sales</position> <comment></comment> <active /> </emp> <emp number="2"> <firstname>Kim</firstname> <lastname>Pine</lastname> <location> <city>San Francisco</city> <country>USA</country> <state>California</state> </location> <position>Developer</position> <comment></comment> </emp> <emp number="3"> <firstname>Ramona</firstname> <lastname>Flowers</lastname> <location> <city>Berlin</city> <country>Germany</country> <state></state> </location> <position></position> <comment></comment> <active></active> </emp> <emp number=""> </emp> </employees>
Behavior for XML tags and the related XPath:
XML tag / tag attribute | XPath | Result | Comment |
---|---|---|---|
<emp number="1"></emp> <emp number="2"></emp> <emp number="3"></emp> <emp number=""></emp> | /emp/@number | The empty attribute is handled as a NULL value. | |
<position>Sales</position> <position>Developer</position> <position></position> tag missing | /emp/position/text() | If the XPath points to the text content of a tag that is empty or missing a tag, the value is NULL. | |
<position>Sales</position> <position>Developer</position> <position></position> tag missing | /emp/position | If the XPath points to a tag and the tag is empty, the value is set as (Boolean) true. If the XPath points to a missing tag, the value is NULL. | |
<comment></comment> <comment></comment> <comment></comment> tag missing | /emp/comment/text() | Data isn't imported in this example. | If the XPath points to the text content of a tag which is empty or missing in all target tags, the column doesn't import. |
<location> <city>Atlanta</city> <country>USA</country> <state>Georgia</state> </location> <location> <city>San Francisco</city> <country>USA</country> <state>California</state> </location> <location> <city>Berlin</city> <country>Germany</country> <state></state> </location> | /emp/location | If the XPath points to a tag which contains sub-tags, they are represented as a JSON string. |