XML (Extensible Markup Language) is a markup language whose documents are both human-readable and machine-readable.
XPath (XML Path Language) is a query language for selecting parts of an XML document, such as elements, attributes, and text.
Importing XML files and using XPath
- Open Datameer and create a new Import Job or File Upload.
- Choose the connector for the location where the XML file or files are stored, and then choose the XML file data type from the drop-down list.
- Select the XML file to import.
- In the XML Log Message section, enter the XML record tag name of the data you want to import. Don't include the opening and closing angle brackets (< and >).
- In the Field XPaths section, enter one or more path expressions that specify which data to import.
  Note: The XML parser ignores namespaces. To parse namespaced values, use the local-name() function, for example: local-name([node-set]//text)
  See Parsing XML File Format for additional examples.
- Review the sample data.
- Save and run the new import job or file upload.
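The namespace note above relies on the XPath local-name() function, which returns a node's name without its namespace prefix. In standard XPath 1.0 it is typically used inside a predicate, for example //*[local-name()='text'], to match elements whatever their namespace. The minimal Python sketch below emulates that matching with the standard library's xml.etree.ElementTree, which does not implement local-name() and stores namespaced tags as {uri}local. The sample document, its namespace URI, and the helper names are hypothetical; this illustrates the XPath idea, not how Datameer's parser works internally.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the prefix "m" and its URI are made up.
SAMPLE = """<m:messages xmlns:m="http://example.com/ns">
  <m:message><m:text>Hello</m:text></m:message>
  <m:message><m:text>World</m:text></m:message>
</m:messages>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace-uri}' part ElementTree adds, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def texts_by_local_name(xml: str, name: str) -> list[str]:
    """Emulate //*[local-name()=name]/text() by checking every element."""
    root = ET.fromstring(xml)
    return [el.text or "" for el in root.iter() if local_name(el.tag) == name]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # A plain lookup misses the elements, because ElementTree stores their
    # tag as "{http://example.com/ns}text" rather than "text".
    print(root.findall(".//text"))              # []
    print(texts_by_local_name(SAMPLE, "text"))  # ['Hello', 'World']
```

Matching on the local name alone is convenient, but it also merges same-named elements from different namespaces, so it is best kept to documents where that cannot happen.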
Examples
...
Simple example
Example file: XML_Examples.xml
...
After you have written all XPath expressions, click Next to review the data.
...
XML tag and XPath behavior
The XML data:
<employees>
  <emp number="1">
    <firstname>Scott</firstname>
    <lastname>Pilgrim</lastname>
    <location>
      <city>Atlanta</city>
      <country>USA</country>
      <state>Georgia</state>
    </location>
    <position>Sales</position>
    <comment></comment>
    <active />
  </emp>
  <emp number="2">
    <firstname>Kim</firstname>
    <lastname>Pine</lastname>
    <location>
      <city>San Francisco</city>
      <country>USA</country>
      <state>California</state>
    </location>
    <position>Developer</position>
    <comment></comment>
  </emp>
  <emp number="3">
    <firstname>Ramona</firstname>
    <lastname>Flowers</lastname>
    <location>
      <city>Berlin</city>
      <country>Germany</country>
      <state></state>
    </location>
    <position></position>
    <comment></comment>
    <active></active>
  </emp>
  <emp number="">
  </emp>
</employees>
See W3C.org for full documentation of XML Path Language (XPath) expressions.
...
Select the attr attribute from all entry elements.
XML Record Tag Name: group
XPath: //entry/@attr
...
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
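To make the record-tag setting concrete, here is a minimal Python sketch, assuming that each element matching the XML record tag name becomes one record and that the field XPath is then applied within that record. It uses the standard library's xml.etree.ElementTree, whose path syntax is only a subset of XPath and has no @attr step, so the attribute is read with Element.get(). The helper names are hypothetical, and how Datameer lays out several matches in one record is not modeled here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The sample document used in this example.
SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def split_records(xml: str, record_tag: str) -> list[ET.Element]:
    """Hypothetical helper: every element named record_tag becomes one record."""
    # iter() also yields the starting element, so record_tag="root" works too.
    return list(ET.fromstring(xml).iter(record_tag))


def entry_attrs(record: ET.Element) -> list[str]:
    """Emulate //entry/@attr inside one record using Element.get()."""
    return [e.get("attr") for e in record.iter("entry")]


if __name__ == "__main__":
    for record in split_records(SAMPLE, "group"):
        print(record.get("attr"), entry_attrs(record))
```

With group as the record tag, each of the three records yields the values Entry1, Entry2, and Entry3.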
...
Select all attr attributes in the entire document.
XML Record Tag Name: root
XPath: //@attr
...
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
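A quick way to check what //@attr selects is to emulate it in Python. The minimal sketch below uses the standard library's xml.etree.ElementTree, whose path syntax is only a subset of XPath, so attributes are read with Element.get(). It also shows a pitfall of that subset: ElementTree's .//*[@attr] searches descendants only and misses the root element, whereas XPath //@attr includes the root's attribute. The function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def all_attr_values(xml: str) -> list[str]:
    """Emulate //@attr: collect every attr attribute in document order."""
    root = ET.fromstring(xml)
    # iter() visits the root element itself and then every descendant,
    # depth-first, which is XPath document order for elements.
    return [el.get("attr") for el in root.iter() if "attr" in el.attrib]


if __name__ == "__main__":
    values = all_attr_values(SAMPLE)
    print(len(values))  # 13
    print(values[:5])   # ['Root', 'Group1', 'Entry1', 'Entry2', 'Entry3']
    # ".//*[@attr]" skips the root element, so it finds one element fewer.
    print(len(ET.fromstring(SAMPLE).findall(".//*[@attr]")))  # 12
```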
Select the attr attribute from all group elements.
XML Record Tag Name: root
XPath: //group/@attr
...
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
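The same kind of check works for //group/@attr. In this minimal Python sketch, an ElementTree path (a subset of XPath from the standard library's xml.etree.ElementTree) finds the group elements, and Element.get() stands in for the @attr step; the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def group_attrs(xml: str) -> list[str]:
    """Emulate //group/@attr with an ElementTree path plus Element.get()."""
    root = ET.fromstring(xml)
    # ".//group" finds every group element below the root.
    return [g.get("attr") for g in root.findall(".//group")]


if __name__ == "__main__":
    print(group_attrs(SAMPLE))  # ['Group1', 'Group2', 'Group3']
```

ElementTree also accepts attribute-value predicates such as .//group[@attr='Group2'], which can be handy for trying out filtered paths locally before entering them in the import job.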
Select only the text content of all entry elements.
XML Record Tag Name: group
XPath: //entry/text()
...
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
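Unlike the attribute examples, //entry/text() returns text nodes. The minimal Python sketch below assumes that each group element becomes one record and emulates text() with Element.text from the standard library's xml.etree.ElementTree. Element.text holds only the text before an element's first child, so it matches text() here only because each entry contains nothing but text. The function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def entry_texts(xml: str, record_tag: str = "group") -> list[list[str]]:
    """Emulate //entry/text() for each record element named record_tag."""
    root = ET.fromstring(xml)
    return [[e.text for e in rec.iter("entry")] for rec in root.iter(record_tag)]


if __name__ == "__main__":
    for texts in entry_texts(SAMPLE):
        print(texts)
```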
...
Select the attr attribute of the XML record tag element.
XML Record Tag Name: root
XPath: //root/@attr
...
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
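Here the XPath selects an attribute of the record element itself. The following minimal Python sketch emulates //root/@attr with the standard library's xml.etree.ElementTree and shows why a literal translation can fail there: ElementTree's findall(".//root") searches only descendants of the element it starts from, so it never matches the document element, while Element.iter("root") includes the starting element. The function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def root_attr(xml: str) -> list[str]:
    """Emulate //root/@attr, including the document element itself."""
    root = ET.fromstring(xml)
    return [el.get("attr") for el in root.iter("root")]


if __name__ == "__main__":
    print(root_attr(SAMPLE))                         # ['Root']
    print(ET.fromstring(SAMPLE).findall(".//root"))  # []
```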
...
Select the attr attribute of the third entry element in the second group element.
XML Record Tag Name: root
XPath: //group[2]/entry[3]/@attr
<root attr="Root">
<group attr="Group1">
<entry attr="Entry1">Group1Entry1</entry>
<entry attr="Entry2">Group1Entry2</entry>
<entry attr="Entry3">Group1Entry3</entry>
</group>
<group attr="Group2">
<entry attr="Entry1">Group2Entry1</entry>
<entry attr="Entry2">Group2Entry2</entry>
<entry attr="Entry3">Group2Entry3</entry>
</group>
<group attr="Group3">
<entry attr="Entry1">Group3Entry1</entry>
<entry attr="Entry2">Group3Entry2</entry>
<entry attr="Entry3">Group3Entry3</entry>
</group>
</root>
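Positional predicates in XPath are 1-based and count same-named siblings. The standard library's xml.etree.ElementTree supports this form of predicate, so the minimal Python sketch below can pass the path almost unchanged; only the @attr step is replaced by Element.get(). The function name nth_entry is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SAMPLE = """<root attr="Root">
  <group attr="Group1">
    <entry attr="Entry1">Group1Entry1</entry>
    <entry attr="Entry2">Group1Entry2</entry>
    <entry attr="Entry3">Group1Entry3</entry>
  </group>
  <group attr="Group2">
    <entry attr="Entry1">Group2Entry1</entry>
    <entry attr="Entry2">Group2Entry2</entry>
    <entry attr="Entry3">Group2Entry3</entry>
  </group>
  <group attr="Group3">
    <entry attr="Entry1">Group3Entry1</entry>
    <entry attr="Entry2">Group3Entry2</entry>
    <entry attr="Entry3">Group3Entry3</entry>
  </group>
</root>"""


def nth_entry(xml: str, group_pos: int, entry_pos: int) -> list[ET.Element]:
    """Emulate //group[group_pos]/entry[entry_pos] with positional predicates."""
    root = ET.fromstring(xml)
    return root.findall(f".//group[{group_pos}]/entry[{entry_pos}]")


if __name__ == "__main__":
    matches = nth_entry(SAMPLE, 2, 3)
    print([e.get("attr") for e in matches])  # ['Entry3']
    print([e.text for e in matches])         # ['Group2Entry3']
```

Because every group's third entry has attr="Entry3", the attribute value alone does not reveal which group matched; the text content confirms that it came from the second group.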
...
Behavior for XML tags and the related XPath:
XML tag / tag attribute | XPath | Result | Comment |
---|---|---|---|
<emp number="1"></emp> <emp number="2"></emp> <emp number="3"></emp> <emp number=""></emp> | /emp/@number | The empty attribute is handled as a NULL value. | |
<position>Sales</position> <position>Developer</position> <position></position> (tag missing) | /emp/position/text() | If the XPath points to the text content of a tag that is empty or missing, the value is NULL. | |
<position>Sales</position> <position>Developer</position> <position></position> (tag missing) | /emp/position | If the XPath points to an empty tag, the value is set to the Boolean true. If the XPath points to a missing tag, the value is NULL. | |
<comment></comment> <comment></comment> <comment></comment> (tag missing) | /emp/comment/text() | No data is imported in this example. | If the XPath points to the text content of a tag that is empty or missing in all records, the column isn't imported. |
<location><city>Atlanta</city><country>USA</country><state>Georgia</state></location> <location><city>San Francisco</city><country>USA</country><state>California</state></location> <location><city>Berlin</city><country>Germany</country><state></state></location> | /emp/location | If the XPath points to a tag that contains child tags, the value is represented as a JSON string. | |
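The rules in this table can be modeled in a few lines of code. The minimal Python sketch below applies them to the employees data from this section, one emp element per record, using the standard library's xml.etree.ElementTree. It is a model of the documented behavior, not Datameer's implementation, and the helper names are hypothetical. Two details are assumptions the table does not specify: the JSON layout for a tag with child tags, and the value returned for a non-empty leaf tag selected without text().

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET

# The employees data from this section.
SAMPLE = """<employees>
  <emp number="1">
    <location><city>Atlanta</city><country>USA</country><state>Georgia</state></location>
    <position>Sales</position><comment></comment><active />
  </emp>
  <emp number="2">
    <location><city>San Francisco</city><country>USA</country><state>California</state></location>
    <position>Developer</position><comment></comment>
  </emp>
  <emp number="3">
    <location><city>Berlin</city><country>Germany</country><state></state></location>
    <position></position><comment></comment><active></active>
  </emp>
  <emp number=""></emp>
</employees>"""


def attr_value(record: ET.Element, name: str) -> str | None:
    """/emp/@name: an empty (or absent) attribute becomes NULL (None)."""
    value = record.get(name)
    return value if value else None


def text_value(record: ET.Element, tag: str) -> str | None:
    """/emp/tag/text(): the text of an empty or missing tag is NULL (None)."""
    el = record.find(tag)
    if el is None or not (el.text or "").strip():
        return None
    return el.text


def tag_value(record: ET.Element, tag: str) -> str | bool | None:
    """/emp/tag: the table's rules for an XPath that points at a tag."""
    el = record.find(tag)
    if el is None:
        return None  # missing tag -> NULL
    children = list(el)
    if children:
        # Tag with child tags -> JSON string. This object layout
        # (child tag -> text) is an assumption, not a documented format.
        return json.dumps({child.tag: child.text for child in children})
    if not (el.text or "").strip():
        return True  # empty tag -> Boolean true
    # Assumption: the table does not cover this case; return the text.
    return el.text


def column_is_imported(values: list) -> bool:
    """As stated for /emp/comment/text(): an all-NULL column is not imported."""
    return any(v is not None for v in values)


if __name__ == "__main__":
    emps = list(ET.fromstring(SAMPLE).iter("emp"))
    print([attr_value(e, "number") for e in emps])  # ['1', '2', '3', None]
    print([tag_value(e, "position") for e in emps])  # ['Sales', 'Developer', True, None]
    print(column_is_imported([text_value(e, "comment") for e in emps]))  # False
    print(tag_value(emps[0], "location"))
```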