XML (Extensible Markup Language) is a markup language that both machines and humans can read.

XPath (XML Path Language) is a powerful language for addressing parts of an XML file.

Importing XML files and using XPath

  1. Open Datameer and create a new Import Job or File Upload.
  2. Choose the connector for the location where the XML file or files are stored, then choose the XML file data type from the drop-down list.
  3. Select the XML file to import.
    In the XML Log Message section, enter the XML record tag name for the data you want to import. Don't include the opening and closing brackets (<,>).

    In the Field XPaths section, enter one or more path expressions to specify which data to import.

    The XML parser ignores namespaces.
    To parse namespaced values, use the local-name() function.

    Code Block
    Example:
    local-name([node-set]//text)

    See additional examples in Parsing XML File Format.

  4. Review the sample data.
  5. Save and run the new import job or file upload.
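
The namespace note in step 3 can be tried outside Datameer. The following is a minimal sketch in Python, not Datameer's parser: the document, its namespace URI, and its tag names are made up for illustration. Python's standard library does not evaluate local-name(), so the sketch swaps in the `{*}` wildcard of ElementTree (Python 3.8 or later), which matches on the local name in the same way as the XPath test `//*[local-name()='text']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the namespace URI and tag names are
# assumptions made for this example only.
DOC = """
<log xmlns:m="http://example.com/msg">
    <m:entry>
        <m:text>first message</m:text>
    </m:entry>
    <m:entry>
        <m:text>second message</m:text>
    </m:entry>
</log>
"""

root = ET.fromstring(DOC)

# A plain tag name does not match elements that live in a namespace.
plain_matches = root.findall(".//text")

# "{*}text" matches every element whose local name is "text", whatever its
# namespace. This stands in for the XPath test //*[local-name()='text'].
local_name_matches = [el.text for el in root.findall(".//{*}text")]

print(plain_matches)       # []
print(local_name_matches)  # ['first message', 'second message']
```

The empty first result shows why a path that names only `text` finds nothing in a namespaced file, while a match on the local name does.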

Examples

...

Simple example

Example file: XML_Examples.xml

...

After entering all XPath expressions, click Next to review the data.

...

XML tag and XPath behavior

The XML data:

Code Block
<employees>
    <emp number="1">
        <firstname>Scott</firstname>
        <lastname>Pilgrim</lastname>
        <location>
            <city>Atlanta</city>
            <country>USA</country>
            <state>Georgia</state>
        </location>
        <position>Sales</position>
        <comment></comment>
        <active />
    </emp>
    <emp number="2">
        <firstname>Kim</firstname>
        <lastname>Pine</lastname>
        <location>
            <city>San Francisco</city>
            <country>USA</country>
            <state>California</state>
        </location>
        <position>Developer</position>
        <comment></comment>
    </emp>
    <emp number="3">
        <firstname>Ramona</firstname>
        <lastname>Flowers</lastname>
        <location>
            <city>Berlin</city>
            <country>Germany</country>
            <state></state>
        </location>
        <position></position>
        <comment></comment>
        <active></active>
    </emp>
    <emp number="">
    </emp>
</employees>

Full XPath Expressions

See W3C.org for full documentation on XML Path Language (XPath) expressions.

...

Select the attr attribute from all entry elements.

XML Record Tag Name

Code Block
group

XPath

Code Block
//entry/@attr
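
To see what this expression selects inside each record, here is a minimal sketch in Python. It is not Datameer's implementation: it assumes that each `group` element is treated as one record, and it reads the attribute with `get()` because ElementTree supports only a subset of XPath. How Datameer maps several matches to column values is not stated here, so the sketch only lists the selected values per record.

```python
import xml.etree.ElementTree as ET

DOC = """
<root attr="Root">
    <group attr="Group1">
        <entry attr="Entry1">Group1Entry1</entry>
        <entry attr="Entry2">Group1Entry2</entry>
        <entry attr="Entry3">Group1Entry3</entry>
    </group>
    <group attr="Group2">
        <entry attr="Entry1">Group2Entry1</entry>
        <entry attr="Entry2">Group2Entry2</entry>
        <entry attr="Entry3">Group2Entry3</entry>
    </group>
    <group attr="Group3">
        <entry attr="Entry1">Group3Entry1</entry>
        <entry attr="Entry2">Group3Entry2</entry>
        <entry attr="Entry3">Group3Entry3</entry>
    </group>
</root>
"""

root = ET.fromstring(DOC)

# Assumption: every <group> element is one record. For each record, collect
# the attr attribute of the <entry> elements it contains, which are the
# nodes that //entry/@attr selects.
records = [
    [entry.get("attr") for entry in group.iter("entry")]
    for group in root.iter("group")
]

for values in records:
    print(values)  # ['Entry1', 'Entry2', 'Entry3'] for each of the 3 groups
```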

...

XML data used in this and the following examples:

<root attr="Root">
    <group attr="Group1">
        <entry attr="Entry1">Group1Entry1</entry>
        <entry attr="Entry2">Group1Entry2</entry>
        <entry attr="Entry3">Group1Entry3</entry>
    </group>
    <group attr="Group2">
        <entry attr="Entry1">Group2Entry1</entry>
        <entry attr="Entry2">Group2Entry2</entry>
        <entry attr="Entry3">Group2Entry3</entry>
    </group>
    <group attr="Group3">
        <entry attr="Entry1">Group3Entry1</entry>
        <entry attr="Entry2">Group3Entry2</entry>
        <entry attr="Entry3">Group3Entry3</entry>
    </group>
</root>

...

Select all attr attributes in the entire document.

XML Record Tag Name

Code Block
root

XPath

Code Block
//@attr
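
The `//` step matches at any depth, so this expression also picks up the attribute on the root element itself. As a rough sketch in Python (an emulation with the standard library, not Datameer's parser), walking every element in document order gives the same set of attribute values:

```python
import xml.etree.ElementTree as ET

DOC = """
<root attr="Root">
    <group attr="Group1">
        <entry attr="Entry1">Group1Entry1</entry>
        <entry attr="Entry2">Group1Entry2</entry>
        <entry attr="Entry3">Group1Entry3</entry>
    </group>
    <group attr="Group2">
        <entry attr="Entry1">Group2Entry1</entry>
        <entry attr="Entry2">Group2Entry2</entry>
        <entry attr="Entry3">Group2Entry3</entry>
    </group>
    <group attr="Group3">
        <entry attr="Entry1">Group3Entry1</entry>
        <entry attr="Entry2">Group3Entry2</entry>
        <entry attr="Entry3">Group3Entry3</entry>
    </group>
</root>
"""

root = ET.fromstring(DOC)

# iter() yields the root element first, then all descendants in document
# order. Keeping only elements that carry "attr" emulates //@attr.
values = [el.get("attr") for el in root.iter() if "attr" in el.attrib]

print(len(values))  # 13
print(values[:5])   # ['Root', 'Group1', 'Entry1', 'Entry2', 'Entry3']
```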

...

Select attr attribute from all group elements.

XML Record Tag Name

Code Block
root

XPath

Code Block
//group/@attr

...

Select only the text content of all entry elements.

XML Record Tag Name

Code Block
group

XPath

Code Block
//entry/text()

...

Select the attr attribute from the XML record tag element.

XML Record Tag Name

Code Block
root

XPath

Code Block
//root/@attr

...

Select attr attribute from the third entry element in the second group element.

XML Record Tag Name

Code Block
root

XPath

Code Block
//group[2]/entry[3]/@attr
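
Positional predicates in XPath count from 1, not 0. A small sketch in Python illustrates this with the example data. It uses the XPath subset of ElementTree, which supports `[position]` predicates but not a trailing `/@attr` step, so the attribute is read with `get()` instead. This is an illustration, not Datameer's parser.

```python
import xml.etree.ElementTree as ET

DOC = """
<root attr="Root">
    <group attr="Group1">
        <entry attr="Entry1">Group1Entry1</entry>
        <entry attr="Entry2">Group1Entry2</entry>
        <entry attr="Entry3">Group1Entry3</entry>
    </group>
    <group attr="Group2">
        <entry attr="Entry1">Group2Entry1</entry>
        <entry attr="Entry2">Group2Entry2</entry>
        <entry attr="Entry3">Group2Entry3</entry>
    </group>
    <group attr="Group3">
        <entry attr="Entry1">Group3Entry1</entry>
        <entry attr="Entry2">Group3Entry2</entry>
        <entry attr="Entry3">Group3Entry3</entry>
    </group>
</root>
"""

root = ET.fromstring(DOC)

# group[2] is the second <group> child, entry[3] its third <entry> child.
node = root.find("./group[2]/entry[3]")

attr_value = node.get("attr") if node is not None else None
text_value = node.text if node is not None else None

print(attr_value)  # Entry3
print(text_value)  # Group2Entry3
```

Printing the text content alongside the attribute confirms that the match comes from the second group.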

...

Behavior for XML tags and the related XPath:

XML tag / tag attribute:
<emp number="1"></emp>
<emp number="2"></emp>
<emp number="3"></emp>
<emp number=""></emp>
XPath: /emp/@number
Result: (image)
Comment: The empty attribute is handled as a NULL value.

XML tag / tag attribute:
<position>Sales</position>
<position>Developer</position>
<position></position>
(tag missing)
XPath: /emp/position/text()
Result: (image)
Comment: If the XPath points to the text content of a tag that is empty or missing, the value is NULL.

XML tag / tag attribute:
<position>Sales</position>
<position>Developer</position>
<position></position>
(tag missing)
XPath: /emp/position
Result: (image)
Comment: If the XPath points to a tag and the tag is empty, the value is set to (Boolean) true. If the XPath points to a missing tag, the value is NULL.

XML tag / tag attribute:
<comment></comment>
<comment></comment>
<comment></comment>
(tag missing)
XPath: /emp/comment/text()
Result: Data isn't imported in this example.
Comment: If the XPath points to the text content of a tag that is empty or missing in all target tags, the column isn't imported.

XML tag / tag attribute:
<location>
  <city>Atlanta</city>
  <country>USA</country>
  <state>Georgia</state>
</location>
<location>
  <city>San Francisco</city>
  <country>USA</country>
  <state>California</state>
</location>
<location>
  <city>Berlin</city>
  <country>Germany</country>
  <state></state>
</location>
XPath: /emp/location
Result: (image)
Comment: If the XPath points to a tag that contains sub-tags, the sub-tags are represented as a JSON string.
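
The rules above can be mimicked in a short Python sketch. This is an emulation of the described results, not Datameer's code. The helper names `attribute_value`, `text_value`, and `element_value` are hypothetical, and the exact JSON layout Datameer produces for sub-tags is an assumption, as is the rendering of an empty sub-tag as null. The data is a trimmed copy of the employees example.

```python
import json
import xml.etree.ElementTree as ET

DOC = """
<employees>
    <emp number="1">
        <location><city>Atlanta</city><country>USA</country><state>Georgia</state></location>
        <position>Sales</position>
        <comment></comment>
    </emp>
    <emp number="2">
        <location><city>San Francisco</city><country>USA</country><state>California</state></location>
        <position>Developer</position>
        <comment></comment>
    </emp>
    <emp number="3">
        <location><city>Berlin</city><country>Germany</country><state></state></location>
        <position></position>
        <comment></comment>
    </emp>
    <emp number="">
    </emp>
</employees>
"""

def attribute_value(emp, name):
    """Emulates /emp/@name: an empty or absent attribute becomes None (NULL)."""
    return emp.get(name) or None

def text_value(emp, tag):
    """Emulates /emp/tag/text(): an empty or missing tag becomes None (NULL)."""
    node = emp.find(tag)
    if node is None:
        return None
    return (node.text or "").strip() or None

def element_value(emp, tag):
    """Emulates /emp/tag: missing -> None, empty -> True, sub-tags -> JSON."""
    node = emp.find(tag)
    if node is None:
        return None
    children = list(node)
    if children:
        # Assumed layout: one JSON object keyed by sub-tag name.
        return json.dumps({child.tag: child.text for child in children})
    return (node.text or "").strip() or True

emps = ET.fromstring(DOC).findall("emp")

numbers = [attribute_value(e, "number") for e in emps]
position_text = [text_value(e, "position") for e in emps]
position_node = [element_value(e, "position") for e in emps]
comments = [text_value(e, "comment") for e in emps]
# A column whose values are all NULL is not imported.
comment_imported = any(value is not None for value in comments)
first_location = element_value(emps[0], "location")

print(numbers)           # ['1', '2', '3', None]
print(position_text)     # ['Sales', 'Developer', None, None]
print(position_node)     # ['Sales', 'Developer', True, None]
print(comment_imported)  # False
print(first_location)    # {"city": "Atlanta", "country": "USA", "state": "Georgia"}
```

Comparing `position_text` with `position_node` shows the difference between selecting the text content and selecting the tag itself: only the latter turns an empty tag into true.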