ELEMENTS_BY_SELECTOR_QUERY
Syntax
ELEMENTS_BY_SELECTOR_QUERY(<string [containing HTML elements]>;<selector query>)
Description
Returns all elements that match the selector query as a list.
For more information on selector query syntax, see the jsoup documentation at jsoup.org.
Example
Download the example file: HTML_File_Example.html
Given the following excerpt from the HTML file:
<table border="1" rules="groups">
  <thead>
    <tr> <th>Association 1</th> <th>Association 2</th> <th>Association 3</th> </tr>
  </thead>
  <tfoot>
    <tr> <td><i>affected:<br>4 Million People</i></td> <td><i>affected:<br>2 Million People</i></td> <td><i>affected:<br>1 Million People</i></td> </tr>
  </tfoot>
  <tbody>
    <tr> <td>New York</td> <td>San Francisco</td> <td>Atlanta</td> </tr>
    <tr> <td>Bread</td> <td>Biscuits</td> <td>Rolls</td> </tr>
    <tr> <td>Sandwich</td> <td>Soup</td> <td>Salad</td> </tr>
  </tbody>
</table>
The goal is to extract only the table data content that is located in the table body. Looking at the jsoup documentation on defining queries, a possible query to use is:
ancestor child: child elements that descend from ancestor
In this case, the ancestor is the table body (tbody) and the child is the table data (td), so the query is:
tbody td
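The result is a list of the table data <td> elements that are located inside the table body <tbody> element. The <td> elements in <tfoot> are not returned because they do not descend from <tbody>.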
[<td>New York</td>, <td>San Francisco</td>, <td>Atlanta</td>, <td>Bread</td>, <td>Biscuits</td>, <td>Rolls</td>, <td>Sandwich</td>, <td>Soup</td>, <td>Salad</td>]
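To make the "ancestor child" rule concrete, the following is a minimal Python sketch of how a descendant query such as tbody td can be evaluated. It is not how ELEMENTS_BY_SELECTOR_QUERY or jsoup is implemented; it uses only Python's standard html.parser module, and the names DescendantSelector, elements_by_descendant_query, and SAMPLE_HTML are hypothetical, chosen for illustration. It handles only a simple two-tag descendant query and assumes the child element is not nested inside itself.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (assumed subset, enough for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DescendantSelector(HTMLParser):
    """Collect every <child> element that has an open <ancestor> above it."""

    def __init__(self, ancestor, child):
        super().__init__()
        self.ancestor = ancestor
        self.child = child
        self.open_tags = []   # tags currently open, outermost first
        self.buffer = None    # markup of the matching element being read
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if self.buffer is not None:
            # Already inside a match: keep nested markup verbatim.
            self.buffer.append(self.get_starttag_text())
        elif tag == self.child and self.ancestor in self.open_tags:
            # A child tag with the ancestor somewhere above it: start a match.
            self.buffer = [self.get_starttag_text()]
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return
        # Pop back to the matching open tag.
        while self.open_tags.pop() != tag:
            pass
        if self.buffer is not None:
            self.buffer.append(f"</{tag}>")
            if tag == self.child:
                self.matches.append("".join(self.buffer))
                self.buffer = None

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def elements_by_descendant_query(html, ancestor, child):
    """Return the outer HTML of each <child> that descends from <ancestor>."""
    parser = DescendantSelector(ancestor, child)
    parser.feed(html)
    parser.close()
    return parser.matches


SAMPLE_HTML = """
<table border="1" rules="groups">
  <thead><tr><th>Association 1</th><th>Association 2</th><th>Association 3</th></tr></thead>
  <tfoot><tr>
    <td><i>affected:<br>4 Million People</i></td>
    <td><i>affected:<br>2 Million People</i></td>
    <td><i>affected:<br>1 Million People</i></td>
  </tr></tfoot>
  <tbody>
    <tr><td>New York</td><td>San Francisco</td><td>Atlanta</td></tr>
    <tr><td>Bread</td><td>Biscuits</td><td>Rolls</td></tr>
    <tr><td>Sandwich</td><td>Soup</td><td>Salad</td></tr>
  </tbody>
</table>
"""

result = elements_by_descendant_query(SAMPLE_HTML, "tbody", "td")
print(result)
```

Calling the helper with "tfoot" as the ancestor instead would return the three footer cells, which shows that the ancestor part of the query is what restricts the match to one section of the table.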