/
Using Regular Expressions

Using Regular Expressions

Regular expressions are a way for matching strings of text, e.g., particular characters, words, or phrases. They can be used in searching, editing and manipulating your data. You can use Java-style regular expressions in formulas by typing them into the formula or using them in a filter. Use them to process text for items such as email addresses, file names and extensions, or company names.

Using Regular Expressions in Formulas

  1. Click a column in a workbook. The the current formula for that column is displayed in the Fx bar.
  2. Create or edit the formula using regular expressions. Then press Enter.

The following functions allow regular expressions to be used as arguments:
INDEX(), MATCHES(), NGRAM(), REGEX(), REGEXTRACT(), REPLACEALL(), and TOKENIZE()

Example:

If your workbook had a column listing all the users of your website, you could use the MATCHES() function and .*\d+.* to find all user names containing numbers.
Or you could use the REGEXTRACT() function and [JMS].* to tokenize all user names that start with "J", "M" or "S".

Using Regular Expressions in Filters

Here is a quick overview of how filters work with regular expressions. Refer to Filtering Data to learn more about filters.

Using regular expressions in simple filters:

  1. Click the Filter icon.
  2. Select the data to be filtered. Only string data support regular expressions. Select matches regexp from the drop-down list.
  3. Create the regular expression.
  4. Click the + (plus) button to add more columns to the filter if desired.
  5. Click Filter Sheet.

Using regular expressions in advanced filters:

  1. Click the Filter icon.
  2. Click the Advanced tab.
  3. Enter a function which uses regular expressions as an argument.
  4. Add more functions using the logical AND or OR operators (&& or || respectively)
  5. Click Filter Sheet.

Escape character

As with other types of regular expressions, Java regular expressions include pre-defined character classes, e.g., \d (any digit from 0 to 9) or \s (a whitespace character, like tab or return). As you can see these constructs include a \ (backslash). Normally when using Java regular expressions the defined character classes must also be preceded by a backslash, e.g., \\d. When using the Formula Builder in Datameer X this convention isn't followed, the backslash is inserted automatically. However, when inserting a regular expression directly into the formula field, this step isn't done.

Construct

Meaning

Used in the Formula Builder

Used in the Formula Field

.*\d

any character(s) followed by a number

.*\d

.*\\d

\s

a white space character

\s

\\s

Datameer X tries to match entire record without automatically adding  ^   or  $   to the regular expression. But if the record has multiple lines, then this must be accounted for with  (?s)  or line terminators.

More information about regular expressions can be found on the web. Two sites we recommend are the Java Tutorials and 10 Regular Expressions You Should Know.
If you have created a regular expression and want to make sure that it is correct or if you need to change it into a java string then RegexPlanet's Regular Expression Test Page is just what you need.

See Importing with Regular Expressions to learn more.