The pdf2Data Editor lets you use expert mode for selectors. It is so called because it gives you extra flexibility but also requires deeper knowledge to build an extraction pipeline.

Prerequisites

We assume you know how to edit a data field in expert mode.

Expert mode selectors

A few pdf2Data selectors are available only in expert mode.

Table frequency selector

  • Keyword: tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2

This selector uses text frequency analysis to detect table cells and may work better than the default Table selector for borderless tables.

The selectCell, selectRow, and selectColumn properties are optional. They specify the row and column numbers (or ranges, using start:end syntax) to extract when only part of the table is needed.
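To make the keyword layout concrete, here is a minimal Python sketch that splits a selector keyword into its name and properties and reads a start:end range. It is not part of pdf2Data: the helper names parse_selector and parse_range are hypothetical, and treating a range as inclusive on both ends is an assumption, not documented behavior. The selectCell value is kept as a raw string because its internal syntax is not described here.

```python
def parse_selector(keyword):
    """Split 'name: key=value, key=value' into (name, properties).

    Hypothetical helper for illustration; pdf2Data parses keywords itself.
    """
    name, _, rest = keyword.partition(":")
    props = {}
    for part in rest.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        props[key.strip()] = value.strip()
    return name.strip(), props


def parse_range(spec):
    """Read 'start:end' (or a single number) as a (start, end) pair.

    Assumption: both ends are inclusive.
    """
    start, sep, end = spec.partition(":")
    if sep:
        return int(start), int(end)
    return int(start), int(start)


name, props = parse_selector(
    "tableFreq: selectCell=1;2, selectRow=1:2, selectColumn=2:2"
)
rows = parse_range(props["selectRow"])
columns = parse_range(props["selectColumn"])
```

Under these assumptions, rows covers rows 1 through 2 and columns covers column 2 only.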

XML Grouping

XML Grouping structures the XML output by combining the detected data fields into groups.

  • Keyword: groupByTb: FIELD_NAME
  • FIELD_NAME is the name of any other field in the template

This selector places every instance of the current data field inside the preceding instance of the data field FIELD_NAME, where "preceding" is determined vertically, from top to bottom.

For more details, see the XML grouping article.
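The grouping rule can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not pdf2Data's implementation: the function group_by_preceding is hypothetical, each instance is reduced to a (top offset, value) pair with the offset measured from the top of the page, parent values are assumed to be unique, and an instance with no parent above it is simply left out.

```python
def group_by_preceding(parents, children):
    """Attach each child to the closest parent located above it.

    parents, children: lists of (top_offset, value) pairs, where
    top_offset grows downward from the top of the page (assumption).
    """
    ordered = sorted(parents)
    groups = {value: [] for _, value in ordered}
    for top, value in sorted(children):
        # Parents that start at or above this child, in top-to-bottom order.
        preceding = [p for p in ordered if p[0] <= top]
        if preceding:
            groups[preceding[-1][1]].append(value)
        # Children with no parent above them are skipped (assumption).
    return groups


invoices = [(10, "Invoice A"), (50, "Invoice B")]
items = [(20, "item 1"), (30, "item 2"), (60, "item 3"), (5, "stray item")]
grouped = group_by_preceding(invoices, items)
```

In this example the first two items fall under "Invoice A" and the third under "Invoice B", mirroring how grouped fields would nest in the XML output.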

Font size selector (expert)

  • Keyword: fontSize: minSize=X, maxSize=Y

Unlike the standard Font size selector, this selector selects all characters with a font size between X and Y. If the minSize and maxSize parameters are present, the font size of the text inside the field region is ignored.
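As a rough illustration of the filtering this describes, the Python sketch below keeps only the characters whose font size lies in a given range. The Char class and select_by_font_size function are hypothetical stand-ins, and treating both bounds as inclusive is an assumption; how pdf2Data handles the boundary values is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Char:
    """Hypothetical stand-in for one extracted character."""
    text: str
    font_size: float


def select_by_font_size(chars, min_size, max_size):
    """Keep characters whose font size is within [min_size, max_size].

    Assumption: both bounds are inclusive.
    """
    return [c for c in chars if min_size <= c.font_size <= max_size]


chars = [Char("A", 8.0), Char("B", 10.0), Char("C", 14.0)]
selected = select_by_font_size(chars, min_size=9.0, max_size=12.0)
selected_text = "".join(c.text for c in selected)
```

Here only the 10-point character passes the filter; the sizes are made-up values for the example.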

All pdf2Data selectors can be used in expert mode with special keywords, and some of them also accept additional parameters that affect accuracy. See the page for each selector to learn how to use it in expert mode.