The pdf2Data Command Line Interface (CLI) extracts data from PDF files directly from the command line. The extracted data is written out as XML.


To start capturing data from PDFs, download the CLI application from the iText Artifactory.

No special environment configuration is needed: as long as you have Java 8 installed, you can run the pdf2Data CLI from the command line.
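As a quick way to confirm the Java prerequisite, the following minimal Python sketch reads the output of `java -version` and extracts the major version number. The helper names `parse_java_major` and `installed_java_major` are hypothetical and not part of pdf2Data; the sketch only reports what is installed and leaves the comparison against Java 8 to you.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_xxx".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        # `java -version` writes its report to stderr, so both streams are captured.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        # No `java` executable on the PATH.
        return None
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` returns an integer such as 8 when Java is found on the PATH, and `None` otherwise.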


The steps mirror the ones you would typically perform in code.

Creating a template entity from a template PDF

java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.json

File recognition

java -jar cli.jar parse -t template.xml -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml -l license.json
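
The two commands above can be chained from a script. The following is a minimal Python sketch, not an official wrapper: the function names are hypothetical, the flags and default file names are copied from the commands above, and it assumes the CLI returns a non-zero exit code on failure.

```python
import subprocess


def preprocess_command(jar, template_pdf, template_xml, license_file):
    """Build the `preprocess` invocation: template PDF in, template XML out."""
    return ["java", "-jar", jar, "preprocess",
            "-t", template_pdf,
            "-x", template_xml,
            "-l", license_file]


def parse_command(jar, template_xml, source_pdf, recognized_pdf,
                  recognized_xml, license_file):
    """Build the `parse` invocation for a single PDF to be recognized."""
    return ["java", "-jar", jar, "parse",
            "-t", template_xml,
            "-s", source_pdf,
            "-p", recognized_pdf,
            "-x", recognized_xml,
            "-l", license_file]


def run_pipeline(jar="cli.jar", license_file="license.json"):
    """Run both steps in order, stopping if the first one fails."""
    # check=True raises CalledProcessError on a non-zero exit code
    # (assumption: the CLI signals failure that way).
    subprocess.run(
        preprocess_command(jar, "template.pdf", "template.xml", license_file),
        check=True)
    subprocess.run(
        parse_command(jar, "template.xml", "file_for_parsing.pdf",
                      "recognized.pdf", "recognized.xml", license_file),
        check=True)
```

Calling `run_pipeline()` from the directory that contains `cli.jar`, `license.json`, and the PDFs runs the same two commands in sequence. Passing the arguments as a list avoids shell quoting problems when file names contain spaces.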

Help information

java -jar cli.jar help preprocess

java -jar cli.jar help parse