The pdf2Data SDK is a native Java (or .NET) application. Its primary function is to extract data from PDF files using predefined extraction rules.

The extracted data is output in XML format.

Installation

Java

The preferred way to set up pdf2Data in Java is to use a build system such as Maven or Gradle and download the pdf2Data artifacts from the iText Artifactory (https://repo.itextsupport.com/pdf2data/).

The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.

In Maven, add the repository and the dependency to the <repositories> and <dependencies> sections of your pom.xml, similar to the example below:

Maven
<repository>
	<id>pdf2Data</id>
	<name>pdf2Data Maven Repository</name>
	<url>https://repo.itextsupport.com/pdf2data</url>
</repository>

<dependency>
	<groupId>com.itextpdf.pdf2data</groupId>
	<artifactId>pdf2data</artifactId>
	<version>3.1.0</version>
</dependency>
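
For Gradle, an equivalent configuration might look like the sketch below (Groovy DSL). It reuses the repository URL, coordinates, and version from the Maven example. The mavenCentral() entry is an assumption, added on the expectation that transitive dependencies are resolved from there; drop it if your build does not need it.

```groovy
repositories {
    // Assumption: transitive dependencies are resolved from Maven Central
    mavenCentral()
    // pdf2Data repository, same URL as in the Maven example
    maven {
        url "https://repo.itextsupport.com/pdf2data"
    }
}

dependencies {
    // groupId:artifactId:version, taken from the Maven example
    implementation "com.itextpdf.pdf2data:pdf2data:3.1.0"
}
```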

.NET

For .NET, pdf2Data is distributed as a NuGet package, which is available on NuGet.org or from the iText Artifactory.

You can browse for the desired NuGet package manually or install it by running Install-Package itext7.pdf2data in the NuGet Package Manager Console.

Integrating pdf2Data into your code

Below is an example of how pdf2Data can be used in code:

Java
// Make sure to load license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);

.NET
// Make sure to load license file before invoking any code
LicenseKey.LoadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);
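
Once the result has been saved, the XML file can be read back with any XML parser. The following is a minimal sketch using only the JDK's built-in DOM parser. It makes no assumption about the pdf2Data output schema: it simply lists every leaf element together with its text. The class name XmlLeafDump and the sample XML in main are hypothetical and serve only as an illustration; the real output structure may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlLeafDump {

    /** Returns "name=text" for every element that has no child elements. */
    public static List<String> leaves(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Walks the tree depth-first and records elements without child elements.
    private static void collect(Element element, List<String> out) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, out);
            }
        }
        if (!hasChildElement) {
            out.add(element.getTagName() + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample; the real pdf2Data schema may differ.
        String sample = "<result><field>INV-001</field><total> 42.00 </total></result>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String line : leaves(in)) {
            System.out.println(line);
        }
    }
}
```

To inspect an actual result, pass a stream opened on the saved file instead of the sample, for example Files.newInputStream(Paths.get(pathToOutXmlFile)).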