PDF Extractor

  Developer: Wilkinson
  Product: WilComm
  First released: 20th November 2008
  Latest release: April 2009


PDF Extractor enables PDF files to be processed by the WilComm Document Distribution system. It extracts the content of each PDF file as text fields together with their coordinates and writes them to the specified text output file, ready for processing by WilComm.
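The output file format and the way PDF Extractor obtains coordinates are not documented here. As a rough illustration only, the Python sketch below shows one way text positions can be read from an already-decompressed PDF content stream: it tracks the BT, Tm and Td text-positioning operators and records an (x, y, text) tuple for each Tj. It assumes the text matrix has no scaling or rotation, ignores the current transformation matrix and TJ arrays, reports a second Tj on the same line at the line start (advancing past earlier glyphs would need font metrics), and skips hex strings, which need the font's /ToUnicode map to decode. The names TOKEN and text_fields are hypothetical.

```python
import re

# One PDF content-stream token per match. Anything not matched (array
# brackets, dictionary delimiters, comments) is silently skipped here.
TOKEN = re.compile(
    r"\((?:\\.|[^\\()])*\)"        # literal string, no nested parentheses
    r"|<[0-9A-Fa-f\s]*>"           # hex string
    r"|/[^\s/\[\]()<>{}%]+"        # name, e.g. /F1
    r"|[-+]?(?:\d+\.?\d*|\.\d+)"   # number
    r"|[A-Za-z'\"*]+"              # operator, e.g. BT, Td, Tj, T*
)


def text_fields(content):
    """Return (x, y, text) for each Tj whose operand is a literal string."""
    fields = []
    operands = []
    x = y = 0.0  # origin of the current text line
    for tok in TOKEN.findall(content):
        if tok[0] in "(</":
            operands.append(tok)           # string or name operand
        elif tok[0].isdigit() or tok[0] in "+-.":
            operands.append(float(tok))    # numeric operand
        else:
            # An operator consumes the operands collected before it.
            if tok == "BT":
                x = y = 0.0  # text matrices reset to identity
            elif tok == "Tm" and len(operands) >= 6:
                x, y = operands[-2], operands[-1]  # e and f of [a b c d e f]
            elif tok == "Td" and len(operands) >= 2:
                x, y = x + operands[-2], y + operands[-1]  # relative move
            elif (tok == "Tj" and operands and isinstance(operands[-1], str)
                  and operands[-1].startswith("(")):
                # Drop the parentheses and the backslash of simple escapes
                # such as \( and \); \n and octal escapes are not decoded.
                text = re.sub(r"\\(.)", r"\1", operands[-1][1:-1])
                fields.append((x, y, text))
            operands = []
    return fields
```

For example, `text_fields("BT /F1 12 Tf 72 700 Td (Invoice 1001) Tj ET")` yields a single field at (72.0, 700.0).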

PDF Extractor runs as a Windows Service under the name of PDFMonitor.

Currently supports PDF files produced by the following ERP systems:

  • IBS Enterprise
  • JDE
  • SAP R/3



The PDFMonitor service must be installed before it can be started.

To install it, run the following from the command line:

\WINDOWS\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe "\Program Files\Wilkinson\WilComm 4\Application Data\PDFExtractor\PDFExtractor.exe"


Ensure the PDFMonitor service is running, for example by starting it from the Services console or with net start PDFMonitor.

PDF files placed in PickUpFolder are automatically processed.
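How PDFMonitor detects new files is not documented here. A minimal polling sketch in Python, assuming the service periodically scans PickUpFolder and hands each PDF it has not seen before to an extraction step; find_new_pdfs, watch and the process callback are hypothetical names, not part of PDF Extractor.

```python
import time
from pathlib import Path


def find_new_pdfs(pickup_folder, seen):
    """Return PDF files in pickup_folder that are not yet in `seen`.

    `seen` is a set of paths already handed on; it is updated in place.
    The extension check is case-insensitive, so "scan.PDF" is picked up too.
    """
    new_files = []
    for path in sorted(Path(pickup_folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".pdf" and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(pickup_folder, process, interval=5.0):
    """Poll pickup_folder forever, calling process(path) for each new PDF.

    `process` is a hypothetical callback standing in for the extraction step.
    """
    seen = set()
    while True:
        for pdf in find_new_pdfs(pickup_folder, seen):
            process(pdf)
        time.sleep(interval)
```

A production version would also have to avoid reading a PDF that is still being copied into the folder, which this sketch does not handle.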

INI Files


Known Issues

  1. The code page may need to be set for accurate character translation, especially for Japanese PDF files.
  2. Some PDF formats fail to process. These are addressed on an as-needed basis.
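To illustrate why the code page matters, the Python snippet below decodes the same Shift-JIS bytes with two Windows code pages: 932 (Japanese) recovers the original text, while 1252 (Western European) produces mojibake. It only demonstrates the effect; where PDF Extractor would apply a code page is not documented, and decode_with_code_page is a hypothetical helper.

```python
def decode_with_code_page(raw, code_page):
    """Decode raw bytes with a Python codec name such as 'cp932' or 'cp1252'."""
    return raw.decode(code_page, errors="replace")


# "日本語" ("Japanese") encoded with Windows code page 932 (Shift-JIS).
raw = "日本語".encode("cp932")

print(decode_with_code_page(raw, "cp932"))   # correct Japanese text
print(decode_with_code_page(raw, "cp1252"))  # garbled Latin characters
```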

Technical Background

  1. Decompresses any compressed stream using FlateDecode.
  2. Translates CID (PostScript character ID) character codes to Unicode, provided the PDF contains a /ToUnicode stream.
  3. Finds the appropriate font and font size and writes them to the Unicode text file (and uses them within Bitmap).
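The three steps above can be sketched in Python with only the standard library. This is not PDF Extractor's code: it assumes FlateDecode streams without predictor parameters, reads only the bfchar sections of a /ToUnicode CMap (bfrange sections are skipped), assumes two-byte character codes as used by Identity-H CID fonts, and takes font and size from Tf operators in a decompressed content stream. All function names are hypothetical.

```python
import re
import zlib


def flate_decode(data):
    """Step 1: FlateDecode data is zlib (RFC 1950) compressed.
    Streams that also use a /DecodeParms predictor need extra work."""
    return zlib.decompress(data)


def parse_bfchar(cmap):
    """Step 2: map character codes to Unicode from the bfchar sections of a
    /ToUnicode CMap. bfrange sections are not handled in this sketch."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values are UTF-16BE, possibly several characters.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def translate(codes, mapping, width=2):
    """Decode a string of `width`-byte character codes (two bytes for
    Identity-H CID fonts) to Unicode; unmapped codes become U+FFFD."""
    chars = []
    for i in range(0, len(codes), width):
        code = int.from_bytes(codes[i:i + width], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


def fonts_used(content):
    """Step 3: (font resource name, size) pairs set by Tf operators."""
    pattern = r"/([^\s/\[\]()<>{}%]+)\s+([-+]?(?:\d+\.?\d*|\.\d+))\s+Tf"
    return [(name, float(size)) for name, size in re.findall(pattern, content)]


if __name__ == "__main__":
    cmap = "2 beginbfchar\n<0001> <0048>\n<0002> <0069>\nendbfchar"
    stream = zlib.compress(b"BT /F1 12 Tf 72 700 Td <00010002> Tj ET")
    content = flate_decode(stream).decode("latin-1")
    print(fonts_used(content))  # [('F1', 12.0)]
    print(translate(bytes.fromhex("00010002"), parse_bfchar(cmap)))  # Hi
```

Handling bfrange entries and reading the code width from the CMap's codespace range would be the natural next steps for real-world files.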