PDF Extractor

PDF Extractor
  Developer: Wilkinson
  Product: WilComm
  First released: 20th November 2008
  First version: 1.0.0.0
  Latest release: April 2009
  Latest version: 2.0.0.1
 
 


Overview

PDF Extractor allows PDF files to be processed through the WilComm Document Distribution system. It outputs the content of the PDF file as text fields along with their coordinates. The contents are written to the specified text output file, ready for processing by WilComm.


PDF Extractor runs as a Windows service named PDFMonitor.


PDF Extractor currently supports files from the following ERP systems:

  • IBS Enterprise
  • JDE
  • SAP R/3


Components


Installation

The PDFMonitor service must be installed before it can be started.

Run the following from the command line:

\WINDOWS\Microsoft.NET\Framework\v2.0.50727\InstallUtil.exe "\Program Files\Wilkinson\WilComm 4\Application Data\PDFExtractor\PDFExtractor.exe"


Operation

Ensure the PDFMonitor service is running.

PDF files placed in PickUpFolder are automatically processed.
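
Because files in PickUpFolder are picked up automatically, it can help to avoid exposing a half-written PDF to the service. The sketch below copies a file in under a temporary name and renames it once the copy is complete. The function name drop_pdf and the ".part" suffix are hypothetical, and the sketch assumes that PDFMonitor ignores files that do not end in ".pdf", which this page does not confirm.

```python
import os
import shutil
from pathlib import Path


def drop_pdf(source: Path, pickup_folder: Path) -> Path:
    """Copy source into pickup_folder, exposing it only when fully written.

    Hypothetical helper: assumes the monitoring service skips "*.part" files.
    """
    final_path = pickup_folder / source.name
    partial_path = pickup_folder / (source.name + ".part")

    # The copy may take a while for large files; it happens under the
    # temporary name so the final name never refers to a partial file.
    shutil.copyfile(source, partial_path)

    # A rename within the same folder replaces the name in a single step.
    os.replace(partial_path, final_path)
    return final_path
```

A call might look like drop_pdf(Path("invoice.pdf"), Path("PickUpFolder")), where both paths are placeholders for the real locations.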



INI Files


General

Known Issues

  1. The code page may need to be set for accurate character translation, especially for Japanese PDFs.
  2. Some PDF formats fail to process. These are addressed on an as-needed basis.
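
As a general illustration of why the code page matters (this is not taken from the PDF Extractor source), the sketch below encodes a short Japanese string with code page 932 (Shift-JIS) and then decodes the same bytes with the right and the wrong code page. The sample string and variable names are chosen only for the example.

```python
# Three Japanese characters, stored as bytes using code page 932 (Shift-JIS).
original = "日本語"
raw = original.encode("cp932")

# Decoding with the matching code page recovers the text.
right = raw.decode("cp932")

# Decoding the same bytes as Western European (cp1252) yields unrelated
# characters; errors="replace" keeps undefined bytes from raising.
wrong = raw.decode("cp1252", errors="replace")

print(right == original)  # the matching code page round-trips
print(wrong == original)  # the mismatched code page does not
```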


Technical Background

  1. Decompress any compressed streams using FlateDecode.
  2. Translate character codes from CIDs (PostScript Character IDs) to Unicode, provided the PDF contains a /ToUnicode stream.
  3. Find the appropriate font and font size and write them to the Unicode text file (they are also used within the Bitmap).
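
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the PDF Extractor implementation: the function names are hypothetical, only "bfchar" entries of a /ToUnicode CMap are handled ("bfrange" entries and stream predictors are left out), and character codes are assumed to be two bytes wide.

```python
import re
import zlib
from typing import Dict


def inflate_stream(raw: bytes) -> bytes:
    """Decompress the body of a stream whose /Filter is /FlateDecode."""
    return zlib.decompress(raw)


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Build a code -> Unicode map from the bfchar sections of a CMap."""
    mapping: Dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            # Destination values are UTF-16BE code units written in hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes: bytes, mapping: Dict[int, str], width: int = 2) -> str:
    """Translate fixed-width codes to text; unmapped codes become U+FFFD."""
    out = []
    for i in range(0, len(codes) - width + 1, width):
        code = int.from_bytes(codes[i:i + width], "big")
        out.append(mapping.get(code, "\ufffd"))
    return "".join(out)


# Hand-written sample CMap, compressed here to stand in for a PDF stream.
sample_cmap = b"""begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0001> <0048>
<0002> <0069>
endbfchar
endcmap
"""
compressed = zlib.compress(sample_cmap)

mapping = parse_bfchar(inflate_stream(compressed).decode("latin-1"))
text = decode_codes(bytes.fromhex("00010002"), mapping)
print(text)  # prints "Hi"
```

Codes without an entry in the map are replaced with U+FFFD rather than dropped, so gaps in a /ToUnicode stream stay visible in the output.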