[gdal-dev] extract vector/raster data from GeoPDF

Klokan Petr Přidal klokan at klokan.cz
Wed Sep 2 10:27:56 EDT 2009


There is a great blog post (and the linked "worked example" post with details):

It shows you how to create a GeoPDF via Ghostscript, so there is
already a practical open-source example of how to encode the
georeference into the PDF/PS according to the OGC standard, for use in
Acrobat Reader. Adding support for such a tag to MapServer, which
generates PDF dynamically via pdflib, should not be too difficult.

Decoding is not that hard either: there are nice libraries like poppler
(http://poppler.freedesktop.org/), which allows you to parse vectors
(and convert them to SVG, for example) or rasterize the PDF files (into
TIFF, ...) via Cairo.
The real work is in assigning correct geographic coordinates to the
coordinate system used internally in PDF files, and especially in
writing the bridge to the outside world (with GDAL/OGR).
I am afraid that the authors of the GeoPDF standard would not like this,
as it seems that the idea of GeoPDF is "see it in Acrobat, print
it, but that's all". At least I think so, because they discontinued
their Geopdf2geotiff product and all the conversion tools are just one
way: into GeoPDF. Please correct me...
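
The coordinate assignment mentioned above can be sketched in Python
roughly as follows. This is only an illustration under assumptions that
are not part of this post: the map is north-up, its geographic bounds are
already known (reading them out of the GeoPDF is not shown), and the page
was rasterized at a fixed dpi. The function names are made up for the
example; the six-value tuple follows the usual GDAL geotransform layout.

```python
def pdf_point_to_pixel(x_pt, y_pt, page_height_pt, dpi):
    """Map a PDF user-space point to raster (col, row).

    Default PDF user space uses points (1/72 inch) with the origin at the
    bottom-left corner and y growing upwards; raster rows grow downwards.
    """
    scale = dpi / 72.0
    return x_pt * scale, (page_height_pt - y_pt) * scale


def north_up_geotransform(geo_bounds, raster_size):
    """Build a GDAL-style geotransform for a north-up raster.

    geo_bounds  = (min_x, min_y, max_x, max_y), assumed to be known already
    raster_size = (width, height) in pixels
    """
    min_x, min_y, max_x, max_y = geo_bounds
    width, height = raster_size
    pixel_w = (max_x - min_x) / width
    pixel_h = (max_y - min_y) / height
    # (origin x, pixel width, row rotation, origin y, column rotation, -pixel height)
    return (min_x, pixel_w, 0.0, max_y, 0.0, -pixel_h)


def pixel_to_geo(gt, col, row):
    """Apply the geotransform to a pixel position."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y
```

Chaining pdf_point_to_pixel() and pixel_to_geo() would give a
geographic position for any point on the page, which is the part a
GDAL/OGR bridge would need.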

Anyway, at the moment you can quite easily use a utility like
"pdfimages" to extract full-quality image tiles from any GeoPDF (like
those from USGS) and merge them, based on their location in the PDF,
into one GDAL file via VRT (gdalbuildvrt) with a bit of hacking. This is
what I did for my favorite USGS DRG of the Grand Canyon ;-).
Look at: http://klokan.mzk.cz/~klokan/geopdf/ - soon I will update the
MapTiler.org overlay examples...
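
As a rough illustration of the merge step (not the actual script used
for the Grand Canyon example), a VRT mosaic can also be written by hand
once the pixel offset of each extracted tile is known. The sketch below
is in Python and assumes equal band counts and 8-bit data for all tiles;
the function name and tile file names are hypothetical, and working out
the offsets from the PDF is not shown.

```python
from xml.sax.saxutils import escape


def build_vrt(tiles, width, height, bands=3, data_type="Byte"):
    """Return VRT XML that places each tile at its offset in one mosaic.

    tiles: list of (filename, x_off, y_off, tile_w, tile_h), with offsets
           and sizes given in mosaic pixels.
    """
    out = ['<VRTDataset rasterXSize="%d" rasterYSize="%d">' % (width, height)]
    for band in range(1, bands + 1):
        out.append('  <VRTRasterBand dataType="%s" band="%d">' % (data_type, band))
        for name, x_off, y_off, w, h in tiles:
            out.append('    <SimpleSource>')
            out.append('      <SourceFilename relativeToVRT="1">%s</SourceFilename>'
                       % escape(name))
            out.append('      <SourceBand>%d</SourceBand>' % band)
            # Read the whole tile ...
            out.append('      <SrcRect xOff="0" yOff="0" xSize="%d" ySize="%d"/>'
                       % (w, h))
            # ... and paste it unscaled at its position in the mosaic.
            out.append('      <DstRect xOff="%d" yOff="%d" xSize="%d" ySize="%d"/>'
                       % (x_off, y_off, w, h))
            out.append('    </SimpleSource>')
        out.append('  </VRTRasterBand>')
    out.append('</VRTDataset>')
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical tile names and offsets, for illustration only.
    demo_tiles = [("tile-000.ppm", 0, 0, 512, 512),
                  ("tile-001.ppm", 512, 0, 512, 512)]
    print(build_vrt(demo_tiles, 1024, 512))
```

Saved next to the tiles as a .vrt file, the result should open in GDAL
like any other raster; adding a GeoTransform and SRS element would then
georeference the whole mosaic.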

Unfortunately, all the PDF parsing libraries I know of are GPL, which
means we can't use them for a GDAL driver because of the license issues.
But creating a GPL utility for converting GeoPDF to anything that
GDAL/OGR supports should be OK. Poppler could be the best base for such
a GDAL-based utility for reading/rasterizing GeoPDF files.

Now we just need to find a sponsor and the time to make it ;-).


Klokan Petr Pridal
