[OSGeo-Discuss] Automatic geocoding of PDF documents
Stephen Woodbridge
woodbri at swoodbridge.com
Tue Jan 17 19:00:42 PST 2012
On 1/17/2012 2:51 PM, Arnie Shore wrote:
> I wonder if someone can describe what's seen as the
> tall-pole-in-the-tent here, difficulty-wise.
Arnie,
I think that there is no simple answer to this because it is largely
defined by the specific requirements.
If your problem is scanning text and extracting location references,
then the questions are: how do you recognize locations in a text
document? How do you deal with different languages? How do you use
context to disambiguate locations? And then, how do you geocode them?
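To make the recognition and disambiguation questions concrete, here is a minimal sketch in Python. It assumes a toy gazetteer (the GAZETTEER table, its entries, and the find_locations helper are all hypothetical, made up for illustration) and uses the crudest possible context rule: a candidate wins if its qualifier, such as a state name, appears elsewhere in the text. Real text needs far more than this (multi-word names, languages, real NER), so treat it as an illustration of the problem, not a solution.

```python
import re

# Hypothetical toy gazetteer: place name -> list of (qualifier, lat, lon).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "springfield": [
        ("illinois", 39.80, -89.64),
        ("massachusetts", 42.10, -72.59),
    ],
    "boston": [("massachusetts", 42.36, -71.06)],
}


def find_locations(text):
    """Return (name, candidate-or-None) for each gazetteer name in text.

    A candidate is chosen when exactly one of its qualifiers appears in
    the text, or when the name has only one candidate at all. Otherwise
    the name is reported as ambiguous (None).
    """
    words = re.findall(r"[a-z]+", text.lower())
    context = set(words)
    results = []
    for word in words:
        candidates = GAZETTEER.get(word)
        if not candidates:
            continue
        supported = [c for c in candidates if c[0] in context]
        if len(supported) == 1:
            chosen = supported[0]
        elif len(candidates) == 1:
            chosen = candidates[0]
        else:
            chosen = None  # context does not settle it
        results.append((word, chosen))
    return results
```

With this sketch, "Springfield, Illinois" resolves to the Illinois entry, while a bare "Springfield" comes back as ambiguous, which is exactly the case where context or local knowledge has to do the work.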
For the geocoding part, what are you geocoding: addresses,
intersections, placenames, postal codes, landmarks, parcel data,
geographic names, historical names? Do you have good reference data
for these? What is your reference data set? How accurate and complete
is it? How do you standardize it? How do you standardize your input
locations? Are there different standardization rules for different types
of data? For addresses in different countries? Fuzzy searching is
another area of expertise that can be applied to this problem, and it
has its own set of issues depending on the specific requirements.
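As a rough illustration of standardization plus fuzzy searching, the sketch below normalizes both the reference data and the input with the same rules, then uses difflib from the Python standard library to find the closest reference entry. The abbreviation table, the reference list, and the standardize and fuzzy_geocode names are assumptions for this example only; a real system would use country-specific rules and a proper reference data set.

```python
import difflib
import re

# Hypothetical abbreviation table; real rules vary by country and data type.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "n": "north"}

# Hypothetical reference data set.
REFERENCE = ["12 Main Street", "12 Maine Avenue", "400 North Oak Road"]


def standardize(address):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def fuzzy_geocode(query, cutoff=0.8):
    """Return the best-matching reference address, or None if none is close."""
    # Standardize the reference data with the same rules as the input.
    index = {standardize(ref): ref for ref in REFERENCE}
    hits = difflib.get_close_matches(
        standardize(query), list(index), n=1, cutoff=cutoff
    )
    return index[hits[0]] if hits else None
```

The cutoff is where the requirements come in: set it too low and "12 Main St" starts matching the wrong street, set it too high and simple typos are rejected.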
Dealing with natural language issues, idiomatic and slang references,
local knowledge issues, spelling abbreviations and errors, reference
data errors, and missing data, and with how all of these interact, is
probably one of the harder issues.
I'm not sure there is one long pole, more like 5-6 long poles ;-)
-Steve