[gdal-dev] Open source vector geoprocessing libraries?
jason.roberts at duke.edu
Wed Jan 13 10:27:43 EST 2010
> are you constrained to retaining your data in an ArcGIS compatible format?
We are attempting to build tools that can work with data stored in a variety
of formats. Our current user community uses mostly shapefiles, ArcGIS
personal geodatabases, and ArcGIS file geodatabases. Many of them are
ecologists who do not have the interest or skills to deploy a real DBMS
system. Thus we are hoping to provide tools that can work without one. This
is one reason I was exploring how embeddable PostGIS and SpatiaLite might be
in the other fork of this thread.
> Until the File
> Geodatabase format is published (later this year?) and someone has the
> build an OGR interface, the DBMS route is probably the best route to
It would be really great for that to happen, but I'm not holding my breath.
If it does get published, I would seriously contemplate building an OGR
I have contemplated building an ArcObjects- or arcgisscripting-based driver.
This would at least allow people who have ArcGIS to use OGR to access any
ArcGIS layer, including those created by ArcGIS's tools for joining
arbitrary layers, etc. That would handle file geodatabases, as well as ALL
formats accessible from ArcGIS. If such a driver existed, then we could use
OGR as the base interface inside our application. But creating such a driver
would be a lot of work and have funky dependencies because it either needs
to use Windows COM (for ArcObjects) or Python (for arcgisscripting) to call
the ArcGIS APIs. I am certainly capable of implementing it but because most
of our code is in Python, it is probably easier for me to wrap OGR and
arcgisscripting behind a common abstraction, and then have our tools work
against that abstraction rather than OGR directly.
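
A minimal sketch of what such a common abstraction might look like, assuming a tiny two-method interface. Every name here (VectorLayer, InMemoryLayer, count_features) is hypothetical and invented for illustration. The in-memory backend only stands in for real OGR-backed and arcgisscripting-backed classes, whose library calls are deliberately not shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Tuple

# Bounding box as (minx, miny, maxx, maxy).
BBox = Tuple[float, float, float, float]


class VectorLayer(ABC):
    """Hypothetical common interface that the tools would code against."""

    @abstractmethod
    def name(self) -> str:
        """Return a human-readable layer name."""

    @abstractmethod
    def features(self, bbox: Optional[BBox] = None) -> Iterator[Dict]:
        """Yield features, optionally only those whose bbox overlaps bbox."""


def _overlaps(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class InMemoryLayer(VectorLayer):
    """Stand-in backend for illustration only.

    An OGR-backed or arcgisscripting-backed class would implement the same
    two methods by calling the respective library.
    """

    def __init__(self, name: str, rows: List[Dict]):
        self._name = name
        self._rows = rows  # each row: {"bbox": BBox, "attrs": {...}}

    def name(self) -> str:
        return self._name

    def features(self, bbox: Optional[BBox] = None) -> Iterator[Dict]:
        for row in self._rows:
            if bbox is None or _overlaps(row["bbox"], bbox):
                yield row


def count_features(layer: VectorLayer, bbox: Optional[BBox] = None) -> int:
    """Example tool: depends only on VectorLayer, not on any backend."""
    return sum(1 for _ in layer.features(bbox))
```

The point of the design is that a tool such as count_features never learns which library produced the features, so adding a backend would not require touching the tools.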
At any rate, I'm sure it is nice being able to do all your work in a
From: Peter J Halls [mailto:P.Halls at york.ac.uk]
Sent: Wednesday, January 13, 2010 2:33 AM
To: Mateusz Loskot
Cc: Jason Roberts; 'gdal-dev'
Subject: Re: [gdal-dev] Open source vector geoprocessing libraries?
are you constrained to retaining your data in an ArcGIS compatible
If so and if you do not have ArcSDE, then what follows may not be much help.
Otherwise, I think it likely that you will find using a DBMS as your data
repository advantageous for many reasons. Apart from the built in indexing
index based operations, it is *very* much easier to share data between
retaining a single copy and all users having effective access. Until the
Geodatabase format is published (later this year?) and someone has the
build an OGR interface, the DBMS route is probably the best route to
compatibility. We happen to be a corporate Oracle site, but PostGres is
similar. PostGres is supported by ESRI with ArcSDE, so it is possible to
ArcGIS compatibility this way.
Many years ago, I had a Simula class for performing many of these basic
operations, however now my data is all in Oracle: I am able to use the
functions and no longer have to worry about building and rebuilding indexes,
etc. - other than USER_SDO_GEOM_METADATA which, unfortunately, OGR only
to at table creation and does not update. Frankly, life (and maintenance)
much easier now and, certainly with Oracle, I think there have been
Just my ha'pence-worth.
Mateusz Loskot wrote:
> Jason Roberts wrote:
>> I'm not an expert in this area, but I think that big performance
>> gains can be obtained by using a spatial index.
> Yes, likely true.
>> For example, consider a situation where you want to clip out a study
>> region from the full resolution GSHHS shoreline database, a polygon
>> layer. The shoreline polygons have very large, complicated
>> geometries. It would be expensive to loop over every polygon, loading
>> its full geometry and calling GEOS. Instead, you would use the
>> spatial index to isolate the polygons that are likely to overlap with
>> the study region, then loop over just those ones.
> GEOS, like JTS, supports various spatial indexes.
> It is possible to index data and optimise it in the manner you
> mention. In fact, GEOS uses indexes internally in various operations.
> The problem is that such an index is not persistent, not serialised
> anywhere, so it exists in memory only. In fact, there are many more
> problems than this one.
> BTW, PostGIS is an index serialisation.
> OGR does not provide any spatial indexing layer common to various
> vector datasets. For many simple formats it performs the brute-force
> An alternative is to try to divide the task:
> 1. Query features from data source using spatial index capability of
> data source.
> 2. Having only subject features selected, apply geometric processing.
> I did it that way, actually.
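
The two steps above can be sketched in plain Python, under stated assumptions. Step 1 is a bounding-box scan that merely stands in for the data source's own spatial index (with OGR this would be the role of OGRLayer::SetSpatialFilter). Step 2 is a Sutherland-Hodgman clip against a rectangular study region, swapped in here for whatever GEOS operation would really be applied. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]                      # ring of vertices, not closed
BBox = Tuple[float, float, float, float]   # minx, miny, maxx, maxy


def bbox_of(poly: Polygon) -> BBox:
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _inside(p: Point, axis: int, bound: float, keep_ge: bool) -> bool:
    return p[axis] >= bound if keep_ge else p[axis] <= bound


def clip_to_rect(poly: Polygon, rect: BBox) -> Polygon:
    """Sutherland-Hodgman clip of a polygon against an axis-aligned rectangle."""
    minx, miny, maxx, maxy = rect
    # One pass per rectangle side: (axis, boundary value, keep side >= boundary)
    sides = [(0, minx, True), (0, maxx, False), (1, miny, True), (1, maxy, False)]
    out = list(poly)
    for axis, bound, keep_ge in sides:
        if not out:
            break
        src, out = out, []
        prev = src[-1]
        for cur in src:
            cur_in = _inside(cur, axis, bound, keep_ge)
            if cur_in != _inside(prev, axis, bound, keep_ge):
                # The edge prev->cur crosses this side: add the crossing point.
                t = (bound - prev[axis]) / (cur[axis] - prev[axis])
                out.append((prev[0] + t * (cur[0] - prev[0]),
                            prev[1] + t * (cur[1] - prev[1])))
            if cur_in:
                out.append(cur)
            prev = cur
    return out


def area(poly: Polygon) -> float:
    """Shoelace formula; returns 0.0 for an empty ring."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def clip_study_region(polys: List[Polygon], rect: BBox):
    """Return (clipped polygons, number of candidates that reached step 2)."""
    # Step 1: cheap filter, a stand-in for the data source's spatial index.
    candidates = [p for p in polys if bboxes_overlap(bbox_of(p), rect)]
    # Step 2: expensive geometry processing on the candidates only.
    clipped = [c for c in (clip_to_rect(p, rect) for p in candidates) if c]
    return clipped, len(candidates)
```

A bounding-box hit does not guarantee a real overlap, so step 2 may still discard some candidates; the filter only limits how many full geometries have to be processed.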
>> If OGR takes advantage of spatial indexes internally (e.g. if the
>> data source drivers can tell the core about these indexes, and the
>> core can use them when OGRLayer::SetSpatialFilter is called), then
>> many scenarios could be efficiently implemented by just OGR and GEOS
> The problem with OGR and GEOS is the cost of translating OGR geometry
> to GEOS geometry. It can be a bottleneck.
> However, if such processing functionality were considered as a
> built-in part of OGR, that would make sense, but I still see limitations:
> Let's brainstorm a bit and assume it implements an operation:
> OGRLayer OGR::SymDifference(OGRLayer layer1, OGRLayer layer2);
> Depending on the data source, OGR could exploit its capabilities.
> If both layers sit in the same PostGIS (or other spatial)
> database, OGR just delegates the processing to PostGIS
> where ST_SymDifference is executed and OGR only grabs the
> results and generates OGRLayer.
> What if layer1 is a Shapefile and layer2 is Oracle table?
> Let's assume Shapefile has .qix file with spatial index
> and Oracle has its own index. What does OGR do?
> Loads .qix to memory, then grabs layer2 and decides which features to
> select from layer1?
> Loads the whole Shapefile to memory and uses Oracle index to select
> features from layer2 "masked" by layer1?
> How do we calculate the cost of which one to transfer in which
> direction, etc.?
> Certainly, it depends on number of elements, what algorithm is used,
> direction of application of algorithm (who is subject, who is object),
> and many more.
> There are plenty of combinations, and my point is that if performance
> (not only in terms of speed, but of any resource) is critical, it would
> be extremely difficult to provide an efficient implementation of such
> features in OGR with a guaranteed or even determinable degree of
> complexity. Without these guarantees, I see little use for
> such a solution.
> Given that, depending on needs, write a specialised application using
> available tools like OGR and GEOS, that is optimised according to
> specifics of datasets, type of processing, system requirements, etc.
>> If not, then your suggestion may be as fast as any other. For
>> example, the idea of loading the features into PostGIS or SpatiaLite
>> will require loading all of the full geometries, passing them to
>> another database system, etc, etc. It may be that shuffling all of
>> the data around will be hugely expensive and that just using OGR
>> functions with simple approaches like calling GEOS from nested loops
>> will be faster than shuffling the data to a system that implements a
>> more efficient approach once the data gets there.
> It's never "just using". Performance is usually a concern with large
> datasets. Large datasets are unlikely to be stored in a simple
> format, but rather in proper spatial data storage, like PostGIS.
> It nicely combines all the elements necessary to perform geometrical
> processing in usable and optimised form, with index.
>> Is that basically what you are saying?
> It is.
> Best regards,
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806 Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication