[gdal-dev] Open source vector geoprocessing libraries?

Peter J Halls P.Halls at york.ac.uk
Wed Jan 13 02:32:57 EST 2010


Jason,

    are you constrained to retaining your data in an ArcGIS compatible format? 
If so and if you do not have ArcSDE, then what follows may not be much help.

Otherwise, I think it likely that you will find using a DBMS as your data
repository advantageous for many reasons.  Apart from the built-in indexing and
index-based operations, it is *very* much easier to share data between users,
retaining a single copy with all users having effective access.  Until the File
Geodatabase format is published (later this year?) and someone puts in the
effort to build an OGR interface, the DBMS route is probably the best route to
compatibility.  We happen to be a corporate Oracle site, but PostgreSQL is
pretty similar.  PostgreSQL is supported by ESRI with ArcSDE, so it is possible
to retain ArcGIS compatibility this way.

Many years ago, I had a Simula class for performing many of these basic spatial
operations.  Now, however, my data is all in Oracle: I can use the Oracle
functions and no longer have to worry about building and rebuilding indexes,
etc. - other than USER_SDO_GEOM_METADATA which, unfortunately, OGR only writes
to at table creation and does not update.  Frankly, life (and maintenance) is
much easier now and, certainly with Oracle, I think there have been performance
gains.

Just my ha'pence-worth.

Peter

Mateusz Loskot wrote:
> Jason Roberts wrote:
>> Mateusz,
>>
>> I'm not an expert in this area, but I think that big performance 
>> gains can be obtained by using a spatial index.
> 
> Yes, likely true.
> 
>> For example, consider a situation where you want to clip out a study 
>> region from the full resolution GSHHS shoreline database, a polygon 
>> layer. The shoreline polygons have very large, complicated 
>> geometries. It would be expensive to loop over every polygon,
>> loading its full geometry and calling GEOS. Instead, you would use
>> the spatial index to isolate the polygons that are likely to overlap
>> with the study region, then loop over just those ones.
> 
> GEOS, like JTS, provides support for various spatial indexes.
> It is possible to index data and optimise it in the manner you
> mention. In fact, GEOS uses indexes internally in various operations.
> The problem is that such an index is not persistent: it is not
> serialised anywhere, so it exists in memory only. In fact, there are
> many more problems than this one.
> 
> BTW, PostGIS gives you a serialised (persistent) index.
> 
> OGR does not provide any spatial indexing layer common to the various
> vector datasets. For many simple formats it performs a brute-force
> selection.
> 
> An alternative is to divide the task:
> 1. Query features from the data source using the spatial index
> capability of the data source.
> 2. With only the relevant features selected, apply the geometric
> processing.
> 
> I did it that way, actually.
> 
>> If OGR takes advantage of spatial indexes internally (e.g. if the 
>> data source drivers can tell the core about these indexes, and the 
>> core can use them when OGRLayer::SetSpatialFilter is called), then 
>> many scenarios could be efficiently implemented by just OGR and GEOS 
>> alone.
> 
> The problem with OGR and GEOS is the cost of translating from OGR
> geometry to GEOS geometry. It can be a bottleneck.
> 
> However, if such processing functionality were built in to OGR, that
> would make sense, but I still see limitations:
> 
> Let's brainstorm a bit and assume it implements this operation:
> 
> OGRLayer OGR::SymDifference(OGRLayer layer1, OGRLayer layer2);
> 
> Depending on the data source, OGR could exploit its capabilities.
> If both layers sit in the same PostGIS (or other spatial)
> database, OGR just delegates the processing to PostGIS,
> where ST_SymDifference is executed, and OGR only grabs the
> results and generates an OGRLayer.
> 
> What if layer1 is a Shapefile and layer2 is an Oracle table?
> Let's assume the Shapefile has a .qix file with a spatial index
> and Oracle has its own index. What does OGR do?
> 
> Does it load the .qix into memory, then grab layer2 and decide which
> features to select from layer1?
> Does it load the whole Shapefile into memory and use the Oracle index
> to select features from layer2 "masked" by layer1?
> How does it calculate the cost of transferring which one in which
> direction, etc.?
> 
> Certainly, it depends on the number of elements, what algorithm is
> used, the direction in which the algorithm is applied (who is subject,
> who is object), and much more.
> 
> There are plenty of combinations, and my point is that if performance
> (not only in terms of speed, but of any resource) is critical, it
> would be extremely difficult to provide an efficient implementation of
> such features in OGR with a guaranteed or even determinable degree of
> complexity. Without these guarantees, I see little use for such a
> solution.
> 
> Given that, I would, depending on needs, write a specialised
> application using available tools like OGR and GEOS, optimised
> according to the specifics of the datasets, type of processing, system
> requirements, etc.
> 
>> If not, then your suggestion may be as fast as any other. For
>> example, the idea of loading the features into PostGIS or SpatiaLite
>> will require loading all of the full geometries, passing them to
>> another database system, etc, etc. It may be that shuffling all of
>> the data around will be hugely expensive and that just using OGR
>> functions with simple approaches like calling GEOS from nested loops
>> will be faster than shuffling the data to a system that implements a
>> more efficient approach once the data gets there.
> 
> It's never "just using". Performance is usually a concern with large
> datasets. Large datasets are unlikely to be stored in a simple
> format, but rather in a proper spatial data store, like PostGIS.
> It nicely combines all the elements necessary to perform geometrical
> processing in a usable and optimised form, with an index.
> 
>> Is that basically what you are saying?
> 
> It is.
> 
> Best regards,
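
The two-step approach described in the quoted message (query the data source
through its own spatial index, then apply geometric processing only to the
selected features) can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: GridIndex, boxes_intersect and
select_then_process are made-up names, the index is a toy in-memory grid over
bounding boxes rather than anything OGR, GEOS or a database provides, and
`process` stands in for whatever expensive geometry operation is wanted.

```python
from collections import defaultdict


def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Hypothetical toy spatial index: buckets feature ids by grid cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (ix, iy) -> set of feature ids
        self.boxes = {}                 # feature id -> bounding box

    def _cells_for(self, box):
        # Enumerate every grid cell that the bounding box touches.
        minx, miny, maxx, maxy = box
        x0, y0 = int(minx // self.cell_size), int(miny // self.cell_size)
        x1, y1 = int(maxx // self.cell_size), int(maxy // self.cell_size)
        for ix in range(x0, x1 + 1):
            for iy in range(y0, y1 + 1):
                yield ix, iy

    def insert(self, fid, box):
        self.boxes[fid] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(fid)

    def query(self, box):
        """Step 1: ids of features whose bounding box intersects `box`."""
        hits = set()
        for cell in self._cells_for(box):
            hits.update(self.cells.get(cell, ()))
        return sorted(f for f in hits if boxes_intersect(self.boxes[f], box))


def select_then_process(index, region, process):
    """Step 2: run the expensive operation on the candidates only."""
    return {fid: process(fid) for fid in index.query(region)}
```

In a real application step 1 would be left to the data source (for example a
spatial filter on the layer, or a WHERE clause in the database) and `process`
would load the full geometry and do the actual clipping; the point of the
sketch is only that features outside the study region are never touched.
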

-- 
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------

