[postgis-users] Performance compared to large shapefiles?

Carl Anderson carl.anderson at vadose.org
Thu Apr 26 23:13:23 PDT 2007


To me, the controlling factor is the tension between shapefiles being
faster at reading individual records and PostGIS being faster at
retrieving a small subset of a much larger dataset.
The difference between the cost of loading the entire SHX file into
memory (necessary for shapefile use) and the cost of serializing
geometries through PostgreSQL seems to be the root factor.

PostGIS indexes such as GiST are easier on memory use (of the PostGIS
server).
Loading the entire SHX into memory can be expensive for shapefiles
with a large number of records (much more than 100,000).

For those not intimately acquainted with shapefiles, a shapefile is
composed of
    a DBF file with the attributes
    a SHP file with the vertex information
    a SHX file with the binary location of the start of each shape in
the SHP file.  Shapes can have variable length, and the SHX index is
necessary to locate individual shapes within the SHP file.
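
To make the role of the SHX file concrete, here is a minimal sketch of
reading it in Python. It relies on the published shapefile layout (a
100-byte header followed by one 8-byte entry per shape, each holding an
offset and a content length as big-endian 32-bit integers counted in
16-bit words). The function names are made up for illustration, and
real readers do considerably more validation.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-size file header
SHX_RECORD_BYTES = 8    # two big-endian 32-bit integers per shape


def read_shx_index(shx_bytes):
    """Return a list of (byte_offset, content_bytes) pairs, one per shape.

    byte_offset points at the record header inside the SHP file;
    content_bytes excludes that 8-byte record header.
    """
    body = shx_bytes[SHX_HEADER_BYTES:]
    count = len(body) // SHX_RECORD_BYTES
    index = []
    for i in range(count):
        # Both values are stored in 16-bit words, so multiply by 2 for bytes.
        offset_words, length_words = struct.unpack_from(
            ">ii", body, i * SHX_RECORD_BYTES
        )
        index.append((offset_words * 2, length_words * 2))
    return index


def shx_size_bytes(record_count):
    """Size of the SHX file on disk for a given number of shapes."""
    return SHX_HEADER_BYTES + SHX_RECORD_BYTES * record_count
```

Every shape costs 8 bytes in the file itself, so the index grows
linearly with the record count; whatever structure an application
builds from it in memory may well be larger than the raw file.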

We find that it is faster and easier to dedicate a server to datasets
with 100k to 1bn records and tune the server than it is to tune
shapefiles through tiling.

For datasets with a large number of records
    if you will usually be rendering the entire dataset, shapefiles are
usually faster
    if you will usually be rendering a small subset of the records,
PostGIS will usually be faster
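
As a rough illustration of the "small subset" case, the sketch below
builds the two SQL statements involved: one to create a GiST index on
the geometry column and one to fetch only the features whose bounding
box overlaps a map extent. The helper names and the table and column
names are assumptions for the example, not anything from our setup;
ST_MakeEnvelope is available in current PostGIS releases, and the table
and column names are assumed to come from trusted configuration since
they are interpolated directly.

```python
def create_gist_index_sql(table, geom_column):
    """SQL that builds a GiST index on a geometry column."""
    return (
        f"CREATE INDEX {table}_{geom_column}_idx "
        f"ON {table} USING GIST ({geom_column})"
    )


def bbox_query_sql(table, geom_column, srid):
    """Parameterised SQL selecting features whose bounding box overlaps an extent.

    The four %s placeholders are xmin, ymin, xmax, ymax, to be supplied
    by the database driver at execution time.
    """
    # && is the PostGIS bounding-box overlap operator; it can use a GiST index.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {geom_column} && ST_MakeEnvelope(%s, %s, %s, %s, {int(srid)})"
    )
```

With a driver such as psycopg2 the second statement would be run as
cursor.execute(bbox_query_sql("roads", "geom", 4326), (xmin, ymin,
xmax, ymax)), so only the rows in the extent are serialized back to the
client.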
   
Tiled shapefiles can be fast to render as well but may have performance 
problems with attribute queries.
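
A small sketch of why that is, assuming a hypothetical tile catalogue
that maps each tile name to its extent and assuming no separate
attribute index exists: a map extent narrows the work to a few tiles,
while an attribute filter says nothing about location and so touches
every tile.

```python
def tiles_for_bbox(tile_extents, bbox):
    """Names of tiles whose extent overlaps bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return [
        name
        for name, (txmin, tymin, txmax, tymax) in tile_extents.items()
        if txmin <= xmax and txmax >= xmin and tymin <= ymax and tymax >= ymin
    ]


def tiles_for_attribute_query(tile_extents):
    """An attribute filter carries no location, so every tile must be scanned."""
    return list(tile_extents)
```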

The real answer is to time and test each of the 3 methods
    A single shapefile
    Tiled shapefiles
    PostGIS

and see how they work in your case.
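
One way to run such a comparison is a small timing harness like the
sketch below. The function name is made up, and the callables passed in
are placeholders for whatever actually renders your map from each data
source; taking the best of several runs is just one reasonable choice
for reducing noise from caching and other system activity.

```python
import time


def time_methods(methods, repeats=5):
    """Run each callable `repeats` times; return the best wall-clock time in seconds."""
    results = {}
    for name, render in methods.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            render()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results
```

It would be called with something like time_methods({"single
shapefile": draw_single, "tiled shapefiles": draw_tiled, "PostGIS":
draw_postgis}), where the three draw functions are yours to supply and
should request the same extents your users actually view.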

C.

Paul Ramsey wrote:
> It depends a lot on your use case. If you're rendering thousands of 
> features, that's just going to take a lot of time, shape file or 
> postgis. You might find that for really large tables the postgis index 
> is balanced better than the shape quadtree, and that provides some 
> performance boost. But always, you have to be drawing a manageable subset.
>
> The 10% figure is quite old, and was a single-threaded test on the old 
> postgis geometries. Some folks have told me that for concurrent access 
> PostGIS is actually faster than shape files. But frankly, we are 
> missing any kind of real benchmark at this point. You just have to try 
> yourself and see.
>
> Paul
>
> On 25-Apr-07, at 7:46 AM, Jeff Dege wrote:
>
>> Someone pointed me to PostGIS as being a tool worth considering.
>>
>> We've done our mapping so far with various extensions built on top of
>> UMN MapServer (ka-map and openlayers).  The GIS data we've been storing
>> in shapefiles.
>>
>> We're finding it very difficult to manage acceptable performance when
>> working with large shapefiles, where large means >500MB, >3 million
>> features.  We've been splitting these both by feature type (pulling the
>> features we display at wider zooms into separate files) and by geography
>> (tiling).  It's tedious, time-consuming, and performance still isn't
>> what we'd wish.
>>
>> The docs for PostGIS say that we can expect access times to be about 10%
>> greater than working with shapefiles, due to the overhead of
>> establishing a database, etc.
>>
>> Is this constant?
>>
>> What I'd like to be true is that PostGIS would offer indexing
>> possibilities that would allow for faster access to subsets of large
>> sets of geographic data than we're getting with shapefiles.
>>
>> But what I'd like to be true isn't always the case.
>>
>> Can I accomplish the sort of speedups I'm getting by splitting up
>> shapefiles, within PostGIS, without the expense of splitting up
>> shapefiles?
>>
>> Can I do better?
>>
>> Thanks.
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users


-- 

Carl Anderson
carl.anderson at vadose.org




