[postgis-users] Performance compared to large shapefiles?
Carl Anderson
carl.anderson at vadose.org
Thu Apr 26 23:13:23 PDT 2007
To me the controlling factor is the tension between shapefiles being faster
at reading individual records and PostGIS being faster at retrieving a small
subset of a much larger dataset.
The difference between the cost of loading the entire SHX file into memory
(necessary for shapefile use) and the cost of serializing geometries
through PostgreSQL seems to be the root factor.
PostGIS indexes such as GiST are easier on memory (on the PostGIS
server).
Loading the entire SHX into memory can be expensive for shapefiles
with a large number of records (much more than 100,000).
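As a rough illustration of that cost, here is a minimal Python sketch that estimates the size of an SHX file from its record count. It assumes the standard ESRI layout (a 100-byte header followed by one fixed 8-byte record per shape); the name `shx_size_bytes` is hypothetical, and the estimate ignores anything else a renderer may load alongside the SHX.

```python
# Hypothetical helper: estimate the size of a shapefile's SHX index,
# assuming the ESRI layout of a 100-byte header plus one 8-byte record
# (offset + content length) per shape.
SHX_HEADER_BYTES = 100
SHX_RECORD_BYTES = 8


def shx_size_bytes(num_records: int) -> int:
    """Return the size in bytes of an SHX file indexing num_records shapes."""
    if num_records < 0:
        raise ValueError("num_records must be non-negative")
    return SHX_HEADER_BYTES + SHX_RECORD_BYTES * num_records


if __name__ == "__main__":
    for n in (100_000, 3_000_000, 1_000_000_000):
        print(f"{n:>13,} records -> {shx_size_bytes(n) / 1e6:,.1f} MB")
```

Under that assumption, 3 million features give an SHX of about 24 MB and 1 billion records about 8 GB, before any geometry has been read.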
For those not intimately acquainted with shapefiles, a shapefile is
composed of:
  - a DBF file with the attributes;
  - a SHP file with the vertex information;
  - a SHX file with the binary location of the start of each shape in
    the SHP file. Shapes can have variable length, and the SHX index is
    necessary to locate individual shapes within the SHP file.
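To make the role of the SHX concrete, here is a minimal Python sketch of how a reader can use it to jump straight to one shape. It assumes the ESRI layout: after the 100-byte header, each SHX record is two big-endian 32-bit integers giving the record's offset in the SHP file and its content length, both counted in 16-bit words, and each SHP record begins with an 8-byte record header. The function names `read_shx_index` and `read_shape` are hypothetical.

```python
import io
import struct

SHX_HEADER_BYTES = 100        # fixed-size file header shared by .shp and .shx
SHP_RECORD_HEADER_BYTES = 8   # record number + content length, before each shape


def read_shx_index(shx_bytes: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, content_length_in_bytes) for every shape."""
    body = shx_bytes[SHX_HEADER_BYTES:]
    if len(body) % 8:
        raise ValueError("SHX body is not a whole number of 8-byte records")
    # Each record: big-endian int32 offset and int32 length, both
    # measured in 16-bit words, hence the factor of 2 to get bytes.
    return [(off * 2, length * 2) for off, length in struct.iter_unpack(">ii", body)]


def read_shape(shp_file, byte_offset: int, content_length: int) -> bytes:
    """Seek to one shape's content using an offset taken from the SHX."""
    # The SHX offset points at the SHP record header; skip it to reach the shape.
    shp_file.seek(byte_offset + SHP_RECORD_HEADER_BYTES)
    return shp_file.read(content_length)


if __name__ == "__main__":
    # Synthetic data: two records at word offsets 50 and 64.
    fake_shx = bytes(SHX_HEADER_BYTES) + struct.pack(">ii", 50, 10) + struct.pack(">ii", 64, 20)
    index = read_shx_index(fake_shx)
    print(index)
    fake_shp = io.BytesIO(bytes(range(256)))
    print(read_shape(fake_shp, *index[0])[:4])
```

The fixed 8-byte record size is what makes the SHX a simple array of offsets, and it is also why the index grows linearly with the number of shapes.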
We find that it is faster and easier to dedicate a server to datasets
with 100k to 1bn records and tune that server than it is to tune
shapefiles through tiling.
For datasets with a large number of records:
  - if you will usually be rendering the entire dataset, shapefiles are
    usually faster;
  - if you will usually be rendering a small subset of the records,
    PostGIS will usually be faster.
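The difference between the two cases comes down to how many records each request has to touch. The Python sketch below illustrates that with a simple uniform grid index. This is a stand-in chosen for brevity, not how PostGIS works (PostGIS uses GiST indexes) nor how MapServer's shapefile quadtree works, and the names `GridIndex` and `full_scan` are hypothetical. A small bounding-box query examines only the points in the cells it overlaps, whereas rendering the whole dataset reads every record regardless of the index.

```python
import random
from collections import defaultdict


class GridIndex:
    """Uniform grid over point features: a simplified spatial index."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of (fid, x, y)

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        # Floor division so negative coordinates land in the right cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, fid: int, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((fid, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return (ids inside the box, number of candidate points examined)."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits, examined = [], 0
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                # .get() avoids creating empty cells in the defaultdict.
                for fid, x, y in self.cells.get((col, row), ()):
                    examined += 1
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(fid)
        return hits, examined


def full_scan(points, xmin, ymin, xmax, ymax):
    """Sequential scan: every record is examined."""
    return [fid for fid, x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]


if __name__ == "__main__":
    random.seed(0)
    points = [(i, random.uniform(0, 1000), random.uniform(0, 1000)) for i in range(100_000)]
    index = GridIndex(cell_size=10.0)
    for fid, x, y in points:
        index.insert(fid, x, y)
    hits, examined = index.query(100, 100, 120, 120)
    assert sorted(hits) == sorted(full_scan(points, 100, 100, 120, 120))
    print(f"{len(hits)} hits, {examined} examined vs {len(points)} for a full scan")
```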
Tiled shapefiles can be fast to render as well but may have performance
problems with attribute queries.
The real answer is to time and test each of the 3 methods:
  - a single shapefile
  - tiled shapefiles
  - PostGIS
and see how they work in your case.
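A minimal Python harness for that kind of test might look like the sketch below. The three workloads are hypothetical placeholders that you would replace with your own rendering or query code for each storage option, and `time_backends` is an illustrative name. Taking the median of several runs after an untimed warm-up is one reasonable choice, assumed here, to reduce noise from disk and database caching.

```python
import statistics
import time
from typing import Callable


def time_backends(backends: dict[str, Callable[[], object]],
                  repeats: int = 5, warmup: int = 1) -> dict[str, float]:
    """Return the median wall-clock time in seconds for each backend."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, run in backends.items():
        for _ in range(warmup):
            run()  # untimed runs so caches are warm before measuring
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


if __name__ == "__main__":
    # Placeholder workloads: substitute real renders or bounding-box queries
    # against a single shapefile, tiled shapefiles and PostGIS.
    workloads = {
        "single shapefile": lambda: sum(range(200_000)),
        "tiled shapefiles": lambda: sum(range(100_000)),
        "PostGIS": lambda: sum(range(50_000)),
    }
    for name, seconds in sorted(time_backends(workloads).items(), key=lambda kv: kv[1]):
        print(f"{name:<18} {seconds * 1000:8.2f} ms")
```

Since the answer depends on the access pattern, it is worth running the harness once with full-extent renders and once with small-subset queries; concurrent access would need a separate multi-client test.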
C.
Paul Ramsey wrote:
> It depends a lot on your use case. If you're rendering thousands of
> features, that's just going to take a lot of time, shape file or
> postgis. You might find that for really large tables the postgis index
> is balanced better than the shape quadtree, and that provides some
> performance boost. But always, you have to be drawing a manageable subset.
>
> The 10% figure is quite old, and was a single-threaded test on the old
> postgis geometries. Some folks have told me that for concurrent access
> PostGIS is actually faster than shape files. But frankly, we are
> missing any kind of real benchmark at this point. You just have to try
> yourself and see.
>
> Paul
>
> On 25-Apr-07, at 7:46 AM, Jeff Dege wrote:
>
>> Someone pointed me to PostGIS as being a tool worth considering.
>>
>> We've done our mapping so far with various extensions built on top of
>> UMN MapServer (ka-map and openlayers). We've been storing our GIS data
>> in shapefiles.
>>
>> We're finding it very difficult to manage acceptable performance when
>> working with large shapefiles, where large means >500MB, >3 million
>> features. We've been splitting these both by feature type (pulling the
>> features we display at wider zooms into separate files) and by geography
>> (tiling). It's tedious, time-consuming, and performance still isn't
>> what we'd wish.
>>
>> The docs for PostGIS say that we can expect access times to be about 10%
>> greater than working with shapefiles, due to the overhead of
>> establishing a database, etc.
>>
>> Is this constant?
>>
>> What I'd like to be true is that PostGIS would offer indexing
>> possibilities that would allow for faster access to subsets of large
>> sets of geographic data than we're getting with shapefiles.
>>
>> But what I'd like to be true isn't always the case.
>>
>> Can I accomplish the sort of speedups I'm getting by splitting up
>> shapefiles, within PostGIS, without the expense of splitting up
>> shapefiles?
>>
>> Can I do better?
>>
>> Thanks.
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
--
Carl Anderson
carl.anderson at vadose.org