[postgis-users] How do you use PostGIS?
Martin Daly
Martin.Daly at cadcorp.com
Fri May 6 00:29:47 PDT 2005
> You can't share the comparison, but you could tell us the PostGIS raw
> numbers and the hardware specs? All data is good data :)
OS MasterMap loading is a complicated process. OS supply "chunks" of
GZIP compressed GML; in our case 5km x 5km squares. Adjacent chunks
contain the same feature if that feature crosses the bounday between the
chunks. However, as each feature has a unique identifier - the
Topographic Identifier, or TOID - we can check whether or not any
feature is already in the database. Because each feature also has a
version number, this check also confirms that the version numbers of the
feature in the database (from an earlier chunk) and the candidate
feature for insertion (from the current chunk) are identical.
The figures below are for all the 100km National Grid squares that cover
Scotland (purdy picture attached). The source data is 4407 GZIP-d GML
files, ~5.4 Gb on disk. The machine spec is:
Dell PowerEdge 2650
Dual 2.8 GHz Xeon plus HyperThreading
4 GB RAM
2 x 73 GB 10,000rpm SCSI disks with RAID 0
Microsoft Windows Server 2003
PostgreSQL 8.0.0 (native Windows)
PostGIS 1.0 RC6 (ditto)
The upload has several steps:
1. Pre-processing: a simple check of all the OS MM chunks
47 sec(s) (0.0870693%)
2. Loading: extracting features from each GZIP in turn (this parsing is
highly optimised for OS MM)
3 hour(s), 46 min(s), 57 sec(s) (25.226%)
3. TOID Processing: checking for features/TOIDs already in the DB
1 hour(s), 59 min(s), 14 sec(s) (13.2531%)
4. Adding: inserting new features into the DB, using COPY
7 hour(s), 14 min(s), 23 sec(s) (48.2827%)
5. Creating spatial index
1 hour(s), 58 min(s), 4 sec(s) (13.1234%)
6. Total
14 hour(s), 59 min(s), 40 sec(s)
Features/minute (Loading): 233335
Features/minute (TOID processing): 444132
Features/minute (Adding): 121909
Features/minute (Creating spatial index): 448520
Features/minute (Overall): 58861
The resulting table has ~53 million features, and is ~25.6 Gb on disk.
National coverage of OS MM is ~420 million features. From our
benchmarks, we see no reason that the upload would not scale linearly.
So, on that hardware we could expect an upload time of ~5 days, and a
database size of ~200 Gb. Except that that hardware doesn't have enough
disk space, so a National Load will have to wait for a new server later
in the summer.
It is also worth noting that the PostgreSQL/PostGIS installation is
*completely* standard. We have made no attempt to tune the database for
performance whatsoever. All tips welcome.
Regards,
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Scotland.png
Type: image/png
Size: 17953 bytes
Desc: Scotland.png
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20050506/d7b61de1/attachment.png>
More information about the postgis-users
mailing list