[postgis-users] (no subject)

Paul & Caroline Lewis paulcaz80 at hotmail.com
Tue Apr 12 09:38:02 PDT 2011


Hi,
Thank you Mark and Richard for your replies. Having looked at this, it seems a VACUUM FULL is the answer, although I'm not sure why. Running the SQL scripts as originally reported, I get a large table from TestSet1 and a small table from TestSet2. Once a VACUUM FULL is performed on the large table from TestSet1, its size drops to match the small table from TestSet2. However, adding a VACUUM FULL to the TestSet1 procedure makes it much slower to run than TestSet2, especially when uploading the very large data sets (70 million rows). This begs the question: is TestSet2 very efficient, or is it missing something fundamental that a VACUUM FULL provides that I'm not seeing at the moment?
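[Editor's note: this pattern is consistent with PostgreSQL's MVCC behaviour. The UPDATE in TestSet1 (quoted below) writes a second version of every one of the 70 million rows, leaving the originals behind as dead tuples until they are reclaimed, while TestSet2's COPY writes each row exactly once. A sketch of queries to confirm this, assuming a live connection to the database holding tunnel6:]

```sql
-- Dead vs. live row versions after the full-table UPDATE in TestSet1
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'tunnel6';

-- Heap and index sizes, to compare before and after reclaiming space
SELECT pg_size_pretty(pg_relation_size('tunnel6')) AS heap_size,
       pg_size_pretty(pg_indexes_size('tunnel6'))  AS index_size;

-- On 9.0+, VACUUM FULL rewrites the table compactly and rebuilds its indexes
VACUUM FULL tunnel6;
```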

Thanks again,

Paul


 



From: paulcaz80 at hotmail.com
To: postgis-users at postgis.refractions.net
Subject: 
Date: Thu, 7 Apr 2011 17:57:26 +0100




DB SIZE PROBLEM:
   I have run the following tests on PostgreSQL 9 with PostGIS 1.5 and am seeing significant table and index size differences.
TestSet1 is run with a file tunnel6.asc, a CSV file with the following being a sample of the data:
-6.34223029,53.39211958,132.586
The file is 6 GB in size with 70 million rows. After running TestSet1 the table has the correct number of rows (70 million) and spot checks of the data look fine, but the table size is 16 GB and the index size is 6195 MB.
 
I now drop the tunnel6 table (DROP ... CASCADE) before running TestSet2.
 
TestSet2 is run on a preprocessed version of tunnel6.asc, now called tunnel6_py.asc, with the following being a sample of the data:
-6.34223029,53.39211958,132.586,SRID=4326;POINT(-6.34223029 53.39211958 132.586)
This file grows to 8 GB and still has 70 million rows. After following the TestSet2 steps, the table again has the correct number of rows (70 million) and spot checks of the data still look fine, but now the table size is 9.5 GB and the index size is 3363 MB.
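[Editor's note: the Python preprocessor itself is not shown in the post. A minimal sketch of what it plausibly does, appending an EWKT POINT built from the three CSV fields so COPY can load the geometry column directly; the function names and streaming approach are assumptions, not the original script:]

```python
# Hypothetical reconstruction of the TestSet2 preprocessing step:
# append "SRID=4326;POINT(x y z)" (EWKT) to each "x,y,z" CSV line.

def to_ewkt_row(line):
    """'lon,lat,alt' -> 'lon,lat,alt,SRID=4326;POINT(lon lat alt)'."""
    lon, lat, alt = line.strip().split(",")
    return "%s,%s,%s,SRID=4326;POINT(%s %s %s)" % (lon, lat, alt, lon, lat, alt)

def preprocess(src_path, dst_path):
    # Stream line by line: a 6 GB input should never be held in memory.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(to_ewkt_row(line) + "\n")
```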
 
Have I done something significantly wrong in these tests?
The TestSet2 process loads the data about 10 minutes faster than TestSet1, so I would like to use it, but I don't trust it now given the significant difference in table sizes.
 
--********************************************************************/
--TestSet1
--********************************************************************/
CREATE TABLE tunnel6
(
latitude numeric,
longitude numeric,
altitude numeric);
 
COPY tunnel6 (
   latitude,
   longitude,
   altitude)
  FROM '/media/storage/tunnel6.asc'
  CSV
  HEADER;
 
SELECT AddGeometryColumn('tunnel6','wgs_geom','4326','POINT',3);
 
UPDATE tunnel6 SET wgs_geom = ST_SETSRID(ST_MAKEPOINT(longitude, latitude, altitude),4326);
 
CREATE INDEX tunnel6_wgs_point ON tunnel6 USING gist(wgs_geom);
--********************************************************************/
--TestSet2 - Python PreProcessed Input File
--********************************************************************/
CREATE TABLE tunnel6
(
latitude numeric,
longitude numeric,
altitude numeric
);
 
SELECT AddGeometryColumn('tunnel6','wgs_geom','4326','POINT',3);
 
COPY tunnel6(
   latitude,
   longitude,
   altitude,
   wgs_geom)
  FROM '/media/storage/tunnel6_py.asc'
  CSV
  HEADER;
 
CREATE INDEX tunnel6_wgs_point ON tunnel6 USING gist(wgs_geom);
--********************************************************************/
 
Any help or insights would be much appreciated.
Thanks.