[postgis-users] (no subject)

Thu Apr 14 03:01:17 PDT 2011

Hi Paul,

I'm not expert on Postgres so, maybe I should remain in silence?
Anyway, I know Oracle well and I know that the physical organization of 
your data may be highly relevant to performance.

Maybe you could prepare all the data you need on a temporary table and 
in the end create another table from the temporary one. In my case, I 
load longitude/latitude and I create geography and geometry fields. In 
the end, I create another table, all indexes and finally I get rid of 
the temporary table. The big advantage of this process is that the table 
I'm creating in read-only, so, the better organized the data is, the 
better. Below you can see an example with some useful comments. I hope 
it helps somehow.

-- working with a temporaty table

CREATE TEMPORARY TABLE tmp
(
   location  character varying(100) NOT NULL,
   easting   integer NOT NULL,
   northing  integer NOT NULL,
   longitude double precision NOT NULL,
   latitude  double precision NOT NULL
)
WITH (
   OIDS=FALSE
);

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' 
CSV HEADER;

ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting || ' 
' || northing || ')' , 97405 );
COMMIT;

ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' || longitude 
|| ' ' || latitude || ')' );
COMMIT;

-- now create the table I really need

drop table location;
CREATE TABLE location AS SELECT location, easting, northing, longitude, 
latitude, geometry, geography FROM tmp;

-- actually, I would like to avoid this step below. I should really get 
rid of this:
ALTER TABLE location ADD COLUMN  id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and I could 
generate the primary key right in place.
-- Once this table "location" is read-only, I could simply generate the 
primary at load time and get rid of this step.
-- I suppose it would be better with regards data organization (Not 
sure! I'm not an expert on Postgres).

VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey  ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);

Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib

On 14/04/11 10:37, Mark Cave-Ayland wrote:
> On 12/04/11 17:38, Paul & Caroline Lewis wrote:
>
>> Hi,
>> Thank you Mark and Richard for your replies. Having looked at this it
>> seems a Full Vacuum is the answer, however I'm not sure why. Processing
>> the SQL scripts as originall reported I do get a large table from
>> TestSet1 and a small table from TestSet2. Once a Full vacuum is
>> performed on the large table from TestSet1 its size drops to the same as
>> the small table from TestS2, however adding a full vacuum into the
>> TestSet1 procedure makes it slower to run than TestSet2, very much
>> slower especially on uploading the very large data sets (70 mill rows).
>> This begs the question is TestSet2 very efficient or is it missing
>> something fundamental that a Full Vacuum provides that I'm not realising
>> at the moment.
>
> That's strange - do you see the same behaviour if you swap the order 
> of the data load, i.e. do the ordered data set first, and/or use a 
> different table name for each load? I'm just wondering if you're 
> seeing some kind of database bloat if VACUUM fixes the issue.
>
>
> ATB,
>
> Mark.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110414/d4a59efc/attachment.html>