[postgis-users] (no subject)
Richard Gomes
rgomes1997 at yahoo.co.uk
Thu Apr 14 03:01:17 PDT 2011
Hi Paul,
I'm not an expert on Postgres, so maybe I should remain silent?
Anyway, I know Oracle well and I know that the physical organization of
your data may be highly relevant to performance.
Maybe you could prepare all the data you need in a temporary table and,
at the end, create another table from the temporary one. In my case, I
load longitude/latitude and create geography and geometry fields. At
the end, I create the final table and all the indexes, and finally I
get rid of the temporary table. The big advantage of this process is
that the table I'm creating is read-only, so the better organized the
data is, the better. Below you can see an example with some useful
comments. I hope it helps somehow.
-- working with a temporary table
CREATE TEMPORARY TABLE tmp
(
location character varying(100) NOT NULL,
easting integer NOT NULL,
northing integer NOT NULL,
longitude double precision NOT NULL,
latitude double precision NOT NULL
)
WITH (
OIDS=FALSE
);
COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ','
CSV HEADER;
ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting || ' ' || northing || ')', 97405 );
COMMIT;
ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );
COMMIT;
-- now create the table I really need
DROP TABLE location;
CREATE TABLE location AS SELECT location, easting, northing, longitude,
latitude, geometry, geography FROM tmp;
-- Actually, I would like to avoid this step below. I should really get rid of this:
ALTER TABLE location ADD COLUMN id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and generate the primary key right in place.
-- Since this table "location" is read-only, I could simply generate the primary key at load time and get rid of this step.
-- I suppose it would be better with regard to data organization (not sure! I'm not an expert on Postgres).
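-- For instance, instead of the CREATE TABLE AS and the ALTER above, something
-- like the sketch below could generate the id at creation time. This is untested
-- and assumes PostgreSQL 8.4+ (window functions) and that the row order of tmp
-- is acceptable for numbering the rows:
CREATE TABLE location AS
  SELECT row_number() OVER () AS id,
         location, easting, northing, longitude, latitude, geometry, geography
  FROM tmp;
ALTER TABLE location ADD PRIMARY KEY (id);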
VACUUM ANALYZE location;
CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);
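One more idea, again just a sketch that I have not benchmarked: since we are
talking about the physical organization of the data, once the final table is
built you could also CLUSTER it on whichever index matches your typical access
pattern (GiST indexes can be clustered as well), for example:

CLUSTER location USING location_geometry_akey;
ANALYZE location;

CLUSTER rewrites the table in index order, which may help queries that touch
many nearby rows, and on a read-only table you pay that cost only once.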
Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes
JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib
On 14/04/11 10:37, Mark Cave-Ayland wrote:
> On 12/04/11 17:38, Paul & Caroline Lewis wrote:
>
>> Hi,
>> Thank you Mark and Richard for your replies. Having looked at this, it
>> seems a Full Vacuum is the answer; however, I'm not sure why. Processing
>> the SQL scripts as originally reported, I do get a large table from
>> TestSet1 and a small table from TestSet2. Once a Full Vacuum is
>> performed on the large table from TestSet1, its size drops to the same
>> as the small table from TestSet2; however, adding a Full Vacuum into
>> the TestSet1 procedure makes it slower to run than TestSet2, very much
>> slower, especially when uploading the very large data sets (70 million
>> rows). This begs the question: is TestSet2 very efficient, or is it
>> missing something fundamental that a Full Vacuum provides that I'm not
>> realising at the moment?
>
> That's strange - do you see the same behaviour if you swap the order
> of the data load, i.e. do the ordered data set first, and/or use a
> different table name for each load? I'm just wondering if you're
> seeing some kind of database bloat if VACUUM fixes the issue.
>
>
> ATB,
>
> Mark.
>
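PS: regarding the size difference discussed above, a quick way to check whether
the big table from TestSet1 is simply bloated is to compare its on-disk size
before and after the VACUUM FULL, for example (replace 'your_table' with the
actual table name):

SELECT pg_size_pretty(pg_total_relation_size('your_table'));

Note that pg_total_relation_size includes indexes and TOAST data, so this is
only a rough check rather than a proper bloat analysis.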