Hi Paul,

I'm no expert on Postgres, so maybe I should remain silent. Anyway, I know
Oracle well, and I know that the physical organization of your data can be
highly relevant to performance.

Maybe you could prepare all the data you need in a temporary table and, at
the end, create another table from it. In my case, I load longitude/latitude
and create geography and geometry fields. At the end, I create the final
table and all the indexes, and finally get rid of the temporary table. The
big advantage of this process is that the table I'm creating is read-only,
so the better organized the data is, the better. Below you can see an
example with some useful comments. I hope it helps somehow.

-- working with a temporary table

CREATE TEMPORARY TABLE tmp
(
    location  character varying(100) NOT NULL,
    easting   integer NOT NULL,
    northing  integer NOT NULL,
    longitude double precision NOT NULL,
    latitude  double precision NOT NULL
)
WITH (
    OIDS=FALSE
);

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' CSV HEADER;

-- derive a geometry column from the easting/northing pair
ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry =
    ST_GeometryFromText( 'POINT(' || easting || ' ' || northing || ')', 97405 );
COMMIT;

-- derive a geography column (WGS84 lon/lat) from longitude/latitude
ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography =
    ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );
COMMIT;
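
You may want a quick sanity check at this point (just a sketch, not part of
my actual script):

-- sketch: both derived columns should now be populated on every row
SELECT count(*) AS missing
FROM tmp
WHERE geometry IS NULL OR geography IS NULL;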

-- now create the table I really need

DROP TABLE IF EXISTS location;
CREATE TABLE location AS
    SELECT location, easting, northing, longitude, latitude, geometry, geography
    FROM tmp;

-- actually, I would like to avoid the step below; I should really get rid of it:
ALTER TABLE location ADD COLUMN id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and generate the
-- primary key right in place. Since the table "location" is read-only, I
-- could simply generate the primary key at load time and skip this step.
-- I suppose it would also be better with regard to data organization (not
-- sure! I'm not an expert on Postgres).
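
For instance, something along these lines might work (an untested sketch;
row_number() needs Postgres 8.4 or later), assigning the id while the table
is materialized instead of rewriting it afterwards:

-- sketch: generate the id during CREATE TABLE AS, then just add the constraint
CREATE TABLE location AS
    SELECT CAST( row_number() OVER () AS integer ) AS id,
           location, easting, northing, longitude, latitude,
           geometry, geography
    FROM tmp;
ALTER TABLE location ADD PRIMARY KEY (id);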

VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);
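
And because the table is read-only, you could even physically order it along
one of the indexes once everything is built. CLUSTER rewrites the whole
table, so it only makes sense for a load-once table like this one (again,
just a sketch):

-- sketch: physically reorder the table to follow the spatial index,
-- then refresh the planner statistics
CLUSTER location USING location_geometry_akey;
ANALYZE location;
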
Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes
JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib

On 14/04/11 10:37, Mark Cave-Ayland wrote:
> On 12/04/11 17:38, Paul & Caroline Lewis wrote:
>> Hi,
>> Thank you Mark and Richard for your replies. Having looked at this it
>> seems a Full Vacuum is the answer, however I'm not sure why. Processing
>> the SQL scripts as originally reported, I do get a large table from
>> TestSet1 and a small table from TestSet2. Once a Full Vacuum is performed
>> on the large table from TestSet1, its size drops to the same as the small
>> table from TestSet2; however, adding a Full Vacuum into the TestSet1
>> procedure makes it slower to run than TestSet2 - very much slower,
>> especially on uploading the very large data sets (70 million rows).
>> This begs the question: is TestSet2 very efficient, or is it missing
>> something fundamental that a Full Vacuum provides that I'm not realising
>> at the moment?
>
> That's strange - do you see the same behaviour if you swap the order of
> the data load, i.e. do the ordered data set first, and/or use a different
> table name for each load? I'm just wondering if you're seeing some kind of
> database bloat if VACUUM fixes the issue.
>
> ATB,
>
> Mark.