<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <font face="sans-serif">Hi Paul</font>,<br>

    <br>

    I'm not expert on Postgres so, maybe I should remain in silence?<br>

    Anyway, I know Oracle well and I know that the physical organization

    of your data may be highly relevant to performance. <br>

    <br>

    Maybe you could prepare all the data you need on a temporary table

    and in the end create another table from the temporary one. In my

    case, I load longitude/latitude and I create geography and geometry

    fields. In the end, I create another table, all indexes and finally

    I get rid of the temporary table. The big advantage of this process

    is that the table I'm creating in read-only, so, the better

    organized the data is, the better. Below you can see an example with

    some useful comments. I hope it helps somehow.<br>

    <br>

    <br>

    <br>

    -- working with a temporaty table<br>

    <br>

    CREATE TEMPORARY TABLE tmp<br>

    (<br>

      location  character varying(100) NOT NULL,<br>

      easting   integer NOT NULL,<br>

      northing  integer NOT NULL,<br>

      longitude double precision NOT NULL,<br>

      latitude  double precision NOT NULL<br>

    )<br>

    WITH (<br>

      OIDS=FALSE<br>

    );<br>

    <br>

    COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER

    ',' CSV HEADER;<br>

    <br>

    ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;<br>

    UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting

    || ' ' || northing || ')' , 97405 );<br>

    COMMIT;<br>

    <br>

    ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);<br>

    UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' ||

    longitude || ' ' || latitude || ')' );<br>

    COMMIT;<br>

    <br>

    <br>

    -- now create the table I really need<br>

    <br>

    drop table location;<br>

    CREATE TABLE location AS SELECT location, easting, northing,

    longitude, latitude, geometry, geography FROM tmp;<br>

    <br>

    -- actually, I would like to avoid this step below. I should really

    get rid of this:<br>

    ALTER TABLE location ADD COLUMN  id SERIAL PRIMARY KEY;<br>

    -- I could change the tool which generates the CSV file and I could

    generate the primary key right in place.<br>

    -- Once this table "location" is read-only, I could simply generate

    the primary at load time and get rid of this step.<br>

    -- I suppose it would be better with regards data organization (Not

    sure! I'm not an expert on Postgres).<br>

    <br>

    <br>

    VACUUM ANALYZE location;<br>

    <br>

    CREATE UNIQUE INDEX location_akey ON location ( upper(location) );<br>

    CREATE INDEX location_geometry_akey  ON location USING

    gist(geometry);<br>

    CREATE INDEX location_geography_akey ON location USING

    gist(geography);<br>

    <br>

    <br>

    <pre class="moz-signature" cols="72">Richard Gomes

<a class="moz-txt-link-freetext" href="http://www.jquantlib.org/index.php/User:RichardGomes">http://www.jquantlib.org/index.php/User:RichardGomes</a>

twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.

<a class="moz-txt-link-freetext" href="http://www.jquantlib.com/">http://www.jquantlib.com/</a>

twitter: jquantlib

</pre>

    <br>

    On 14/04/11 10:37, Mark Cave-Ayland wrote:

    <blockquote cite="mid:4DA6C03C.1060600@siriusit.co.uk" type="cite">On

      12/04/11 17:38, Paul & Caroline Lewis wrote:

      <br>

      <br>

      <blockquote type="cite">Hi,

        <br>

        Thank you Mark and Richard for your replies. Having looked at

        this it

        <br>

        seems a Full Vacuum is the answer, however I'm not sure why.

        Processing

        <br>

        the SQL scripts as originall reported I do get a large table

        from

        <br>

        TestSet1 and a small table from TestSet2. Once a Full vacuum is

        <br>

        performed on the large table from TestSet1 its size drops to the

        same as

        <br>

        the small table from TestS2, however adding a full vacuum into

        the

        <br>

        TestSet1 procedure makes it slower to run than TestSet2, very

        much

        <br>

        slower especially on uploading the very large data sets (70 mill

        rows).

        <br>

        This begs the question is TestSet2 very efficient or is it

        missing

        <br>

        something fundamental that a Full Vacuum provides that I'm not

        realising

        <br>

        at the moment.

        <br>

      </blockquote>

      <br>

      That's strange - do you see the same behaviour if you swap the

      order of the data load, i.e. do the ordered data set first, and/or

      use a different table name for each load? I'm just wondering if

      you're seeing some kind of database bloat if VACUUM fixes the

      issue.

      <br>

      <br>

      <br>

      ATB,

      <br>

      <br>

      Mark.

      <br>

      <br>

    </blockquote>

  </body>

</html>