[postgis-users] (no subject)

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Fri Apr 8 07:16:12 PDT 2011


On 07/04/11 17:57, Paul & Caroline Lewis wrote:

> DB SIZE PROBLEM:
> I have run the following tests on a Postgresql 9 with postgis 1.5
> platform and am getting significant table and index size differences.
> TestSet1 is run with a file tunnel6.asc, a CSV file with the following
> being a sample of the data:
> -6.34223029,53.39211958,132.586
> The file is 6 GB in size with 70 million rows. After running TestSet1
> the table has the correct number of rows (70 mill), random viewing of
> the data and it seems fine, while the table size is 16 GB and the index
> size is 6195 MB.
>
> I now Drop Cascade the tunnel6 table for TestSet2
>
> For TestSet2 it is run on a preprocessed version of tunnel6.asc, now
> called tunnel6_py.asc, with the following being a sample of the data:
> -6.34223029,53.39211958,132.586,SRID=4326;POINT(-6.34223029 53.39211958
> 132.586)
> This file grows to 8 GB and still has 70 million rows but after
> following TestSet2 steps while the table still has the correct number of
> rows (70 mill), random viewing of the data and it still seems fine but
> now the table size is 9.5 GB and the index size is 3363 MB.
>
> Have I done something significantly wrong in these tests?
> The TestSet2 process loads the data about 10 minutes quicker than
> TestSet1 so I would like to use it but I don't trust it now given the
> significant differences in table sizes.

Hi Paul,

Due to the way in which the GiST indexes work, the order in which data 
is fed into the index build function can have a noticeable effect on the 
size and performance of the resulting index. This is because depending 
upon the order, page splits can occur at different points in the build 
process and hence influence the overall structure of the index.

There has been some talk of optimising the split function to create 
better indexes, but no-one has really spent much time on it. That said, 
it appears that whatever you do in your pre-processing does have a 
noticeable performance benefit so if you can provide us with information 
as to how this pre-processing affects the order of the data then we may 
be able to make use of this information.


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



More information about the postgis-users mailing list