[postgis-users] Performace problems with 250.000.000 points

Patricio Cifuentes Ithal pcifuentes at siigsa.cl
Tue Jul 24 08:50:44 PDT 2007


Is very good with JOIN,
More RAM. 200MB = 10% for 2GB and you have 64? In share_buffer,
Aplicate case box3d, extension the view.

Sorry my English... I talk spanglish.. jeje :D

Patricio Cifuentes Ithal
Ingeniero en Informática

SIIGSA
www.siigsa.cl
56-2 204 60 22


> -----Mensaje original-----
> De: postgis-users-bounces at postgis.refractions.net [mailto:postgis-
> users-bounces at postgis.refractions.net] En nombre de Mark Cave-Ayland
> Enviado el: martes, 24 de julio de 2007 11:31
> Para: PostGIS Users Discussion
> Asunto: Re: [postgis-users] Performace problems with 250.000.000 points
> 
> On Tue, 2007-07-24 at 15:26 +0200, Boehm, Andreas wrote:
> > Hi list,
> >
> > working with an huge data table I've massive performance problems.
> >
> > Here's the scenario:
> > There is one table trees, containing about random 250.000.000 points
> > within the square (x = 100000 / y = 100000). SRID is -1. A second
> table
> > land_parcel contains about 900.000 polygons (1 ring with 17 points).
> > These polygons are uniformly distributed within the square from lower
> > left to upper right.
> >
> > The task is to read all the trees within a specific land_parcel.
> > Something like
> > 	select x(t.geom), y(t.geom)
> > 	from land_parcel as l, trees as t
> > 	where (l.lp_id = 123456)
> > 	and (l.geom && t.geom) and Contains(l.geom, t.geom)
> >
> > The problem is that the execution of this query takes ages. Although
> the
> > result set is very small (approximately 25 trees).
> > In the first attempts I measured an average computation time of 366 s
> > for selecting one random land_parcel. The maximum computation time I
> > measured was about 700 s. In the meantime I've got better results
> (see
> > below).
> >
> > My assumption is that the bottle neck is the geo index. Reading the
> 100
> > first land_parcels in a loop I got the result for the 1st land_parcel
> > after 12 s. The times for the following land_parcels are decreasing
> (7
> > s, 6 s, ... 0.5 s).
> > There is an other thing, too. If you once picked a random land_parcel
> > the computation for a neighbour land_parcel is much faster.
> >
> > PostgreSQL is running on a windows machine (Win 2003 Server) with an
> > Intel Core2 CPU 6700, 2.66 GHz, 2 GB RAM and a 1.1 TB RAID 5. CPU
> usage
> > is low, I/O traffic is height, mem usage is about 80 MB.
> >
> > I've already walked through Mark's tuning guide. Here are some
> snippets
> > of the configuration file:
> > 	shared_buffers = 64MB
> > 	effective_cache_size = 196MB
> > 	random_page_cost = 4
> > I've altered the tables:
> > 	ALTER TABLE trees ALTER COLUMN geom SET STATISTICS 1000;
> > 	ALTER TABLE trees ALTER COLUMN tr_id SET STATISTICS 1000;
> > 	ALTER TABLE land_parcel ALTER COLUMN geom SET STATISTICS 1000;
> > 	ALTER TABLE land_parcel ALTER COLUMN lp_id SET STATISTICS 1000;
> > I've done several vacuum analyzes.
> > I've done a reindex on trees and land_parcels.
> > I've re-clusterd the table trees
> > 	CLUSTER trees_index ON trees;
> >
> > The clustering was successful. The average time for one land_parcel
> is
> > 9.7 s. But this is still to slow. Running the scenario in ArcSDE /
> > Oracle the average computation time is approximately 1.4 s.
> >
> > Has anybody still an idea? Or are 250.000.000 points just too many
> > items?
> > Thanks
> >
> > Andreas
> 
> 
> Hi Andreas,
> 
> Glad you found the tuning guide useful ;)
> 
> Firstly break the query down into its two main indexable components and
> check the speed and number of rows returned for each, e.g.
> 
> 
> SELECT * FROM land_parcel WHERE lp_id = 123456;
> 
> then:
> 
> SELECT * FROM trees WHERE geom && (SELECT geom FROM land_parcel WHERE
> lp_id = 123456);
> 
> 
> to see if that works any better (I suspect the second query will be
> faster since it eliminates the nested loop in your plan). Hopefully
> this
> should be enough information to get you on your way, otherwise we need
> to see the EXPLAIN ANALYZE output rather than just EXPLAIN. This is
> because the former contains extra cost information indicating exactly
> why the planner made its current choice of plan.
> 
> 
> Kind regards,
> 
> Mark.
> 
> --
> ILande - Open Source Consultancy
> http://www.ilande.co.uk
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> --
> Este mensaje ha sido analizado por MailScanner
> en busca de virus y otros contenidos peligrosos,
> y se considera que está limpio.
> 
> www.siigsa.cl



-- 
Este mensaje ha sido analizado por MailScanner
en busca de virus y otros contenidos peligrosos,
y se considera que está limpio.

www.siigsa.cl




More information about the postgis-users mailing list