[postgis-users] Performace problems with 250.000.000 points

Paul Ramsey pramsey at refractions.net
Tue Jul 24 08:34:14 PDT 2007


Before I look more closely, your tuning sucks. You have 2GB of RAM,  
but you're only allocating a few hundred MB of it for PostGIS

use

> 	shared_buffers = 1024MB
> 	effective_cache_size = 1024MB
> 	random_page_cost = 3

There's still something wrong with your speeds though.

P

On 24-Jul-07, at 6:26 AM, Boehm, Andreas wrote:

> Hi list,
>
> working with an huge data table I've massive performance problems.
>
> Here's the scenario:
> There is one table trees, containing about random 250.000.000 points
> within the square (x = 100000 / y = 100000). SRID is -1. A second  
> table
> land_parcel contains about 900.000 polygons (1 ring with 17 points).
> These polygons are uniformly distributed within the square from lower
> left to upper right.
>
> The task is to read all the trees within a specific land_parcel.
> Something like
> 	select x(t.geom), y(t.geom)
> 	from land_parcel as l, trees as t
> 	where (l.lp_id = 123456)
> 	and (l.geom && t.geom) and Contains(l.geom, t.geom)
>
> The problem is that the execution of this query takes ages.  
> Although the
> result set is very small (approximately 25 trees).
> In the first attempts I measured an average computation time of 366 s
> for selecting one random land_parcel. The maximum computation time I
> measured was about 700 s. In the meantime I've got better results (see
> below).
>
> My assumption is that the bottle neck is the geo index. Reading the  
> 100
> first land_parcels in a loop I got the result for the 1st land_parcel
> after 12 s. The times for the following land_parcels are decreasing (7
> s, 6 s, ... 0.5 s).
> There is an other thing, too. If you once picked a random land_parcel
> the computation for a neighbour land_parcel is much faster.
>
> PostgreSQL is running on a windows machine (Win 2003 Server) with an
> Intel Core2 CPU 6700, 2.66 GHz, 2 GB RAM and a 1.1 TB RAID 5. CPU  
> usage
> is low, I/O traffic is height, mem usage is about 80 MB.
>
> I've already walked through Mark's tuning guide. Here are some  
> snippets
> of the configuration file:
> 	shared_buffers = 64MB
> 	effective_cache_size = 196MB
> 	random_page_cost = 4
> I've altered the tables:
> 	ALTER TABLE trees ALTER COLUMN geom SET STATISTICS 1000;
> 	ALTER TABLE trees ALTER COLUMN tr_id SET STATISTICS 1000;
> 	ALTER TABLE land_parcel ALTER COLUMN geom SET STATISTICS 1000;
> 	ALTER TABLE land_parcel ALTER COLUMN lp_id SET STATISTICS 1000;
> I've done several vacuum analyzes.
> I've done a reindex on trees and land_parcels.
> I've re-clusterd the table trees
> 	CLUSTER trees_index ON trees;
>
> The clustering was successful. The average time for one land_parcel is
> 9.7 s. But this is still to slow. Running the scenario in ArcSDE /
> Oracle the average computation time is approximately 1.4 s.
>
> Has anybody still an idea? Or are 250.000.000 points just too many
> items?
> Thanks
>
> Andreas
>
> ---
>
> --
> -- Query plan
> --
> Nested Loop  (cost=99.47..5072.48 rows=19 width=21)
>   Join Filter: contains(l.geom, t.geom)
>   ->  Index Scan using land_parcel_pkey on land_parcel l
> (cost=0.00..9.25 rows=1 width=317)
>         Index Cond: (lp_id = 123456)
>   ->  Bitmap Heap Scan on trees t  (cost=99.47..5044.38 rows=1250
> width=21)
>         Filter: (l.geom && t.geom)
>         ->  Bitmap Index Scan on trees_index  (cost=0.00..99.16
> rows=1250 width=0)
>               Index Cond: (l.geom && t.geom)
>
> --
> -- Table: trees
> --
> CREATE TABLE trees
> (
>   tr_id integer NOT NULL,
>   breed_id character varying(10) NOT NULL,
>   height double precision,
>   crown_diam double precision,
>   bhd double precision,
>   volume double precision,
>   plant_date character varying(10),
>   geom geometry,
>   CONSTRAINT trees_x_pkey PRIMARY KEY (tr_id),
>   CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
>   CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
> 'POINT'::text OR geom IS NULL),
>   CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
> )
> WITHOUT OIDS;
> ALTER TABLE trees OWNER TO postgres;
>
> -- Index: trees_breed_ididx
> CREATE INDEX trees_breed_ididx
>   ON trees
>   USING btree
>   (breed_id);
>
> -- Index: trees_heightidx
> CREATE INDEX trees_heightidx
>   ON trees
>   USING btree
>   (height);
>
> -- Index: trees_index
> CREATE INDEX trees_index
>   ON trees
>   USING gist
>   (geom);
>
> -- Index: trees_plant_dateidx
> CREATE INDEX trees_plant_dateidx
>   ON trees
>   USING btree
>   (plant_date);
>
> --
> -- Table: land_parcel
> --
> CREATE TABLE land_parcel
> (
>   lp_id integer NOT NULL,
>   owner_id character varying(20) NOT NULL,
>   numerator integer NOT NULL,
>   denominator integer NOT NULL,
>   geom geometry,
>   CONSTRAINT land_parcel_pkey PRIMARY KEY (lp_id),
>   CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
>   CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
> 'POLYGON'::text OR geom IS NULL),
>   CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
> )
> WITHOUT OIDS;
> ALTER TABLE land_parcel OWNER TO postgres;
>
>
> -- Index: land_parcel_index
> CREATE INDEX land_parcel_index
>   ON land_parcel
>   USING gist
>   (geom);
>
> -- Index: land_parcel_owner_ididx
> CREATE INDEX land_parcel_owner_ididx
>   ON land_parcel
>   USING btree
>   (owner_id);
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list