[postgis-users] Performace problems with 250.000.000 points
Paul Ramsey
pramsey at refractions.net
Tue Jul 24 08:34:14 PDT 2007
Before I look more closely, your tuning sucks. You have 2GB of RAM,
but you're only allocating a few hundred MB of it for PostGIS
use
> shared_buffers = 1024MB
> effective_cache_size = 1024MB
> random_page_cost = 3
There's still something wrong with your speeds though.
P
On 24-Jul-07, at 6:26 AM, Boehm, Andreas wrote:
> Hi list,
>
> working with an huge data table I've massive performance problems.
>
> Here's the scenario:
> There is one table trees, containing about random 250.000.000 points
> within the square (x = 100000 / y = 100000). SRID is -1. A second
> table
> land_parcel contains about 900.000 polygons (1 ring with 17 points).
> These polygons are uniformly distributed within the square from lower
> left to upper right.
>
> The task is to read all the trees within a specific land_parcel.
> Something like
> select x(t.geom), y(t.geom)
> from land_parcel as l, trees as t
> where (l.lp_id = 123456)
> and (l.geom && t.geom) and Contains(l.geom, t.geom)
>
> The problem is that the execution of this query takes ages.
> Although the
> result set is very small (approximately 25 trees).
> In the first attempts I measured an average computation time of 366 s
> for selecting one random land_parcel. The maximum computation time I
> measured was about 700 s. In the meantime I've got better results (see
> below).
>
> My assumption is that the bottle neck is the geo index. Reading the
> 100
> first land_parcels in a loop I got the result for the 1st land_parcel
> after 12 s. The times for the following land_parcels are decreasing (7
> s, 6 s, ... 0.5 s).
> There is an other thing, too. If you once picked a random land_parcel
> the computation for a neighbour land_parcel is much faster.
>
> PostgreSQL is running on a windows machine (Win 2003 Server) with an
> Intel Core2 CPU 6700, 2.66 GHz, 2 GB RAM and a 1.1 TB RAID 5. CPU
> usage
> is low, I/O traffic is height, mem usage is about 80 MB.
>
> I've already walked through Mark's tuning guide. Here are some
> snippets
> of the configuration file:
> shared_buffers = 64MB
> effective_cache_size = 196MB
> random_page_cost = 4
> I've altered the tables:
> ALTER TABLE trees ALTER COLUMN geom SET STATISTICS 1000;
> ALTER TABLE trees ALTER COLUMN tr_id SET STATISTICS 1000;
> ALTER TABLE land_parcel ALTER COLUMN geom SET STATISTICS 1000;
> ALTER TABLE land_parcel ALTER COLUMN lp_id SET STATISTICS 1000;
> I've done several vacuum analyzes.
> I've done a reindex on trees and land_parcels.
> I've re-clusterd the table trees
> CLUSTER trees_index ON trees;
>
> The clustering was successful. The average time for one land_parcel is
> 9.7 s. But this is still to slow. Running the scenario in ArcSDE /
> Oracle the average computation time is approximately 1.4 s.
>
> Has anybody still an idea? Or are 250.000.000 points just too many
> items?
> Thanks
>
> Andreas
>
> ---
>
> --
> -- Query plan
> --
> Nested Loop (cost=99.47..5072.48 rows=19 width=21)
> Join Filter: contains(l.geom, t.geom)
> -> Index Scan using land_parcel_pkey on land_parcel l
> (cost=0.00..9.25 rows=1 width=317)
> Index Cond: (lp_id = 123456)
> -> Bitmap Heap Scan on trees t (cost=99.47..5044.38 rows=1250
> width=21)
> Filter: (l.geom && t.geom)
> -> Bitmap Index Scan on trees_index (cost=0.00..99.16
> rows=1250 width=0)
> Index Cond: (l.geom && t.geom)
>
> --
> -- Table: trees
> --
> CREATE TABLE trees
> (
> tr_id integer NOT NULL,
> breed_id character varying(10) NOT NULL,
> height double precision,
> crown_diam double precision,
> bhd double precision,
> volume double precision,
> plant_date character varying(10),
> geom geometry,
> CONSTRAINT trees_x_pkey PRIMARY KEY (tr_id),
> CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
> CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
> 'POINT'::text OR geom IS NULL),
> CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
> )
> WITHOUT OIDS;
> ALTER TABLE trees OWNER TO postgres;
>
> -- Index: trees_breed_ididx
> CREATE INDEX trees_breed_ididx
> ON trees
> USING btree
> (breed_id);
>
> -- Index: trees_heightidx
> CREATE INDEX trees_heightidx
> ON trees
> USING btree
> (height);
>
> -- Index: trees_index
> CREATE INDEX trees_index
> ON trees
> USING gist
> (geom);
>
> -- Index: trees_plant_dateidx
> CREATE INDEX trees_plant_dateidx
> ON trees
> USING btree
> (plant_date);
>
> --
> -- Table: land_parcel
> --
> CREATE TABLE land_parcel
> (
> lp_id integer NOT NULL,
> owner_id character varying(20) NOT NULL,
> numerator integer NOT NULL,
> denominator integer NOT NULL,
> geom geometry,
> CONSTRAINT land_parcel_pkey PRIMARY KEY (lp_id),
> CONSTRAINT enforce_dims_geom CHECK (ndims(geom) = 2),
> CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
> 'POLYGON'::text OR geom IS NULL),
> CONSTRAINT enforce_srid_geom CHECK (srid(geom) = -1)
> )
> WITHOUT OIDS;
> ALTER TABLE land_parcel OWNER TO postgres;
>
>
> -- Index: land_parcel_index
> CREATE INDEX land_parcel_index
> ON land_parcel
> USING gist
> (geom);
>
> -- Index: land_parcel_owner_ididx
> CREATE INDEX land_parcel_owner_ididx
> ON land_parcel
> USING btree
> (owner_id);
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list