[postgis-users] Improving ST_DWithin Query Performance

Nicklas Avén nicklas.aven at jordogskog.no
Fri Mar 11 12:38:41 PST 2011


Hello Robert

Welcome as a new user :-)

What are the other ways you have tried to do this?

I can see no reason that your function should speed things up. I think
what you are doing is just forcing some of the query planner's decisions.

After creating the spatial index, did you run VACUUM ANALYZE?

Otherwise the planner may not use the index.
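
For example, on the locations table from your message that would simply be:

VACUUM ANALYZE locations;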

I did some testing with a table of 30000 points. My runtime was about
1.5 seconds if I set the st_dwithin tolerance so that I found 70000 pairs,
or 2.5 seconds if I increased the st_dwithin tolerance so that I found
300000 pairs.

The query I ran is

SELECT ST_Distance(a.the_geom, b.the_geom), a.id,
b.id FROM table1 a INNER JOIN table2 b ON
ST_DWithin(a.the_geom, b.the_geom, 100);

On my dataset this returns 292320 rows in 15 seconds,
but most of that time is spent sending all that data to pgAdmin. So,
to see how much time the calculations themselves take, I try:

SELECT SUM(ST_Distance(a.the_geom, b.the_geom)), COUNT(*)
FROM table1 a INNER JOIN table2 b ON ST_DWithin(a.the_geom,
b.the_geom, 100);

which returns in 2.4 seconds.
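
So one thing to try is replacing the nested loops in your function with a
single self-join. Here is a rough sketch against your locations table (the
timestamp literal is only a placeholder):

SELECT a."timestamp", a.id AS id1, b.id AS id2,
       ST_Distance(a."position", b."position") AS distance
FROM locations a
INNER JOIN locations b
  ON a."timestamp" = b."timestamp"
 AND a.id < b.id
 AND ST_DWithin(a."position", b."position", 3000)
WHERE a."timestamp" = '2011-03-11 12:00:00';

With the geography type ST_DWithin takes the tolerance in meters, so 3000
matches what your function uses.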


Regards

Nicklas




On Fri, 2011-03-11 at 03:39 -0800, robertvc wrote:
> Hi All,
> 
> As a new user of PostGIS I thought it would be a good idea to post here and
> get some advice/help on my particular problem.
> 
> I have a table that receives around 30,000 new entries every 10 minutes or so.
> The table format is the following:
> 
> -- Table: locations
> 
> -- DROP TABLE locations;
> 
> CREATE TABLE locations
> (
>   "timestamp" timestamp without time zone NOT NULL,
>   id integer NOT NULL,
>   "position" geography(Point,4326),
>   CONSTRAINT locations_pkey PRIMARY KEY ("timestamp", id),
>   CONSTRAINT locations_mmsi_fkey FOREIGN KEY (id)
>       REFERENCES cars (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
> ALTER TABLE locations OWNER TO postgres;
> 
> -- Index: locations_gix
> 
> -- DROP INDEX locations_gix;
> 
> CREATE INDEX locations_gix
>   ON locations
>   USING gist
>   ("position");
> 
> 
> Now, I want to design/implement a function that, given a certain timestamp,
> will return a table with all the pairs of locations within a certain
> range (x) of each other at that particular time, as well as the distance
> between them. The returned table format would be:
> 
>  timestamp, id1, id2, distance 
> 
> I have tried a few things, and the best solution I have implemented so far
> is my own function in PL/pgSQL. The function is the following:
> 
> 
> create type carPairs as (t timestamp,id1 int,id2 int,d float8);
> 
> create type idDistancePair as (id int,distance float8);
> 
> -- Function: getallpairsinrange(timestamp without time zone)
> 
> -- DROP FUNCTION getallpairsinrange(timestamp without time zone);
> 
> CREATE OR REPLACE FUNCTION getallpairsinrange(timestamp without time zone)
>   RETURNS SETOF carPairs AS
> $BODY$
> declare
>          inRangePair carPairs;
>          locRow locations%rowtype;
>          locRowComp idDistancePair;
> begin
> 
> -- Outer loop: every location at the given timestamp
> for locRow in select * from locations where "timestamp" = $1 loop
> 
> 	-- Inner loop: all other locations at the same timestamp within 3000 m;
> 	-- id < locRow.id ensures each pair is reported only once
> 	for locRowComp in
> 		select id, ST_Distance(position, locRow.position) as distance
> 		from locations
> 		where "timestamp" = $1
> 			and ST_DWithin(position, locRow.position, 3000)
> 			and id < locRow.id
> 	loop
> 	
> 		inRangePair.t := locRow.timestamp;
> 		inRangePair.id1 := locRow.id;
> 		inRangePair.id2 := locRowComp.id;
> 		inRangePair.d := locRowComp.distance;
> 		return next inRangePair;				
> 		
> 	end loop;	
> 	
> end loop;
> return;
> end
> $BODY$
>   LANGUAGE plpgsql VOLATILE
>   COST 100
>   ROWS 1000;
> ALTER FUNCTION getallpairsinrange(timestamp without time zone) OWNER TO
> postgres;
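> 
> For example, I call the function like this (the timestamp literal is just a
> placeholder):
> 
> SELECT * FROM getallpairsinrange('2011-03-11 12:00:00');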
> 
> 
> 
> This function seems to perform a lot faster than anything I have previously
> been able to implement. Now, I would just like to ask whether you think there
> is a way to improve this query further and get a better time. At the moment,
> the query executes in around 30 seconds, given that at a certain
> timestamp there are around 30,000 locations.
> 
> I should also mention the following changes I made to the postgresql.conf
> file after some searching online:
>  
> #1st change
> #max_connections = 100			# (change requires restart)
> max_connections = 10
> 
> #2nd change
> #shared_buffers = 32MB			# min 128kB (change requires restart)
> shared_buffers = 1024MB
> 
> #3rd change
> max_locks_per_transaction = 300		# min 10  (change requires restart)
> # I don't seem to have the initial value saved anywhere in the file
> 
> #4th change
> #effective_cache_size = 128MB
> effective_cache_size = 1024MB
> 
> 
> The computer is running on Windows XP SP2 with 3543MB of memory available.
> 
> Any suggestions would be appreciated.
> 
> Thank you,
> 
> Robert
> 
> 
> 
> 
> 
