[postgis-users] Improving ST_DWithin Query Performance
robertvc
robert.cazaciuc at baesystems.com
Fri Mar 11 03:39:42 PST 2011
Hi All,
As a new PostGIS user I thought it would be a good idea to post here and
get some advice/help on my particular problem.
I have a table that receives around 30,000 new entries every 10 minutes or
so. The table format is the following:
-- Table: locations
-- DROP TABLE locations;
CREATE TABLE locations
(
"timestamp" timestamp without time zone NOT NULL,
id integer NOT NULL,
"position" geography(Point,4326),
CONSTRAINT locations_pkey PRIMARY KEY ("timestamp", id),
CONSTRAINT locations_mmsi_fkey FOREIGN KEY (id)
REFERENCES cars (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
ALTER TABLE locations OWNER TO postgres;
-- Index: locations_gix
-- DROP INDEX locations_gix;
CREATE INDEX locations_gix
ON locations
USING gist
("position");
Now, I want to design/implement a function that, given a certain timestamp,
returns a table with all pairs of locations within a certain range (x) of
each other at that time, along with the distance between them. The
returned table format would be:
timestamp, id1, id2, distance
I have tried a few things, and the best solution I have implemented so far
is my own function in PL/pgSQL. The function is the following:
create type carPairs as (t timestamp,id1 int,id2 int,d float8);
create type idDistancePair as (id int,distance float8);
-- Function: getallpairsinrange(timestamp without time zone)
-- DROP FUNCTION getallpairsinrange(timestamp without time zone);
CREATE OR REPLACE FUNCTION getallpairsinrange(timestamp without time zone)
RETURNS SETOF carPairs AS
$BODY$
declare
    inRangePair carPairs;
    locRow locations%rowtype;
    locRowComp idDistancePair;
begin
    -- Outer loop: every location recorded at the requested timestamp
    for locRow in select * from locations where timestamp = $1 loop
        -- Inner loop: all other locations at the same timestamp within
        -- 3000 m of the outer row (id < locRow.id avoids duplicate pairs)
        for locRowComp in
            select id, ST_Distance(position, locRow.position) as distance
            from locations
            where timestamp = $1
              and ST_DWithin(position, locRow.position, 3000)
              and id < locRow.id
        loop
            inRangePair.t := locRow.timestamp;
            inRangePair.id1 := locRow.id;
            inRangePair.id2 := locRowComp.id;
            inRangePair.d := locRowComp.distance;
            return next inRangePair;
        end loop;
    end loop;
    return;
end
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
ALTER FUNCTION getallpairsinrange(timestamp without time zone) OWNER TO
postgres;
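For completeness, this is how I call the function (the timestamp literal
here is just an example value, not real data):

```sql
-- Example invocation; the timestamp is illustrative only
SELECT * FROM getallpairsinrange('2011-03-11 03:30:00');
```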
This function performs a lot faster than anything I had previously been
able to implement. I would just like to ask whether you think there is a
way to improve it further. At the moment the query executes in around 30
seconds, given that at a certain timestamp there are around 30,000
locations.
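To make the intended result precise: the pair set I am after should be
equivalent to a single self-join like the sketch below (same table and
column names as above; the 3000 m radius is the range x, and the timestamp
literal is only an example):

```sql
-- Sketch: set-based equivalent of getallpairsinrange for one timestamp.
-- b.id < a.id matches the function's "id < locRow.id" de-duplication.
SELECT a.timestamp,
       a.id AS id1,
       b.id AS id2,
       ST_Distance(a.position, b.position) AS distance
FROM   locations a
JOIN   locations b
  ON   b.timestamp = a.timestamp
 AND   b.id < a.id
 AND   ST_DWithin(a.position, b.position, 3000)
WHERE  a.timestamp = '2011-03-11 03:30:00';  -- example timestamp
```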
I should probably also mention the following changes I made to
postgresql.conf after some searching online:
#1st change
#max_connections = 100 # (change requires restart)
max_connections = 10
#2nd change
#shared_buffers = 32MB # min 128kB (change requires restart)
shared_buffers = 1024MB
#3rd change
max_locks_per_transaction = 300 # min 10 (change requires restart)
# I don't seem to have the initial value saved anywhere in the file
#4th change
#effective_cache_size = 128MB
effective_cache_size = 1024MB
The computer is running on Windows XP SP2 with 3543MB of memory available.
Any suggestions would be appreciated.
Thank you,
Robert
--
View this message in context: http://old.nabble.com/Improving-ST_DWithin-Query-Performance-tp31124281p31124281.html
Sent from the PostGIS - User mailing list archive at Nabble.com.