[postgis-users] Problem with Intersection

Thu Mar 23 04:25:53 PST 2006

> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
> bounces at postgis.refractions.net] On Behalf Of First Last
> Sent: 22 March 2006 10:55
> To: postgis-users at postgis.refractions.net
> Subject: [postgis-users] Problem with Intersection
> 
> 
> Somebody please explain to me why the query below is taking too long. I
> mean aside from the fact that I am working on large dataset.
> 
> "SELECT * from (SELECT b.code, a.*, intersection(a.the_geom,
> b.memgeomunion) FROM rlrd AS a, (SELECT code, memgeomunion(the_geom) FROM
> countries WHERE code = 'CAN' GROUP BY code) AS b WHERE a.the_geom &&
> b.memgeomunion AND intersects(a.the_geom, b.memgeomunion)) AS c where
> geometrytype(c.intersection)!='GEOMETRYCOLLECTION';"
> 
> This query is supposed to extract the railroad network of Canada. I am
> passing this query to psql to create a shapefile directly. It works with
> other countries but with Canada it's taking forever to finish (an hour has
> passed and still doesn't output anything, other take only about 3 mins).
> If it would help to know, each country is one record entry in the database
> even if it has several islands/polygons. Is the above query not optimal?

Hi y2kdis,

If you are trying to find out why a query is running slowly, there are
several things that you need to explore to determine exactly where the
problem is. I've deliberately made this quite verbose as it might make a
good candidate for a wiki link we can point people towards:

BEGINNER'S POSTGRESQL/POSTGIS TUNING GUIDE
-------------------------------------

1. Make sure that you have configured your postgresql.conf settings for
shared_buffers, effective_cache_size and possibly random_page_cost. By
default, PostgreSQL will only use 8Mb of RAM out of the box which is nowhere
near enough for larger tables. A general rule of thumb for these values is
to aim for shared_buffers to be ~25% of your total RAM, and
effective_cache_size to be ~75% of your total RAM. For example:

	shared_buffers = (RAM in Mb * 0.25) * 128
	effective_cache_size = (RAM in Mb * 0.75) * 128

If you are using a very fast disk array you may also want to lower
random_page_cost down to 3 or 2.

NB: Don't forget to restart the postmaster using something like
/etc/init.d/postgresql restart or kill -HUP <pid of postmaster> for the
changes to take effect. You can confirm the changes by doing the following
in psql:

	show shared_buffers;
	show effective_cache_size;

and making sure that values match the new ones in the postgresql.conf file.

2. Find out which version of PostgreSQL/PostGIS you are using with SELECT
version() and SELECT postgis_full_version().

3. If you are using PostgreSQL < 7.4, make sure you do a SELECT
update_geometry_stats() and VACUUM ANALYZE to ensure that the planner
statistics are up to date. Try the query again to see if things have
improved.

4. If you are using PostgreSQL >= 8.0, make sure that you do a VACUUM
ANALYZE after loading your data. Try the query again to see if things have
improved.

5. If you are using PostgreSQL >= 8.0, try increasing the statistics on all
the columns involved in the query to 1000 (the maximum). This will cause
PostgreSQL to collect more information about the distribution of the data
within each column, and hence influence its choice of plan.

	ALTER TABLE <table1> ALTER COLUMN <column1> SET STATISTICS 1000;
	ALTER TABLE <table1> ALTER COLUMN <column2> SET STATISTICS 1000;
	ALTER TABLE <table2> ALTER COLUMN <column1> SET STATISTICS 1000;
	...
	...
	VACUUM ANALYZE;

Try the query again to see if things have improved.

6. If your query is still slow, try isolating each subselect and executing
them separately. This may help you narrow down the problem to one particular
part of the query. For example, if you have a query similar to below:

	SELECT a FROM (SELECT b.* FROM b WHERE b && c) t1, (SELECT d.* FROM
d 	WHERE d && e) t2 WHERE t1.id = t2.id

Ideal candidates to try on their own to determine where the bottleneck is
would be:

	SELECT b.* FROM b WHERE b && c
	SELECT d.* FROM d WHERE d && e

7. If you are still experiencing problems, execute your query using EXPLAIN
<query> and, if possible, EXPLAIN ANALYZE <query> and post the complete
output (including the query itself and the schema of the tables involved)
and the PostGIS/PostgreSQL versions to the PostGIS mailing lists.

If your query consists of a number of subqueries as above, please also post
the EXPLAIN/EXPLAIN ANALYZE output for each of the individual subqueries.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  

This email and any attachments are confidential to the intended recipient
and may also be privileged. If you are not the intended recipient please
delete it from your system and notify the sender. You should not copy it or
use it for any purpose nor disclose or distribute its contents to any other
person.