[postgis-users] point-in-polygon SQL sentence performance

Mon Jun 30 12:09:14 PDT 2008

Pere,

I'm still not clear why you are creating an individual user_points table for each and deleting it.  I think I understand a little better, but I'm still guessing a bit.

Below is how I would do it without creating a temp table for each user.  By the way ST_Contains is a simpler way of achieving for A && B And Contains(A,B)

INSERT INTO user_points(geom, specie,genus,code, userid, the_geom)
SELECT p.geom, p.specie, p.genus, poly.code, p.userid, p.the_geom
FROM points  p INNER JOIN polygon poly ON ST_Contains(poly.the_geom, p.the_geom)
WHERE p.userid = '$user'  

(You can leave the WHERE out if you plan to do it for all users - it will save you the trouble to just do it in one step.  Although sounds like you maybe creating these temp tables for the logged in user)

You may also want to replace ST_Contains with ST_Intersects (I'm nost sure about performance differences).  although not sure how all these work in long lat since they assume cartesian rather than spherical.  I think it should be fine.

--Next Polygon part - I am assuming genus, species etc are counts (or are those just words)  at each point location and you want for each
user  to count the total of each they captured in each  (you would use SUM not COUNT for this - this is just a guess on my part - they may be actual genus, species etc - in which case just throw out those fields)
--To make this efficient, you will want to put a btree index on user_points.code and polygon.poly_code

INSERT INTO user_polygon (userid, polygon_code,the_geom, numrecords,numgenus, numspecies)
SELECT p.userid, polygon.polygon_code, polygon.the_geom, COUNT(p.code),
SUM(genus), SUM(species)
FROM user_points p
               INNER JOIN polygon ON p.code = polygon.poly_code
GROUP BY p.userid, polygon.polygon_code, polygon.the_geom;

---If you don't really need user_points except for rolling up to user_polygon, then you can forget about all I said above and just do a one statement sql

INSERT INTO user_polygon (userid, polygon_code,the_geom, numrecords,numgenus, numspecies)
SELECT p.userid, polygon.polygon_code, polygon.the_geom, COUNT(p.code),
SUM(genus), SUM(species)
FROM user_points p
               INNER JOIN polygon ON ST_Contains(polygon.the_geom, p.the_geom)
GROUP BY p.userid, polygon.polygon_code, polygon.the_geom;

(normally I would frown on putting the_geom in a group by since it does an extent like grouping, but in this
case I think its fine because your polygon.polygon_code will guarantee uniqueness. So the extra the_geom is a necessary but redundant grouping)

IF you still need to do an update  - some how you took my example and rewoded it so it does something completely different from what I had intended.  It should read like below not what you reiterated (e.g.  standard form is
UPDATE sometable1 
             FROM sometable2 
      WHERE sometable1.somefield1 = sometable2.somefield2  

my sometable2 in this case is a subselect aliased as u

(SELECT
          COUNT(user_point.code) as cnt, user_point.code  
            FROM user_point  GROUP BY
user_point.code) as u  ;

---

  UPDATE user_polygon SET numtax = u.cnt 
FROM  (SELECT
          COUNT(user_point.code) as cnt, user_point.code  
            FROM user_point  GROUP BY
user_point.code) as u  
WHERE u.code = user_polygon.code;

Hope that helps,
Regina

________________________________

From: postgis-users-bounces at postgis.refractions.net on behalf of pere roca
Sent: Mon 6/30/2008 12:40 PM
To: postgis-users at postgis.refractions.net
Subject: RE: [postgis-users] point-in-polygon SQL sentence performance

   dear Regina,

   first, thanks to take some time.
   yes, it's a little complicated to explain; I will try again:
  -My webapp lets user insert point data; all users insert this data in the
same table (point)
  -I have several polygon tables in postGIS (to cross with point data)

I thought about:

Updating an unique,exclusive user_point table with fields: geom, specie,
genus... and code. All fields except the code are extracted from point table
using $user as identifier ($user comes to PHP from user identification). The
code for each point is extracted using CONTAINS and && sentence (see the PHP
script) from polygon table.

A user_polygon table will have the geom, id... and fields like
number_of_records (see SET numtax below), number_genera that require
counting how many records fall inside EACH polygon. This fields are filled
with the COUNT sentences taking into consideration the previous point table
updating.

I've tried the sentence below, following your code:
  UPDATE user_polygon SET numtax = (select u.cnt from (SELECT
COUNT(user_point.code) as cnt, user_point.code  FROM user_point  GROUP BY
user_point.code) as u  WHERE u.code = user_polygon.code);";

The sentence inside (select u.cnt...) goes very very fast, but "loses"  a
lot of time (also takes 1min 20sec) comparing WHERE u.code =
user_polygon.code. I think this is the problem, but there is some solution
to this? isn't the main idea good? everything goes fast except this last
part.

After executing all the SQL sentences, data from  user_point and
user_polygon is deleted (but not the table!).

Some answers:
-"1) Are you using inherited tables and having a different table for each
user?". No inherited tables (was wrong using ONLY parameter). A user = a
point and a polygon table (info deleted each time session expires)
-"How big is your user_polygon table". My testings are based on a 20 Mb
"Toast table size", 544 Kb Table size. It has 1991 records.

  Thanks again

  Pere Roca

Obe, Regina     DND\MIS wrote:
>
> Pere,
>
> Somehow I think your whole code is needlessly complicated, but I still
> don't have a clear idea what you are trying to do and looking at your
> code is actually more confusing than anything.  Maybe try to restate
> exactly what you are trying to do minus the code. 
>
> I'm guessing you are trying to do this (by the way wrapping a (SELECT
> ...) like that is very non-standard)
>
> INSERT INTO user_polygon (userid, polygon_code,the_geom)
> SELECT
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326);
>
>
> 1) Are you using inherited tables and having a different table for each
> user?  I'm assuming that is why you are doing
>
> UPDATE ONLY
>
> rather than UPDATE  (UPDATE ONLY really only makes sense if you are
> using inherited tables and don't want the update to cascade to child
> tables, but then if its inherited - its still silly since you are just
> updating child tables anyway)
>
>
> 2) Where is that $user coming from.  Make sure you know where that $user
> is coming from and you properly sanitize it otherwise its a prime target
> for SQL Injection attack.
>
> 3) How big is your user_polygon table?  Speed would depend on that
>
>  - this particular statement you have is inefficient - because it is
> either doing a correlated subselect for every record or its updating
> every record to the same value. Since you are not prefixing things - its
> hard for me to tell what exactly this is doing.  This I suppose would
> make sense if you are creating a different table for each user
>
> UPDATE ONLY user_polygon SET numreg = (SELECT COUNT (code) FROM
> user_points
> WHERE name = polygon_code);
>
> but its generally more efficient to write a sub select
>
> UPDATE user_polygon SET numreg = u.cnt
> FROM (SELECT COUNT(up.code) as cnt, up.name
>               FROM user_point up
>               GROUP BY up.name) u
> WHERE u.name = user_polygon.polygon_code;
>
>
> 3) I have no idea what the point of the first bounding box insert you
> have is for
> since ST_intersects, ST_Contains already have bounding box checks in
> them.  So I suspect this step you have is is not necessary
>
> For this part
>
> SELECT
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326)
>
>
> 4) Again this statement
> UPDATE ONLY user_polygon SET userid ='$user';
>
> Is updating every record in user_polygon to the current user.  So if you
> are looping thru a user set which I can only guess you are, this
> statement is overwritting the previous updates.
>
> If again you have one table per user, then this is fine - but its
> cheaper to do everything in your insert rather than doing an update
> after the fact.  Updates tend to be slower than inserts.
>
> Hope that helps a bit,
> Regina
>
>
>
>
> 
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of pere
> roca
> Sent: Monday, June 30, 2008 4:25 AM
> To: postgis-users at postgis.refractions.net
> Subject: [postgis-users] point-in-polygon SQL sentence performance
>
>
> hi!
>
> I wanna make a point-in-polygon operation but takes looong time (about 4
> minutes!) and the CPU consuming of the server
>  becomes huge when performing this operation for 15000 points /
> polygons:
>
> I need to count how many user points fall in each polygon and perform
> some
> operations based on that.
> We initially have a point_table that stores multi-user point data. We
> extract from there only this-user-data and create new tables:
> user_points
> and user_polygon. To the first one we will insert the code of the
> polygon in
> which it falls; counting how many many times the same polygon_code is
> repeated, we will calculate number-points_in_polygon...
> The important code is commented below (calling SQL from PHP):
>
> //we use the points bbox to narrow the future spatial queries (will only
> select polygons falling inside this bbox; from 1991 initial polygons,
> only
> 85 are from now on used); it is useful when points are close each other
> (if
> there is an outlayer then bbox is not useful...)
>
> $extent="select astext(extent(points_table.the_geom)) from points_table
> where userid='$user'";
> $extent_result=pg_query($extent);
>
> while ($row = pg_fetch_array($extent_result, NULL, PGSQL_ASSOC))
> {
> //creates a NEW polygon table that will store only polygon code and
> polygon
> geometry that fall INSIDE the point's bbox
>    $poligon_sql2="INSERT INTO user_polygon (polygon_code,the_geom)
> (SELECT
> distinct(polygon_code),the_geom FROM polygon where polygon."."the_geom
> &&
> GeometryFromText('".$row['astext']."',4326));
> UPDATE ONLY user_polygon SET userid ='$user';
>
> //we previously created a new user point table, where we insert some
> info.
> The table is created because we are taking data from a multi-user table
> and
> now we want a single user-->a single point table
>
> INSERT INTO user_points (genus, species, name, code, usuario, the_geom)
> (SELECT genus, specie, codigo, oid, userid, the_geom FROM points WHERE
> userid = '$user');
>
> //in this user POINT table we set the code from the polygon's table
> where
> the point falls inside;
> //I expected this one to be the "big" operation , but it's not (takes
> "only"
> 1 min 18 sec)
>
> UPDATE ONLY user_points SET code = (SELECT polygon_code FROM
> user_polygon
> WHERE user_polygon.the_geom && user_points.the_geom AND CONTAINS
> (user_polygon.the_geom,user_points.the_geom));
>
> //updating user polygon table, counting how many times do we have a
> polygon
> code in our POINT table --> number of points in each polygon
> // this updating operation takes 1 min 20 seconds!
>
> UPDATE ONLY user_polygon SET numreg = (SELECT COUNT (code) FROM
> user_points
> WHERE name = polygon_code);
>
> // this updating operation also takes about 1 min 20 seconds!
>
> UPDATE ONLY user_polygon SET numtax = (SELECT COUNT (genus) FROM (SELECT
> DISTINCT (genus) FROM user_points WHERE code = polygon_code) AS foo);
>
>    All important data is indexed; Any ideas to get better performance?
>
>    thanks!
> 
>    Pere Roca
>    EDIT project (http://edit.csic.es/edit_geo/prototype/edit.html)
>
>
> --
> View this message in context:
> http://www.nabble.com/point-in-polygon-SQL-sentence-performance-tp181908
> 54p18190854.html
> Sent from the PostGIS - User mailing list archive at Nabble.com.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>

--
View this message in context: http://www.nabble.com/point-in-polygon-SQL-sentence-performance-tp18190854p18199519.html
Sent from the PostGIS - User mailing list archive at Nabble.com.

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20080630/e4b54ada/attachment.html>