[Fwd: RE: [UMN_MAPSERVER-USERS] shp2tile question]

Wed Feb 15 17:32:57 EST 2006

Eric,

  I am doing very similar work, so if you want to contact me off-list, 
please feel free.

Some questions first, since you don't say exactly how you're doing this 
now...

>
> What I am trying to do is this:
> 1. Query "Lots" shapefile for a particular lot number.

I assume this means that you're getting a x/y from the user and are 
doing a query to convert that to a lot number (something like 
'14/28/15/27234/000/0520' or 'U-05-27-17-001-000000-00009.0' or the 
like).  This is probably fastest done using Mapscript against a 
shapefile that has an index built.  However, there are lots of 
approaches, and others here may have better ideas.  If you already have 
the lot number (or some other non-spatial ID), you can skip this.

> 2. Find all the lots that are within a distance of the lot in question.

> 3. Gather attribute info about the near-by lots.

This is where PostGIS shines, and where it should be applied, however 
you don't say exactly what you're looking for.  Are you looking for 
Aggregate info?  Like "Average Price of Homes within 1km"?  Or selective 
info like "Light up all homes over $200K within one mile"?  Both are 
doable, but I'm going to assume the second.  Also, if the viewable area 
is smaller than your buffer, do you want to include those items?

Let's assume that your "Lot number" is unique in the PostGIS table that 
holds your shapes (and their attributes).  That way I can just use 'gid' 
for now and most of the SQL will fall out.
For simplicity, lets say that all of the attributes are in one table.

Here's an example of finding all shapes within 3281 feet (1km) of your 
target id (512).  However, this will run ridiculously slowly regardless 
of your indexing:

select gid, parcel_shape  from hills_parcels where 
distance(parcel_shape, (select parcel_shape from hills_parcels where gid 
= 512)) < 3281;

The reason is that no index are being used, because they can only be 
used by the PostGIS operators such as &&.  So, lets modify this to be a 
bit quicker:

gis=# select gid from hills_parcels where parcel_shape && buffer((select 
centroid(parcel_shape) from hills_parcels where gid = 512), 2*3281) AND 
distance(parcel_shape, (select parcel_shape from hills_parcels where 
gid=512)) < 3281 ;

This draws a 2km circle around your shape and only tests those whose 
bounding boxes intersect that to see if they are close enough to count.  
Your parcel size might make you increase or decrease that parameter, but 
I've found that double your distance is fairly safe.  Some will notice 
that we're performing the same sub-select (or close to it) twice.  
Without scripting or PL/PGSQL, this is difficult to avoid, but not a big 
deal (it's a unique id lookup + a centroid() call).

With this request, I select 758 "close" parcels out of 415,733 total in 
about 4 seconds. That is fine for some uses, but for online work, it 
might not be enough.  For times when I need ONLINE access, one simple 
fix usually gets me there: I take that "Within 1 km of the parcel" 
requirement and change it to "Whithin 1km of the center of the parcel".  
This DRAMATICALLY reduces the times needed:

gis=# select gid from hills_parcels where parcel_shape && buffer((select 
centroid(parcel_shape) from hills_parcels where gid = 512), 2*3280) AND 
distance(parcel_shape, (select *centroid*(parcel_shape) from 
hills_parcels where gid=512)) < 3280;

This selects 737 rows in 186 ms!

Another approach is to go one step farther and simplifiy the 
requirements to "Whose center is within one center 1km of the center of 
the target".  With the proper index, it simplifies dramatically (since 
the bounding box of a point is the point), but doesn't actually speed up 
much):

First, we need a centroid() index for this to work:

gis=# create index hill_centroid_idx on hills_parcels USING GIST 
(centroid(parcel_shape) GIST_GEOMETRY_OPS);
gis=# vacuum analyze hills_parcels;
And for pre-8.x Postgresql:
gis=# SELECT UPDATE_GEOMETRY_STATS();

Second, we MIGHT need a quick way to lookup centroid()s by gid, if your 
shapes are very (very) complex.  Either way, it can't hurt, and can make 
a difference if your lot number isn't your primary key.
gis=#  create index hills_centroid_lookup on hills_parcels (gid, 
centroid(parcel_shape));

Finally, the simple request itself:
gis=# select gid from hills_parcels where centroid(parcel_shape) && 
buffer((select centroid(parcel_shape) from hills_parcels where gid = 
512), 3280);

If you go this far, make sure that you use the centroid() index above or 
this final step will get very slow!!

Anyway, depending on what your actual goals are, I think this should get 
you going in the right direction.  If not, please feel free to contact me.

Bill
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osgeo.org/pipermail/mapserver-users/attachments/20060215/1385a3c3/attachment.html