[postgis-users] Spatially ordering data

Jonathan Moules jonathan-lists at lightpear.com
Mon Mar 6 06:33:04 PST 2017

Hi List,
    Thanks for all the suggestions. I think I've solved this using a combination of methods from the responses and wanted to share for anyone in the future.
Things I tried that didn't work for me (beyond what was in the previous post):
* Convert to a point, then extract the Lat/Lon and integer ORDERing by that (as per Robert's suggestion) - this didn't work well because it tends to produce long strips which isn't suitable for my purposes.

* Mark's linked Quadtree generation SQL. But I couldn't get it to work.

What did work:
I took the Content Based Grid from previously (which seems to basically be a form of Quadtree anyway) that I had created with Lar's tool, and then grabbed the bounds of it.

For each grid, I could then run this SQL:

from (
        floor(st_x(p_geom)) as x_index, 
        floor(st_y(p_geom)) as y_index 
    from (
            ST_PointN(ST_Exteriorring(geom), 1) as p_geom 
    ) a
) b
    x_index >= @Value(_xmin) and x_index < @Value(_xmax)
    Y_index >= @Value(_ymin) and y_index < @Value(_ymax)

This was run from FME, hence the "@Value()" bits in the WHERE clause, which substitute the Grid bounds values when sent to SQL. I'm sure the entire thing could be a single SQL query, but that I leave as an exercise for the reader.

For those who, like me, don't have SQL as their first or second language:
1) The innermost select converts the polygons to points (i.e. "ST_PointN(ST_Exteriorring(geom), 1)"). I'm using this method because it's considerably faster than ST_Centroid, and I don't care where the point is in relation to the geometry.

2) The mid-select get's the point coordinates (i.e. "floor(st_x(p_geom)) as x_index")

3) The outer clause uses the WHERE query to limit the points to only include ones that are greater than or EQUAL to the min bounds and less than the max bounds. This solves the problem of what happens if a point is on a shared boundary - it will be assigned to the grid whose minimum it sits on.

This processes reads and assigns a sample set of 1.3 million points in about 4 minutes.

The result of this is lots of polygons that are all spatially clustered together, in groups of no more than 500,000 polygons (that's the value I used to create the Grid).

The only problem I'm aware of is the fact that any points that sit on the very maximum boundary of the entire grid may not be read as they will be equal to the maximum. I haven't come up with a good solution to that one other than to re-read it all.

Hope that helps someone. I think spatial sorting would be a great feature for PostGIS to incorporate by way of a formal function.


---- On Thu, 02 Mar 2017 18:38:28 +0000  Burgholzer<Robert.Burgholzer at deq.virginia.gov> wrote ---- 

How about: 
1. Take a centroid of each shape, 
2. Rounding it’s x and y coordinates to whole lat/lon values 
3. Ordering by lat_round, lon_round and these values would proceed in a gridded fashion. 
Ex: select oid, floor(st_x(the_geom)) as x_index, floor(st_y(the_geom)) as y_index from precip_noaa_daily_grid order by x_index, y_index; 
 oid | x_index | y_index 
 2143 | -84 | 36 
 2142 | -84 | 36 
 2141 | -84 | 36 
 2620 | -84 | 36 
 2619 | -84 | 36 
 2618 | -84 | 36 
The only drawback is that if there were data that were added during processing, they may be skipped by your processor, however, if you were to set a flag to indicate that something was processed, you’d be covered. 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20170306/d2f9244b/attachment.html>

More information about the postgis-users mailing list