Proposal: st_remove_irrelevant_points_for_view() and st_remove_small_geometries()

Wed Feb 28 01:35:08 PST 2024

Dear Postgis community,

I’d like to contribute two new, simple and useful functions to PostGIS. They offer functionality that we have been using for more than 10 years as a crucial part of a network-based GIS application:

1. st_remove_irrelevant_points_for_view (the_geom, bbox)
2. st_remove_small_geometries (the_geom, dx, dy) and st_remove_small_geometries (the_geom, area)

A key requirement of this GIS application is fast, view-based rendering of relevant data objects (points, multilinestrings, multipolygons) stored in a PostGIS database. The constraints are:

- IO bandwidth between client, webserver and database is crucial. Geometry objects should be as small as possible when leaving the database and should contain only the coordinates that are relevant to the end user.
- Geometries must be kept as objects to allow object-based user interactions and object-based caching. Pre-rendering them as images in the database or the webserver is out of the question.
- Viewport-based clipping of geometry objects is done by the client as part of the rendering process. The database does not need to do this, e.g. by using st_intersection().

With this in mind, it turns out that the following three preprocessing steps are essential:

1. Remove irrelevant points: This is what the first proposed function, st_remove_irrelevant_points_for_view(), does. It removes all coordinates that are irrelevant for rendering data within a given view. In contrast to st_intersection(), no new coordinates are computed. On a test dataset containing the state borders in Europe, preprocessing with st_remove_irrelevant_points_for_view() leads to the same rendering result as st_intersection() but is up to 20 times faster, which is advantageous for real-time applications.

2. Remove small geometries: Small geometries (like the thousands of islands along the Norwegian coastline) often make a map cluttered and harder to read at small scales, and they noticeably slow down the rendering process. Removing them (that is, small exterior or interior rings of polygons, or small lines) is actually quite a simple operation. As far as I can see, however, there is no simple PostGIS function to achieve this, only more or less complex approaches using st_dump()/st_collect(), e.g. as described in https://gis.stackexchange.com/questions/198987/how-can-i-remove-only-small-inner-rings-in-postgis. This is why I want to propose a second function, st_remove_small_geometries(). According to my measurements, this approach is up to 50-100 times faster than the more complex approach I tested, which again is advantageous for real-time applications.

3. Reduce spatial resolution: This can already be effectively done with st_snaptogrid().
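To illustrate the three preprocessing steps, here is a minimal Python sketch. It is not the C code from the fork, and every name in it (outcode, remove_irrelevant_points, remove_small_rings, is_small_extent, snap_to_grid) is hypothetical. For step 1 it uses a conservative Cohen-Sutherland-style outcode test, which may differ from what st_remove_irrelevant_points_for_view() actually does. For step 2, the (dx, dy) criterion is only an assumed reading of that variant. Step 3 only approximates st_snaptogrid(). A real implementation would also have to keep enough vertices for rings to stay valid.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Ring = List[Point]                         # closed: first point == last point

LEFT, RIGHT, BELOW, ABOVE = 1, 2, 4, 8


def outcode(p: Point, box: BBox) -> int:
    """Cohen-Sutherland style outcode: the outer half-planes of box containing p."""
    (x, y), (xmin, ymin, xmax, ymax) = p, box
    return ((LEFT if x < xmin else 0) | (RIGHT if x > xmax else 0)
            | (BELOW if y < ymin else 0) | (ABOVE if y > ymax else 0))


# Step 1: remove irrelevant points (no new coordinates are computed).
def remove_irrelevant_points(coords: List[Point], box: BBox) -> List[Point]:
    """Drop a vertex when it, the last kept vertex and the next vertex all lie in
    the same outer half-plane of box; the removed triangle is then outside the
    view. First and last vertices are always kept (rings stay closed).
    Sketch only: degenerate rings (fewer than 4 points) are not handled."""
    if len(coords) <= 2:
        return list(coords)
    kept = [coords[0]]
    for cur, nxt in zip(coords[1:-1], coords[2:]):
        if (outcode(kept[-1], box) & outcode(cur, box) & outcode(nxt, box)) == 0:
            kept.append(cur)  # may be visible: keep it
    kept.append(coords[-1])
    return kept


# Step 2: remove small geometries.
def ring_area(ring: Ring) -> float:
    """Unsigned area of a closed ring (shoelace formula)."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2.0


def remove_small_rings(polygon: List[Ring], min_area: float) -> Optional[List[Ring]]:
    """polygon[0] is the exterior ring, polygon[1:] are holes. Returns None if
    the exterior ring is smaller than min_area (the whole polygon is dropped),
    otherwise the polygon without its small holes."""
    if not polygon or ring_area(polygon[0]) < min_area:
        return None
    return [polygon[0]] + [h for h in polygon[1:] if ring_area(h) >= min_area]


def is_small_extent(coords: List[Point], dx: float, dy: float) -> bool:
    """HYPOTHETICAL reading of the (dx, dy) variant: the bounding box is narrower
    than dx and lower than dy. Unlike area, this also works for lines."""
    xs, ys = [x for x, _ in coords], [y for _, y in coords]
    return max(xs) - min(xs) < dx and max(ys) - min(ys) < dy


# Step 3: reduce spatial resolution (roughly what st_snaptogrid() does).
def snap_to_grid(coords: List[Point], size: float) -> List[Point]:
    """Round each coordinate to the nearest grid node and drop consecutive
    duplicates. Python's round() rounds halves to even."""
    out: List[Point] = []
    for x, y in coords:
        p = (round(x / size) * size, round(y / size) * size)
        if not out or p != out[-1]:
            out.append(p)
    return out


if __name__ == "__main__":
    view = (0.0, 0.0, 10.0, 10.0)
    line = [(-5, 0), (-6, 5), (-7, 8), (5, 5), (15, 5)]
    print(remove_irrelevant_points(line, view))  # [(-5, 0), (-7, 8), (5, 5), (15, 5)]
```

In the example, (-6, 5) is dropped because it, its kept predecessor (-5, 0) and its successor (-7, 8) all lie left of the view. Dropping it only changes the line in the region x < 0, so nothing visible changes. The outcode test is cheap and never creates coordinates. This matches the motivation for step 1, although it keeps more points than a perfect algorithm would.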

I think that integrating the two proposed functions can help other people significantly improve performance and simplify spatial queries. It would also help us simplify the setup for customers, since we would no longer have to create and deliver a non-standard PostGIS build.

You can find my code here (two .c files, referenced in Makefile.in and postgis.sql.in):

https://github.com/gluser1357/postgis-fork/tree/remove-small-and-irrelevant-coords

Comments, suggestions, corrections and reviews are very welcome :-)

Of course, I'd also be happy to help with further explanations, tests and doc pages.

Sam

---

Some usage examples (they require a table with the columns id, name and the_geom):

Example:
SELECT id, name, st_npoints(st_remove_irrelevant_points_for_view(the_geom, st_makeenvelope(5, 30, 20, 60, 4326)))
FROM mytable
WHERE the_geom && st_makeenvelope(5, 30, 20, 60, 4326)
ORDER BY id, name;

SELECT id, name, st_nrings(st_remove_small_geometries(the_geom, 0.01))
FROM mytable
WHERE the_geom && st_makeenvelope(5, 30, 20, 60, 4326)
ORDER BY id, name;