[postgis-users] isvalid(the_geom)

Gregory S. Williamson gsw at globexplorer.com
Thu May 4 06:57:34 PDT 2006

Christo --

An index on a boolean column is not likely to be very useful to the planner except in trivial cases, I think ... if you have 50000 entries and 49900 are valid a conditional index on only the invalid ones might be useful, but an index with such low selectivity (i.e. returns a large percentage of possible rows) usually results in the scanner using a sequential scan, which is going to be slow.

I think (depending on what you are doing) that adding the column to indicate the validity might be useful, but I'd say *don't* index it; make the other selection criteria highly selective and then use the validity check as a final filter when needed. If less than 1% (a guess) of the data is valid then a conditional index on those rows with valid geometries would probably show a significant improvement. The trick is to try to have indexes be highly selective -- boolean values tend not to be.

In earlier postGIS (< 1.0) the isValid function was not very useful; in the newer versions it does work more robustly, but we don't use it for checking geometries generated on the fly as the result of queries -- we apply it as a final check and don't let data into runtime that is not valid (but we have more control over the data sets than it sounds like you do).

A question to ask might be: "What use is/are data with invalid geometries ?" If there are columns that are useful then including broken geometries might make sense if you can use the data without the geometry.

If your access is by spatial though, having invalid geometries is a non-starter, since many operations on that geometry could yield failures of various degrees.


Greg Williamson
GlobeXplorer LLC

-----Original Message-----
From:	postgis-users-bounces at postgis.refractions.net on behalf of Christo Du Preez
Sent:	Thu 5/4/2006 5:15 AM
To:	PostGIS Users Discussion
Subject:	[postgis-users] isvalid(the_geom)

Hi All,

I have used the isvalid(the_geom) in my queries but this makes a is very 
slow.  eg.

SELECT assvg(simplify(intersection(the_geom, 
geomfromtext('POLYGON((15.900000000000002 -35.0, 15.900000000000002 
-22.0, 33.3 -22.0, 33.3 -35.0, 15.900000000000002 -35.0))', -1)), 
0.025), 0, 6) AS data
FROM table
WHERE the_geom && geomfromtext('POLYGON((15.900000000000002 -35.0, 
15.900000000000002 -22.0, 33.3 -22.0, 33.3 -35.0, 15.900000000000002 
AND distance(geomfromtext('POINT(24.6 -28.5)',-1),the_geom) < 
AND isvalid(the_geom) = true

takes 4300ms

leaving out the 'AND isvalid(the_geom) = true' part

takes 2200ms

2200ms is still too slow but it's a big improvement already.

I thought of doing this:

ALTER TABLE table ADD COLUMN isvalid bool;

update table
set isvalid = isvalid(the_geom);

-- Index: table_btree_isvalid

-- DROP INDEX table_btree_isvalid;

CREATE INDEX table_btree_isvalid
  ON table
  USING btree

vacuum verbose analyze table

on all my tables and will then change my queries too rather include 'WHERE isvalid = true'

Is this advisable???

Any thoughts on this?

It is s bit of a worry just leaving out rows that is not valid, I suppose the issue should be resolved right from the start, to rather fix the data in the first place.

The problem is that I have no idea how to, this is public domain data supplied by the government.

postgis-users mailing list
postgis-users at postgis.refractions.net


More information about the postgis-users mailing list