[postgis-devel] Postgis "Split" with tolerance

Mon Dec 2 02:50:01 PST 2013

Hey Paul ,
thanks for the insight !

I've thought about this, and this is true that this may be a parameter.

We have a given number of digits (15 to 17), and some are spent on
coordinates (I'd say 10 at most), so some remain to do precise computation.
The problem is that the error we can allow (the epsilon) should depend on
this. For example, if I use 10 digits, only 5 remains, and the error should
be 5 digits.
But if use 5 digits for coordinates, I could be much more precise and allow
10 digits to error.
Maybe the (advanced) user could change this.

Here is what I propose, and what I'll enforce for my work (the analogy is
just for example):
any point/line/poly should be precise up to 10 digits (2 points could be
separated by 1000 km + 1 mm ).
Any function working on geom should give good answer on 10 digits geom.

More than this, geometry are equal except during computing (so we can use
full digits during computing ).

Remaining digits should be used to epsilon and like so as to give best
possible answers.

Please find the required test :
the sql querry should return (true,true)
1.) the first true means that on 100 000 random line and random point on
line, all line have been split in 2.
2.) the second true means that for this random lines, none was split by a
point farther than epsilon from line.
All testing are done on point/line having at most 10 digits coordinates

Please note that for 1.) I didn't check than splitting occured at the right
place, because I'm not testing split computing but chen it computes.

This is base code where you can chose number of digits in coordinates,
epsilon, number of tests to have
The random is seeded so test can be reproduced.
This could be reused to test more thoroughly other functions, and could be
improved (lines are always segments, orientation not random).

I also have a function which for a given line A and point P, gives the
closest point to P on A with a given number of digits. A bit like a
ST_SNapToGrid, but for point on line (in concept).

Cheers,
Rémi-C

2013/11/30 Sandro Santilli <strk at keybit.net>

> On Fri, Nov 29, 2013 at 09:26:07AM -0800, Paul Ramsey wrote:
> > Just to be a loose cannon in general: we have hard-wired tolerances
> > all over the code base, just grep for EPSILON and see what pops out.
> > We've hardly been ascetics about abusing tolerances where it suits our
> > purposes. The proposal for a "split that 'works'" fits in with our
> > general historical approach to tolerances that make things "work".
>
> Yeah, and I'll confess: I've injected some of that myself too,
> recently. See r10973 ...
>
> What can I say Remi, give me a testcase and let's add to the mud :)
>
> --strk(I wanna be Anarchy);
>
> >
> > P.
> >
> > On Fri, Nov 29, 2013 at 5:18 AM, Sandro Santilli <strk at keybit.net>
> wrote:
> > > On Fri, Nov 29, 2013 at 10:48:54AM +0100, Rémi Cura wrote:
> > >> ST_Split(line,point) is currently broken.
> > >> This is quite a neutral statement, because given a random point on a
> random
> > >> line, split won't do anything (p_correct_answer<1/10k )!
> > >>
> > >> The only work around I can think about are
> > >> _ snap the line to the point : it implies moving the line, which is
> not an
> > >> option if you do it several times with different points, and is more
> > >> abstractly a very bad idea
> > >
> > > This is what you'd be doing anyway by implementing a tolerance.
> > > That is, the line would be moved anyway.
> > >
> > >> _ change the point to a very small line in the perpendicular
> direction of
> > >> the line we want to split, then use split(line,line). Again not a
> very good
> > >> option because you would possibiliy split the line in more than one
> point
> > >> (and anyway split(line,line) has also precision issues).
> > >
> > > Still this is a legit options.
> > >
> > >> I understand PostGIS is a big project and we have to be conservative,
> > >> that is "we don't break a fine function with changes bringing unknown
> > >> consequences for unproven benefits",
> > >> but this function is broken, and we could change it to a function
> working
> > >> most of the time   !
> > >
> > > Yes, we could change it by accepting a tolerance value.
> > > What's wrong with that ?
> > >
> > >> *Pros of this change :*
> > >>
> > >> _a line will be split by a point correctly (p_error < 10^6)
> > >> _no heavy changes in code or methods
> > >>
> > >> *Cons of this change :*
> > >>
> > >> _the line would be split by any points close enough
> (lengthofline/10^12).
> > >> (that is all points less than 10 micro meters away from a line 1000
> km long
> > >> !).
> > >
> > > PostGIS doesn't deal with unit, so you don't know if those are
> > > micro meters or micro peta meters.
> > > You might be splitting a milli meter line by a point...
> > >
> > > All I'm saying is I'd like precision/tolerance to be _EXPLICIT_.
> > > Particularely important when you're about to run a _set_ of operations
> > > and you want them to be _consistent_ as per precision.
> > >
> > >> *I would also like adding ST_Split(geom,geom,tolerance).*
> > >> In fact it could be an easy wrapper around ST_Split(geom,geom) if this
> > >> function was working. (something like: if the point is DWithin, split
> the
> > >> closest part of the line)
> > >
> > > Please go ahead. Such a function would be accepted.
> > >
> > > NOTE: TopoGeo_addPoint does take a tolerane, and does use ST_Split
> after
> > >       snapping the line to a DWithin point to implement it.
> > >       Haing it available natively in C would speed that up.
> > >
> > > --strk;
> > >
> > >  ()  ASCII ribbon campaign        - against html e-mail
> > >  /\  http://www.asciiribbon.org   - against proprietary attachments
> > > _______________________________________________
> > > postgis-devel mailing list
> > > postgis-devel at lists.osgeo.org
> > > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>
> --
>
>  ()  ASCII ribbon campaign        - against html e-mail
>  /\  http://www.asciiribbon.org   - against proprietary attachments
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20131202/2f5090e5/attachment.html>
-------------- next part --------------
	---------------------------------------------
	--Copyright Remi-C Thales IGN 01/12/2013
	--
	--
	--this script test the proposed ST_split(line,point) modification (introducing some epsilon).
	--
	--------------------------------------------
	-----Script abstract
	----------------------------
	--	
	--	_Test if a line is always split by a point interpolated along the line
	--	_Test if a line is not split by a point too far (10^-10 x length of the line) of the line
	--
	--
	------dependencies---------
	--
	--	
	------------------------------------------

			--test of a line being well split by point on line, and not being split by points too far from line
			--	recommendend parameter : 
			--	pnum = 100 000 (number of random line/point generated)
			--	prec = 10 (max number of digits in coordinates of line/point)
			--should return true & true
			--warning : please execut in same transaction the "setseed" so that the random is reproducible

				SELECT setseed(0.192169354297221); --settign seed so every execution gives same result
				--DROP TABLE  IF EXISTS  temp_test_precision_geos_serie_max;
				--CREATE TABLE temp_test_precision_geos_serie_max AS
				WITH point_number AS (
					SELECT power(10,prec.p-1)::numeric As pprec, 100000 AS pnum, trunc(random() ::numeric,prec.p) AS rand, prec.p AS prec , power(10,-10) AS allowed_distance
					FROM (SELECT 10 aS p) AS prec
				),
				 serie AS (
						SELECT s1 AS s
						FROM point_number,generate_series(1,pnum) AS s1
				) ,
				 points AS(
					SELECT ST_MakePoint(
							trunc((s*random()/pnum )::numeric,prec)*pnum
							,trunc((s*random()/pnum )::numeric,prec)*pnum )AS p1
						,ST_MakePoint(
							trunc((s*random()/pnum )::numeric,prec)*pnum
							,trunc((s*random()/pnum )::numeric,prec)*pnum ) AS p2
					FROM point_number,serie
				),
				line AS (
					SELECT *, St_MakeLine(p1,p2) AS line
					FROM points
				),
				ipoint AS (
					SELECT row_number() over() AS uid
						,* 
						, ST_LineInterpolatePoint(
							line,  rand) AS ipoint 
						,trunc(random()::numeric,prec) AS random_trans
					FROM point_number,line
					ORDER BY p1 ASC , p2 ASC
				) ,
							verif_well_splitten AS (
								SELECT 
									ST_NumGeometries(ST_Split(line,ipoint)) AS num_geom
									, line, ipoint  
								FROM ipoint
							) ,
							result_well_splitten AS (
							SELECT row_number() OVEr () AS id_qgis,*, ST_AsText(ipoint) as t_cpoint, ST_AsText(line) AS t_line
							FROM verif_well_splitten
							WHERE num_geom  =1 
							)
							,number_well_splitten_false  AS (
									SELECT 
										CASE WHEN  count(*) =0 THEN 
											TRUE 
										ELSE 
											FALSE 
										END AS result
									FROM result_well_splitten)
				, ipoint_translated AS ( 
					--we create point like ipoint, but translated along the normal of seg (random orientation) of a random distance between allowed_distance and 2*allowed_distance
					SELECT
					*
					,ST_MakePoint(
					ST_X(ipoint) +sign(1-random_trans)*  allowed_distance * (1+random_trans) * (ST_Y(p1)-ST_Y(p2))
					,ST_Y(ipoint) + sign(1-random_trans) * allowed_distance * (1+random_trans) * (ST_X(p2)-ST_X(p1))
					) 	AS ipoint_t
					FROM ipoint
				)
				,split_by_ipoint_t AS (
					SELECT 
						-- ST_NumGeometries(ST_Split(line,ipoint_t)) AS num_geom
						ST_CollectionExtract(ST_Split(line,ipoint_t),2) AS split_ipoint_t
						, line, ipoint, ipoint_t
					FROM ipoint_translated
					)

				,result_split_by_ipoint_t AS (
					SELECT row_number() OVEr () AS id_qgis,*, ST_AsText(ipoint_t) as t_ipoint_t, ST_AsText(line) AS t_line 
					FROM split_by_ipoint_t
					WHERE ST_NumGeometries(split_ipoint_t)!=1 OR ( ST_Equals( split_ipoint_t ,line)=false )
					) 
				,number_split_by_ipoint_t AS (
									SELECT 
										CASE WHEN  count(*) =0 THEN 
											TRUE 
										ELSE 
											FALSE 
										END AS result
									FROM result_split_by_ipoint_t)
				SELECT r1.result AS are_lines_well_splitten_by_on_line_points, r2.result AS are_lines_not_splitten_by_far_points
				FROM number_well_splitten_false AS r1, number_split_by_ipoint_t as r2