[postgis-devel] Distance2D Review

Nicklas Avén nicklas.aven at jordogskog.no
Wed Nov 11 16:27:56 PST 2009


Hallo I'm struggling a little with this distance thing and hope for some help. 1) I have been looking a little on deserializing early and working with LWGEOM as you suggested Paul. It would be wonderful, and it would probably be much cleaner code and easier to work with the logic. But I have two questions about it, is there a function for creating bbox after deserializing or what is the best way of doing that? next question is about the number of subgeometries. As it works now the number om subgeometries is presented in lwgeom_inspected, but I see no function inspecting a LWGEOM or something corresponding.  I just committed a change from void functions to integer as Paul suggested. I think a now have it "carrying" any false-value all the way up and out. My question is what to to do with the double-functions that are supposed to return the distance. If they got a false answer from the underlying functions I now just throw an error that something is wrong. Then the compiler va!
 rned that I didn't return values in all situations from a function that should return values so I put a "return 0.0;" after the error. I guess it doesn't matter since the error stops the process, but it cannot be the right way to do it. It looks dangorous that 0.0 is ready to take of as an answer at onceafter the error-message. If this gets more and more confusing it's because I should have gone sleeping some hours ago. I will continue looking at all this but I might not get time before next week. I also did some small tests to see how it competes to the old calculation on very small geometries.My test was 1 dataset of 300 000 two-vertex lines spread all over Norway. Then i ran a distance-calculation between all those lines and first another tvo-vertexline and then a three-vertexline and then a four and then ten vertexes. What I found was that the old brutal force was about 20% faster when measuring between 2-vertex lineswhen measuring between 2-vertex lines and a 3 vertex !
 line they were about the same speed. Then the new calculation was faster and 2 vertex-line to 10 vertexline was about 3 times faster with the new algoritm. The question is:Is the usage of 2-vertex lines aginst 2 vertex lines so common so we should try take those situations the old way? This have to be investigated better too I think.  Now it's time for sleep. /Nicklas    
2009-11-10 nicklas.aven at jordogskog.no wrote:

>
Great Paul>
thanks>
 >
>From my level of experience I think I could have expected worse :-)>
 >
I hopefylly get some time this evening to give it a try.>
 >
About the testing-ability:>
What I have done to test it aginst large datasets is to do something like:>

> select a.gid, b.gid from table1 a, table2 b
>  where abs(st_distance(a.the_geom, b.the_geom)-new_distance(a.the_geom, b.the_geom))>0.0000001;
>>
with st_distance defined pointing to the 1.4 released library and new_distance pointing to "my" library.>
 >
The pro with this is that we then test even my brute force part aginst 1.4 since that is also a little rewritten.>
 >
or should I try that binding?>
 >
If the above is returning any rows, st_shortestline is very convenient to see what the new function is doing wrong. >
 >
 >
/Nicklas>
 >
 
>
> 2009-11-10 Paul Ramsey wrote:
>
> Hi Niklas, I'm just reading through the code, and I'm going to note
> >things into this e-mail as I go along:
> >
> >- Try and keep private structures out of the liblwgeom.h file. So far
> >I see BBOXDIST, W_BBOX, LISTSTRUCT which are entirely confined to
> >measures.c. Put them into ./liblwgeom/measures.h for now (we'll have a
> >go at rationalizing public and private prototypes more attractively in
> >the PostGIS 2.0 round).
> >- Follow the code formatting guidelines in ./STYLE please (you can
> >retroactively apply them with astyle though)
> >- I see no test cases in cu_measures.c for your functions, we're
> >trying to get as much testing as possible in the library level unit
> >tests as we go along, so for new functions in particular we should aim
> >to get coverage
> >- This construction is a little odd:
> >
> > DISTPTS thedl;
> > DISTPTS *dl = &thedl;
> > lw_dist2d_comp( lw1,lw2,dl);
> >
> > Generally people just write it this way, without the extra variable,
> >it has the same effect:
> >
> > lw_dist2d_comp( lw1,lw2, &thedl);
> >
> > - You've declared move of your working functions with a void return
> >(since the return values are actually in the *DISTPTS). I'm not sure
> >if you have failure cases checked deeper in your functions yet, but
> >one way to propagate up error cases without things like null-pointer
> >checks is to return LW_TRUE or LW_FALSE from the functions so that
> >higher level error handling code can do the "right thing" with them.
> >- Unfortunately, you've followed the coding style of the original
> >functions, which is working more-or-less directly with serialized
> >*uchar. So, for convenience, you've defined things like W_BBOX to have
> >easier access to the bounds without having to re-pull them from the
> >structure each time. If you de-serilized to LWGEOM early on and worked
> >with LWGEOM, things might be less ugly.
> >- *Also*, in PostGIS 2.0 I am going to be getting rid of all these
> >*uchar valued calculation functions and replacing them with code that
> >works on the LWGEOM structures. So you can future-proof your code by
> >setting it up that way from the start...
> >- For your domain-specific structs, prefixing them so they are all
> >commonly named would be nice: DIST2D_RESULTS, DIST2D_LIST, etc.
> >- In DISTPTS, you have a member, 'thedir' which should really be named
> >'mode' and valued with a macro or enumeration (DIST2D_MAX, DIST2D_MIN)
> >rather than magic numbers.
> >- In DISTPTS, you have a member 'd' which should be 'dist' or 'distance'.
> >- I'm not enamoured of the down-the-rabbit-hole style in which
> >functions seem to delegate to function2 variants here and there for no
> >clear reason. Document what happens in each section of your function,
> >great new function points for code that is called repeatedly or in
> >different contexts.
> >- Mind you, the existing style in measures.c is really ugly to start
> >with, so c'est la vie.
> >* - I would really like to see two SQL bindings, an ST_Distance() that
> >calls through your proposed stack and an ST_Distance_Old() that calls
> >directly into the brute force functions. That way during the testing
> >phase we can compare the methods for accuracy on *lots* of different
> >data easily. We can remove those SQL bindings before release time.
> >- Things might fall apart if you are fed EMPTY geometries. I see some
> >places where npoints is not checked for > 0 on point arrays (or, that
> >pointarrays aren't checked for non-null on deserialization).
> >
> >The most important thing to me is (*) above: enabling the ability to
> >test correctness on a large range of data during the 1.5 pre-release
> >period.
> >
> >Hopefully Mark can take a quick look at the code this week, but since
> >it's going to be re-written in 2.0 to some extent anyways, the
> >stylistic stuff is less important to me than it might otherwise be.
> >
> >Paul
> >_______________________________________________
> >postgis-devel mailing list
> >postgis-devel at postgis.refractions.net
> >postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20091112/0d32579c/attachment.html>


More information about the postgis-devel mailing list