[postgis-users] Center of Points Collection

pcreso at pcreso.com pcreso at pcreso.com
Wed Sep 16 19:12:39 PDT 2009


Hi guys,

One concern I have is with the sample sizes being suggested here as "proof/disproof of concept".

Pretty much any statistical sampling approach becomes more robust with increased sample size, so many cases with only a few points are likely to be anomalous.

The other aspect that is not being addressed is the question of testing for significance.

A mean (avg) of 10 with a SD of 10 suggests the mean is not that useful as a descriptive/representative statistic, a mean of 10 with an SD of 1 is probably reasonable. The avg() function in an RDBMS returns no measure of spread/fit, so is pretty easily misused on datasets where an avg() is not an appropriate statistic.

Any statistical value representing a dataset without some measure of variance or fit is quite probably misleading, which is why RDBMS's are not the best tool for such measures, despite their convenience.

For example, a polygon is defined as "the set of points contained by ..." but the centroid representing the polygon is often defined by the points delineating the perimeter, which is actually a linestring, not the polygon it contains. There are going to be many cases where this causes problems.

One, relatively simple approach (as these things go :-) & somewhat more robust method for calculating a centroid is to generate a grid of points within the bbox of the polygon, toss out those not within the polygon itself, & take the avg(X) & avg(Y) of the remaining points. At least this provides a centroid based on the polygon, rather than the perimeter, but can also generate a centroid outside the polygon for complex or concave shapes. 

If we really want a statistically robust solution, a database (spatially enabled or not) is generally unlikely to be the ideal tool.

Cheers,

Brent Wood


--- On Thu, 9/17/09, Simon Greener <simon at spatialdbadvisor.com> wrote:

> From: Simon Greener <simon at spatialdbadvisor.com>
> Subject: Re: [postgis-users] Center of Points Collection
> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> Date: Thursday, September 17, 2009, 11:24 AM
> Chris,
> 
> Never thought my method was theoretically correct: just had
> a bit of fun working out how one would do it given
> no database (except Oracle) has a MEDIAN aggregate cf AVG.
> 
> I looked at your URL and decided I didn't have the time to
> look at implementing it.
> 
> regards
> SImon
> On Thu, 17 Sep 2009 02:08:47 +1000, Chris Hermansen <chris.hermansen at timberline.ca>
> wrote:
> 
> > A median in X will either yield a point in the set or
> the average of the
> > two points bracketing the (theoretical) median. 
> Same in Y.  In 2D the
> > medians of both will often produce a point that is not
> in either set,
> > even in cases where there is not an interpolation
> going on.  Also what
> > does one do about duplicate coordinates?
> >
> > In the median there is clearly the concept of "half of
> the points being
> > on one side of the median, half on the other
> side".  It's this
> > "sidedness" in 2D that makes me fret.
> >
> > Here is an example for which I can't conceive of a
> great solution:
> >
> > POINT(1 6),
> > POINT(0 5),
> > POINT(0 4),
> > POINT(2 3),
> > POINT(2 2),
> > POINT(2 1),
> > POINT(1 0)
> >
> > What is the median of that?  Simon, what your
> propose (medians of x and
> > y) should I think give POINT(1.5 3) if duplicates are
> only counted once
> > and POINT(1 3) if not.
> >
> > Simon Greener wrote:
> >> What would happen if you calculated the median of
> the x and y
> >> ordinates instead of the average?
> >>
> >> WITH points As
> >> (
> >> SELECT st_collect(column1) as the_geom
> >>   FROM (values ('POINT(0 0)'),
> >>             
>   ('POINT(0 1)'),
> >>             
>   ('POINT(0 2)'),
> >>             
>   ('POINT(1 0)')) as foo
> >> )
> >> SELECT astext(st_makepoint(avg(x),avg(y)))
> >>   FROM ( SELECT st_x(geom) as x,
> st_y(geom) as y
> >>            FROM
> (select (ST_Dump(the_geom)).geom as geom from points
> >> as foo) as p
> >>           ORDER
> by st_x(geom) ASC, st_y(geom) ASC
> >>           LIMIT
> (select case when mod(ST_NumGeometries(the_geom),2) =
> >> 0 then 1 else 2 end from points)
> >>       
>    OFFSET (select case when
> mod(ST_NumGeometries(the_geom),2) =
> >> 0 then ST_NumGeometries(the_geom)/2-1 else
> >> (ST_NumGeometries(the_geom)-1)/2 end from points)
> >>         ) T;
> >> -- where p=1 and n=N/2-1 when the number of non
> null rows N is even,
> >> or p=2 and n=(N-1)/2 when N is odd.
> >>
> >> "POINT(0 1)"
> >>
> >> See
> >> http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql
> >>
> >> S
> >> On Wed, 16 Sep 2009 05:25:29 +1000, Chris
> Hermansen
> >> <chris.hermansen at timberline.ca>
> wrote:
> >>
> >>> Hmm on further thought I'm not completely
> clear on the definition of
> >>> "median" in two dimensions, either.  I
> found this interesting article on
> >>> Google, including a way cool Java app for
> calculating and graphing:
> >>>
> >>> http://www.tiac.net/~sw/2007/11/2D_median/index.html
> >>>
> >>> pcreso at pcreso.com
> wrote:
> >>>> Hi guys,
> >>>>
> >>>> A bit more difficult, & way out in
> left field, but if you use PL/R
> >>>> to create a median function for Postgres,
> you could build your point
> >>>> from the median(X) & (Y) values
> instead of the average.
> >>>>
> >>>> Where this would actually lie obviously
> depends on the distribution
> >>>> of the points. The centroid is most
> affected by (actually defined
> >>>> by) outlying points, the avg somewhat less
> & the median less still.
> >>>>
> >>>> Of course once you have PL/R to play with,
> you have much more
> >>>> flexibility to look at returning
> statistics from datasets than just
> >>>> the median.
> >>>>
> >>>> Cheers,
> >>>>
> >>>>   Brent Wood
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --- On Tue, 9/15/09, Kevin Neufeld <kneufeld at refractions.net>
> wrote:
> >>>>
> >>>>
> >>>>> From: Kevin Neufeld <kneufeld at refractions.net>
> >>>>> Subject: Re: [postgis-users] Center of
> Points Collection
> >>>>> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> >>>>> Date: Tuesday, September 15, 2009,
> 8:51 AM
> >>>>> Paul Ramsey wrote:
> >>>>>
> >>>>>> Faster than creating a multipoint
> is to recognize that
> >>>>>>
> >>>>> ST_Centroid()
> >>>>>
> >>>>>> is just going to return the center
> of the bbox of the
> >>>>>>
> >>>>> collection
> >>>>>
> >>>>>> anyways...
> >>>>>>
> >>>>> Unfortunately, Paul, ST_Centroid
> returns the center of
> >>>>> mass, not the center of the bbox.
> >>>>>
> >>>>> SELECT
> astext(st_centroid(st_collect(column1))),
> >>>>> FROM (values ('POINT(0 0)'),
> >>>>>
> >>>>>    ('POINT(0 1)'),
> >>>>>
> >>>>>    ('POINT(0 2)'),
> >>>>>
> >>>>>    ('POINT(1 0)')) as foo;
> >>>>>       astext
> >>>>> ------------------
> >>>>>  POINT(0.25 0.75)
> >>>>> (1 row)
> >>>>>
> >>>>> Your second post, taking the avg of
> the x,y does seem to be
> >>>>> the nice approach, and produces the
> same results as
> >>>>> ST_Centroid - the center of mass.
> >>>>>
> >>>>> SELECT
> astext(st_makepoint(avg(st_x(column1)),
> >>>>> avg(st_y(column1))))
> >>>>> FROM (values ('POINT(0 0)'),
> >>>>>
> >>>>>    ('POINT(0 1)'),
> >>>>>
> >>>>>    ('POINT(0 2)'),
> >>>>>
> >>>>>    ('POINT(1 0)')) as foo;
> >>>>>       astext
> >>>>> ------------------
> >>>>>  POINT(0.25 0.75)
> >>>>> (1 row)
> >>>>>
> >>>>> If Dustin is after the center of the
> collection, then
> >>>>> something along your first suggestion
> might be more
> >>>>> appropriate.
> >>>>> (taking the center of the extents)
> >>>>>
> >>>>> Cheers,
> >>>>> Kevin
> >>>>>
> _______________________________________________
> >>>>> postgis-users mailing list
> >>>>> postgis-users at postgis.refractions.net
> >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >>>>>
> >>>>>
> >>>>
> _______________________________________________
> >>>> postgis-users mailing list
> >>>> postgis-users at postgis.refractions.net
> >>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
> 
> 
> -- 
> SpatialDB Advice and Design, Solutions Architecture and
> Programming,
> Oracle Database 10g Administrator Certified Associate;
> Oracle Database 10g SQL Certified Professional
> Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE,
> Manifold GIS, FME, Radius Topology and Studio Specialist.
> 39 Cliff View Drive, Allens Rivulet, 7150, Tasmania,
> Australia.
> Website: www.spatialdbadvisor.com
>    Email: simon at spatialdbadvisor.com
>    Voice: +61 362 396397
> Mobile: +61 418 396391
> Skype: sggreener
> Longitude: 147.20515 (147° 12' 18" E)
> Latitude: -43.01530 (43° 00' 55" S)
> NAC:W80CK 7SWP3
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 



More information about the postgis-users mailing list