[postgis-users] Center of Points Collection

pcreso at pcreso.com pcreso at pcreso.com
Wed Sep 16 20:36:44 PDT 2009


Hi Chris,

Agreed, but my point is that a simple mean is not always going to be a meaningful value, nor is a centroid, to represent a set of values. And an RDBMS does not easily tell you how representative your of your dataset the mean/centroid is.

Cheers,

  Brent 


--- On Thu, 9/17/09, Chris Hermansen <chris.hermansen at timberline.ca> wrote:

> From: Chris Hermansen <chris.hermansen at timberline.ca>
> Subject: Re: [postgis-users] Center of Points Collection
> To: pcreso at pcreso.com, "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> Date: Thursday, September 17, 2009, 3:19 PM
> Hi Brent;
> 
> One thing to remember is that finding the center of mass
> isn't necessarily a statistical problem - unless the points
> are a sample of some population.
> 
> If a point set contains 10 points, that's what it contains,
> and its center of mass has a physical meaning.
> Chris Hermansen        chris.hermansen at timberline.ca
> tel+1.604.714.2878 · fax+1.604.733.0631 ·
> mob+1.778.840.4625
> Timberline Natural Resource Group · www.timberline.ca
> 401 · 958 West 8th Avenue  · Vancouver BC · Canada
> · V5Z 1E5
> 
> 
> -----Original Message-----
> From: pcreso at pcreso.com
> 
> Date: Wed, 16 Sep 2009 19:12:39 
> To: PostGIS Users Discussion<postgis-users at postgis.refractions.net>
> Subject: Re: [postgis-users] Center of Points Collection
> 
> 
> Hi guys,
> 
> One concern I have is with the sample sizes being suggested
> here as "proof/disproof of concept".
> 
> Pretty much any statistical sampling approach becomes more
> robust with increased sample size, so many cases with only a
> few points are likely to be anomalous.
> 
> The other aspect that is not being addressed is the
> question of testing for significance.
> 
> A mean (avg) of 10 with a SD of 10 suggests the mean is not
> that useful as a descriptive/representative statistic, a
> mean of 10 with an SD of 1 is probably reasonable. The avg()
> function in an RDBMS returns no measure of spread/fit, so is
> pretty easily misused on datasets where an avg() is not an
> appropriate statistic.
> 
> Any statistical value representing a dataset without some
> measure of variance or fit is quite probably misleading,
> which is why RDBMS's are not the best tool for such
> measures, despite their convenience.
> 
> For example, a polygon is defined as "the set of points
> contained by ..." but the centroid representing the polygon
> is often defined by the points delineating the perimeter,
> which is actually a linestring, not the polygon it contains.
> There are going to be many cases where this causes
> problems.
> 
> One, relatively simple approach (as these things go :-)
> & somewhat more robust method for calculating a centroid
> is to generate a grid of points within the bbox of the
> polygon, toss out those not within the polygon itself, &
> take the avg(X) & avg(Y) of the remaining points. At
> least this provides a centroid based on the polygon, rather
> than the perimeter, but can also generate a centroid outside
> the polygon for complex or concave shapes. 
> 
> If we really want a statistically robust solution, a
> database (spatially enabled or not) is generally unlikely to
> be the ideal tool.
> 
> Cheers,
> 
> Brent Wood
> 
> 
> --- On Thu, 9/17/09, Simon Greener <simon at spatialdbadvisor.com>
> wrote:
> 
> > From: Simon Greener <simon at spatialdbadvisor.com>
> > Subject: Re: [postgis-users] Center of Points
> Collection
> > To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> > Date: Thursday, September 17, 2009, 11:24 AM
> > Chris,
> > 
> > Never thought my method was theoretically correct:
> just had
> > a bit of fun working out how one would do it given
> > no database (except Oracle) has a MEDIAN aggregate cf
> AVG.
> > 
> > I looked at your URL and decided I didn't have the
> time to
> > look at implementing it.
> > 
> > regards
> > SImon
> > On Thu, 17 Sep 2009 02:08:47 +1000, Chris Hermansen
> <chris.hermansen at timberline.ca>
> > wrote:
> > 
> > > A median in X will either yield a point in the
> set or
> > the average of the
> > > two points bracketing the (theoretical)
> median. 
> > Same in Y.  In 2D the
> > > medians of both will often produce a point that
> is not
> > in either set,
> > > even in cases where there is not an
> interpolation
> > going on.  Also what
> > > does one do about duplicate coordinates?
> > >
> > > In the median there is clearly the concept of
> "half of
> > the points being
> > > on one side of the median, half on the other
> > side".  It's this
> > > "sidedness" in 2D that makes me fret.
> > >
> > > Here is an example for which I can't conceive of
> a
> > great solution:
> > >
> > > POINT(1 6),
> > > POINT(0 5),
> > > POINT(0 4),
> > > POINT(2 3),
> > > POINT(2 2),
> > > POINT(2 1),
> > > POINT(1 0)
> > >
> > > What is the median of that?  Simon, what your
> > propose (medians of x and
> > > y) should I think give POINT(1.5 3) if duplicates
> are
> > only counted once
> > > and POINT(1 3) if not.
> > >
> > > Simon Greener wrote:
> > >> What would happen if you calculated the
> median of
> > the x and y
> > >> ordinates instead of the average?
> > >>
> > >> WITH points As
> > >> (
> > >> SELECT st_collect(column1) as the_geom
> > >>   FROM (values ('POINT(0 0)'),
> > >>             
> >   ('POINT(0 1)'),
> > >>             
> >   ('POINT(0 2)'),
> > >>             
> >   ('POINT(1 0)')) as foo
> > >> )
> > >> SELECT astext(st_makepoint(avg(x),avg(y)))
> > >>   FROM ( SELECT st_x(geom) as x,
> > st_y(geom) as y
> > >>            FROM
> > (select (ST_Dump(the_geom)).geom as geom from points
> > >> as foo) as p
> > >>           ORDER
> > by st_x(geom) ASC, st_y(geom) ASC
> > >>           LIMIT
> > (select case when mod(ST_NumGeometries(the_geom),2) =
> > >> 0 then 1 else 2 end from points)
> > >>       
> >    OFFSET (select case when
> > mod(ST_NumGeometries(the_geom),2) =
> > >> 0 then ST_NumGeometries(the_geom)/2-1 else
> > >> (ST_NumGeometries(the_geom)-1)/2 end from
> points)
> > >>         ) T;
> > >> -- where p=1 and n=N/2-1 when the number of
> non
> > null rows N is even,
> > >> or p=2 and n=(N-1)/2 when N is odd.
> > >>
> > >> "POINT(0 1)"
> > >>
> > >> See
> > >> http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql

> > >>
> > >> S
> > >> On Wed, 16 Sep 2009 05:25:29 +1000, Chris
> > Hermansen
> > >> <chris.hermansen at timberline.ca>
> > wrote:
> > >>
> > >>> Hmm on further thought I'm not
> completely
> > clear on the definition of
> > >>> "median" in two dimensions, either.  I
> > found this interesting article on
> > >>> Google, including a way cool Java app
> for
> > calculating and graphing:
> > >>>
> > >>> http://www.tiac.net/~sw/2007/11/2D_median/index.html

> > >>>
> > >>> pcreso at pcreso.com
> > wrote:
> > >>>> Hi guys,
> > >>>>
> > >>>> A bit more difficult, & way out
> in
> > left field, but if you use PL/R
> > >>>> to create a median function for
> Postgres,
> > you could build your point
> > >>>> from the median(X) & (Y) values
> > instead of the average.
> > >>>>
> > >>>> Where this would actually lie
> obviously
> > depends on the distribution
> > >>>> of the points. The centroid is most
> > affected by (actually defined
> > >>>> by) outlying points, the avg somewhat
> less
> > & the median less still.
> > >>>>
> > >>>> Of course once you have PL/R to play
> with,
> > you have much more
> > >>>> flexibility to look at returning
> > statistics from datasets than just
> > >>>> the median.
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>>   Brent Wood
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --- On Tue, 9/15/09, Kevin Neufeld
> <kneufeld at refractions.net>
> > wrote:
> > >>>>
> > >>>>
> > >>>>> From: Kevin Neufeld <kneufeld at refractions.net>
> > >>>>> Subject: Re: [postgis-users]
> Center of
> > Points Collection
> > >>>>> To: "PostGIS Users Discussion"
> <postgis-users at postgis.refractions.net>
> > >>>>> Date: Tuesday, September 15,
> 2009,
> > 8:51 AM
> > >>>>> Paul Ramsey wrote:
> > >>>>>
> > >>>>>> Faster than creating a
> multipoint
> > is to recognize that
> > >>>>>>
> > >>>>> ST_Centroid()
> > >>>>>
> > >>>>>> is just going to return the
> center
> > of the bbox of the
> > >>>>>>
> > >>>>> collection
> > >>>>>
> > >>>>>> anyways...
> > >>>>>>
> > >>>>> Unfortunately, Paul, ST_Centroid
> > returns the center of
> > >>>>> mass, not the center of the
> bbox.
> > >>>>>
> > >>>>> SELECT
> > astext(st_centroid(st_collect(column1))),
> > >>>>> FROM (values ('POINT(0 0)'),
> > >>>>>
> > >>>>>    ('POINT(0 1)'),
> > >>>>>
> > >>>>>    ('POINT(0 2)'),
> > >>>>>
> > >>>>>    ('POINT(1 0)')) as foo;
> > >>>>>       astext
> > >>>>> ------------------
> > >>>>>  POINT(0.25 0.75)
> > >>>>> (1 row)
> > >>>>>
> > >>>>> Your second post, taking the avg
> of
> > the x,y does seem to be
> > >>>>> the nice approach, and produces
> the
> > same results as
> > >>>>> ST_Centroid - the center of
> mass.
> > >>>>>
> > >>>>> SELECT
> > astext(st_makepoint(avg(st_x(column1)),
> > >>>>> avg(st_y(column1))))
> > >>>>> FROM (values ('POINT(0 0)'),
> > >>>>>
> > >>>>>    ('POINT(0 1)'),
> > >>>>>
> > >>>>>    ('POINT(0 2)'),
> > >>>>>
> > >>>>>    ('POINT(1 0)')) as foo;
> > >>>>>       astext
> > >>>>> ------------------
> > >>>>>  POINT(0.25 0.75)
> > >>>>> (1 row)
> > >>>>>
> > >>>>> If Dustin is after the center of
> the
> > collection, then
> > >>>>> something along your first
> suggestion
> > might be more
> > >>>>> appropriate.
> > >>>>> (taking the center of the
> extents)
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Kevin
> > >>>>>
> >_______________________________________________
> > >>>>> postgis-users mailing list
> > >>>>> postgis-users at postgis.refractions.net
> > >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users

> > >>>>>
> > >>>>>
> > >>>>
> >_______________________________________________
> > >>>> postgis-users mailing list
> > >>>> postgis-users at postgis.refractions.net
> > >>>> http://postgis.refractions.net/mailman/listinfo/postgis-users

> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > 
> > 
> > -- 
> > SpatialDB Advice and Design, Solutions Architecture
> and
> > Programming,
> > Oracle Database 10g Administrator Certified
> Associate;
> > Oracle Database 10g SQL Certified Professional
> > Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE,
> > Manifold GIS, FME, Radius Topology and Studio
> Specialist.
> > 39 Cliff View Drive, Allens Rivulet, 7150, Tasmania,
> > Australia.
> > Website: www.spatialdbadvisor.com
> >    Email: simon at spatialdbadvisor.com
> >    Voice: +61 362 396397
> > Mobile: +61 418 396391
> > Skype: sggreener
> > Longitude: 147.20515 (147° 12' 18" E)
> > Latitude: -43.01530 (43° 00' 55" S)
> > NAC:W80CK 7SWP3
> >_______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users

> > 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

> 



More information about the postgis-users mailing list