[postgis-users] 'Clustering' records in space and time

Thu Jul 15 13:01:39 PDT 2010

On Thursday 15 July 2010, Pierre Racine wrote:
> What should happen when event A is at a distance n minus epsilon from B, B
> is at a distance n-epsilon from C but A is at a distance 2*n-epsilon from
> C? Should A and C be in the same cluster with B?
>
> Pierre

Interesting. The choice of clustering algorithm would need to be based on the 
questions the OP was trying to answer. Without much thought (warning!) I 
pictured a 3D space (x, y, time) partitioned around medoids (PAM algorithm) 
of data. 

In this very simple case chunks of data in (x, y, time) space would be 
collected based on their proximity. For this to work, space and time 
coordinates would need to be standardized accordingly... For x and y, I think 
that subtracting the mean and dividing by the standard deviation should do. I 
am not sure about the standardization of time... maybe the same thing, but 
applied to the number of seconds | minutes | hours | days elapsed since the 
start of the experiment?

Dylan

> >-----Original Message-----
> >From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
> >bounces at postgis.refractions.net] On Behalf Of Dylan Beaudette
> >Sent: 15 juillet 2010 15:10
> >To: will at thearete.co.uk; PostGIS Users Discussion
> >Subject: Re: [postgis-users] 'Clustering' records in space and time
> >
> >Hi,
> >
> >Can you give us some hints about your data?
> >
> >1. how many records
> >2. temporal domain (i.e. 1 year?)
> >3. spatial domain (local, regional, continental?)
> >
> >If you don't have too much data, you may be able to standardize them, and
> >apply an algorithm like PAM, or CLARA (see cluster package in R).
> >
> >Cheers,
> >Dylan
> >
> >On Thursday 15 July 2010, William Furnass wrote:
> >> I have a PostGIS table of records describing events therefore the
> >> table has a timestamp attribute.  I wish to replace 'clusters' of
> >> events that occur within a m-hour window and a spatial radius of n
> >> with single events which have the mean timestamp and central position
> >> of the cluster.  I understand that I can quantize my data spatially
> >> using the St_SnapToGrid function but using this function alone I lose
> >> some of the distinct events that occurred at the same point in space
> >> but at very different times (it's my understanding that St_SnapToGrid
> >> only allows one point to be stored at each node in the grid).  Also, I
> >> am unsure as to how I could use St_SnapToGrid in such a way so as not
> >> to relocate points that are unique within the aforementioned spatial
> >> and temporal window boundaries.
> >>
> >> Has anyone any suggestions as to how this can be achieved
> >> programmatically using SQL (rather than a graphical tool)?  Should I
> >> perhaps be looking to use R to spatially and temporally cluster my
> >> data?  Apologies if the description of my problem isn't particularly
> >> clear; it's been a long day:)
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> >--
> >Dylan Beaudette
> >Soil Resource Laboratory
> >http://casoilresource.lawr.ucdavis.edu/
> >University of California at Davis
> >530.754.7341
> >_______________________________________________
> >postgis-users mailing list
> >postgis-users at postgis.refractions.net
> >http://postgis.refractions.net/mailman/listinfo/postgis-users

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341