[postgis-users] 'Clustering' records in space and time

William Furnass will at thearete.co.uk
Fri Jul 16 09:59:47 PDT 2010


Thanks Pierre and Dylan for your helpful replies.  FYI my dataset is
90K records describing events that occurred over 14 years over an area
of 50^2m.

The suggestion of using R's PAM and CLARA functions for clustering
lead me to the 'dbscan' algorithm which may well be a better choice
for my needs as one doesn't need to know in advance how many clusters
require identification.  "Clusters require a minimum no of points
(MinPts) within a maximum distance (eps) around one of its members
(the seed). Any point within eps around any point which satisfies the
seed condition is a cluster member (recursively). Some points may not
belong to any clusters."
(http://bm2.genes.nig.ac.jp/RGM2/R_current/library/fpc/man/dbscan.html).

Another approach I'm considering is to discretize 2D space and time
(three dimensions) into a cellular matrix, associate each event with a
cell and amalgamate all records that have the same cell reference.
This would of course fail to cluster 'close' events that happen to
fall either side of a cell divide but _might_ be easy to implement
using say PL/SQL.

For reference it appears that a clustering function for PostGIS has
already been proposed:

http://opengeo.org/products/coredevelopment/postgis/bi-utilities/

Thanks again for pointing me towards PAM/CLARA.

Cheers,

Will

On 15 July 2010 21:33, Pierre Racine <Pierre.Racine at sbf.ulaval.ca> wrote:
> I would suggest you ask your question to the r-sig-geo mailing list. You will get a R solution. You can then get your PostGIS table from R using the gdal/ogr package or use PL/R in PostgreSQL.
>
> Pierre
>
>>-----Original Message-----
>>From: Dylan Beaudette [mailto:debeaudette at ucdavis.edu]
>>Sent: 15 juillet 2010 16:02
>>To: Pierre Racine
>>Cc: PostGIS Users Discussion; will at thearete.co.uk
>>Subject: Re: [postgis-users] 'Clustering' records in space and time
>>
>>On Thursday 15 July 2010, Pierre Racine wrote:
>>> What should happen when event A is at a distance n minus epsilon from B, B
>>> is at a distance n-epsilon from C but A is at a distance 2*n-epsilon from
>>> C? Should A and C be in the same cluster with B?
>>>
>>> Pierre
>>
>>Interesting. The choice of clustering algorithm would need to be based on the
>>questions the OP was trying to answer. Without much thought (warning!) I
>>pictured a 3D space (x, y, time) partitioned around medoids (PAM algorithm)
>>of data.
>>
>>In this very simple case chunks of data in (x, y, time) space would be
>>collected based on their proximity. For this to work, space and time
>>coordinates would need to be standardized accordingly... For x and y, I think
>>that subtracting the mean and dividing by the standard deviation should do. I
>>am not sure about the standardization of time... maybe the same thing, but
>>applied to the number of seconds | minutes | hours | days elapsed since the
>>start of the experiment?
>>
>>Dylan
>>
>>
>>> >-----Original Message-----
>>> >From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
>>> >bounces at postgis.refractions.net] On Behalf Of Dylan Beaudette
>>> >Sent: 15 juillet 2010 15:10
>>> >To: will at thearete.co.uk; PostGIS Users Discussion
>>> >Subject: Re: [postgis-users] 'Clustering' records in space and time
>>> >
>>> >Hi,
>>> >
>>> >Can you give us some hints about your data?
>>> >
>>> >1. how many records
>>> >2. temporal domain (i.e. 1 year?)
>>> >3. spatial domain (local, regional, continental?)
>>> >
>>> >If you don't have too much data, you may be able to standardize them, and
>>> >apply an algorithm like PAM, or CLARA (see cluster package in R).
>>> >
>>> >Cheers,
>>> >Dylan
>>> >
>>> >On Thursday 15 July 2010, William Furnass wrote:
>>> >> I have a PostGIS table of records describing events therefore the
>>> >> table has a timestamp attribute.  I wish to replace 'clusters' of
>>> >> events that occur within a m-hour window and a spatial radius of n
>>> >> with single events which have the mean timestamp and central position
>>> >> of the cluster.  I understand that I can quantize my data spatially
>>> >> using the St_SnapToGrid function but using this function alone I lose
>>> >> some of the distinct events that occurred at the same point in space
>>> >> but at very different times (it's my understanding that St_SnapToGrid
>>> >> only allows one point to be stored at each node in the grid).  Also, I
>>> >> am unsure as to how I could use St_SnapToGrid in such a way so as not
>>> >> to relocate points that are unique within the aforementioned spatial
>>> >> and temporal window boundaries.
>>> >>
>>> >> Has anyone any suggestions as to how this can be achieved
>>> >> programmatically using SQL (rather than a graphical tool)?  Should I
>>> >> perhaps be looking to use R to spatially and temporally cluster my
>>> >> data?  Apologies if the description of my problem isn't particularly
>>> >> clear; it's been a long day:)
>>> >> _______________________________________________
>>> >> postgis-users mailing list
>>> >> postgis-users at postgis.refractions.net
>>> >> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>> >
>>> >--
>>> >Dylan Beaudette
>>> >Soil Resource Laboratory
>>> >http://casoilresource.lawr.ucdavis.edu/
>>> >University of California at Davis
>>> >530.754.7341
>>> >_______________________________________________
>>> >postgis-users mailing list
>>> >postgis-users at postgis.refractions.net
>>> >http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>>--
>>Dylan Beaudette
>>Soil Resource Laboratory
>>http://casoilresource.lawr.ucdavis.edu/
>>University of California at Davis
>>530.754.7341
>



More information about the postgis-users mailing list