[postgis-users] 'Clustering' records in space and time
Dan Putler
dan.putler at sauder.ubc.ca
Fri Jul 16 10:17:35 PDT 2010
Hi Will,
Yes, DBSCAN is a much better choice for what you want to do. However,
how to include the temporal element becomes something of an issue with
DBSCAN or PAM/CLARA. Then there are the issues of selecting an epsilon
radius and min pts for the DBSCAN algorithm. At this point, given the
nature your questions, you are likely to find the R-sig-Geo mailing list
a better choice than the PostGIS-User list. Here is the link to
subscribe to that list: https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Dan
On 07/16/2010 09:59 AM, William Furnass wrote:
> Thanks Pierre and Dylan for your helpful replies. FYI my dataset is
> 90K records describing events that occurred over 14 years over an area
> of 50^2m.
>
> The suggestion of using R's PAM and CLARA functions for clustering
> lead me to the 'dbscan' algorithm which may well be a better choice
> for my needs as one doesn't need to know in advance how many clusters
> require identification. "Clusters require a minimum no of points
> (MinPts) within a maximum distance (eps) around one of its members
> (the seed). Any point within eps around any point which satisfies the
> seed condition is a cluster member (recursively). Some points may not
> belong to any clusters."
> (http://bm2.genes.nig.ac.jp/RGM2/R_current/library/fpc/man/dbscan.html).
>
> Another approach I'm considering is to discretize 2D space and time
> (three dimensions) into a cellular matrix, associate each event with a
> cell and amalgamate all records that have the same cell reference.
> This would of course fail to cluster 'close' events that happen to
> fall either side of a cell divide but _might_ be easy to implement
> using say PL/SQL.
>
> For reference it appears that a clustering function for PostGIS has
> already been proposed:
>
> http://opengeo.org/products/coredevelopment/postgis/bi-utilities/
>
> Thanks again for pointing me towards PAM/CLARA.
>
> Cheers,
>
> Will
>
> On 15 July 2010 21:33, Pierre Racine<Pierre.Racine at sbf.ulaval.ca> wrote:
>
>> I would suggest you ask your question to the r-sig-geo mailing list. You will get a R solution. You can then get your PostGIS table from R using the gdal/ogr package or use PL/R in PostgreSQL.
>>
>> Pierre
>>
>>
>>> -----Original Message-----
>>> From: Dylan Beaudette [mailto:debeaudette at ucdavis.edu]
>>> Sent: 15 juillet 2010 16:02
>>> To: Pierre Racine
>>> Cc: PostGIS Users Discussion; will at thearete.co.uk
>>> Subject: Re: [postgis-users] 'Clustering' records in space and time
>>>
>>> On Thursday 15 July 2010, Pierre Racine wrote:
>>>
>>>> What should happen when event A is at a distance n minus epsilon from B, B
>>>> is at a distance n-epsilon from C but A is at a distance 2*n-epsilon from
>>>> C? Should A and C be in the same cluster with B?
>>>>
>>>> Pierre
>>>>
>>> Interesting. The choice of clustering algorithm would need to be based on the
>>> questions the OP was trying to answer. Without much thought (warning!) I
>>> pictured a 3D space (x, y, time) partitioned around medoids (PAM algorithm)
>>> of data.
>>>
>>> In this very simple case chunks of data in (x, y, time) space would be
>>> collected based on their proximity. For this to work, space and time
>>> coordinates would need to be standardized accordingly... For x and y, I think
>>> that subtracting the mean and dividing by the standard deviation should do. I
>>> am not sure about the standardization of time... maybe the same thing, but
>>> applied to the number of seconds | minutes | hours | days elapsed since the
>>> start of the experiment?
>>>
>>> Dylan
>>>
>>>
>>>
>>>>> -----Original Message-----
>>>>> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
>>>>> bounces at postgis.refractions.net] On Behalf Of Dylan Beaudette
>>>>> Sent: 15 juillet 2010 15:10
>>>>> To: will at thearete.co.uk; PostGIS Users Discussion
>>>>> Subject: Re: [postgis-users] 'Clustering' records in space and time
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you give us some hints about your data?
>>>>>
>>>>> 1. how many records
>>>>> 2. temporal domain (i.e. 1 year?)
>>>>> 3. spatial domain (local, regional, continental?)
>>>>>
>>>>> If you don't have too much data, you may be able to standardize them, and
>>>>> apply an algorithm like PAM, or CLARA (see cluster package in R).
>>>>>
>>>>> Cheers,
>>>>> Dylan
>>>>>
>>>>> On Thursday 15 July 2010, William Furnass wrote:
>>>>>
>>>>>> I have a PostGIS table of records describing events therefore the
>>>>>> table has a timestamp attribute. I wish to replace 'clusters' of
>>>>>> events that occur within a m-hour window and a spatial radius of n
>>>>>> with single events which have the mean timestamp and central position
>>>>>> of the cluster. I understand that I can quantize my data spatially
>>>>>> using the St_SnapToGrid function but using this function alone I lose
>>>>>> some of the distinct events that occurred at the same point in space
>>>>>> but at very different times (it's my understanding that St_SnapToGrid
>>>>>> only allows one point to be stored at each node in the grid). Also, I
>>>>>> am unsure as to how I could use St_SnapToGrid in such a way so as not
>>>>>> to relocate points that are unique within the aforementioned spatial
>>>>>> and temporal window boundaries.
>>>>>>
>>>>>> Has anyone any suggestions as to how this can be achieved
>>>>>> programmatically using SQL (rather than a graphical tool)? Should I
>>>>>> perhaps be looking to use R to spatially and temporally cluster my
>>>>>> data? Apologies if the description of my problem isn't particularly
>>>>>> clear; it's been a long day:)
>>>>>> _______________________________________________
>>>>>> postgis-users mailing list
>>>>>> postgis-users at postgis.refractions.net
>>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>>
>>>>> --
>>>>> Dylan Beaudette
>>>>> Soil Resource Laboratory
>>>>> http://casoilresource.lawr.ucdavis.edu/
>>>>> University of California at Davis
>>>>> 530.754.7341
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>
>>>
>>> --
>>> Dylan Beaudette
>>> Soil Resource Laboratory
>>> http://casoilresource.lawr.ucdavis.edu/
>>> University of California at Davis
>>> 530.754.7341
>>>
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
More information about the postgis-users
mailing list