[postgis-users] spatial distribution maps (heat maps?)
Mr. Puneet Kishor
punk.kish at gmail.com
Tue Dec 6 19:09:59 PST 2011
Hi all,
Thanks to your ideas, I was able to implement spatial clustering. I queried the data via Perl and DBD::Pg (Perl is my tool of choice for most things), and used Statistics::R to bridge to R. kmeans is built into R's standard stats package. About ten lines of code, and it was all done.
use Statistics::R;

my $R = Statistics::R->new();
my $clusters = 100;
# @lon and @lat hold the coordinates fetched from Pg
$R->set( 'n.clusters', $clusters );
$R->set( 'lon', \@lon );
$R->set( 'lat', \@lat );
$R->run(q`sample.data <- cbind(lon, lat)`);
$R->run(q`cl <- kmeans(sample.data, n.clusters)`);

# Cluster centers and the number of points in each cluster
my $lon = $R->get( 'as.numeric(cl$centers[,1])' );
my $lat = $R->get( 'as.numeric(cl$centers[,2])' );
my $size = $R->get( 'cl$size' );
It takes about 8 seconds for 120K records, but I am sure I can make it a bit faster. That said, it doesn't really matter: once I calculate the clustering, I store it on disk using Storable, and subsequent calls take a couple of hundred milliseconds.
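For anyone who wants to try the compute-once, cache-with-Storable pattern, here is a minimal sketch. It is not my actual code: the cached_clusters helper, the clusters.sto file name, and the sample values are made up for illustration, and the anonymous sub stands in for the Statistics::R calls.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: return the cached summary if the file exists,
# otherwise run the (slow) computation and store the result on disk.
sub cached_clusters {
    my ($cache_file, $compute) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $summary = $compute->();
    store($summary, $cache_file);
    return $summary;
}

# Stand-in for the R call: invented values shaped like the
# cl$centers columns and cl$size returned by kmeans.
my $calls   = 0;
my $compute = sub {
    $calls++;
    my ($lon, $lat, $size) = ([-89.4, 12.5], [43.1, 41.9], [70, 30]);
    # One record per cluster: center coordinates plus point count
    return [
        map { +{ lon => $lon->[$_], lat => $lat->[$_], size => $size->[$_] } }
            0 .. $#{$lon}
    ];
};

my $cache_file = tempdir(CLEANUP => 1) . '/clusters.sto';
my $first  = cached_clusters($cache_file, $compute);   # computes and writes the cache
my $second = cached_clusters($cache_file, $compute);   # reads the cache from disk

printf "%d clusters, compute ran %d time(s)\n", scalar @$second, $calls;
```

Note that this sketch never invalidates the cache, so the file has to be deleted (or keyed on something like the cluster count) whenever the underlying data changes.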
On Dec 6, 2011, at 1:58 AM, Phil James wrote:
> I would use something that is designed to do this analysis. R would be an obvious choice, but there are others. Using Python, you can connect to PostGIS to grab the data and then use rpy to run R commands from Python. We have used this configuration successfully with Django, and it is fast enough to generate an image for the web. This could of course be optimised to cache outputs if performance is an issue.
>
>>> ..
>>
>>
>> Reading the above made me realize that I should have rephrased my question -- I don't want to create images on the server side. I realize now that what I really want to do is to do spatial clustering on the server side and then send the data to the browser. I wrote my own very naive clustering routine in Perl, and also tried Algorithm::KMeans [1]. This kind of analysis allows me to create a summary of my data that I can then plot on the client (see image at [2]).
>>
>> Of course, my algorithm is way too naive, and waaaaay too slow, although I *can* compute the summary and cache it using Storable.
>>
>> So, here is my rephrased question -- I am looking to do spatial clustering on my Pg data. The added complication is that I do not have access to WKTRaster.
>>
>>
>> [1] http://search.cpan.org/~avikak/Algorithm-KMeans-1.30/lib/Algorithm/KMeans.pm
>> [2] http://dl.dropbox.com/u/3526821/occurrences.png
>>