[GRASS-dev] Spatial clustering of vector objects?

Thu May 4 19:28:28 PDT 2017

On Thu, May 4, 2017 at 2:18 PM, Benjamin Ducke <benducke at fastmail.fm> wrote:
> On 04/05/17 19:22, Markus Neteler wrote:
>> Hi,
>>
>> in order to parallelize some heavy computation I was wondering how to
>> do spatial clustering of vector objects, i.e. building footprints
>> (vector polygons).
>>
>> I have to perform zonal statistics on thousands of buildings and would
>> like to split them up into "tiles" and then run the computation in
>> parallel for each tile.
>>
>> The examples in the v.cluster manual look somewhat promising
>> https://grass.osgeo.org/grass72/manuals/v.cluster.html
>>
>> but ideally each "tile" would contain a similar number of
>> buildings, to balance the computation across the CPUs.
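For tiles holding a similar number of buildings, one option not mentioned in this thread is recursive coordinate bisection: repeatedly split the set at the median along its wider axis. The Python sketch below assumes each building has already been reduced to one representative point (x, y). `bisect_tiles` is a hypothetical helper, not a GRASS module.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def bisect_tiles(points: List[Point], n_tiles: int) -> List[List[Point]]:
    """Split points into n_tiles groups of near-equal size by recursive
    median splits along the wider axis of each group's bounding box."""
    if n_tiles <= 1 or len(points) <= 1:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis with the larger extent to keep tiles compact.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    left_tiles = n_tiles // 2
    right_tiles = n_tiles - left_tiles
    # Cut proportionally so an odd tile count still gets equal shares.
    cut = round(len(ordered) * left_tiles / n_tiles)
    return (bisect_tiles(ordered[:cut], left_tiles)
            + bisect_tiles(ordered[cut:], right_tiles))
```

Tile sizes differ by at most one point; for example, 100 points split into 3 tiles gives 33, 34 and 33. These tiles do not overlap, so any neighbourhood-based step still needs the buffering Ben describes.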
>
> Hi,
>
> I think that you would need to partition
> space into overlapping tiles, with the
> amount of overlap depending on the maximum
> distance parameter of the clustering algorithm.
> Otherwise you would get a serious edge effect
> in each tile.
>
> Prior to spatial clustering, you could use a clustering
> algorithm that aims to produce clusters with a
> (nearly) equal number of points for "tiling":
>
> https://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points
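The linked Stack Exchange thread discusses equal-size variants of k-means. The sketch below is one simple heuristic of that kind, chosen here for illustration rather than taken from the thread: it alternates a capacity-limited assignment (nearest point/centre pairs first, no cluster above ceil(n/k) points) with the usual centroid update. `equal_size_clusters` is a hypothetical name; sorting all n*k distances per iteration is fine for thousands of buildings and a few dozen clusters, but would not scale much further.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def equal_size_clusters(points: List[Point], k: int, iters: int = 10) -> List[int]:
    """Return one cluster label per point; no cluster exceeds ceil(n/k) points."""
    n = len(points)
    cap = math.ceil(n / k)
    # Deterministic initial centres: k points spread through the input order.
    centres = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # All (distance, point index, centre index) triples, nearest first.
        pairs = sorted(
            (math.dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centres)
        )
        counts = [0] * k
        assigned = [False] * n
        for _, i, j in pairs:
            if not assigned[i] and counts[j] < cap:
                labels[i] = j
                assigned[i] = True
                counts[j] += 1
        # Move each centre to the mean of its current members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centres[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels
```

Every point is always assigned: the total capacity k*ceil(n/k) is at least n, so a point can only be skipped by a full centre while some other centre still has room.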
>
> You would then select the points for each
> cluster, buffer their convex hull by the max
> distance of your spatial cluster algorithm
> and set the working region for each "tile" to
> be the bounding box of the buffered convex
> hull (don't forget to catch all points from
> all other clusters that fall within the "tile"
> and add them to the working region's set).
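The tile extent described above can be computed without building the hull. A convex hull has the same bounding box as its points, and buffering by the maximum distance widens that box by exactly that distance on every side. The sketch below uses hypothetical helper names and assumes points as (x, y) tuples with one cluster label per point. It computes each tile's window, then collects every point from any cluster that falls inside it, which is the overlap zone Ben mentions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)

def tile_window(cluster_points: List[Point], max_dist: float) -> BBox:
    """Bounding box of the cluster's convex hull buffered by max_dist.
    Equal to the points' bounding box grown by max_dist on each side."""
    xs = [x for x, _ in cluster_points]
    ys = [y for _, y in cluster_points]
    return (min(xs) - max_dist, min(ys) - max_dist,
            max(xs) + max_dist, max(ys) + max_dist)

def points_in_window(all_points: List[Point], window: BBox) -> List[Point]:
    w, s, e, n = window
    return [p for p in all_points if w <= p[0] <= e and s <= p[1] <= n]

def build_tiles(points: List[Point], labels: Sequence[int],
                max_dist: float) -> Dict[int, Tuple[BBox, List[Point]]]:
    """Per cluster: its window plus every point (from any cluster) inside it."""
    tiles = {}
    for label in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == label]
        window = tile_window(members, max_dist)
        tiles[label] = (window, points_in_window(points, window))
    return tiles
```

Each window's west, south, east and north values map directly onto the w, s, e and n parameters of g.region.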
>
> If that works, please make it a GRASS add-on...
>
> Regarding building footprints, I guess another
> tricky part is how to represent them as
> points: Centroids? Outer edge vertices? Both?
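If centroids are used to represent footprints, the area-weighted centroid from the shoelace formula is one standard choice. The sketch below assumes each footprint is a simple polygon given as a list of (x, y) vertices. This centroid differs from the plain average of the vertices, and for strongly concave footprints (L- or U-shaped buildings) it can fall outside the polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon via the shoelace formula.
    The ring may be open or closed (first vertex repeated at the end);
    vertex order (clockwise or not) does not matter. Zero-area rings
    raise ZeroDivisionError."""
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    a = cx = cy = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area
    return (cx / (6.0 * a), cy / (6.0 * a))
```

For an L-shaped footprint with vertices (0,0), (2,0), (2,1), (1,1), (1,2), (0,2), it returns (5/6, 5/6), whereas the vertex average is (1, 1).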
>
> Oh, by the way: A fellow computer scientist
> who works a lot with concurrent processing
> once told me that the frequently used
>
> number of processes = number of CPUs/cores
>
> is actually not ideal! Apparently, modern
> CPU schedulers are optimized to handle many
> more processes than there are CPUs/cores,
> and if the two counts match, then you can
> get edge cases where processes keep
> getting transferred between cores, which
> incurs a huge performance penalty. His
> recommendation was to use a factor of
> about 2.5 (times more processes than cores).
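Since the 2.5x rule of thumb is untested here, it is best treated as a tunable parameter and timed against the usual one-process-per-core setting. Below is a minimal Python sketch using the standard library's ProcessPoolExecutor. `process_tile` is a hypothetical placeholder for the per-tile zonal statistics job.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def worker_count(factor: float = 2.5) -> int:
    """Worker processes = factor x available cores, at least 1."""
    cores = os.cpu_count() or 1
    return max(1, round(cores * factor))

def process_tile(tile_id: int) -> int:
    # Hypothetical placeholder for the per-tile zonal statistics job.
    return tile_id

def run_all(tile_ids, factor: float = 2.5):
    # factor=1.0 reproduces the usual one-process-per-core setup.
    with ProcessPoolExecutor(max_workers=worker_count(factor)) as pool:
        return list(pool.map(process_tile, tile_ids))
```

Running run_all(ids, 1.0) and run_all(ids, 2.5) on the same tiles and comparing wall-clock times would test the claim directly.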
>
> I never got around to testing his theory,
> but if you have the time, I'd love to know!
>
> Best,
>
> Ben
>
>>
>> Any idea?
>>
>> thanks,
>> Markus
>> _______________________________________________
>> grass-dev mailing list
>> grass-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/grass-dev
>>
>
>
>
> --
> Dr. Benjamin Ducke
> {*} Geospatial Consultant
> {*} GIS Developer
>
> Spatial technology for the masses, not the classes:
> experience free and open source GIS at http://gvsigce.org

Not sure if it's applicable here, but you could also try the
quadtree segmentation in v.surf.rst, which has an output parameter
treeseg. You need to post-process its output (v.category, v.type,
v.centroids) to get areas.

Anna
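Quadtree segmentation splits a cell into four quadrants until each cell holds at most a set number of points (v.surf.rst exposes this limit as its segmax parameter). The pure-Python sketch below illustrates the idea only and is not the v.surf.rst implementation. `quadtree_leaves` is a hypothetical helper. Each non-empty leaf is a candidate tile with a bounded point count, but unlike the equal-count approaches, leaf sizes can vary widely, and leaves do not overlap.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)

def quadtree_leaves(points: List[Point], box: BBox, max_points: int,
                    min_size: float = 1e-9) -> List[Tuple[BBox, List[Point]]]:
    """Recursively split box into quadrants until each holds <= max_points.
    min_size stops the recursion when many points share one location."""
    w, s, e, n = box
    if len(points) <= max_points or (e - w) <= min_size:
        return [(box, points)]
    mx, my = (w + e) / 2, (s + n) / 2
    # Quadrant order: SW, SE, NW, NE.
    quads = [(w, s, mx, my), (mx, s, e, my), (w, my, mx, n), (mx, my, e, n)]
    buckets: List[List[Point]] = [[] for _ in quads]
    for x, y in points:
        # Points on a split line go to the east/north quadrant.
        i = (1 if x >= mx else 0) + (2 if y >= my else 0)
        buckets[i].append((x, y))
    leaves = []
    for q, b in zip(quads, buckets):
        if b:  # skip empty quadrants
            leaves.extend(quadtree_leaves(b, q, max_points, min_size))
    return leaves
```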