[postgis-users] ST_ClusterDBSCAN: is it deterministic?

Daniel Baston dbaston at gmail.com
Sun Jan 24 10:16:02 PST 2021


Hi Giuseppe,

You can order the inputs by anything you like; OVER(ORDER BY feature_id)
would work just as well. If you have an example that is not deterministic
despite ordered inputs, I'd be curious to see it if you can share.

Thanks,
Dan
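The order dependence discussed in this thread can be illustrated with a textbook DBSCAN in Python. This is a minimal sketch, not the GEOS or PostGIS code. It assumes points are visited in input order and that a border point (within `eps` of a core point but not a core point itself) joins whichever cluster reaches it first. The 1-D data and the `eps` and `min_pts` values are hypothetical. A border point near two clusters lands in different clusters for different input orders, and both results satisfy DBSCAN. Sorting by a unique key first makes the output repeatable. Here the coordinate itself stands in for a key such as `feature_id`.

```python
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 1-D points, visiting them in list order.

    Returns one label per point: a cluster id (0, 1, ...) or NOISE.
    min_pts counts the point itself.
    """
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: claimed, but not expanded
                continue
            if labels[j] is not None:
                continue  # already in this cluster or an earlier one
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


def cluster_mates(points, labels, target):
    """Sorted list of points sharing target's cluster (cluster ids are arbitrary)."""
    lab = labels[points.index(target)]
    return sorted(p for p, l in zip(points, labels) if l == lab)


# Hypothetical data: two dense groups plus a border point (5) within eps of both.
EPS, MIN_PTS = 3, 4
forward = [0, 1, 2, 5, 8, 9, 10]
backward = list(reversed(forward))

print(cluster_mates(forward, dbscan(forward, EPS, MIN_PTS), 5))    # [0, 1, 2, 5]
print(cluster_mates(backward, dbscan(backward, EPS, MIN_PTS), 5))  # [5, 8, 9, 10]

# Sorting by a unique key first makes the result independent of arrival order.
for arrival in (forward, backward, [9, 5, 0, 10, 2, 8, 1]):
    ordered = sorted(arrival)
    print(cluster_mates(ordered, dbscan(ordered, EPS, MIN_PTS), 5))  # [0, 1, 2, 5]
```

Only the border point's assignment changes; the core points of each group always cluster together. If the sort key has ties (for example duplicate geometries), the key alone does not fix the relative order of the tied rows, so in this sketch a unique tie-breaker would be needed for full reproducibility.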

On Sun, Jan 24, 2021 at 12:33 PM Giuseppe Broccolo <g.broccolo.7 at gmail.com>
wrote:

> Hi Daniel,
>
> On Fri, Jan 22, 2021 at 6:07 PM Daniel Baston <dbaston at gmail.com>
> wrote:
>
>> It should be deterministic for most real data if the inputs are ordered
>> consistently, using the OVER() clause as you suggest. It's possible that
>> there may be a contrived situation involving duplicates in the input where
>> a result would be different (as GEOS STRtree is using std::sort instead of
>> std::stable_sort), but I'm not sure. Also, there are sometimes multiple
>> possible clusterings that satisfy the DBSCAN algorithm, so it is expected
>> that the results may differ between implementations, or between different
>> orderings of the same input.
>>
>
> Thank you for the answer. I think I'll try defining the window with an
> ORDER BY geom clause to check whether I get more deterministic results.
> If I understood correctly, the ORDER BY should add a further step that
> pre-sorts the geometries along a Hilbert curve. Of course, this would
> increase the overall duration of the query.
>
> Giuseppe.
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
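Giuseppe's description of pre-sorting geometries along a Hilbert curve can be sketched in Python as follows. This is the generic Hilbert-index algorithm applied to points quantized onto a 2**order by 2**order grid. It is not PostGIS's sort implementation, and the grid resolution, the bounding box, and the `(feature_id, x, y)` tuple layout are assumptions for illustration. Ties, such as duplicate points falling in the same cell, are broken by the hypothetical `feature_id` so the resulting order is total.

```python
def hilbert_index(order, x, y):
    """Position of integer cell (x, y) along a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(features, bbox, order=16):
    """Sort hypothetical (feature_id, x, y) tuples by Hilbert index, then by feature_id."""
    xmin, ymin, xmax, ymax = bbox  # assumed non-degenerate: xmax > xmin, ymax > ymin
    cells = (1 << order) - 1

    def key(feature):
        fid, x, y = feature
        # Quantize the coordinates onto the integer grid covering bbox.
        gx = round((x - xmin) / (xmax - xmin) * cells)
        gy = round((y - ymin) / (ymax - ymin) * cells)
        return (hilbert_index(order, gx, gy), fid)  # fid breaks ties between duplicates

    return sorted(features, key=key)


# Hypothetical features; 11 and 14 are duplicates at (0, 0).
features = [(10, 3.0, 0.0), (11, 0.0, 0.0), (12, 1.0, 1.0), (13, 0.0, 3.0), (14, 0.0, 0.0)]
print([fid for fid, _, _ in hilbert_sort(features, (0.0, 0.0, 3.0, 3.0), order=2)])
# [11, 14, 12, 13, 10]
```

In this sketch each key costs O(order) operations and the sort itself O(n log n), which corresponds to the extra query time Giuseppe expects from the additional ordering step.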