[Qgis-user] Similarity index for pairs of features with the highest overlap?

Tue Nov 4 10:39:32 PST 2025

Laurent and list,

On Tue, Nov 4, 2025 at 4:17 AM celati Laurent via QGIS-User <
qgis-user at lists.osgeo.org> wrote:

> Dear all,
>
> A few weeks ago, I posted an initial message:
>
> https://lists.osgeo.org/pipermail/qgis-user/2025-October/055789.html
>
> But after further thought, I'm creating a new one to provide additional
> inputs/clarification.
> As a reminder, my goal is to find a method/tool to compare the
> similarity/dissimilarity between two polygonal vector layers consisting of
> multiple features. In my use case, this comparison concerns a polygonal
> layer resulting from photo-interpretation work (modeling of large
> physiognomic units/habitats). I reattached a screenshot; this layer appears
> in blue.
> The other polygonal vector layer is a polygonal vector layer resulting
> from a segmentation process (using an OTB tool available via QGIS). (It is
> yellow in the attached screenshot):
>
> [image: image.png]
>
> In my previous post, I mentioned a possible way for one similarity
> indicator: "for each photo-interpreted feature/polygon, calculate the
> percentage (%) of area covered by the feature from the segmentation layer
> that has the highest overlap."
> To be more specific, in the example in the screenshot, it could be:
> "for feature number 11 (blue) from the photo-interpretation layer, I would
> like to know the proportion of the area covered by the feature from the
> segmentation layer that has the highest overlap with this feature 11 (in
> this specific case, it is probably feature 42 in yellow)."
> This could be: area of feature 42 overlapping with feature 11 / total area
> of feature 11.
>
> But after further thought, I think this is insufficient and that I need to
> go a little further.
>
> Regarding these methods for quantifying the accuracy of a segmentation (its
> fidelity to the photo-interpretation) and the degree of similarity, I think
> I should integrate a "bi-similarity" logic, using an approach based *on
> pairs* of the most overlapping polygons:
>
> *step 1*
> - For each polygon feature in the input layer, retrieve the ID and
> geometry of the polygon from the segmentation layer that has the highest
> overlap. Calculate the overlap rate.
> (e.g., surface area of feature 42 overlapping with feature 11 / *total
> surface area of feature 11*).
>
> *Step 2:*
> - Once this first indicator is calculated, perform the calculation for the
> same pair/sequence of polygons calculated in step 1: perform the same
> calculation but this time on the total surface area of feature 42 (polygon
> from the segmentation).
> (e.g., surface area of feature 42 overlapping with feature 11 /* total
> surface area of feature 42*).
>
> *Step 1/2 variant:*
> I was thinking that a one-step variant/synthesis/summary of steps 1 and 2
> could be:
> - (e.g., surface area of feature 42 overlapping with feature 11 / *total
> surface area constituted by feature 11 AND feature 42).*?
>

If you look at your three steps above, the numerator is always the same:
the surface area of feature 42 overlapping with feature 11. In PostGIS you
will want the area of the intersection, something like this:

SELECT ..., ST_Area(ST_Intersection(interp.geom, segment.geom)) AS overlap
FROM ...

You can read about these two functions here:

https://postgis.net/docs/ST_Intersection.html
https://postgis.net/docs/ST_Area.html

In your WHERE clause you can use ST_Intersects() to narrow your search to
all polygons that intersect with each other, something like

WHERE ST_Intersects(interp.geom,segment.geom) ...

Read about that here:

https://postgis.net/docs/ST_Intersects.html
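
To show how the two pieces above fit together, here is a minimal sketch that runs as-is in any PostGIS-enabled database. The table names interp and segment follow the fragments above, but the id column and the toy rectangles are assumptions made up for illustration (feature 11 is a 10 x 10 square; segments 42 and 43 overlap it on 8 x 10 and 2 x 10 strips; segment 44 is far away). I put ST_Intersects() in the JOIN condition rather than the WHERE clause, which for an inner join is equivalent.

```sql
-- Toy data standing in for the real layers (hypothetical ids and shapes).
WITH interp(id, geom) AS (
    VALUES
        (11, ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'))
),
segment(id, geom) AS (
    VALUES
        (42, ST_GeomFromText('POLYGON((2 0, 12 0, 12 10, 2 10, 2 0))')),
        (43, ST_GeomFromText('POLYGON((-5 0, 2 0, 2 10, -5 10, -5 0))')),
        (44, ST_GeomFromText('POLYGON((20 20, 30 20, 30 30, 20 30, 20 20))'))
)
SELECT interp.id  AS interp_id,
       segment.id AS segment_id,
       -- Area shared by the two polygons (the common numerator).
       ST_Area(ST_Intersection(interp.geom, segment.geom)) AS overlap_area
FROM interp
-- Keep only pairs that share at least one point; segment 44 drops out.
JOIN segment ON ST_Intersects(interp.geom, segment.geom)
ORDER BY interp.id, overlap_area DESC;
```

Note that ST_Intersects() is also true for polygons that only touch along a boundary, so on real data some pairs will come back with an overlap area of 0.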

Along with that core, you can add the total surface area of the interpreted
and segmented layers to the SELECT clause, something like:

SELECT ..., ST_Area(ST_Intersection(interp.geom, segment.geom)) AS
overlap_area, ST_Area(interp.geom) AS interpreted_area,
ST_Area(segment.geom) AS segmented_area, (ST_Area(interp.geom) +
ST_Area(segment.geom)) AS total_area FROM ...

You can also do the division directly in the expressions in the SELECT
clause, something like

(ST_Area(ST_Intersection(interp.geom, segment.geom)) / ST_Area(interp.geom))
AS step1_area

If you're going to do division, it's always good practice to guard against
divide-by-zero using a CASE expression, something like:

CASE WHEN ST_Area(interp.geom) = 0 THEN 0 ELSE
(ST_Area(ST_Intersection(interp.geom, segment.geom)) / ST_Area(interp.geom))
END AS step1_area, ...
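
Putting the steps together, here is a hedged sketch of one way to keep only the best-overlapping segment per interpreted polygon and compute all three ratios for that pair. Several things here are my assumptions, not something from your data: the id column, the toy rectangles, the output column names, and the reading of "surface area constituted by feature 11 AND feature 42" as the area of the union (interpreted area + segmented area - overlap area, so the overlap is not counted twice). DISTINCT ON is PostgreSQL-specific, and NULLIF(x, 0) is an alternative to the CASE guard that yields NULL rather than 0 when the denominator is zero.

```sql
-- Toy data standing in for the real layers (hypothetical ids and shapes).
WITH interp(id, geom) AS (
    VALUES
        (11, ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'))
),
segment(id, geom) AS (
    VALUES
        (42, ST_GeomFromText('POLYGON((2 0, 12 0, 12 10, 2 10, 2 0))')),
        (43, ST_GeomFromText('POLYGON((-5 0, 2 0, 2 10, -5 10, -5 0))'))
),
-- Every intersecting pair with the three areas needed later.
pairs AS (
    SELECT i.id AS interp_id,
           s.id AS segment_id,
           ST_Area(ST_Intersection(i.geom, s.geom)) AS overlap_area,
           ST_Area(i.geom) AS interpreted_area,
           ST_Area(s.geom) AS segmented_area
    FROM interp AS i
    JOIN segment AS s ON ST_Intersects(i.geom, s.geom)
),
-- One row per interpreted polygon: the pair with the largest overlap.
best AS (
    SELECT DISTINCT ON (interp_id) *
    FROM pairs
    ORDER BY interp_id, overlap_area DESC
)
SELECT interp_id,
       segment_id,
       -- Step 1: overlap relative to the interpreted polygon.
       overlap_area / NULLIF(interpreted_area, 0) AS step1_ratio,
       -- Step 2: overlap relative to the segmented polygon.
       overlap_area / NULLIF(segmented_area, 0) AS step2_ratio,
       -- Variant: overlap relative to the union of both polygons.
       overlap_area
           / NULLIF(interpreted_area + segmented_area - overlap_area, 0)
           AS union_ratio
FROM best;
```

With the toy rectangles, feature 11 is paired with segment 42, since its 8 x 10 overlap is larger than the 2 x 10 overlap with segment 43.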

> *Step 3:*
> The idea would be, based on steps 1/2, to have a kind of overall score at
> the layer level. This could be a kind of average/median for all pairs of
> polygons? Or another metric?
>

Why not just report all three and decide what you want later?
Alternatively you can take the remote sensing kind of approach:

https://gsp.humboldt.edu/olm/courses/GSP_216/lessons/accuracy/metrics.html
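
If you do want a single layer-level number, here is a sketch of the average and median you mention, computed over the step 1 ratio of the best pair for each interpreted polygon. The area-weighted version is an extra option I am adding as an assumption about what might be useful (it stops many tiny polygons from dominating the score). The id column and the toy rectangles are again hypothetical.

```sql
-- Toy data standing in for the real layers (hypothetical ids and shapes).
WITH interp(id, geom) AS (
    VALUES
        (11, ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')),
        (12, ST_GeomFromText('POLYGON((20 0, 30 0, 30 10, 20 10, 20 0))'))
),
segment(id, geom) AS (
    VALUES
        (42, ST_GeomFromText('POLYGON((2 0, 12 0, 12 10, 2 10, 2 0))')),
        (43, ST_GeomFromText('POLYGON((-5 0, 2 0, 2 10, -5 10, -5 0))')),
        (44, ST_GeomFromText('POLYGON((19 0, 31 0, 31 10, 19 10, 19 0))'))
),
-- Every intersecting pair with its overlap and the interpreted area.
pairs AS (
    SELECT i.id AS interp_id,
           ST_Area(ST_Intersection(i.geom, s.geom)) AS overlap_area,
           ST_Area(i.geom) AS interpreted_area
    FROM interp AS i
    JOIN segment AS s ON ST_Intersects(i.geom, s.geom)
),
-- One row per interpreted polygon: the pair with the largest overlap.
best AS (
    SELECT DISTINCT ON (interp_id) *
    FROM pairs
    ORDER BY interp_id, overlap_area DESC
)
SELECT
    -- Plain mean of the per-polygon step 1 ratios.
    avg(overlap_area / NULLIF(interpreted_area, 0)) AS mean_step1,
    -- Median of the same ratios.
    percentile_cont(0.5) WITHIN GROUP (
        ORDER BY overlap_area / NULLIF(interpreted_area, 0)
    ) AS median_step1,
    -- Area-weighted mean: total best-pair overlap over total interpreted area.
    sum(overlap_area) / NULLIF(sum(interpreted_area), 0) AS area_weighted_step1
FROM best;
```

The same pattern works for the step 2 and union ratios by carrying the segmented area through the pairs step.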

-- 
Chris Hermansen · clhermansen "at" gmail "dot" com

C'est ma façon de parler.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 1005887 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/qgis-user/attachments/20251104/7b46a110/attachment-0001.png>