# [postgis-users] Comparing two Building Footprint Tables

Clifford Snow clifford at snowandsnow.us
Sat Aug 23 22:52:22 PDT 2014

```Thanks for the solution. This seems do able, but will push my learning a
bit. I might select a small area using voting districts to keep the
processing more manageable.

Thanks for the help,
Clifford

On Sat, Aug 23, 2014 at 3:43 AM, Rémi Cura <remi.cura at gmail.com> wrote:

> Hey,
> I did something akin, I decided then for a 2 step process : find potential
> matches, then keep none or the better match.
> The problem is: given 2 tables A (old) and B(new), find a mapping between
> A and B,
> with the following limits: a building could be new, destroyed , same or
> changed. (Note that we exclude to track the fact that a building has been
> split in 2 buildings. In this case we want to get a changed building and a
> new building. It is a simple extension tough)
> This is a [0;1]<->[0;1] mapping
> My hypothesis was that if old and new geometry where the same object, they
> should share some space.
> First you find the building that haven't changed or that have changed
> (excluding the creation and destruction then)
>
> For this you need a measure to quantify how much a building is similar to
> another. Basically you are looking for a function f(A.row, B.row ) ->
> score.
> If you have only geometry, I would suggest the shared area (That is
> ST_Area(ST_Intersection(A.geom,B.geom)) ).
> If you have more semantic information (number of floors, admin district,
> whatever), you can inlcude it as well (I used name of street with fuzzy
> matching)
>
> You also need a threshold T on score : if (fA.row,B.Row)< T -> they are
> not related building. Else these two buildings are related.
> (please note that you may normalize the score by for exemple
> LOWEST(ST_Area(A.geom), ST_Area(B.geom)) if your buildings are widely
> different)
>
> Now you could already have a result but it would be extremely slow .
> You would simply do :
> for every row of A . find the row of B so that f(A.row,B.row) is maximal.
> (SELECT DISTINCT ON (A.id, B.id) A.id AS id_a, B.id AS id_b
> FROM A,B
> ORDER BY A.id ASC, f(A.geom,B.geom) DESC ;B.id ASC)
> It would be very slow because you would do 300k * 300 k test on f().
> The idea is then :
> for buildings to share some area, they should be close enough (distance
> d). So we add a pre-step to find the building from B that are close from A.
>
> The final query looks like this (I can't test it here). Please note that
> this is suboptimal and could be written with far less CTE (only 2)
> (although not as clearly).
> Please also note that everything is symmetrical between A and B. meaning
> that the matching B<->A is the same result a the matching A<->B. this
> should be the case!
>
>
> CREATE TABLE A_matching_B AS
> WITH geometry_filtering AS ( --for every building in A, find the building
> in B that could be potential matches
>     SELECT A.id AS id_a, A.geom AS geom_a, B.id AS id_b, B.geom  AS geom_b
>     FROM A,B
>     WHERE ST_DWithin(1.geom,B.geom,d) =TRUE --note , can be replaced by
> "ST_Intersects(A.geom, B.geom)= TRUE" if you don't fear precision issues
>     --note : this query gives pairs of building from (A,B) that are
> potential matching
> )
> ,computing_score AS ( --for each potential matching, compute the score of
> this pair
>     SELECT id_a , id_b
>         , ST_Area(ST_Intersection(geom_a,geom_b)) as score
>              --note : can be replaced by something more subtle if you have
> more attributes / measure
>              --note : two identical buildings would have a big score, two
> totally different vuilding would have a score of 0
>         , LOWEST(ST_Area(geom_a),ST_Area(geom_b)) AS lowest_area --we keep
> it for latter
>     FROM geometry_filtering
> )
> ,choosing_0_or_1_match AS ( --we are going to keep at most 1 pair per
> building.
>     SELECT DISTINCT ON (id_a,id_b) id_a, id_b --note that the distinct
> close could simply be (id_a), we keep id_b for symmetry sake.
>         ,score ,lowest_area
>     FROM computing_score
>     WHERE score > T --here we allow the possibility that a building has no
> match
>     ORDER BY id_a ASC, score DESC ,id_b ASC --this is paramount. Combined
> with the DISTINCT, it ensure for each building of A, we get the building
> from B that has the highest score
> )
> ,has_the_building_changed AS ( --this is optional. We fabric a boolean to
> decide if the two matching buildings are different or not.
>     SELECT *
>         , CASE WHEN abs(1-score/lowests_area)>0.01 --this is to allow
> precision issues : some building could be identical but footprints
> differing very little due to rounding precision changes
>                 THEN TRUE
>             ELSE
>                 FALSE
>             END AS has_the_building_changed
>     FROM choosing_0_or_1_match
> )
>     SELECT gid_a , gid_b --this are the final matching pairs
>     FROM has_the_building_changed ;
>
>
> Now you have a table which give you the information
> _which building from A is the same as which building from B
> _which building from A is the same building but has changed its footprint
> from which building from B
>
> You still need to get the building from A that have been destroyed in B
> and the building from B that are not in A (they have been constructed).
>
> You can do this the following way :
> if a building from A is not in the A_matching_B table, it means it has
> been destroyed
> if a building from B is not in the A_matching_B table, it means it has
> been constructed.
>
>
> You can round all of this in a single query :
>
> CREATE TABLE A_full_matching_B AS
>
>     SELECT id AS id_a, NULL::int as id_b, 'in_A_but_not_B___destroyed' AS
> status
>     FROM A
>     WHERE NOT EXISTS (SELECT 1 FROM A_matching_B WHERE A.id=
> A_matching_B.id_a)
> UNION ALL
>     SELECT id_a , id_b,  'identical' AS status
>     FROM A_matching_B
>     WHERE has_the_building_changed = FALSE
> UNION ALL
>     SELECT id_a , id_b,  'changed' AS status
>     FROM A_matching_B
>     WHERE has_the_building_changed = TRUE
> UNION ALL
>     SELECT NULL::int AS id_a,id_b, 'in_B_but_not_A____constructed' AS
> status
>     FROM B
>     WHERE NOT EXISTS (SELECT 1 FROM A_matching_B WHERE B.id=
> A_matching_B.id_a) ;
>
>
> To test this, please place a "LIMIT 1000" in the first CTE, and be sure to
> have indexes on id and geom from both tables (no need for id if they are
> primary key)
> I hope this has been useful and would be glad to have some timings of
> execution and result in your case.
> I would expect something around 10 minutes?
>
> Cheers,
> Rémi-C
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>

--
@osm_seattle
osm_seattle.snowandsnow.us
OpenStreetMap: Maps with a human touch
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140823/c6c5c5a3/attachment.html>
```