[postgis-users] Comparing two Building Footprint Tables
Rémi Cura
remi.cura at gmail.com
Sat Aug 23 03:43:46 PDT 2014
Hey,
I did something akin, I decided then for a 2 step process : find potential
matches, then keep none or the better match.
The problem is: given 2 tables A (old) and B(new), find a mapping between A
and B,
with the following limits: a building could be new, destroyed , same or
changed. (Note that we exclude to track the fact that a building has been
split in 2 buildings. In this case we want to get a changed building and a
new building. It is a simple extension tough)
This is a [0;1]<->[0;1] mapping
My hypothesis was that if old and new geometry where the same object, they
should share some space.
First you find the building that haven't changed or that have changed
(excluding the creation and destruction then)
For this you need a measure to quantify how much a building is similar to
another. Basically you are looking for a function f(A.row, B.row ) ->
score.
If you have only geometry, I would suggest the shared area (That is
ST_Area(ST_Intersection(A.geom,B.geom)) ).
If you have more semantic information (number of floors, admin district,
whatever), you can inlcude it as well (I used name of street with fuzzy
matching)
You also need a threshold T on score : if (fA.row,B.Row)< T -> they are not
related building. Else these two buildings are related.
(please note that you may normalize the score by for exemple
LOWEST(ST_Area(A.geom), ST_Area(B.geom)) if your buildings are widely
different)
Now you could already have a result but it would be extremely slow .
You would simply do :
for every row of A . find the row of B so that f(A.row,B.row) is maximal.
(SELECT DISTINCT ON (A.id, B.id) A.id AS id_a, B.id AS id_b
FROM A,B
ORDER BY A.id ASC, f(A.geom,B.geom) DESC ;B.id ASC)
It would be very slow because you would do 300k * 300 k test on f().
The idea is then :
for buildings to share some area, they should be close enough (distance d).
So we add a pre-step to find the building from B that are close from A.
The final query looks like this (I can't test it here). Please note that
this is suboptimal and could be written with far less CTE (only 2)
(although not as clearly).
Please also note that everything is symmetrical between A and B. meaning
that the matching B<->A is the same result a the matching A<->B. this
should be the case!
CREATE TABLE A_matching_B AS
WITH geometry_filtering AS ( --for every building in A, find the building
in B that could be potential matches
SELECT A.id AS id_a, A.geom AS geom_a, B.id AS id_b, B.geom AS geom_b
FROM A,B
WHERE ST_DWithin(1.geom,B.geom,d) =TRUE --note , can be replaced by
"ST_Intersects(A.geom, B.geom)= TRUE" if you don't fear precision issues
--note : this query gives pairs of building from (A,B) that are
potential matching
)
,computing_score AS ( --for each potential matching, compute the score of
this pair
SELECT id_a , id_b
, ST_Area(ST_Intersection(geom_a,geom_b)) as score
--note : can be replaced by something more subtle if you have
more attributes / measure
--note : two identical buildings would have a big score, two
totally different vuilding would have a score of 0
, LOWEST(ST_Area(geom_a),ST_Area(geom_b)) AS lowest_area --we keep
it for latter
FROM geometry_filtering
)
,choosing_0_or_1_match AS ( --we are going to keep at most 1 pair per
building.
SELECT DISTINCT ON (id_a,id_b) id_a, id_b --note that the distinct
close could simply be (id_a), we keep id_b for symmetry sake.
,score ,lowest_area
FROM computing_score
WHERE score > T --here we allow the possibility that a building has no
match
ORDER BY id_a ASC, score DESC ,id_b ASC --this is paramount. Combined
with the DISTINCT, it ensure for each building of A, we get the building
from B that has the highest score
)
,has_the_building_changed AS ( --this is optional. We fabric a boolean to
decide if the two matching buildings are different or not.
SELECT *
, CASE WHEN abs(1-score/lowests_area)>0.01 --this is to allow
precision issues : some building could be identical but footprints
differing very little due to rounding precision changes
THEN TRUE
ELSE
FALSE
END AS has_the_building_changed
FROM choosing_0_or_1_match
)
SELECT gid_a , gid_b --this are the final matching pairs
FROM has_the_building_changed ;
Now you have a table which give you the information
_which building from A is the same as which building from B
_which building from A is the same building but has changed its footprint
from which building from B
You still need to get the building from A that have been destroyed in B and
the building from B that are not in A (they have been constructed).
You can do this the following way :
if a building from A is not in the A_matching_B table, it means it has been
destroyed
if a building from B is not in the A_matching_B table, it means it has been
constructed.
You can round all of this in a single query :
CREATE TABLE A_full_matching_B AS
SELECT id AS id_a, NULL::int as id_b, 'in_A_but_not_B___destroyed' AS
status
FROM A
WHERE NOT EXISTS (SELECT 1 FROM A_matching_B WHERE A.id=
A_matching_B.id_a)
UNION ALL
SELECT id_a , id_b, 'identical' AS status
FROM A_matching_B
WHERE has_the_building_changed = FALSE
UNION ALL
SELECT id_a , id_b, 'changed' AS status
FROM A_matching_B
WHERE has_the_building_changed = TRUE
UNION ALL
SELECT NULL::int AS id_a,id_b, 'in_B_but_not_A____constructed' AS status
FROM B
WHERE NOT EXISTS (SELECT 1 FROM A_matching_B WHERE B.id=
A_matching_B.id_a) ;
To test this, please place a "LIMIT 1000" in the first CTE, and be sure to
have indexes on id and geom from both tables (no need for id if they are
primary key)
I hope this has been useful and would be glad to have some timings of
execution and result in your case.
I would expect something around 10 minutes?
Cheers,
Rémi-C
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140823/37f01ea7/attachment.html>
More information about the postgis-users
mailing list