<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div></div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
Hi,</div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
<br>
</div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
Out of curiosity, why are you indexing the intermediate table by ST_Pointonsurface ?</div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
The documentation doesn't guarantee that the returned point is always the same, so it introduces some degree of randomness.</div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
This should not have an influence on the number of rows in res, but could have one in the default sort order when joining it with layer_union.</div>
<div style="font-family: inherit; font-size: inherit; color: inherit; background-color: transparent;">
Maybe indexing on st_centroid and/or using an explicit order by on res_id when building res_assoc could make the processs more stabke ?</div>
<div></div>
<br>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Martin Davis <mtnclimb@gmail.com><br>
<b>Sent:</b> Friday, July 11, 2025 21:24<br>
<b>To:</b> celati Laurent <laurent.celati@gmail.com><br>
<b>Cc:</b> postgis-users@lists.osgeo.org <postgis-users@lists.osgeo.org><br>
<b>Subject:</b> Re: Qgis/Postgis : Multi Union (25 polygonal layers)</font>
<div> </div>
</div>
<div>
<div dir="ltr">The reply on GIS StackExchange was a good one, so reproducing it here for this list:
<div><br>
</div>
<div>Can you investigate further as to the differences between two identical runs, apart from the number of features? Do you get the same behaviour working on a small subset of the data? Can you see if its "sliver" features or other tiny differences due perhaps
to arithmetic rounding errors - these can result in non-reproducibility in parallel processing systems when the order of operations isn't well-defined (ie (a+b+c) could be (a+b)+c or a+(b+c) which might not be equal because of floating point precision...)</div>
<div><br>
</div>
<div>I would add that this situation can happen even when there is no parallelism. It can arise whenever there is not determinism in the order of inputs to operations throughout the process. You might be able to provide this by sorting every input and intermediate
result. But it's questionable whether this would be worth the time and effort.</div>
<div><br>
</div>
<div>It would be interesting to know the difference in total result area between the different runs. This should be very low.</div>
<div><br>
</div>
<div>A more detailed test would be to match the two results (via point-in-polygon) and then compare matched polygon boundaries (via Hausdorff distance). If there is a significant difference between two results, that would be worth investigation.</div>
</div>
<br>
<div class="x_gmail_quote x_gmail_quote_container">
<div dir="ltr" class="x_gmail_attr">On Fri, Jul 11, 2025 at 3:22 AM celati Laurent <<a href="mailto:laurent.celati@gmail.com">laurent.celati@gmail.com</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div>Good afternoon,</div>
<div>Thanks so much for your message. </div>
<div>I succeed in executing a postgis script without error message. It is working. </div>
<div>However, i notice a lack of <span lang="en"><span><span>reliability/stabilization.</span></span></span><span lang="en"> <span><span>Because when I've rerun the same process several times, i never end up with exactly the same number of features in my intermediate/final
result tables. <br>
I'm taking the liberty to share you the sql script. </span></span></span></div>
<div>
<div dir="ltr"><span lang="en"><span><span>The screenshot compares the number of objects for test v1 and test v2, (which are identical tests).</span></span>
<span><span>We can see that there is a difference in the res_final table, but also in the res table.</span></span><span><span> I ran several tests agai.</span></span> <span><span>Still with different numbers of objects for the res and res_final tables.</span></span>
<span><span>Often with larger differences than the one shown in the screenshot.</span></span><span><span> </span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><br>
</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span>Number of entities for each table:</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><b>Test 1</b>:</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span>layer_union table: 1026194 </span></span></span></div>
<div dir="ltr"><span lang="en"><span><span>res table : 1462661</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span> res_assoc table : 1462661</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span> res_final table : 1462661 </span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><br>
</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><b>Test 2</b></span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><b> </b>layer_union table : 1026194 </span></span></span></div>
<div dir="ltr"><span lang="en"><span><span>res table 1462645</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span> res_assoc table : 1462645</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span> res_final table : 1462635</span></span></span></div>
<div dir="ltr"><span lang="en"><span><span><br>
</span></span></span></div>
<div><span lang="en"><span><span>I share below/and attach the script : <br>
</span></span></span><br>
<blockquote>--Import all shp in postgis db<br>
-- union ALL des geom et attributs des 28 data sources dans une seule table<br>
drop table if exists layer_union;<br>
create table layer_union as<br>
select inrae_2014.data_id, inrae_2014.geom from inrae_2014 UNION ALL<br>
select aesn_2006.data_id, aesn_2006.geom from aesn_2006 UNION ALL<br>
select aesn_2019.data_id, aesn_2019.geom from aesn_2019 UNION ALL<br>
--(...)etc.<br>
<br>
-- res table<br>
drop table if exists res;<br>
create table res as<br>
with tmp as <br>
(select st_union(ST_Force2D(st_boundary(geom))) as geom from layer_union<br>
)<br>
select (st_dump(st_collectionextract(st_polygonize(geom), 3))).path[1] as id,<br>
(st_dump(st_collectionextract(st_polygonize(geom), 3))).geom::geometry(polygon, 2154) as geom<br>
from tmp;<br>
<br>
-- res table id unique<br>
alter table res add column res_id int generated always as identity primary key;<br>
-- res table index on pointOnSurfacee<br>
create index on res using gist (st_pointOnSurface(geom));<br>
analyze res;<br>
<br>
-- res_assoc table<br>
--JOIN simple for filter polygons without link with input polygons (for instance : holes for data input)<br>
drop table if exists res_assoc;<br>
create table res_assoc as<br>
select res.res_id, array_agg(l.data_id) as data_id_ori, count(distinct l.data_id) as num_sources<br>
from res join layer_union l on st_contains(l.geom, st_pointonsurface(res.geom))<br>
group by res.res_id;<br>
-- res_assoc table : index creation <br>
create index on res_assoc(res_id);<br>
analyse res_assoc;<br>
<br>
----cleaning: we remove from the res table the polygons that did not match in res_assoc:<br>
-- these are polygons representing holes in the input layers<br>
delete from res<br>
where not exists (<br>
select null<br>
from res_assoc ra<br>
where ra.res_id = res.res_id);</blockquote>
<br>
-- -- Final table with the new polygons and the source polygon information, as a join:<br>
-- Much faster to create a table than to update the res table (after adding the desired columns).<br>
drop table if exists res_final;<br>
create table res_final as<br>
select ra.res_id, data_id_ori, num_sources, geom::geometry(polygon, 2154) as geom<br>
</div>
<div>from res_assoc ra join res r on ra.res_id = r.res_id;</div>
<div><br>
</div>
<div>Thanks so much</div>
</div>
</div>
<br>
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">Le mar. 8 juil. 2025 à 02:54, <<a href="mailto:snorris@hillcrestgeo.ca">snorris@hillcrestgeo.ca</a>> a écrit :<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div>
<div>Here is working example of Martin's suggestion, for a job that sounds fairly similar:
<div><a href="https://github.com/bcgov/harvest-restrictions/blob/main/sql/overlay.sql" data-auth="NotApplicable">https://github.com/bcgov/harvest-restrictions/blob/main/sql/overlay.sql</a></div>
<div><br>
</div>
<div>
<div><br>
<blockquote type="cite">
<div>On Jul 7, 2025, at 4:45 PM, Martin Davis <<a href="mailto:mtnclimb@gmail.com">mtnclimb@gmail.com</a>> wrote:</div>
<br>
<div>
<div dir="ltr">I'd characterize your use case as "Overlay of overlapping polygonal datasets". The basic state-of-the art for solving this using PostGIS is still the solution Paul outlined in
<a href="https://blog.cleverelephant.ca/2019/07/postgis-overlays.html" data-auth="NotApplicable">
https://blog.cleverelephant.ca/2019/07/postgis-overlays.html</a> (or see <a href="https://dr-jts.github.io/postgis-patterns/overlay/overlay.html#count-overlap-depth-in-set-of-polygons" data-auth="NotApplicable">https://dr-jts.github.io/postgis-patterns/overlay/overlay.html#count-overlap-depth-in-set-of-polygons</a>
for more ideas).
<div><br>
</div>
<div>Basically, you node and polygonize to make a flat coverage, and then join back to the parent layers to determine attribution (including counts).<br>
<div><br>
</div>
<div>Doing this in a single query might be slow for very large datasets like yours, though. You might be able to partition your large dataset and run smaller queries, possibly in parallel. Also, it might be better to overlay the small layers first, and then
overlay that with the big layer. And if you don't care about overlaps in the big layer (or if there are none), that makes it much easier, since you can process each big-layer polygon independently (and ideally in parallel). </div>
</div>
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Mon, Jul 7, 2025 at 1:16 PM celati Laurent <<a href="mailto:laurent.celati@gmail.com">laurent.celati@gmail.com</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div>Dear all, </div>
<div>I'm working with QGIS and PostGIS. As input, I have 25 polygonal layers covering a large area (multicities area). One of these data is a very large dataset (1 million objects). The other 24 are much smaller (a maximum of a hundred objects).</div>
For information, I should point out that some of these polygonal datasets are in "multi-part features" mode and others in "single-part features" mode. I imagine this may ultimately have a slight impact on the method/result. These 25 polygonal .shp files have
highly variable, non-homogeneous/non-harmonized data structures. Each layer has a "data_id" field that allows to define/link/reference, for each feature, its membership in the layer. For example, all values in the "data_id" field for the first layer have
a value of '1'. For the second layer, the field values are '2', etc.<br>
<br>
My goal would be to be able to apply/adapt the existing QGIS geoprocessing tool called "Multiple Union":<br>
<a href="https://docs.qgis.org/3.40/en/docs/user_manual/processing_algs/qgis/vectoroverlay.html#union-multiple" data-auth="NotApplicable">https://docs.qgis.org/3.40/en/docs/user_manual/processing_algs/qgis/vectoroverlay.html#union-multiple</a><br>
<br>
Below a screenshot from the QGIS documentation :<br>
<br>
<div><span id="x_m_5716296859835898179m_808034701577126799cid:ii_mcswuu7s0"><image.png></span></div>
<div><br>
</div>
My goal would be to have an output file:<br>
<br>
<ul>
<li> Which would be the result of the union/overlay of the 25 input data. To use the terms of the QGIS documentation, the processing should check for overlaps between features within the 25 layers and create separate features for the overlapping and non-overlapping
parts. This "multiple union" geoprocessing seems interesting for my goal where there is no overlap (a, NULL; b, NULL; c, NULL).</li></ul>
<ul>
<li>For areas where there is an overlap, the QGIS union geoprocessing creates as many identical overlapping features as there are features participating in this overlap. This doesn't bother me. But since, ultimately, I'd like a field in the result/output file
to allow, for each feature, to retrieve the list of input layers that participate/contribute to this result feature (in order to retrieve the origin/source of the data). I was wondering/thinking it might be better if only one feature was created per overlapping
area?</li></ul>
<ul>
<li> I'd like a field in the result file to allow, for each feature, to retrieve the list of input layers that participate/contribute to this result feature. In order to retrieve the origin/source of the data.</li></ul>
<ul>
<li>Ideally, a field that allows you to retrieve the number (COUNT) of layers that contribute to this feature (at least 1 layer, at most 25 layers).</li></ul>
<ul>
<li>Regarding the non-geometric attributes/fields, I would like to be able to specify the selection/define the list of fields I ultimately want to keep. I don't want to keep all of the fields, but rather just some of the fields for each of the 25 input layers.</li></ul>
<br>
I imagine it's recommended to do this processing in PostGIS rather than QGIS? I can, if necessary, import my 25 SHP files into a PostGIS database. I also imagine it's important to keep in mind that the "multi-part features" / "single-part pieces/features" mode
of the input layers can affect the result. If I'm using a PostGIS query, I was thinking it might be helpful to force all features to be in single-part mode (using the PostGIS 'st_dump' function?).<br>
<br>
<div>In advance, Thanks so much for your help, guidance.</div>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</body>
</html>