[QGIS-Developer] point in polygon strikes again: big performance issue

Nyall Dawson nyall.dawson at gmail.com
Sun Mar 14 15:37:36 PDT 2021


On Mon, 15 Mar 2021 at 05:54, Jorge Gustavo Rocha <jgr at di.uminho.pt> wrote:
>
> Hi,
>
> I did a QGIS demonstration last week at the local medical school, to
> show how easy it is to get data and process it with QGIS. But I failed with
> a very simple aggregate expression. QGIS is not able to compute the
> expression "immediately". It takes one regular coffee to compute the
> expression (starting from cold water in the kettle).
>
> The problem is easy to replicate. It just uses one polygon layer (225
> countries) and a point layer (about 4000 points for reported COVID
> cases).
>
> I've used the natural earth countries shapefile and COVID values from a
> csv file.
>
> These two layers can be retrieved with:
>
> wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
> wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-04-2021.csv
>
> For the same country, like Germany or Italy, there are values for each
> state or province in the CSV file, as illustrated in [1]. I want to
> aggregate all values of the country.
>
> I've created a virtual field in the country layer. I've used this
> expression to compute the total active cases for each country:
>
> aggregate(layer:='03-04-2021', aggregate:='sum', expression:="Active",
> filter:=contains( geometry(@parent), $geometry) )
>
> It takes several minutes to evaluate and, if you have the attribute
> table open, it will take the same amount of time again to fill the
> attribute table. If I try to use the aggregated field in symbology, the
> expression is evaluated yet again, which makes rendering just as slow.
>
> How can I improve the writing of such spatial expressions?

I would advise not using a virtual field in this circumstance. A
virtual field is evaluated every time the feature is retrieved, and
it's really not optimal for an expression which takes some time to
evaluate.

Use field calculator or the corresponding processing tools and perform
the calculation as a one-off instead.
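
To make the "one-off" idea concrete, here is a minimal sketch in plain
Python. It is not QGIS code and not how QGIS implements aggregate(); it
only illustrates computing the point-in-polygon sums once and keeping the
result, instead of re-running the spatial test each time a feature is
read. The toy polygons, points, and function names are hypothetical, and
polygons are assumed to be simple rings without holes.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Bounding box of a ring as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def sum_points_in_polygons(polygons, points):
    """polygons: {name: ring}; points: [(x, y, value)]. Returns {name: total}."""
    # Bounding boxes are computed once and used as a cheap pre-filter.
    boxes = {name: bbox(ring) for name, ring in polygons.items()}
    totals = {name: 0 for name in polygons}
    for x, y, value in points:
        for name, ring in polygons.items():
            xmin, ymin, xmax, ymax = boxes[name]
            if xmin <= x <= xmax and ymin <= y <= ymax and point_in_ring(x, y, ring):
                totals[name] += value
    return totals


# Hypothetical toy data: two square "countries" and four "case" points.
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(5, 0), (9, 0), (9, 4), (5, 4)],
}
points = [(1, 1, 10), (2, 3, 5), (6, 2, 7), (20, 20, 100)]

# Computed once; later lookups are dictionary reads, not spatial tests.
totals = sum_points_in_polygons(polygons, points)
print(totals)
```

Storing the result in a real field plays the role of the `totals`
dictionary here: the expensive part runs once, and the attribute table
and symbology only read stored values.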

Nyall


>
> My findings:
>
> 1) Using contains or intersects is the same. No difference in performance.
>
> 2) Using a shapefile instead of the original CSV does not improve the
> performance
>
> 3) Adding spatial indexes (via 'Create spatial index' processing tool)
> to both shapefiles does not improve the performance
>
> Before you ask:
>
> a) There are topological errors in the Natural Earth source. Correcting
> them or just using the good polygons does not solve the problem. There
> is no difference in performance.
>
> b) For this specific COVID layer, a simple attribute join could be used.
> The point here is to understand how this spatial expression can be
> improved in terms of performance.
>
> c) This can be done in Postgresql with:
>
> select wf.sovereignt, sum(cases."Active")
> from world wf, "03-04-2021" cases
> where st_contains(wf.geom, cases.geom)
> group by wf.sovereignt;
>
> and it is computed "instantaneously". I know that too, but the point is
> to improve QGIS.
>
> Regards,
>
> Jorge Gustavo
>
> [1] https://nextcloud.geomaster.pt/index.php/s/ZNR87PHBrBJjmmC
> [2] https://nextcloud.geomaster.pt/index.php/s/bxyXTfN3J4moKHr
>
>
>
> --
> Jorge Gustavo Rocha
> Departamento de Informática
> Universidade do Minho
> 4710-057 Braga
> Gabinete 3.29 (Piso 3)
> Tel: +351 253604480
> Fax: +351 253604471
> Móvel: +351 910333888
> skype: nabocudnosor
> _______________________________________________
> QGIS-Developer mailing list
> QGIS-Developer at lists.osgeo.org
> List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer

