[QGIS-Developer] point in polygon strikes again: big performance issue

Jorge Gustavo Rocha jgr at di.uminho.pt
Sun Mar 14 12:54:32 PDT 2021


Hi,

Last week I gave a QGIS demonstration at the local medical school to
show how easy it is to get data and process it with QGIS. But I got
stuck on a very simple aggregate expression: QGIS is not able to compute
it "immediately". It takes one regular coffee to compute the expression
(starting from cold water in the kettle).

The problem is easy to replicate. It uses just one polygon layer (225
countries) and one point layer (about 4000 points with reported COVID
cases).

I've used the Natural Earth countries shapefile and COVID values from a
CSV file.

These two layers can be retrieved with:

wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-04-2021.csv

For some countries, like Germany or Italy, the CSV file has a value for
each state or province, as illustrated in [1]. I want to aggregate all
the values for each country.
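Summing the province rows per country is, by itself, just a group-by on an attribute. A minimal sketch with Python's csv module; the column names Province_State and Country_Region are assumed to match the JHU daily report header (only "Active" is confirmed by the expression used below), and the sample rows and numbers are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Made-up rows (assumed header); the real daily report has more columns.
SAMPLE = """Province_State,Country_Region,Active
Region 1,Country X,100
Region 2,Country X,50
Region 3,Country Y,
Region 4,Country Y,30
,Country Z,20
"""


def active_by_country(text: str) -> Dict[str, float]:
    """Sum the Active column per Country_Region, skipping empty values."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        if row["Active"]:  # some rows may have no Active value
            totals[row["Country_Region"]] += float(row["Active"])
    return dict(totals)


totals = active_by_country(SAMPLE)
print(totals)  # {'Country X': 150.0, 'Country Y': 30.0, 'Country Z': 20.0}
```

Joining these totals back to the country layer still requires the country names to match between the two sources, which is not guaranteed.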

I've created a virtual field in the country layer, using this expression
to compute the total active cases for each country:

aggregate(layer:='03-04-2021', aggregate:='sum', expression:="Active",
filter:=contains( geometry(@parent), $geometry) )
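For reference, here is a minimal sketch in plain Python (not PyQGIS) of what the expression asks for: for every country feature, test every point for containment and sum "Active" over the matches. The even-odd ray-casting test, the toy square "countries" and the made-up values are assumptions for illustration; this is not a claim about how QGIS evaluates aggregate() internally.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]   # polygon outer ring, no holes
Case = Tuple[float, float, float]  # (x, y, active)


def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Even-odd ray casting: count crossings of a ray from (x, y) towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_brute_force(countries: Dict[str, Ring], cases: List[Case]):
    """Sum 'active' per country by testing every point against every polygon."""
    totals: Dict[str, float] = {}
    tests = 0
    for name, ring in countries.items():
        total = 0.0
        for x, y, active in cases:
            tests += 1  # one exact containment test per (country, point) pair
            if point_in_polygon(x, y, ring):
                total += active
        totals[name] = total
    return totals, tests


countries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
cases = [(5, 5, 100.0), (2, 8, 50.0), (25, 5, 7.0), (50, 50, 1.0)]
totals, tests = aggregate_brute_force(countries, cases)
print(totals)  # {'A': 150.0, 'B': 7.0}
print(tests)   # 8
```

With the sizes in this post, a nested loop like this means 225 × 4000 = 900,000 containment tests, and each exact test walks every vertex of a Natural Earth 10m polygon, so the cost grows with countries × points × vertices.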

It takes several minutes to evaluate and, if the attribute table is
open, it takes the same amount of time again to fill it. If I use the
aggregated field in the symbology, QGIS has to evaluate it there as
well, with the same delay.

How can I write such spatial expressions so that they perform better?

My findings:

1) Using contains or intersects makes no difference in performance.

2) Using a shapefile instead of the original CSV does not improve the
performance.

3) Adding spatial indexes (via the 'Create spatial index' processing
tool) to both shapefiles does not improve the performance.
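One general way to avoid testing every point against every polygon is to fetch candidate points through a spatial index first and run the exact test only on those. Below is a minimal sketch that uses a uniform grid as a stand-in for a real spatial index such as an R-tree; the cell size, function names and data are assumptions. Finding 3 would be consistent with the filter not using the shapefile indexes, but that is a guess, not something measured here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]
Case = Tuple[float, float, float]  # (x, y, active)


def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Even-odd ray casting; the expensive exact test (walks every vertex)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def bbox(ring: Ring):
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def build_grid(cases: List[Case], cell: float):
    """Toy spatial index: bucket point indices by integer grid cell."""
    grid = defaultdict(list)
    for i, (x, y, _) in enumerate(cases):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid


def aggregate_indexed(countries: Dict[str, Ring], cases: List[Case], cell: float = 10.0):
    grid = build_grid(cases, cell)  # built once, reused for every country
    totals: Dict[str, float] = {}
    tests = 0
    for name, ring in countries.items():
        xmin, ymin, xmax, ymax = bbox(ring)
        total = 0.0
        # Visit only the grid cells that overlap this polygon's bounding box.
        for cx in range(int(xmin // cell), int(xmax // cell) + 1):
            for cy in range(int(ymin // cell), int(ymax // cell) + 1):
                for i in grid.get((cx, cy), []):
                    x, y, active = cases[i]
                    if not (xmin <= x <= xmax and ymin <= y <= ymax):
                        continue  # cheap bbox reject before the exact test
                    tests += 1
                    if point_in_polygon(x, y, ring):
                        total += active
        totals[name] = total
    return totals, tests


countries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
cases = [(5, 5, 100.0), (2, 8, 50.0), (25, 5, 7.0), (50, 50, 1.0)]
totals, tests = aggregate_indexed(countries, cases)
print(totals)  # {'A': 150.0, 'B': 7.0}
print(tests)   # 3
```

The totals match the brute-force nested loop over the same toy data (8 exact tests); only the number of exact tests drops. The cell size is a tuning knob: too large and each lookup returns many candidates, too small and large countries span many cells.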

Before you ask:

a) There are topological errors in the Natural Earth source. Correcting
them, or using only the valid polygons, does not solve the problem;
there is no difference in performance.

b) For this specific COVID layer, a simple attribute join could be used.
The point here is to understand how this spatial expression can be
improved in terms of performance.

c) This can be done in PostgreSQL with:

select wf.sovereignt, sum(cases."Active")
from world wf, "03-04-2021" cases
where st_contains(wf.geom, cases.geom)
group by wf.sovereignt;

and it is computed "instantaneously". I know that too, but the point is
to improve QGIS.
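Note that the PostgreSQL query is a join followed by GROUP BY sovereignt, which is not quite the same result as the per-feature virtual field: several features can share a sovereignt value, and with an inner join a sovereignt whose polygons contain no points produces no row at all. A minimal plain-Python sketch of those join semantics, with a bounding-box prefilter; the "ring" key, toy features and values are assumptions, and this says nothing about how PostgreSQL actually plans the query.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]
Case = Tuple[float, float, float]  # (x, y, active)


def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Even-odd ray casting (boundary points are not treated specially,
    unlike ST_Contains)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def bbox(ring: Ring):
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def contains_join_sum(features: List[dict], cases: List[Case]) -> Dict[str, float]:
    """Like: SELECT sovereignt, sum(active) ... WHERE st_contains(...) GROUP BY sovereignt."""
    boxes = [(f["sovereignt"], f["ring"], bbox(f["ring"])) for f in features]
    sums: Dict[str, float] = {}
    for x, y, active in cases:
        for key, ring, (xmin, ymin, xmax, ymax) in boxes:
            if xmin <= x <= xmax and ymin <= y <= ymax and point_in_polygon(x, y, ring):
                # Every matching (feature, point) pair contributes, as in the join.
                sums[key] = sums.get(key, 0.0) + active
    return sums


features = [
    {"sovereignt": "X", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"sovereignt": "X", "ring": [(20, 0), (30, 0), (30, 10), (20, 10)]},
    {"sovereignt": "Y", "ring": [(40, 40), (50, 40), (50, 50), (40, 50)]},
]
cases = [(5, 5, 100.0), (25, 5, 7.0), (60, 60, 1.0)]
print(contains_join_sum(features, cases))  # {'X': 107.0}
```

The two "X" features are summed into one row, and "Y" is absent because none of its points matched; the virtual field, by contrast, yields one value per country feature.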

Regards,

Jorge Gustavo

[1] https://nextcloud.geomaster.pt/index.php/s/ZNR87PHBrBJjmmC
[2] https://nextcloud.geomaster.pt/index.php/s/bxyXTfN3J4moKHr



-- 
Jorge Gustavo Rocha
Departamento de Informática
Universidade do Minho
4710-057 Braga
Office 3.29 (Floor 3)
Tel: +351 253604480
Fax: +351 253604471
Mobile: +351 910333888
skype: nabocudnosor

