[postgis-users] Incorrect Results for ST_ValueCount with large polygons
Michael Treglia
mtreglia at gmail.com
Wed Sep 5 11:17:46 PDT 2018
Hi All,
As one more follow-up on this thread... it sounds like the reason for the
memory error I received is likely the large size of unioned rasters. For
st_valuecount, it sounds like the st_union operation is not actually
necessary, but without it, the st_clip operation seems to falter when a
polygon overlaps multiple raster tiles (which is why I previously was
seeing incorrect pixel counts for focal polygons without the st_union).
Some of this discovery on my part is documented in a StackExchange thread
here -
https://gis.stackexchange.com/questions/294482/error-quantifying-land-cover-classes-by-polygon-in-postgis
(and thanks to Pierre Racine for the helpful comments that led me to figure
out what was going on).
And I believe the issue with st_clip is already documented here:
https://trac.osgeo.org/postgis/ticket/3457 (If I'm missing something, or
if there's a way around this in PostGIS as-is, suggestions are welcome).
Cheers,
Mike
On Thu, Jul 19, 2018 at 5:04 PM Michael Treglia <mtreglia at gmail.com> wrote:
> Hi All,
>
> As partial resolution to this, below are a couple of working examples of
> SQL, thanks to an example from Regina Obe's book.
>
> These work on smaller subsets of the data I've used for testing, though
> unfortunately in running on the entire datasets, after running for hours, I
> receive an error: Invalid memory allocation request size 1109458944.
> Thoughts/suggestions welcome, but otherwise I hope these examples are
> useful to tothers.
>
> Mike
>
>
> --SQL CODE
> with
> cte as (
> select bbl,
> st_valuecount(
> st_clip(st_union(p.rast), geom_2263),1, true,
> ARRAY[0,1,2,3,4,5,6,7]
> ) as pv
> FROM
> base_rasters.landcover6in as p
> inner join
> results_scratch.polys_test
> on st_intersects(p.rast, geom_2263)
> group by bbl, geom_2263
> )
> SELECT bbl,
> (pv).value, sum((pv).count)
> from cte
> group by bbl, (pv).value
> ORDER by bbl, (pv).value;
> --End SQL
>
> And to get the results with land cover by polygon in a 'wide' format
> (e.g., amounts of different land cover in columns):
>
> --SQL CODE
> with
> cte as (
> select bbl,geom_2263,
> st_valuecount(
> st_clip(st_union(p.rast), geom_2263),1, true,
> ARRAY[0,1,2,3,4,5,6,7]
> ) as pv
> FROM
> base_rasters.landcover6in as p
> inner join
> results_scratch.polys_test
> on st_intersects(p.rast, geom_2263)
> group by bbl, geom_2263
> )
> SELECT bbl, geom_2263,
> (sum((pv).count) filter (where (pv).value =
> 0)::numeric/sum((pv).count))::numeric(5,4) as prop_na,
> (sum((pv).count) filter (where (pv).value =
> 1)::numeric/sum((pv).count))::numeric(5,4) as prop_canopy,
> (sum((pv).count) filter (where (pv).value =
> 2)::numeric/sum((pv).count))::numeric(5,4) as prop_grass,
> (sum((pv).count) filter (where (pv).value =
> 3)::numeric/sum((pv).count))::numeric(5,4) as prop_bare,
> (sum((pv).count) filter (where (pv).value =
> 4)::numeric/sum((pv).count))::numeric(5,4) as prop_water,
> (sum((pv).count) filter (where (pv).value =
> 5)::numeric/sum((pv).count))::numeric(5,4) as prop_bldgs,
> (sum((pv).count) filter (where (pv).value =
> 6)::numeric/sum((pv).count))::numeric(5,4) as prop_roads,
> (sum((pv).count) filter (where (pv).value =
> 7)::numeric/sum((pv).count))::numeric(5,4) as prop_paved,
> ((sum((pv).count) filter (where (pv).value = 0)) *
> 0.25)::numeric(12,2) as area_na,
> ((sum((pv).count) filter (where (pv).value = 1)) *
> 0.25)::numeric(12,2) as area_canopy,
> ((sum((pv).count) filter (where (pv).value = 2)) *
> 0.25)::numeric(12,2) as area_grass,
> ((sum((pv).count) filter (where (pv).value = 3)) *
> 0.25)::numeric(12,2) as area_bare,
> ((sum((pv).count) filter (where (pv).value = 4)) *
> 0.25)::numeric(12,2) as area_water,
> ((sum((pv).count) filter (where (pv).value = 5)) *
> 0.25)::numeric(12,2) as area_bldgs,
> ((sum((pv).count) filter (where (pv).value = 6)) *
> 0.25)::numeric(12,2) as area_roads,
> ((sum((pv).count) filter (where (pv).value = 7)) *
> 0.25)::numeric(12,2) as area_paved
> from cte
> group by bbl, geom_2263
> ORDER by bbl;
> --End SQL
>
> On Wed, Jul 11, 2018 at 4:59 PM, Michael Treglia <mtreglia at gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> Sorry for multiple emails on this, but wanted to ping this list again to
>> see if there is anything obvious I'm missing or should check regarding this
>> issue. I've reiterated context/the problem/sample sql code below, and if
>> there's other info that would be helpful, please let me know.
>>
>> Thanks! Sincerely,
>> Mike
>>
>> # To reiterate:
>> ## The goal/issue:
>> I'm trying to calculate # of pixels from a categorical raster, by value,
>> within distinct polygons of a vector layer. I have code, below, that seems
>> to work accurately for small polygons, though I'm getting inaccuracies in
>> large polygons - the addition of some pixels, including some from a value
>> that aren't actually contained within a focal polygon. (I'm comparing to
>> available calculations for the same overlay, along with results from QGIS
>> Zonal Histogram tool). I'd like to get this working for this specific
>> overlay, but then be able to apply the SQL code to comparable situations.
>>
>> ##Data:
>> The polygons are parcels, ranging quite widely in size;
>> The raster dataset is a 6" land cover dataset loaded as an in-db raster
>> via raster2pgsql;
>> raster2pgsql -s 2263 -d -C -t 128x128 -M -I -l 4,16
>> "landcover_2010_nyc_05ft.img" base_rasters.landcover6in | psql -h localhost
>> -U postgres -d dbname -v ON_ERROR_STOP=1
>>
>> ## System Info:
>> I'm running PostGIS 2.4.4 with PostgreSQL v 10.3 on Windows 7 x64
>>
>> ## Two sets of sample code that give the same result:
>> 1)
>> SELECT bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>> FROM base_rasters.landcover6in
>> INNER join results_scratch.polys_test
>> ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>> ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1,
>> TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
>> GROUP BY (pvc).VALUE, bbl
>> ORDER BY bbl, (pvc).VALUE;
>>
>> 2)
>> SELECT
>> bbl, (value_count).value, SUM((value_count).count) AS count
>> FROM
>> (
>> select
>> bbl,
>> rid,
>> ST_ValueCount(
>> ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2, 3,
>> 4, 5, 6, 7]
>> ) value_count
>> FROM
>> (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
>> (SELECT rid, rast FROM base_rasters.landcover6in) r
>> WHERE ST_Intersects(rast, geom_2263)
>> GROUP BY bbl, rid, geom_2263
>> ) i
>> GROUP BY bbl, value
>> ORDER BY bbl, value
>>
>>
>>
>> On Fri, Jun 22, 2018 at 5:12 PM, Michael Treglia <mtreglia at gmail.com>
>> wrote:
>>
>>> Hi again all,
>>>
>>> Sorry for the multiple messages on this - realizing the query I posted
>>> was somewhat off from what I intended - below are examples that are getting
>>> at what I'm going for but exhibiting the issue I described (sometimes
>>> including pixels in counts aren't actually be within polygons [i.e., the
>>> categorical value isn't actually within the polygon based on visual
>>> inspection & this same analysis with other tools]).
>>>
>>> Simpler/quicker:
>>>
>>> SELECT bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>>> FROM base_rasters.landcover6in
>>> INNER join results_scratch.polys_test
>>> ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>>> ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1,
>>> TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
>>> GROUP BY (pvc).VALUE, bbl
>>> ORDER BY bbl, (pvc).VALUE;
>>>
>>>
>>>
>>> A longer, less efficient version that gives the same results (but unions
>>> the clipped raster tiles):
>>>
>>>
>>> SELECT
>>> bbl, (value_count).value, SUM((value_count).count) AS count
>>> FROM
>>> (
>>> select
>>> bbl,
>>> rid,
>>> ST_ValueCount(
>>> ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2,
>>> 3, 4, 5, 6, 7]
>>> ) value_count
>>> FROM
>>> (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
>>> (SELECT rid, rast FROM base_rasters.landcover6in) r
>>> WHERE ST_Intersects(rast, geom_2263)
>>> GROUP BY bbl, rid, geom_2263
>>> ) i
>>> GROUP BY bbl, value
>>> ORDER BY bbl, value
>>>
>>> On Thu, Jun 21, 2018 at 4:23 PM, Michael Treglia <mtreglia at gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm basically trying to compute # of pixels of land cover type by
>>>> polygon, for lots of polygons.
>>>>
>>>> I'm working with a query a query like this:
>>>> SELECT bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>>>> FROM base_rasters.landcover6in
>>>> INNER join results_scratch.polys_test
>>>> ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>>>> ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1) AS
>>>> pvc
>>>> GROUP BY (pvc).VALUE, bbl
>>>> ORDER BY (pvc).VALUE;
>>>>
>>>> For smaller polygons, spot-checking, it looks like I'm generally
>>>> getting the right results. However, for larger polygons, not always. For
>>>> example, though a polygon might not have roads, I'm getting non-0 counts
>>>> for road pixels within a polygon (polygons have unique identifiers as
>>>> 'bbl'). The total area of pixels counted ends up being larger than the
>>>> polygon (larger than would be attributable to differences between polygons
>>>> edges & pixels)
>>>>
>>>> I'm thinking it's an issue when there are multiple raster tiles
>>>> included... but not entirely sure. Any suggestions on whether there's
>>>> something off in this query that might be causing an issue like I describe?
>>>>
>>>> Thanks in advance for any suggestions!
>>>> Mike
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20180905/8d7cb5a8/attachment.html>
More information about the postgis-users
mailing list