[postgis-users] Incorrect Results for ST_ValueCount with large polygons

Michael Treglia mtreglia at gmail.com
Wed Sep 5 11:17:46 PDT 2018


Hi All,

As one more follow-up on this thread... it sounds like the reason for the
memory error I received is likely the large size of unioned rasters. For
st_valuecount, it sounds like the st_union operation is not actually
necessary, but without it, the st_clip operation seems to falter when a
polygon overlaps multiple raster tiles (which is why I previously was
seeing incorrect pixel counts for focal polygons without the st_union).

Some of this discovery on my part is documented in a StackExchange thread
here -
https://gis.stackexchange.com/questions/294482/error-quantifying-land-cover-classes-by-polygon-in-postgis
(and thanks to Pierre Racine for the helpful comments that led me to figure
out what was going on).

And I believe the issue with st_clip is already documented here:
https://trac.osgeo.org/postgis/ticket/3457   (If I'm missing something, or
if there's a way around this in PostGIS as-is, suggestions are welcome).

Cheers,
Mike


On Thu, Jul 19, 2018 at 5:04 PM Michael Treglia <mtreglia at gmail.com> wrote:

> Hi All,
>
> As partial resolution to this, below are a couple of working examples of
> SQL, thanks to an example from Regina Obe's book.
>
> These work on smaller subsets of the data I've used for testing, though
> unfortunately in running on the entire datasets, after running for hours, I
> receive an error: Invalid memory allocation request size 1109458944.
> Thoughts/suggestions welcome, but otherwise I hope these examples are
> useful to tothers.
>
> Mike
>
>
> --SQL CODE
> with
>     cte as (
>         select bbl,
>             st_valuecount(
>                 st_clip(st_union(p.rast), geom_2263),1, true,
> ARRAY[0,1,2,3,4,5,6,7]
>                 ) as pv
>             FROM
>                 base_rasters.landcover6in as p
>                 inner join
>                 results_scratch.polys_test
>             on st_intersects(p.rast, geom_2263)
>             group by bbl, geom_2263
>         )
> SELECT bbl,
>     (pv).value, sum((pv).count)
> from cte
> group by bbl, (pv).value
> ORDER by bbl, (pv).value;
> --End SQL
>
> And to get the results with land cover by polygon in a 'wide' format
> (e.g., amounts of different land cover in columns):
>
> --SQL CODE
> with
>     cte as (
>         select bbl,geom_2263,
>             st_valuecount(
>                 st_clip(st_union(p.rast), geom_2263),1, true,
> ARRAY[0,1,2,3,4,5,6,7]
>                 ) as pv
>             FROM
>                 base_rasters.landcover6in as p
>                 inner join
>                 results_scratch.polys_test
>             on st_intersects(p.rast, geom_2263)
>             group by bbl, geom_2263
>         )
> SELECT bbl, geom_2263,
>     (sum((pv).count) filter (where (pv).value =
> 0)::numeric/sum((pv).count))::numeric(5,4) as prop_na,
>     (sum((pv).count) filter (where (pv).value =
> 1)::numeric/sum((pv).count))::numeric(5,4) as prop_canopy,
>     (sum((pv).count) filter (where (pv).value =
> 2)::numeric/sum((pv).count))::numeric(5,4)  as prop_grass,
>     (sum((pv).count) filter (where (pv).value =
> 3)::numeric/sum((pv).count))::numeric(5,4)  as prop_bare,
>     (sum((pv).count) filter (where (pv).value =
> 4)::numeric/sum((pv).count))::numeric(5,4)  as prop_water,
>     (sum((pv).count) filter (where (pv).value =
> 5)::numeric/sum((pv).count))::numeric(5,4)  as prop_bldgs,
>     (sum((pv).count) filter (where (pv).value =
> 6)::numeric/sum((pv).count))::numeric(5,4)  as prop_roads,
>     (sum((pv).count) filter (where (pv).value =
> 7)::numeric/sum((pv).count))::numeric(5,4)  as prop_paved,
>     ((sum((pv).count) filter (where (pv).value = 0)) *
> 0.25)::numeric(12,2) as area_na,
>     ((sum((pv).count) filter (where (pv).value = 1)) *
> 0.25)::numeric(12,2) as area_canopy,
>     ((sum((pv).count) filter (where (pv).value = 2)) *
> 0.25)::numeric(12,2) as area_grass,
>     ((sum((pv).count) filter (where (pv).value = 3)) *
> 0.25)::numeric(12,2) as area_bare,
>     ((sum((pv).count) filter (where (pv).value = 4)) *
> 0.25)::numeric(12,2) as area_water,
>     ((sum((pv).count) filter (where (pv).value = 5)) *
> 0.25)::numeric(12,2) as area_bldgs,
>     ((sum((pv).count) filter (where (pv).value = 6)) *
> 0.25)::numeric(12,2) as area_roads,
>     ((sum((pv).count) filter (where (pv).value = 7)) *
> 0.25)::numeric(12,2) as area_paved
> from cte
> group by bbl, geom_2263
> ORDER by bbl;
> --End SQL
>
> On Wed, Jul 11, 2018 at 4:59 PM, Michael Treglia <mtreglia at gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> Sorry for multiple emails on this, but wanted to ping this list again to
>> see if there is anything obvious I'm missing or should check regarding this
>> issue. I've reiterated context/the problem/sample sql code below, and if
>> there's other info that would be helpful, please let me know.
>>
>> Thanks! Sincerely,
>> Mike
>>
>> # To reiterate:
>> ## The goal/issue:
>> I'm trying to calculate # of pixels from a categorical raster, by value,
>> within distinct polygons of a vector layer. I have code, below, that seems
>> to work accurately for small polygons, though I'm getting inaccuracies in
>> large polygons - the addition of some pixels, including some from a value
>> that aren't actually contained within a focal polygon. (I'm comparing to
>> available calculations for the same overlay, along with results from QGIS
>> Zonal Histogram tool). I'd like to get this working for this specific
>> overlay, but then be able to apply the SQL code to comparable situations.
>>
>> ##Data:
>> The polygons are parcels, ranging quite widely in size;
>> The raster dataset is a 6" land cover dataset loaded as an in-db raster
>> via raster2pgsql;
>> raster2pgsql -s 2263 -d -C -t 128x128 -M -I -l 4,16
>> "landcover_2010_nyc_05ft.img" base_rasters.landcover6in | psql -h localhost
>> -U postgres -d dbname -v ON_ERROR_STOP=1
>>
>> ## System Info:
>> I'm running  PostGIS 2.4.4 with PostgreSQL v 10.3 on Windows 7 x64
>>
>> ## Two sets of sample code that give the same result:
>> 1)
>> SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>>  FROM base_rasters.landcover6in
>>   INNER join  results_scratch.polys_test
>>   ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>>     ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1,
>> TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
>>   GROUP BY (pvc).VALUE, bbl
>>  ORDER BY bbl, (pvc).VALUE;
>>
>> 2)
>>  SELECT
>>    bbl, (value_count).value, SUM((value_count).count) AS count
>>   FROM
>>     (
>>     select
>>     bbl,
>>       rid,
>>       ST_ValueCount(
>>         ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2, 3,
>> 4, 5, 6, 7]
>>       ) value_count
>>     FROM
>>       (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
>>       (SELECT rid, rast FROM base_rasters.landcover6in) r
>>     WHERE ST_Intersects(rast, geom_2263)
>>     GROUP BY bbl, rid, geom_2263
>>     ) i
>>   GROUP BY bbl, value
>>   ORDER BY bbl, value
>>
>>
>>
>> On Fri, Jun 22, 2018 at 5:12 PM, Michael Treglia <mtreglia at gmail.com>
>> wrote:
>>
>>> Hi again all,
>>>
>>> Sorry for the multiple messages on this - realizing the query I posted
>>> was somewhat off from what I intended - below are examples that are getting
>>> at what I'm going for but exhibiting the issue I described (sometimes
>>> including pixels in counts aren't actually be within polygons [i.e., the
>>> categorical value isn't actually within the polygon based on visual
>>> inspection & this same analysis with other tools]).
>>>
>>> Simpler/quicker:
>>>
>>>   SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>>>  FROM base_rasters.landcover6in
>>>   INNER join  results_scratch.polys_test
>>>   ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>>>     ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1,
>>> TRUE, ARRAY[1, 2, 3, 4, 5, 6, 7]) AS pvc
>>>   GROUP BY (pvc).VALUE, bbl
>>>  ORDER BY bbl, (pvc).VALUE;
>>>
>>>
>>>
>>> A longer, less efficient version that gives the same results (but unions
>>> the clipped raster tiles):
>>>
>>>
>>>  SELECT
>>>    bbl, (value_count).value, SUM((value_count).count) AS count
>>>   FROM
>>>     (
>>>     select
>>>     bbl,
>>>       rid,
>>>       ST_ValueCount(
>>>         ST_Union(ST_Clip(rast, geom_2263, TRUE)), 1, TRUE, ARRAY[1, 2,
>>> 3, 4, 5, 6, 7]
>>>       ) value_count
>>>     FROM
>>>       (SELECT bbl, geom_2263 FROM results_scratch.polys_test) v,
>>>       (SELECT rid, rast FROM base_rasters.landcover6in) r
>>>     WHERE ST_Intersects(rast, geom_2263)
>>>     GROUP BY bbl, rid, geom_2263
>>>     ) i
>>>   GROUP BY bbl, value
>>>   ORDER BY bbl, value
>>>
>>> On Thu, Jun 21, 2018 at 4:23 PM, Michael Treglia <mtreglia at gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm basically trying to compute # of pixels of land cover type by
>>>> polygon, for lots of polygons.
>>>>
>>>> I'm working with a query a query like this:
>>>> SELECT  bbl, (pvc).VALUE, SUM((pvc).COUNT) AS tot_pix
>>>>  FROM base_rasters.landcover6in
>>>>   INNER join  results_scratch.polys_test
>>>>   ON ST_Intersects(landcover6in.rast, polys_test.geom_2263),
>>>>     ST_ValueCount(ST_Clip(landcover6in.rast,polys_test.geom_2263),1) AS
>>>> pvc
>>>>   GROUP BY (pvc).VALUE, bbl
>>>>  ORDER BY (pvc).VALUE;
>>>>
>>>> For smaller polygons, spot-checking, it looks like I'm generally
>>>> getting the right results. However, for larger polygons, not always. For
>>>> example, though a polygon might not have roads, I'm getting non-0 counts
>>>> for road pixels within a polygon (polygons have unique identifiers as
>>>> 'bbl').  The total area of pixels counted ends up being larger than the
>>>> polygon (larger than would be attributable to differences between polygons
>>>> edges & pixels)
>>>>
>>>> I'm thinking it's an issue when there are multiple raster tiles
>>>> included... but not entirely sure. Any suggestions on whether there's
>>>> something off in this query that might be causing an issue like I describe?
>>>>
>>>> Thanks in advance for any suggestions!
>>>> Mike
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20180905/8d7cb5a8/attachment.html>


More information about the postgis-users mailing list