[postgis-devel] [postgis-users] Large GeoTIFF ingestion by PostGIS

Tue May 8 09:38:24 PDT 2018

Thank you, I think I got the main pattern of making a binary operation
However, some things need a bit more in-depth understanding

Please, consider the following example
Legend:
+ -- + and | are borders of a tile of A (the picture below depicts 1 tile)
+ == + and || are borders of a tile of B (the picture below depicts 4 tiles)

+=====+=====+
||           ||          ||
||     +---||----+    ||
||     |     ||     |     ||
+=====+=====+
||     |     ||     |     ||
||     +---||----+    ||
||           ||          ||
+=====+=====+

What happens if tile sizes of A and B do not coincide and when they
are spatially not aligned:
Does ST_Union(B.rast) become a large raster consisting of 4 tiles?
(borders are === and ||)

Should I also pre-create some indexes to accelerate
ST_Intersects(A.rast, B.rast)? Which indexes may help?

Does bbox operation ~= make sense for the above case (tiles are not
spatially aligned and there is no equal bbox in B for a bbox in A)?

A storage question: does PostGIS guarantees to persist tiles on disk
when I issue SELECT ... INTO (save the result to a new table)? Are
there any related read/write caches for raster data?

2018-05-07 7:09 GMT+03:00 Regina Obe <lr at pcorp.us>:
>
>> Regina,
>
>> Thank you for answering and sharing the experience.
>
>> Storing a large raster in the database as a single raster record is pretty much out of the question.  You'd quickly exceed the max rowxcol pixel allowed.  I forget exactly what that limit > is.  In addition  the spatial index on the rasters would be useless so any join operation you do or filtering by spatial location will be painfully slow.
>
>> I could guess that "rowxcol" probably should be read as "row x
>> (multiply) col", i.e. the number of pixels per one raster object
>
> Correct sorry forgot that row has two meanings (a db row, vs. rowxcol when talking about rasters).
>
>>
>> What kind of operations are you planning to do with the rasters?  That will determine the tile size you should use.
>
>> The main workload will be reclassification, map algebra, resampling and similar operations.
>> However, if unary operations like interpolation are easy to express (just run the operation on each table row), others are not.
>> For example, I would like to subtract one raster of another: A - B.
>> We import big rasters as tiles that are stored in a table with plenty of rows.
>> If A and B cover different spatial areas and have different resolutions should I
>> 1) find the common bounding box for A and B, say bbox
>> 2) clip A and B using the bbox (A and B can be already in-db, so this should be slower than out-db; this is actually a step of a more complex query)
>> 3) resample A or B to the resolution of B or A (some common resolution)
>> 4) finally subtract A - B
>> which tiles should I subtract from each other?
>> how can I be sure that tiles with the same rid from A and B cover exactly the same spatial extent?
>> is there any better way to accomplish such a simple operation like the difference?
>> will spatial indexing or any other facility help somehow in this situation?
>
>> I suppose most difficulties arise from a raster to be dissolved into separate tiles within a single table leading to complex SQL queries.
>> Am I right or there are simpler ways to work with large rasters?
>
> A lot of questions here.  I would suspect you can skip the ST_Clip step, though you might want to verify with timings.
>
> So basic query would look something like below not tested
>
> https://postgis.net/docs/manual-dev/RT_ST_MapAlgebra_expr.html
> https://postgis.net/docs/manual-dev/RT_ST_Union.html
> https://postgis.net/docs/manual-dev/RT_ST_Resample.html
>
>
> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(ST_Union(B.rast), A.rast), 1, '[rast1.val] - [rast2.val]')  AS new_rast
> FROM A INNER JOIN B ON ST_Intersects(A.rast, B.rast)
> GROUP BY A.rid;
>
>
> Ideally you'd try to tile you raster such that each A.rast doesn't have too many  B's to union and you'd want you’re a's and B's to be probably like 128x128 so memory footprint is low.
>
> If you get your tiles exactly the same, you could dispense with the ST_Union and also use the much more efficient bbox equality operation ~=
>
> Reducing the query to
>
> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(B.rast, A.rast), 1, '[rast1.val] - [rast2.val]')  AS new_rast
> FROM A INNER JOIN B ON A.rast ~= B.rast;
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users