[postgis-devel] [postgis-users] Large GeoTIFF ingestion by PostGIS

Sun May 6 21:09:44 PDT 2018

> Regina,

> Thank you for answering and sharing the experience.

> Storing a large raster in the database as a single raster record is pretty much out of the question.  You'd quickly exceed the max rowxcol pixel allowed.  I forget exactly what that limit > is.  In addition  the spatial index on the rasters would be useless so any join operation you do or filtering by spatial location will be painfully slow.

> I could guess that "rowxcol" probably should be read as "row x
> (multiply) col", i.e. the number of pixels per one raster object

Correct sorry forgot that row has two meanings (a db row, vs. rowxcol when talking about rasters).

>
> What kind of operations are you planning to do with the rasters?  That will determine the tile size you should use.

> The main workload will be reclassification, map algebra, resampling and similar operations.
> However, if unary operations like interpolation are easy to express (just run the operation on each table row), others are not.
> For example, I would like to subtract one raster of another: A - B.
> We import big rasters as tiles that are stored in a table with plenty of rows.
> If A and B cover different spatial areas and have different resolutions should I
> 1) find the common bounding box for A and B, say bbox
> 2) clip A and B using the bbox (A and B can be already in-db, so this should be slower than out-db; this is actually a step of a more complex query)
> 3) resample A or B to the resolution of B or A (some common resolution)
> 4) finally subtract A - B
> which tiles should I subtract from each other?
> how can I be sure that tiles with the same rid from A and B cover exactly the same spatial extent?
> is there any better way to accomplish such a simple operation like the difference?
> will spatial indexing or any other facility help somehow in this situation?

> I suppose most difficulties arise from a raster to be dissolved into separate tiles within a single table leading to complex SQL queries.
> Am I right or there are simpler ways to work with large rasters?

A lot of questions here.  I would suspect you can skip the ST_Clip step, though you might want to verify with timings.

So basic query would look something like below not tested

https://postgis.net/docs/manual-dev/RT_ST_MapAlgebra_expr.html
https://postgis.net/docs/manual-dev/RT_ST_Union.html
https://postgis.net/docs/manual-dev/RT_ST_Resample.html

SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(ST_Union(B.rast), A.rast), 1, '[rast1.val] - [rast2.val]')  AS new_rast
FROM A INNER JOIN B ON ST_Intersects(A.rast, B.rast)
GROUP BY A.rid;

Ideally you'd try to tile you raster such that each A.rast doesn't have too many  B's to union and you'd want you’re a's and B's to be probably like 128x128 so memory footprint is low.

If you get your tiles exactly the same, you could dispense with the ST_Union and also use the much more efficient bbox equality operation ~=

Reducing the query to 

SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(B.rast, A.rast), 1, '[rast1.val] - [rast2.val]')  AS new_rast
FROM A INNER JOIN B ON A.rast ~= B.rast;