[postgis-users] [postgis-devel] Large GeoTIFF ingestion by PostGIS

Regina Obe lr at pcorp.us
Tue May 8 11:39:20 PDT 2018


Let's clarify things a bit.  In context of PostGIS raster, spatially aligned means grid alignment which means

2 tiles have the same pixel size,skew,  srid, offset so essentially they are drawn on the same chess board as determined by this function and does not mean they represent the same tile of space.


ST_Union as I recall, will fail if the two tiles cannot be drawn on the same chess board (not aligned in raster spatial speak).  So that is why you need to align them first using something like ST_Resample or ST_Transform https://postgis.net/docs/manual-dev/RT_ST_Transform.html  (feeding ST_Transform a reference raster).  All use GDAL WARP algorithms behind the scenes.  

1) That said if the alignments are not the same ST_Union will throw an error.  Let me know if it doesn't as that sounds suspicious since ST_Union is a pixel by pixel operation so can't work with mismatched pixels.

If the tiles do not completely coincide e.g.  B is to the right of A or partially to the right of A, then you get a bigger raster tile as the output of ST_Union.
The default operation of ST_Union when not specified is to use the 'LAST' operation, which means the last raster tile with a pixel value in a given location is the pixel value that is used for that pixel slot.  

Note you can override this with FIRST, MIN, MAX, COUNT, SUM, MEAN, RANGE as noted here

Example say tile A has a pixel in positon pX1Y1 with value 1, 2 in pX2Y2
Tile B has a pixel value in position pX1Y1 with value 2,  and  no value in pX2Y2, and a value of 8 in pX3Y3

-- your resulting Tile C  will have an extent that at least covers (pX1Y1, pX2Y2, pX3Y3) and in those slots would have  pX1Y1 = 2, pX2Y2 = 2, PX3Y3 = 8

Which operation is used in most cases, only comes into play, if two pixel values in your set share the same space as in the case of pixel pX1Y1 in the example above. (the second tile B value was used)
In most cases as in Unioning a set of tiles that belong to the same coverage, each pixel is only represented in one tile, so  Union(pix A , pix B) -> will use the pixel that has a value
And the resulting is a bigger tile (the union of space of both tiles in pixels that have  values)

Which tile is considered LAST/FIRST is something people generally don't bother controlling because they are often dealing with raster coverages where there are no overlapping pixels anyway.

If you did need to control it though as in case where you do have overlapping pixels (like in case where you have observation data and want the last observation for an area to rule), you can do so by applying an order operation
For example.

SELECT ST_Union(A.rast ORDER BY A.date_observation DESC)   

2) When you did a load of your raster using raster2pgsql if you used the -I switch 

It should have created a spatial index.  If you need to do after the fact, you would use the ST_ConvexHull function.  

Raster::geometry really is ST_ConvexHull(rast) and what all the raster spatial operators use.

CREATE INDEX idx_your_rast_table_rast_gist ON your_rast_table USING gist( ST_ConvexHull(rast) );

Does bbox operation ~= make sense for the above case (tiles are not spatially aligned and there is no equal bbox in B for a bbox in A)?

No it doesn't ~= only works if your tiles have the same bounding box.
So you should use ST_Intersects or &&.  I forget the differences between the ST_Intersects and && . ST_Intersects I think you can pass in a ignore nodata and so if a tile is not completely filled, it would return different from && but a bit slower, ST_Intersects also since it works against the convex hull it can deal with skewed rasters much more efficiently whereas && would only work against bounding box of the skewed raster.

4) A storage question: does PostGIS guarantees to persist tiles on disk when I issue SELECT ... INTO (save the result to a new table)? Are there any related read/write caches for raster data?

No special read/write caches for raster - it uses temp buffers, shared buffers etc just like any other data type.

It will persist the raster data if it is an in-db raster.  If it's an out-db raster, you are just copying the meta data.

However if you apply an operation such as  ST_Clip or ST_Reclass, your out-db gets copied into the db.

I'm going to try to update the docs to reflect which functions have that behavior and which ones do not.

e.g. if you do ST_Tile, your out-db is still out-db, it's just that each tile ends up referencing a smaller part of the file.

Hope that helps,

-----Original Message-----
From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Antonio Rodriges
Sent: Tuesday, May 08, 2018 12:38 PM
To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
Cc: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
Subject: Re: [postgis-users] [postgis-devel] Large GeoTIFF ingestion by PostGIS

Thank you, I think I got the main pattern of making a binary operation However, some things need a bit more in-depth understanding

Please, consider the following example
+ -- + and | are borders of a tile of A (the picture below depicts 1 
+ tile) == + and || are borders of a tile of B (the picture below 
+ depicts 4 tiles)

||           ||          ||
||     +---||----+    ||
||     |     ||     |     ||
||     |     ||     |     ||
||     +---||----+    ||
||           ||          ||

What happens if tile sizes of A and B do not coincide and when they are spatially not aligned:
Does ST_Union(B.rast) become a large raster consisting of 4 tiles?
(borders are === and ||)

Should I also pre-create some indexes to accelerate ST_Intersects(A.rast, B.rast)? Which indexes may help?

Does bbox operation ~= make sense for the above case (tiles are not spatially aligned and there is no equal bbox in B for a bbox in A)?

A storage question: does PostGIS guarantees to persist tiles on disk when I issue SELECT ... INTO (save the result to a new table)? Are there any related read/write caches for raster data?

2018-05-07 7:09 GMT+03:00 Regina Obe <lr at pcorp.us>:
>> Regina,
>> Thank you for answering and sharing the experience.
>> Storing a large raster in the database as a single raster record is pretty much out of the question.  You'd quickly exceed the max rowxcol pixel allowed.  I forget exactly what that limit > is.  In addition  the spatial index on the rasters would be useless so any join operation you do or filtering by spatial location will be painfully slow.
>> I could guess that "rowxcol" probably should be read as "row x
>> (multiply) col", i.e. the number of pixels per one raster object
> Correct sorry forgot that row has two meanings (a db row, vs. rowxcol when talking about rasters).
>> What kind of operations are you planning to do with the rasters?  That will determine the tile size you should use.
>> The main workload will be reclassification, map algebra, resampling and similar operations.
>> However, if unary operations like interpolation are easy to express (just run the operation on each table row), others are not.
>> For example, I would like to subtract one raster of another: A - B.
>> We import big rasters as tiles that are stored in a table with plenty of rows.
>> If A and B cover different spatial areas and have different 
>> resolutions should I
>> 1) find the common bounding box for A and B, say bbox
>> 2) clip A and B using the bbox (A and B can be already in-db, so this 
>> should be slower than out-db; this is actually a step of a more 
>> complex query)
>> 3) resample A or B to the resolution of B or A (some common 
>> resolution)
>> 4) finally subtract A - B
>> which tiles should I subtract from each other?
>> how can I be sure that tiles with the same rid from A and B cover exactly the same spatial extent?
>> is there any better way to accomplish such a simple operation like the difference?
>> will spatial indexing or any other facility help somehow in this situation?
>> I suppose most difficulties arise from a raster to be dissolved into separate tiles within a single table leading to complex SQL queries.
>> Am I right or there are simpler ways to work with large rasters?
> A lot of questions here.  I would suspect you can skip the ST_Clip step, though you might want to verify with timings.
> So basic query would look something like below not tested
> https://postgis.net/docs/manual-dev/RT_ST_MapAlgebra_expr.html
> https://postgis.net/docs/manual-dev/RT_ST_Union.html
> https://postgis.net/docs/manual-dev/RT_ST_Resample.html
> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, 
> ST_Resample(ST_Union(B.rast), A.rast), 1, '[rast1.val] - [rast2.val]')  
> AS new_rast FROM A INNER JOIN B ON ST_Intersects(A.rast, B.rast) GROUP 
> BY A.rid;
> Ideally you'd try to tile you raster such that each A.rast doesn't have too many  B's to union and you'd want you re a's and B's to be probably like 128x128 so memory footprint is low.
> If you get your tiles exactly the same, you could dispense with the 
> ST_Union and also use the much more efficient bbox equality operation 
> ~=
> Reducing the query to
> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(B.rast, A.rast), 
> 1, '[rast1.val] - [rast2.val]')  AS new_rast FROM A INNER JOIN B ON 
> A.rast ~= B.rast;
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
postgis-users mailing list
postgis-users at lists.osgeo.org

More information about the postgis-users mailing list