[postgis-users] [postgis-devel] Large GeoTIFF ingestion by PostGIS

Antonio Rodriges antonio.rrz at gmail.com
Sat May 19 01:48:59 PDT 2018


Regina,

Thank you very much for your time and thorough description of the
PostGIS functionality! This helps a lot!
Sorry for a bit late response.

I now understand the details related to spatial alignment, but what is skew?

It would be great also to clarify the behaviour of ST_Union for the
case of different tile sizes.
I am repeating the query and the illustration for completeness.

SELECT  A.rid,  ST_MapAlgegra( A.rast, 1,
ST_Resample(ST_Union(B.rast), A.rast), 1, '[rast1.val] - [rast2.val]')
 AS new_rast
FROM A INNER JOIN B ON ST_Intersects(A.rast, B.rast)
GROUP BY A.rid;

Legend:
+ -- + and | are borders of a tile of A (the picture below depicts 1 tile)
+ == + and || are borders of a tile of B (the picture below depicts 4 tiles)

+=====+=====+
||           ||          ||
||     +---||----+    ||
||     |     ||     |     ||
+=====+=====+
||     |     ||     |     ||
||     +---||----+    ||
||           ||          ||
+=====+=====+

Now, to compute a single resulting tile, PostGIS needs to UNION 4 tiles of B.
What happens next?
A) does PostGIS leave this big tile in the resulting table as is?
If this is the case, we would have many large overlapping tiles and
much increased, redundant storage.
B) another way is to remove nodatavalues after ST_MapAlgebra and we
will get a tile with a spatial extent not exceeding tile A.
C) or we could re-tile B *before* UNION so that each tile from B has a
respective tile from A (equal bounding boxes)
D) another option is to re-tile the resulting tiles afer ST_MapAlgegra
eliminating overlapping areas and getting smaller tiles

Which case of the above is the default PostGIS behaviour?
Can RT_Retile help for cases (C) or (D)? I did not find an example in
the documentation: https://postgis.net/docs/RT_Retile.html

> However if you apply an operation such as  ST_Clip or ST_Reclass, your out-db gets copied into the db.
Why this is the case? Is this done to simplify the development?
Actually we could store out-db a clipped or reclassified raster,
couldn't we?

Thank you!
Antonio

2018-05-08 21:39 GMT+03:00 Regina Obe <lr at pcorp.us>:
> Antonio,
>
> Let's clarify things a bit.  In context of PostGIS raster, spatially aligned means grid alignment which means
>
> 2 tiles have the same pixel size,skew,  srid, offset so essentially they are drawn on the same chess board as determined by this function and does not mean they represent the same tile of space.
>
> http://postgis.net/docs/manual-dev/RT_ST_SameAlignment.html
>
>
> ST_Union as I recall, will fail if the two tiles cannot be drawn on the same chess board (not aligned in raster spatial speak).  So that is why you need to align them first using something like ST_Resample or ST_Transform https://postgis.net/docs/manual-dev/RT_ST_Transform.html  (feeding ST_Transform a reference raster).  All use GDAL WARP algorithms behind the scenes.
>
> 1) That said if the alignments are not the same ST_Union will throw an error.  Let me know if it doesn't as that sounds suspicious since ST_Union is a pixel by pixel operation so can't work with mismatched pixels.
>
> If the tiles do not completely coincide e.g.  B is to the right of A or partially to the right of A, then you get a bigger raster tile as the output of ST_Union.
> The default operation of ST_Union when not specified is to use the 'LAST' operation, which means the last raster tile with a pixel value in a given location is the pixel value that is used for that pixel slot.
>
> Note you can override this with FIRST, MIN, MAX, COUNT, SUM, MEAN, RANGE as noted here
> https://postgis.net/docs/manual-dev/RT_ST_Union.html
>
> Example say tile A has a pixel in positon pX1Y1 with value 1, 2 in pX2Y2
> Tile B has a pixel value in position pX1Y1 with value 2,  and  no value in pX2Y2, and a value of 8 in pX3Y3
>
> -- your resulting Tile C  will have an extent that at least covers (pX1Y1, pX2Y2, pX3Y3) and in those slots would have  pX1Y1 = 2, pX2Y2 = 2, PX3Y3 = 8
>
> Which operation is used in most cases, only comes into play, if two pixel values in your set share the same space as in the case of pixel pX1Y1 in the example above. (the second tile B value was used)
> In most cases as in Unioning a set of tiles that belong to the same coverage, each pixel is only represented in one tile, so  Union(pix A , pix B) -> will use the pixel that has a value
> And the resulting is a bigger tile (the union of space of both tiles in pixels that have  values)
>
> Which tile is considered LAST/FIRST is something people generally don't bother controlling because they are often dealing with raster coverages where there are no overlapping pixels anyway.
>
> If you did need to control it though as in case where you do have overlapping pixels (like in case where you have observation data and want the last observation for an area to rule), you can do so by applying an order operation
> For example.
>
> SELECT ST_Union(A.rast ORDER BY A.date_observation DESC)
> FROM A;
>
> 2) When you did a load of your raster using raster2pgsql if you used the -I switch
> https://postgis.net/docs/manual-dev/using_raster_dataman.html#RT_Raster_Loader
>
> It should have created a spatial index.  If you need to do after the fact, you would use the ST_ConvexHull function.
>
> Raster::geometry really is ST_ConvexHull(rast) and what all the raster spatial operators use.
>
> CREATE INDEX idx_your_rast_table_rast_gist ON your_rast_table USING gist( ST_ConvexHull(rast) );
>
>
> 3)
> Does bbox operation ~= make sense for the above case (tiles are not spatially aligned and there is no equal bbox in B for a bbox in A)?
>
> No it doesn't ~= only works if your tiles have the same bounding box.
> So you should use ST_Intersects or &&.  I forget the differences between the ST_Intersects and && . ST_Intersects I think you can pass in a ignore nodata and so if a tile is not completely filled, it would return different from && but a bit slower, ST_Intersects also since it works against the convex hull it can deal with skewed rasters much more efficiently whereas && would only work against bounding box of the skewed raster.
>
>
> 4) A storage question: does PostGIS guarantees to persist tiles on disk when I issue SELECT ... INTO (save the result to a new table)? Are there any related read/write caches for raster data?
>
> No special read/write caches for raster - it uses temp buffers, shared buffers etc just like any other data type.
>
> It will persist the raster data if it is an in-db raster.  If it's an out-db raster, you are just copying the meta data.
>
> However if you apply an operation such as  ST_Clip or ST_Reclass, your out-db gets copied into the db.
>
> I'm going to try to update the docs to reflect which functions have that behavior and which ones do not.
>
> e.g. if you do ST_Tile, your out-db is still out-db, it's just that each tile ends up referencing a smaller part of the file.
>
>
>
> Hope that helps,
> Regina
>
>
>
>
>
> -----Original Message-----
> From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Antonio Rodriges
> Sent: Tuesday, May 08, 2018 12:38 PM
> To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
> Cc: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
> Subject: Re: [postgis-users] [postgis-devel] Large GeoTIFF ingestion by PostGIS
>
> Thank you, I think I got the main pattern of making a binary operation However, some things need a bit more in-depth understanding
>
> Please, consider the following example
> Legend:
> + -- + and | are borders of a tile of A (the picture below depicts 1
> + tile) == + and || are borders of a tile of B (the picture below
> + depicts 4 tiles)
>
> +=====+=====+
> ||           ||          ||
> ||     +---||----+    ||
> ||     |     ||     |     ||
> +=====+=====+
> ||     |     ||     |     ||
> ||     +---||----+    ||
> ||           ||          ||
> +=====+=====+
>
> What happens if tile sizes of A and B do not coincide and when they are spatially not aligned:
> Does ST_Union(B.rast) become a large raster consisting of 4 tiles?
> (borders are === and ||)
>
> Should I also pre-create some indexes to accelerate ST_Intersects(A.rast, B.rast)? Which indexes may help?
>
> Does bbox operation ~= make sense for the above case (tiles are not spatially aligned and there is no equal bbox in B for a bbox in A)?
>
> A storage question: does PostGIS guarantees to persist tiles on disk when I issue SELECT ... INTO (save the result to a new table)? Are there any related read/write caches for raster data?
>
> 2018-05-07 7:09 GMT+03:00 Regina Obe <lr at pcorp.us>:
>>
>>> Regina,
>>
>>> Thank you for answering and sharing the experience.
>>
>>> Storing a large raster in the database as a single raster record is pretty much out of the question.  You'd quickly exceed the max rowxcol pixel allowed.  I forget exactly what that limit > is.  In addition  the spatial index on the rasters would be useless so any join operation you do or filtering by spatial location will be painfully slow.
>>
>>> I could guess that "rowxcol" probably should be read as "row x
>>> (multiply) col", i.e. the number of pixels per one raster object
>>
>> Correct sorry forgot that row has two meanings (a db row, vs. rowxcol when talking about rasters).
>>
>>>
>>> What kind of operations are you planning to do with the rasters?  That will determine the tile size you should use.
>>
>>> The main workload will be reclassification, map algebra, resampling and similar operations.
>>> However, if unary operations like interpolation are easy to express (just run the operation on each table row), others are not.
>>> For example, I would like to subtract one raster of another: A - B.
>>> We import big rasters as tiles that are stored in a table with plenty of rows.
>>> If A and B cover different spatial areas and have different
>>> resolutions should I
>>> 1) find the common bounding box for A and B, say bbox
>>> 2) clip A and B using the bbox (A and B can be already in-db, so this
>>> should be slower than out-db; this is actually a step of a more
>>> complex query)
>>> 3) resample A or B to the resolution of B or A (some common
>>> resolution)
>>> 4) finally subtract A - B
>>> which tiles should I subtract from each other?
>>> how can I be sure that tiles with the same rid from A and B cover exactly the same spatial extent?
>>> is there any better way to accomplish such a simple operation like the difference?
>>> will spatial indexing or any other facility help somehow in this situation?
>>
>>> I suppose most difficulties arise from a raster to be dissolved into separate tiles within a single table leading to complex SQL queries.
>>> Am I right or there are simpler ways to work with large rasters?
>>
>> A lot of questions here.  I would suspect you can skip the ST_Clip step, though you might want to verify with timings.
>>
>> So basic query would look something like below not tested
>>
>> https://postgis.net/docs/manual-dev/RT_ST_MapAlgebra_expr.html
>> https://postgis.net/docs/manual-dev/RT_ST_Union.html
>> https://postgis.net/docs/manual-dev/RT_ST_Resample.html
>>
>>
>> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1,
>> ST_Resample(ST_Union(B.rast), A.rast), 1, '[rast1.val] - [rast2.val]')
>> AS new_rast FROM A INNER JOIN B ON ST_Intersects(A.rast, B.rast) GROUP
>> BY A.rid;
>>
>>
>> Ideally you'd try to tile you raster such that each A.rast doesn't have too many  B's to union and you'd want you re a's and B's to be probably like 128x128 so memory footprint is low.
>>
>> If you get your tiles exactly the same, you could dispense with the
>> ST_Union and also use the much more efficient bbox equality operation
>> ~=
>>
>> Reducing the query to
>>
>> SELECT  A.rid,  ST_MapAlgegra( A.rast, 1, ST_Resample(B.rast, A.rast),
>> 1, '[rast1.val] - [rast2.val]')  AS new_rast FROM A INNER JOIN B ON
>> A.rast ~= B.rast;
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users


More information about the postgis-users mailing list