Composing raster tiles?

Sat Nov 9 03:39:16 PST 2024

On Sat, 9 Nov 2024 at 11:52, Regina Obe <lr at pcorp.us> wrote:

> The main overhead I see with postgis raster are the following
>
>
>
>    1. Limitation of raster sizes of about 1GB as I recall
>    2. Detoasting of data which is almost always needed
>    https://www.postgresql.org/docs/current/storage-toast.html
>    3. Mem copying of data, probably more so than using say GDAL directly
>    4. Probably easier to do parallel operations on command line (though
>    admittedly I haven’t tried)
>
>
In a recent experience where I have to join vector data with lot of raster
data (stored on S3, with resolutions from 0.25 to 0.0027°, some of them
about 200GB), I found the most efficient way was to keep rasters as files
(COG, ZSTD-compressed) and to perform the raster query using
gdallocationinfo (that can process millions of coords in seconds (last run:
COPY 1 333 259  3.983s total).

The process consist of computing the pixels under each vector geometry (and
only under geoms, with st_squareGrid and st_snapToGrid for polygons, and
just st_segmentize and st_snapToGrid for lines) using raster metadata
(resolution, origin), and to feed these pixel centroids to
gdallocationinfo, storing the result back to postgis.
It is very easy to parallelize the process with bash and gnu parallel and I
was able to process ~40M polygon buildings against ~270 TIFF files at ~30m
resolution containing flood data in about 8 hours on a macbook pro
connecting to S3 for raster data. (complete process intersects polygons
with each raster pixel and performs global statistics)

An example of command line calling gdallocationinfo from vector table could
be, using a result table like this:

drop table if exists gdal_res;
create unlogged table gdal_res (
    val float,
    fid int,
    cgrid_id int primary key
);

Then in bash

time psql -X --no-align --quiet --tuples-only --field-separator " "
--pset footer=off -d mydb -c "
    select st_x(st_centroid(geom)), st_y(st_centroid(geom)), fid, id
    from cgrid" | gdallocationinfo -field_sep " " -valonly -geoloc \
    S3:/bucket/dem09.tif | psql -X -d mydb -c "COPY gdal_res from
STDIN NULL '' delimiter ' '"

In this way, no intermediate data is written to disk.
Pixels seen as polygon or bboxes can then be processed against vector data.
gdallocationinfo in echo mode (-E option) allows you to pass extra
information aside from x-y coordinates, like feature identifiers.
These ids can be used in Postgis to JOIN against vector features very
easily/efficiently.
Compared to python approach reading data with Rasterio for instance,
performance was waaayyy better.

Nicolas

Admittedly my experience with using GDAL and python directly is very low
> compared to doing everything in the database so I really don’t know the
> benefits of doing it all in GDAL.
>
>
>
> I still prefer to use the database hammer over having to learn yet another
> tool.  My sense is also that the amount of SQL/plpgsql code you have to
> write would be much less than trying to pull your stuff out of the DB and
> stuff it in python/GDAL to do the same thing.  This is all to say if the
> performance is good enough for you, I say go for that hammer even if you
> think you are just dealing with a nail.
>
>
>
>
>
>
>
> *From:* Thiemo Kellner <thiemo at gelassene-pferde.biz>
> *Sent:* Friday, November 8, 2024 12:16 PM
> *To:* PostGIS Users <postgis-users at lists.osgeo.org>
> *Subject:* Re: Composing raster tiles?
>
>
>
> Thanks for sharing your experience. Is that saying that PostGIS has
> considerable overhead? Sorry, if that is naiv. I might be suffering from
> the hammer nail deficiency. My background is very database heavy, so too
> much might look like a database to me.
>
> 08.11.2024 16:15:32 Vera Green <vera.green.ca at gmail.com>:
>
> We use command line GDAL for all our Easter processes, if your data is
> large I recommend you look into that option.
>
>
>
> On Fri, Nov 8, 2024, 6:12 AM David Haynes <haynesd2 at gmail.com> wrote:
>
> I'll try and help you with A & B
>
>
>
> a) Is it more efficient to convert the raster to vector data and calculate
> on the those than to calculate directly on the raster?
>
> >> I don't think it would be faster to convert to vector because that
> would be dumping the raster into polygons. I think for
> specifically calculating slope, you are better off staying in raster.
>
> b) To my understanding, if I calculate the slope on a raster tile,
> the slope,… of the borders will have accuracy problems. My I idea, was
> to "stitch" a tile with its direct neighbours, calculate on the composed
> tile, and either save a cropped calculated composed tile to its  original
> dimension or save the calculated composed tile as is,  probably the latter.
>
>
>
> >> Overall correct, some small adjustments. For any raster operation, the
> tiles are operated on independently, so you need to "stitch" them together.
> I've provided a couple of ways in pseduo code
>
>
>
> Way 1
>
> 1) Use ST_Union and make a big tile,
>
> 2) Do the spatial operation,
>
> 3) Break it up using ST_Tile()
> The downside is you might run out of memory doing this. Also consider
> ST_MemUnion()
>
>
>
> SELECT ST_Tile(ST_Slope(ST_Union(r1.ras)), 350,350)
>
> FROM raster_table
>
>
>
> Way 2)
>
> A second option is to basically create an aggregate, which is likely
> faster.
>
> 1) Union the tiles based on ST_Touch  - make mega_tiles
>
> 2) Do the spatial operation on the mega_tiles
>
> 3) Clip the mega_tiles by the old tiles bounding box
>
>
>
> WITH rtest as
> (
> SELECT r1.ras, ST_Union (r1.ras) as megatile
> FROM raster_table r1
> LEFT JOIN raster_table r2
> ON ST_Touches (r1.geom, r2.geom)
> GROUP BY r1.ras
> )
> SELECT ST_CLIP(ST_SLOPE(megatile), ST_Envelop(r1.rast) ) as ras
> FROM rtest
>
>
>
> Maybe someone wants to make an aggregate for the raster functions?
>
>
>
> On Thu, Nov 7, 2024 at 3:31 PM <thiemo at gelassene-pferde.biz> wrote:
>
> Hi
>
> In my project
>
> https://sourceforge.net/p/treintaytres/code/HEAD/tree/trunk/code_files/data_storage/
> I have the
> table
>
> TABLE_SCHEMA    TABLE_NAME      DATA_TYPE       TYPE_NAME
>  COLUMN_NAME
> treintaytres    topo_files␟t    1111            uuid            id
> treintaytres    topo_files␟t    93              timestamptz     entry_pit
> treintaytres    topo_files␟t    1111            uuid            source_id
> treintaytres    topo_files␟t    12              text            file_name
> treintaytres    topo_files␟t    1111            raster          tile
> treintaytres    topo_files␟t    93              timestamptz
>  file_creation_pit
> treintaytres    topo_files␟t    12              text            file_hash
>
> TILE contains topographical height raster data from OpenTopography.
> They are from different regions, let's say, some tiles cover
> Switzerland, some cover New Zealand. I want to create slope and other
> data from the height data and I have some questions I hope you can
> answer or point me to answers.
>
> a) Is it more efficient to convert the raster to vector data and
> calculate on the those than to calculate directly on the raster?
>
> b) To my understanding, if I calculate the slope on a raster tile, the
> slope,… of the borders will have accuracy problems. My I idea, was to
> "stitch" a tile with its direct neighbours, calculate on the composed
> tile, and either save a cropped calculated composed tile to its
> original dimension or save the calculated composed tile as is,
> probably the latter.
> Can I compose as follows?
> with RASTER_NEIGHBORS as (         select R1.TILE   as CURRENT_TILE
>                                           ,R2.TILE   as NEIGHBOR_TILE
>                                           ,R1.ID     as CURRENT_ID
>                                       from TOPO_FILES␟T R1
>                            left outer join TOPO_FILES␟T R2
>                                         on ST_Touches(R1.TILE
>                                                      ,R2.TILE)
>                                         or ST_Intersects(R1.TILE
>                                                         ,R2.TILE)
>                                      where TRUE
>                                        --and R1.ID =
> '6b8ca53a-bb5f-4c2b-a9c9-94b6a706e9b0'
>                                        and TRUE)
>     ,NION as (select CURRENT_TILE as TILE
>                     ,CURRENT_ID
>                 from RASTER_NEIGHBORS
>               union
>               select NEIGHBOR_TILE as TILE
>                     ,CURRENT_ID
>                 from RASTER_NEIGHBORS)
>    select ST_Union(TILE) as COMPOSED_TILE
>          ,CURRENT_ID
>      from NION
> group by CURRENT_ID;
>
> c) Finally, I want to select all the areas where slope, TRI,… conform
> certain criteria and have a minium surface size. Do I do it this
> better on vector data and do I need to do this on data composed of all
> the contiguous areas?
>
> I would be grateful for any nudge into the right direction. Maybe URLs
> with samples.
>
> Kind regards
>
> Thiemo
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20241109/0a8517d2/attachment.htm>