[postgis-devel] GSoC idea: in-situ access for rasters

Vladimir Kikhtenko kva911 at gmail.com
Thu Apr 11 01:23:00 PDT 2013


Hello, list!

I would like to present my idea for a GSoC'13 project. It is about
querying raster data without loading it into the database first.
Using Foreign Data Wrappers (a fairly new feature in Postgres 9), we
can create a foreign table for a raster file that reads data directly
from the file when a query is issued. I propose that this foreign
table contain one row for each pixel in the source file and one column
for each dataset in it. Additional columns would represent the
geolocation of the data. For example, the table could look like this:

CREATE FOREIGN TABLE foreign_raster (
    xpos int4, -- these two columns give the pixel's position in the raster
    ypos int4,
    footprint geometry OPTIONS (type 'footprint'), -- geolocation of a pixel
    layer1 float8 OPTIONS (sds 'Atmospheric Optical Depth'),
    layer2 int2 OPTIONS (sds 'Atmospheric Optical Depth Model', type 'byte')
) SERVER hvault_service
  OPTIONS (filename '/some/path/some-file.hdf');
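
To make the row-per-pixel layout concrete, here is a minimal Python
sketch of how such rows could be produced. It is only an illustration,
not the hvault code: the function name pixel_rows, the in-memory lists
standing in for file data, and the assumption of a regular north-up
grid with a known origin and pixel size are all hypothetical (real
swath data may need per-pixel geolocation instead).

```python
def pixel_rows(layers, origin_x, origin_y, pixel_w, pixel_h):
    """Yield one row per pixel: (xpos, ypos, footprint_wkt, layer values...).

    layers   -- list of 2-D lists of equal shape, one per dataset
                (hypothetical stand-in for data read from the file)
    origin_x, origin_y -- coordinates of the raster's top-left corner
    pixel_w, pixel_h   -- pixel size; assumes a north-up grid, so y
                          decreases as ypos grows
    """
    height = len(layers[0])
    width = len(layers[0][0])
    for ypos in range(height):
        for xpos in range(width):
            # Corners of this pixel in georeferenced coordinates.
            x0 = origin_x + xpos * pixel_w
            y0 = origin_y - ypos * pixel_h
            x1 = x0 + pixel_w
            y1 = y0 - pixel_h
            # Closed ring in WKT, usable as the footprint column value.
            footprint = (
                f"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, "
                f"{x0} {y1}, {x0} {y0}))"
            )
            values = tuple(layer[ypos][xpos] for layer in layers)
            yield (xpos, ypos, footprint) + values
```

Each yielded tuple corresponds to one row of the foreign table above:
the pixel position, its footprint, and one value per layer column.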

Furthermore, we can create a catalog of files, so that the foreign
table contains pixels from all of them. This approach is useful when
you have a large archive of raster data that you want to access as if
it were in the database, but do not want to create a copy of it
because of disk usage constraints. Another option is to store a
timestamp for each file in the catalog, so that we can request a time
series for a point or region of interest.

The FDW API also makes it possible to analyze the query tree before
execution, which allows additional optimizations. We can examine the
query quals and filter the catalog, so that we only read files that
contribute to the query result. For example, if we store the footprint
of each whole raster in the catalog and the query contains a clause like
ST_Contains(footprint, ST_GeometryFromText('POINT( 43.19 64.90)')), we
can process only the files whose footprint contains that point.
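
The catalog filtering described above can be sketched in Python as
follows. This is a simplified illustration under stated assumptions,
not the actual implementation: CatalogEntry and
files_containing_point are hypothetical names, and each file footprint
is reduced to a bounding box. The bounding-box test is only a
conservative prefilter; it may keep a file that an exact ST_Contains
check would reject, but it never drops a file that contains the point.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    """One catalog row: a file and the bounding box of its footprint."""
    filename: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def files_containing_point(catalog, x, y):
    """Return filenames whose bounding box includes the point (x, y).

    Inclusive comparisons are used on purpose, so that the prefilter
    errs on the side of reading a file rather than skipping it.
    """
    return [
        entry.filename
        for entry in catalog
        if entry.min_x <= x <= entry.max_x and entry.min_y <= y <= entry.max_y
    ]


# Usage with made-up file names and extents:
catalog = [
    CatalogEntry("a.hdf", 40.0, 60.0, 50.0, 70.0),
    CatalogEntry("b.hdf", 0.0, 0.0, 10.0, 10.0),
]
selected = files_containing_point(catalog, 43.19, 64.90)
```

Only the files in selected would then be opened and scanned for
matching pixels.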

I have already implemented this idea for HDF files as part of my MSc
thesis, which I will defend in June. See
https://github.com/kikht/fdb/tree/master/hvault (sorry, the code is
not very well commented yet). We are using it in our lab to access
a 100 TB archive of MODIS images. I think it is possible to implement
file access through GDAL, so that many file formats would be supported
at once.

Would you be interested in such a project? Any comments or questions
are highly appreciated.

--
Vladimir Kikhtenko
Novosibirsk State University, Russia


