[postgis-users] Extracting variable information from netcdf, imported as raster to a table

Manaswini Ganjam manu.ganjam at gmail.com
Fri Nov 17 08:06:16 PST 2023


Hi, Tobias, and Regina,

Can I implement this for multiband netcdfs, because I am pretty confident
about bands and data order as my data units for time are the number of days
since the first day of the year. For example days since 1950-01-01, band
numbers go like 1,2,3,4....365. I can find different logic and extract the
timestamp, because we have good information about this aspect from rasters
in the database. Because the scale factor and offset value is same for all
the bands?

Also, I am curious to know if there is any other function that can address
this in postgis?

Thank you,
Manaswini

On Fri, 17 Nov 2023 at 03:14, Tobias Schmid <schmid.tobi at gmail.com> wrote:

> Sorry, the Screenshot was missing.
>
> [image: image.png]
>
> Am Fr., 17. Nov. 2023 um 09:02 Uhr schrieb Tobias Schmid <
> schmid.tobi at gmail.com>:
>
>> Hello Manswini,
>> Hello Regina,
>> I have read your e-mails with great interest. Many of the described
>> problems seem familiar to me. I have been working with temporally and
>> spatially resolved (weather) raster data for over 10 years.
>>
>> Here are a few comments and recommendations:
>>
>> 1. After some missteps I stopped using multiband-raster. Why?
>>
>>    - What happens if individual days (or hours) are missing?
>>    - Can I always trust the order of the bands?
>>
>> 2. I work with 1-band grids. My database structure looks like this:
>> [image: image.png]
>>
>> 3. The import is using raster2pgsql. Here is snippet:
>>
>>   def raster2pgsql(outGtiffPath, dbSchema, dbTable, dbHost=conf.dbHost,
>>> dbPort=conf.dbPort, dbName=conf.dbName, dbUser=conf.dbUser):
>>>     try:
>>>         tableNotEmptySQL = f"SELECT EXISTS (SELECT * FROM
>>> {dbSchema}.{dbTable} LIMIT 1);"
>>>         if executeSQL(tableNotEmptySQL, fetchone=True):
>>>             cmd = f"raster2pgsql -a -F -n filename -s 4326
>>> {outGtiffPath} {dbSchema}.{dbTable} | psql -h {dbHost} -p {dbPort} -d
>>> {dbName} -U {dbUser}"
>>>         else:
>>>             cmd = f"raster2pgsql -a -C -F -n filename -s 4326
>>> {outGtiffPath} {dbSchema}.{dbTable} | psql -h {dbHost} -p {dbPort} -d
>>> {dbName} -U {dbUser}"
>>
>>
>>>         subprocess.call(cmd, shell=True)
>>
>>
>>>         return True
>>
>>
>>>     except Exception as e:
>>>         print(f"Importing '{outGtiffPath}' to postgres db failed.")
>>>         print(f"{e}\n")
>>>         with open(conf.failedRaster2pgsql, 'a') as file:
>>>             file.write(f"{datetime.now().strftime('%Y-%m-%d
>>> %X')}\t{outGtiffPath}\t{e}\n")
>>>         return False
>>
>>
>> 4. I read the parameters "scale" and "offset" via Python and write them
>> to the database.
>>
>>             hourlyRaster = gdal.Open(str(outGtiffPath))
>>>             hourlyRasterScale = hourlyRaster.GetRasterBand(1).GetScale()
>>>             hourlyRasterOffset =
>>> hourlyRaster.GetRasterBand(1).GetOffset()
>>>             hourlyRaster = None
>>
>>
>>>             insertTimestampSQL = f"SET TIMEZONE = 'UTC'; UPDATE
>>> {dataset}.{dbTable} SET timestamp_w_tz = TO_TIMESTAMP('{timeStamp}',
>>> 'YYYY-MM-DD HH24:MI'), rast_scale_factor = {hourlyRasterScale}, rast_offset
>>> = {hourlyRasterOffset} WHERE filename = '{outGtiff}';"
>>
>>
>> 5. The query:
>>
>> select timestamp_w_tz
>>> , st_value(rast, st_setsrid(st_makepoint(15, 50), 4326)) * rast_scale_factor
>>> + rast_offset as value_temperature
>>> from era5.temperature_2m
>>> where timestamp_w_tz >= '1982-09-18 00:00'
>>> and timestamp_w_tz < '1982-09-19 00:00'
>>>
>>
>> 6. The result (temperature in Kelvin. Of course.)
>>
>>> +---------------------------------+------------------+
>>> |timestamp_w_tz                   |value_temperature |
>>> +---------------------------------+------------------+
>>> |1982-09-18 00:00:00.000000 +00:00|287.70160804659935|
>>> |1982-09-18 01:00:00.000000 +00:00|287.47021164940014|
>>> |1982-09-18 02:00:00.000000 +00:00|286.0640335433434 |
>>> |1982-09-18 03:00:00.000000 +00:00|285.56675378590086|
>>> |1982-09-18 04:00:00.000000 +00:00|285.41434365889944|
>>> |1982-09-18 05:00:00.000000 +00:00|285.3075453217306 |
>>> |1982-09-18 06:00:00.000000 +00:00|286.3343668343021 |
>>> |1982-09-18 07:00:00.000000 +00:00|287.2543900097047 |
>>> |1982-09-18 08:00:00.000000 +00:00|290.88219602540977|
>>> |1982-09-18 09:00:00.000000 +00:00|292.3985099166719 |
>>> |1982-09-18 10:00:00.000000 +00:00|297.1844104010519 |
>>> |1982-09-18 11:00:00.000000 +00:00|297.37019500841853|
>>> |1982-09-18 12:00:00.000000 +00:00|297.58267920007745|
>>> |1982-09-18 13:00:00.000000 +00:00|297.85078752567847|
>>> |1982-09-18 14:00:00.000000 +00:00|298.1689575718274 |
>>> |1982-09-18 15:00:00.000000 +00:00|297.20666005462874|
>>> |1982-09-18 16:00:00.000000 +00:00|296.9697012440353 |
>>> |1982-09-18 17:00:00.000000 +00:00|294.35870439679223|
>>> |1982-09-18 18:00:00.000000 +00:00|292.8045660944494 |
>>> |1982-09-18 19:00:00.000000 +00:00|292.6032067295789 |
>>> |1982-09-18 20:00:00.000000 +00:00|289.9332483003572 |
>>> |1982-09-18 21:00:00.000000 +00:00|289.59839101402565|
>>> |1982-09-18 22:00:00.000000 +00:00|287.79616907430096|
>>> |1982-09-18 23:00:00.000000 +00:00|287.87181789646223|
>>> +---------------------------------+------------------+
>>
>>
>> I know that my example does not match your problem 100%. It is also
>> time-consuming for you to establish the 1-band grid approach. It requires
>> you to change your database structure. Sorry about that.
>>
>> Hopefully the comments will help you. You are also welcome to contact me
>> personally.
>>
>> Best regards
>> Tobias
>>
>> Am Fr., 17. Nov. 2023 um 05:32 Uhr schrieb Manaswini Ganjam via
>> postgis-users <postgis-users at lists.osgeo.org>:
>>
>>> Regina,
>>> Thank you for the suggestions, the code looks clean and the computation
>>> was much better, but I still have the same issue, the values are
>>> unrealistic, this happened to me before as well (when I tried to run
>>> st_value for one raster band, with some x and y). Find the enclosed
>>> screenshot.
>>> [image: image.png]
>>> I have test run the code for one prcp file and the prcp values are
>>> negative, and the no data value for this dataset is -32768 (most of the
>>> values are -32767 or -32750) and I have not preprocessed this data. Do we
>>> need to manually unpack this data? If that is the case, how do I access the
>>> scale_factor and add_offset attributes packed with netcdf?
>>>
>>> This information is from the climate data readme file:The data has been
>>> packed into short integers (2 bytes instead of 4 byte reals) to save space.
>>> You must unpack that data to get the correct floating point representation
>>> of the data. Each netCDF variable that has been packed has an add_offset
>>> and scale_factor attribute associated with it. Some software automatically
>>> unpacks the data when it is read.
>>> The formula to unpack the data is:
>>> unpacked value = add_offset + ( (packed value) * scale_factor )
>>>
>>> Thank you,
>>> Manaswini
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 16 Nov 2023 at 22:02, Regina Obe <lr at pcorp.us> wrote:
>>>
>>>> One more optimization.  I think you can get rid of the FOREACH
>>>> band_number too and reduce all that to this query
>>>>
>>>>
>>>>
>>>> So all this goes
>>>>
>>>>
>>>>
>>>>         band_numbers := ARRAY(SELECT generate_series(1, 366));
>>>>
>>>>         -- Loop through all bands within the file
>>>>         FOREACH band_number IN ARRAY band_numbers LOOP
>>>>
>>>> :
>>>>
>>>> LOOP
>>>>
>>>>
>>>>
>>>> Use ST_NumBands instead of relying on your rasters all having 366 bands
>>>> https://postgis.net/docs/RT_ST_NumBands.html
>>>>
>>>> So replace all the above with this:
>>>>
>>>>
>>>>
>>>>             INSERT INTO extracted_data_bbox (year, year_day, lat, lon,
>>>> prcp, tmax, tmin, observation_time)
>>>>             SELECT year_from_filename, n.band_number, ST_Y(sc.geom),
>>>> ST_X(sc.geom), CASE WHEN variable_name = 'prcp' THEN sc.val ELSE NULL END,
>>>>                     CASE WHEN variable_name = 'tmax' THEN sc.val ELSE
>>>> NULL END,
>>>>                     CASE WHEN variable_name = 'tmin' THEN sc.val ELSE
>>>> NULL END, observation_time
>>>>
>>>>            FROM  generate_series(1,
>>>> ST_NumBands(raster_record.clipped_raster) ) AS n(band_number),
>>>> ST_PixelAsCentroids(raster_record.clipped_raster, n.band_number, true) AS
>>>> sc;
>>>>
>>>>
>>>>
>>>> That will solve the issue of you going over the band count of your
>>>> raster, which might be an issue you are running into
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Regina Obe <lr at pcorp.us>
>>>> *Sent:* Thursday, November 16, 2023 9:46 PM
>>>> *To:* 'Manaswini Ganjam' <manu.ganjam at gmail.com>
>>>> *Cc:* 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
>>>> *Subject:* RE: [postgis-users] Extracting variable information from
>>>> netcdf, imported as raster to a table
>>>>
>>>>
>>>>
>>>> I don’t know much about netCDF but I would assume GDAL would handle the
>>>> packing unraveling behind the scenes.
>>>>
>>>>
>>>>
>>>> Some things I notice wrong
>>>>
>>>>
>>>>
>>>> https://postgis.net/docs/RT_ST_PixelAsCentroids.html  is a set
>>>> returning function, so you can’t just stuff it in a record variable.  It
>>>> has to go in a table.
>>>>
>>>> So I’d get rid of this line
>>>>
>>>>
>>>>
>>>>           SELECT * FROM
>>>> ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) INTO
>>>> pixel_data;
>>>>
>>>> I’d also scrap this, because ultimately you want to work on all
>>>> centroids not the first one that falls out of the tree
>>>>
>>>> EXECUTE format('SELECT ST_Value($1, $2, $3, true)',
>>>> raster_record.clipped_raster, band_number, centroid_point) INTO
>>>> variable_value USING raster_record.clipped_raster, band_number,
>>>> centroid_point;
>>>>
>>>>
>>>>
>>>> I’d change this
>>>>
>>>>
>>>>             -- Insert the extracted data into the extracted_data_bbox
>>>> table
>>>>             INSERT INTO extracted_data_bbox (year, year_day, lat, lon,
>>>> prcp, tmax, tmin, observation_time)
>>>>             VALUES (year_from_filename, band_number,
>>>> ST_Y(centroid_point), ST_X(centroid_point), CASE WHEN variable_name =
>>>> 'prcp' THEN variable_value ELSE NULL END,
>>>>                     CASE WHEN variable_name = 'tmax' THEN
>>>> variable_value ELSE NULL END,
>>>>                     CASE WHEN variable_name = 'tmin' THEN
>>>> variable_value ELSE NULL END, observation_time);
>>>>
>>>>
>>>>
>>>> To this:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *            INSERT INTO extracted_data_bbox (year, year_day, lat, lon,
>>>> prcp, tmax, tmin, observation_time)            SELECT year_from_filename,
>>>> band_number, ST_Y(sc.geom), ST_X(sc.geom), CASE WHEN variable_name = 'prcp'
>>>> THEN sc.val ELSE NULL END,                    CASE WHEN variable_name =
>>>> 'tmax' THEN sc.val ELSE NULL END,                    CASE WHEN
>>>> variable_name = 'tmin' THEN sc.val ELSE NULL END, observation_time*
>>>>
>>>> *           FROM  ST_PixelAsCentroids(raster_record.clipped_raster,
>>>> band_number, true) AS sc;*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Manaswini Ganjam <manu.ganjam at gmail.com>
>>>> *Sent:* Thursday, November 16, 2023 1:19 PM
>>>> *To:* Regina Obe <lr at pcorp.us>
>>>> *Cc:* PostGIS Users Discussion <postgis-users at lists.osgeo.org>
>>>> *Subject:* Re: [postgis-users] Extracting variable information from
>>>> netcdf, imported as raster to a table
>>>>
>>>>
>>>>
>>>> Thank you, Regina,
>>>>
>>>>
>>>>
>>>> I apologize, I have shared the older version of my code, I have tried
>>>> ST_pixelascentroid and ST_value and the issue was I messed up the
>>>> parameters of centroids and that caused the errors now I corrected that as
>>>> well. I think the code below is a good representation of my approach, I
>>>> would like to iterate for every raster in a table, for every band and for
>>>> every lat and lon and extract st value, and insert it to a table,
>>>>
>>>>
>>>>
>>>> psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/extract_table_queryyyy.sql:66:
>>>> NOTICE:  Invalid band index (must use 1-based). Returning empty set for all
>>>> the coordinates and bands. What could be the issue.
>>>>
>>>>
>>>>
>>>> Is this because my data is packed? the readme file of the data has this
>>>> information: Details about the data packing:
>>>>
>>>> The data has been packed into short integers (2 bytes instead of 4 byte reals) to save space. You must unpack that data to get the correct floating point representation of the data. Each netCDF variable that has been packed has an add_offset and scale_factor attribute associated with it. Some software automatically unpacks the data when it is read.
>>>>
>>>>
>>>>
>>>> The formula to unpack the data is:
>>>>
>>>>
>>>>
>>>> unpacked value = add_offset + ( (packed value) * scale_factor )
>>>>
>>>>
>>>>
>>>> For more information see here:
>>>>
>>>> https://www.unidata.ucar.edu/software/netcdf/workshops/2010/bestpractices/Packing.html
>>>>
>>>>
>>>>
>>>> There's also another attribute called "missing_value". In this case all the -32768 values you see are missing. Only the grid points outside the downscaling domain is given the missing data value.
>>>>
>>>>
>>>>
>>>> The packing saves a lot of space, that is why the data is packed.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -- Create the precipitation_temperature_data table
>>>> DROP TABLE IF EXISTS extracted_data_bbox;
>>>> CREATE TABLE extracted_data_bbox (
>>>>     id SERIAL PRIMARY KEY,
>>>>     year integer,
>>>>     year_day integer,
>>>>     lat double precision,
>>>>     lon double precision,
>>>>     prcp double precision,
>>>>     tmax double precision,
>>>>     tmin double precision,
>>>>     observation_time timestamp
>>>> );
>>>>
>>>> -- Loop through all records in gfdl_03_bbox
>>>> DO $$
>>>> DECLARE
>>>>     raster_record RECORD;
>>>>     year_from_filename integer;
>>>>     observation_time timestamp;
>>>>     centroid_point geometry;
>>>>     variable_name text;
>>>>     variable_value double precision;
>>>>     band_number integer;
>>>>     band_numbers integer[];
>>>>     pixel_data RECORD;
>>>> BEGIN
>>>>     FOR raster_record IN (SELECT * FROM gfdl_03_bbox) LOOP
>>>>         SELECT regexp_replace(raster_record.filename,
>>>> '.*_(\d{4})[^0-9]+', '\1')::integer INTO year_from_filename;
>>>>
>>>>         -- Determine the variable name based on the filename
>>>>         IF raster_record.filename ~ 'prcp' THEN
>>>>             variable_name := 'prcp';
>>>>         ELSIF raster_record.filename ~ 'tmax' THEN
>>>>             variable_name := 'tmax';
>>>>         ELSIF raster_record.filename ~ 'tmin' THEN
>>>>             variable_name := 'tmin';
>>>>         ELSE
>>>>             RAISE EXCEPTION 'Unknown variable in filename: %',
>>>> raster_record.filename;
>>>>         END IF;
>>>>
>>>>         band_numbers := ARRAY(SELECT generate_series(1, 366));
>>>>
>>>>         -- Loop through all bands within the file
>>>>         FOREACH band_number IN ARRAY band_numbers LOOP
>>>>             -- Print the band number to the PostgreSQL log
>>>>             RAISE NOTICE 'Band Number: %', band_number;
>>>>
>>>>             observation_time := MAKE_DATE(year_from_filename, 1, 1) +
>>>> (band_number - 1) * interval '1 day';
>>>>
>>>>             SELECT * FROM
>>>> ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) INTO
>>>> pixel_data;
>>>>
>>>>             -- Extract the centroid point from pixel_data
>>>>             centroid_point := pixel_data.geom;
>>>>
>>>>             -- Extract the variable value based on the variable name
>>>>             EXECUTE format('SELECT ST_Value($1, $2, $3, true)',
>>>> raster_record.clipped_raster, band_number, centroid_point) INTO
>>>> variable_value USING raster_record.clipped_raster, band_number,
>>>> centroid_point;
>>>>
>>>>             -- Insert the extracted data into the extracted_data_bbox
>>>> table
>>>>             INSERT INTO extracted_data_bbox (year, year_day, lat, lon,
>>>> prcp, tmax, tmin, observation_time)
>>>>             VALUES (year_from_filename, band_number,
>>>> ST_Y(centroid_point), ST_X(centroid_point), CASE WHEN variable_name =
>>>> 'prcp' THEN variable_value ELSE NULL END,
>>>>                     CASE WHEN variable_name = 'tmax' THEN
>>>> variable_value ELSE NULL END,
>>>>                     CASE WHEN variable_name = 'tmin' THEN
>>>> variable_value ELSE NULL END, observation_time);
>>>>         END LOOP;
>>>>     END LOOP;
>>>> END;
>>>> $$;
>>>>
>>>> I have tried the following query to check values for a raster band :
>>>> the output is again null
>>>> psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/test_query_fff.sql:22:
>>>> NOTICE:  Band: 20, X: 1, Y: 135, Value: <NULL>
>>>>
>>>> :
>>>>
>>>> DO $$
>>>> DECLARE
>>>>     band_number integer := 20; -- Replace with the desired band number
>>>>     x_coord integer;
>>>>     y_coord integer;
>>>>     pixel_value double precision;
>>>>     srid integer := 4326; -- Replace with the correct SRID for your data
>>>> BEGIN
>>>>     -- Loop through all x and y coordinates in the raster band
>>>>     FOR x_coord IN (SELECT generate_series(1,
>>>> ST_Width(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE
>>>> rast.filename = 'prcp_03_2034.nc') LOOP
>>>>         FOR y_coord IN (SELECT generate_series(1,
>>>> ST_Height(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE
>>>> rast.filename = 'prcp_03_2034.nc') LOOP
>>>>             -- Get the pixel value at the current x and y coordinates
>>>>             SELECT ST_Value(ST_SetSRID(rast.clipped_raster, srid),
>>>> band_number, ST_SetSRID(ST_Point(x_coord, y_coord), srid)) INTO pixel_value
>>>>             FROM gfdl_03_bbox AS rast
>>>>             WHERE rast.filename = 'prcp_03_2034.nc';
>>>>
>>>>             -- Print or use the pixel value as needed
>>>>             RAISE NOTICE 'Band: %, X: %, Y: %, Value: %', band_number,
>>>> x_coord, y_coord, pixel_value;
>>>>         END LOOP;
>>>>     END LOOP;
>>>> END;
>>>>
>>>> Thank you,
>>>>
>>>> Manaswini
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 15 Nov 2023 at 20:42, Regina Obe <lr at pcorp.us> wrote:
>>>>
>>>> I didn’t realize that netCDF also has a vector component.
>>>>
>>>>
>>>>
>>>> https://gdal.org/drivers/vector/netcdf.html#vector-netcdf
>>>>
>>>>
>>>>
>>>> To read the vector component, you’d use the ogr_fdw extension to read
>>>> that rather than postgis_raster extension.
>>>>
>>>>
>>>>
>>>> Details here -  https://github.com/pramsey/pgsql-ogr-fdw
>>>>
>>>>
>>>>
>>>> So perhaps that is what your python is doing reading the vector
>>>> component.  I don’t know if netcdf mixes those in one data set or not since
>>>> I have no experience using netcdf.
>>>>
>>>>
>>>>
>>>> I recall you said you are using the OSGeo Live 16 distribution.  I just
>>>> checked the OSGeoLive 16 does include ogr_fdw
>>>>
>>>>
>>>>
>>>> So do
>>>>
>>>>
>>>>
>>>> CREATE EXTENSION ogr_fdw;
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The list of supported formats you can see with this query:
>>>>
>>>>
>>>>
>>>> SELECT name FROM unnest(ogr_fdw_drivers()) AS f(name) ORDER BY name;
>>>>
>>>>
>>>>
>>>> Which for osgeolive 16, I see netCDF listed
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Regina Obe <lr at pcorp.us>
>>>> *Sent:* Wednesday, November 15, 2023 6:19 PM
>>>> *To:* 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
>>>> *Cc:* 'Manaswini Ganjam' <manu.ganjam at gmail.com>
>>>> *Subject:* RE: [postgis-users] Extracting variable information from
>>>> netcdf, imported as raster to a table
>>>>
>>>>
>>>>
>>>> Just confirming some stuff, since I’m not completely following:
>>>>
>>>>
>>>>
>>>> Raster_record.rast column is of type raster correct?  IF so ST_X and
>>>> ST_Y won’t work since those are for geometry types.
>>>>
>>>>
>>>>
>>>> Also ST_Value(raster_record.rast, band_number), won’t work either since
>>>> that expects as input a geometry or x,y on the raster you want the value
>>>> you.
>>>>
>>>> I would think you would have gotten an error with that, which makes me
>>>> feel I’m missing something critical.
>>>>
>>>>
>>>>
>>>> If you want to extract all the pixels in a raster, you’d do something
>>>> like https://postgis.net/docs/RT_ST_PixelAsPoints.html
>>>>
>>>>
>>>>
>>>> SELECT pp.x, pp.y, pp.val, ST_X(pp.geom) AS lon, ST_Y(pp.geom) AS lat
>>>>
>>>> FROM raster_record,
>>>>
>>>> ST_PixelAsPoints(raster_record.rast, 1) AS pp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* postgis-users <postgis-users-bounces at lists.osgeo.org> *On
>>>> Behalf Of *Manaswini Ganjam via postgis-users
>>>> *Sent:* Wednesday, November 15, 2023 2:01 PM
>>>> *To:* postgis-users at lists.osgeo.org
>>>> *Cc:* Manaswini Ganjam <manu.ganjam at gmail.com>
>>>> *Subject:* [postgis-users] Extracting variable information from
>>>> netcdf, imported as raster to a table
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I have been trying to download s3 cloud stored gridded climate data and
>>>> generate tables with variables, lat, lon and timestamp (year, yearday). To
>>>> achieve this I used raster2pgsql and imported multiple netcdf files into a
>>>> database table.
>>>>
>>>>
>>>>
>>>> Question: How to achieve the extraction of variables using postgis? I
>>>> tried using ST_value, ST_pixelaspoints but I was getting errors, mainly due
>>>> to the format in which netcdfs are stored in the database (the error says
>>>> can't load some characters like 00E30100082...), I even tried changing
>>>> the datatype to float but still did not work. I mean it is probably not
>>>> simple like selecting a variable from the netcdf. I have enclosed my sql
>>>> query below:
>>>>
>>>>
>>>>
>>>>   -- Iterate through all raster files in the table
>>>>     FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP
>>>>         -- Determine the year from the raster file name, assuming the
>>>> format is 'prcp_03_1950.nc'
>>>>         SELECT
>>>>             regexp_replace(raster_record.filename, '.*_(\d{4})\.nc',
>>>> '\1')::integer
>>>>         INTO
>>>>             year;
>>>>
>>>>         -- Calculate the start date of the year
>>>>         year_start := (year || '-01-01')::date;
>>>>
>>>>         -- Determine if the year is a leap year
>>>>         is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1
>>>> year')) = 7;
>>>>
>>>>         -- Set the number of bands for the year (365 for non-leap
>>>> years, 366 for leap years)
>>>>         FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365
>>>> END) LOOP
>>>>             -- Calculate the observation_time using the year and band
>>>> number
>>>>             observation_time := year_start + (band_number - 1) *
>>>> interval '1 day';
>>>>
>>>>             -- Extract X (lon) and Y (lat) coordinates from the raster
>>>>             SELECT
>>>>                 ST_X(raster_record.rast) AS lon,
>>>>                 ST_Y(raster_record.rast) AS lat
>>>>             INTO
>>>>                 lon,
>>>>                 lat;
>>>>
>>>>             -- Insert the lat, lon, prcp, and observation_time into the
>>>> extracted_values table
>>>>             INSERT INTO extracted_values (lat, lon, prcp,
>>>> observation_time)
>>>>             VALUES
>>>>                 (lat, lon, ST_Value(raster_record.rast, band_number),
>>>> observation_time);
>>>>
>>>>             -- Increment the counter
>>>>             counter := counter + 1;
>>>>
>>>>             -- Commit the transaction periodically in batches
>>>>             IF counter % batch_size = 0 THEN
>>>>                 COMMIT;
>>>>             END IF;
>>>>         END LOOP;
>>>>     END LOOP;
>>>>
>>>>
>>>>
>>>>
>>>> The metadata for the two files is as follows:
>>>>
>>>>
>>>>
>>>> File from database:
>>>>
>>>> {'NC_GLOBAL#Conventions': 'CF-1.5',
>>>>
>>>>  'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17',
>>>>
>>>>  'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy( not_clipped_prcp.nc, ... )'}
>>>>
>>>> File before loading into the database:
>>>>
>>>> {'lat#units': 'degrees_north',
>>>>
>>>>  'lon#units': 'degrees_east',
>>>>
>>>>  'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the United States and southern Canada east of the Rocky Mountains, version 1.0, realization 1, 0.1x0.1 degree spatial resolution.',
>>>>
>>>>  'NETCDF_DIM_EXTRA': '{time}',
>>>>
>>>>  'NETCDF_DIM_time_DEF': '{366,4}',
>>>>
>>>>  'NETCDF_DIM_time_VALUES': '{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}',
>>>>
>>>>  'prcp#add_offset': '819.17499',
>>>>
>>>>  'prcp#long_name': 'daily precipitation accumulation',
>>>>
>>>>  'prcp#missing_value': '-32768',
>>>>
>>>>  'prcp#scale_factor': '0.025',
>>>>
>>>>  'prcp#units': 'mm',
>>>>
>>>>  'prcp#_FillValue': '-32768',
>>>>
>>>>  'time#units': 'days since 1952-1-1 0:0:0.0'}
>>>>
>>>>
>>>>
>>>> In case this information is useful: Previously I used python to extract variable information and generate a csv or table using this variable information, and the code is enclosed below. In the code I extracted variable values using lon = dataset.variables['lon'][:] and iterated for loops to write them all in csv.
>>>>
>>>> Python code:
>>>>
>>>> import netCDF4 as nc
>>>>
>>>> # Step 1: Read the NetCDF file
>>>> filename = "/home/manaswini/prcp_03_1950.nc"
>>>> dataset = nc.Dataset(filename)
>>>> dataset.set_auto_mask(False)
>>>> dataset.set_auto_scale(True)
>>>>
>>>>
>>>>
>>>> lon = dataset.variables['lon'][:]
>>>> lat = dataset.variables['lat'][:]
>>>> time = dataset.variables['time'][:]
>>>> prcp = dataset.variables['prcp'][:]
>>>>
>>>>
>>>>
>>>> import numpy as np
>>>> import csv
>>>>
>>>> # csv_buffer
>>>> csv_buffer = open('output.csv', 'w', newline='')
>>>> csv_writer = csv.writer(csv_buffer)
>>>>
>>>> # Iterate through grid points and write to CSV buffer
>>>> for i in enumerate(lon):
>>>>     for j in enumerate(lat):
>>>>         for k in enumerate(time):
>>>>          csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k],
>>>> yearday[k]])
>>>>
>>>>
>>>> # Close the CSV buffer
>>>> csv_buffer.close()
>>>>
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> Manaswini Ganjam
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Manaswini Ganjam
>>>>
>>>
>>>
>>> --
>>> Manaswini Ganjam
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>
>>

-- 
Manaswini Ganjam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/a8af5690/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 232418 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/a8af5690/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 7895 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/a8af5690/attachment-0001.png>


More information about the postgis-users mailing list