[postgis-users] Extracting variable information from netcdf, imported as raster to a table

Manaswini Ganjam manu.ganjam at gmail.com
Wed Nov 15 11:00:56 PST 2023


Hi,
I have been trying to download s3 cloud stored gridded climate data and
generate tables with variables, lat, lon and timestamp (year, yearday). To
achieve this I used raster2pgsql and imported multiple netcdf files into a
database table.

Question: How to achieve the extraction of variables using postgis? I tried
using ST_value, ST_pixelaspoints but I was getting errors, mainly due to
the format in which netcdfs are stored in the database (the error says
can't load some characters like 00E30100082...), I even tried changing
the datatype to float but still did not work. I mean it is probably not
simple like selecting a variable from the netcdf. I have enclosed my sql
query below:

  -- Iterate through all raster files in the table
    FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP
        -- Determine the year from the raster file name, assuming the
format is 'prcp_03_1950.nc'
        SELECT
            regexp_replace(raster_record.filename, '.*_(\d{4})\.nc',
'\1')::integer
        INTO
            year;

        -- Calculate the start date of the year
        year_start := (year || '-01-01')::date;

        -- Determine if the year is a leap year
        is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1
year')) = 7;

        -- Set the number of bands for the year (365 for non-leap years,
366 for leap years)
        FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365
END) LOOP
            -- Calculate the observation_time using the year and band number
            observation_time := year_start + (band_number - 1) * interval
'1 day';

            -- Extract X (lon) and Y (lat) coordinates from the raster
            SELECT
                ST_X(raster_record.rast) AS lon,
                ST_Y(raster_record.rast) AS lat
            INTO
                lon,
                lat;

            -- Insert the lat, lon, prcp, and observation_time into the
extracted_values table
            INSERT INTO extracted_values (lat, lon, prcp, observation_time)
            VALUES
                (lat, lon, ST_Value(raster_record.rast, band_number),
observation_time);

            -- Increment the counter
            counter := counter + 1;

            -- Commit the transaction periodically in batches
            IF counter % batch_size = 0 THEN
                COMMIT;
            END IF;
        END LOOP;
    END LOOP;


The metadata for the two files is as follows:

File from database:

{'NC_GLOBAL#Conventions': 'CF-1.5',
 'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17',
 'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy(
not_clipped_prcp.nc, ... )'}

File before loading into the database:

{'lat#units': 'degrees_north',
 'lon#units': 'degrees_east',
 'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the
United States and southern Canada east of the Rocky Mountains, version
1.0, realization 1, 0.1x0.1 degree spatial resolution.',
 'NETCDF_DIM_EXTRA': '{time}',
 'NETCDF_DIM_time_DEF': '{366,4}',
 'NETCDF_DIM_time_VALUES':
'{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}',
 'prcp#add_offset': '819.17499',
 'prcp#long_name': 'daily precipitation accumulation',
 'prcp#missing_value': '-32768',
 'prcp#scale_factor': '0.025',
 'prcp#units': 'mm',
 'prcp#_FillValue': '-32768',
 'time#units': 'days since 1952-1-1 0:0:0.0'}


In case this information is useful: Previously I used python to
extract variable information and generate a csv or table using this
variable information, and the code is enclosed below. In the code I
extracted variable values using lon = dataset.variables['lon'][:] and
iterated for loops to write them all in csv.

Python code:
import netCDF4 as nc

# Step 1: Read the NetCDF file
filename = "/home/manaswini/prcp_03_1950.nc"
dataset = nc.Dataset(filename)
dataset.set_auto_mask(False)
dataset.set_auto_scale(True)

lon = dataset.variables['lon'][:]
lat = dataset.variables['lat'][:]
time = dataset.variables['time'][:]
prcp = dataset.variables['prcp'][:]

import numpy as np
import csv

# csv_buffer
csv_buffer = open('output.csv', 'w', newline='')
csv_writer = csv.writer(csv_buffer)

# Iterate through grid points and write to CSV buffer
for i in enumerate(lon):
    for j in enumerate(lat):
        for k in enumerate(time):
         csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k],
yearday[k]])


# Close the CSV buffer
csv_buffer.close()

Thank you,
Manaswini Ganjam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231115/667dac4e/attachment.htm>


More information about the postgis-users mailing list