[postgis-users] Extracting variable information from netcdf, imported as raster to a table
Regina Obe
lr at pcorp.us
Wed Nov 15 17:42:16 PST 2023
I didn’t realize that netCDF also has a vector component.
https://gdal.org/drivers/vector/netcdf.html#vector-netcdf
To read the vector component, you’d use the ogr_fdw extension to read that rather than postgis_raster extension.
Details here - https://github.com/pramsey/pgsql-ogr-fdw
So perhaps that is what your python is doing reading the vector component. I don’t know if netcdf mixes those in one data set or not since I have no experience using netcdf.
I recall you said you are using the OSGeo Live 16 distribution. I just checked the OSGeoLive 16 does include ogr_fdw
So do
CREATE EXTENSION ogr_fdw;
The list of supported formats you can see with this query:
SELECT name FROM unnest(ogr_fdw_drivers()) AS f(name) ORDER BY name;
Which for osgeolive 16, I see netCDF listed
From: Regina Obe <lr at pcorp.us>
Sent: Wednesday, November 15, 2023 6:19 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
Cc: 'Manaswini Ganjam' <manu.ganjam at gmail.com>
Subject: RE: [postgis-users] Extracting variable information from netcdf, imported as raster to a table
Just confirming some stuff, since I’m not completely following:
Raster_record.rast column is of type raster correct? IF so ST_X and ST_Y won’t work since those are for geometry types.
Also ST_Value(raster_record.rast, band_number), won’t work either since that expects as input a geometry or x,y on the raster you want the value you.
I would think you would have gotten an error with that, which makes me feel I’m missing something critical.
If you want to extract all the pixels in a raster, you’d do something like https://postgis.net/docs/RT_ST_PixelAsPoints.html
SELECT pp.x, pp.y, pp.val, ST_X(pp.geom) AS lon, ST_Y(pp.geom) AS lat
FROM raster_record,
ST_PixelAsPoints(raster_record.rast, 1) AS pp
From: postgis-users <postgis-users-bounces at lists.osgeo.org <mailto:postgis-users-bounces at lists.osgeo.org> > On Behalf Of Manaswini Ganjam via postgis-users
Sent: Wednesday, November 15, 2023 2:01 PM
To: postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
Cc: Manaswini Ganjam <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> >
Subject: [postgis-users] Extracting variable information from netcdf, imported as raster to a table
Hi,
I have been trying to download s3 cloud stored gridded climate data and generate tables with variables, lat, lon and timestamp (year, yearday). To achieve this I used raster2pgsql and imported multiple netcdf files into a database table.
Question: How to achieve the extraction of variables using postgis? I tried using ST_value, ST_pixelaspoints but I was getting errors, mainly due to the format in which netcdfs are stored in the database (the error says can't load some characters like 00E30100082...), I even tried changing the datatype to float but still did not work. I mean it is probably not simple like selecting a variable from the netcdf. I have enclosed my sql query below:
-- Iterate through all raster files in the table
FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP
-- Determine the year from the raster file name, assuming the format is 'prcp_03_1950.nc <http://prcp_03_1950.nc> '
SELECT
regexp_replace(raster_record.filename, '.*_(\d{4})\.nc', '\1')::integer
INTO
year;
-- Calculate the start date of the year
year_start := (year || '-01-01')::date;
-- Determine if the year is a leap year
is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1 year')) = 7;
-- Set the number of bands for the year (365 for non-leap years, 366 for leap years)
FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365 END) LOOP
-- Calculate the observation_time using the year and band number
observation_time := year_start + (band_number - 1) * interval '1 day';
-- Extract X (lon) and Y (lat) coordinates from the raster
SELECT
ST_X(raster_record.rast) AS lon,
ST_Y(raster_record.rast) AS lat
INTO
lon,
lat;
-- Insert the lat, lon, prcp, and observation_time into the extracted_values table
INSERT INTO extracted_values (lat, lon, prcp, observation_time)
VALUES
(lat, lon, ST_Value(raster_record.rast, band_number), observation_time);
-- Increment the counter
counter := counter + 1;
-- Commit the transaction periodically in batches
IF counter % batch_size = 0 THEN
COMMIT;
END IF;
END LOOP;
END LOOP;
The metadata for the two files is as follows:
File from database:
{'NC_GLOBAL#Conventions': 'CF-1.5',
'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17',
'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy( not_clipped_prcp.nc <http://not_clipped_prcp.nc> , ... )'}
File before loading into the database:
{'lat#units': 'degrees_north',
'lon#units': 'degrees_east',
'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the United States and southern Canada east of the Rocky Mountains, version 1.0, realization 1, 0.1x0.1 degree spatial resolution.',
'NETCDF_DIM_EXTRA': '{time}',
'NETCDF_DIM_time_DEF': '{366,4}',
'NETCDF_DIM_time_VALUES': '{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}',
'prcp#add_offset': '819.17499',
'prcp#long_name': 'daily precipitation accumulation',
'prcp#missing_value': '-32768',
'prcp#scale_factor': '0.025',
'prcp#units': 'mm',
'prcp#_FillValue': '-32768',
'time#units': 'days since 1952-1-1 0:0:0.0'}
In case this information is useful: Previously I used python to extract variable information and generate a csv or table using this variable information, and the code is enclosed below. In the code I extracted variable values using lon = dataset.variables['lon'][:] and iterated for loops to write them all in csv.
Python code:
import netCDF4 as nc
# Step 1: Read the NetCDF file
filename = "/home/manaswini/prcp_03_1950.nc <http://prcp_03_1950.nc> "
dataset = nc.Dataset(filename)
dataset.set_auto_mask(False)
dataset.set_auto_scale(True)
lon = dataset.variables['lon'][:]
lat = dataset.variables['lat'][:]
time = dataset.variables['time'][:]
prcp = dataset.variables['prcp'][:]
import numpy as np
import csv
# csv_buffer
csv_buffer = open('output.csv', 'w', newline='')
csv_writer = csv.writer(csv_buffer)
# Iterate through grid points and write to CSV buffer
for i in enumerate(lon):
for j in enumerate(lat):
for k in enumerate(time):
csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k], yearday[k]])
# Close the CSV buffer
csv_buffer.close()
Thank you,
Manaswini Ganjam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231115/f2eec0f3/attachment.htm>
More information about the postgis-users
mailing list