[postgis-users] Extracting variable information from netcdf, imported as raster to a table

Regina Obe lr at pcorp.us
Fri Nov 17 20:00:03 PST 2023


The format once it’s loaded in the database is no longer netcdf, so if it’s not a standard raster metadata item, it probably isn’t in the database or at the very least we don’t have a function to access it.  So you’d need to read that information before it gets in the database as Tobias is doing in his example

 

From: Manaswini Ganjam <manu.ganjam at gmail.com> 
Sent: Friday, November 17, 2023 10:01 PM
To: Regina Obe <lr at pcorp.us>
Cc: Tobias Schmid <schmid.tobi at gmail.com>; PostGIS Users Discussion <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

Regina, 

 

I am sure that function does not extract offset and scale values provided with the netcdf to unpack data. The way I understand this concept is, the value in unpacked data (which is not the variable value) is used along with the scale and offset parameters provided with the netcdf to calculate the true variable values. 

 

I am thinking I can use https://www.postgresql.org/docs/9.1/plpython.html extension and write a function within the database but the main problem I see here is I am unsure whether Gdal actually imports this information into the database and we don't know how to access it or does it not import these parameters into the database when we use raster2pgsql? for some reason I am certain that they would not develop a tool to actually import data but ignore a part of the dataset right? Maybe we don't know where to find that information from rasters within the database?. Any thoughts?

 

Thank you,

Manaswini

 

 

On Fri, 17 Nov 2023 at 18:59, Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > wrote:

Thanks Tobias.  

Manaswini,

 

If you trust the netcdf order to represent real days, then might as well stick with your multiband I guess.

If you are sure it’s the same for all bands, then yes you should be able to apply the same and assume it’s the same for all bands.

 

Regarding the scale factor – Tobias, in your code

            hourlyRasterScale = hourlyRaster.GetRasterBand(1).GetScale()
            hourlyRasterOffset = hourlyRaster.GetRasterBand(1).GetOffset()

 

 

I guess this isn’t the same as what you get with https://postgis.net/docs/en/RT_ST_ScaleX.html ? Just assuming X and Y scale are the same which they usually are?

And that applies across the raster.

 

Seems a shame you have to do this part in Python and not compute it in the database.

 

If it is information that GDAL exposes as part of the Band MetaData, perhaps we can add it to this function -- https://postgis.net/docs/en/RT_ST_BandMetaData.html  as extra outputs.

 

 

Manaswin,

One function I really like to use before I even bother extracting data, is the ST_Histogram function to get a certain idea of if the distribution of the data makes sense in your raster tile before you even bother trying to extract pixel values from it.  I’d break your data say into 10 buckets

 

 

https://postgis.net/docs/en/RT_ST_Histogram.html

 

 

 

 

 

 

From: Manaswini Ganjam <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> > 
Sent: Friday, November 17, 2023 11:06 AM
To: Tobias Schmid <schmid.tobi at gmail.com <mailto:schmid.tobi at gmail.com> >
Cc: PostGIS Users Discussion <postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> >; Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> >
Subject: Re: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

Hi, Tobias, and Regina,

 

Can I implement this for multiband netcdfs, because I am pretty confident about bands and data order as my data units for time are the number of days since the first day of the year. For example days since 1950-01-01, band numbers go like 1,2,3,4....365. I can find different logic and extract the timestamp, because we have good information about this aspect from rasters in the database. Because the scale factor and offset value is same for all the bands?

 

Also, I am curious to know if there is any other function that can address this in postgis?

 

Thank you,

Manaswini

 

On Fri, 17 Nov 2023 at 03:14, Tobias Schmid <schmid.tobi at gmail.com <mailto:schmid.tobi at gmail.com> > wrote:

Sorry, the Screenshot was missing. 

 



 

Am Fr., 17. Nov. 2023 um 09:02 Uhr schrieb Tobias Schmid <schmid.tobi at gmail.com <mailto:schmid.tobi at gmail.com> >:

Hello Manswini, 
Hello Regina, 
I have read your e-mails with great interest. Many of the described problems seem familiar to me. I have been working with temporally and spatially resolved (weather) raster data for over 10 years. 

Here are a few comments and recommendations: 

1. After some missteps I stopped using multiband-raster. Why? 

*	What happens if individual days (or hours) are missing? 
*	Can I always trust the order of the bands? 

2. I work with 1-band grids. My database structure looks like this:

   

3. The import is using raster2pgsql. Here is snippet: 

 

  def raster2pgsql(outGtiffPath, dbSchema, dbTable, dbHost=conf.dbHost, dbPort=conf.dbPort, dbName=conf.dbName, dbUser=conf.dbUser):
    try:
        tableNotEmptySQL = f"SELECT EXISTS (SELECT * FROM {dbSchema}.{dbTable} LIMIT 1);"
        if executeSQL(tableNotEmptySQL, fetchone=True):
            cmd = f"raster2pgsql -a -F -n filename -s 4326 {outGtiffPath} {dbSchema}.{dbTable} | psql -h {dbHost} -p {dbPort} -d {dbName} -U {dbUser}"
        else:
            cmd = f"raster2pgsql -a -C -F -n filename -s 4326 {outGtiffPath} {dbSchema}.{dbTable} | psql -h {dbHost} -p {dbPort} -d {dbName} -U {dbUser}"


        subprocess.call(cmd, shell=True)


        return True


    except Exception as e:
        print(f"Importing '{outGtiffPath}' to postgres db failed.")
        print(f"{e}\n")
        with open(conf.failedRaster2pgsql, 'a') as file:
            file.write(f"{datetime.now().strftime('%Y-%m-%d %X')}\t{outGtiffPath}\t{e}\n")
        return False  


4. I read the parameters "scale" and "offset" via Python and write them to the database. 

            hourlyRaster = gdal.Open(str(outGtiffPath))
            hourlyRasterScale = hourlyRaster.GetRasterBand(1).GetScale()
            hourlyRasterOffset = hourlyRaster.GetRasterBand(1).GetOffset()
            hourlyRaster = None

 

            insertTimestampSQL = f"SET TIMEZONE = 'UTC'; UPDATE {dataset}.{dbTable} SET timestamp_w_tz = TO_TIMESTAMP('{timeStamp}', 'YYYY-MM-DD HH24:MI'), rast_scale_factor = {hourlyRasterScale}, rast_offset = {hourlyRasterOffset} WHERE filename = '{outGtiff}';"  


5. The query: 

 

select timestamp_w_tz
, st_value(rast, st_setsrid(st_makepoint(15, 50), 4326)) * rast_scale_factor + rast_offset as value_temperature
from era5.temperature_2m
where timestamp_w_tz >= '1982-09-18 00:00'
and timestamp_w_tz < '1982-09-19 00:00'

 

6. The result (temperature in Kelvin. Of course.)

+---------------------------------+------------------+
|timestamp_w_tz                   |value_temperature |
+---------------------------------+------------------+
|1982-09-18 00:00:00.000000 +00:00|287.70160804659935|
|1982-09-18 01:00:00.000000 +00:00|287.47021164940014|
|1982-09-18 02:00:00.000000 +00:00|286.0640335433434 |
|1982-09-18 03:00:00.000000 +00:00|285.56675378590086|
|1982-09-18 04:00:00.000000 +00:00|285.41434365889944|
|1982-09-18 05:00:00.000000 +00:00|285.3075453217306 |
|1982-09-18 06:00:00.000000 +00:00|286.3343668343021 |
|1982-09-18 07:00:00.000000 +00:00|287.2543900097047 |
|1982-09-18 08:00:00.000000 +00:00|290.88219602540977|
|1982-09-18 09:00:00.000000 +00:00|292.3985099166719 |
|1982-09-18 10:00:00.000000 +00:00|297.1844104010519 |
|1982-09-18 11:00:00.000000 +00:00|297.37019500841853|
|1982-09-18 12:00:00.000000 +00:00|297.58267920007745|
|1982-09-18 13:00:00.000000 +00:00|297.85078752567847|
|1982-09-18 14:00:00.000000 +00:00|298.1689575718274 |
|1982-09-18 15:00:00.000000 +00:00|297.20666005462874|
|1982-09-18 16:00:00.000000 +00:00|296.9697012440353 |
|1982-09-18 17:00:00.000000 +00:00|294.35870439679223|
|1982-09-18 18:00:00.000000 +00:00|292.8045660944494 |
|1982-09-18 19:00:00.000000 +00:00|292.6032067295789 |
|1982-09-18 20:00:00.000000 +00:00|289.9332483003572 |
|1982-09-18 21:00:00.000000 +00:00|289.59839101402565|
|1982-09-18 22:00:00.000000 +00:00|287.79616907430096|
|1982-09-18 23:00:00.000000 +00:00|287.87181789646223|
+---------------------------------+------------------+

 

I know that my example does not match your problem 100%. It is also time-consuming for you to establish the 1-band grid approach. It requires you to change your database structure. Sorry about that. 

Hopefully the comments will help you. You are also welcome to contact me personally. 

Best regards
Tobias

 

Am Fr., 17. Nov. 2023 um 05:32 Uhr schrieb Manaswini Ganjam via postgis-users <postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> >:

Regina, 

Thank you for the suggestions, the code looks clean and the computation was much better, but I still have the same issue, the values are unrealistic, this happened to me before as well (when I tried to run st_value for one raster band, with some x and y). Find the enclosed screenshot.



I have test run the code for one prcp file and the prcp values are negative, and the no data value for this dataset is -32768 (most of the values are -32767 or -32750) and I have not preprocessed this data. Do we need to manually unpack this data? If that is the case, how do I access the scale_factor and add_offset attributes packed with netcdf? 

 

This information is from the climate data readme file:The data has been packed into short integers (2 bytes instead of 4 byte reals) to save space. You must unpack that data to get the correct floating point representation of the data. Each netCDF variable that has been packed has an add_offset and scale_factor attribute associated with it. Some software automatically unpacks the data when it is read.

The formula to unpack the data is:

unpacked value = add_offset + ( (packed value) * scale_factor )

 

Thank you,

Manaswini





 

 

 

On Thu, 16 Nov 2023 at 22:02, Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > wrote:

One more optimization.  I think you can get rid of the FOREACH band_number too and reduce all that to this query

 

So all this goes

 

        band_numbers := ARRAY(SELECT generate_series(1, 366));

        -- Loop through all bands within the file
        FOREACH band_number IN ARRAY band_numbers LOOP

:

LOOP

 

Use ST_NumBands instead of relying on your rasters all having 366 bands https://postgis.net/docs/RT_ST_NumBands.html

So replace all the above with this:

 

            INSERT INTO extracted_data_bbox (year, year_day, lat, lon, prcp, tmax, tmin, observation_time)
            SELECT year_from_filename, n.band_number, ST_Y(sc.geom), ST_X(sc.geom), CASE WHEN variable_name = 'prcp' THEN sc.val ELSE NULL END,
                    CASE WHEN variable_name = 'tmax' THEN sc.val ELSE NULL END,
                    CASE WHEN variable_name = 'tmin' THEN sc.val ELSE NULL END, observation_time

           FROM  generate_series(1, ST_NumBands(raster_record.clipped_raster) ) AS n(band_number), ST_PixelAsCentroids(raster_record.clipped_raster, n.band_number, true) AS sc;

 

That will solve the issue of you going over the band count of your raster, which might be an issue you are running into 

 

 

From: Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > 
Sent: Thursday, November 16, 2023 9:46 PM
To: 'Manaswini Ganjam' <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> >
Cc: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> >
Subject: RE: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

I don’t know much about netCDF but I would assume GDAL would handle the packing unraveling behind the scenes.

 

Some things I notice wrong

 

https://postgis.net/docs/RT_ST_PixelAsCentroids.html  is a set returning function, so you can’t just stuff it in a record variable.  It has to go in a table.

So I’d get rid of this line

 

          SELECT * FROM ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) INTO pixel_data;

I’d also scrap this, because ultimately you want to work on all centroids not the first one that falls out of the tree

EXECUTE format('SELECT ST_Value($1, $2, $3, true)', raster_record.clipped_raster, band_number, centroid_point) INTO variable_value USING raster_record.clipped_raster, band_number, centroid_point;

 

I’d change this


            -- Insert the extracted data into the extracted_data_bbox table
            INSERT INTO extracted_data_bbox (year, year_day, lat, lon, prcp, tmax, tmin, observation_time)
            VALUES (year_from_filename, band_number, ST_Y(centroid_point), ST_X(centroid_point), CASE WHEN variable_name = 'prcp' THEN variable_value ELSE NULL END,
                    CASE WHEN variable_name = 'tmax' THEN variable_value ELSE NULL END,
                    CASE WHEN variable_name = 'tmin' THEN variable_value ELSE NULL END, observation_time);

 

To this:

 

            INSERT INTO extracted_data_bbox (year, year_day, lat, lon, prcp, tmax, tmin, observation_time)
            SELECT year_from_filename, band_number, ST_Y(sc.geom), ST_X(sc.geom), CASE WHEN variable_name = 'prcp' THEN sc.val ELSE NULL END,
                    CASE WHEN variable_name = 'tmax' THEN sc.val ELSE NULL END,
                    CASE WHEN variable_name = 'tmin' THEN sc.val ELSE NULL END, observation_time

           FROM  ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) AS sc;

 

 

 

 

From: Manaswini Ganjam <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> > 
Sent: Thursday, November 16, 2023 1:19 PM
To: Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> >
Cc: PostGIS Users Discussion <postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> >
Subject: Re: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

Thank you, Regina,

 

I apologize, I have shared the older version of my code, I have tried ST_pixelascentroid and ST_value and the issue was I messed up the parameters of centroids and that caused the errors now I corrected that as well. I think the code below is a good representation of my approach, I would like to iterate for every raster in a table, for every band and for every lat and lon and extract st value, and insert it to a table, 

 

psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/extract_table_queryyyy.sql:66: NOTICE:  Invalid band index (must use 1-based). Returning empty set for all the coordinates and bands. What could be the issue. 

 

Is this because my data is packed? the readme file of the data has this information: Details about the data packing:

The data has been packed into short integers (2 bytes instead of 4 byte reals) to save space. You must unpack that data to get the correct floating point representation of the data. Each netCDF variable that has been packed has an add_offset and scale_factor attribute associated with it. Some software automatically unpacks the data when it is read.
 
The formula to unpack the data is:
 
unpacked value = add_offset + ( (packed value) * scale_factor )
 
For more information see here:
https://www.unidata.ucar.edu/software/netcdf/workshops/2010/bestpractices/Packing.html
 
There's also another attribute called "missing_value". In this case all the -32768 values you see are missing. Only the grid points outside the downscaling domain is given the missing data value.
 
The packing saves a lot of space, that is why the data is packed.

 

 

-- Create the precipitation_temperature_data table
DROP TABLE IF EXISTS extracted_data_bbox;
CREATE TABLE extracted_data_bbox (
    id SERIAL PRIMARY KEY,
    year integer,
    year_day integer,
    lat double precision,
    lon double precision,
    prcp double precision,
    tmax double precision,
    tmin double precision,
    observation_time timestamp
);

-- Loop through all records in gfdl_03_bbox
DO $$
DECLARE
    raster_record RECORD;
    year_from_filename integer;
    observation_time timestamp;
    centroid_point geometry;
    variable_name text;
    variable_value double precision;
    band_number integer;
    band_numbers integer[];
    pixel_data RECORD;
BEGIN
    FOR raster_record IN (SELECT * FROM gfdl_03_bbox) LOOP
        SELECT regexp_replace(raster_record.filename, '.*_(\d{4})[^0-9]+', '\1')::integer INTO year_from_filename;

        -- Determine the variable name based on the filename
        IF raster_record.filename ~ 'prcp' THEN
            variable_name := 'prcp';
        ELSIF raster_record.filename ~ 'tmax' THEN
            variable_name := 'tmax';
        ELSIF raster_record.filename ~ 'tmin' THEN
            variable_name := 'tmin';
        ELSE
            RAISE EXCEPTION 'Unknown variable in filename: %', raster_record.filename;
        END IF;

        band_numbers := ARRAY(SELECT generate_series(1, 366));

        -- Loop through all bands within the file
        FOREACH band_number IN ARRAY band_numbers LOOP
            -- Print the band number to the PostgreSQL log
            RAISE NOTICE 'Band Number: %', band_number;

            observation_time := MAKE_DATE(year_from_filename, 1, 1) + (band_number - 1) * interval '1 day';

            SELECT * FROM ST_PixelAsCentroids(raster_record.clipped_raster, band_number, true) INTO pixel_data;

            -- Extract the centroid point from pixel_data
            centroid_point := pixel_data.geom;

            -- Extract the variable value based on the variable name
            EXECUTE format('SELECT ST_Value($1, $2, $3, true)', raster_record.clipped_raster, band_number, centroid_point) INTO variable_value USING raster_record.clipped_raster, band_number, centroid_point;

            -- Insert the extracted data into the extracted_data_bbox table
            INSERT INTO extracted_data_bbox (year, year_day, lat, lon, prcp, tmax, tmin, observation_time)
            VALUES (year_from_filename, band_number, ST_Y(centroid_point), ST_X(centroid_point), CASE WHEN variable_name = 'prcp' THEN variable_value ELSE NULL END,
                    CASE WHEN variable_name = 'tmax' THEN variable_value ELSE NULL END,
                    CASE WHEN variable_name = 'tmin' THEN variable_value ELSE NULL END, observation_time);
        END LOOP;
    END LOOP;
END;
$$;

I have tried the following query to check values for a raster band : the output is again null psql:/home/manaswini/MEGA/bbgc_uw_rpackage/rproject/raster2pgsql/test_query_fff.sql:22: NOTICE:  Band: 20, X: 1, Y: 135, Value: <NULL>

:

DO $$
DECLARE
    band_number integer := 20; -- Replace with the desired band number
    x_coord integer;
    y_coord integer;
    pixel_value double precision;
    srid integer := 4326; -- Replace with the correct SRID for your data
BEGIN
    -- Loop through all x and y coordinates in the raster band
    FOR x_coord IN (SELECT generate_series(1, ST_Width(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE rast.filename = 'prcp_03_2034.nc <http://prcp_03_2034.nc> ') LOOP
        FOR y_coord IN (SELECT generate_series(1, ST_Height(rast.clipped_raster)) FROM gfdl_03_bbox AS rast WHERE rast.filename = 'prcp_03_2034.nc <http://prcp_03_2034.nc> ') LOOP
            -- Get the pixel value at the current x and y coordinates
            SELECT ST_Value(ST_SetSRID(rast.clipped_raster, srid), band_number, ST_SetSRID(ST_Point(x_coord, y_coord), srid)) INTO pixel_value
            FROM gfdl_03_bbox AS rast
            WHERE rast.filename = 'prcp_03_2034.nc <http://prcp_03_2034.nc> ';

            -- Print or use the pixel value as needed
            RAISE NOTICE 'Band: %, X: %, Y: %, Value: %', band_number, x_coord, y_coord, pixel_value;
        END LOOP;
    END LOOP;
END;

Thank you,

Manaswini

 

 

On Wed, 15 Nov 2023 at 20:42, Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > wrote:

I didn’t realize that netCDF also has a vector component.

 

https://gdal.org/drivers/vector/netcdf.html#vector-netcdf

 

To read the vector component, you’d use the ogr_fdw extension to read that rather than postgis_raster extension.

 

Details here -  https://github.com/pramsey/pgsql-ogr-fdw

 

So perhaps that is what your python is doing reading the vector component.  I don’t know if netcdf mixes those in one data set or not since I have no experience using netcdf.

 

I recall you said you are using the OSGeo Live 16 distribution.  I just checked the OSGeoLive 16 does include ogr_fdw 

 

So do 

 

CREATE EXTENSION ogr_fdw;

 

 

The list of supported formats you can see with this query:

 

SELECT name FROM unnest(ogr_fdw_drivers()) AS f(name) ORDER BY name;

 

Which for osgeolive 16, I see netCDF listed

 

 

 

 

 

From: Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > 
Sent: Wednesday, November 15, 2023 6:19 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> >
Cc: 'Manaswini Ganjam' <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> >
Subject: RE: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

Just confirming some stuff, since I’m not completely following:

 

Raster_record.rast column is of type raster correct?  IF so ST_X and ST_Y won’t work since those are for geometry types.

 

Also ST_Value(raster_record.rast, band_number), won’t work either since that expects as input a geometry or x,y on the raster you want the value you.

I would think you would have gotten an error with that, which makes me feel I’m missing something critical.

 

If you want to extract all the pixels in a raster, you’d do something like https://postgis.net/docs/RT_ST_PixelAsPoints.html

 

SELECT pp.x, pp.y, pp.val, ST_X(pp.geom) AS lon, ST_Y(pp.geom) AS lat

FROM raster_record, 

ST_PixelAsPoints(raster_record.rast, 1) AS pp

 


           

 

From: postgis-users <postgis-users-bounces at lists.osgeo.org <mailto:postgis-users-bounces at lists.osgeo.org> > On Behalf Of Manaswini Ganjam via postgis-users
Sent: Wednesday, November 15, 2023 2:01 PM
To: postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> 
Cc: Manaswini Ganjam <manu.ganjam at gmail.com <mailto:manu.ganjam at gmail.com> >
Subject: [postgis-users] Extracting variable information from netcdf, imported as raster to a table

 

Hi, 

I have been trying to download s3 cloud stored gridded climate data and generate tables with variables, lat, lon and timestamp (year, yearday). To achieve this I used raster2pgsql and imported multiple netcdf files into a database table. 

 

Question: How to achieve the extraction of variables using postgis? I tried using ST_value, ST_pixelaspoints but I was getting errors, mainly due to the format in which netcdfs are stored in the database (the error says can't load some characters like 00E30100082...), I even tried changing the datatype to float but still did not work. I mean it is probably not simple like selecting a variable from the netcdf. I have enclosed my sql query below: 

 

  -- Iterate through all raster files in the table
    FOR raster_record IN (SELECT * FROM gfdl_03_prcp) LOOP
        -- Determine the year from the raster file name, assuming the format is 'prcp_03_1950.nc <http://prcp_03_1950.nc> '
        SELECT
            regexp_replace(raster_record.filename, '.*_(\d{4})\.nc', '\1')::integer
        INTO
            year;
        
        -- Calculate the start date of the year
        year_start := (year || '-01-01')::date;
        
        -- Determine if the year is a leap year
        is_leap_year := EXTRACT(ISODOW FROM (year_start + interval '1 year')) = 7;
        
        -- Set the number of bands for the year (365 for non-leap years, 366 for leap years)
        FOR band_number IN 1..(CASE WHEN is_leap_year THEN 366 ELSE 365 END) LOOP
            -- Calculate the observation_time using the year and band number
            observation_time := year_start + (band_number - 1) * interval '1 day';
            
            -- Extract X (lon) and Y (lat) coordinates from the raster
            SELECT
                ST_X(raster_record.rast) AS lon,
                ST_Y(raster_record.rast) AS lat
            INTO
                lon,
                lat;
            
            -- Insert the lat, lon, prcp, and observation_time into the extracted_values table
            INSERT INTO extracted_values (lat, lon, prcp, observation_time)
            VALUES
                (lat, lon, ST_Value(raster_record.rast, band_number), observation_time);
            
            -- Increment the counter
            counter := counter + 1;
            
            -- Commit the transaction periodically in batches
            IF counter % batch_size = 0 THEN
                COMMIT;
            END IF;
        END LOOP;
    END LOOP;
    

 

The metadata for the two files is as follows: 

 

File from database:

{'NC_GLOBAL#Conventions': 'CF-1.5',
 'NC_GLOBAL#GDAL': 'GDAL 3.6.4, released 2023/04/17',
 'NC_GLOBAL#history': 'Wed Nov 15 13:32:13 2023: GDAL CreateCopy( not_clipped_prcp.nc <http://not_clipped_prcp.nc> , ... )'}
File before loading into the database:
{'lat#units': 'degrees_north',
 'lon#units': 'degrees_east',
 'NC_GLOBAL#title': 'Daily statistically downscaled CMIP5 data for the United States and southern Canada east of the Rocky Mountains, version 1.0, realization 1, 0.1x0.1 degree spatial resolution.',
 'NETCDF_DIM_EXTRA': '{time}',
 'NETCDF_DIM_time_DEF': '{366,4}',
 'NETCDF_DIM_time_VALUES': '{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14....362,363,364,365}',
 'prcp#add_offset': '819.17499',
 'prcp#long_name': 'daily precipitation accumulation',
 'prcp#missing_value': '-32768',
 'prcp#scale_factor': '0.025',
 'prcp#units': 'mm',
 'prcp#_FillValue': '-32768',
 'time#units': 'days since 1952-1-1 0:0:0.0'}
 
In case this information is useful: Previously I used python to extract variable information and generate a csv or table using this variable information, and the code is enclosed below. In the code I extracted variable values using lon = dataset.variables['lon'][:] and iterated for loops to write them all in csv. 

Python code:

import netCDF4 as nc

# Step 1: Read the NetCDF file
filename = "/home/manaswini/prcp_03_1950.nc <http://prcp_03_1950.nc> "
dataset = nc.Dataset(filename)
dataset.set_auto_mask(False)
dataset.set_auto_scale(True)

 

lon = dataset.variables['lon'][:]
lat = dataset.variables['lat'][:]
time = dataset.variables['time'][:]
prcp = dataset.variables['prcp'][:]

 

import numpy as np
import csv

# csv_buffer
csv_buffer = open('output.csv', 'w', newline='')
csv_writer = csv.writer(csv_buffer)

# Iterate through grid points and write to CSV buffer
for i in enumerate(lon):
    for j in enumerate(lat):
        for k in enumerate(time):
         csv_writer.writerow([lat[j], lon[i], prcp[i][j][k], year[k], yearday[k]])


# Close the CSV buffer
csv_buffer.close()

 

Thank you, 

Manaswini Ganjam




 

-- 

Manaswini Ganjam




 

-- 

Manaswini Ganjam

_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org> 
https://lists.osgeo.org/mailman/listinfo/postgis-users




 

-- 

Manaswini Ganjam




 

-- 

Manaswini Ganjam

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/e1e9949f/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 7895 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/e1e9949f/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 232418 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231117/e1e9949f/attachment-0001.png>


More information about the postgis-users mailing list