[gdal-dev] Timestamp cast error in GDAL Parquet Directory with sqlite dialect

Michael Smith michael.smith.erdc at gmail.com
Sat Jul 26 12:53:37 PDT 2025


It is the same schema, but writing the different files must be optimizing the timestamps and adjusting the scale (microseconds in some files, nanoseconds in others). I guess I need to force it to write everything with the same resolution.
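One way to force a single resolution at write time is sketched below. It is only a sketch under assumptions: the helper name write_with_uniform_timestamps is hypothetical, and it assumes the files are written with GeoDataFrame.to_parquet(), which forwards extra keyword arguments to pyarrow.parquet.write_table(). The coerce_timestamps and allow_truncated_timestamps options belong to that pyarrow function. Microseconds are used because the value in the error message does not fit in a nanosecond timestamp.

```python
def write_with_uniform_timestamps(frame, path, unit="us"):
    """Write `frame` to Parquet with every timestamp column coerced to `unit`.

    Assumption: `frame` is a geopandas.GeoDataFrame, or any other object whose
    to_parquet() forwards extra keyword arguments to
    pyarrow.parquet.write_table().
    """
    frame.to_parquet(
        path,
        # Store all timestamp columns with the same unit in every file.
        coerce_timestamps=unit,
        # Do not raise when a nanosecond value loses precision in the cast.
        allow_truncated_timestamps=True,
    )
```

Calling it as write_with_uniform_timestamps(gdf, "collection.parquet") for each collection should leave every file with timestamp[us] columns, so a reader of the whole directory has nothing to reconcile.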

 

Yeah, it is not a SQLite thing, although when using OGR SQL I don’t get the same error. I can replicate the error just by reading the directory with pyarrow and geopandas.

 

Mike

 

 

-- 

Michael Smith

RSGIS Center – ERDC CRREL NH

US Army Corps

 

 

From: Even Rouault <even.rouault at spatialys.com>
Date: Saturday, July 26, 2025 at 3:42 PM
To: Michael Smith <michael.smith.erdc at gmail.com>, <gdal-dev at lists.osgeo.org>
Subject: Re: [gdal-dev] Timestamp cast error in GDAL Parquet Directory with sqlite dialect

 

Michael,

Are you sure this is the *same* schema? From the error messages (which come from libarrow-compute itself), it would seem there's a mix of timestamps in microseconds and timestamps in nanoseconds, and that the algorithm in libarrow tries to homogenize them and an overflow occurs. If 237718454400000000 is a timestamp in microseconds, that corresponds to January 1st 9503... I doubt the SQLite dialect plays any role in that.
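The arithmetic behind that remark can be checked with the Python standard library alone. This is a small sketch, assuming only that Arrow stores timestamps as signed 64-bit integers counted from the Unix epoch; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit timestamp can hold

value_us = 237718454400000000  # value reported in the error message

# Interpret the value as microseconds since the Unix epoch.
as_datetime = EPOCH + timedelta(microseconds=value_us)

# Casting microseconds to nanoseconds multiplies the stored integer by 1000.
value_ns = value_us * 1000
overflows_int64 = value_ns > INT64_MAX

# Last instant a nanosecond timestamp can represent (microsecond precision).
last_ns_instant = EPOCH + timedelta(microseconds=INT64_MAX // 1000)

print(as_datetime.isoformat())
print(overflows_int64)
print(last_ns_instant.date())
```

Since nanosecond timestamps end in the year 2262, a date in the year 9503 can only be stored at microsecond (or coarser) resolution, which is why the cast to timestamp[ns] is rejected.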

Even

On 26/07/2025 at 20:56, Michael Smith via gdal-dev wrote:

I have a collection of Parquet files, all with the same schema: different STAC collections written to Parquet using geopandas.

 

When I query a directory of Parquet files with SQL, either at the CLI or in Python, I get timestamp casting errors:



gdal vector info -i PARQUET:s3://mybucket/stac/mds/rasters/ --sql "select * from 'rasters' where st_intersects(geometry, st_geomfromtext('POLYGON ((-68.00948853933728 17.7602787370086, -64.99052950907739 17.7602787370086, -64.99052950907739 18.6509945435268, -68.00948853933728 18.6509945435268, -68.00948853933728 17.7602787370086))'))" --dialect sqlite -f text

INFO: Open of `PARQUET:s3://grid-dev-publiclidar/stac/mds/rasters/'
      using driver `Parquet' successful.

Layer name: SELECT
Geometry: Polygon
ERROR 1: ReadNext() failed: Casting from timestamp[us, tz=UTC] to timestamp[ns, tz=UTC] would result in out of bounds timestamp: 237718454400000000
Feature Count: 34
ERROR 1: ReadNext() failed: Casting from timestamp[us, tz=UTC] to timestamp[ns, tz=UTC] would result in out of bounds timestamp: 237718454400000000
Extent: (-180.000000, -90.000000) - (180.000000, 83.999167)

 

If I run the query file by file for all the Parquet files in the directory, I don’t get an error.

 

Is this a bug, or a problem with the SQLite dialect and Parquet?

 

 


 



-- 
http://www.spatialys.com
My software is free, but my time generally not.
