[gdal-dev] Design for sub-second accuracy in OGR ?

Mon Apr 6 14:14:57 PDT 2015

On Monday 06 April 2015 23:11:21, Dmitriy Baryshnikov wrote:
> Hi Even,
> 
> It seems to me that this duplicates RFC 50: OGR field subtypes.
> For example, we have the master field type DateTime and the subtype Year.
> So the internal structure for date/time representation may be adapted to
> such a technique.

The subtype is defined at the field definition level. In all the formats we
currently handle, we only know the date/time precision when reading values
(and it may differ between records), that is, after the layer and field
definitions have been created.
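
To illustrate the point above, here is a minimal sketch in C of why the
precision has to travel with each value rather than with the field
definition. The names ValuePrecision and guess_precision are hypothetical
and not part of OGR; the three levels only mirror a subset of the ODTP_*
values quoted below, and the parsing is deliberately naive.

```c
#include <string.h>

/* Hypothetical per-value precision tag (not an OGR type). */
typedef enum { PREC_YMD, PREC_YMDHMS, PREC_YMDHMSm } ValuePrecision;

/* Naive guess from a "YYYY/MM/DD[ HH:MM:SS[.sss]]" string:
 * a '.' means fractional seconds, a ':' means a time part is present. */
static ValuePrecision guess_precision(const char *value)
{
    if (strchr(value, '.') != NULL)
        return PREC_YMDHMSm;
    if (strchr(value, ':') != NULL)
        return PREC_YMDHMS;
    return PREC_YMD;
}
```

Two records of the same field, such as "2015/04/05 17:12:34" and
"2015/04/05 17:12:34.000", get different tags here, which a subtype fixed
at field creation time could not express.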

> 
> Best regards,
>      Dmitry
> 
> 06.04.2015 15:02, Even Rouault wrote:
> > On Monday 06 April 2015 13:48:47, Even Rouault wrote:
> >> On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
> >>> The first solution looks reasonable. But the precision enumeration
> >>> lacks values for the case where only the time is significant:
> >>> 
> >>> ODTP_HMSm
> >>> ODTP_HMS
> >>> ODTP_HM
> >>> ODTP_H
> >> 
> >> As I didn't want to multiply the values in the enumeration, my intent
> >> was to reuse the ODTP_YMDxxxx values for OFTTime only.
> > 
> > I meant "for OFTTime too"
> > 
> >> This is what I wanted
> >> to convey with the remark in parentheses in the comment of
> >> ODTP_YMDH: "Year, month, day (if OFTDateTime) and hour"
> >> 
> >> Or perhaps, the enumeration should capture the most precise part of the
> >> (date)time structure  ?
> >> ODTP_Year
> >> ODTP_Month
> >> ODTP_Day
> >> ODTP_Hour
> >> ODTP_Minute
> >> ODTP_Second
> >> ODTP_Millisecond
> >> 
> >>> etc.
> >>> 
> >>> Best regards,
> >>> 
> >>>       Dmitry
> >>> 
> >>> 05.04.2015 22:25, Even Rouault wrote:
> >>>> Hi,
> >>>> 
> >>>> While revisiting http://trac.osgeo.org/gdal/ticket/2680,
> >>>> which is about the lack of precision of the current datetime
> >>>> structure, I've come up with several ways to modify the OGRField
> >>>> structure, but failed to pick one that would be the obvious
> >>>> solution, so opinions are welcome.
> >>>> 
> >>>> The issue is how to add (at least) millisecond accuracy to the
> >>>> datetime structure, as a few formats support it explicitly or
> >>>> implicitly: MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite,
> >>>> PostgreSQL, CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...
> >>>> 
> >>>> Below are a few potential solutions:
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 1) : Millisecond accuracy, second becomes a float
> >>>> 
> >>>> This is the solution I've prototyped.
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>       struct {
> >>>>       
> >>>>           GInt16  Year;
> >>>>           GByte   Month;
> >>>>           GByte   Day;
> >>>>           GByte   Hour;
> >>>>           GByte   Minute;
> >>>>           GByte   TZFlag;
> >>>>           GByte   Precision; /* value in OGRDateTimePrecision */
> >>>>           float   Second; /* 00.000 to 60.999 (millisecond accuracy) */
> >>>>       
> >>>>       } Date;
> >>>> 
> >>>> } OGRField
> >>>> 
> >>>> So sub-second precision is represented with a single-precision
> >>>> floating point number that stores both the integral and decimal
> >>>> parts. (We could theoretically have hundredth-of-millisecond
> >>>> accuracy, 10^-5 s, since 6099999 fits in the 23 bits of the mantissa.)
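
The mantissa argument above can be checked empirically. What follows is a
minimal sketch, assuming float is IEEE 754 single precision;
count_roundtrip_failures is a hypothetical helper, not OGR code. It stores
every count of sub-second units as a float number of seconds and counts
the values that do not come back unchanged after rounding to the nearest
unit.

```c
/* Count how many values u in [0, max_units] fail to round-trip through
 * a float "Second" when one second holds units_per_second units
 * (1e3 for milliseconds, 1e6 for microseconds). */
static long count_roundtrip_failures(long max_units, double units_per_second)
{
    long failures = 0;
    long u;
    for (u = 0; u <= max_units; u++) {
        /* Narrow to single precision, as the proposed field would. */
        float second = (float)((double)u / units_per_second);
        /* Round to nearest unit; values are non-negative, so truncating
         * after adding 0.5 is sufficient. */
        long back = (long)((double)second * units_per_second + 0.5);
        if (back != u)
            failures++;
    }
    return failures;
}
```

Under that assumption, count_roundtrip_failures(60999, 1e3) and
count_roundtrip_failures(6099999, 1e5) should both report no failure,
while count_roundtrip_failures(60999999, 1e6) reports a nonzero count:
60.000001, for instance, collapses to 60.0.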
> >>>> 
> >>>> Another addition is the Precision field that indicates which parts of
> >>>> the datetime structure are significant.
> >>>> 
> >>>> /** Enumeration that defines the precision of a DateTime.
> >>>>   * @since GDAL 2.0
> >>>>   */
> >>>> typedef enum
> >>>> {
> >>>>     ODTP_Undefined, /**< Undefined */
> >>>>     ODTP_Guess,     /**< Only valid when setting through
> >>>>                          SetField(i,year,month...) where OGR will
> >>>>                          guess */
> >>>>     ODTP_Y,         /**< Year is significant */
> >>>>     ODTP_YM,        /**< Year and month are significant */
> >>>>     ODTP_YMD,       /**< Year, month and day are significant */
> >>>>     ODTP_YMDH,      /**< Year, month, day (if OFTDateTime) and hour
> >>>>                          are significant */
> >>>>     ODTP_YMDHM,     /**< Year, month, day (if OFTDateTime), hour and
> >>>>                          minute are significant */
> >>>>     ODTP_YMDHMS,    /**< Year, month, day (if OFTDateTime), hour,
> >>>>                          minute and integral second are significant */
> >>>>     ODTP_YMDHMSm,   /**< Year, month, day (if OFTDateTime), hour,
> >>>>                          minute and second with milliseconds are
> >>>>                          significant */
> >>>> } OGRDateTimePrecision;
> >>>> 
> >>>> I think this is important since "2015/04/05 17:12:34" and "2015/04/05
> >>>> 17:12:34.000" do not really mean the same thing and it might be good
> >>>> to be able to preserve the original accuracy when converting between
> >>>> formats.
> >>>> 
> >>>> A drawback of this solution is that the size of the OGRField structure
> >>>> increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
> >>>> 64-bit). This is probably not that important since in most cases not
> >>>> that many OGRField structures are instantiated at one time (typically,
> >>>> you iterate over features one at a time).
> >>>> This could be more of a problem for use cases that involve the MEM
> >>>> driver, as it keeps all features in memory.
> >>>> 
> >>>> Another drawback is that the change of the structure might not be
> >>>> directly noticed by application developers, as the Second field name is
> >>>> preserved but a new Precision field is added, so there's a risk that
> >>>> Precision is left uninitialized if the field is set through
> >>>> OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That
> >>>> could lead to unexpected formatting (but hopefully not crashes, with
> >>>> defensive programming). However, I'd think it is unlikely that many
> >>>> applications manipulate OGRField directly instead of using
> >>>> the getters and setters of OGRFeature.
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 2) : Millisecond accuracy, second and milliseconds in
> >>>> distinct fields
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>       struct {
> >>>>       
> >>>>           GInt16  Year;
> >>>>           GByte   Month;
> >>>>           GByte   Day;
> >>>>           GByte   Hour;
> >>>>           GByte   Minute;
> >>>>           GByte   TZFlag;
> >>>>           GByte   Precision; /* value in OGRDateTimePrecision */
> >>>>           GByte   Second; /* from 0 to 60 */
> >>>>           GUInt16 Millisecond; /* from 0 to 999 */
> >>>>       } Date;
> >>>> 
> >>>> } OGRField
> >>>> 
> >>>> Same size of structure as in 1)
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 3) : Millisecond accuracy, pack all fields
> >>>> 
> >>>> Conceptually, this would use bit fields to avoid wasting unused bits :
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>     struct {
> >>>>     
> >>>>       GInt16        Year;
> >>>>       GUIntBig     Month:4;
> >>>>       GUIntBig     Day:5;
> >>>>       GUIntBig     Hour:5;
> >>>>       GUIntBig     Minute:6;
> >>>>       GUIntBig     Second:6;
> >>>>       GUIntBig     Millisecond:10; /* 0-999 */
> >>>>       GUIntBig     TZFlag:8;
> >>>>       GUIntBig     Precision:4;
> >>>>    
> >>>>    } Date;
> >>>> 
> >>>> } OGRField;
> >>>> 
> >>>> This was proposed in the above-mentioned ticket. And as there were
> >>>> enough remaining bits, I've also added the Precision field (as in all
> >>>> other solutions).
> >>>> 
> >>>> The advantage is that sizeof(mydate) remains 8 bytes on 32-bit
> >>>> builds.
> >>>> 
> >>>> But the C standard only defines bitfields of int/unsigned int, so this
> >>>> is not portable, plus the fact that the way bits are packed is not
> >>>> defined by the standard, so different compilers could come up with
> >>>> different packing. A workaround is to do the bit manipulation through
> >>>> macros :
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>     struct {
> >>>>         GUIntBig opaque;
> >>>>     } Date;
> >>>> 
> >>>> } OGRField;
> >>>> 
> >>>> #define GET_BITS(x,y_bits,shift) \
> >>>>     (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))
> >>>> 
> >>>> #define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
> >>>> #define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
> >>>> #define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
> >>>> #define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
> >>>> #define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
> >>>> #define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
> >>>> #define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
> >>>> #define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
> >>>> #define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
> >>>> 
> >>>> #define SET_BITS(x,y,y_bits,shift) \
> >>>>     (x).Date.opaque = (((x).Date.opaque & \
> >>>>         ~((GUIntBig)((1 << (y_bits))-1) << (shift))) | \
> >>>>         ((GUIntBig)(y) << (shift)))
> >>>> 
> >>>> #define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
> >>>> #define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
> >>>> #define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
> >>>> #define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
> >>>> #define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
> >>>> #define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
> >>>> #define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
> >>>> #define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
> >>>> #define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
> >>>> 
> >>>> Main advantage: the size of OGRField remains unchanged (so 8 bytes on
> >>>> 32-bit builds).
> >>>> 
> >>>> Drawback: manipulation of datetime members is less natural, but there
> >>>> are not that many places in the GDAL code base where the OGRField.Date
> >>>> members are used, so it is not that much of a problem.
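
For readers who want to experiment with the packed layout, here is a
minimal standalone sketch of the same idea using functions instead of
macros. It assumes uint64_t in place of GUIntBig, and set_bits, get_bits
and the *_SHIFT constants are hypothetical names, not GDAL API. Unlike the
macro above, the setter also masks the incoming value, so an out-of-range
value cannot spill into neighbouring fields.

```c
#include <stdint.h>

/* Bit offsets following the layout above: Year(16) Month(4) Day(5)
 * Hour(5) Minute(6) Second(6) Millisecond(10) TZFlag(8) Precision(4). */
enum {
    YEAR_SHIFT = 48, MONTH_SHIFT = 44, DAY_SHIFT = 39, HOUR_SHIFT = 34,
    MINUTE_SHIFT = 28, SECOND_SHIFT = 22, MSEC_SHIFT = 12, TZ_SHIFT = 4,
    PRECISION_SHIFT = 0
};

/* Return word with the nbits-wide field at shift replaced by value. */
static uint64_t set_bits(uint64_t word, uint64_t value, int nbits, int shift)
{
    uint64_t mask = ((UINT64_C(1) << nbits) - 1) << shift;
    return (word & ~mask) | ((value << shift) & mask);
}

/* Extract the nbits-wide field at shift (nbits <= 16 in this layout). */
static unsigned get_bits(uint64_t word, int nbits, int shift)
{
    return (unsigned)((word >> shift) & ((UINT64_C(1) << nbits) - 1));
}
```

For example, starting from word = 0, calling
word = set_bits(word, 2015, 16, YEAR_SHIFT) and then
word = set_bits(word, 999, 10, MSEC_SHIFT) leaves both values readable
through get_bits with the same width and shift.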
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 4) : Microsecond accuracy with one field
> >>>> 
> >>>> Solution 1) used a float for second and sub-second, but a float has
> >>>> only 23 bits of mantissa, which is enough to represent seconds with
> >>>> millisecond accuracy, but not with microsecond accuracy (you need 26
> >>>> bits for that). So use a 32-bit integer instead of a 32-bit float.
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>       struct {
> >>>>       
> >>>>           GInt16  Year;
> >>>>           GByte   Month;
> >>>>           GByte   Day;
> >>>>           GByte   Hour;
> >>>>           GByte   Minute;
> >>>>           GByte   TZFlag;
> >>>>           GByte   Precision; /* value in OGRDateTimePrecision */
> >>>>           GUInt32 Microseconds; /* 00000000 to 59999999 */
> >>>>       
> >>>>       } Date;
> >>>> 
> >>>> } OGRField
> >>>> 
> >>>> Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
> >>>> (and remains 16 bytes on 64-bit builds)
> >>>> 
> >>>> We would need to add an extra value in OGRDateTimePrecision to mean
> >>>> the microsecond accuracy.
> >>>> 
> >>>> It is not really clear that we need microsecond accuracy... Most
> >>>> formats that support sub-second accuracy use the ISO 8601
> >>>> representation (e.g. YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define
> >>>> the maximum number of decimals beyond the second. According to
> >>>> http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
> >>>> PostgreSQL supports microsecond accuracy.
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 5) : Microsecond with 3 fields
> >>>> 
> >>>> A variant where we split second into 3 integer parts:
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>       struct {
> >>>>       
> >>>>           GInt16  Year;
> >>>>           GByte   Month;
> >>>>           GByte   Day;
> >>>>           GByte   Hour;
> >>>>           GByte   Minute;
> >>>>           GByte   TZFlag;
> >>>>           GByte   Precision; /* value in OGRDateTimePrecision */
> >>>>           GByte   Second; /* 0 to 59 */
> >>>>           GUInt16 Millisecond; /* 0 to 999 */
> >>>>           GUInt16 Microsecond; /* 0 to 999 */
> >>>>       
> >>>>       } Date;
> >>>> 
> >>>> } OGRField
> >>>> 
> >>>> Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on
> >>>> 32-bit builds (and remains 16 bytes on 64-bit builds)
> >>>> 
> >>>> ---------------------------------------
> >>>> Solution 6) : Nanosecond accuracy and beyond !
> >>>> 
> >>>> Now that we are using 16 bytes, why not have nanosecond accuracy ?
> >>>> 
> >>>> typedef union {
> >>>> [...]
> >>>> 
> >>>>       struct {
> >>>>       
> >>>>           GInt16  Year;
> >>>>           GByte   Month;
> >>>>           GByte   Day;
> >>>>           GByte   Hour;
> >>>>           GByte   Minute;
> >>>>           GByte   TZFlag;
> >>>>           GByte   Precision; /* value in OGRDateTimePrecision */
> >>>>           double  Second; /* 0.000000000 to 60.999999999 */
> >>>>       } Date;
> >>>> 
> >>>> } OGRField
> >>>> 
> >>>> Actually we even have picosecond accuracy! (since for picoseconds, we
> >>>> need 46 bits and a double has 52 bits of mantissa). And if we use a
> >>>> 64-bit integer instead of a double, we can have femtosecond accuracy
> >>>> ;-)
> >>>> 
> >>>> Any preference ?
> >>>> 
> >>>> Even
> >>> 
> >>> _______________________________________________
> >>> gdal-dev mailing list
> >>> gdal-dev at lists.osgeo.org
> >>> http://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com