[gdal-dev] Design for sub-second accuracy in OGR ?

Dmitriy Baryshnikov bishop.dev at gmail.com
Mon Apr 6 14:32:40 PDT 2015


Why not read all date/time data from records as accurately as possible?
For example, for OFTDate we get the date with GetFieldAsDateTime
<http://www.gdal.org/classOGRFeature.html#a6c5d2444407b07e07b79863c42ee7a49>
and the time is zero.
It seems strange to analyse the data structure while reading the records
when we already have the field definition.
For old datasets we can use type DateTime + SubType ODTP_YMDHMSm, and for
new datasets let the user choose the subtype. Certainly some formats
already support this new type + subtype (e.g. Postgres/PostGIS).

Postgres data type mapping:
date -> OFTDateTime + ODTP_YMD
time -> OFTDateTime + ODTP_HMS
timestamp -> OFTDateTime + ODTP_YMDHMSm

Best regards,
     Dmitry

07.04.2015 00:14, Even Rouault wrote:
> On Monday 06 April 2015 23:11:21, Dmitriy Baryshnikov wrote:
>> Hi Even,
>>
>> It seems to me that this duplicates RFC 50: OGR field subtypes.
>> For example, we have the master field type DateTime and the subtype Year.
>> So the internal structure for date/time representation could be adapted
>> to this technique.
> The subtype is defined at field definition level. In all the formats we
> currently handle, we only know the date/time precision when reading values
> (and it might differ between records), that is, only after the layer and
> field definitions have been created.
>
>> Best regards,
>>       Dmitry
>>
>> 06.04.2015 15:02, Even Rouault wrote:
>>> On Monday 06 April 2015 13:48:47, Even Rouault wrote:
>>>> On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
>>>>> The first solution looks reasonable. But the precision field lacks
>>>>> values for the case where only the time is significant:
>>>>>
>>>>> ODTP_HMSm
>>>>> ODTP_HMS
>>>>> ODTP_HM
>>>>> ODTP_H
>>>> As I didn't want to multiply the values in the enumeration, my intent
>>>> was to reuse the ODTP_YMDxxxx values for OFTTime only.
>>> I meant "for OFTTime too"
>>>
>>>> This is what I meant to convey
>>>> with the qualifier in parentheses in the comment of
>>>> ODTP_YMDH: "Year, month, day (if OFTDateTime) and hour"
>>>>
>>>> Or perhaps, the enumeration should capture the most precise part of the
>>>> (date)time structure  ?
>>>> ODTP_Year
>>>> ODTP_Month
>>>> ODTP_Day
>>>> ODTP_Hour
>>>> ODTP_Minute
>>>> ODTP_Second
>>>> ODTP_Millisecond
>>>>
>>>>> etc.
>>>>>
>>>>> Best regards,
>>>>>
>>>>>        Dmitry
>>>>>
>>>>> 05.04.2015 22:25, Even Rouault wrote:
>>>>>> Hi,
>>>>>>
>>>>>> In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680,
>>>>>> which is about the lack of precision of the current datetime
>>>>>> structure, I've imagined different ways to modify the OGRField
>>>>>> structure, and failed to pick one that would be the obvious
>>>>>> solution, so opinions are welcome.
>>>>>>
>>>>>> The issue is how to add (at least) microsecond accuracy to the
>>>>>> datetime structure, as a few formats support it explicitly or
>>>>>> implicitly: MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite,
>>>>>> PostgreSQL, CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...
>>>>>>
>>>>>> Below are a few potential solutions:
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 1) : Millisecond accuracy, second becomes a float
>>>>>>
>>>>>> This is the solution I've prototyped.
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>        struct {
>>>>>>        
>>>>>>            GInt16  Year;
>>>>>>            GByte   Month;
>>>>>>            GByte   Day;
>>>>>>            GByte   Hour;
>>>>>>            GByte   Minute;
>>>>>>            GByte   TZFlag;
>>>>>>            GByte   Precision; /* value in OGRDateTimePrecision */
>>>>>>            float   Second; /* from 00.000 to 60.999 (millisecond
>>>>>>                               accuracy) */
>>>>>>        
>>>>>>        } Date;
>>>>>>
>>>>>> } OGRField
>>>>>>
>>>>>> So sub-second precision is represented with a single-precision
>>>>>> floating point number, storing both integral and decimal parts. (We
>>>>>> could theoretically have a hundredth of millisecond accuracy, 10^-5 s,
>>>>>> since 6099999 fits in the 23 bits of the mantissa.)
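The mantissa argument above can be checked empirically. The following is only a sketch, not code from the prototype: it assumes an IEEE 754 single-precision `float`, and the helper name `count_float_roundtrip_failures` is made up for illustration. It stores every tick value of the 0 to 60.999... second range in a `float` and counts the values that do not come back unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of OGR. A second is divided into
 * ticks_per_sec ticks. Each tick count below 61 s is stored as float
 * seconds, converted back with rounding to the nearest tick, and the
 * values that do not survive the round trip are counted. */
uint32_t count_float_roundtrip_failures(uint32_t ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    uint32_t max_ticks = 61u * ticks_per_sec;
    uint32_t failures = 0;
    for (uint32_t t = 0; t < max_ticks; t++) {
        /* volatile forces the value to be rounded to a real float */
        volatile float sec = (float)((double)t / ticks_per_sec);
        uint32_t back = (uint32_t)((double)sec * ticks_per_sec + 0.5);
        if (back != t)
            failures++;
    }
    return failures;
}
```

Under those assumptions, calling it with 1000 (milliseconds) or 100000 (10^-5 s) should report no failures, which is consistent with the reasoning in the message.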
>>>>>>
>>>>>> Another addition is the Precision field that indicates which parts of
>>>>>> the datetime structure are significant.
>>>>>>
>>>>>> /** Enumeration that defines the precision of a DateTime.
>>>>>>  *
>>>>>>  * @since GDAL 2.0
>>>>>>  */
>>>>>> typedef enum
>>>>>> {
>>>>>>     ODTP_Undefined, /**< Undefined */
>>>>>>     ODTP_Guess,     /**< Only valid when setting through
>>>>>>                          SetField(i,year,month...) where OGR will
>>>>>>                          guess */
>>>>>>     ODTP_Y,         /**< Year is significant */
>>>>>>     ODTP_YM,        /**< Year and month are significant */
>>>>>>     ODTP_YMD,       /**< Year, month and day are significant */
>>>>>>     ODTP_YMDH,      /**< Year, month, day (if OFTDateTime) and hour
>>>>>>                          are significant */
>>>>>>     ODTP_YMDHM,     /**< Year, month, day (if OFTDateTime), hour and
>>>>>>                          minute are significant */
>>>>>>     ODTP_YMDHMS,    /**< Year, month, day (if OFTDateTime), hour,
>>>>>>                          minute and integral second are significant */
>>>>>>     ODTP_YMDHMSm    /**< Year, month, day (if OFTDateTime), hour,
>>>>>>                          minute and second with milliseconds are
>>>>>>                          significant */
>>>>>> } OGRDateTimePrecision;
>>>>>>
>>>>>> I think this is important since "2015/04/05 17:12:34" and "2015/04/05
>>>>>> 17:12:34.000" do not really mean the same thing and it might be good
>>>>>> to be able to preserve the original accuracy when converting between
>>>>>> formats.
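To illustrate why carrying the precision matters when converting between formats, here is a minimal sketch of a precision-aware formatter. It is not OGR code: `DemoPrecision`, `DemoDateTime` and `demo_format` are hypothetical stand-ins, and only a subset of the proposed enumeration values is modelled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a subset of the proposed enumeration. */
typedef enum {
    DEMO_YMD,     /* date only */
    DEMO_YMDHM,   /* up to the minute */
    DEMO_YMDHMS,  /* up to the integral second */
    DEMO_YMDHMSm  /* second with milliseconds */
} DemoPrecision;

/* Hypothetical stand-in for the date member of OGRField. */
typedef struct {
    int year, month, day, hour, minute;
    float second;
    DemoPrecision precision;
} DemoDateTime;

/* Writes only the significant parts into buf; returns what snprintf
 * returns, i.e. the length of the formatted text. */
int demo_format(const DemoDateTime *dt, char *buf, size_t size)
{
    assert(dt != NULL && buf != NULL);
    switch (dt->precision) {
    case DEMO_YMD:
        return snprintf(buf, size, "%04d/%02d/%02d",
                        dt->year, dt->month, dt->day);
    case DEMO_YMDHM:
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d",
                        dt->year, dt->month, dt->day, dt->hour, dt->minute);
    case DEMO_YMDHMS:
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%02d",
                        dt->year, dt->month, dt->day, dt->hour, dt->minute,
                        (int)dt->second);
    default: /* DEMO_YMDHMSm: keep three decimals, even if they are zero */
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%06.3f",
                        dt->year, dt->month, dt->day, dt->hour, dt->minute,
                        (double)dt->second);
    }
}
```

With the same field values, switching `precision` from `DEMO_YMDHMS` to `DEMO_YMDHMSm` is what decides whether the trailing `.000` is written, so the distinction made in the message survives a conversion.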
>>>>>>
>>>>>> A drawback of this solution is that the size of the OGRField structure
>>>>>> increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
>>>>>> 64-bit). This is probably not that important since in most cases not
>>>>>> that many OGRField structures are instantiated at one time (typically,
>>>>>> you iterate over features one at a time).
>>>>>> This could be more of a problem for use cases that involve the MEM
>>>>>> driver, as it keeps all features in memory.
>>>>>>
>>>>>> Another drawback is that the change of the structure might not be
>>>>>> directly noticed by application developers: the Second field name is
>>>>>> preserved, but a new Precision field is added, so there's a risk that
>>>>>> Precision is left uninitialized if the field is set through
>>>>>> OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That
>>>>>> could lead to unexpected formatting (but hopefully not crashes, with
>>>>>> defensive programming). However, I think it is unlikely that many
>>>>>> applications manipulate OGRField directly, instead of using
>>>>>> the getters and setters of OGRFeature.
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 2) : Millisecond accuracy, second and milliseconds in
>>>>>> distinct fields
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>        struct {
>>>>>>        
>>>>>>            GInt16  Year;
>>>>>>            GByte   Month;
>>>>>>            GByte   Day;
>>>>>>            GByte   Hour;
>>>>>>            GByte   Minute;
>>>>>>            GByte   TZFlag;
>>>>>>            GByte   Precision; /* value in OGRDateTimePrecision */
>>>>>>            GByte   Second; /* from 0 to 60 */
>>>>>>            GUInt16 Millisecond; /* from 0 to 999 */
>>>>>>        } Date;
>>>>>>
>>>>>> } OGRField
>>>>>>
>>>>>> Same size of structure as in 1)
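The size claim for solutions 1) and 2) can be checked with stand-in structures. This is a sketch under assumptions: fixed-width `<stdint.h>` types replace GInt16/GByte/GUInt16, the names `DateSolution1` and `DateSolution2` are made up, only the date member is modelled (not the whole OGRField union), and a common ABI with a 4-byte aligned `float` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the date member of solution 1 (second stored as float). */
typedef struct {
    int16_t Year;
    uint8_t Month, Day, Hour, Minute, TZFlag, Precision;
    float   Second;
} DateSolution1;

/* Stand-in for the date member of solution 2 (second + millisecond). */
typedef struct {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;
    uint16_t Millisecond;
} DateSolution2;

/* Both layouts take 12 bytes on the assumed ABI: 8 bytes for the existing
 * members, then either a 4-byte float or 1 byte + 1 padding byte + 2 bytes. */
static_assert(sizeof(DateSolution1) == 12, "solution 1 date is 12 bytes");
static_assert(sizeof(DateSolution2) == 12, "solution 2 date is 12 bytes");
```

On 64-bit builds the enclosing union is already 16 bytes because of its other members, which is why only 32-bit builds see a growth.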
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 3) : Millisecond accuracy, pack all fields
>>>>>>
>>>>>> Conceptually, this would use bit fields to avoid wasting unused bits :
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>      struct {
>>>>>>      
>>>>>>        GInt16        Year;
>>>>>>        GUIntBig     Month:4;
>>>>>>        GUIntBig     Day:5;
>>>>>>        GUIntBig     Hour:5;
>>>>>>        GUIntBig     Minute:6;
>>>>>>        GUIntBig     Second:6;
>>>>>>        GUIntBig     Millisecond:10; /* 0-999 */
>>>>>>        GUIntBig     TZFlag:8;
>>>>>>        GUIntBig     Precision:4;
>>>>>>     
>>>>>>     } Date;
>>>>>>
>>>>>> } OGRField;
>>>>>>
>>>>>> This was proposed in the above-mentioned ticket. And as there were
>>>>>> enough remaining bits, I've also added the Precision field (as in all
>>>>>> other solutions).
>>>>>>
>>>>>> The advantage is that sizeof(mydate) remains 8 bytes on 32-bit
>>>>>> builds.
>>>>>>
>>>>>> But the C standard only defines bitfields of int/unsigned int, so this
>>>>>> is not portable, plus the fact that the way bits are packed is not
>>>>>> defined by the standard, so different compilers could come up with
>>>>>> different packing. A workaround is to do the bit manipulation through
>>>>>> macros :
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>      struct {
>>>>>>        GUIntBig opaque;
>>>>>>      } Date;
>>>>>>
>>>>>> } OGRField;
>>>>>>
>>>>>> #define GET_BITS(x,y_bits,shift) \
>>>>>>     (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))
>>>>>>
>>>>>> #define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
>>>>>> #define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
>>>>>> #define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
>>>>>> #define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
>>>>>> #define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
>>>>>> #define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
>>>>>> #define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
>>>>>> #define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
>>>>>> #define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
>>>>>>
>>>>>> #define SET_BITS(x,y,y_bits,shift) \
>>>>>>     ((x).Date.opaque = (((x).Date.opaque & \
>>>>>>         ~((GUIntBig)((1 << (y_bits))-1) << (shift))) | \
>>>>>>         ((GUIntBig)(y) << (shift))))
>>>>>>
>>>>>> #define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
>>>>>> #define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
>>>>>> #define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
>>>>>> #define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
>>>>>> #define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
>>>>>> #define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
>>>>>> #define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
>>>>>> #define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
>>>>>> #define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
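The same packing can also be written with small functions instead of macros. The sketch below is an illustration only: `packed_get`, `packed_set` and the `SHIFT_*` constants are hypothetical names, and a plain `uint64_t` stands in for the opaque GUIntBig member. Unlike the macros, it builds the mask in 64 bits and masks the incoming value so that an out-of-range value cannot spill into neighbouring fields.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the message, most significant field first:
 * Year:16 Month:4 Day:5 Hour:5 Minute:6 Second:6 Millisecond:10
 * TZFlag:8 Precision:4 (64 bits in total). */
enum {
    SHIFT_YEAR = 48, SHIFT_MONTH = 44, SHIFT_DAY = 39, SHIFT_HOUR = 34,
    SHIFT_MINUTE = 28, SHIFT_SECOND = 22, SHIFT_MILLISECOND = 12,
    SHIFT_TZFLAG = 4, SHIFT_PRECISION = 0
};

/* Hypothetical helper: read the `bits`-wide field starting at bit `shift`. */
uint32_t packed_get(uint64_t packed, int bits, int shift)
{
    assert(bits > 0 && bits <= 16 && shift >= 0 && shift + bits <= 64);
    return (uint32_t)((packed >> shift) & ((UINT64_C(1) << bits) - 1));
}

/* Hypothetical helper: return `packed` with one field replaced by `value`.
 * The value is masked to the field width before being stored. */
uint64_t packed_set(uint64_t packed, uint32_t value, int bits, int shift)
{
    assert(bits > 0 && bits <= 16 && shift >= 0 && shift + bits <= 64);
    uint64_t mask = ((UINT64_C(1) << bits) - 1) << shift;
    return (packed & ~mask) | (((uint64_t)value << shift) & mask);
}
```

A caller would chain calls such as `d = packed_set(d, 999, 10, SHIFT_MILLISECOND)` and read the field back with `packed_get(d, 10, SHIFT_MILLISECOND)`; a negative year would need a cast to a 16-bit signed type after reading, as the `GET_YEAR` macro does with `(short)`.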
>>>>>>
>>>>>> Main advantage: the size of OGRField remains unchanged (so 8 bytes on
>>>>>> 32-bit builds).
>>>>>>
>>>>>> Drawback: manipulation of datetime members is less natural, but there
>>>>>> are not that many places in the GDAL code base where the OGRField.Date
>>>>>> members are used, so it is not that much of a problem.
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 4) : Microsecond accuracy with one field
>>>>>>
>>>>>> Solution 1) used a float for second and sub-second, but a float has
>>>>>> only 23 bits of mantissa, which is enough to represent second with
>>>>>> millisecond accuracy, but not for microsecond (you need 26 bits for
>>>>>> that). So use a 32-bit integer instead of a 32-bit floating point.
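The limit of the float representation can be shown on a single value. This is only a sketch, assuming an IEEE 754 single-precision `float`; `float_loses_microseconds` and `split_microseconds` are hypothetical helpers, the second one showing how a combined 32-bit microsecond counter could be read back without any loss.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a microsecond count (below 61 s) changes after being stored
 * as float seconds and rounded back to microseconds, 0 otherwise. */
int float_loses_microseconds(uint32_t microseconds)
{
    assert(microseconds < 61000000u);
    /* volatile forces the value to be rounded to a real float */
    volatile float sec = (float)(microseconds / 1e6);
    uint32_t back = (uint32_t)((double)sec * 1e6 + 0.5);
    return back != microseconds;
}

/* With a 32-bit integer counter, whole seconds and the sub-second part
 * are recovered exactly by integer division. */
void split_microseconds(uint32_t microseconds,
                        uint32_t *whole_sec, uint32_t *sub_usec)
{
    assert(whole_sec != 0 && sub_usec != 0);
    *whole_sec = microseconds / 1000000u;
    *sub_usec = microseconds % 1000000u;
}
```

For example, 59.999999 s is less than half a float step away from 60.0 in that range, so the float version cannot keep it, while the integer counter 59999999 splits back into 59 s and 999999 microseconds.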
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>        struct {
>>>>>>        
>>>>>>            GInt16  Year;
>>>>>>            GByte   Month;
>>>>>>            GByte   Day;
>>>>>>            GByte   Hour;
>>>>>>            GByte   Minute;
>>>>>>            GByte   TZFlag;
>>>>>>            GByte   Precision; /* value in OGRDateTimePrecision */
>>>>>>            GUInt32 Microseconds; /* 00000000 to 59999999 */
>>>>>>        
>>>>>>        } Date;
>>>>>>
>>>>>> } OGRField
>>>>>>
>>>>>> Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
>>>>>> (and remains 16 bytes on 64-bit builds)
>>>>>>
>>>>>> We would need to add an extra value in OGRDateTimePrecision to mean
>>>>>> the microsecond accuracy.
>>>>>>
>>>>>> It is not really clear that we need microsecond accuracy... Most
>>>>>> formats that support subsecond accuracy use the ISO 8601
>>>>>> representation (e.g. YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define
>>>>>> the maximal number of decimals beyond the second. According to
>>>>>> http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
>>>>>> PostgreSQL supports microsecond accuracy.
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 5) : Microsecond with 3 fields
>>>>>>
>>>>>> A variant where we split second into 3 integer parts:
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>        struct {
>>>>>>        
>>>>>>            GInt16  Year;
>>>>>>            GByte   Month;
>>>>>>            GByte   Day;
>>>>>>            GByte   Hour;
>>>>>>            GByte   Minute;
>>>>>>            GByte   TZFlag;
>>>>>>            GByte   Precision; /* value in OGRDateTimePrecision */
>>>>>>            GByte   Second; /* 0 to 59 */
>>>>>>            GUInt16 Millisecond; /* 0 to 999 */
>>>>>>            GUInt16 Microsecond; /* 0 to 999 */
>>>>>>        
>>>>>>        } Date;
>>>>>>
>>>>>> } OGRField
>>>>>>
>>>>>> Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on
>>>>>> 32-bit builds (and remains 16 bytes on 64-bit builds)
>>>>>>
>>>>>> ---------------------------------------
>>>>>> Solution 6) : Nanosecond accuracy and beyond !
>>>>>>
>>>>>> Now that we are using 16 bytes, why not have nanosecond accuracy?
>>>>>>
>>>>>> typedef union {
>>>>>> [...]
>>>>>>
>>>>>>        struct {
>>>>>>        
>>>>>>            GInt16  Year;
>>>>>>            GByte   Month;
>>>>>>            GByte   Day;
>>>>>>            GByte   Hour;
>>>>>>            GByte   Minute;
>>>>>>            GByte   TZFlag;
>>>>>>            GByte   Precision; /* value in OGRDateTimePrecision */
>>>>>>            double  Second; /* 0.000000000 to 60.999999999 */
>>>>>>        } Date;
>>>>>>
>>>>>> } OGRField
>>>>>>
>>>>>> Actually we even have picosecond accuracy! (since for picoseconds, we
>>>>>> need 46 bits and a double has 52 bits of mantissa). And if we use a
>>>>>> 64-bit integer instead of a double, we can have femtosecond accuracy
>>>>>> ;-)
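The bit counts quoted in this thread (23, 26 and 46 bits) can be reproduced with a few lines of integer arithmetic. The sketch below is only an illustration; `bits_needed` and `bits_for_second_decimals` are hypothetical helper names, and the range is taken as 0 to 60.999... seconds as in the structure comments.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to store the positive integer n. */
int bits_needed(uint64_t n)
{
    assert(n > 0);
    int bits = 0;
    while (n != 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Bits needed to count ticks over 0 .. 60.999... seconds when one second
 * is divided into 10^decimals ticks (3 = milli, 6 = micro, 9 = nano,
 * 12 = pico, 15 = femto). */
int bits_for_second_decimals(int decimals)
{
    assert(decimals >= 0 && decimals <= 15);
    uint64_t ticks_per_sec = 1;
    for (int i = 0; i < decimals; i++)
        ticks_per_sec *= 10;
    /* largest tick value is 61 * 10^decimals - 1, e.g. 60999 for 3 */
    return bits_needed(61 * ticks_per_sec - 1);
}
```

The result for 12 decimals stays below the 52 mantissa bits of a double, and the result for 15 decimals stays below 64, which matches the picosecond and femtosecond remarks above.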
>>>>>>
>>>>>> Any preference ?
>>>>>>
>>>>>> Even
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev at lists.osgeo.org
>>>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
