[gdal-dev] Design for sub-second accuracy in OGR ?

Dmitriy Baryshnikov bishop.dev at gmail.com
Mon Apr 6 14:11:21 PDT 2015


Hi Even,

It seems to me that this duplicates RFC 50: OGR field subtypes.
For example, we could have the master field type DateTime and a subtype
Year. So the internal structure for the date/time representation could be
adapted to such a technique.
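
For illustration only (this is not part of RFC 50), the subtype enumeration
could hypothetically be extended with date/time precision values; the first
four values below are the actual RFC 50 ones:

typedef enum
{
    OFSTNone = 0,            /* existing RFC 50 value */
    OFSTBoolean = 1,         /* existing RFC 50 value */
    OFSTInt16 = 2,           /* existing RFC 50 value */
    OFSTFloat32 = 3,         /* existing RFC 50 value */
    /* hypothetical additions for a DateTime/Time master type: */
    OFSTDateTimeYear,
    OFSTDateTimeSecond,
    OFSTDateTimeMillisecond
} OGRFieldSubType;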

Best regards,
     Dmitry

On 06.04.2015 15:02, Even Rouault wrote:
> On Monday 06 April 2015 13:48:47, Even Rouault wrote:
>> On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
>>> The first solution looks reasonable. But the precision enumeration
>>> lacks values where only the time part is significant:
>>>
>>> ODTP_HMSm
>>> ODTP_HMS
>>> ODTP_HM
>>> ODTP_H
>> As I didn't want to multiply the values in the enumeration, my intent was
>> to reuse the ODTP_YMDxxxx values for OFTTime only.
> I meant "for OFTTime too"
>
>> This is what I meant to convey with the precision noted in parentheses
>> in the comment of ODTP_YMDH: "Year, month, day (if OFTDateTime) and hour"
>>
>> Or perhaps the enumeration should capture the most precise part of the
>> (date)time structure ?
>> ODTP_Year
>> ODTP_Month
>> ODTP_Day
>> ODTP_Hour
>> ODTP_Minute
>> ODTP_Second
>> ODTP_Millisecond
>>
>>> etc.
>>>
>>> Best regards,
>>>
>>>       Dmitry
>>>
>>> On 05.04.2015 22:25, Even Rouault wrote:
>>>> Hi,
>>>>
>>>> In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680,
>>>> which is about the lack of precision of the current datetime structure,
>>>> I've imagined different ways of modifying the OGRField
>>>> structure, and failed to pick one that would be the obvious
>>>> solution, so opinions are welcome.
>>>>
>>>> The issue is how to add (at least) microsecond accuracy to the datetime
>>>> structure, as a few formats support it explicitly or implicitly :
>>>> MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL,
>>>> CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...
>>>>
>>>> Below are a few potential solutions :
>>>>
>>>> ---------------------------------------
>>>> Solution 1) : Millisecond accuracy, second becomes a float
>>>>
>>>> This is the solution I've prototyped.
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>       struct {
>>>>       
>>>>           GInt16  Year;
>>>>           GByte   Month;
>>>>           GByte   Day;
>>>>           GByte   Hour;
>>>>           GByte   Minute;
>>>>           GByte   TZFlag;
>>>>           GByte   Precision; /* value in OGRDateTimePrecision */
>>>>           float   Second; /* from 00.000 to 60.999 (millisecond accuracy) */
>>>>       
>>>>       } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> So sub-second precision is represented with a single-precision
>>>> floating point number, storing both the integral and decimal parts.
>>>> (We could theoretically have a hundredth-of-millisecond accuracy,
>>>> 10^-5 s, since 6099999 fits in the 23 bits of the mantissa.)
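>>>>
>>>> As a quick standalone sanity check (not GDAL code, just to illustrate
>>>> the claim), every value with millisecond resolution in [0, 60.999]
>>>> survives a round trip through a float:
>>>>
>>>> #include <math.h>
>>>> #include <stdio.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     int nMS;
>>>>     for( nMS = 0; nMS <= 60999; nMS++ )
>>>>     {
>>>>         float fSecond = (float)(nMS / 1000.0); /* as stored in Second */
>>>>         if( (int)floor(fSecond * 1000.0 + 0.5) != nMS )
>>>>         {
>>>>             printf("millisecond %d not preserved\n", nMS);
>>>>             return 1;
>>>>         }
>>>>     }
>>>>     printf("all 61000 values round-trip exactly\n");
>>>>     return 0;
>>>> }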
>>>>
>>>> Another addition is the Precision field that indicates which parts of
>>>> the datetime structure are significant.
>>>>
>>>> /** Enumeration that defines the precision of a DateTime.
>>>>
>>>>     * @since GDAL 2.0
>>>>     */
>>>>
>>>> typedef enum
>>>> {
>>>>
>>>>       ODTP_Undefined,     /**< Undefined */
>>>>       ODTP_Guess,         /**< Only valid when setting through SetField(i, year, month, ...), where OGR will guess */
>>>>
>>>>       ODTP_Y,             /**< Year is significant */
>>>>       ODTP_YM,            /**< Year and month are significant*/
>>>>       ODTP_YMD,           /**< Year, month and day are significant */
>>>>       ODTP_YMDH,          /**< Year, month, day (if OFTDateTime) and hour are significant */
>>>>       ODTP_YMDHM,         /**< Year, month, day (if OFTDateTime), hour and minute are significant */
>>>>       ODTP_YMDHMS,        /**< Year, month, day (if OFTDateTime), hour, minute and integral second are significant */
>>>>       ODTP_YMDHMSm,       /**< Year, month, day (if OFTDateTime), hour, minute and second with milliseconds are significant */
>>>> } OGRDateTimePrecision;
>>>>
>>>> I think this is important since "2015/04/05 17:12:34" and "2015/04/05
>>>> 17:12:34.000" do not really mean the same thing and it might be good to
>>>> be able to preserve the original accuracy when converting between
>>>> formats.
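>>>>
>>>> As a rough sketch of what honouring Precision could look like in a
>>>> driver's serialization code (WriteDateTime is a made-up helper, not an
>>>> existing GDAL function; it assumes the solution 1) layout above):
>>>>
>>>> #include <stdio.h>
>>>>
>>>> static void WriteDateTime(char* pszBuf, size_t nBufSize,
>>>>                           const OGRField* psField)
>>>> {
>>>>     if( psField->Date.Precision == ODTP_YMDHMSm )
>>>>         snprintf(pszBuf, nBufSize, "%04d/%02d/%02d %02d:%02d:%06.3f",
>>>>                  psField->Date.Year, psField->Date.Month,
>>>>                  psField->Date.Day, psField->Date.Hour,
>>>>                  psField->Date.Minute, psField->Date.Second);
>>>>     else if( psField->Date.Precision == ODTP_YMDHMS )
>>>>         snprintf(pszBuf, nBufSize, "%04d/%02d/%02d %02d:%02d:%02d",
>>>>                  psField->Date.Year, psField->Date.Month,
>>>>                  psField->Date.Day, psField->Date.Hour,
>>>>                  psField->Date.Minute, (int)psField->Date.Second);
>>>>     /* ... lower precisions handled similarly */
>>>> }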
>>>>
>>>> A drawback of this solution is that the size of the OGRField structure
>>>> increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
>>>> 64-bit builds). This is probably not that important since in most cases
>>>> not that many OGRField structures are instantiated at one time
>>>> (typically, you iterate over features one at a time).
>>>> This could be more of a problem for use cases that involve the MEM
>>>> driver, as it keeps all features in memory.
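>>>>
>>>> For reference, a minimal mock of just the proposed Date member (not the
>>>> real OGRField union) shows where the 12 bytes come from:
>>>>
>>>> #include <stdio.h>
>>>>
>>>> typedef short GInt16;
>>>> typedef unsigned char GByte;
>>>>
>>>> struct MockDate
>>>> {
>>>>     GInt16 Year;                                         /* 2 bytes */
>>>>     GByte  Month, Day, Hour, Minute, TZFlag, Precision;  /* 6 bytes */
>>>>     float  Second;                       /* 4 bytes, 4-byte aligned */
>>>> };
>>>>
>>>> int main(void)
>>>> {
>>>>     printf("%u\n", (unsigned)sizeof(struct MockDate)); /* prints 12 */
>>>>     return 0;
>>>> }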
>>>>
>>>> Another drawback is that the change of the structure might not be
>>>> directly noticed by application developers, as the Second field name is
>>>> preserved but a new Precision field is added, so there's a risk that
>>>> Precision is left uninitialized if the field is set through
>>>> OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That could
>>>> lead to unexpected formatting (but hopefully not crashes, with defensive
>>>> programming). However I'd think it is unlikely that many applications
>>>> manipulate OGRField directly instead of using the getters and
>>>> setters of OGRFeature.
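>>>>
>>>> For applications that do fill OGRField directly, the safe pattern with
>>>> the solution 1) layout would be something like the following fragment
>>>> (iFieldIndex and the feature are assumed to exist; note the explicit
>>>> Precision assignment):
>>>>
>>>> OGRField sRawField;
>>>> memset(&sRawField, 0, sizeof(sRawField));   /* <string.h> */
>>>> sRawField.Date.Year = 2015;
>>>> sRawField.Date.Month = 4;
>>>> sRawField.Date.Day = 5;
>>>> sRawField.Date.Hour = 17;
>>>> sRawField.Date.Minute = 12;
>>>> sRawField.Date.Second = 34.567f;
>>>> sRawField.Date.TZFlag = 100;                /* 100 = UTC */
>>>> sRawField.Date.Precision = ODTP_YMDHMSm;    /* easy to forget */
>>>> /* then pass &sRawField to OGRFeature::SetField(iFieldIndex, &sRawField) */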
>>>>
>>>> ---------------------------------------
>>>> Solution 2) : Millisecond accuracy, second and milliseconds in distinct
>>>> fields
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>       struct {
>>>>       
>>>>           GInt16  Year;
>>>>           GByte   Month;
>>>>           GByte   Day;
>>>>           GByte   Hour;
>>>>           GByte   Minute;
>>>>           GByte   TZFlag;
>>>>           GByte   Precision; /* value in OGRDateTimePrecision */
>>>>           GByte   Second; /* from 0 to 60 */
>>>>           GUInt16 Millisecond; /* from 0 to 999 */
>>>>       } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> Same size of structure as in 1)
>>>>
>>>> ---------------------------------------
>>>> Solution 3) : Millisecond accuracy, pack all fields
>>>>
>>>> Conceptually, this would use bit fields to avoid wasting unused bits :
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>     struct {
>>>>     
>>>>       GInt16        Year;
>>>>       GUIntBig     Month:4;
>>>>       GUIntBig     Day:5;
>>>>       GUIntBig     Hour:5;
>>>>       GUIntBig     Minute:6;
>>>>       GUIntBig     Second:6;
>>>>       GUIntBig     Millisecond:10; /* 0-999 */
>>>>       GUIntBig     TZFlag:8;
>>>>       GUIntBig     Precision:4;
>>>>    
>>>>    } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> This was proposed in the above-mentioned ticket. And as there were
>>>> enough remaining bits, I've also added the Precision field (as in all
>>>> the other solutions).
>>>>
>>>> The advantage is that sizeof(OGRField) remains 8 bytes on 32-bit builds.
>>>>
>>>> But the C standard only defines bit-fields of int/unsigned int, so this
>>>> is not portable; moreover, the way bits are packed is not defined by the
>>>> standard, so different compilers could come up with different packings.
>>>> A workaround is to do the bit manipulation through macros :
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>     struct {
>>>>         GUIntBig  opaque;
>>>>     } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> #define GET_BITS(x,y_bits,shift) \
>>>>     (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))
>>>>
>>>> #define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
>>>> #define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
>>>> #define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
>>>> #define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
>>>> #define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
>>>> #define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
>>>> #define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
>>>> #define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
>>>> #define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
>>>>
>>>> #define SET_BITS(x,y,y_bits,shift) \
>>>>     (x).Date.opaque = (((x).Date.opaque & \
>>>>                         ~((GUIntBig)((1 << (y_bits))-1) << (shift))) | \
>>>>                        ((GUIntBig)(y) << (shift)))
>>>>
>>>> #define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
>>>> #define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
>>>> #define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
>>>> #define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
>>>> #define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
>>>> #define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
>>>> #define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
>>>> #define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
>>>> #define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
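>>>>
>>>> Hypothetical usage of these accessors (opaque must be zeroed before the
>>>> first SET_xxx call):
>>>>
>>>> OGRField sField;
>>>> sField.Date.opaque = 0;
>>>> SET_YEAR(sField, 2015);
>>>> SET_MONTH(sField, 4);
>>>> SET_DAY(sField, 5);
>>>> SET_HOUR(sField, 17);
>>>> SET_MINUTE(sField, 12);
>>>> SET_SECOND(sField, 34);
>>>> SET_MILLISECOND(sField, 250);
>>>> SET_TZFLAG(sField, 100);               /* 100 = UTC */
>>>> SET_PRECISION(sField, ODTP_YMDHMSm);
>>>> printf("%04d/%02d/%02d %02d:%02d:%02d.%03d\n",
>>>>        GET_YEAR(sField), GET_MONTH(sField), GET_DAY(sField),
>>>>        GET_HOUR(sField), GET_MINUTE(sField), GET_SECOND(sField),
>>>>        GET_MILLISECOND(sField));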
>>>>
>>>> Main advantage: the size of OGRField remains unchanged (so 8 bytes on
>>>> 32-bit builds).
>>>>
>>>> Drawback: manipulation of the datetime members is less natural, but
>>>> there are not that many places in the GDAL code base where the
>>>> OGRField.Date members are used, so it is not much of a problem.
>>>>
>>>> ---------------------------------------
>>>> Solution 4) : Microsecond accuracy with one field
>>>>
>>>> Solution 1) used a float for second and sub-second, but a float has
>>>> only 23 bits of mantissa, which is enough to represent second with
>>>> millisecond accuracy, but not for microsecond (you need 26 bits for
>>>> that). So use a 32-bit integer instead of a 32-bit floating point.
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>       struct {
>>>>       
>>>>           GInt16  Year;
>>>>           GByte   Month;
>>>>           GByte   Day;
>>>>           GByte   Hour;
>>>>           GByte   Minute;
>>>>           GByte   TZFlag;
>>>>           GByte   Precision; /* value in OGRDateTimePrecision */
>>>>           GUInt32 Microseconds; /* 00000000 to 59999999 */
>>>>       
>>>>       } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
>>>> (and remains 16 bytes on 64-bit builds).
>>>>
>>>> We would need to add an extra value in OGRDateTimePrecision to mean
>>>> microsecond accuracy.
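>>>>
>>>> Packing and unpacking the single Microseconds member would be
>>>> straightforward, e.g. (sketch only, helper names made up):
>>>>
>>>> /* 0..59 seconds and 0..999999 microseconds -> 0..59999999 */
>>>> static GUInt32 PackMicroseconds(int nSecond, int nMicrosecond)
>>>> {
>>>>     return (GUInt32)nSecond * 1000000U + (GUInt32)nMicrosecond;
>>>> }
>>>>
>>>> static int GetSecondPart(GUInt32 nPacked)
>>>> {
>>>>     return (int)(nPacked / 1000000U);
>>>> }
>>>>
>>>> static int GetMicrosecondPart(GUInt32 nPacked)
>>>> {
>>>>     return (int)(nPacked % 1000000U);
>>>> }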
>>>>
>>>> It is not really clear that we need microsecond accuracy... Most formats
>>>> that support subsecond accuracy use an ISO 8601 representation (e.g.
>>>> YYYY-MM-DDTHH:MM:SS.sssssZ) that doesn't define a maximal number of
>>>> decimals beyond the second. According to
>>>> http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
>>>> PostgreSQL supports microsecond accuracy.
>>>>
>>>> ---------------------------------------
>>>> Solution 5) : Microsecond accuracy with 3 fields
>>>>
>>>> A variant where we split second into 3 integer parts:
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>       struct {
>>>>       
>>>>           GInt16  Year;
>>>>           GByte   Month;
>>>>           GByte   Day;
>>>>           GByte   Hour;
>>>>           GByte   Minute;
>>>>           GByte   TZFlag;
>>>>           GByte   Precision; /* value in OGRDateTimePrecision */
>>>>           GByte   Second; /* 0 to 59 */
>>>>           GUInt16 Millisecond; /* 0 to 999 */
>>>>           GUInt16 Microsecond; /* 0 to 999 */
>>>>       
>>>>       } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit
>>>> builds (and remains 16 bytes on 64-bit builds).
>>>>
>>>> ---------------------------------------
>>>> Solution 6) : Nanosecond accuracy and beyond !
>>>>
>>>> Now that we are using 16 bytes, why not have nanosecond accuracy ?
>>>>
>>>> typedef union {
>>>> [...]
>>>>
>>>>       struct {
>>>>       
>>>>           GInt16  Year;
>>>>           GByte   Month;
>>>>           GByte   Day;
>>>>           GByte   Hour;
>>>>           GByte   Minute;
>>>>           GByte   TZFlag;
>>>>           GByte   Precision; /* value in OGRDateTimePrecision */
>>>>           double  Second; /* 0.000000000 to 60.999999999 */
>>>>       } Date;
>>>>
>>>> } OGRField;
>>>>
>>>> Actually we could even have picosecond accuracy! (since for picoseconds
>>>> we need 46 bits and a double has 52 bits of mantissa). And if we use a
>>>> 64-bit integer instead of a double, we can have femtosecond accuracy
>>>> ;-)
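>>>>
>>>> A quick standalone check of that claim: the spacing between consecutive
>>>> doubles around 61 seconds is well below a picosecond.
>>>>
>>>> #include <math.h>
>>>> #include <stdio.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     double dfUlp = nextafter(61.0, 62.0) - 61.0;
>>>>     printf("ulp near 61 s = %.3g s\n", dfUlp);  /* about 7.1e-15 s */
>>>>     return 0;
>>>> }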
>>>>
>>>> Any preference ?
>>>>
>>>> Even