[gdal-dev] Design for sub-second accuracy in OGR ?

Sun Apr 5 13:10:27 PDT 2015

On Sunday 05 April 2015 21:58:58, George Watson wrote:
> Should the second value ‘range’ be 0-59.999 instead of 0-60.999?

Arg, I knew I shouldn't have mentioned that... 60.999 is for the case where a
leap second is inserted, once every few years (e.g. 1998-12-31T23:59:60.75)...
But I'm not sure we ever intended to deal with leap seconds before, and I do
*not* want to change that ;-) This has no influence on any of the choices
discussed for sub-second accuracy.
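A minimal sketch of what the 00.000 to 60.999 range means for validation, assuming the Solution 1 float Second member. Both helper names are hypothetical (not part of the OGR API), and only the range is checked: nothing verifies that a leap second was actually inserted at that instant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: accepts the proposed 00.000 to 60.999 range,
 * i.e. any value in [0, 61). */
static bool is_valid_second(float second)
{
    return second >= 0.0f && second < 61.0f;
}

/* Hypothetical helper: values in [60, 61) can only denote an inserted
 * leap second, such as 1998-12-31T23:59:60.75. Whether a leap second
 * really occurred at that date and time is not checked. */
static bool is_leap_second(float second)
{
    return second >= 60.0f && second < 61.0f;
}
```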

> 
> George K. Watson
> 
> > On Apr 5, 2015, at 1:25 PM, Even Rouault <even.rouault at spatialys.com>
> > wrote:
> > 
> > Hi,
> > 
> > In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680, which
> > is about the lack of precision of the current datetime structure, I've
> > come up with different ways of modifying the OGRField structure, but
> > failed to pick one that would be the obvious solution, so opinions
> > are welcome.
> > 
> > The issue is how to add (at least) microsecond accuracy to the datetime
> > structure, since several formats support it, explicitly or implicitly:
> > MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL, CSV,
> > GeoJSON, ODS, XLSX, KML (potentially GML too)...
> > 
> > Below are a few potential solutions:
> > 
> > ---------------------------------------
> > Solution 1) : Millisecond accuracy, second becomes a float
> > 
> > This is the solution I've prototyped.
> > 
> > typedef union {
> > [...]
> > 
> >    struct {
> >    
> >        GInt16  Year;
> >        GByte   Month;
> >        GByte   Day;
> >        GByte   Hour;
> >        GByte   Minute;
> >        GByte   TZFlag;
> >        GByte   Precision; /* value in OGRDateTimePrecision */
> >        float   Second; /* from 00.000 to 60.999 (millisecond accuracy) */
> >    
> >    } Date;
> > 
> > } OGRField;
> > 
> > So sub-second precision is represented with a single-precision
> > floating-point number, storing both the integral and decimal parts. (We
> > could theoretically have a hundredth of millisecond accuracy, 10^-5 s,
> > since 6099999 fits in the 23 bits of the mantissa.)
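The 10^-5 s claim can be checked exhaustively. A minimal sketch, assuming a 32-bit IEEE 754 float and rounding to nearest when reading the value back; `float_round_trip_ok` is a hypothetical name. It stores every count of 1/units_per_second steps from 0 up to just below 61 s as a float number of seconds, as the proposed Second member would, and checks that the same count comes back.

```c
#include <assert.h>

/* Returns 1 if every value from 0 to just below 61 s, in steps of
 * 1/units_per_second, survives a round trip through a float. */
static int float_round_trip_ok(long units_per_second)
{
    long max_count = 61L * units_per_second - 1; /* e.g. 60999 for ms */

    for (long n = 0; n <= max_count; n++) {
        float second = (float)((double)n / (double)units_per_second);
        /* n is non-negative, so adding 0.5 and truncating rounds */
        long back = (long)((double)second * (double)units_per_second + 0.5);
        if (back != n)
            return 0;
    }
    return 1;
}
```

Near 60 s, adjacent floats are 2^-18 s (about 3.8 microseconds) apart, so the worst read-back error at the 10^-5 s scale is about 0.19 of a step, which the rounding absorbs.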
> > 
> > Another addition is the Precision field that indicates which parts of the
> > datetime structure are significant.
> > 
> > /** Enumeration that defines the precision of a DateTime.
> >  *
> >  * @since GDAL 2.0
> >  */
> > typedef enum
> > {
> >    ODTP_Undefined,     /**< Undefined */
> >    ODTP_Guess,         /**< Only valid when setting through
> >                             SetField(i,year,month...) where OGR will guess */
> >    ODTP_Y,             /**< Year is significant */
> >    ODTP_YM,            /**< Year and month are significant */
> >    ODTP_YMD,           /**< Year, month and day are significant */
> >    ODTP_YMDH,          /**< Year, month, day (if OFTDateTime) and hour are
> >                             significant */
> >    ODTP_YMDHM,         /**< Year, month, day (if OFTDateTime), hour and
> >                             minute are significant */
> >    ODTP_YMDHMS,        /**< Year, month, day (if OFTDateTime), hour, minute
> >                             and integral second are significant */
> >    ODTP_YMDHMSm,       /**< Year, month, day (if OFTDateTime), hour, minute
> >                             and second with milliseconds are significant */
> > } OGRDateTimePrecision;
> > 
> > I think this is important since "2015/04/05 17:12:34" and "2015/04/05
> > 17:12:34.000" do not really mean the same thing and it might be good to
> > be able to preserve the original accuracy when converting between
> > formats.
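A minimal sketch of how a writer could honour Precision so that "2015/04/05 17:12:34" and "2015/04/05 17:12:34.000" stay distinct, assuming the Solution 1 layout. `DateSketch` and `format_date` are hypothetical stand-ins (short and unsigned char replace GInt16 and GByte), and only a subset of the precision values is handled; the others fall back to millisecond output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the proposed OGRDateTimePrecision values. */
typedef enum {
    ODTP_Undefined, ODTP_Guess, ODTP_Y, ODTP_YM, ODTP_YMD,
    ODTP_YMDH, ODTP_YMDHM, ODTP_YMDHMS, ODTP_YMDHMSm
} OGRDateTimePrecision;

/* Hypothetical stand-in for the Solution 1 Date member. */
typedef struct {
    short Year;
    unsigned char Month, Day, Hour, Minute, TZFlag, Precision;
    float Second;
} DateSketch;

/* Writes only the significant parts. Y, YM and YMDH are not handled in
 * this sketch and, like Undefined/Guess, fall back to millisecond output. */
static void format_date(const DateSketch *d, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%04d/%02d/%02d", d->Year, d->Month, d->Day);
    if (n < 0 || (size_t)n >= len)
        return; /* output truncated */
    switch (d->Precision) {
    case ODTP_YMD:
        break;
    case ODTP_YMDHM:
        snprintf(buf + n, len - n, " %02d:%02d", d->Hour, d->Minute);
        break;
    case ODTP_YMDHMS:
        snprintf(buf + n, len - n, " %02d:%02d:%02d",
                 d->Hour, d->Minute, (int)d->Second);
        break;
    default:
        snprintf(buf + n, len - n, " %02d:%02d:%06.3f",
                 d->Hour, d->Minute, (double)d->Second);
        break;
    }
}
```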
> > 
> > A drawback of this solution is that the size of the OGRField structure
> > increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
> > 64-bit builds). This is probably not that important, since in most cases
> > few OGRField structures are instantiated at one time (typically, you
> > iterate over features one at a time).
> > This could be more of a problem for use cases that involve the MEM
> > driver, as it keeps all features in memory.
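To see where the 12 bytes come from, here is a sketch that mirrors the Solution 1 Date member with standard types (short and unsigned char standing in for GInt16 and GByte), next to a deliberately reduced, hypothetical model of the OGRField union; the real union has more members. Exact sizes are ABI-dependent (notably the alignment of double on 32-bit targets), so the checks only pin down what holds on common ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Solution 1 Date member: 2 + 6 bytes, then a 4-byte-aligned float. */
struct DateSol1 {
    short Year;
    unsigned char Month, Day, Hour, Minute, TZFlag, Precision;
    float Second;
};

/* Reduced, hypothetical OGRField model. On 64-bit builds the list
 * member (int + padding + pointer) is already 16 bytes, so the larger
 * Date member does not grow the union there. */
typedef union {
    double Real;
    struct { int nCount; int *paList; } IntegerList;
    struct DateSol1 Date;
} FieldSketch;
```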
> > 
> > Another drawback is that application developers might not notice the
> > change of the structure, as the Second field name is preserved but a
> > new Precision field is added, so there's a risk that Precision is left
> > uninitialized if the field is set through
> > OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That could
> > lead to unexpected formatting (but hopefully not crashes, thanks to
> > defensive programming). However, I think it is unlikely that many
> > applications manipulate OGRField directly instead of using the getters
> > and setters of OGRFeature.
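One way for application code to avoid leaving Precision uninitialized is to zero the whole raw field before filling it: ODTP_Undefined is the first enumerator of the proposed enum and therefore 0. A minimal sketch with hypothetical stand-in types and a hypothetical `fill_date` helper; in real code the filled OGRField would then be passed to OGRFeature::SetField().

```c
#include <assert.h>
#include <string.h>

/* First value of the proposed OGRDateTimePrecision enumeration. */
enum { ODTP_Undefined = 0 };

/* Hypothetical stand-in for OGRField with the Solution 1 Date layout. */
typedef union {
    double Real;
    struct {
        short Year;
        unsigned char Month, Day, Hour, Minute, TZFlag, Precision;
        float Second;
    } Date;
} RawFieldSketch;

/* Zeroes the field first, so members the caller does not set (TZFlag,
 * and Precision for code written before it existed) read as 0, i.e.
 * ODTP_Undefined, rather than leftover stack contents. */
static void fill_date(RawFieldSketch *f, short year, unsigned char month,
                      unsigned char day, unsigned char hour,
                      unsigned char minute, float second)
{
    memset(f, 0, sizeof(*f));
    f->Date.Year = year;
    f->Date.Month = month;
    f->Date.Day = day;
    f->Date.Hour = hour;
    f->Date.Minute = minute;
    f->Date.Second = second;
}
```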
> > 
> > ---------------------------------------
> > Solution 2) : Millisecond accuracy, second and milliseconds in distinct
> > fields
> > 
> > typedef union {
> > [...]
> > 
> >    struct {
> >    
> >        GInt16  Year;
> >        GByte   Month;
> >        GByte   Day;
> >        GByte   Hour;
> >        GByte   Minute;
> >        GByte   TZFlag;
> >        GByte   Precision; /* value in OGRDateTimePrecision */
> >        GByte   Second; /* from 0 to 60 */
> >        GUInt16 Millisecond; /* from 0 to 999 */
> >    
> >    } Date;
> > 
> > } OGRField;
> > 
> > Same structure size as in 1)
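Converting between the Solution 1 and Solution 2 representations is straightforward, with one subtlety: rounding should be applied to the total millisecond count, otherwise a value such as 59.9996 could yield Millisecond == 1000. A sketch with hypothetical helper names, standard types standing in for GByte/GUInt16, and non-negative input assumed; deciding whether a rounded result of 60 is a leap second or must be carried into the minute is left to the caller.

```c
#include <assert.h>

/* Splits a float seconds value (Solution 1) into integral seconds and
 * milliseconds (Solution 2), rounding the total to the nearest ms. */
static void split_seconds(float value, unsigned char *second,
                          unsigned short *millisecond)
{
    long total_ms = (long)((double)value * 1000.0 + 0.5);
    *second = (unsigned char)(total_ms / 1000);
    *millisecond = (unsigned short)(total_ms % 1000);
}

/* Inverse conversion, from the Solution 2 pair back to a float. */
static float join_seconds(unsigned char second, unsigned short millisecond)
{
    return (float)(second + millisecond / 1000.0);
}
```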
> > 
> > ---------------------------------------
> > Solution 3) : Millisecond accuracy, pack all fields
> > 
> > Conceptually, this would use bit-fields to avoid wasting unused bits:
> > 
> > typedef union {
> > [...]
> > 
> >  struct {
> >  
> >    GInt16        Year;
> >    GUIntBig     Month:4;
> >    GUIntBig     Day:5;
> >    GUIntBig     Hour:5;
> >    GUIntBig     Minute:6;
> >    GUIntBig     Second:6;
> >    GUIntBig     Millisecond:10; /* 0-999 */
> >    GUIntBig     TZFlag:8;
> >    GUIntBig     Precision:4;
> > 
> > } Date;
> > } OGRField;
> > 
> > This was proposed in the above-mentioned ticket. And as there were
> > enough remaining bits, I've also added the Precision field (as in all
> > the other solutions).
> > 
> > The advantage is that sizeof(mydate) remains 8 bytes on 32-bit builds.
> > 
> > But the C standard only requires compilers to support bit-fields of
> > int/unsigned int (and _Bool), so a GUIntBig bit-field is not portable;
> > moreover, the way bits are packed is implementation-defined, so
> > different compilers could come up with different layouts. A workaround
> > is to do the bit manipulation through macros:
> > 
> > typedef union {
> > [...]
> > 
> >  struct {
> >      GUIntBig opaque;
> >  } Date;
> > 
> > } OGRField;
> > 
> > #define GET_BITS(x,y_bits,shift) \
> >     ((int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1)))
> > 
> > #define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
> > #define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
> > #define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
> > #define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
> > #define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
> > #define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
> > #define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
> > #define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
> > #define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
> > 
> > /* y is masked to y_bits so that an out-of-range value cannot spill
> >  * into the neighbouring fields */
> > #define SET_BITS(x,y,y_bits,shift) \
> >     ((x).Date.opaque = ((x).Date.opaque & \
> >         ~((GUIntBig)((1 << (y_bits))-1) << (shift))) | \
> >         (((GUIntBig)(y) & (GUIntBig)((1 << (y_bits))-1)) << (shift)))
> > 
> > #define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
> > #define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
> > #define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
> > #define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
> > #define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
> > #define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
> > #define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
> > #define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
> > #define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
> > 
> > Main advantage: the size of OGRField remains unchanged (so 8 bytes on
> > 32-bit builds).
> > 
> > Drawback: manipulation of datetime members is less natural, but there are
> > not that many places in the GDAL code base where the OGRField.Date
> > members are used, so it is not much of a problem.
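For experimenting with the Solution 3 layout, the same packing can be written as functions, which are type-checked and evaluate their arguments only once. A sketch under the field widths listed above (16+4+5+5+6+6+10+8+4 = 64 bits, most significant first), with uint64_t standing in for GUIntBig; all names are hypothetical. Values are masked before being shifted in, and the 16-bit year is sign-extended explicitly so that negative years survive.

```c
#include <assert.h>
#include <stdint.h>

/* Field order, from the most significant bits downwards. */
enum { F_YEAR, F_MONTH, F_DAY, F_HOUR, F_MINUTE, F_SECOND,
       F_MILLISECOND, F_TZFLAG, F_PRECISION, F_COUNT };

static const unsigned kWidth[F_COUNT] = { 16, 4, 5, 5, 6, 6, 10, 8, 4 };

/* Shift of a field: 64 minus the widths of that field and all before it. */
static unsigned field_shift(int field)
{
    unsigned used = 0;
    for (int i = 0; i <= field; i++)
        used += kWidth[i];
    return 64u - used;
}

static uint64_t field_mask(int field)
{
    return ((uint64_t)1 << kWidth[field]) - 1u;
}

static unsigned get_field(uint64_t opaque, int field)
{
    return (unsigned)((opaque >> field_shift(field)) & field_mask(field));
}

/* Masking the value keeps out-of-range input out of neighbouring fields. */
static uint64_t set_field(uint64_t opaque, int field, unsigned value)
{
    unsigned shift = field_shift(field);
    uint64_t mask = field_mask(field);
    return (opaque & ~(mask << shift)) | (((uint64_t)value & mask) << shift);
}

/* The year is stored as 16-bit two's complement and sign-extended here. */
static uint64_t set_year(uint64_t opaque, int year)
{
    return set_field(opaque, F_YEAR, (unsigned)year);
}

static int get_year(uint64_t opaque)
{
    unsigned u = get_field(opaque, F_YEAR);
    return u >= 0x8000u ? (int)u - 0x10000 : (int)u;
}
```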
> > 
> > ---------------------------------------
> > Solution 4) : Microsecond accuracy with one field
> > 
> > Solution 1) used a float for second and sub-second, but a float has only
> > 23 bits of mantissa, which is enough to represent seconds with
> > millisecond accuracy but not with microsecond accuracy (that needs 26
> > bits, since 60999999 > 2^25). So use a 32-bit integer instead of a
> > 32-bit floating-point number.
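A quick check of that limit, assuming IEEE 754 single precision: near 60 s adjacent floats are 2^-18 s (about 3.8 microseconds) apart, so 59.999999 s rounds to exactly 60.0f, whereas a 32-bit integer count of microseconds, as in Solution 4, splits back exactly. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a microsecond count as float seconds and reads it back. */
static long via_float_us(long us)
{
    float second = (float)((double)us / 1e6);
    return (long)((double)second * 1e6 + 0.5);
}

/* Splits a Solution 4 style count into integral seconds and the
 * microsecond remainder; integer arithmetic makes this exact. */
static void split_us(uint32_t total_us, unsigned *second, unsigned *micro)
{
    *second = (unsigned)(total_us / 1000000u);
    *micro = (unsigned)(total_us % 1000000u);
}
```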
> > 
> > typedef union {
> > [...]
> > 
> >    struct {
> >    
> >        GInt16  Year;
> >        GByte   Month;
> >        GByte   Day;
> >        GByte   Hour;
> >        GByte   Minute;
> >        GByte   TZFlag;
> >        GByte   Precision; /* value in OGRDateTimePrecision */
> >        GUInt32 Microseconds; /* 00000000 to 59999999 */
> >    
> >    } Date;
> > 
> > } OGRField;
> > 
> > Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
> > (and remains 16 bytes on 64-bit builds).
> > 
> > We would need to add an extra value in OGRDateTimePrecision to denote
> > microsecond accuracy.
> > 
> > It's not really clear we need microsecond accuracy... Most formats that
> > support sub-second accuracy use an ISO 8601 representation (e.g.
> > YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define a maximum number of
> > decimals for the seconds. According to
> > http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
> > PostgreSQL supports microsecond accuracy.
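Since ISO 8601 leaves the number of decimals open, a reader targeting a microsecond field has to accept any number of digits. A minimal sketch of such a parser for the seconds part, producing a Solution 4 style microsecond count; the function name is hypothetical, digits beyond the sixth only affect rounding, and carrying a rounded-up value into the minute is not handled.

```c
#include <assert.h>
#include <ctype.h>

/* Parses "SS" or "SS.f..." (any number of decimals) into a count of
 * microseconds. Trailing characters such as a 'Z' designator are left
 * unparsed. Returns -1 on malformed input. */
static long parse_seconds_us(const char *s)
{
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
        return -1;
    long us = ((s[0] - '0') * 10 + (s[1] - '0')) * 1000000L;
    s += 2;
    if (*s != '.')
        return us;
    s++;
    if (!isdigit((unsigned char)*s))
        return -1;                  /* "SS." without digits */
    long scale = 100000L;           /* weight of the first decimal */
    for (; isdigit((unsigned char)*s); s++) {
        if (scale > 0) {
            us += (*s - '0') * scale;
            scale /= 10;            /* reaches 0 after six digits */
        } else if (scale == 0) {
            if (*s >= '5')          /* seventh digit decides rounding */
                us += 1;
            scale = -1;             /* ignore any further digits */
        }
    }
    return us;
}
```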
> > 
> > ---------------------------------------
> > Solution 5) : Microsecond with 3 fields
> > 
> > A variant where we split the second into 3 integer parts:
> > 
> > typedef union {
> > [...]
> > 
> >    struct {
> >    
> >        GInt16  Year;
> >        GByte   Month;
> >        GByte   Day;
> >        GByte   Hour;
> >        GByte   Minute;
> >        GByte   TZFlag;
> >        GByte   Precision; /* value in OGRDateTimePrecision */
> >        GByte   Second; /* 0 to 59 */
> >        GUInt16 Millisecond; /* 0 to 999 */
> >        GUInt16 Microsecond; /* 0 to 999 */
> >    
> >    } Date;
> > 
> > } OGRField;
> > 
> > Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit
> > builds (and remains 16 bytes on 64-bit builds).
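The padding can be made visible with offsetof. A sketch using standard types in place of GInt16/GByte/GUInt16 and a reduced, hypothetical union standing in for OGRField: the Date member itself is 14 bytes (one padding byte after Second, since the 16-bit members need 2-byte alignment), and the union is then rounded up to a multiple of the double's alignment, giving 16 on common ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Solution 5 Date member. */
struct DateSol5 {
    short Year;
    unsigned char Month, Day, Hour, Minute, TZFlag, Precision;
    unsigned char Second;       /* offset 8 */
    unsigned short Millisecond; /* offset 10: one padding byte before it */
    unsigned short Microsecond; /* offset 12 */
};

/* Reduced, hypothetical OGRField model: the double member sets the
 * union's alignment (4 or 8 bytes depending on the ABI). */
typedef union {
    double Real;
    struct DateSol5 Date;
} FieldSketch5;
```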
> > 
> > ---------------------------------------
> > Solution 6) : Nanosecond accuracy and beyond !
> > 
> > Now that we are using 16 bytes, why not have nanosecond accuracy?
> > 
> > typedef union {
> > [...]
> > 
> >    struct {
> >    
> >        GInt16  Year;
> >        GByte   Month;
> >        GByte   Day;
> >        GByte   Hour;
> >        GByte   Minute;
> >        GByte   TZFlag;
> >        GByte   Precision; /* value in OGRDateTimePrecision */
> >        double  Second; /* 0.000000000 to 60.999999999 */
> >    
> >    } Date;
> > 
> > } OGRField;
> > 
> > Actually we even have picosecond accuracy! (since for picoseconds, we
> > need 46 bits and a double has 52 bits of mantissa). And if we use a
> > 64-bit integer instead of a double, we can have femtosecond accuracy ;-)
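These limits can be checked with a small round-trip test, assuming IEEE 754 doubles; `double_round_trip` is a hypothetical name. Counts of nanoseconds and picoseconds up to just below 61 s come back intact from a double, while a femtosecond count (about 6.1e16) exceeds 2^53 and does not even convert to double exactly, which is where a 64-bit integer takes over.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a count of 1/units_per_second steps as double seconds and
 * checks that rounding recovers the same count (count >= 0). */
static int double_round_trip(int64_t count, double units_per_second)
{
    double second = (double)count / units_per_second;
    int64_t back = (int64_t)(second * units_per_second + 0.5);
    return back == count;
}
```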
> > 
> > Any preference?
> > 
> > Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com