[gdal-dev] Design for sub-second accuracy in OGR ?

Dmitriy Baryshnikov bishop.dev at gmail.com
Mon Apr 6 02:32:33 PDT 2015


The first solution looks reasonable. But the Precision field is missing
values for the case where only the time is significant:

ODTP_HMSm
ODTP_HMS
ODTP_HM
ODTP_H

etc.
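
One possible reading of this suggestion, as a minimal sketch only: the enumeration from solution 1 with time-only values appended. The ODTP_H* values and the PrecisionHasDate() helper are hypothetical and do not exist in OGR.

```c
#include <stdbool.h>

/* Sketch only: the enum proposed in solution 1, with hypothetical
 * time-only values appended at the end. */
typedef enum
{
    ODTP_Undefined,
    ODTP_Guess,
    ODTP_Y,
    ODTP_YM,
    ODTP_YMD,
    ODTP_YMDH,
    ODTP_YMDHM,
    ODTP_YMDHMS,
    ODTP_YMDHMSm,
    ODTP_H,     /* hypothetical: only the hour is significant */
    ODTP_HM,    /* hypothetical: hour and minute */
    ODTP_HMS,   /* hypothetical: hour, minute and integral second */
    ODTP_HMSm   /* hypothetical: hour, minute and second with milliseconds */
} OGRDateTimePrecision;

/* Hypothetical helper: true when the year/month/day part carries
 * information for the given precision. */
static bool PrecisionHasDate(OGRDateTimePrecision ePrecision)
{
    return ePrecision >= ODTP_Y && ePrecision <= ODTP_YMDHMSm;
}
```

Appending the new values keeps the numeric values of the existing ones unchanged.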

Best regards,
     Dmitry

05.04.2015 22:25, Even Rouault wrote:
> Hi,
>
> In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680, which is
> about the lack of precision of the current datetime structure, I've imagined
> different ways to modify the OGRField structure, and failed to pick one
> that would be the obvious solution, so opinions are welcome.
>
> The issue is how to add (at least) microsecond accuracy to the datetime
> structure, as a few formats support it explicitly or implicitly: MapInfo,
> GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL, CSV, GeoJSON, ODS,
> XLSX, KML (potentially GML too)...
>
> Below are a few potential solutions:
>
> ---------------------------------------
> Solution 1) : Millisecond accuracy, second becomes a float
>
> This is the solution I've prototyped.
>
> typedef union {
> [...]
>      struct {
>          GInt16  Year;
>          GByte   Month;
>          GByte   Day;
>          GByte   Hour;
>          GByte   Minute;
>          GByte   TZFlag;
>          GByte   Precision; /* value in OGRDateTimePrecision */
>          float   Second; /* from 00.000 to 60.999 (millisecond accuracy) */
>      } Date;
> } OGRField
>
> So sub-second precision is represented with a single-precision floating point
> number, storing both integral and decimal parts. (We could theoretically have
> a hundredth of a millisecond accuracy, 10^-5 s, since 6099999 fits in the 23
> bits of the mantissa.)
>
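
The mantissa argument above can be checked empirically. The following is a minimal sketch, not part of the prototype: FloatRoundTrips() is a hypothetical helper that stores every value of a given resolution in a float and verifies that rounding gives the original integer back.

```c
#include <stdbool.h>

/* Hypothetical helper: checks that every integer i in 0 .. nMax, stored in
 * a float as i / nScale seconds, is recovered by scaling back and rounding
 * to the nearest integer. */
static bool FloatRoundTrips(long nMax, long nScale)
{
    for (long i = 0; i <= nMax; i++)
    {
        float fSecond = (float)((double)i / (double)nScale);
        /* Values are non-negative, so adding 0.5 and truncating rounds
         * to the nearest integer. */
        long nBack = (long)((double)fSecond * (double)nScale + 0.5);
        if (nBack != i)
            return false;
    }
    return true;
}
```

With this helper, FloatRoundTrips(60999, 1000) covers milliseconds, FloatRoundTrips(6099999, 100000) covers the 10^-5 s case, and FloatRoundTrips(60999999, 1000000) covers microseconds, for which a float is not enough.
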
> Another addition is the Precision field that indicates which parts of the
> datetime structure are significant.
>
> /** Enumeration that defines the precision of a DateTime.
>    * @since GDAL 2.0
>    */
> typedef enum
> {
>      ODTP_Undefined,     /**< Undefined */
>      ODTP_Guess,         /**< Only valid when setting through SetField(i,year,
> month...) where OGR will guess */
>      ODTP_Y,             /**< Year is significant */
>      ODTP_YM,            /**< Year and month are significant*/
>      ODTP_YMD,           /**< Year, month and day are significant */
>      ODTP_YMDH,          /**< Year, month, day (if OFTDateTime) and hour are
> significant */
>      ODTP_YMDHM,         /**< Year, month, day (if OFTDateTime), hour and
> minute are significant */
>      ODTP_YMDHMS,        /**< Year, month, day (if OFTDateTime), hour, minute
> and integral second are significant */
>      ODTP_YMDHMSm,       /**< Year, month, day (if OFTDateTime), hour, minute
> and second with milliseconds are significant */
> } OGRDateTimePrecision;
>
> I think this is important since "2015/04/05 17:12:34" and "2015/04/05
> 17:12:34.000" do not really mean the same thing and it might be good to be
> able to preserve the original accuracy when converting between formats.
>
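
To illustrate why the precision matters on output, here is a minimal sketch of a formatter that prints the fractional part only when it is significant. FormatDateTime() is a hypothetical helper, not an OGR function, and it only distinguishes the two cases from the example above.

```c
#include <stdio.h>

/* Same values as the enum proposed above, repeated so that this sketch
 * compiles on its own. */
typedef enum
{
    ODTP_Undefined, ODTP_Guess, ODTP_Y, ODTP_YM, ODTP_YMD,
    ODTP_YMDH, ODTP_YMDHM, ODTP_YMDHMS, ODTP_YMDHMSm
} OGRDateTimePrecision;

/* Hypothetical helper: writes the datetime into pszBuf and returns the
 * number of characters, as snprintf() does. The fractional part is printed
 * only for ODTP_YMDHMSm; every other precision is treated as ODTP_YMDHMS
 * to keep the sketch short. */
static int FormatDateTime(char *pszBuf, size_t nBufSize,
                          int nYear, int nMonth, int nDay,
                          int nHour, int nMinute, float fSecond,
                          OGRDateTimePrecision ePrecision)
{
    if (ePrecision == ODTP_YMDHMSm)
        return snprintf(pszBuf, nBufSize, "%04d/%02d/%02d %02d:%02d:%06.3f",
                        nYear, nMonth, nDay, nHour, nMinute, fSecond);
    return snprintf(pszBuf, nBufSize, "%04d/%02d/%02d %02d:%02d:%02d",
                    nYear, nMonth, nDay, nHour, nMinute, (int)fSecond);
}
```

The same stored value (34.0f seconds) then yields "2015/04/05 17:12:34" or "2015/04/05 17:12:34.000" depending only on the Precision field.
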
> A drawback of this solution is that the size of the OGRField structure
> increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on 64-bit).
> This is probably not that important since in most cases not that many OGRField
> structures are instantiated at one time (typically, you iterate over features
> one at a time).
> This could be more of a problem for use cases that involve the MEM driver, as
> it keeps all features in memory.
>
> Another drawback is that the change of the structure might not be directly
> noticed by application developers as the Second field name is preserved, but a
> new Precision field is added, so there's a risk that Precision is left
> uninitialized if the field is set through OGRFeature::SetField(int iFieldIndex,
> OGRField* psRawField). That could lead to unexpected formatting (but hopefully
> not crashes with defensive programming). However I'd think it is unlikely that
> many applications manipulate OGRField directly, instead of using the
> getters and setters of OGRFeature.
>
> ---------------------------------------
> Solution 2) : Millisecond accuracy, second and milliseconds in distinct fields
>
> typedef union {
> [...]
>      struct {
>          GInt16  Year;
>          GByte   Month;
>          GByte   Day;
>          GByte   Hour;
>          GByte   Minute;
>          GByte   TZFlag;
>          GByte   Precision; /* value in OGRDateTimePrecision */
>          GByte   Second; /* from 0 to 60 */
>          GUInt16 Millisecond; /* from 0 to 999 */
>      } Date;
> } OGRField
>
> Same size of structure as in 1)
>
> ---------------------------------------
> Solution 3) : Millisecond accuracy, pack all fields
>
> Conceptually, this would use bit fields to avoid wasting unused bits :
>
> typedef union {
> [...]
>    struct {
>      GInt16        Year;
>      GUIntBig     Month:4;
>      GUIntBig     Day:5;
>      GUIntBig     Hour:5;
>      GUIntBig     Minute:6;
>      GUIntBig     Second:6;
>      GUIntBig     Millisecond:10; /* 0-999 */
>      GUIntBig     TZFlag:8;
>      GUIntBig     Precision:4;
>   } Date;
> } OGRField;
>
> This was proposed in the above-mentioned ticket. And as there were enough
> remaining bits, I've also added the Precision field (and in all other
> solutions).
>
> The advantage is that sizeof(mydate) remains 8 bytes on 32-bit builds.
>
> But the C standard only defines bitfields of int/unsigned int, so this is not
> portable, plus the fact that the way bits are packed is not defined by the
> standard, so different compilers could come up with different packing. A
> workaround is to do the bit manipulation through macros :
>
> typedef union {
> [...]
>    struct {
>      GUIntBig opaque;
>    } Date;
> } OGRField;
>
> #define GET_BITS(x,y_bits,shift)        (int)(((x).Date.opaque >> (shift)) & \
>                                          ((1 << (y_bits))-1))
>
> #define GET_YEAR(x)              (short)GET_BITS(x,16,64-16)
> #define GET_MONTH(x)             GET_BITS(x,4,64-16-4)
> #define GET_DAY(x)               GET_BITS(x,5,64-16-4-5)
> #define GET_HOUR(x)              GET_BITS(x,5,64-16-4-5-5)
> #define GET_MINUTE(x)            GET_BITS(x,6,64-16-4-5-5-6)
> #define GET_SECOND(x)            GET_BITS(x,6,64-16-4-5-5-6-6)
> #define GET_MILLISECOND(x)       GET_BITS(x,10,64-16-4-5-5-6-6-10)
> #define GET_TZFLAG(x)            GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
> #define GET_PRECISION(x)         GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
>
> #define SET_BITS(x,y,y_bits,shift)  (x).Date.opaque = ((x).Date.opaque & (~( \
>     (GUIntBig)((1 << (y_bits))-1) << (shift) )) | ((GUIntBig)(y) << (shift)))
>
> #define SET_YEAR(x,val)            SET_BITS(x,val,16,64-16)
> #define SET_MONTH(x,val)           SET_BITS(x,val,4,64-16-4)
> #define SET_DAY(x,val)             SET_BITS(x,val,5,64-16-4-5)
> #define SET_HOUR(x,val)            SET_BITS(x,val,5,64-16-4-5-5)
> #define SET_MINUTE(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6)
> #define SET_SECOND(x,val)          SET_BITS(x,val,6,64-16-4-5-5-6-6)
> #define SET_MILLISECOND(x,val)     SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
> #define SET_TZFLAG(x,val)          SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
> #define SET_PRECISION(x,val)       SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
>
> Main advantage: the size of OGRField remains unchanged (so 8 bytes on 32-bit
> builds).
>
> Drawback: manipulation of datetime members is less natural, but there are not
> that many places in the GDAL code base where the OGRField.Date members are
> used, so it is not that much of a problem.
>
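
As an alternative illustration of the same packing, here is a minimal sketch that uses functions on a plain 64-bit word instead of macros. GetBits(), SetBits() and the SHIFT_* constants are hypothetical names; the bit layout is the one from the macros above. Unlike SET_BITS, the setter also masks the value so that an out-of-range input cannot spill into neighbouring fields.

```c
#include <stdint.h>

/* Bit offsets matching the macros above: the year occupies the top 16 bits,
 * followed by month, day, hour, minute, second, millisecond, TZFlag and
 * precision. */
enum
{
    SHIFT_YEAR        = 64 - 16,
    SHIFT_MONTH       = SHIFT_YEAR - 4,
    SHIFT_DAY         = SHIFT_MONTH - 5,
    SHIFT_HOUR        = SHIFT_DAY - 5,
    SHIFT_MINUTE      = SHIFT_HOUR - 6,
    SHIFT_SECOND      = SHIFT_MINUTE - 6,
    SHIFT_MILLISECOND = SHIFT_SECOND - 10,
    SHIFT_TZFLAG      = SHIFT_MILLISECOND - 8,
    SHIFT_PRECISION   = SHIFT_TZFLAG - 4   /* == 0: all 64 bits are used */
};

/* Extracts nBits bits starting at bit nShift. */
static unsigned GetBits(uint64_t nOpaque, int nBits, int nShift)
{
    return (unsigned)((nOpaque >> nShift) & ((UINT64_C(1) << nBits) - 1));
}

/* Returns nOpaque with the nBits-wide field at nShift replaced by nVal. */
static uint64_t SetBits(uint64_t nOpaque, unsigned nVal, int nBits, int nShift)
{
    const uint64_t nMask = ((UINT64_C(1) << nBits) - 1) << nShift;
    return (nOpaque & ~nMask) | (((uint64_t)nVal << nShift) & nMask);
}
```

A year would be stored with SetBits(d, 2015, 16, SHIFT_YEAR) and read back with GetBits(d, 16, SHIFT_YEAR); since the layout is fixed by explicit shifts, it does not depend on how a compiler packs bit fields.
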
> ---------------------------------------
> Solution 4) : Microsecond accuracy with one field
>
> Solution 1) used a float for second and sub-second, but a float has only 23 bits
> of mantissa, which is enough to represent seconds with millisecond accuracy,
> but not with microsecond accuracy (you need 26 bits for that). So use a 32-bit
> integer instead of a 32-bit floating point.
>
> typedef union {
> [...]
>      struct {
>          GInt16  Year;
>          GByte   Month;
>          GByte   Day;
>          GByte   Hour;
>          GByte   Minute;
>          GByte   TZFlag;
>          GByte   Precision; /* value in OGRDateTimePrecision */
>          GUInt32 Microseconds; /* 00000000 to 59999999 */
>      } Date;
> } OGRField
>
> Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds (and
> remains 16 bytes on 64-bit builds)
>
> We would need to add an extra value in OGRDateTimePrecision to mean the
> microsecond accuracy.
>
> It is not really clear that we need microsecond accuracy... Most formats that
> support sub-second accuracy use the ISO 8601 representation (e.g.
> YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define the maximal number of
> decimals beyond the second. According to
> http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
> PostgreSQL supports microsecond accuracy.
>
> ---------------------------------------
> Solution 5) : Microsecond with 3 fields
>
> A variant where we split second into 3 integer parts:
>
> typedef union {
> [...]
>      struct {
>          GInt16  Year;
>          GByte   Month;
>          GByte   Day;
>          GByte   Hour;
>          GByte   Minute;
>          GByte   TZFlag;
>          GByte   Precision; /* value in OGRDateTimePrecision */
>          GByte   Second; /* 0 to 59 */
>          GUInt16 Millisecond; /* 0 to 999 */
>          GUInt16 Microsecond; /* 0 to 999 */
>      } Date;
> } OGRField
>
> Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit builds
> (and remains 16 bytes on 64-bit builds)
>
> ---------------------------------------
> Solution 6) : Nanosecond accuracy and beyond !
>
> Now that we are using 16 bytes, why not have nanosecond accuracy ?
>
> typedef union {
> [...]
>      struct {
>          GInt16  Year;
>          GByte   Month;
>          GByte   Day;
>          GByte   Hour;
>          GByte   Minute;
>          GByte   TZFlag;
>          GByte   Precision; /* value in OGRDateTimePrecision */
>          double  Second; /* 0.000000000 to 60.999999999 */
>      } Date;
> } OGRField
>
> Actually we even have picosecond accuracy! (since for picoseconds, we need 46
> bits and a double has 52 bits of mantissa). And if we use a 64-bit integer
> instead of a double, we can have femtosecond accuracy ;-)
>
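
The bit counts quoted in the different solutions can be reproduced with a small calculation. This is a minimal sketch; BitsNeeded() is a hypothetical helper, and the maximum values assume seconds ranging from 0 to 60.999... expressed as an integer count of the given unit.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to store any integer
 * in the range 0 .. nMax. */
static int BitsNeeded(uint64_t nMax)
{
    int nBits = 0;
    while (nMax != 0)
    {
        nBits++;
        nMax >>= 1;
    }
    return nBits;
}
```

BitsNeeded(60999) gives 16 for milliseconds, BitsNeeded(6099999) gives 23 for 10^-5 s, BitsNeeded(60999999) gives 26 for microseconds, BitsNeeded(UINT64_C(60999999999999)) gives 46 for picoseconds, and BitsNeeded(UINT64_C(60999999999999999)) gives 56 for femtoseconds, which is more than the mantissa of a double but still fits in a 64-bit integer.
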
> Any preference ?
>
> Even
>


