[gdal-dev] Design for sub-second accuracy in OGR ?
Even Rouault
even.rouault at spatialys.com
Sun Apr 5 12:25:53 PDT 2015
Hi,
While revisiting http://trac.osgeo.org/gdal/ticket/2680, which is about the
lack of precision of the current datetime structure, I've come up with
several ways to modify the OGRField structure, but none of them stands out as
the obvious choice, so opinions are welcome.
The issue is how to add (at least) millisecond accuracy to the datetime
structure, as a few formats support it explicitly or implicitly: MapInfo,
GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL, CSV, GeoJSON, ODS,
XLSX, KML (potentially GML too)...
Below are a few potential solutions:
---------------------------------------
Solution 1) : Millisecond accuracy, second becomes a float
This is the solution I've prototyped.
typedef union {
[...]
struct {
GInt16 Year;
GByte Month;
GByte Day;
GByte Hour;
GByte Minute;
GByte TZFlag;
GByte Precision; /* value in OGRDateTimePrecision */
float Second; /* from 00.000 to 60.999 (millisecond accuracy) */
} Date;
} OGRField;
So sub-second precision is represented with a single-precision floating point
number, storing both integral and decimal parts. (We could theoretically have
a hundredth of millisecond accuracy, 10^-5 s, since 6099999 fits in the 23
bits of the mantissa.)
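A quick way to check the mantissa argument empirically is to round-trip every
representable tick through a float. This is only a sketch:
count_roundtrip_failures() is a hypothetical helper, not part of the
prototype, and it assumes IEEE 754 single precision.

```c
/* Counts how many tick values n in [0, 61 * units_per_second) do NOT survive
 * a round trip through a float holding n / units_per_second seconds.
 * units_per_second = 1000 means milliseconds, 100000 means 10^-5 s, etc. */
static long count_roundtrip_failures(long units_per_second)
{
    long failures = 0;
    const long max_units = 61 * units_per_second; /* 0.0 s up to 60.99... s */
    long n;

    for (n = 0; n < max_units; n++) {
        /* volatile forces the value to really be rounded to float width */
        volatile float second =
            (float)((double)n / (double)units_per_second);
        /* Round to the nearest tick (all values are non-negative) */
        long back = (long)((double)second * (double)units_per_second + 0.5);
        if (back != n)
            failures++;
    }
    return failures;
}
```

With 1000 or 100000 units per second no value should be lost, whereas with
1000000 (microseconds) some values no longer round-trip, which is the
limitation Solution 4 addresses.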
Another addition is the Precision field that indicates which parts of the
datetime structure are significant.
/** Enumeration that defines the precision of a DateTime.
* @since GDAL 2.0
*/
typedef enum
{
ODTP_Undefined, /**< Undefined */
ODTP_Guess, /**< Only valid when setting through SetField(i,year,
month...) where OGR will guess */
ODTP_Y, /**< Year is significant */
ODTP_YM, /**< Year and month are significant*/
ODTP_YMD, /**< Year, month and day are significant */
ODTP_YMDH, /**< Year, month, day (if OFTDateTime) and hour are
significant */
ODTP_YMDHM, /**< Year, month, day (if OFTDateTime), hour and
minute are significant */
ODTP_YMDHMS, /**< Year, month, day (if OFTDateTime), hour, minute
and integral second are significant */
ODTP_YMDHMSm, /**< Year, month, day (if OFTDateTime), hour, minute
and second with milliseconds are significant */
} OGRDateTimePrecision;
I think this is important since "2015/04/05 17:12:34" and "2015/04/05
17:12:34.000" do not really mean the same thing and it might be good to be
able to preserve the original accuracy when converting between formats.
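To illustrate how the Precision field could drive output formatting, here is
a minimal sketch. DemoDate and demo_format() are hypothetical names, and the
enum is reduced to the two values needed for the example; this is not the
prototype's code.

```c
#include <stdio.h>

/* Reduced stand-in for the proposed OGRDateTimePrecision enumeration */
typedef enum { ODTP_YMDHMS, ODTP_YMDHMSm } DemoPrecision;

/* Hypothetical stand-in for the Date member of Solution 1 */
typedef struct {
    short         Year;
    unsigned char Month, Day, Hour, Minute;
    DemoPrecision Precision;
    float         Second;
} DemoDate;

/* Writes the datetime into buf and returns the number of characters that
 * the full string needs (the snprintf() return value). Decimals are only
 * emitted when the precision says they are significant. */
static int demo_format(const DemoDate *d, char *buf, size_t len)
{
    if (d->Precision == ODTP_YMDHMSm)
        return snprintf(buf, len, "%04d/%02d/%02d %02d:%02d:%06.3f",
                        d->Year, d->Month, d->Day, d->Hour, d->Minute,
                        d->Second);
    return snprintf(buf, len, "%04d/%02d/%02d %02d:%02d:%02d",
                    d->Year, d->Month, d->Day, d->Hour, d->Minute,
                    (int)d->Second);
}
```

The same Second value of 34.0 is then written as "17:12:34" or "17:12:34.000"
depending only on Precision, so the original accuracy survives a conversion.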
A drawback of this solution is that the size of the OGRField structure
increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on 64-bit
builds). This is probably not that important since in most cases not that many
OGRField structures are instantiated at one time (typically, you iterate over
features one at a time).
This could be more of a problem for use cases that involve the MEM driver, as
it keeps all features in memory.
Another drawback is that the change of the structure might not be directly
noticed by application developers, as the Second field name is preserved but a
new Precision field is added. So there's a risk that Precision is left
uninitialized if the field is set through OGRFeature::SetField(int iFieldIndex,
OGRField* psRawField). That could lead to unexpected formatting (but hopefully
not crashes, with defensive programming). However, I'd think it is unlikely
that many applications manipulate OGRField directly, instead of using the
getters and setters of OGRFeature.
---------------------------------------
Solution 2) : Millisecond accuracy, second and milliseconds in distinct fields
typedef union {
[...]
struct {
GInt16 Year;
GByte Month;
GByte Day;
GByte Hour;
GByte Minute;
GByte TZFlag;
GByte Precision; /* value in OGRDateTimePrecision */
GByte Second; /* from 0 to 60 */
GUInt16 Millisecond; /* from 0 to 999 */
} Date;
} OGRField;
Same structure size as in Solution 1): 12 bytes on 32-bit builds, 16 bytes on
64-bit builds.
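The size claim for the Date member can be checked with sizeof. A sketch,
assuming <stdint.h> types as stand-ins for GInt16/GByte/GUInt16 and the usual
alignment rules (2 bytes for 16-bit integers, 4 bytes for float); the struct
and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the Date member of Solution 1 */
typedef struct {
    int16_t Year;
    uint8_t Month, Day, Hour, Minute, TZFlag, Precision;
    float   Second;      /* starts at offset 8, 4 bytes wide */
} DateSolution1;

/* Stand-in for the Date member of Solution 2 */
typedef struct {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;      /* offset 8, followed by 1 byte of padding */
    uint16_t Millisecond; /* offset 10 */
} DateSolution2;

static size_t size_solution1(void) { return sizeof(DateSolution1); }
static size_t size_solution2(void) { return sizeof(DateSolution2); }
```

On common ABIs both functions should return 12, which is what pushes the
whole union from 8 to 12 bytes on 32-bit builds.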
---------------------------------------
Solution 3) : Millisecond accuracy, pack all fields
Conceptually, this would use bit fields to avoid wasting unused bits :
typedef union {
[...]
struct {
GInt16 Year;
GUIntBig Month:4;
GUIntBig Day:5;
GUIntBig Hour:5;
GUIntBig Minute:6;
GUIntBig Second:6;
GUIntBig Millisecond:10; /* 0-999 */
GUIntBig TZFlag:8;
GUIntBig Precision:4;
} Date;
} OGRField;
This was proposed in the above-mentioned ticket. And as there were enough
remaining bits, I've also added the Precision field (as in all other
solutions).
The advantage is that sizeof(mydate) remains 8 bytes on 32-bit builds.
But the C standard only defines bit-fields of int/unsigned int, so this is not
portable. In addition, the way bits are packed is not defined by the standard,
so different compilers could come up with different packing. A workaround is
to do the bit manipulation through macros:
typedef union {
[...]
struct {
GUIntBig opaque;
} Date;
} OGRField;
#define GET_BITS(x,y_bits,shift) (int)(((x).Date.opaque >> (shift)) & \
((1 << (y_bits))-1))
#define GET_YEAR(x) (short)GET_BITS(x,16,64-16)
#define GET_MONTH(x) GET_BITS(x,4,64-16-4)
#define GET_DAY(x) GET_BITS(x,5,64-16-4-5)
#define GET_HOUR(x) GET_BITS(x,5,64-16-4-5-5)
#define GET_MINUTE(x) GET_BITS(x,6,64-16-4-5-5-6)
#define GET_SECOND(x) GET_BITS(x,6,64-16-4-5-5-6-6)
#define GET_MILLISECOND(x) GET_BITS(x,10,64-16-4-5-5-6-6-10)
#define GET_TZFLAG(x) GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
#define GET_PRECISION(x) GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
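/* Clear the target bits, then OR in the value masked to y_bits bits */
#define SET_BITS(x,y,y_bits,shift) (x).Date.opaque = (((x).Date.opaque & \
~((GUIntBig)((1 << (y_bits))-1) << (shift))) | \
(((GUIntBig)(y) & (GUIntBig)((1 << (y_bits))-1)) << (shift)))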
#define SET_YEAR(x,val) SET_BITS(x,val,16,64-16)
#define SET_MONTH(x,val) SET_BITS(x,val,4,64-16-4)
#define SET_DAY(x,val) SET_BITS(x,val,5,64-16-4-5)
#define SET_HOUR(x,val) SET_BITS(x,val,5,64-16-4-5-5)
#define SET_MINUTE(x,val) SET_BITS(x,val,6,64-16-4-5-5-6)
#define SET_SECOND(x,val) SET_BITS(x,val,6,64-16-4-5-5-6-6)
#define SET_MILLISECOND(x,val) SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
#define SET_TZFLAG(x,val) SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
#define SET_PRECISION(x,val) SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
Main advantage: the size of OGRField remains unchanged (so 8 bytes on 32-bit
builds).
Drawback: manipulation of datetime members is less natural, but there are not
that many places in the GDAL code base where the OGRField.Date members are
used, so it is not much of a problem.
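The same packing can also be sketched with small functions instead of macros,
which makes it easier to test. The layout (widths and shifts) is the one used
by the macros above; PackedField, packed_get() and packed_set() are
hypothetical names and uint64_t stands in for GUIntBig.

```c
#include <stdint.h>

/* Width and position of one field inside the 64-bit opaque word */
typedef struct { unsigned bits; unsigned shift; } PackedField;

static const PackedField PF_YEAR        = {16, 48};
static const PackedField PF_MONTH       = { 4, 44};
static const PackedField PF_DAY         = { 5, 39};
static const PackedField PF_HOUR        = { 5, 34};
static const PackedField PF_MINUTE      = { 6, 28};
static const PackedField PF_SECOND      = { 6, 22};
static const PackedField PF_MILLISECOND = {10, 12};
static const PackedField PF_TZFLAG      = { 8,  4};
static const PackedField PF_PRECISION   = { 4,  0};

/* Extracts a field as an unsigned value */
static uint64_t packed_get(uint64_t word, PackedField f)
{
    uint64_t mask = (UINT64_C(1) << f.bits) - 1;
    return (word >> f.shift) & mask;
}

/* Returns word with the field replaced; value is masked to the field width
 * so that an out-of-range value cannot spill into neighbouring fields. */
static uint64_t packed_set(uint64_t word, PackedField f, uint64_t value)
{
    uint64_t mask = (UINT64_C(1) << f.bits) - 1;
    return (word & ~(mask << f.shift)) | ((value & mask) << f.shift);
}
```

Since Year is signed, a caller would cast the extracted value back, e.g.
(int16_t)packed_get(word, PF_YEAR), as GET_YEAR does with its (short) cast.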
---------------------------------------
Solution 4) : Microsecond accuracy with one field
Solution 1) used a float for second and sub-second, but a float has only 23
bits of mantissa, which is enough to represent seconds with millisecond
accuracy, but not with microsecond accuracy (you need 26 bits for that). So
use a 32-bit integer instead of a 32-bit floating point.
typedef union {
[...]
struct {
GInt16 Year;
GByte Month;
GByte Day;
GByte Hour;
GByte Minute;
GByte TZFlag;
GByte Precision; /* value in OGRDateTimePrecision */
GUInt32 Microseconds; /* 00000000 to 59999999 */
} Date;
} OGRField;
Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds (and
remains 16 bytes on 64-bit builds)
We would need to add an extra value in OGRDateTimePrecision to mean the
microsecond accuracy.
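With a single Microseconds field, callers would have to split the value back
into whole seconds and a sub-second part. A minimal sketch; the helper names
are hypothetical and uint32_t stands in for GUInt32.

```c
#include <stdint.h>

/* Combines whole seconds and a microsecond part (0-999999) into the single
 * counter that Solution 4 would store. */
static uint32_t to_microseconds(unsigned second, unsigned microsecond)
{
    return (uint32_t)second * 1000000u + (uint32_t)microsecond;
}

/* Whole seconds contained in the counter */
static unsigned whole_seconds(uint32_t microseconds)
{
    return (unsigned)(microseconds / 1000000u);
}

/* Sub-second remainder, in microseconds */
static unsigned microsecond_part(uint32_t microseconds)
{
    return (unsigned)(microseconds % 1000000u);
}
```

For instance 17:12:34.567891 would store 34567891 in Microseconds, and the two
accessors give back 34 and 567891.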
It is not really clear that we need microsecond accuracy... Most formats that
support sub-second accuracy use the ISO 8601 representation (e.g.
YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define the maximum number of
decimals beyond the second. According to
http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
PostgreSQL supports microsecond accuracy.
---------------------------------------
Solution 5) : Microsecond with 3 fields
A variant where we split second into 3 integer parts:
typedef union {
[...]
struct {
GInt16 Year;
GByte Month;
GByte Day;
GByte Hour;
GByte Minute;
GByte TZFlag;
GByte Precision; /* value in OGRDateTimePrecision */
GByte Second; /* 0 to 59 */
GUInt16 Millisecond; /* 0 to 999 */
GUInt16 Microsecond; /* 0 to 999 */
} Date;
} OGRField;
Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit builds
(and remains 16 bytes on 64-bit builds)
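The alignment effect can be reproduced in isolation. In this sketch FieldLike
is a hypothetical, heavily reduced stand-in for OGRField: its only other
member is a pair of 32-bit integers, chosen to give the union the 4-byte
alignment assumed for 32-bit builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the Date member of Solution 5 */
typedef struct {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;       /* offset 8, then 1 byte of padding */
    uint16_t Millisecond;  /* offset 10 */
    uint16_t Microsecond;  /* offset 12 */
} DateSolution5;

/* Reduced union: the int32_t member forces 4-byte alignment */
typedef union {
    int32_t       IntPair[2];
    DateSolution5 Date;
} FieldLike;

static size_t size_date5(void)      { return sizeof(DateSolution5); }
static size_t size_field_like(void) { return sizeof(FieldLike); }
```

On common ABIs the Date struct alone is 14 bytes, but the union is rounded up
to a multiple of its alignment, hence 16.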
---------------------------------------
Solution 6) : Nanosecond accuracy and beyond !
Now that we are using 16 bytes, why not have nanosecond accuracy?
typedef union {
[...]
struct {
GInt16 Year;
GByte Month;
GByte Day;
GByte Hour;
GByte Minute;
GByte TZFlag;
GByte Precision; /* value in OGRDateTimePrecision */
double Second; /* 0.000000000 to 60.999999999 */
} Date;
} OGRField;
Actually we even have picosecond accuracy! (since for picoseconds, we need 46
bits and a double has 52 bits of mantissa). And if we use a 64-bit integer
instead of a double, we can have femtosecond accuracy ;-)
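The bit counts quoted in this mail can be recomputed with a few lines of C.
bits_needed() is a hypothetical helper written for this check; the maximum
values are 60.999... seconds expressed in the unit of interest.

```c
#include <stdint.h>

/* Number of bits needed to store max_value as an unsigned integer */
static unsigned bits_needed(uint64_t max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

bits_needed(6099999) gives 23 (10^-5 s), bits_needed(60999999) gives 26
(microseconds) and bits_needed(60999999999999) gives 46 (picoseconds).
Femtoseconds need 56 bits, which fits in a 64-bit integer but not in the
mantissa of a double.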
Any preference ?
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com