[gdal-dev] Design for sub-second accuracy in OGR ?
Even Rouault
even.rouault at spatialys.com
Mon Apr 6 05:02:22 PDT 2015
On Monday 06 April 2015 13:48:47, Even Rouault wrote:
> On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
> > The first solution looks reasonable. But the precision field lacks
> > values for the case where only the time is significant:
> >
> > ODTP_HMSm
> > ODTP_HMS
> > ODTP_HM
> > ODTP_H
>
> As I didn't want to multiply the values in the enumeration, my intent was
> to reuse the ODTP_YMDxxxx values for OFTTime only.
I meant "for OFTTime too"
> This is what I meant to convey
> with the qualifier in parentheses in the comment of
> ODTP_YMDH: "Year, month, day (if OFTDateTime) and hour"
>
> Or perhaps, the enumeration should capture the most precise part of the
> (date)time structure ?
> ODTP_Year
> ODTP_Month
> ODTP_Day
> ODTP_Hour
> ODTP_Minute
> ODTP_Second
> ODTP_Millisecond
>
> > etc.
> >
> > Best regards,
> >
> > Dmitry
> >
> > 05.04.2015 22:25, Even Rouault wrote:
> > > Hi,
> > >
> > > In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680,
> > > which is about the lack of precision of the current datetime structure,
> > > I've imagined different ways to modify the OGRField
> > > structure, and failed to pick one that would be the obvious
> > > solution, so opinions are welcome.
> > >
> > > The issue is how to add (at least) microsecond accuracy to the datetime
> > > structure, as a few formats support it explicitly or implicitly:
> > > MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite, PostgreSQL,
> > > CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...
> > >
> > > Below a few potential solutions :
> > >
> > > ---------------------------------------
> > > Solution 1) : Millisecond accuracy, second becomes a float
> > >
> > > This is the solution I've prototyped.
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GByte Month;
> > > GByte Day;
> > > GByte Hour;
> > > GByte Minute;
> > > GByte TZFlag;
> > > GByte Precision; /* value in OGRDateTimePrecision */
> > > float Second; /* from 00.000 to 60.999 (millisecond
> > > accuracy) */
> > >
> > > } Date;
> > >
> > > } OGRField
> > >
> > > So sub-second precision is represented with a single-precision
> > > floating point number, storing both integral and decimal parts. (We
> > > could theoretically have a hundredth of a millisecond accuracy, 10^-5 s,
> > > since 6099999 fits in the 23 bits of the mantissa.)
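As a rough check of the mantissa argument above, the sketch below counts how many values fail a round trip through a single-precision second. The function name is hypothetical and not part of OGR, and the code assumes that float is an IEEE 754 single-precision type.

```c
#include <assert.h>

/* Hypothetical helper, not part of OGR.
 * Counts the integers k in [0, 61 * units_per_second) that are not
 * recovered after being stored as a float number of seconds.
 * Assumes float is IEEE 754 single precision. */
static long count_roundtrip_failures(long units_per_second)
{
    assert(units_per_second > 0);

    long failures = 0;
    long max = 61 * units_per_second; /* seconds go from 0 up to 60.999... */

    for (long k = 0; k < max; k++)
    {
        /* Store the value the way Solution 1 would: seconds as a float. */
        float second = (float)((double)k / (double)units_per_second);

        /* Read it back, rounding to the nearest unit. */
        long back = (long)((double)second * (double)units_per_second + 0.5);

        if (back != k)
            failures++;
    }
    return failures;
}
```

Under that assumption, count_roundtrip_failures(1000) and count_roundtrip_failures(100000) should both return 0, while count_roundtrip_failures(1000000) should be non-zero, since 61 million distinct values do not fit in the available mantissa bits.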
> > >
> > > Another addition is the Precision field that indicates which parts of
> > > the datetime structure are significant.
> > >
> > > /** Enumeration that defines the precision of a DateTime.
> > >
> > > * @since GDAL 2.0
> > > */
> > >
> > > typedef enum
> > > {
> > >
> > > ODTP_Undefined, /**< Undefined */
> > > ODTP_Guess, /**< Only valid when setting through SetField(i,year,month...) where OGR will guess */
> > > ODTP_Y, /**< Year is significant */
> > > ODTP_YM, /**< Year and month are significant */
> > > ODTP_YMD, /**< Year, month and day are significant */
> > > ODTP_YMDH, /**< Year, month, day (if OFTDateTime) and hour are significant */
> > > ODTP_YMDHM, /**< Year, month, day (if OFTDateTime), hour and minute are significant */
> > > ODTP_YMDHMS, /**< Year, month, day (if OFTDateTime), hour, minute and integral second are significant */
> > > ODTP_YMDHMSm, /**< Year, month, day (if OFTDateTime), hour, minute and second with milliseconds are significant */
> > > } OGRDateTimePrecision;
> > >
> > > I think this is important since "2015/04/05 17:12:34" and "2015/04/05
> > > 17:12:34.000" do not really mean the same thing and it might be good to
> > > be able to preserve the original accuracy when converting between
> > > formats.
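To illustrate the point about preserving the original accuracy, here is a minimal sketch of how a writer could use the precision when formatting. Only the ODTP_* names come from the proposal above; the PrecisionSketch type and format_datetime_sketch() are hypothetical, and this is not how any OGR driver actually formats dates.

```c
#include <assert.h>
#include <stdio.h>

/* Subset of the proposed enumeration, reduced to what the sketch needs. */
typedef enum
{
    ODTP_YMDHM,
    ODTP_YMDHMS,
    ODTP_YMDHMSm
} PrecisionSketch;

/* Hypothetical helper: prints only the parts of the date/time that the
 * precision marks as significant. Returns the snprintf() result, that is
 * the number of characters the full string needs. */
static int format_datetime_sketch(char *buf, size_t size,
                                  int year, int month, int day,
                                  int hour, int minute, float second,
                                  PrecisionSketch precision)
{
    assert(buf != NULL && size > 0);

    switch (precision)
    {
        case ODTP_YMDHMSm:
            /* Second with three decimals, e.g. "34.000" */
            return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%06.3f",
                            year, month, day, hour, minute, second);
        case ODTP_YMDHMS:
            /* Integral second only, e.g. "34" */
            return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%02d",
                            year, month, day, hour, minute, (int)second);
        default:
            /* No second at all */
            return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d",
                            year, month, day, hour, minute);
    }
}
```

With second = 34.0f, the same stored value is written as "2015/04/05 17:12:34" for ODTP_YMDHMS and as "2015/04/05 17:12:34.000" for ODTP_YMDHMSm.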
> > >
> > > A drawback of this solution is that the size of the OGRField structure
> > > increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
> > > 64-bit builds). This is probably not that important since in most cases
> > > not that many OGRField structures are instantiated at one time
> > > (typically, you iterate over features one at a time).
> > > This could be more of a problem for use cases that involve the MEM
> > > driver, as it keeps all features in memory.
> > >
> > > Another drawback is that the change of the structure might not be
> > > directly noticed by application developers, as the Second field name is
> > > preserved but a new Precision field is added, so there's a risk that
> > > Precision is left uninitialized if the field is set through
> > > OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That could
> > > lead to unexpected formatting (but hopefully not crashes, with defensive
> > > programming). However, I'd think it is unlikely that many applications
> > > manipulate OGRField directly, instead of using the getters and
> > > setters of OGRFeature.
> > >
> > > ---------------------------------------
> > > Solution 2) : Millisecond accuracy, second and milliseconds in distinct
> > > fields
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GByte Month;
> > > GByte Day;
> > > GByte Hour;
> > > GByte Minute;
> > > GByte TZFlag;
> > > GByte Precision; /* value in OGRDateTimePrecision */
> > > GByte Second; /* from 0 to 60 */
> > >
> > > GUInt16 Millisecond; /* from 0 to 999 */
> > >
> > > } Date;
> > >
> > > } OGRField
> > >
> > > Same size of structure as in 1)
> > >
> > > ---------------------------------------
> > > Solution 3) : Millisecond accuracy, pack all fields
> > >
> > > Conceptually, this would use bit fields to avoid wasting unused bits :
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GUIntBig Month:4;
> > > GUIntBig Day:5;
> > > GUIntBig Hour:5;
> > > GUIntBig Minute:6;
> > > GUIntBig Second:6;
> > > GUIntBig Millisecond:10; /* 0-999 */
> > > GUIntBig TZFlag:8;
> > > GUIntBig Precision:4;
> > >
> > > } Date;
> > >
> > > } OGRField;
> > >
> > > This was proposed in the above-mentioned ticket. And as there were
> > > enough remaining bits, I've also added the Precision field (as in all
> > > other solutions).
> > >
> > > The advantage is that sizeof(mydate) remains 8 bytes on 32-bit builds.
> > >
> > > But the C standard only defines bitfields of int/unsigned int, so this
> > > is not portable, plus the fact that the way bits are packed is not
> > > defined by the standard, so different compilers could come up with
> > > different packing. A workaround is to do the bit manipulation through
> > > macros :
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GUIntBig opaque;
> > >
> > > } Date;
> > >
> > > } OGRField;
> > >
> > > #define GET_BITS(x,y_bits,shift) \
> > > (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))
> > >
> > > #define GET_YEAR(x) (short)GET_BITS(x,16,64-16)
> > > #define GET_MONTH(x) GET_BITS(x,4,64-16-4)
> > > #define GET_DAY(x) GET_BITS(x,5,64-16-4-5)
> > > #define GET_HOUR(x) GET_BITS(x,5,64-16-4-5-5)
> > > #define GET_MINUTE(x) GET_BITS(x,6,64-16-4-5-5-6)
> > > #define GET_SECOND(x) GET_BITS(x,6,64-16-4-5-5-6-6)
> > > #define GET_MILLISECOND(x) GET_BITS(x,10,64-16-4-5-5-6-6-10)
> > > #define GET_TZFLAG(x) GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
> > > #define GET_PRECISION(x) GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
> > >
> > > #define SET_BITS(x,y,y_bits,shift) \
> > > (x).Date.opaque = ((x).Date.opaque & \
> > > (~( (GUIntBig)((1 << (y_bits))-1) << (shift) )) | \
> > > ((GUIntBig)(y) << (shift)))
> > >
> > > #define SET_YEAR(x,val) SET_BITS(x,val,16,64-16)
> > > #define SET_MONTH(x,val) SET_BITS(x,val,4,64-16-4)
> > > #define SET_DAY(x,val) SET_BITS(x,val,5,64-16-4-5)
> > > #define SET_HOUR(x,val) SET_BITS(x,val,5,64-16-4-5-5)
> > > #define SET_MINUTE(x,val) SET_BITS(x,val,6,64-16-4-5-5-6)
> > > #define SET_SECOND(x,val) SET_BITS(x,val,6,64-16-4-5-5-6-6)
> > > #define SET_MILLISECOND(x,val) SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
> > > #define SET_TZFLAG(x,val) SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
> > > #define SET_PRECISION(x,val) SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
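The macros above can also be sketched as plain functions operating on a 64-bit value, which makes the layout easier to test. All names below are hypothetical. Unlike SET_BITS, this version masks the incoming value, so an out-of-range input cannot spill into the neighbouring fields.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths and shifts, same layout as the macros (64 bits in total). */
enum
{
    YEAR_BITS = 16,        YEAR_SHIFT = 48,
    MONTH_BITS = 4,        MONTH_SHIFT = 44,
    DAY_BITS = 5,          DAY_SHIFT = 39,
    HOUR_BITS = 5,         HOUR_SHIFT = 34,
    MINUTE_BITS = 6,       MINUTE_SHIFT = 28,
    SECOND_BITS = 6,       SECOND_SHIFT = 22,
    MILLISECOND_BITS = 10, MILLISECOND_SHIFT = 12,
    TZFLAG_BITS = 8,       TZFLAG_SHIFT = 4,
    PRECISION_BITS = 4,    PRECISION_SHIFT = 0
};

/* Mask with the lowest 'bits' bits set. */
static uint64_t field_mask(int bits)
{
    assert(bits > 0 && bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/* Extract an unsigned field. */
static unsigned get_bits(uint64_t packed, int bits, int shift)
{
    return (unsigned)((packed >> shift) & field_mask(bits));
}

/* Return 'packed' with one field replaced; the value is masked first. */
static uint64_t set_bits(uint64_t packed, unsigned value, int bits, int shift)
{
    uint64_t mask = field_mask(bits);
    return (packed & ~(mask << shift)) | (((uint64_t)value & mask) << shift);
}

/* The year is the only signed field, so it goes through 16-bit casts. */
static uint64_t set_year(uint64_t packed, int16_t year)
{
    return set_bits(packed, (uint16_t)year, YEAR_BITS, YEAR_SHIFT);
}

static int16_t get_year(uint64_t packed)
{
    return (int16_t)(uint16_t)get_bits(packed, YEAR_BITS, YEAR_SHIFT);
}
```

For example, starting from 0, calling set_year(d, 2015) and then set_bits(d, 4, MONTH_BITS, MONTH_SHIFT) gives a value from which get_year() and get_bits() return 2015 and 4 again.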
> > >
> > > Main advantage: the size of OGRField remains unchanged (so 8 bytes on
> > > 32-bit builds).
> > >
> > > Drawback: manipulation of datetime members is less natural, but there
> > > are not that many places in the GDAL code base where the OGRField.Date
> > > members are used, so it is not that much of a problem.
> > >
> > > ---------------------------------------
> > > Solution 4) : Microsecond accuracy with one field
> > >
> > > Solution 1) used a float for second and sub-second, but a float has
> > > only 23 bits of mantissa, which is enough to represent seconds with
> > > millisecond accuracy, but not with microsecond accuracy (you need 26
> > > bits for that). So use a 32-bit integer instead of a 32-bit floating
> > > point.
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GByte Month;
> > > GByte Day;
> > > GByte Hour;
> > > GByte Minute;
> > > GByte TZFlag;
> > > GByte Precision; /* value in OGRDateTimePrecision */
> > > GUInt32 Microseconds; /* 00000000 to 59999999 */
> > >
> > > } Date;
> > >
> > > } OGRField
> > >
> > > Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
> > > (and remains 16 bytes on 64-bit builds)
> > >
> > > We would need to add an extra value in OGRDateTimePrecision to mean the
> > > microsecond accuracy.
> > >
> > > It is not really clear that we need microsecond accuracy... Most formats
> > > that support sub-second accuracy use the ISO 8601 representation (e.g.
> > > YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define the maximal number of
> > > decimals beyond the second. From
> > > http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
> > > PostgreSQL supports microsecond accuracy.
> > >
> > > ---------------------------------------
> > > Solution 5) : Microsecond with 3 fields
> > >
> > > A variant where we split second into 3 integer parts:
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GByte Month;
> > > GByte Day;
> > > GByte Hour;
> > > GByte Minute;
> > > GByte TZFlag;
> > > GByte Precision; /* value in OGRDateTimePrecision */
> > >
> > > GByte Second; /* 0 to 59 */
> > >
> > > GUInt16 Millisecond; /* 0 to 999 */
> > > GUInt16 Microsecond; /* 0 to 999 */
> > >
> > > } Date;
> > >
> > > } OGRField
> > >
> > > Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on 32-bit
> > > builds (and remains 16 bytes on 64-bit builds)
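The size figures quoted for solutions 1, 2, 4 and 5 can be checked with a sketch like the one below. It only models the Date member, not the whole OGRField union. The typedefs are stand-ins with assumed widths, the struct names and padded_size() are hypothetical, and the 4-byte union alignment used for 32-bit builds is an assumption that holds for ABIs where double is 4-byte aligned inside aggregates, not for every 32-bit platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the GDAL integer typedefs (assumed widths). */
typedef int16_t  GInt16;
typedef uint8_t  GByte;
typedef uint16_t GUInt16;
typedef uint32_t GUInt32;

/* Solution 1: second as a float. */
typedef struct
{
    GInt16 Year;
    GByte Month, Day, Hour, Minute, TZFlag, Precision;
    float Second;
} DateSolution1;

/* Solution 2: second and millisecond in distinct fields. */
typedef struct
{
    GInt16 Year;
    GByte Month, Day, Hour, Minute, TZFlag, Precision;
    GByte Second;
    GUInt16 Millisecond;
} DateSolution2;

/* Solution 4: one 32-bit microsecond counter. */
typedef struct
{
    GInt16 Year;
    GByte Month, Day, Hour, Minute, TZFlag, Precision;
    GUInt32 Microseconds;
} DateSolution4;

/* Solution 5: second, millisecond and microsecond in three fields. */
typedef struct
{
    GInt16 Year;
    GByte Month, Day, Hour, Minute, TZFlag, Precision;
    GByte Second;
    GUInt16 Millisecond;
    GUInt16 Microsecond;
} DateSolution5;

/* Size of a union member once rounded up to the union's alignment. */
static size_t padded_size(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

On common ABIs the first three structures take 12 bytes and the last one 14, so padded_size(sizeof(DateSolution5), 4) gives the 16 bytes mentioned above, while with an 8-byte union alignment (64-bit builds) all four round up to 16.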
> > >
> > > ---------------------------------------
> > > Solution 6) : Nanosecond accuracy and beyond !
> > >
> > > Now that we are using 16 bytes, why not have nanosecond accuracy ?
> > >
> > > typedef union {
> > > [...]
> > >
> > > struct {
> > >
> > > GInt16 Year;
> > > GByte Month;
> > > GByte Day;
> > > GByte Hour;
> > > GByte Minute;
> > > GByte TZFlag;
> > > GByte Precision; /* value in OGRDateTimePrecision */
> > >
> > > double Second; /* 0.000000000 to 60.999999999 */
> > >
> > > } Date;
> > >
> > > } OGRField
> > >
> > > Actually we even have picosecond accuracy! (since for picoseconds, we
> > > need 46 bits and a double has 52 bits of mantissa). And if we use a
> > > 64-bit integer instead of a double, we can have femtosecond accuracy
> > > ;-)
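The bit counts quoted in this message (23, 26 and 46 bits) can be recomputed with a small sketch. Both function names are hypothetical, and the largest second value is taken to be 60.999..., as in the struct comments above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to store max_value as an unsigned integer. */
static int bits_needed(uint64_t max_value)
{
    int bits = 0;
    while (max_value != 0)
    {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Largest second value expressed in units of 10^-decimals seconds,
 * e.g. 60999 for decimals = 3 (milliseconds). */
static uint64_t max_second_value(int decimals)
{
    assert(decimals >= 0 && decimals <= 15);

    uint64_t units = 1;
    for (int i = 0; i < decimals; i++)
        units *= 10;
    return 61 * units - 1;
}
```

bits_needed(max_second_value(5)) is 23 and bits_needed(max_second_value(6)) is 26, which is why a float is one decimal short of microseconds. For picoseconds (12 decimals) the result is 46, within the 52 mantissa bits of a double, and for femtoseconds (15 decimals) it is 56, which needs the 64-bit integer.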
> > >
> > > Any preference ?
> > >
> > > Even
> >
--
Spatialys - Geospatial professional services
http://www.spatialys.com