[gdal-dev] Design for sub-second accuracy in OGR ?
Even Rouault
even.rouault at spatialys.com
Mon Apr 6 14:14:57 PDT 2015
On Monday 06 April 2015 23:11:21, Dmitriy Baryshnikov wrote:
> Hi Even,
>
> It seems to me that this duplicates RFC 50: OGR field subtypes.
> For example, we could have the master field type DateTime and the subtype
> Year. So the internal structure for date/time representation could be
> adapted to this technique.
The subtype is defined at field definition level. In all formats we currently
handle, we only know the date/time precision when reading values (and the
precision might differ between records), so only after the layer and field
definitions have been created.
>
> Best regards,
> Dmitry
>
> 06.04.2015 15:02, Even Rouault wrote:
> > On Monday 06 April 2015 13:48:47, Even Rouault wrote:
> >> On Monday 06 April 2015 11:32:33, Dmitriy Baryshnikov wrote:
> >>> The first solution looks reasonable. But the precision field lacks
> >>> values for the case where only the time is significant:
> >>>
> >>> ODTP_HMSm
> >>> ODTP_HMS
> >>> ODTP_HM
> >>> ODTP_H
> >>
> >> As I didn't want to multiply the values in the enumeration, my intent
> >> was to reuse the ODTP_YMDxxxx values for OFTTime only.
> >
> > I meant "for OFTTime too"
> >
> >> This is what I meant to
> >> convey with the qualifier in parentheses in the comment of
> >> ODTP_YMDH "Year, month, day (if OFTDateTime) and hour"
> >>
> >> Or perhaps, the enumeration should capture the most precise part of the
> >> (date)time structure ?
> >> ODTP_Year
> >> ODTP_Month
> >> ODTP_Day
> >> ODTP_Hour
> >> ODTP_Minute
> >> ODTP_Second
> >> ODTP_Millisecond
> >>
> >>> etc.
> >>>
> >>> Best regards,
> >>>
> >>> Dmitry
> >>>
> >>> 05.04.2015 22:25, Even Rouault пишет:
> >>>> Hi,
> >>>>
> >>>> In an effort to revisit http://trac.osgeo.org/gdal/ticket/2680,
> >>>> which is about the lack of precision of the current datetime
> >>>> structure, I've imagined different ways to modify the OGRField
> >>>> structure, and failed to pick one that would be the obvious
> >>>> solution, so opinions are welcome.
> >>>>
> >>>> The issue is how to add (at least) microsecond accuracy to the
> >>>> datetime structure, as a few formats support it explicitly or
> >>>> implicitly: MapInfo, GPX, Atom (GeoRSS driver), GeoPackage, SQLite,
> >>>> PostgreSQL, CSV, GeoJSON, ODS, XLSX, KML (potentially GML too)...
> >>>>
> >>>> Below are a few potential solutions:
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 1) : Millisecond accuracy, second becomes a float
> >>>>
> >>>> This is the solution I've prototyped.
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GByte Month;
> >>>> GByte Day;
> >>>> GByte Hour;
> >>>> GByte Minute;
> >>>> GByte TZFlag;
> >>>> GByte Precision; /* value in OGRDateTimePrecision */
> >>>> float Second; /* from 00.000 to 60.999 (millisecond accuracy) */
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField
> >>>>
> >>>> So sub-second precision is represented with a single-precision
> >>>> floating point number, storing both integral and decimal parts. (We
> >>>> could theoretically have hundredth-of-millisecond accuracy, 10^-5 s,
> >>>> since 6099999 fits in the 23 bits of the mantissa.)
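To make the mantissa argument above concrete, here is a minimal C sketch (not the prototype's code). The helper names roundtrip_units and count_roundtrip_failures are hypothetical. It stores a count of sub-second units in a single-precision float, the way Solution 1 stores Second, and checks whether the count survives the round trip.

```c
#include <assert.h>

/* Hypothetical helper: store `units` (e.g. milliseconds) as a float number
 * of seconds, then convert back to the nearest integer count of units. */
static long roundtrip_units(long units, double units_per_second)
{
    assert(units >= 0 && units_per_second > 0.0);
    float second = (float)((double)units / units_per_second);
    /* Adding 0.5 and truncating rounds to nearest for non-negative values. */
    return (long)((double)second * units_per_second + 0.5);
}

/* Count how many values in [0, max_units] do not survive the round trip. */
static long count_roundtrip_failures(long max_units, double units_per_second)
{
    long failures = 0;
    for (long u = 0; u <= max_units; u++)
        if (roundtrip_units(u, units_per_second) != u)
            failures++;
    return failures;
}
```

With 1000 units per second (milliseconds) or 100000 units per second (10^-5 s), every value up to the 60.999... limit should come back unchanged. With 1000000 units per second (microseconds) some values are lost: 59999999, for instance, is stored as exactly 60.0.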
> >>>>
> >>>> Another addition is the Precision field that indicates which parts of
> >>>> the datetime structure are significant.
> >>>>
> >>>> /** Enumeration that defines the precision of a DateTime.
> >>>>  * @since GDAL 2.0
> >>>>  */
> >>>> typedef enum
> >>>> {
> >>>>     ODTP_Undefined, /**< Undefined */
> >>>>     ODTP_Guess,     /**< Only valid when setting through
> >>>>                          SetField(i,year,month...) where OGR will
> >>>>                          guess */
> >>>>     ODTP_Y,         /**< Year is significant */
> >>>>     ODTP_YM,        /**< Year and month are significant */
> >>>>     ODTP_YMD,       /**< Year, month and day are significant */
> >>>>     ODTP_YMDH,      /**< Year, month, day (if OFTDateTime) and hour
> >>>>                          are significant */
> >>>>     ODTP_YMDHM,     /**< Year, month, day (if OFTDateTime), hour and
> >>>>                          minute are significant */
> >>>>     ODTP_YMDHMS,    /**< Year, month, day (if OFTDateTime), hour,
> >>>>                          minute and integral second are significant */
> >>>>     ODTP_YMDHMSm,   /**< Year, month, day (if OFTDateTime), hour,
> >>>>                          minute and second with milliseconds are
> >>>>                          significant */
> >>>> } OGRDateTimePrecision;
> >>>>
> >>>> I think this is important since "2015/04/05 17:12:34" and "2015/04/05
> >>>> 17:12:34.000" do not really mean the same thing and it might be good
> >>>> to be able to preserve the original accuracy when converting between
> >>>> formats.
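As an illustration of how a writer could preserve that distinction, here is a minimal sketch. It assumes a local two-value stand-in enum (SketchPrecision) instead of the proposed OGRDateTimePrecision, and a hypothetical format_datetime() helper; neither exists in GDAL.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for two of the proposed precision values (assumption). */
typedef enum { SK_YMDHMS, SK_YMDHMSm } SketchPrecision;

/* Hypothetical helper: format a datetime, emitting the fractional part of
 * the second only when the precision says it is significant.
 * Returns the value returned by snprintf(). */
static int format_datetime(char *buf, size_t size,
                           int year, int month, int day,
                           int hour, int minute, float second,
                           SketchPrecision precision)
{
    assert(buf != NULL && size > 0);
    if (precision == SK_YMDHMSm)
        return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%06.3f",
                        year, month, day, hour, minute, (double)second);
    return snprintf(buf, size, "%04d/%02d/%02d %02d:%02d:%02d",
                    year, month, day, hour, minute, (int)second);
}
```

With the same Second value of 34.0, SK_YMDHMS gives "2015/04/05 17:12:34" and SK_YMDHMSm gives "2015/04/05 17:12:34.000", so the original accuracy can be carried from the source format to the target format.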
> >>>>
> >>>> A drawback of this solution is that the size of the OGRField structure
> >>>> increases from 8 bytes to 12 on 32-bit builds (and remains 16 bytes on
> >>>> 64-bit builds). This is probably not that important since in most cases
> >>>> not that many OGRField structures are instantiated at one time
> >>>> (typically, you iterate over features one at a time).
> >>>> This could be more of a problem for use cases that involve the MEM
> >>>> driver, as it keeps all features in memory.
> >>>>
> >>>> Another drawback is that the change of the structure might not be
> >>>> directly noticed by application developers, as the Second field name is
> >>>> preserved but a new Precision field is added, so there's a risk that
> >>>> Precision is left uninitialized if the field is set through
> >>>> OGRFeature::SetField(int iFieldIndex, OGRField* psRawField). That
> >>>> could lead to unexpected formatting (but hopefully not crashes, with
> >>>> defensive programming). However, I think it is unlikely that many
> >>>> applications manipulate OGRField directly, instead of using
> >>>> the getters and setters of OGRFeature.
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 2) : Millisecond accuracy, second and milliseconds in
> >>>> distinct fields
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GByte Month;
> >>>> GByte Day;
> >>>> GByte Hour;
> >>>> GByte Minute;
> >>>> GByte TZFlag;
> >>>> GByte Precision; /* value in OGRDateTimePrecision */
> >>>> GByte Second; /* from 0 to 60 */
> >>>>
> >>>> GUInt16 Millisecond; /* from 0 to 999 */
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField
> >>>>
> >>>> Same size of structure as in 1)
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 3) : Millisecond accuracy, pack all fields
> >>>>
> >>>> Conceptually, this would use bit fields to avoid wasting unused bits :
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GUIntBig Month:4;
> >>>> GUIntBig Day:5;
> >>>> GUIntBig Hour:5;
> >>>> GUIntBig Minute:6;
> >>>> GUIntBig Second:6;
> >>>> GUIntBig Millisecond:10; /* 0-999 */
> >>>> GUIntBig TZFlag:8;
> >>>> GUIntBig Precision:4;
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField;
> >>>>
> >>>> This was proposed in the above-mentioned ticket. And as there were
> >>>> enough remaining bits, I've also added the Precision field (as in all
> >>>> other solutions).
> >>>>
> >>>> The advantage is that sizeof(mydate) remains 8 bytes on 32-bit
> >>>> builds.
> >>>>
> >>>> But the C standard only defines bitfields of int/unsigned int, so this
> >>>> is not portable, plus the fact that the way bits are packed is not
> >>>> defined by the standard, so different compilers could come up with
> >>>> different packing. A workaround is to do the bit manipulation through
> >>>> macros :
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GUIntBig opaque;
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField;
> >>>>
> >>>> #define GET_BITS(x,y_bits,shift) \
> >>>>     (int)(((x).Date.opaque >> (shift)) & ((1 << (y_bits))-1))
> >>>>
> >>>> #define GET_YEAR(x) (short)GET_BITS(x,16,64-16)
> >>>> #define GET_MONTH(x) GET_BITS(x,4,64-16-4)
> >>>> #define GET_DAY(x) GET_BITS(x,5,64-16-4-5)
> >>>> #define GET_HOUR(x) GET_BITS(x,5,64-16-4-5-5)
> >>>> #define GET_MINUTE(x) GET_BITS(x,6,64-16-4-5-5-6)
> >>>> #define GET_SECOND(x) GET_BITS(x,6,64-16-4-5-5-6-6)
> >>>> #define GET_MILLISECOND(x) GET_BITS(x,10,64-16-4-5-5-6-6-10)
> >>>> #define GET_TZFLAG(x) GET_BITS(x,8,64-16-4-5-5-6-6-10-8)
> >>>> #define GET_PRECISION(x) GET_BITS(x,4,64-16-4-5-5-6-6-10-8-4)
> >>>>
> >>>> #define SET_BITS(x,y,y_bits,shift) \
> >>>>     (x).Date.opaque = ((x).Date.opaque & \
> >>>>         (~( (GUIntBig)((1 << (y_bits))-1) << (shift) )) | \
> >>>>         ((GUIntBig)(y) << (shift)))
> >>>>
> >>>> #define SET_YEAR(x,val) SET_BITS(x,val,16,64-16)
> >>>> #define SET_MONTH(x,val) SET_BITS(x,val,4,64-16-4)
> >>>> #define SET_DAY(x,val) SET_BITS(x,val,5,64-16-4-5)
> >>>> #define SET_HOUR(x,val) SET_BITS(x,val,5,64-16-4-5-5)
> >>>> #define SET_MINUTE(x,val) SET_BITS(x,val,6,64-16-4-5-5-6)
> >>>> #define SET_SECOND(x,val) SET_BITS(x,val,6,64-16-4-5-5-6-6)
> >>>> #define SET_MILLISECOND(x,val) SET_BITS(x,val,10,64-16-4-5-5-6-6-10)
> >>>> #define SET_TZFLAG(x,val) SET_BITS(x,val,8,64-16-4-5-5-6-6-10-8)
> >>>> #define SET_PRECISION(x,val) SET_BITS(x,val,4,64-16-4-5-5-6-6-10-8-4)
> >>>>
> >>>> Main advantage: the size of OGRField remains unchanged (so 8 bytes on
> >>>> 32-bit builds).
> >>>>
> >>>> Drawback: manipulation of datetime members is less natural, but there
> >>>> are not that many places in the GDAL code base where the OGRField.Date
> >>>> members are used, so it is not much of a problem.
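The same 64-bit layout can also be sketched with small functions instead of macros. This is only an illustration under stated assumptions: it uses uint64_t in place of GUIntBig, the names get_bits, set_bits, get_year, set_year and the SHIFT_* constants are hypothetical, and unlike the SET_BITS macro quoted above it masks the incoming value so that an out-of-range value cannot spill into neighbouring fields.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout, most significant first:
 * year:16 month:4 day:5 hour:5 minute:6 second:6 millisecond:10
 * tzflag:8 precision:4 (64 bits in total). */
enum {
    SHIFT_YEAR = 48, SHIFT_MONTH = 44, SHIFT_DAY = 39, SHIFT_HOUR = 34,
    SHIFT_MINUTE = 28, SHIFT_SECOND = 22, SHIFT_MILLISECOND = 12,
    SHIFT_TZFLAG = 4, SHIFT_PRECISION = 0
};

/* Extract `width` bits starting at bit `shift`. */
static uint64_t get_bits(uint64_t packed, unsigned width, unsigned shift)
{
    assert(width > 0 && width < 64 && shift <= 64 - width);
    return (packed >> shift) & ((UINT64_C(1) << width) - 1);
}

/* Return `packed` with the given field replaced by `value`. */
static uint64_t set_bits(uint64_t packed, uint64_t value,
                         unsigned width, unsigned shift)
{
    assert(width > 0 && width < 64 && shift <= 64 - width);
    uint64_t mask = ((UINT64_C(1) << width) - 1) << shift;
    return (packed & ~mask) | ((value << shift) & mask);
}

/* The year is signed, so it goes through a 16-bit two's complement view.
 * The conversion back to int16_t is implementation-defined in C, but it
 * behaves as expected on mainstream compilers. */
static uint64_t set_year(uint64_t packed, int16_t year)
{
    return set_bits(packed, (uint16_t)year, 16, SHIFT_YEAR);
}

static int16_t get_year(uint64_t packed)
{
    return (int16_t)(uint16_t)get_bits(packed, 16, SHIFT_YEAR);
}
```

Setting one member leaves the others untouched, which is the property the macros rely on as well.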
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 4) : Microsecond accuracy with one field
> >>>>
> >>>> Solution 1) used a float for second and sub-second, but a float has
> >>>> only 23 bits of mantissa, which is enough to represent second with
> >>>> millisecond accuracy, but not for microsecond (you need 26 bits for
> >>>> that). So use a 32-bit integer instead of a 32-bit floating point.
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GByte Month;
> >>>> GByte Day;
> >>>> GByte Hour;
> >>>> GByte Minute;
> >>>> GByte TZFlag;
> >>>> GByte Precision; /* value in OGRDateTimePrecision */
> >>>> GUInt32 Microseconds; /* 00000000 to 59999999 */
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField
> >>>>
> >>>> Same as solution 1: sizeof(OGRField) becomes 12 bytes on 32-bit builds
> >>>> (and remains 16 bytes on 64-bit builds)
> >>>>
> >>>> We would need to add an extra value in OGRDateTimePrecision to mean
> >>>> the microsecond accuracy.
> >>>>
> >>>> It is not really clear that we need microsecond accuracy... Most formats
> >>>> that support sub-second accuracy use the ISO 8601 representation (e.g.
> >>>> YYYY-MM-DDTHH:MM:SS.sssssZ), which doesn't define the maximal number of
> >>>> decimals beyond the second. According to
> >>>> http://www.postgresql.org/docs/9.1/static/datatype-datetime.html,
> >>>> PostgreSQL supports microsecond accuracy.
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 5) : Microsecond with 3 fields
> >>>>
> >>>> A variant where we split second into 3 integer parts:
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GByte Month;
> >>>> GByte Day;
> >>>> GByte Hour;
> >>>> GByte Minute;
> >>>> GByte TZFlag;
> >>>> GByte Precision; /* value in OGRDateTimePrecision */
> >>>>
> >>>> GByte Second; /* 0 to 59 */
> >>>>
> >>>> GUInt16 Millisecond; /* 0 to 999 */
> >>>> GUInt16 Microsecond; /* 0 to 999 */
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField
> >>>>
> >>>> Drawback: due to alignment, sizeof(OGRField) becomes 16 bytes on
> >>>> 32-bit builds (and remains 16 bytes on 64-bit builds)
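To see where the alignment padding comes from, here is a rough sketch. SketchDate5 mirrors the Date member of Solution 5 with stdint types, and SketchField is a reduced stand-in for the OGRField union (only a double and a pointer besides the date); both names and the round_up() helper are hypothetical. The members add up to 13 bytes, but the 16-bit members need an even offset and the union is rounded up to its own alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the Date member of Solution 5 (assumption: GInt16/GByte/GUInt16
 * map to int16_t/uint8_t/uint16_t). */
typedef struct {
    int16_t  Year;
    uint8_t  Month, Day, Hour, Minute, TZFlag, Precision;
    uint8_t  Second;       /* 0 to 59 */
    uint16_t Millisecond;  /* 0 to 999 */
    uint16_t Microsecond;  /* 0 to 999 */
} SketchDate5;

/* Reduced stand-in for the OGRField union. */
typedef union {
    double      Real;    /* stand-in for the numeric members */
    const char *String;  /* stand-in for the pointer members */
    SketchDate5 Date;
} SketchField;

/* Round raw_size up to the next multiple of alignment. */
static size_t round_up(size_t raw_size, size_t alignment)
{
    assert(alignment > 0);
    return (raw_size + alignment - 1) / alignment * alignment;
}
```

On common ABIs I would expect sizeof(SketchDate5) to be 14 (one padding byte after Second) and round_up(14, 4) or round_up(14, 8) to give the 16 bytes mentioned above, but the exact figures are an assumption about the compiler and not something the C standard guarantees.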
> >>>>
> >>>> ---------------------------------------
> >>>> Solution 6) : Nanosecond accuracy and beyond !
> >>>>
> >>>> Now that we are using 16 bytes, why not have nanosecond accuracy ?
> >>>>
> >>>> typedef union {
> >>>> [...]
> >>>>
> >>>> struct {
> >>>>
> >>>> GInt16 Year;
> >>>> GByte Month;
> >>>> GByte Day;
> >>>> GByte Hour;
> >>>> GByte Minute;
> >>>> GByte TZFlag;
> >>>> GByte Precision; /* value in OGRDateTimePrecision */
> >>>>
> >>>> double Second; /* 0.000000000 to 60.999999999 */
> >>>>
> >>>> } Date;
> >>>>
> >>>> } OGRField
> >>>>
> >>>> Actually we even have picosecond accuracy! (since for picoseconds, we
> >>>> need 46 bits and a double has 52 bits of mantissa). And if we use a
> >>>> 64-bit integer instead of a double, we can have femtosecond accuracy
> >>>> ;-)
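The bit counts quoted in the solutions above (23, 26 and 46 bits) can be checked with a few lines of C. This is a small sketch; bits_needed() and fits_in() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent max_value as an unsigned integer. */
static unsigned bits_needed(uint64_t max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Non-zero if every value in [0, max_value] fits in available_bits bits. */
static int fits_in(uint64_t max_value, unsigned available_bits)
{
    assert(available_bits <= 64);
    return bits_needed(max_value) <= available_bits;
}
```

For example, bits_needed(6099999) covers 60.99999 s in units of 10^-5 s, bits_needed(60999999) covers microseconds, and bits_needed(60999999999999) covers picoseconds; fits_in() then compares the result against the 23 or 52 mantissa bits, or the 64 bits of an integer.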
> >>>>
> >>>> Any preference ?
> >>>>
> >>>> Even
> >>>
> >>> _______________________________________________
> >>> gdal-dev mailing list
> >>> gdal-dev at lists.osgeo.org
> >>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
--
Spatialys - Geospatial professional services
http://www.spatialys.com