[gdal-dev] Re: OGR/Spatialite patches
Even Rouault
even.rouault at mines-paris.org
Fri Apr 1 13:56:37 EDT 2011
(dropping all individual names from the CC list to avoid gdal-dev bouncing
back the email. hopefully they are subscribed to gdal-dev, or at least they
can read the archives)
Just a few more comments:
>
> If GDAL is released under the X/MIT terms, I agree to release my
> patches too under the same license.
> Yes, I'll be happy to include my own copyright; beyond that, a small
> disclaimer would be required so as to acknowledge the Tuscany Region
> for financing my work.
No problem with acknowledging anyone's support.
>
> very good; so we can imagine the following scheme:
> A) no libspatialite support: the datasource is assumed to be
> a READ-ONLY one.
> B) WRITE mode could be enabled only if libspatialite support is
> actually available.
> all this sounds very reasonable and absolutely safe.
Yes, but of course, this should only apply to spatialite databases, not
regular sqlite databases.
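As a rough illustration of what I have in mind (a minimal sketch only, not the
actual driver code; HAVE_SPATIALITE and bIsSpatialite are assumed names for
the build macro and the detection flag):

    #include "cpl_error.h"

    /* Sketch: downgrade to read-only when a spatialite DB is opened in
       update mode but libspatialite support was not compiled in. */
    static int AdjustUpdateMode( int bUpdate, int bIsSpatialite )
    {
    #ifndef HAVE_SPATIALITE
        if( bUpdate && bIsSpatialite )
        {
            CPLError( CE_Warning, CPLE_NotSupported,
                      "Updating a spatialite DB requires a build against "
                      "libspatialite; falling back to read-only mode." );
            bUpdate = FALSE;
        }
    #endif
        return bUpdate;
    }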
>
> > 4) We must be aware that using the ALTER TABLE ... ADD COLUMN ... stuff
> > will produce databases (even regular SQLite ones) that will not be
> > compatible with the latest FWTools for Windows, which uses an ancient
> > sqlite version. Might not be a big deal however, because I somehow
> > remember that dropping a new version of sqlite3.dll in the bin directory
> > makes it work again.
>
> a) compiling sqlite3 is a trivial task on any platform [this
> includes MSVC]
> b) using obsolete versions of sqlite3 is never a safe practice;
> i.e. according to SQLite's own docs:
> "Version 3.7.5 of SQLite is recommended for all new development.
> Upgrading from all prior SQLite versions is recommended."
>
> New versions not only support 'cool' features: very often they
> resolve critical bugs, and offer better performance.
> As a general rule, SQLite publishes more than ten new versions
> each year: so using a very obsolete version (one released several
> years ago) doesn't seem a good idea anyway.
> And in my personal experience, I never noticed any issue
> arising from upgrading to a newer SQLite version.
Yes, I understand your point. My point was that there is old software in the
wild that can be faced with data produced by newer software. But I think it is
OK to say that, at that point, reading the new databases requires a minimum
version of sqlite3.
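For the record, an application can easily detect at runtime whether its linked
sqlite3 is recent enough; a sketch (the 3.1.3 minimum below is an assumed
value for reading databases touched by ADD COLUMN, to be double-checked
against the SQLite docs):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void)
    {
        /* sqlite3_libversion_number() encodes 3.7.5 as 3007005,
           so 3001003 stands for the assumed 3.1.3 minimum. */
        if( sqlite3_libversion_number() < 3001003 )
            fprintf(stderr, "sqlite3 %s is too old for these databases\n",
                    sqlite3_libversion());
        return 0;
    }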
>
> > 5) Would you be willing to update/extend the autotest/ogr/ogr_sqlite.py
> > test to ensure that it still passes, and to add some tests for all the
> > new stuff, particularly the 3D geometry support?
>
> yes, of course: any hint from you about this topic is welcome.
If you have specific questions, I can help. The best approach is to look at
the existing code and imitate it. If you have some Python experience, that
should be easy. Otherwise it should also be manageable: I have never read any
book or taken any course about Python and was still able to extend the GDAL
autotest suite ;-)
>
> > 8) I've seen you have removed the tests to detect int overflows; for
> > example "if( nPointCount < 0 || nPointCount > INT_MAX / (2 * 8) )" has
> > become "if( nPointCount < 0 )". Could you justify that? I still think
> > they are necessary to avoid issues in the lines below if the nPointCount
> > value is huge enough.
>
> I suspected that "nPointCount > INT_MAX / (2 * 8)" was intended
> to be an anti-overflow check. Anyway, all this is quite meaningless.
> The above check relies entirely upon the assumption that an INT
> correctly represents a memory address; but that's not always true.
> On 64-bit platforms addresses are obviously 64 bits wide (much
> wider than a 32-bit INT).
> And on 32-bit platforms addresses actually are UINT, not INT.
> That's not all: AFAIK there is no platform supporting memory
> allocations as wide as the word length theoretically allows:
> e.g. on Win32 there is absolutely no way to perform a successful
> malloc() when the required size exceeds 1.5GB; and very often
> a malloc() will actually fail even when the required size
> exceeds 500MB.
> According to all this, I removed the above checks simply
> because I strongly doubted their actual correctness: in
> other words, "nPointCount > 10000000" seems to be an alternative
> heuristic anti-overflow criterion no more arbitrary than the
> other one.
The memory allocation has already been done from the total buffer size when we
reach that code. The only thing to worry about is in fact that the check
"if( nBytes - 8 < 2 * 8 * nPointCount )" doesn't manage to catch an invalid
value of nPointCount. If nPointCount is too big, 2 * 8 * nPointCount will be a
negative value and the code execution will go on, leading to reading outside
of the pabyData buffer at some point in the loop. I didn't want to impose an
arbitrary value for the maximum number of points (if someone creates a record
large enough to contain 100 million points and the memory allocation succeeds,
there's no reason to prevent them from reading it). So I decided to stick to
the signed 32-bit limit, even if on 64-bit platforms it could be considered an
arbitrary threshold... Well, could we agree that you keep the existing tests
in the existing code and do whatever you feel appropriate in your new code?
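To make the hazard concrete, here is a sketch of the guarded check (variable
names taken from the code discussed above; the two error returns are used for
illustration):

    #include <limits.h>

    /* nPointCount and nBytes come from untrusted input. With a 32-bit
       int, 2 * 8 * nPointCount wraps to a negative value as soon as
       nPointCount > INT_MAX / 16, so the naive size check silently
       passes and the loop later reads past pabyData. Rejecting large
       values before multiplying keeps the check meaningful. */
    if( nPointCount < 0 || nPointCount > INT_MAX / (2 * 8) )
        return OGRERR_CORRUPT_DATA;
    if( nBytes - 8 < 2 * 8 * nPointCount )
        return OGRERR_NOT_ENOUGH_DATA;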
>
> > 9) About your [3c] patch. Yes indeed, this is a painful issue with
> > shapefiles (the format I mean, not the driver), where we cannot know in
> > advance if a file only contains LINESTRING or MULTILINESTRING. Currently
> > the shapefile driver chooses to report the layer as having LINESTRING
> > geometry type. Maybe Frank can comment on this. I imagine this was
> > decided because some application software wouldn't know how to deal with
> > multi-geometries.
>
> this one is a really annoying idiosyncrasy of the Shapefile format:
> my personal approach (used in spatialite's own tools) is to first
> read *every* geometry contained in the Shapefile, so as to
> correctly detect whether all of them are 'single' (LINESTRING,
> POLYGON) or 'multi' (MULTILINESTRING, MULTIPOLYGON).
> obviously, when a mix of both 'single' and 'multi' is found,
> then casting 'singles' to 'multi' is absolutely required.
> please note: performing a two-pass read on the shapefile doesn't
> hurt performance as badly as you might expect.
> this is because a 'warm cache' effect arises: the second
> pass will easily retrieve any data directly from RAM buffers,
> avoiding any physical access to the HDD at all.
> anyway, I haven't the slightest idea how difficult supporting
> such a 'two-step' strategy could be for OGR.
Technically this would be doable, but there are use cases where you will want
to read only 1 feature in a shapefile of 10 million features... So imposing the
cost of reading the whole shapefile to determine the overall layer geometry
type doesn't sound appropriate...
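That said, if someone wanted to experiment with the pre-scan at the generic
OGR API level, the promotion logic itself is short. A sketch, not driver code
(DetectLineLayerType is a hypothetical helper):

    #include "ogrsf_frmts.h"

    /* Scan the layer once: a single multilinestring forces the whole
       layer to be reported as MULTILINESTRING. */
    static OGRwkbGeometryType DetectLineLayerType( OGRLayer *poLayer )
    {
        OGRwkbGeometryType eType = wkbLineString;
        OGRFeature *poFeature;
        poLayer->ResetReading();
        while( (poFeature = poLayer->GetNextFeature()) != NULL )
        {
            OGRGeometry *poGeom = poFeature->GetGeometryRef();
            int bIsMulti = poGeom != NULL &&
                wkbFlatten(poGeom->getGeometryType()) == wkbMultiLineString;
            OGRFeature::DestroyFeature( poFeature );
            if( bIsMulti )
            {
                eType = wkbMultiLineString;
                break;  /* one 'multi' settles the question */
            }
        }
        poLayer->ResetReading();
        return eType;
    }

Note that the early exit only helps in the mixed case: a purely single-part
file still pays for a full scan, which is exactly the cost objection above.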
Ah, and I also forgot a topic. Perhaps the sqlite driver HTML page will need
some updates to reflect the fact that editing spatialite DBs will now require
linking against libspatialite. That could also be a good place to credit your
financial supporters, if you want or need to do so (see
http://gdal.org/frmt_usgsdem.html or http://gdal.org/ogr/drv_oci.html).