[gdal-dev] Re: OGR/Spatialite patches
Even Rouault
even.rouault at mines-paris.org
Fri Apr 1 13:56:37 EDT 2011
(dropping all individual names from the CC list to avoid gdal-dev bouncing
back the email. hopefully they are subscribed to gdal-dev, or at least they
can read the archives)
Just a few more comments:
>
> If GDAL is released under the X/MIT terms, I agree to release my
> patches too under the same license.
> Yes, I'll be happy to include my own copyright; beyond that, a small
> disclaimer would be required so as to acknowledge the Tuscany Region
> for financing my work.
No problem with acknowledging anyone's support.
>
> very good; so we can imagine the following scheme:
> A) no libspatialite support: the datasource is assumed to be
> a READ-ONLY one.
> B) WRITE mode could be enabled only if libspatialite support is
> actually available.
> all this sounds very reasonable and absolutely safe.
Yes, but of course, this should only apply to spatialite databases, not
regular sqlite databases.
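As a rough illustration of what I have in mind (a minimal sketch only, not the
actual driver code; HAVE_SPATIALITE and bIsSpatialite are assumed names for
the build macro and the detection flag):

    #include "cpl_error.h"

    /* Sketch: downgrade to read-only when a spatialite DB is opened in
       update mode but libspatialite support was not compiled in. */
    static int AdjustUpdateMode( int bUpdate, int bIsSpatialite )
    {
    #ifndef HAVE_SPATIALITE
        if( bUpdate && bIsSpatialite )
        {
            CPLError( CE_Warning, CPLE_NotSupported,
                      "Updating a spatialite DB requires a build against "
                      "libspatialite; falling back to read-only mode." );
            bUpdate = FALSE;
        }
    #endif
        return bUpdate;
    }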
>
> > 4) We must be aware that using the ALTER TABLE ... ADD COLUMN ... stuff
> > will produce databases (even regular SQLite ones) that will not be
> > compatible with the latest FWTools for Windows, which uses an ancient
> > sqlite version. Might not be a big deal however, because I somehow
> > remember that dropping a new version of sqlite3.dll in the bin directory
> > makes it work again.
>
> a) compiling sqlite3 is a trivial task on any platform [this
> includes MSVC]
> b) using obsolete versions of sqlite3 is never a safe practice;
> i.e. according to SQLite's own docs:
> "Version 3.7.5 of SQLite is recommended for all new development.
> Upgrading from all prior SQLite versions is recommended."
>
> New versions not only support 'cool' features: very often they
> resolve critical bugs, and offer better performance.
> As a general rule, SQLite publishes more than ten new versions
> each year: so using a very obsolete version (one released several
> years ago) doesn't seem a good idea anyway.
> And in my personal experience, I never noticed any issue
> arising from upgrading to a newer SQLite version.
Yes, I understand your point. My point was that there is old software in the
wild that can be faced with data produced by newer software. But I think it is
OK to say that, at that point, reading the new databases requires a minimum
version of sqlite3.
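For the record, an application can easily detect at runtime whether its linked
sqlite3 is recent enough; a sketch (the 3.1.3 minimum below is an assumed
value for reading databases touched by ADD COLUMN, to be double-checked
against the SQLite docs):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void)
    {
        /* sqlite3_libversion_number() encodes 3.7.5 as 3007005,
           so 3001003 stands for the assumed 3.1.3 minimum. */
        if( sqlite3_libversion_number() < 3001003 )
            fprintf(stderr, "sqlite3 %s is too old for these databases\n",
                    sqlite3_libversion());
        return 0;
    }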
>
> > 5) Would you be willing to update/extend the autotest/ogr/ogr_sqlite.py
> > test to ensure that it still passes, and to add some tests for all the
> > new stuff, particularly the 3D geometry support?
>
> yes, of course: any hint from you about this topic is welcome.
If you have specific questions, I can help. The best approach is to look at
the existing code and imitate it. If you have some Python experience, that
should be easy. Otherwise it should also be manageable: I have never read any
book or taken any course about Python and was still able to extend the GDAL
autotest suite ;-)
>
> > 8) I've seen you have removed the tests to detect int overflows; for
> > example "if( nPointCount < 0 || nPointCount > INT_MAX / (2 * 8) )" has
> > become "if( nPointCount < 0 )". Could you justify that? I still think
> > they are necessary to avoid issues in the lines below if the nPointCount
> > value is huge enough.
>
> I suspected that "nPointCount > INT_MAX / (2 * 8)" was intended
> to be an anti-overflow check. Anyway, all this is quite meaningless.
> The above check relies entirely upon the assumption that an INT
> correctly represents a memory address; but that's not always true.
> On 64-bit platforms addresses are obviously 64 bits wide (much
> wider than a 32-bit INT).
> And on 32-bit platforms addresses actually are UINT, not INT.
> That's not all: AFAIK there is no platform supporting memory
> allocations as wide as the word length theoretically allows:
> e.g. on Win32 there is absolutely no way to perform a successful
> malloc() when the required size exceeds 1.5GB; and very often
> a malloc() will actually fail even when the required size
> exceeds 500MB.
> According to all this, I removed the above checks simply
> because I strongly doubted their actual correctness: in
> other words, "nPointCount > 10000000" seems to be an alternative
> heuristic anti-overflow criterion no more arbitrary than the
> other one.
The memory allocation has already been done from the total buffer size when we
reach that code. The only thing to worry about is in fact that the check
"if( nBytes - 8 < 2 * 8 * nPointCount )" doesn't manage to catch an invalid
value of nPointCount. If nPointCount is too big, 2 * 8 * nPointCount will be a
negative value and the code execution will go on, leading to reading outside
of the pabyData buffer at some point in the loop. I didn't want to impose an
arbitrary value for the maximum number of points (if someone creates a record
large enough to contain 100 million points and the memory allocation succeeds,
there's no reason to prevent them from reading it). So I decided to stick to
the signed 32-bit limit, even if on 64-bit platforms it could be considered an
arbitrary threshold... Well, could we agree that you keep the existing tests
in the existing code and do whatever you feel appropriate in your new code?
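To make the hazard concrete, here is a sketch of the guarded check (variable
names taken from the code discussed above; the two error returns are used for
illustration):

    #include <limits.h>

    /* nPointCount and nBytes come from untrusted input. With a 32-bit
       int, 2 * 8 * nPointCount wraps to a negative value as soon as
       nPointCount > INT_MAX / 16, so the naive size check silently
       passes and the loop later reads past pabyData. Rejecting large
       values before multiplying keeps the check meaningful. */
    if( nPointCount < 0 || nPointCount > INT_MAX / (2 * 8) )
        return OGRERR_CORRUPT_DATA;
    if( nBytes - 8 < 2 * 8 * nPointCount )
        return OGRERR_NOT_ENOUGH_DATA;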
>
> > 9) About your [3c] patch. Yes indeed, this is a painful issue with
> > shapefiles (the format I mean, not the driver), where we cannot know in
> > advance if a file only contains LINESTRING or MULTILINESTRING. Currently
> > the shapefile driver chooses to report the layer as having LINESTRING
> > geometry type. Maybe Frank can comment on this. I imagine this was
> > decided because some application software wouldn't know how to deal with
> > multi-geometries.
>
> this one is a really annoying idiosyncrasy of the Shapefile format:
> my personal approach (used in spatialite's own tools) is to first
> read *every* geometry contained in the Shapefile, so as to
> correctly detect whether all of them are 'single' (LINESTRING,
> POLYGON) or 'multi' (MULTILINESTRING, MULTIPOLYGON).
> obviously, when a mix of both 'single' and 'multi' is found,
> then casting 'singles' to 'multi' is absolutely required.
> please note: performing a two-pass read on the shapefile doesn't
> hurt performance as badly as you might expect.
> this is because a 'warm cache' effect arises: the second
> pass will easily retrieve any data directly from RAM buffers,
> avoiding any physical access to the HDD at all.
> anyway, I haven't the slightest idea how difficult supporting
> such a 'two-step' strategy could be for OGR.
Technically this would be doable, but there are use cases where you will want
to read only 1 feature in a shapefile of 10 million features... So imposing the
cost of reading the whole shapefile to determine the overall layer geometry
type doesn't sound appropriate...
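That said, if someone wanted to experiment with the pre-scan at the generic
OGR API level, the promotion logic itself is short. A sketch, not driver code
(DetectLineLayerType is a hypothetical helper):

    #include "ogrsf_frmts.h"

    /* Scan the layer once: a single multilinestring forces the whole
       layer to be reported as MULTILINESTRING. */
    static OGRwkbGeometryType DetectLineLayerType( OGRLayer *poLayer )
    {
        OGRwkbGeometryType eType = wkbLineString;
        OGRFeature *poFeature;
        poLayer->ResetReading();
        while( (poFeature = poLayer->GetNextFeature()) != NULL )
        {
            OGRGeometry *poGeom = poFeature->GetGeometryRef();
            int bIsMulti = poGeom != NULL &&
                wkbFlatten(poGeom->getGeometryType()) == wkbMultiLineString;
            OGRFeature::DestroyFeature( poFeature );
            if( bIsMulti )
            {
                eType = wkbMultiLineString;
                break;  /* one 'multi' settles the question */
            }
        }
        poLayer->ResetReading();
        return eType;
    }

Note that the early exit only helps in the mixed case: a purely single-part
file still pays for a full scan, which is exactly the cost objection above.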
Ah, and I also forgot a topic. Perhaps the sqlite driver HTML page will need
some updates to reflect the fact that editing spatialite DBs will now require
linking against libspatialite. That could also be a good place to credit your
financial supporters, if you want or need to do so (see
http://gdal.org/frmt_usgsdem.html or http://gdal.org/ogr/drv_oci.html).