[gdal-dev] Proposed patch for HUGE GML files (2)

Alessandro Furieri a.furieri at lqt.it
Mon Dec 12 06:20:15 EST 2011


take #2: now I'll reply to a few interesting but hard-to-implement
suggestions and some other minor details.


 > * Did you consider just using the OGR API to abstract
 > the access to the temporary database, instead of directly
 > using sqlite API ? This would allow to use other DB backends,
 > such as postgresql for example. Not a strong objection however
 > to justify reworking your patch. (A nice side-effect of it
 > would be that there would make build integration easy)
 >

surely a nice and smart design choice from a general
architecture point of view.
anyway, I can easily foresee many practical issues:
a) SQLite isn't client-server: a whole DB is simply a
    single file. this allows handling a temp-DB exactly
    as if it were a plain ordinary temp-file.
b) any other DBMS is client-server: this requires
    defining a rather complex connection string (several
    further arguments to be passed to ogr2ogr)
c) using a temp-DB allows us to ignore any concern at
    all about already existing tables [the temp-DB just
    created is certainly empty].
    the same cannot be safely assumed for any other DBMS;
    and this easily means many further code functions to
    be implemented.
    it obviously depends on the actual target DBMS, but
    in the general case we would surely have to use some
    purposely created SCHEMA: and this would add further
    complexity.
d) all this considered, directly using SQLite's own API
    surely is much simpler and easier (and probably much
    more efficient: SQLite, when carefully optimized, is
    damn fast)
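To make point (a) concrete, here is a minimal sketch in Python (using the stdlib sqlite3 module rather than the C API the patch itself uses; the table and column names are hypothetical, not from the patch): a SQLite temp-DB really is just a file on disk, so creating, filling, and discarding it is no harder than handling any ordinary temp-file.

```python
import os
import sqlite3
import tempfile

# Hypothetical sketch: the temp-DB is just an ordinary file on disk.
fd, db_path = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)

conn = sqlite3.connect(db_path)
# No need to worry about pre-existing tables: a freshly created DB is empty.
conn.execute("CREATE TABLE gml_nodes (id INTEGER PRIMARY KEY, xml TEXT)")
conn.execute("INSERT INTO gml_nodes (xml) VALUES (?)", ("<gml:Node/>",))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM gml_nodes").fetchone()[0]
conn.close()

# Dropping the temp-DB is just deleting a file.
os.remove(db_path)
print(count)  # → 1
```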


 > * I'm wondering if the HUGE resolver could be automatically
 > selected if the GML file is bigger than a threshold ( 100 MB ? )
 > (and if sqlite is available of course).
 >

interesting enough: for now I think that supporting an
explicit algorithm selection surely is useful for testing
and debugging purposes.
anyway, adopting a heuristic size threshold is a nice idea
to be considered in the future; implementing this seems
quite easy and painless.
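Such a heuristic could look roughly like this (a sketch only: the 100 MB threshold comes from the question above, while the function name and resolver labels are my own assumptions, not part of the patch):

```python
import os
import tempfile

# Hypothetical threshold from the thread: files above this size would
# automatically get the SQLite-backed ("HUGE") resolver.
HUGE_THRESHOLD_BYTES = 100 * 1024 * 1024  # 100 MB

def pick_resolver(gml_path, sqlite_available=True):
    """Hypothetical helper: choose a resolver from the file size."""
    if os.path.getsize(gml_path) > HUGE_THRESHOLD_BYTES and sqlite_available:
        return "HUGE"       # SQLite temp-DB based resolver
    return "IN_MEMORY"      # current default resolver

# Small sample file: stays well below the threshold.
fd, sample = tempfile.mkstemp(suffix=".gml")
os.write(fd, b"<gml:FeatureCollection/>")
os.close(fd)
choice = pick_resolver(sample)
os.remove(sample)
print(choice)  # → IN_MEMORY
```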


 > The naming GML_SKIP_RESOLVE_ELEMS=HUGE sounds a bit awkward to
 > select a resolving algorithm and not very consistent with the current
 > allowed values (ALL / NONE / comma separated values), but I also
 > find the current logic a bit twisted too (GML_SKIP_RESOLVE_ELEMS=NONE
 > means resolving all elements...).
 >

don't tell me :-D
defining GML_SKIP_RESOLVE_ELEMS=NONE to activate the resolver
is absolutely counter-intuitive.
Honestly, I introduced GML_SKIP_RESOLVE_ELEMS=HUGE simply
because I'm a very lazy person, no other reason than this ;-)
I completely agree with you; any different name for these
GML-related args will surely be better and clearer than the
currently implemented ones.


 > * I see special cases for GML topology geometries in your code.
 > Do you have small GML samples that illustrate that and could be
 > used for regression testing ?
 >

surely yes: I've asked Andrea Peri (Tuscany Region).
He can supply several test samples in the coming days.


 > Please open a ticket on GDAL Trac. You mentioned the limitations
 > of the XSD parser. That could be an interesting area for improvements...
 > Downloading XSD via URLs or supporting <xs:include> doesn't look like the
 > most difficult part. The real challenge is dealing with complex schemas
 > (complex being everything that doesn't fit into the Simple Feature Model
 > of "flat" attributes).
 >

see above: I've asked Andrea Peri to cooperate; he's the
XSD guru, not me ;-)
we'll surely come back to this topic in the coming days
(once we complete the internal testing stage)

bye Sandro




