[gdal-dev] Proposed patch for HUGE GML files (2)
Alessandro Furieri
a.furieri at lqt.it
Mon Dec 12 06:20:15 EST 2011
take #2: now I'll reply to a few interesting but hard-to-implement
suggestions, and to some other minor details.
> * Did you consider just using the OGR API to abstract
> the access to the temporary database, instead of directly
> using the sqlite API ? This would allow using other DB backends,
> such as postgresql for example. Not a strong objection, however,
> to justify reworking your patch. (A nice side-effect would be
> that it would make build integration easy)
>
surely a nice and smart design choice from a general
architecture point of view.
anyway, I can easily foresee several practical issues:
a) SQLite isn't client-server: a whole DB is simply a
single file. this allows handling a temp-DB exactly as
if it were a plain, ordinary temp-file.
b) any other DBMS is client-server: this requires
defining a rather complex connection string (several
further arguments to be passed to ogr2ogr).
c) using a temp-DB allows us to ignore any concern
about already existing tables [the freshly created
temp-DB is guaranteed to be empty].
the same cannot be safely assumed for any other DBMS;
and this easily means many further code paths to be
implemented.
it obviously depends on the actual target DBMS, but
in the general case we'd surely have to consider using
some purposely created SCHEMA: and this would add
further complexity.
d) all this considered, directly using SQLite's own API
surely is much simpler and easier (and probably much
more efficient: SQLite, when carefully optimized, is
damn fast). see the sketch right after this list.
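
just to make this last point concrete, here is a minimal sketch
(not the actual patch code; the file path and table layout are
purely hypothetical) showing how trivial the whole temp-DB
lifecycle is when calling the sqlite3 API directly:

/* minimal sketch: the temp-DB is created, used and disposed of
 * exactly like an ordinary temp-file */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    const char *path = "/tmp/gml_resolver_tmp.sqlite"; /* hypothetical name */
    sqlite3 *db = NULL;

    if (sqlite3_open(path, &db) != SQLITE_OK)
        return 1;

    /* the freshly created DB file is guaranteed to be empty,
     * so there is no need to check for pre-existing tables */
    sqlite3_exec(db,
                 "CREATE TABLE gml_nodes (gml_id TEXT PRIMARY KEY, xml BLOB)",
                 NULL, NULL, NULL);

    /* ... resolve the xlink:href references via ordinary SQL ... */

    sqlite3_close(db);
    remove(path); /* disposing of the whole DB == deleting one plain file */
    return 0;
}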
> * I'm wondering if the HUGE resolver could be automatically
> selected if the GML file is bigger than a threshold ( 100 MB ? )
> (and if sqlite is available of course).
>
interestingly enough: for now, I suppose that supporting an
explicit algorithm selection is surely useful for testing
and debugging purposes.
anyway, adopting a heuristic size threshold is a nice idea
to consider in the future; implementing it seems quite
easy and painless.
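
something along these lines would probably do (a rough sketch,
not committed code: the 100 MB threshold and the bHaveSQLite
flag are just assumptions):

/* rough sketch of the suggested heuristic */
#include "cpl_vsi.h"

#define HUGE_GML_THRESHOLD (100 * 1024 * 1024) /* 100 MB, as suggested */

static int UseHugeResolver(const char *pszFilename, int bHaveSQLite)
{
    VSIStatBufL sStat;

    if (!bHaveSQLite)                        /* sqlite support not built in */
        return FALSE;
    if (VSIStatL(pszFilename, &sStat) != 0)  /* cannot stat the GML file */
        return FALSE;

    return sStat.st_size > HUGE_GML_THRESHOLD;
}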
> The naming GML_SKIP_RESOLVE_ELEMS=HUGE sounds a bit awkward to
> select a resolving algorithm, and not very consistent with the
> currently allowed values (ALL / NONE / comma-separated values), but I also
> find the current logic a bit twisted too (GML_SKIP_RESOLVE_ELEMS=NONE
> means resolving all elements...).
>
don't tell me :-D
defining GML_SKIP_RESOLVE_ELEMS=NONE to activate the resolver
is absolutely counter-intuitive.
honestly, I've introduced GML_SKIP_RESOLVE_ELEMS=HUGE simply
because I'm a very lazy person, no other reason than this ;-)
I completely agree with you; any different names for these
GML-related args will surely be better and clearer than the
currently implemented ones.
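
just to spell out the twisted logic in code form (a simplified
sketch, not a verbatim excerpt from the GML driver):

#include "cpl_conv.h"

static int IsResolverEnabled(void)
{
    /* default is "ALL": skip resolving ALL elements, i.e. resolver OFF;
     * "NONE" means skip nothing, i.e. resolve everything: resolver ON */
    const char *pszVal =
        CPLGetConfigOption("GML_SKIP_RESOLVE_ELEMS", "ALL");

    return !EQUAL(pszVal, "ALL");
}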
> * I see special cases for GML topology geometries in your code.
> Do you have small GML samples that illustrate that and could be
> used for regression testing ?
>
surely yes: I've asked Andrea Peri (Tuscany Region).
he can supply several test samples in the next few days.
> Please open a ticket on GDAL Trac. You mentioned the limitations
> of the XSD parser. That could be an interesting area for improvements...
> Downloading XSD via URLs or supporting <xs:include> doesn't look like
> the most difficult part. The real challenge is dealing with complex schemas
> (complex being everything that doesn't fit into the Simple Feature Model
> of "flat" attributes).
>
see above: I've asked Andrea Peri to cooperate; he's the
XSD guru, not me ;-)
we'll surely come back to this topic in the next few days
(once we've completed the internal testing stage).
bye Sandro