[gdal-dev] Proposed patch for HUGE GML files (1)

Alessandro Furieri a.furieri at lqt.it
Mon Dec 12 06:18:16 EST 2011


Hi Even,

we have many points to be discussed; so for the
sake of clarity I'll split my reply in two distinct
parts:
- code changes (the patch itself)
- interesting but hard to be implemented suggestions
   and other minor detail facets.
this one is my reply about the patch main core.


 > * #include "sqlite3.h" in ogr/ogrsf_frmts/gml/gmlreaderp.h
 > is not appropriate in that file, and should be only found
 > in hugefileresolver.cpp. SQLite is a very common dependency
 > in most GDAL/OGR builds, but not a compulsory one.
 > So some adjustments will be needed in the GNUmakefile/makefile.vc
 > to detect if sqlite dependency is really available before
 > trying to build hugefileresolver.cpp and calling it from
 > ogrgmldatasource.cpp
 >

all right: I've implemented your suggestion.
now you can built OGR even when libsqlite isn't available
at all; and in this case a "fake" implementation for few
methods will simply report an error (missing SQLite support)

I hope to have fixed both GNUmakefile and makefile.vc;
anyway a careful check from you surely is welcome.


 > * GML_GFS_TEMPLATE config option is general purpose, and could
 > be made also available even when xlink:href resolving is not asked.
 >

nice: the HUGE resolver internally managed the TEMPLATE related stuff.
now I've added two further classes [GFSTemplateList and GFSTempalteItem]
and a new method [PrescanForTemplate] allowing to make TEMPLATE usage
available even when the HUGE resolver isn't actually invoked.


 > * I feel you could completely remove the GML_HUGE_SQLITE_UNSAFE
 > config option, and decide it to be TRUE. I don't see any reason
 > why one would want to have recovery guarantees for a temporary file.
 > If the process is interrupted, I suppose that the full resolving
 > must be starting again and the temp file will be recreated : correct ?
 >

absolutely correct: I've completely removed GML_HUGE_SQLITE_UNSAFE;
now this one is unconditionally the default implementation.


 > * Same for GML_HUGE_TEMPFILE. Why would one want to keep the
 > temporary sqlite DB ? Or perhaps keep it undocumented for just
 > debugging purposes, and make it default to YES.
 >

now GML_HUGE_TEMPFILE simply is an undocumented debug option;
and the default setting is YES


 > * A bit correlated with the above : the use of int* pbOutIsTempFile
 > in hugeFileResolver() is confusing, because (contrary to the
 > resolvexlinks.cpp case) it is not used as an output parameter, but
 > as an input one
 >

all right, I've corrected this.


 > * In hugefileresolver.cpp, I think that the hand-parsing of XML is
 > unnecessary in gmlHugeFileCheckXrefs(). The only caller of
 > gmlHugeFileCheckXrefs() does a gmlText = 
CPLSerializeXMLTree((CPLXMLNode *)pszNode)
 > just before. Why not just exploring the nodes through the CPLXMLNode
 > objects ? This would be much cleaner and robust (in case the XML
 > serialization changes a bit), and likely a bit faster too.
 >

accepted: I've completely refactored my code attempting to use
as far as possible CPLXMLNode and its methods.
and effectively now the code is strongly simplified and better
arranged (and I hope too, much more robust and stable).

As an interesting side effect, this deep code refactoring
caused many further simplifications and optimizations to
be applied at SQL level; and the 'final' implementation
is about 50% faster than the previous one.


 > * There might be security issues due to the use of char XXX[1024].
 > I see quite a few strcpy() with no size checking. I'd suggest using
 > CPLStrlcpy().
 >

even better than this: I've carefully removed any fixed-size
buffer, using CPLString objects instead.


 > * A few naming inconsistencies. The z in papszGeomList =
 > poFeature->GetGeometryList()  should be removed because no
 > strings are used. Same for pszNode = papszGeomList[i].
 >

really sorry for this :-D
just a short consideration: I'm not at all a fan of the so-called
Hungarian naming convention (sincerely, I hate it).
AFAIK this is widely used on Microsoft own code, and GDAL/OGR is
the first open source project I've ever seen extensively using
Hungarian.
Please, be patient and tolerant about this ... I've surely
put many 'wrong' names here and there, simply because I'm
completely not accustomed to used such coding style. ;-)


 > * At line 1793 of hugefileresolver.cpp, VSIFPrintfL ( fp,
 > " <%s>\%s</%s>\n", --> the \ before %s looks wrong.
 >

sorry for this typo: many thanks to you for noticing it


 > And I see no use of XML escaping for the value of the attribute
 > (use CPLEscapeString())
 >

nice suggestion: accepted

---
Please see the attached zipfile containing the latest
version of the proposed patch

bye Sandro
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gml-huge-patch-2.zip
Type: application/octet-stream
Size: 18127 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/gdal-dev/attachments/20111212/58bf0e88/gml-huge-patch-2-0001.obj


More information about the gdal-dev mailing list