[gdal-dev] ArcGIS 10

Matt Wilkie matt.wilkie at gov.yk.ca
Tue Aug 17 15:28:37 EDT 2010


Jason, thank you for this detailed explanation of goings on behind the 
curtain. Much of it is over my head, but I appreciate being able to now 
perceive the outline of the shadowy shapes in the fog. :)

matt wilkie
--------------------------------------------
Geomatics Analyst
Information Management and Technology
Yukon Department of Environment
10 Burns Road * Whitehorse, Yukon * Y1A 4Y9
867-667-8133 Tel * 867-393-7003 Fax
http://environmentyukon.gov.yk.ca/geomatics/
--------------------------------------------

On 11/08/2010 8:38 PM, Jason Roberts wrote:
> Francis,
>
> The ticket that I referenced (http://trac.osgeo.org/gdal/ticket/3672)
> contains comprehensive details. In short: prior to ArcGIS 9.3, ArcGIS
> executed Python geoprocessing tools by creating a separate python.exe
> process with command line arguments specifying the script to run and its
> parameters. In 9.3 ESRI introduced the capability of running the tools in
> the ArcGIS process itself (ArcCatalog.exe or ArcMap.exe). The ArcGIS
> processes embedded the Python interpreter using Python's C interface; see
> the Python docs for good info about how embedding is done. This provided
> performance advantages because a new process does not have to be created
> every time a tool is run and because the communication that occurs between
> the tool and the code that starts and monitors it does not have to occur as
> cross-process calls.
>
> Presumably to make things even faster, the ArcGIS processes instantiate a
> single Python interpreter when they start up and reuse this interpreter each
> time an in-process Python tool is run. But in between tool runs, a bug in
> ArcGIS causes it to remove all of the items in Python's sys.modules
> dictionary; apparently ArcGIS is calling Py_Initialize multiple times.
> sys.modules maps the names of Python modules loaded when Python "import"
> statements are executed to the actual objects representing the modules in
> memory. When the next Python tool runs and tries to import modules, the
> interpreter checks whether those modules are in sys.modules yet. Because
> they are not, the interpreter loads them again. This produces unpredictable
> results, depending on what the modules do.
>
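
The re-import behaviour described above can be reproduced outside ArcGIS. This is only a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.6 that ships with ArcGIS 10, and it uses the standard-library module colorsys as a harmless stand-in for any pure-Python module. Deleting the sys.modules entry by hand is an assumption that approximates what ArcGIS does between tool runs.

```python
import importlib
import sys

import colorsys  # stand-in for any pure-Python module

# Keep a reference to the module object created by the first import.
first = sys.modules["colorsys"]

# Simulate the entry being cleared between tool runs.
del sys.modules["colorsys"]

# The interpreter no longer finds the name in sys.modules, so it loads
# the module from disk again and creates a brand-new module object.
second = importlib.import_module("colorsys")

print(first is second)  # False: two distinct module objects now exist
```

Anything that kept a reference to the first object keeps using it, while new code sees the second one.
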
> For most modules, it works ok and may just result in memory leaks. (I have
> not traced through enough of the Python interpreter source code to know for
> sure about that.) But for GDAL and other complex Python packages, it causes
> problems. In GDAL's case, several of the osgeo modules implemented in Python
> code (.py files) have a relationship with modules implemented in C (.pyd
> files). In particular, gdal_array.py calls functions implemented in
> _gdal_array.pyd. _gdal_array.pyd stores a pointer to the gdal_array.py
> module so it can access things defined in it. When ArcGIS clears out
> sys.modules, gdal_array.py must be imported again and now exists at a new
> memory address. _gdal_array.pyd is also imported again, but because of how
> Python loads modules written in C, its C "init" function is not called
> again so it is stuck with the pointer to the old gdal_array.py.
>
> Eventually the geoprocessing script wants to read or write data from or to a
> raster. It calls Band.ReadAsArray or Band.WriteArray. Those functions
> ultimately cause gdal_array.py (the new one) to call _gdal_array.pyd and
> pass in a Band object that comes from the new .py files in sys.modules. But
> _gdal_array.pyd is expecting a Band object from the old .py files and raises
> an exception saying that the object passed in was not a Band instance. (See
> the ticket for the exact message.)
>
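
The "not a Band instance" failure can be imitated in pure Python. The sketch below is an assumption-laden illustration, not GDAL code: fake_band_mod and its Band class are hypothetical names, and the variable stale_band_class plays the role of the reference that _gdal_array.pyd keeps to the old module. It is written for Python 3.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny throwaway module (hypothetical name) that defines Band.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "fake_band_mod.py"), "w") as f:
    f.write("class Band(object):\n    pass\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import fake_band_mod

# What a C extension would remember from its one-time init call.
stale_band_class = fake_band_mod.Band

# Simulate sys.modules being cleared, then import the module again.
del sys.modules["fake_band_mod"]
fake_band_mod = importlib.import_module("fake_band_mod")

# An object built from the new module fails a type check against the
# class remembered from the old module, although both are named Band.
new_band = fake_band_mod.Band()
rejected = not isinstance(new_band, stale_band_class)
print(rejected)  # True
```
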
> My workaround is to save the contents of sys.modules to a cache after the
> geoprocessing tool runs and restore it when the script starts up again,
> basically:
>
> import sys, _GeoEcoArcGISHelper
> _GeoEcoArcGISHelper.LoadModulesFromCache()
> try:
>      # The real work of the script comes here.
>      ...
> finally:
>      _GeoEcoArcGISHelper.AddModulesToCache(sys.modules.keys())
>
> The cache is implemented in a module called _GeoEcoArcGISHelper, which is
> written in C. (The name of my Python project is GeoEco.) Here is a link to
> that file in a svn repository:
>
> http://code.nicholas.duke.edu/projects/mget/browser/MGET/Branches/Jason/PythonPackage/src/GeoEco/_GeoEcoArcGISHelper.cpp
>
> Pretty ugly but it seems to work. I reviewed it with the ESRI geoprocessor
> developer that I'm working with; he agreed it was ugly but said it should
> work. Normally with Python you are not supposed to tamper with sys.modules
> directly but I looked at the Python 2.6.5 code and it seemed like it would
> not have any ill effects. No promises, however...
>
> Jason
>
> -----Original Message-----
> From: Francis Markham [mailto:fmarkham at gmail.com]
> Sent: Wednesday, August 11, 2010 10:14 PM
> To: Jason Roberts
> Cc: Discourse Maps; gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] ArcGIS 10
>
> Thanks for your encyclopedic response Jason!  If you don't mind
> detailing the problems with running GDAL scripts in process, and your
> work-around for avoiding it, I would be very interested.
>
> Cheers,
>
> Francis
>
> On 12 August 2010 02:49, Jason Roberts <jason.roberts at duke.edu> wrote:
>> ArcGIS 10 installs Python 2.6 and numpy 1.3.0. It does not install GDAL
>> with Python bindings. It includes a new Python module from ESRI called
>> ArcPy. This module supposedly provides the capability of reading and
>> writing raster layers in the form of numpy arrays, similar to GDAL’s
>> Python bindings. I have not played with it much, only very briefly in
>> Arc 9.4 Beta 2, when it did not seem to work very well. If you search the
>> Arc 10 online documentation you can probably find more information.
>>
>> In principle, the ArcPy module may make it less necessary for those
>> developing for ArcGIS with Python to resort to GDAL to read or write
>> rasters. If you do wish to use GDAL from ArcGIS 9.3 or 10, you must
>> contend with an important bug in ArcGIS that affects GDAL; see
>> http://trac.osgeo.org/gdal/ticket/3672. I am corresponding with the ESRI
>> developer who is investigating this. He expressed an interest in fixing
>> this in Arc 10 SP1 but has not committed to doing so at this time.
>>
>> To work around that, when creating Python-based geoprocessing tools that
>> use GDAL, you must disable the “Run Python script in process” option in
>> the ArcGIS UI. That has certain implications for performance but is
>> probably fine for most people. I developed a hack that works around the
>> ArcGIS bug and allows you to leave “Run Python script in process”
>> enabled, but unless you are pretty savvy with Python (comfortable writing
>> C extension modules) I do not recommend that approach. If anyone is
>> interested I can provide an example.
>>
>> Best,
>>
>> Jason
>>
>>
>>
>> From: gdal-dev-bounces at lists.osgeo.org
>> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Discourse Maps
>> Sent: Wednesday, August 11, 2010 10:38 AM
>> To: gdal-dev at lists.osgeo.org
>> Subject: [gdal-dev] ArcGIS 10
>>
>>
>>
>> I hear the new ArcGIS10 has GDAL and NumPy built into the geoprocessor.
>> If this is true, does that mean that users will not have to install the
>> various Python library bindings (GDAL, numpy, etc.) after a full Arc10
>> install?
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>

