[gdal-dev] ArcGIS 10

Jason Roberts jason.roberts at duke.edu
Wed Aug 11 23:38:31 EDT 2010


Francis,

The ticket that I referenced (http://trac.osgeo.org/gdal/ticket/3672)
contains comprehensive details. In short: prior to ArcGIS 9.3, ArcGIS
executed Python geoprocessing tools by creating a separate python.exe
process with command line arguments specifying the script to run and its
parameters. In 9.3 ESRI introduced the capability of running the tools in
the ArcGIS process itself (ArcCatalog.exe or ArcMap.exe). The ArcGIS
processes embedded the Python interpreter using Python's C interface; see
the Python docs for good info about how embedding is done. This provided
performance advantages because a new process does not have to be created
every time a tool is run and because the communication that occurs between
the tool and the code that starts and monitors it does not have to occur as
cross-process calls.
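To make the difference concrete, here is a minimal Python 3 sketch that models the two execution styles with the standard library, not with ArcGIS itself (ArcGIS embeds the interpreter through Python's C API). `subprocess.run` stands in for launching a separate python.exe for each tool run, and `exec` stands in for running the tool inside an interpreter that is already alive. The tool script and its `tool_runs` counter are hypothetical.

```python
import subprocess
import sys

# Hypothetical "tool" script: it counts how many times it has run in the
# current interpreter by keeping a counter on the sys module.
TOOL_SOURCE = (
    "import sys\n"
    "sys.tool_runs = getattr(sys, 'tool_runs', 0) + 1\n"
)


def run_out_of_process():
    """Pre-9.3 style: start a brand-new interpreter for every run."""
    result = subprocess.run(
        [sys.executable, "-c", TOOL_SOURCE + "print(sys.tool_runs)\n"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def run_in_process():
    """9.3+ style: execute the script inside this already-running interpreter."""
    exec(TOOL_SOURCE, {"__name__": "__main__"})
    return sys.tool_runs


print(run_out_of_process(), run_out_of_process())  # 1 1: each process starts fresh
print(run_in_process(), run_in_process())          # 1 2: state carries over
```

Out of process, every run sees a fresh interpreter; in process, whatever one run leaves behind is visible to the next, which is why the interpreter reuse described below matters.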

Presumably to make things even faster, the ArcGIS processes instantiate a
single Python interpreter when they start up and reuse this interpreter each
time an in-process Python tool is run. But in between tool runs, a bug in
ArcGIS causes it to remove all of the items in Python's sys.modules
dictionary; apparently ArcGIS is calling Py_Initialize multiple times.
sys.modules maps the names of Python modules loaded when Python "import"
statements are executed to the actual objects representing the modules in
memory. When the next Python tool runs and tries to import modules, the
interpreter checks whether those modules are in sys.modules yet. Because
they are not, the interpreter loads them again. This produces unpredictable
results, depending on what the modules do.
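The re-import behavior can be reproduced in plain Python 3. This sketch writes a throwaway module (the name `reimport_demo` is hypothetical), imports it twice, and then deletes only that module's entry from sys.modules to mimic what ArcGIS does between runs; clearing the whole dictionary in a live interpreter would break it.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to disk and make it importable.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "reimport_demo.py"), "w") as f:
    f.write("class Band(object):\n    pass\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("reimport_demo")
again = importlib.import_module("reimport_demo")
print(first is again)            # True: the second import is served from sys.modules

# Simulate the entry disappearing from sys.modules between tool runs.
del sys.modules["reimport_demo"]
fresh = importlib.import_module("reimport_demo")
print(first is fresh)            # False: the .py file was executed again
print(first.Band is fresh.Band)  # False: its classes are brand-new objects too
```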

For most modules, it works ok and may just result in memory leaks. (I have
not traced through enough of the Python interpreter source code to know for
sure about that.) But for GDAL and other complex Python packages, it causes
problems. In GDAL's case, several of the osgeo modules implemented in Python
code (.py files) have a relationship with modules implemented in C (.pyd
files). In particular, gdal_array.py calls functions implemented in
_gdal_array.pyd. _gdal_array.pyd stores a pointer to the gdal_array.py
module so it can access things defined in it. When ArcGIS clears out
sys.modules, gdal_array.py must be imported again and now exists at a new
memory address. _gdal_array.pyd is also imported again, but because of how
Python loads modules written in C, its C "init" function is not called
again so it is stuck with the pointer to the old gdal_array.py.

Eventually the geoprocessing script wants to read or write data from or to a
raster. It calls Band.ReadAsArray or Band.WriteArray. Those functions
ultimately cause gdal_array.py (the new one) to call _gdal_array.pyd and
pass in a Band object that comes from the new .py files in sys.modules. But
_gdal_array.pyd is expecting a Band object from the old .py files and raises
an exception saying that the object passed in was not a Band instance. (See
the ticket for the exact message.)
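The failure can be simulated in pure Python 3, under clearly stated assumptions: `fake_gdal` is a hypothetical stand-in for the osgeo .py modules, and the hypothetical `FakeExtension` class plays the role of _gdal_array.pyd by looking up the Python module once, at "init" time, and never refreshing that reference. The error text below is made up; the real message is in the ticket.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical pure-Python module standing in for the osgeo .py files;
# all it defines is a Band class.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "fake_gdal.py"), "w") as f:
    f.write("class Band(object):\n    pass\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()


class FakeExtension(object):
    """Hypothetical stand-in for _gdal_array.pyd: it looks up the Python
    module once, at "init" time, and keeps that reference forever."""

    def __init__(self):
        self._module = importlib.import_module("fake_gdal")

    def band_to_array(self, band):
        # Type check against the Band class captured at init time.
        if not isinstance(band, self._module.Band):
            raise TypeError("argument is not a Band instance")
        return "ok"


ext = FakeExtension()                       # "init" runs once per process
gdal_mod = importlib.import_module("fake_gdal")
print(ext.band_to_array(gdal_mod.Band()))   # ok

del sys.modules["fake_gdal"]                # entry vanishes between tool runs
gdal_mod = importlib.import_module("fake_gdal")  # the .py file runs again
try:
    ext.band_to_array(gdal_mod.Band())      # new Band, old class in the extension
except TypeError as exc:
    print("TypeError:", exc)
```

The new Band object is perfectly valid; it simply belongs to a class object that the extension has never seen.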

My workaround is to save the contents of sys.modules to a cache after the
geoprocessing tool runs and restore it when the script starts up again,
basically:

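import sys, _GeoEcoArcGISHelper

# Put back any modules that ArcGIS removed from sys.modules since the last run.
_GeoEcoArcGISHelper.LoadModulesFromCache()
try:
    # The real work of the script comes here.
    ...
finally:
    # Remember the currently loaded modules so the next run can restore them.
    _GeoEcoArcGISHelper.AddModulesToCache(sys.modules.keys())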

The cache is implemented in a module called _GeoEcoArcGISHelper, which is written
in C. (The name of my Python project is GeoEco.) Here is a link to that file
in a svn repository:

http://code.nicholas.duke.edu/projects/mget/browser/MGET/Branches/Jason/PythonPackage/src/GeoEco/_GeoEcoArcGISHelper.cpp

Pretty ugly but it seems to work. I reviewed it with the ESRI geoprocessor
developer that I'm working with; he agreed it was ugly but said it should
work. Normally with Python you are not supposed to tamper with sys.modules
directly but I looked at the Python 2.6.5 code and it seemed like it would
not have any ill effects. No promises, however...

Jason

-----Original Message-----
From: Francis Markham [mailto:fmarkham at gmail.com] 
Sent: Wednesday, August 11, 2010 10:14 PM
To: Jason Roberts
Cc: Discourse Maps; gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] ArcGIS 10

Thanks for your encyclopedic response Jason!  If you don't mind
detailing the problems with running GDAL scripts in process, and your
work-around for avoiding it, I would be very interested.

Cheers,

Francis

On 12 August 2010 02:49, Jason Roberts <jason.roberts at duke.edu> wrote:
> ArcGIS 10 installs Python 2.6 and numpy 1.3.0. It does not install GDAL with
> Python bindings. It includes a new Python module from ESRI called ArcPy.
> This module supposedly provides the capability of reading and writing raster
> layers in the form of numpy arrays, similar to GDAL’s Python bindings. I
> have not played with it much, only very briefly in Arc 9.4 Beta 2, when it
> did not seem to work very well. If you search the Arc 10 online
> documentation you can probably find more information.
>
>
>
> In principle, the ArcPy module may make it less necessary for those developing
> for ArcGIS with Python to resort to GDAL to read or write rasters. If you do
> wish to use GDAL from ArcGIS 9.3 or 10, you must contend with an important
> bug in ArcGIS that affects GDAL; see http://trac.osgeo.org/gdal/ticket/3672.
> I am corresponding with the ESRI developer who is investigating this. He
> expressed an interest in fixing this in Arc 10 SP1 but has not committed to
> doing so at this time.
>
>
>
> To work around that, when creating Python-based geoprocessing tools that use
> GDAL, you must disable the “Run Python script in process” option in the
> ArcGIS UI. That has certain implications for performance but is probably fine
> for most people. I developed a hack that works around the ArcGIS bug and
> allows you to enable “Run Python script in process”, but unless you are
> pretty savvy with Python (comfortable writing C extension modules) I do not
> recommend that approach. If anyone is interested I can provide an example.
>
>
>
> Best,
>
> Jason
>
>
>
> From: gdal-dev-bounces at lists.osgeo.org
> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Discourse Maps
> Sent: Wednesday, August 11, 2010 10:38 AM
> To: gdal-dev at lists.osgeo.org
> Subject: [gdal-dev] ArcGIS 10
>
>
>
> I hear the new ArcGIS10 has GDAL and NumPy built into the geoprocessor. If
> this is true, does that mean that users will not have to install the various
> Python library bindings GDAL, numpy, etc. after a full Arc10 install?
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>


