[GRASS-dev] pygrass is into GRASS trunk

Wed Oct 10 06:22:55 PDT 2012

Dear GRASS Developers,

I am pleased to announce that the pygrass library, developed during
the Google Summer of Code 2012, is now in GRASS trunk and needs to be tested.

Before continuing with the announcement: if you prefer the HTML version of this
message, please see: https://gist.github.com/gists/3864411

The pygrass library has been developed taking into account three main aspects,
in this order:

    1. *simplicity*: GIS is already a complex subject, so we want to make
       the API as simple, natural and intuitive as possible, while still
       respecting the GRASS workflow and nomenclature so that it is easier
       to find information in the GRASS developer manual;
    2. *flexibility*: even though pygrass aims to provide high-level map
       interaction, internally it uses some classes that work with the
       GRASS API at a very low level, such as the Segment, Bbox and Ilist
       classes. Whenever a class uses a C struct, the ctypes pointer to that
       struct is available under an attribute whose name starts with "c_",
       leaving users free to implement their own logic;
    3. *performance*: pygrass makes heavy use of the GRASS C functions
       whenever possible.
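
To make the "c_" convention from point 2 more concrete, here is a minimal
sketch that uses only the standard ctypes module. The CBox struct, the Box
class and the c_box attribute are hypothetical names chosen for illustration;
they are not the pygrass classes, and the real structs come from the GRASS C
headers.

```python
import ctypes


class CBox(ctypes.Structure):
    """Hypothetical C struct holding the four horizontal bounds of a box."""
    _fields_ = [("N", ctypes.c_double), ("S", ctypes.c_double),
                ("E", ctypes.c_double), ("W", ctypes.c_double)]


class Box(object):
    """High-level wrapper that keeps the C struct reachable as ``c_box``."""

    def __init__(self, north, south, east, west):
        # The pointer is stored under a "c_" prefixed attribute so that
        # callers can hand it to any C function that expects a CBox*.
        self.c_box = ctypes.pointer(CBox(north, south, east, west))

    @property
    def north(self):
        return self.c_box.contents.N

    @north.setter
    def north(self, value):
        self.c_box.contents.N = value


box = Box(north=10.0, south=0.0, east=5.0, west=-5.0)
box.c_box.contents.S = -1.0   # low-level access through the ctypes pointer
box.north = 12.0              # high-level access through the wrapper
```

Both access paths touch the same memory, so a user who needs a C function
that the wrapper does not cover can still call it with the exposed pointer.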

The pygrass library is divided into three main parts: Raster, Vector, and Modules.
The short interactive Python sessions below show how users and developers can
interact with the *pygrass* library. All the examples use the maps of the
North Carolina sample dataset.

Raster
======

All the raster classes read and write metadata such as categories and history.
The *pygrass* interface for raster maps is divided into four classes that
represent different ways to interact with rasters. In order to give greater
freedom of implementation to users and developers, each class uses a different
C API, providing the tools to fit different needs.

The **RasterRow** class reads the content of the raster row by row and writes it
in sequential mode: row after row. Reading and writing the same map at the same
time is not supported by the RasterRow class.

The **RasterRowIO** class implements a row cache that allows users to read and
re-read raster rows randomly.
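
As an illustration of the row cache idea only, the sketch below keeps the
most recently used rows in memory and reads a row again only when it has been
evicted. RowCache and its arguments are hypothetical names; the actual
RasterRowIO class relies on the GRASS C row cache, not on this code.

```python
from collections import OrderedDict


class RowCache(object):
    """Keep the most recently used rows in memory (hypothetical helper)."""

    def __init__(self, read_row, size=4):
        self.read_row = read_row      # callable: row index -> row data
        self.size = size              # maximum number of cached rows
        self.rows = OrderedDict()     # row index -> row data, oldest first
        self.reads = 0                # how many times read_row was called

    def __getitem__(self, index):
        if index in self.rows:
            row = self.rows.pop(index)          # cache hit
        else:
            row = self.read_row(index)          # cache miss: read the row
            self.reads += 1
            if len(self.rows) >= self.size:
                self.rows.popitem(last=False)   # evict the oldest entry
        self.rows[index] = row                  # mark as most recently used
        return row


# A fake "map" of 10 rows with 3 columns each stands in for a raster.
fake_map = [[r * 10 + c for c in range(3)] for r in range(10)]
cache = RowCache(fake_map.__getitem__, size=2)
rows = [cache[5], cache[2], cache[5]]  # random access, row 5 read only once
```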

The **RasterSegment** class divides the map into several tiles, and each tile is
saved into a file. With this class it is possible to read and write pixel
values randomly, at the same time, in the same map.
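
The tiling idea can be sketched with a little index arithmetic. The function
below maps a cell position to a tile number and to an offset inside that
tile; the name tile_address and the row-major tile layout are assumptions
made for this example and do not describe the on-disk format of the GRASS
segment library.

```python
def tile_address(row, col, tile_rows, tile_cols, map_cols):
    """Return (tile_index, offset_in_tile) for the cell at (row, col)."""
    # Number of tiles needed to cover one line of the map (ceil division).
    tiles_per_line = -(-map_cols // tile_cols)
    tile_index = (row // tile_rows) * tiles_per_line + (col // tile_cols)
    offset = (row % tile_rows) * tile_cols + (col % tile_cols)
    return tile_index, offset


# A 100-column map cut into 64x64 tiles needs two tiles per line.
first = tile_address(0, 0, 64, 64, 100)
other = tile_address(70, 65, 64, 64, 100)
```

Because any cell can be located without scanning the rows before it, only
the tile that contains it has to be loaded, which is what makes random reads
and writes on the same map practical.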

The **RasterNumpy** class inherits from the *numpy.memmap* class and allows
users to interact with the map as a numpy array.

All the Raster classes share common methods to open, read and get
raster information.

Simple operations with the *RasterRow* class: ::

    >>> from grass.pygrass.raster import RasterRow
    >>> elev = RasterRow('elevation')
    >>> elev.exist()
    True
    >>> elev.name  # get the raster name
    'elevation'
    >>> elev.mapset
    'PERMANENT'

Now open the raster in read mode, get raster information and read the rows: ::

    >>> elev.open('r')  # open in read mode
    >>> elev.is_open()
    True
    >>> elev.range
    (55.578792572021484, 156.32986450195312)
    >>> for row in elev[:5]:  # show the first 5 rows
    ...     print(row[:3])    # show the first 3 columns of each row
    ...
    [ 141.99613953  141.27848816  141.37904358]
    [ 142.90461731  142.39450073  142.68611145]
    [ 143.81854248  143.54707336  143.83972168]
    [ 144.56524658  144.58493042  144.86477661]
    [ 144.99488831  145.22894287  145.57142639]
    >>> for row in elev[:5]:
    ...     print(row[:3] < 144)
    ...
    [1 1 1]
    [1 1 1]
    [1 1 1]
    [0 0 0]
    [0 0 0]

Open a new raster map, save, rename and remove: ::

    >>> new = RasterRow('new')
    >>> new.exist()
    False
    >>> new.open('w', 'CELL')  # open a new CELL map in write mode
    >>> for row in elev:
    ...     new.put_row(row < 144)  # write the boolean row
    ...
    >>> new.close()
    >>> new.mapset
    'user1'
    >>> new.name
    'new'
    >>> new.name = 'new_name'  # rename the map changing the attribute value
    >>> new.name
    'new_name'
    >>> new.exist()
    True
    >>> new.remove()  # remove the map
    >>> new.exist()
    False
    >>> elev.close()

Almost the same operations are available for the other Raster classes;
please see the documentation:
http://www.ing.unitn.it/~zambelli/projects/pygrass/raster.html

Vector
======

The *pygrass* interface defines two classes to work with vector maps: the
**Vector** class loads a vector map without the GRASS topology, while the
**VectorTopo** class supports the topology. The *pygrass* library implements the
geometry features **Point**, **Line**, **Boundary**, **Area** and **Isle**; each
class has methods to return vector information or to compute its properties,
such as the length of a line, the distance between two points, or between a
point and a line, etc.

The short samples below show how it is now possible to interact
with GRASS vectors and features: ::

    >>> from grass.pygrass.vector import VectorTopo
    >>> municip = VectorTopo('boundary_municp_sqlite')
    >>> municip.open()
    >>> municip.number_of("areas")
    3579
    >>> municip.number_of("islands")
    2629
    >>> municip.number_of('pizzas')   # suggest the right vector type if wrong
    Traceback (most recent call last):
        ...
    ValueError: vtype not supported, use one of: 'areas', 'dblinks', 'faces',
    'holes', 'islands', 'kernels', 'line_points', 'lines', 'nodes',
    'updated_lines', 'updated_nodes', 'volumes'

Suppose that we want to select only the areas that have a surface of at least
10000 m2: ::

    >>> big = [area for area in municip.viter('areas')
    ...        if area.alive() and area.area() >= 10000]

The areas can then be sorted by size, largest first: ::

    >>> from operator import methodcaller as method
    >>> big.sort(key=method('area'), reverse=True)  # sort the list
    >>> for area in big[:3]:
    ...     print area, area.area()
    ...
    Area(3102) 697521857.848
    Area(2682) 320224369.66
    Area(2552) 298356117.948
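
For readers who have not used operator.methodcaller before, here is a small
self-contained sketch of the same sorting pattern. The Shape class is a
hypothetical stand-in for a geometry object with an area() method; it is not
a pygrass class.

```python
from operator import methodcaller


class Shape(object):
    """Hypothetical stand-in for a geometry object with an area() method."""

    def __init__(self, ident, width, height):
        self.ident = ident
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


shapes = [Shape(1, 2, 3), Shape(2, 10, 10), Shape(3, 4, 5)]
# methodcaller('area') builds a callable equivalent to: lambda s: s.area()
shapes.sort(key=methodcaller('area'), reverse=True)
order = [shape.ident for shape in shapes]
```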

The *pygrass* library implements classes to access the attribute tables, to
manage the database connection, to make queries, and to add and remove columns.

Read and write the **Link** connection with the database of a vector map: ::

    >>> municip.dblinks
    DBlinks([[Link(1, boundary_municp, sqlite)]])
    >>> link = municip.dblinks[1]
    >>> link.number
    1
    >>> link.name
    'boundary_municp'
    >>> link.table_name
    'boundary_municp_sqlite'
    >>> link.driver
    'sqlite'
    >>> link.database[-30:]
    'north_carolina/user1/sqlite.db'
    >>> link.key
    'cat'
    >>> link.name = 'boundary'
    >>> link.driver = 'pg'
    >>> link.database = 'host=localhost,dbname=grassdb'
    >>> link.key = 'gid'

From the Link object it is possible to instantiate a **Table** object, on which
the user can make simple queries with the **Filters** object: ::

    >>> table = link.table()
    >>> table.filters.select('cat', 'COUNTY',
    ...                      'AREA', 'PERIMETER').order_by('AREA').limit(3)
    Filters('SELECT cat, COUNTY, AREA, PERIMETER FROM boundary_municp_sqlite ORDER BY AREA LIMIT 3;')
    >>> cur = table.execute()
    >>> for row in cur.fetchall():
    ...     print repr(row)
    ... # cat, COUNTY, AREA, PERIMETER
    (1, u'SURRY', 0.0, 1415.331)
    (2, u'SURRY', 0.0, 48286.011)
    (3, u'CASWELL', 0.0, 5750.087)
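
The chained select/order_by/limit calls follow a common builder pattern in
which every method returns the object itself. The sketch below shows that
pattern against an in-memory SQLite table; the Query class, the towns table
and its values are invented for this example and are not how the pygrass
Filters class is implemented.

```python
import sqlite3


class Query(object):
    """Hypothetical chainable builder for a single-table SELECT."""

    def __init__(self, table):
        self.table = table
        self.columns = '*'
        self.order = ''
        self.rows = ''

    def select(self, *columns):
        self.columns = ', '.join(columns) if columns else '*'
        return self                      # returning self enables chaining

    def order_by(self, column):
        self.order = ' ORDER BY %s' % column
        return self

    def limit(self, number):
        self.rows = ' LIMIT %d' % number
        return self

    def sql(self):
        # Identifiers are interpolated as-is: only use trusted names here.
        return 'SELECT %s FROM %s%s%s;' % (self.columns, self.table,
                                           self.order, self.rows)


# Toy table in an in-memory SQLite database (values are made up).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE towns (cat INTEGER, name TEXT, area REAL)')
conn.executemany('INSERT INTO towns VALUES (?, ?, ?)',
                 [(1, 'A', 30.0), (2, 'B', 10.0), (3, 'C', 20.0)])

query = Query('towns').select('cat', 'name').order_by('area').limit(2)
statement = query.sql()
result = conn.execute(statement).fetchall()
```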

The **Table** object also makes it possible to read or change the table name,
and to add, remove, rename or cast the columns of the table: ::

    >>> table.name
    'boundary_municp_sqlite'
    >>> table.columns
    Columns([(u'cat', u'integer'), ..., (u'ACRES', u'double precision')])
    >>> table.columns.names()
    [u'cat', u'OBJECTID', u'AREA', u'PERIMETER', ..., u'ACRES']
    >>> table.columns.types()
    [u'integer', u'integer', u'double precision', ..., u'double precision']
    >>> table.columns.add('n_pizzas', 'int4')
    >>> table.columns.names()[-1]
    u'n_pizzas'
    >>> table.columns.rename(u'n_pizzas', u'n_pizzas_per_person')
    >>> table.columns.names()[-1]
    u'n_pizzas_per_person'
    >>> table.columns.cast(u'n_pizzas_per_person', 'float8')
    >>> table.columns.items()[-1]
    (u'n_pizzas_per_person', u'float8')
    >>> table.columns.drop(u'n_pizzas_per_person')

For more examples with the Vector class, please see the documentation:
http://www.ing.unitn.it/~zambelli/projects/pygrass/vector.html

Modules
=======

The *pygrass* **Module** class is compatible with the *grass.run_command*
syntax. ::

    >>> from grass.pygrass.modules import Module
    >>> slope_aspect = Module("r.slope.aspect", elevation='elevation',
    ...                       slope='slp',  aspect='asp',
    ...                       format='percent', overwrite=True)

But it is also possible to create a runnable module object, change some
attributes and run it later: ::

    >>> slope_aspect = Module("r.slope.aspect", elevation='elev',
    ...                       slope='slp',  aspect='asp',
    ...                       format='percent', overwrite=True, run_=False)
    >>> slope_aspect.inputs['elevation']
    Parameter <elevation> (required:yes, type:raster, multiple:no)
    >>> slope_aspect.inputs["elevation"].value = "elevation"
    >>> slope_aspect.inputs["format"]
    Parameter <format> (required:no, type:string, multiple:no)
    >>> # get help for the input parameter
    >>> print slope_aspect.inputs["format"].__doc__
    format: 'degrees', optional, string
        Format for reporting the slope
        Values: 'degrees', 'percent'
    >>> # manage and check the errors
    >>> slope_aspect.inputs["format"].value = 'percents'
    Traceback (most recent call last):
        ...
    ValueError: The Parameter <format>, must be one of: ['degrees', 'percent']
    >>> slope_aspect.inputs["format"].value = 'percent'
    >>> slope_aspect.run()
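
The error shown above is raised when the value is assigned, before the
module runs. A minimal sketch of that kind of check, written with a plain
Python property, could look like the following. The Param class is
hypothetical and only mimics the error message from the session above; the
real pygrass Parameter class handles much more (types, multiple values,
required flags).

```python
class Param(object):
    """Hypothetical parameter that only accepts a fixed set of values."""

    def __init__(self, name, values, default=None):
        self.name = name
        self.values = list(values)
        self._value = default

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        # Reject the value at assignment time, before anything is run.
        if new not in self.values:
            raise ValueError('The Parameter <%s>, must be one of: %r'
                             % (self.name, self.values))
        self._value = new


fmt = Param('format', ['degrees', 'percent'], default='degrees')
fmt.value = 'percent'          # accepted
try:
    fmt.value = 'percents'     # rejected: not in the allowed values
except ValueError as error:
    message = str(error)
```

Failing early like this means a typo is reported immediately instead of
surfacing later as a failed module run.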

It is also possible to initialize a module and pass the parameters later, as
with a Python function: ::

    >>> slope_aspect = Module("r.slope.aspect")
    >>> slope_aspect(elevation='elevation', slope='slp',  aspect='asp',
    ...              format='percent', overwrite=True)

Moreover, the *pygrass* Module allows the user to run GRASS modules in a
different process, to manage (wait/kill/terminate) the process, and to
manage/record its standard error and standard output, in a way that is
easier than with the current Python API of GRASS. ::

    >>> slope_aspect = Module('r.slope.aspect')
    >>> slope_aspect(elevation='elevation', slope='slp', aspect='asp',
    ...              overwrite=True, finish_=False)
    >>> slope_aspect.popen.wait() # *.kill(), *.terminate()
    0
    >>> out, err = slope_aspect.popen.communicate()
    >>> print err
     100%
    Aspect raster map <asp> complete
    Slope raster map <slp> complete
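
The popen attribute used above appears to behave like a standard
subprocess.Popen object. Assuming that is the case, the same wait and capture
pattern can be tried without GRASS; in the sketch below a tiny Python child
process stands in for the GRASS module, and the text it prints is made up.

```python
import subprocess
import sys

# The child is a tiny Python program standing in for a long-running module;
# it writes one line to standard output and one to standard error.
child_code = "import sys; print('done'); sys.stderr.write('100%\\n')"
proc = subprocess.Popen([sys.executable, '-c', child_code],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)

# communicate() waits for the child and collects both streams; calling it
# instead of wait() avoids a deadlock if the child fills a pipe buffer.
out, err = proc.communicate()
returncode = proc.returncode
# proc.kill() or proc.terminate() would stop the child early instead.
```

When the output is captured through pipes, calling communicate() directly is
usually the safer choice, since wait() alone can block if the child writes
more than the pipe buffer holds.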

For more examples with the Module class, please see the documentation:
http://www.ing.unitn.it/~zambelli/projects/pygrass/modules.html