[gdal-dev] extract information from polygon shapefile based on raster grid?

Chao YUE chaoyuejoy at gmail.com
Thu Apr 4 08:15:52 PDT 2013


Dear Jan and Doug,

Thanks for all these ideas. Finally I report back.
I finally found a point shapefile dataset equivalent to the previous
polygon one, so I took the easy approach (which is surely not
computationally optimal):
1. read all the attribute data into a csv file or a pandas DataFrame object
(http://pandas.pydata.org/)
2. for each (latitude, longitude) pair, assign to the column "geoindex" the
tuple which is used to index the final numpy array that will store the
statistical information.
3. use pandas.DataFrame.groupby('geoindex').agg([np.mean, np.std, np.sum])
etc. to generate the statistical information I want.
4. create an empty numpy array which conforms to the geospatial extent of
the area but with 0.5 degree resolution, and assign the array element
values using the geoindex information in the dataframe.

If anybody is interested in this simple approach, I put below the function
used to extract the statistical information.

import numpy as np
import pandas as pa  # imports needed by the function below

def dataframe_extract_statistic_info(df, target_field=None,
                                     groupby_field='geoindex'):
    """
    Extract information for target_field in dataframe "df" grouped by
    groupby_field.

    Arguments:
    ----------
    1. target_field: the field for statistical information.
    2. groupby_field: the field used to group data.

    Notes:
    ------
    1. extracted statistical information are:
       sum, mean, std, 95th percentile, 50th percentile (median),
       5th percentile
    2. the groupby method will drop null rows in the groupby_field
       automatically.
    """
    df_size_grp = df.groupby(groupby_field)
    # basic statistics, plus the number of records in each group
    dft = df_size_grp[target_field].agg([np.sum, np.mean, np.std])
    dft['number'] = df_size_grp.size()

    # percentiles; "per" is bound as a default argument so each lambda
    # carries its own percentile value
    dflist = []
    statnamelist = ['per95', 'median', 'per05']
    perlist = [95, 50, 5]
    for name, per in zip(statnamelist, perlist):
        dflist.append(df_size_grp[target_field]
                      .agg(lambda x, p=per: np.percentile(x.values, p))
                      .rename(name))
    dflist.append(dft)
    dfall = pa.concat(dflist, axis=1)
    return dfall
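If anybody wants to try it quickly, here is a self-contained toy example (it restates the function compactly so the snippet runs on its own; the field names and values are invented for illustration):

```python
import numpy as np
import pandas as pa  # pandas, aliased "pa" as in the function above

def extract_stats(df, target_field, groupby_field='geoindex'):
    # compact restatement of dataframe_extract_statistic_info
    grp = df.groupby(groupby_field)
    dft = grp[target_field].agg(['sum', 'mean', 'std'])
    dft['number'] = grp.size()
    pieces = [grp[target_field]
              .agg(lambda x, p=p: np.percentile(x.values, p))
              .rename(n)
              for n, p in [('per95', 95), ('median', 50), ('per05', 5)]]
    return pa.concat(pieces + [dft], axis=1)

# invented toy data: two fires in grid cell 0, one fire in grid cell 1
# (integer cell ids are used here in place of (row, col) tuples)
df = pa.DataFrame({'geoindex': [0, 0, 1],
                   'size_ha': [10.0, 30.0, 50.0]})
out = extract_stats(df, target_field='size_ha')
```

The result has one row per grid cell and the columns per95, median, per05, sum, mean, std, and number.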


cheers,

Chao


On Thu, Mar 28, 2013 at 1:39 PM, Newcomb, Doug <doug_newcomb at fws.gov> wrote:

> Chao,
> Depending on how detailed you want your stats to be, it might be easier
> to run the zonal statistics plugin in QGIS or run v.rast.stats in GRASS 7.0.
>
>
> Doug
>
> On Wed, Mar 27, 2013 at 6:38 PM, Jan Heckman <jan.heckman at gmail.com>wrote:
>
>> Hello Chao,
>>
>> I may not understand your wish quite well, but I think you need some
>> datastructure which conforms to the elements of the grid and receives
>> additional information derived from a number of shapes with attributes.
>> Your output could be another shapefile, with the summed results as
>> attributes, like the number of fires.
>> In the case that the only spatial relationship between shapes and
>> gridelements is whether or not a shape falls inside, you can indeed
>> simplify the polygon shape to pointshapes (centerpoints) which can be
>> converted to a raster of 1's and 0's. Then you could use a dictionary to
>> put the shapeattributes in the right place. I see a problem when several
>> shapes fall into the same grid-element, having different attribute values.
>> So you would be better off with a way of rasterbuilding, where each
>> attribute is summed per gridelement over all the shapes that fall inside.
>> Such routines exist at least in several programs.
>> Alternatively, you can make a polygon shapefile which conforms to the
>> rasterelements, then do a spatial join between your (centerpoint) shapefile
>> and the rastershape (effectively adding a field to the pointshapes
>> which is the index into the grid), then add the several attribute data per
>> gridelement by e.g. making a pivot table.
>> If you have to work with shapes that are larger than the gridelements,
>> you would need another solution; there would be a choice between sampling
>> the shapes or distributing shape attributes proportionally to calculated
>> overlap with (polygonal) gridelements.
>>
>> So, more simply still, assuming the grid is regular and it is possible
>> to compute the gridindex from pointcoordinates:
>>
>> Add an index field to the centerpoint shape which has the attributes;
>> calculate the gridindex from the coordinates and put this into the index
>> field;
>> bring the table e.g. into excel;
>> do a pivot table for the attributes (count or sum).
>> If necessary, add the centerpoint coordinates to the resulting table;
>> convert the table into a new point shape with the desired attributes.
>>
>> Hope I got you right and this helps,
>> Jan
>>
>> On Wed, Mar 27, 2013 at 9:52 PM, Chao YUE <chaoyuejoy at gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I am using the python binding of GDAL for reading polygon shapefiles.
>>>
>>> I have a polygon shapefile which contains the fire polygon and burned
>>> area, with typical fires burning on 10-100ha.
>>> My model simulation result is on a 0.5 degree X 0.5 degree grid, so I
>>> need to extract from this shapefile, on each gridcell of the 0.5d X 0.5d
>>> grid, information like the number of fires and mean fire size etc.
>>>
>>> The approach I am currently thinking of is to calculate the center of
>>> each fire polygon, and check which 0.5d X 0.5d gridcell it falls in, then
>>> build a huge dictionary whose keys are the numpy indices of the 0.5d X
>>> 0.5d array, and whose values are the lists (or arrays or pandas
>>> dataframes) which include the information of all the polygons falling
>>> within the same grid cell.
>>>
>>> I don't know if there is a better approach than this. I'd appreciate any
>>> comments.
>>>
>>> best regards,
>>>
>>> Chao
>>>
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>
>>
>>
>>
>
>
>
> --
> Doug Newcomb
> USFWS
> Raleigh, NC
> 919-856-4520 ext. 14 doug_newcomb at fws.gov
>
> ---------------------------------------------------------------------------------------------------------
> The opinions I express are my own and are not representative of the
> official policy of the U.S.Fish and Wildlife Service or Dept. of the
> Interior.   Life is too short for undocumented, proprietary data formats.
>



-- 
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************

