[gdal-dev] RFC 39

Ari Jolma ari.jolma at gmail.com
Thu May 3 02:34:57 EDT 2012


On 05/02/2012 09:56 PM, Jason Roberts wrote:
> Hi Ari,
>
> I have been following this proposal with interest. I have a couple of
> questions; sorry to bring them up late in the RFC process.
>
> It looks like the inspiration for this was the Overlay toolset in the ArcGIS
> geoprocessing toolbox. That is nice for people like me who regularly work
> with ArcGIS; it increases the possibility of being able to replace
> automation currently based on ArcGIS with something based on GDAL. Assuming
> you are aiming for basic parity with ArcGIS:

ArcGIS is an example which is assumably pretty good, and I'm not aware 
of any standards or better ones so that's the reason. Personally, I 
don't use ArcGIS, so I can't say much.

>
> 1. Would you consider implementing a Symmetrical Difference method? It is
> the only one from the ArcGIS Overlay toolset that you did not implement
> (except Spatial Join, which does not really apply here).

I'd be happy to. Can you point me to a description of the method?

>
> 2. In ArcGIS, the Intersect and Union tools support multiple inputs. Is that
> something you would consider implementing? I don't have a strong opinion on
> this. It is occasionally necessary to use all of these tools with multiple
> inputs (e.g. erase several layers from a starting layer). I'm not
> immediately sure why ESRI elected to support multiple inputs for two of
> their tools but not more of them.

We've chosen to implement these tools as C = A.method(B), i.e., they 
always produce a result layer and not change A nor B (the C has to be 
created before the actual method). For Update, Clip and Erase it might 
make sense (as the schema of the result layer is only that of input 
layer) to have the method be an in-place method: A.method(B), i.e., 
change A. Then it would be trivial to write

for all B in (a list of method layers)
     A.method(B)

I would in fact support this syntax as it is more versatile and saves 
memory in some cases. The downside is that it does not work with 
read-only data sources and thus is probably not suitable for GDAL.

In Intersection, Union and Identity the schema of the result may contain 
attributes from the method layer, thus IMO they do not fit so well into 
the in-place syntax.

In the current syntax, to run a method on several method layers:

for all B in (a list of method layers)
     C = new Layer like A
     A.method(B, C)
     delete A
     A = C

It really seems to me that this would be practical only for memory 
layers. Maybe the support for multiple layers should be left for the 
command line tool or scripting language layer.

For example I'll probably build a support into the Perl bindings for the 
in-place syntax.

>
> 3. Will these functions be exposed from the Python bindings?

They are already, if you apply the patch. The test code is written in 
Python.

Thanks for the interest.

Ari

>
> Best regards,
>
> Jason
>
> -----Original Message-----
> From: gdal-dev-bounces at lists.osgeo.org
> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Ari Jolma
> Sent: Wednesday, April 25, 2012 9:33 AM
> To: gdal-dev at lists.osgeo.org
> Subject: [gdal-dev] RFC 39
>
> Folks,
>
> I'll start a new thread for this RFC.
>
> I've uploaded a new patch, which hopefully corrects the issues Even brought
> up. Below are comments.
>
> I've not added the Append or Buffer methods, which basically work on just
> one layer. I believe such functionality is simple enough to create using
> feature iteration.
>
> Ari
>
> the patch:
> http://trac.osgeo.org/gdal/attachment/wiki/rfc39_ogr_layer_algebra/rfc39.pat
> ch
>
>
> On 04/19/2012 02:04 PM, Even Rouault wrote:
>>> http://trac.osgeo.org/gdal/wiki/rfc39_ogr_layer_algebra
>>>
>> Ari,
>>
>> I have reviewed your proposal and it looks interesting. Here are my
>> observations
>> :
>>
>> 1) Argument order. A.Operation(B, C) where C is the target layer
>> doesn't seem very intuitive, but I'm not sure I have something
>> fundamentally better to propose. An alternate solution would be to
>> make the operations static methods, and have
>> OGRLayer::Operation(output_layer, input_layer, method_layer) ? I'd
>> suggest at least to rename B as method_layer and C as result_layer to be
> more explicit on the role of each argument.
>
>
> I've changed the names to Input, Method and Output. For example the layer
> object pointers are pLayerInput etc.
>
>
>
>> 2) It would perhaps be prudent to add a char** papszOptions argument as a
>> provision for future needs.
> I've added it.
>
>
>> 3) I'm wondering if it would not be more appropriate to separate the
> creation of
>> fields of the output layer in a separate method that might be called, or
> not,
>> before the real operation. For example IdentifyPrepare() for Identity().
> Of
>> course this would complicate a bit the implementation of the operation
> because
>> it will need to look for which fields exist in the output layer and how
> the
>> input and method layer fields map to those fields. But the establishment
> of the
>> mapping and the setting of the output feature from the fields of the input
> and
>> method features could be made in common methods
>>
>>      This way the XXXXXPrepare() would not need to check that CreateField()
>> succeeds because XXXX() could work with an output layer with no fields at
> all.
>> Checking that there are not fields with same name would not be necessary
> (in
>> that case the value from B would override the value from A).
>
> I've created a new method into Feature class: SetFieldsFrom, which uses
> a map. It is similar to SetFrom, which uses a map. (thus the field copy
> could be optimized into one function)
>
> The layer methods now first check the existence of fields in the result
> layer. If there are fields, it is assumed that that is what the user
> wants. If there are no fields, a complete set of fields are created for
> fields in input and method layers (in some cases only input layer). This
> behavior could later be overridden with options.
>
>
>> 4) It would be nice to able to provide a progress callback to get progress
>> report (and also being able to interrupt) for such potentially long
> operations.
>
>
> I agree. I've added these to the API, but I've not yet added the calls.
> This requires for example the inclusion of gdal.h into ogr_api.h.
>
>
>> 5) A few remarks on the implementation itself (only code review, no actual
>> testing). You can likely wait for comments from others before heading to
> fix
>> them...
>>
>> a) CreateField() and CreateFeature() return code should be tested and
> acted
>> upon. Likely interrupt the whole process ? (not strictly necessary for
>> CreateField() is you adopt the idea developed in point 3))
>
> I've added tests. I've used the "goto done;" style, where done is a
> label just before the release of used resources and exit from the
> method. I've tried to be careful to leave no memory leaks.
>
>
>> b) I think there's something wrong with OGRGeometry *Bfilter =
>> B->GetSpatialFilter(). When you later do
>> B->SetSpatialFilter(x_geom->Intersection(Bfilter)), I believe it will
> destroy
>> the previous installed filter, and thus Bfilter will point to a garbage
> object.
>> You likely need to clone the initial spatial filter (and delete it after
>> restoring it at the end of the method)
>
> Yes. I checked all calls to GetSpatialFilter and now clone the initial
> one and destroy it at the end.
>
>
>> c) Each time you do a new OGRFeature(), you need do delete it (after the
> call to
>> CreateFeature()).
> They are now deleted.
>
>> d) GetGeometryRef() can return NULL (even for spatial layers). Currently
> it is
>> never checked, and there are a few places where it is dereferenced.
> The validity of the geometry is checked. If there is no geometry, the
> next feature is fetched.
>
>> e) Lines like x_geom_diff = x_geom_diff->Difference(y_geom) (or geom =
>> geom->Union(y_geom)) will leak the initial x_geom_diff object. You likely
> need
>> to do :
>> 	OGRGeometry* x_geom_diff_new = x_geom_diff->Difference(y_geom);
>> 	delete x_geom_diff;
>> 	x_geom_diff = x_geom_diff_new;
> Fixed.
>
>> f) The object returned by GetNextFeature() should be deleted after use.
> It is deleted, even if the iteration is continued.
>
>> g) You should be careful with SetGeometryDirectly(). You cannot pass
> directly
>> the geometry belonging to another feature because this will segfault when
> you
>> free the said feature.
> Fixed
>
>> h) Nothing wrong, but calls to UnsetField() are unnecessary for a feature
> that
>> was just created and never set before (all the lines z->UnsetField(...) )
> The unnecessary code is removed.
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev



More information about the gdal-dev mailing list