[Java-collab] First contents in osgeom repository

Tue Aug 11 13:58:39 EDT 2009

Jody Garnett wrote:

> Can we get a bit more direction on what feedback; like/dislike you
> would expect at this stage? Without running code it is a bit hard for
> me to give grounded feedback - and I am hesitant to comment on a lot
> of things without test cases and experimentation to verify :-)

Actually, your feedback is quite what I was aiming for - I would like others to scan the described concepts and the code
to find obvious mistakes or design issues that you think are definitely bad.

After discussing and/or cleaning these issues I would suggest to get to the next phase very quickly, which should focus
on the creation of geometry objects and an implementation of the geometry types (which will allow to represent the
geometries, but not provide implementations of the operations). At that point we will be able to actually work with it
and already use it for importing/exporting of arbitrary GML geometry objects. After that we should see how to put the
operators to use in an implementation (probably using JTS). Note that all of this already works in deegree 3, but I
would rather import and refactor the functionality step-by-step so we have it reviewed. If you want to see it in action
right away, I suggest to check out the deegree 3 core module and have a look at the tests at
https://svn.wald.intevation.org/svn/deegree/deegree3/core/trunk/test/org/deegree/geometry/. There are test cases for all
kinds of GML constructs as well as some basic spatial analysis tests.

Concerning interfaces vs. classes: I agree that it's good to keep it as simple as possible, so one should provide
classes whenever we can rule out that different implementations just don't make sense. Especially, for Point and
Envelope I would also like to have constructors alternatively to a factory:

Point p = new Point (0,1);

is just nicer to write than

GeometryFactory.getInstance().createPoint (0,1);

But I can very well imagine rather important use cases for having multiple implementations of the geometry types:

- A straightforward implementation that just provides beans for the interfaces.
- An implementation that is backed by JTS, i.e. that actually uses JTS objects to represent SFS geometries.
- Implementations that are backed by Oracle / PostGIS geometry types

I would opt for keeping the interfaces ease experimenting with different implementations.

I also think that providing factories can make it easier for novice users to approach the quite intricate geometry type
system. In deegree 3, we currently have a small factory for people with simple geometry needs (SFS):

- https://svn.wald.intevation.org/svn/deegree/deegree3/core/trunk/src/org/deegree/geometry/SimpleGeometryFactory.java

and one for those who need all the complex stuff

- https://svn.wald.intevation.org/svn/deegree/deegree3/core/trunk/src/org/deegree/geometry/GeometryFactory.java

> I can think of a couple of things to do in order to review these
> interfaces. Check them against GML, check them against ISO19107 etc..

I think it would be nice to have a matrix that explains the mapping between our types, GML 2, GML 3.1.1, GML 3.2.1 and
ISO 19107. Any volunteers?

>> The interfaces provide a faithful representation for all kinds of
>> geometry elements that are allowed according to GML 3.1.1 (except for
>> the two elements gml:_ImplicitGeometry + gml:Grid which are still
>> missing). Note that the geometry elements in GML 3.2.1 are a subset of
>> GML 3.1.1 (I think they basically deprecated gml:MultiPolygon and
>> gml:MultiCurve and removed gml:Ring from the geometry substitution
>> group), which is why the interface design refers to 3.1.1 rather than
>> 3.2.1 (comments?).
> 
> Interesting. In general fewer classes are better then more; it may be
> worth providing a factory that is willing to create a "ring" or
> "multicruve" and produce the correct thing. This would give us a
> chance to document how to represent these constructs.

No question -- fewer are better. We could just strip the MultiPolygon and MultiLineString types out and use MultiSurface
and MultiCurve for representing gml:MultiLineString and gml:MultiPolygon. Here's the use case that motivated me to add
the two types in the first place:

At the moment

1. gml:MultiPolygon gets parsed -> MultiPolygon object
2. MultiPolygon gets exported -> gml:MultiPolygon

If we only had MultiCurve:

1. gml:MultiPolygon gets parsed -> MultiCurve object
2. MultiCurve gets exported -> gml:MultiCurve

Originally, I thought that it would be important to keep the information whether it was a MultiLineString or a
MultiCurve to allow a most faithful representation without ambiguities. However, I now also tend to remove the
additional types, as this use case does not seem to be that important after all and one could also implement heuristics
in GML exporters that test the contents for the most appropriate representation...

> Although not explicitly mentioned (because we are at interfaces right
> now) it is important to me to separate out construction from
> validation. This was one of the main lessons learned from our earlier
> work.

Agreed. Btw, in the deegree3 core SVN repository, there is also a rudimentary validator (which works separated from the
construction).

> Although not mentioned in your list (and I don't have the code to
> check at work here). I would like to comment on Point:
> - I want to be sure we don't have getX() and getY(). The have caused
> so much confusion in letting people assume x/y even when their
> coordinate system is radial; or happens to be in y/x order etc... I
> would like to remove these methods and just have get( int
> )/getOrdiante( int ) etc...

You're absolutely right, I fixed that in the SVN already. I went for Point#get0(), Point#get1(), Point#get2() and
Point#get(int) because these are methods that are used very, very often and should have short names.

> - How to handle a Point as a Geometry? Or as a internal construct?
> This is the difference between treating it as a Geometry or treating
> it in the same manner as CurveSegment. We had a horrible confusion in
> GeoAPI with a Point (ie the Geometry) and a DirectPosition/Position
> (the internal construct which was set up to allow a Point to be shared
> between different Geometry and mix and match both uses in a single
> Curve). While the aim was noble - I would like to make sure we avoid
> any confusion :-) With this in mind I would like to make the following
> bold suggestion:
> a) make Point a Geometry (it appears to be GeometricPrimitve that
> extends Geometry
> b) make Points the internal construct (so even a Point would have an
> internal Points consisting of a single entry)
> c) add methods to Points so we can test if a member is a 1st class
> Point (ie shared with another Geometry)
> Now the above suggestion may be terrible; it is just placed here to
> get conversation going.

For the sake of simplicity, I would opt for using Point everywhere. To avoid the creation of large numbers of Point
objects, implementations of Point*s* can use "implicit" Point objects, i.e. one can retrieve an iterator to access the
contents as Point objects. I made an implementation that is backed by an array of doubles and does not create a new
Point object for every #next(), but reuses an instance that's associated with the iterator. See method #iterator() in

https://svn.wald.intevation.org/svn/deegree/deegree3/core/trunk/src/org/deegree/geometry/standard/points/PackedPoints.java

Note that GML allows the usage of anonymous and identified Point elements (with gml:id). If it has no gml:id, it can
easily be stored in an implicit fashion.

> 
>> - 1 CRS interface (org.osgeo.commons.crs.CRS): Placeholder interface
>> that allows to represent CRS information.
>> - 4 UOM related interfaces (org.osgeo.commons.uom): These are included
>> so some methods (e.g. Geometry#isWithinDistance(...)) can be expressed
>> in a unit-aware fashion. Comments?
> 
> Can we just decide to use JSR-275 now that it actually works?

We're evaluating it right now and it looks fine so far. However, we stumbled upon some minor questions in putting it to
use. Can you help out here?

>> - 1 ObjectProperties interface (org.osgeo.commons.ObjectProperties):
>> Placeholder interface for properties that can be attached to Geometry
>> objects. One important use case are the GML standard properties that
>> each GML geometry object allows for (e.g. multiple gml:metaDataProperty
>> elements, gml:name and gml:description properties).
> 
> I assume these are going to be lazily created as needed in order to
> avoid the memory overhead?

Sure. It could also be attached only if needed, so #getAttachedProperties() just returns null if the geometry has no
such properties.

>> - 1 PrecisionModel interface
>> (org.osgeo.geometry.precision.PrecisionModel): Placeholder interface for
>> a precision definition that can be attached to geometries. It's not 100%
>> clear to me yet how this should be defined and how to interoperate with
>> topological operators, spatial analysis and the exporting of geometries.
> 
> They amount to "rounding" with a bit of metadata to tell what kind of
> numeric field we are working ontop of (floating point; or fixed
> point). This information would really like to be stored with the CRS;
> indeed if I was in a greenfield situation I would record this
> information for each Axis. When you are implementing an operation you
> use the precision model to round the answer as you go; when you have
> performed "enough" recursion your rounded answer is the same as the
> rounded answer of the previous iteration. This sanity check is used to
> avoid doing more work then needed (ie why calculate to twenty decimal
> places if the WFS output format is only going to print out eight
> significant digits? It is also used to ensure that the resulting
> geometry is actually valid at the indicated precision - it is easy to
> make a geometry that is valid; and then serve it up with the correct
> precision and have it be invalid. Usual example is a line string that
> when served up has both the start and end point in the same location.
> This concept also effects intersection; where the lines are produced.
> Internally it can notice that the start and end point are the same;
> and provide a Point as a result rather then a LineString.

O.k.

> 
>> - 2 beans that represent parameter sets needed for the definition of
>> certain curve segments:
>> org.osgeo.geometry.primitive.segments.AffinePlacement and
>> org.osgeo.geometry.primitive.Knot. I think that it doesn't make sense to
>> allow different implementations here, and therefore these are classes,
>> not interfaces.
> 
> Small comment:
> - KnotType is an enum - we may need an open set of types here? Rather
> then a closed set up enumerations. Do you have anything similar to a
> CodeList? Basically before Java 5 Enum provided syntactic sugar it was
> always possible to set up an object like an Enum; and when you do it
> yourself you have the option of adding more constants over time.

IMHO, we don't need an open set here. Is there any difference between the other enums (e.g. CurveType) and KnotType? If
there's coming more in GML 4, we just would have to extend the enum types. I think with open enums you lose the main
benefit: the certainty that you are really handling all possible cases in a switch.

> - Just for consistency refactor KnotType into an internal class (ie
> Knot.Type) in order not to pollute the file system with what is an
> detail of the Knot class.
> - Geometry.GeometryType can also be changed to Geometry.Type - and
> should also be an open ended set?
> - MultiGeometry.MultiGeometryType?
> - Curve.CurveType? especially this one

Agreed.

> - Primitive.PrimitiveType - this one is especially interesting as it
> should be the D in your DIM-9 matrix; as such I would be tempted to
> supply my own ordinal values here in order to make any relate code
> much easier to read.

Interesting. How would this look like?

> For many of these "types" I am wondering why they are required as they
> seem to duplicate information already present in the class hierarchy?
> Example Geometry.GeometryType.ENVELOPE is duplicated by the Envelope
> class is it not?

Correct. It's a convenience thing, but there's also a very important technical problem that they solve:

In order to cope with xlinked geometries (e.g. a MultiGeometry that references it's member's contents by xlinks), we
wrote Proxy geometry types:

https://svn.wald.intevation.org/svn/deegree/deegree3/core/trunk/src/org/deegree/geometry/gml/refs/

These allow for easy handling of document internal xlinks and the lazy resolving of external xlinks -- e.g. if a
geometry containing xlinks is traversed, the referenced external geometries can parsed on the fly. However, if we only
know that a reference is of type gml:_Geometry, we can only use an unspecific GeometryReference proxy. And this makes
instanceof-switching impossible here. And this is solved by using the #getType() methods for discriminating the geometry
types. However, if we find any better way to deal with that challenge, we could of course remove them.

>> --- Notes on the geometry types ---
>>
>> Altough it deviates from the GML 3.1.1 geometry hierarchy, Envelope is
>> currently derived from Geometry (interestingly, in GML 3.0, gml:Envelope
>> was actually substitutable for gml:_Geometry). This was done for
>> pragmatic reasons, as we observed that one often has to deal with a
>> Geometry or an Envelope (e.g. in Filters). By making them derive a
>> common interface, such cases are simpler to be implemented/used. If we
>> don't have such a common type, many, many methods have to be defined and
>> implemented twice (e.g. Geometry#intersects(Geometry) and
>> Geometry#intersects(Envelope)). Alternatively, we could create a common
>> base type above Geometry/Envelope (e.g. GeomOrEnvelope). What do you think?
> 
> - getWidth() and getHeight() shoudl be replaced with a getSpan( int )
> to avoid any CRS based confusion

Agreed. If you like, go ahead and change it.

> - having it as a Geometry is fine; may wish a toSurface() method to
> help people duck out of corner cases? Actually a factory method that
> will turn a Envelope into a Surface would be smarter and less of an
> API Burden?

I opt for Envelope#toSurface(), as common use cases should be achievable in a most convenient way.

 > That is tough; an ease of use vs explicit tradeoff...
> - For cases where a new Geometry is produced I would recommend asking
> people to do the transformation themselves? That way it is very
> obvious what CRS the result is in. If not we would get order dependent
> results which would drive me a bit baty?

Good point.

> - If we ask people to do that for getUnion() then I would recommend
> asking them to do the same steps for intersects for consistency.
> - The methods are not consistent: isBeyond follows java bean naming;
> intersects does not. It is also odd for me to see getBuffer and
> getUnion follow java bean naming conventions when these things are not
> infact properties. I would prefer to see:
> - use java bean naming for the actual data model ... getId(),
> getCoordinateSystem()
> - use raw method names for operations and calcualtions: buffer(
> measure ), centroid(), intersects( geometry ), touches( geometry ),
> transform( CRS )
> - I would be open to a naming convention: toGeometry( CRS crs ) -
> rather than transform; toCentroid(); queryTouches( Geometry ) etc...
> but only if it adds something worth typing extra characters for

I see the inconsistencies. I will check against the method names in the ISO spec. and report back.

> 
>> - The common batch of spatial analysis operations (e.g.
>> #getIntersection(Geometry) or getConvexHull()). How should one treat
>> different CRS here?
>> - #getGeometryType() provides a way to differentiate the type of a
>> geometry object. Note that there are comparable methods on the deeper
>> levels of the hierarchy (e.g. Curve#getCurveType()). This was motivated
>> by the idea to ease the working with the (quite complex) hierarchy, for
>> example in drawing code. I am not a big fan of using instanceof tests
>> (as they are inherently unsafe), so I thought it would be nice to
>> provide a way to use the more safe and convenient switch statements for
>> this (safer,  as Eclipse creates warnings if you leave out any choices).
>> What do you think?
> 
> Comments about this are above; I understand you are not found of
> instance of tests; this would be a good use of a utility class I
> guess? Even with this in mind you can make a getType() method return a
> Class and use a swtich statement on the class with no loss of clarity.
> 
>> - #getId() / #setId(). As ids are very integral to GML, I provided
>> methods for working with the identifier.
> 
> We have been forced to make up or own Identifier - in response to
> GMLObjectIdentifier in order to handle namespace issues. Will a simple
> string be sufficient here?

Interesting. Can you elaborate why a string does not suffice here? In GML, they are just defined as XML IDs, so the only
restriction is that they must match the NCName production rule (and are unique in a document).

> 
>> - #getCoordinateSystem() / #setCoordinateSystem(...)
> 
> I will point out that getCoordinateSystem is not actually a coordinate
> system (ie Cartesian, radial). Can we shorten this to getCRS()?

Sure.

> 
>> - #getAttachedProperties() / #setAttachedProperties(...) for coping with
>> the GML standard properties that each GML geometry object allows for
>> (e.g. multiple gml:metaDataProperty elements, gml:name and
>> gml:description properties).
> 
> Commented above about lazy creation; I would consider leaving this as
> an implementation details (ie the creation of an internal Properties
> object and/or Map) and adding methods for getName(), getDescription()
> etc... although it depends on how many of these we expect to ahve?

Just would like to throw in that we should avoid to make this too GML (and especially GML-version) centric.

> Hopefully the above is some feedback to start things off? I would like
> to define the factory API as I find it hard to image how these
> interfaces are used on their own. I notice that the gvSig
> representative recommended the split between class and interface.
> After years of watching JTS succeed I wonder if we should consider
> forgoing this split? We could simply define classes for these data
> structures; as long as we have separated out the operation
> implementation we could do this with no loss of flexibility; and the
> resulting Java library would be *much* easier for people to use. There
> are many downsides of course - Andrea beloved Points / PointArray /
> CoordianteSequence construct is very important - I believe we would
> need an interface in this case.

Thanks for the valuable input, Jody. For the details on interface vs. class see above.

Best regards,
Markus

-- 
Markus Schneider

l a t / l o n  GmbH
Aennchenstrasse 19           53177 Bonn, Germany
phone ++49 +228 184960       fax ++49 +228 1849629
http://www.lat-lon.de        http://www.deegree.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 260 bytes
Desc: OpenPGP digital signature
Url : http://lists.osgeo.org/pipermail/java-collab/attachments/20090811/9ac71c61/signature.bin