[Java-collab] Introduction and a few suggestions on recent topics

Wed Aug 12 17:39:15 EDT 2009

Hi java-collab members,

let me quickly introduce myself:
My name is Matthias Basler and I am a German Java developer working in a GIS company (what a surprise), using GeoTools to some extent. Some of the GeoTools/Udig faction may know me, since I created the German translation of Udig (and hope to continue to do so).

I am reading this list more or less passively (as I usually do) and I will very likely not take part in active development for several reasons, one being that our company hasn't yet chosen to support OS development and that doing so in my spare time would be too unreliable.

On the other hand I have, on a private level, played around quite a bit with geometries and such, so I hope I can bring in some experience. I am, however, well aware that my deeper knowledge of standards like GML is limited. (But I hope to improve on it ;-) )

Some time ago I "played around" with my own geometry library for a few reasons:
(a) to find out what my personal preferences are, i.e. what I would like such a library to look like,
(b) to find out what could/should be done differently than in the implementations I knew (mainly GeoTools) and what was solved well there, and
(c) to compare different implementation approaches, e.g. JTS based vs. minimal storage vs. maximum performance.

My code is not publicly available currently, so I will try to outline below what I learned and experienced so far:

- Interfaces? Yes.

One thing that quickly became clear to me is that different implementations of the same classes are needed, e.g. to avoid that "foreign" geometry objects (coming from a legacy geometry library etc.) have to be unpacked and re-created, which means performance and memory overhead. Instead it should be possible to wrap them for direct use.

Also I realized that a library with a clear set of interfaces is easier to understand and use for a beginner, because this design clearly separates public and internal aspects. The implementation details can and should be hidden from the user.

- Factory/Builder? Yes.

In order to support the above point of hiding the implementation from the user, a factory and/or builder is required. And yes, it should be just one or a few classes, so that the user does not have to search for the correct factory. (Of course, developers should be free to call "new XYZImpl()" if they want, but programmers *using* the library should not be required to know that XYZImpl exists!)

In my case I arranged that the builder could be initialized with different "implementations" (i.e. it could produce two different types of geometry implementations).

I also equipped the builder with conversion functions (e.g. from JTS) for the convenience of the user. Depending on the geometry implementations the builder either actually converts the JTS object or just wraps it.
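
To illustrate the interface/builder idea, here is a minimal sketch. It is not the actual (unpublished) code: every name in it (IPoint, LegacyPoint, GeometryBuilder, Mode, ...) is a hypothetical stand-in, and LegacyPoint merely plays the role of a "foreign" object such as a JTS geometry.

```java
// Hypothetical sketch: none of these types come from JTS, GeoTools or GeoAPI.

/** Public interface: the only point type a library user needs to know. */
interface IPoint {
    double getX();
    double getY();
}

/** Stand-in for a geometry object coming from a "foreign" library. */
final class LegacyPoint {
    final double x;
    final double y;
    LegacyPoint(double x, double y) { this.x = x; this.y = y; }
}

/** Implementation 1: copies the values into its own storage. */
final class ArrayPoint implements IPoint {
    private final double[] xy;
    ArrayPoint(double x, double y) { this.xy = new double[] { x, y }; }
    public double getX() { return xy[0]; }
    public double getY() { return xy[1]; }
}

/** Implementation 2: wraps the foreign object without unpacking it. */
final class WrappedPoint implements IPoint {
    private final LegacyPoint delegate;
    WrappedPoint(LegacyPoint delegate) { this.delegate = delegate; }
    public double getX() { return delegate.x; }
    public double getY() { return delegate.y; }
    LegacyPoint unwrap() { return delegate; }
}

/** Single entry point; the mode decides which implementation is produced. */
final class GeometryBuilder {
    enum Mode { COPYING, WRAPPING }

    private final Mode mode;
    GeometryBuilder(Mode mode) { this.mode = mode; }

    IPoint createPoint(double x, double y) {
        return mode == Mode.WRAPPING
                ? new WrappedPoint(new LegacyPoint(x, y))
                : new ArrayPoint(x, y);
    }

    /** Converts or merely wraps the foreign object, depending on the mode. */
    IPoint fromLegacy(LegacyPoint p) {
        return mode == Mode.WRAPPING ? new WrappedPoint(p) : new ArrayPoint(p.x, p.y);
    }
}
```

Code using the library only ever sees IPoint and GeometryBuilder; whether a foreign object was copied or wrapped is an implementation detail.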

- Point/Envelope

Of course I also came across the question of whether envelopes/bounds and coordinates are actually full-fledged geometries or not.
My solution for this was to have a design like this:

  - IGeometry
      (just any geometric object, includes CRS info)
  - IFeatureGeometry extends IGeometry
      (adds identifier and metadata)
  - Abstract classes on top of each,
    implementing shared functions

Geometric operations (like union, getBounds etc.) can be done with IGeometry, but only IFeatureGeometry can actually be used in features, topologies etc.

Rectangular envelopes and coordinates would just implement IGeometry, but not IFeatureGeometry, and the rectangular envelope would not allow CRS transformations (for obvious mathematical reasons).
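
The hierarchy above could look roughly like this in Java. This is only a sketch under assumptions: the method names, the use of a plain string as a stand-in for the CRS, and the two concrete classes (Envelope, FeaturePoint) are invented for illustration.

```java
import java.util.Collections;
import java.util.Map;

/** Any geometric object; carries CRS info (a plain code string stands in for a CRS here). */
interface IGeometry {
    String getCrsCode();
    double[] getBounds(); // {minX, minY, maxX, maxY}
}

/** Geometry that may be used in features: adds identifier and metadata. */
interface IFeatureGeometry extends IGeometry {
    String getId();
    Map<String, String> getMetadata();
}

/** Shared behaviour for all geometries. */
abstract class AbstractGeometry implements IGeometry {
    private final String crsCode;
    protected AbstractGeometry(String crsCode) { this.crsCode = crsCode; }
    public String getCrsCode() { return crsCode; }
}

/** Shared behaviour for feature geometries. */
abstract class AbstractFeatureGeometry extends AbstractGeometry implements IFeatureGeometry {
    private final String id;
    protected AbstractFeatureGeometry(String crsCode, String id) {
        super(crsCode);
        this.id = id;
    }
    public String getId() { return id; }
    public Map<String, String> getMetadata() { return Collections.emptyMap(); }
}

/** A rectangular envelope is an IGeometry, but NOT an IFeatureGeometry. */
final class Envelope extends AbstractGeometry {
    private final double[] bounds;
    Envelope(String crsCode, double minX, double minY, double maxX, double maxY) {
        super(crsCode);
        this.bounds = new double[] { minX, minY, maxX, maxY };
    }
    public double[] getBounds() { return bounds.clone(); }
}

/** A point that can be used in features. */
final class FeaturePoint extends AbstractFeatureGeometry {
    private final double x;
    private final double y;
    FeaturePoint(String crsCode, String id, double x, double y) {
        super(crsCode, id);
        this.x = x;
        this.y = y;
    }
    public double[] getBounds() { return new double[] { x, y, x, y }; }
}
```

A method that stores geometries in features would then take IFeatureGeometry as its parameter type, so passing an Envelope is rejected at compile time.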

- CRS usage in operations

My preference is to perform CRS transformations where they can be implemented most efficiently - in the geometry implementation.

Therefore I chose to do geometry transformation "on the fly", with the CRS of the initial geometry being the relevant one.
So for geom1.union(geom2) the result would have the CRS of geom1. I also chose to allow an explicit target CRS when exporting to e.g. JTS or Java2D, or when calculating the bounds of an object. IMHO more flexibility means more convenience for the user in this case.

In any case: If the target CRS is the same (or not specified) optimizations kick in.
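
A toy sketch of this "first geometry wins" rule, assuming a deliberately fake CRS: ToyCrs only differs by a unit scale factor, and "union" is just the concatenation of two point sets. Both are hypothetical simplifications; a real implementation would delegate to a proper coordinate operation.

```java
import java.util.Arrays;

/** Fake CRS for illustration only: systems differ merely by their unit of length. */
enum ToyCrs {
    METRES(1.0), KILOMETRES(1000.0);

    final double metresPerUnit;
    ToyCrs(double metresPerUnit) { this.metresPerUnit = metresPerUnit; }
}

/** A set of 2D points stored as packed x,y pairs. */
final class PointSet {
    private final ToyCrs crs;
    private final double[] coords;

    PointSet(ToyCrs crs, double... coords) {
        this.crs = crs;
        this.coords = coords.clone();
    }

    ToyCrs getCrs() { return crs; }
    double[] getCoords() { return coords.clone(); }

    /** Transforms on the fly; a null or identical target CRS takes the fast path. */
    PointSet transformTo(ToyCrs target) {
        if (target == null || target == crs) {
            return this; // optimization: nothing to compute, nothing to copy
        }
        double factor = crs.metresPerUnit / target.metresPerUnit;
        double[] out = new double[coords.length];
        for (int i = 0; i < coords.length; i++) {
            out[i] = coords[i] * factor;
        }
        return new PointSet(target, out);
    }

    /** The result always has the CRS of the geometry the method is called on. */
    PointSet union(PointSet other) {
        PointSet aligned = other.transformTo(crs);
        double[] merged = Arrays.copyOf(coords, coords.length + aligned.coords.length);
        System.arraycopy(aligned.coords, 0, merged, coords.length, aligned.coords.length);
        return new PointSet(crs, merged);
    }
}
```

With a = (1000, 2000) in METRES and b = (3, 4) in KILOMETRES, a.union(b) is in METRES, while b.union(a) would be in KILOMETRES.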

- Axis order

I tried to be VERY explicit here. So I used getOrdinate(), getOrd1(), getSpan() and so on, which follow the CRS axis order, and also stored the coordinates accordingly.

BUT for the convenience of the user I also added getX() (and so on) to the API and made it clear through the JavaDoc that getX() would return the actual value along the west-east axis, if the CRS defines such an axis.
(See below for the CRS discussion.)

So I now have the "generic" and a well-defined XY-based solution. The latter is handy for conversion from/to JTS and other such tasks.

I also introduced a boolean "respectAxisOrder" flag to several methods to allow the API user to *explicitly* specify whether e.g. an export to 2D JTS should keep a YX axis order (if defined so by the CRS) or should switch axes to XY in this case. IMHO this is better than making assumptions about what the user might want.
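
A minimal sketch of the two access styles plus the flag. The names getOrdinate(), getX() and respectAxisOrder come from the description above; everything else (the AxisOrder enum, Coordinate2D, toArray()) is a hypothetical simplification in which the CRS is reduced to its axis order.

```java
/** Hypothetical stand-in for "what the CRS says about its axes". */
enum AxisOrder { EAST_NORTH, NORTH_EAST }

final class Coordinate2D {
    private final AxisOrder order;
    private final double ord0; // first ordinate, in CRS axis order
    private final double ord1; // second ordinate, in CRS axis order

    Coordinate2D(AxisOrder order, double ord0, double ord1) {
        this.order = order;
        this.ord0 = ord0;
        this.ord1 = ord1;
    }

    /** Generic access: the index refers to the CRS axis order. */
    double getOrdinate(int axis) {
        if (axis == 0) return ord0;
        if (axis == 1) return ord1;
        throw new IndexOutOfBoundsException("axis " + axis);
    }

    /** Value along the west-east axis, wherever the CRS puts it. */
    double getX() { return order == AxisOrder.EAST_NORTH ? ord0 : ord1; }

    /** Value along the south-north axis, wherever the CRS puts it. */
    double getY() { return order == AxisOrder.EAST_NORTH ? ord1 : ord0; }

    /**
     * Export as a two-element array. The caller states explicitly whether
     * the CRS axis order is kept (true) or normalized to XY (false).
     */
    double[] toArray(boolean respectAxisOrder) {
        return respectAxisOrder
                ? new double[] { ord0, ord1 }
                : new double[] { getX(), getY() };
    }
}
```

For a coordinate stored as (52.5, 13.4) in a NORTH_EAST system, getOrdinate(0) yields 52.5 while getX() yields 13.4.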

- CRS handling: CRSInfo suggestion

One problem was how to combine the following CRS requirements:
  1.) Take up as little space in the geometry as possible
      (memory efficiency)
  2.) Avoid duplicate computations (e.g. which axis is X)
      (performance efficiency)
  3.) Avoid code duplication

I put a lot of thought into this, but finally came up with a simple, clean solution I am very happy with. I simply created an immutable wrapper around the immutable GeoAPI CRS object. I called it CRSInfo.

CRSInfo has several methods for getting the actual CRS, computing axis order, lenient axis order, CRS dimensions, computing 2D CRS from 3D CRS and others. All computed information is cached. When queried for its CRS, the geometry just returns
myCRSInfo.getCRS().

Since CRSInfo is immutable I can easily share it between millions of geometries with almost no overhead. Each geometry only has a reference to its CRSInfo, no more. Assuming that an application uses only a handful of CRS at a time I even added a CRSInfo cache which further reduces duplicate computations.
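
The idea can be sketched as follows. This is not the original class: SimpleCrs is a hypothetical stand-in for the GeoAPI CRS object, only two derived values are shown, and they are computed once in the constructor, which is one simple way to get "computed once, then cached" in an immutable class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an immutable CRS object. */
final class SimpleCrs {
    final String code;
    final List<String> axisDirections; // e.g. "north", "east", "up"

    SimpleCrs(String code, String... axisDirections) {
        this.code = code;
        this.axisDirections =
                Collections.unmodifiableList(Arrays.asList(axisDirections.clone()));
    }
}

/** Immutable wrapper holding the CRS plus everything derived from it. */
final class CRSInfo {
    // One shared CRSInfo per CRS instance (SimpleCrs uses identity equality).
    private static final Map<SimpleCrs, CRSInfo> CACHE = new ConcurrentHashMap<>();

    private final SimpleCrs crs;
    private final int dimension;
    private final int xAxisIndex; // index of the west-east axis, -1 if none

    private CRSInfo(SimpleCrs crs) {
        this.crs = crs;
        this.dimension = crs.axisDirections.size();
        this.xAxisIndex = crs.axisDirections.indexOf("east");
    }

    /** Returns the shared CRSInfo for this CRS, creating it on first use. */
    static CRSInfo of(SimpleCrs crs) {
        return CACHE.computeIfAbsent(crs, CRSInfo::new);
    }

    SimpleCrs getCRS() { return crs; }
    int getDimension() { return dimension; }
    int getXAxisIndex() { return xAxisIndex; }
}

/** A geometry holds one reference to its CRSInfo and nothing else CRS-related. */
final class InfoPoint {
    private final CRSInfo crsInfo;
    private final double ord0;
    private final double ord1;

    InfoPoint(CRSInfo crsInfo, double ord0, double ord1) {
        this.crsInfo = crsInfo;
        this.ord0 = ord0;
        this.ord1 = ord1;
    }

    SimpleCrs getCRS() { return crsInfo.getCRS(); }
    double getX() { return crsInfo.getXAxisIndex() == 0 ? ord0 : ord1; }
}
```

Because the per-geometry cost is a single reference, the question "which axis is X" is answered once per CRS rather than once per geometry.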

- CRS API/implementation

The point currently worrying me most in this project here is how you will cope with CRS API/implementation(s). (Yes, I have read the IRC logs.)

So far I chose GeoAPI with no doubt that this sound API would become the de-facto standard. Now people on this list tell me the opposite. I am worried, to say the least.

My preference from a user's POV would be to not have some "least common denominator" wrapper interface, but directly reference an established API like GeoAPI.

At the latest when you get to the operations you will have to define some specific API (for coordinate transformation and such), and then I will see what you come up with. I hope you find a solution that is easy to use (for a user) and still flexible enough to work for all involved parties.

- Performance: LiteCoordinateSequence 

I've done several performance and memory tests and found that the GeoTools solution of LiteCoordinateSequence (storing coordinates as a double array) was optimal in most use cases, especially for storing large quantities of geometries in memory. I just extended the concept to cope with 3D coordinates as well. Works like a charm.

I then equipped said coordinate sequence with the CRSInfo described above and it could now quickly transform itself to another CRS. In most geometries I could forward most functions (like transformations, bounds calculation etc) down to the coordinate sequence, which would do the "hard work". I prefer this way of centralization of repeating tasks.

By allowing public read access to the raw array one can write optimized converters (see LiteShape & Co. in GeoTools).
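
For readers who do not know the concept, here is a minimal sketch of a packed coordinate sequence with a 2D/3D switch. It is neither the GeoTools class nor the extended version described above; the class and method names are hypothetical.

```java
/** Coordinates packed into one double array: x0,y0[,z0],x1,y1[,z1],... */
final class PackedCoordinateSequence {
    private final double[] ordinates;
    private final int dimension; // 2 or 3

    PackedCoordinateSequence(double[] ordinates, int dimension) {
        if (dimension != 2 && dimension != 3) {
            throw new IllegalArgumentException("dimension must be 2 or 3");
        }
        if (ordinates.length % dimension != 0) {
            throw new IllegalArgumentException("array length does not match dimension");
        }
        this.ordinates = ordinates;
        this.dimension = dimension;
    }

    /** Number of coordinates (not ordinates). */
    int size() { return ordinates.length / dimension; }

    int getDimension() { return dimension; }

    double getOrdinate(int index, int axis) {
        return ordinates[index * dimension + axis];
    }

    /** Raw array for optimized converters; callers must treat it as read-only. */
    double[] getRawArray() { return ordinates; }

    /** Bounds as {minX, minY, maxX, maxY}, computed in one pass over the array. */
    double[] getBounds2D() {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < ordinates.length; i += dimension) {
            minX = Math.min(minX, ordinates[i]);
            maxX = Math.max(maxX, ordinates[i]);
            minY = Math.min(minY, ordinates[i + 1]);
            maxY = Math.max(maxY, ordinates[i + 1]);
        }
        return new double[] { minX, minY, maxX, maxY };
    }
}
```

A geometry built on top of this can simply forward getBounds() and similar calls to its sequence, so the loop over the ordinates exists in exactly one place.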

- Type metadata: enums suggested

I gathered from the discussion that it is useful when classes know their type. While I have not yet come across such a need, I'd suggest using enums for this, if indeed required. I find enums very powerful, because you can attach a lot of metadata to enums (such as translations, dimensionality, maybe calculations, ...) and get a clear API.
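
As a small illustration of what I mean by attaching metadata to an enum, consider the following sketch. The enum name, its constants and the chosen metadata (topological dimension and a German display name) are just examples, not a proposal for the actual type list.

```java
/** Hypothetical geometry type enum carrying metadata. */
enum GeometryType {
    POINT(0, "Punkt"),
    LINE_STRING(1, "Linienzug"),
    POLYGON(2, "Polygon");

    private final int dimension;     // topological dimension of the type
    private final String germanName; // example of a translation

    GeometryType(int dimension, String germanName) {
        this.dimension = dimension;
        this.germanName = germanName;
    }

    int getDimension() { return dimension; }
    String getGermanName() { return germanName; }

    /** Example of a small calculation attached to the enum. */
    boolean hasArea() { return dimension == 2; }
}
```

A geometry would then expose something like getType() returning this enum, which also works nicely in switch statements.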

- GeoAPI compatibility possible?

May I suggest that functions in the new geometry API get the same names as in GeoAPI if they *are* equal in definition, and that functions that are different (or return a different class, have a different contract etc.) get different names.

I'd definitely be pissed off (sorry) if it were not possible to write a geometry class implementing both GeoAPI and your new API at the same time because of API method incompatibilities. But I fear this will happen (starting with getCRS() ...).

Obviously GeoAPI has more users than some here believe. ;-)

That's it for now. I hope that some of the ideas outlined will prove helpful. Feel free to ask if I was unclear (and to split this into separate threads...).
-- 
Matthias Basler
matthiasbasler at earthflight.org