[fdo-internals] FDO PostGIS provider developments

Brent Robinson brent.robinson at autodesk.com
Thu Dec 10 08:16:06 EST 2009


Hi Jason,

A few thoughts on the current discussion.

> I don't think that there's any argument that the current provider is deficient, but at the same time I don't think performance comparisons are all that useful.  The current version is incredibly slow compared to other implementations of PostGIS connectivity (for instance, in uDig, QGIS, MapServer, etc), so six times faster than incredibly slow may not be equal to good.  To use performance as an argument for switching implementations, I'd want to see comparisons against the same data for King.Oracle, SDF, etc.

I'd agree that the performance comparisons between the current and GenericRdbms-based providers are not useful in and of themselves. The more important question is the level of effort required to tune each provider. As part of our recent investigations, we dusted off the generic provider (which hadn't been touched since 2007) and upgraded it to FDO 3.5. In 2008, some FDO enhancements were made to significantly improve MapGuide performance when drawing from feature sources with large schemas. These included partial DescribeSchema and the new GetSchemaNames and GetClassNames commands, which were implemented at the GenericRdbms level. To take advantage of these enhancements, the generic provider required a two-line code change to expose the new commands in its capabilities. Supporting them in the current provider would require a complete implementation, which would take considerably more effort.
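To make that two-line change concrete, here is a minimal sketch of what exposing the commands in the capabilities amounts to (the class name is hypothetical; the FdoCommandType values are from the FDO 3.x API):

    // Sketch only, not the actual provider source.
    FdoInt32* PostGisCommandCapabilities::GetCommands(FdoInt32& size)
    {
        static FdoInt32 commands[] =
        {
            FdoCommandType_Select,
            FdoCommandType_DescribeSchema,
            // The two added lines: advertising these commands lets
            // clients such as MapGuide take the faster schema paths.
            FdoCommandType_GetSchemaNames,
            FdoCommandType_GetClassNames
        };
        size = sizeof(commands) / sizeof(FdoInt32);
        return commands;
    }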

Performance comparisons against other PostGIS connectivity implementations would be very useful and would give us a good indication of how well tuned the provider currently is. Another thing we've done with other RDBMS providers is write small applications that go directly against the RDBMS and then compare timings with the provider. This tells us the amount of overhead that the provider introduces.
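As a sketch of what such a test application looks like (connection string, query, and table name are placeholders), timing the same select directly through libpq gives a baseline to compare the provider's numbers against:

    /* Baseline timing sketch using libpq directly; build with -lpq.
       Compare the elapsed time against the same select issued
       through the provider to measure the provider's overhead. */
    #include <libpq-fe.h>
    #include <chrono>
    #include <cstdio>

    int main()
    {
        PGconn* conn = PQconnectdb("host=localhost port=5432 dbname=mydb");
        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        auto start = std::chrono::steady_clock::now();
        PGresult* res = PQexec(conn,
            "SELECT id, ST_AsBinary(geom) FROM parcels"); /* placeholder */
        int rows = (PQresultStatus(res) == PGRES_TUPLES_OK)
                 ? PQntuples(res) : -1;
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count();

        std::printf("%d rows in %lld ms\n", rows, (long long)ms);

        PQclear(res);
        PQfinish(conn);
        return 0;
    }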

Comparisons with King.Oracle and SDF would be interesting, but these would be apples-to-oranges comparisons since the underlying RDBMSs are different.

> So, you're balancing this argument against immediate cost and long term maintenance for the ADSK developers, which makes a lot of sense from your perspective and is understandable.

How this balances out is the crucial question. The two-level (generic and specific) approach has some drawbacks:


* The generic framework is complex and presents developers with a learning curve.

* If each level is maintained by a different development team, extra coordination and communication effort is required, especially when the specific-level team requires enhancements to the generic level.

but so does the copy/paste approach (separate code bases for each provider):


* Improvements made for one provider can't be picked up for free by the other providers. Code-porting effort is required for the others to realize these improvements.

* Since the generic and specific parts would be intermingled in the provider code, each provider's source code would be expected to diverge over time, making these code ports progressively more expensive to do.

When we estimated the remaining work for the two providers, we found the amount of work to be significantly less for the generic provider. There were outstanding items for the current provider that were already working in the generic provider, thanks to the functionality it picks up from the generic level.

> This doesn't really bear on this argument but I have to say that my impression has been that there has been more ADSK support for desktop features / enhancements than for the sheer performance required for scalable web mapping.

This may be true, but there has still been considerable work done to tune the providers written by ADSK. The original versions of the GenericRdbms providers, such as MySQL, had some serious performance problems, especially in the retrieval of large schemas. However, these providers are much faster today. The tuning priority for these providers has been feature select and schema retrieval, and improvements in these areas have helped both desktop and web mapping applications.

> Anyway.... The FDO (and MapGuide) development communities are still very much in the fledgling state of development and, in my personal opinion, decisions like this one will have a strong influence on whether we have the potential to ever move beyond this stage.

This decision strongly influences the PostGIS provider, but I'm not sure it really goes beyond that. I doubt it would influence the writers of future RDBMS providers very much; they will go with whatever approach best suits their circumstances. Of our current providers, some are based on the generic framework and some are not. Regardless of the approach taken, provider developers will have examples to start from.

The attachment lists some other potential problems with the two-level approach, but I don't think they will be major issues for the GenericRdbms PostGIS provider going forward. Although I'm wandering off the main topic a bit, I'd like to go through some of these points:

> Our experience in PostGIS FDO has been that of either (a) having to
> bring in specializations due to slightly different implementations than
> the Generic writer expected

There was an issue where one of the FDO metaschema columns had a PostgreSQL reserved name, and a specialization was needed to resolve it. Hitting one such case raises the concern that there may be others; once there are too many of these specializations, we lose the advantages of the two-level approach and are left with only the added complexity. However, this turned out to be the only case where a significant specialization was needed. On second look, we were also able to make some minor fixes to the generic level to eliminate the need for this particular specialization.
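For illustration, the kind of generic-level fix involved can be as simple as always quoting generated identifiers, which makes reserved words a non-issue on any RDBMS (this is a hypothetical sketch, not the actual change we made):

    // Hypothetical sketch of an identifier-quoting helper at the
    // generic level; with this applied consistently, a reserved word
    // such as "position" needs no provider-specific handling.
    #include <string>

    std::string QuoteIdentifier(const std::string& name)
    {
        std::string quoted = "\"";
        for (char c : name)
        {
            quoted += c;
            if (c == '"')   // double any embedded quote character
                quoted += c;
        }
        quoted += '"';
        return quoted;
    }
    // QuoteIdentifier("position") yields "position" in double quotes,
    // which PostgreSQL accepts even though POSITION is a keyword.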

> (b) inheriting some
> lowest-common-denominator assumptions that we did not really want,
> brought into Generic because of which specific databases got implemented
> first (MySQL, ODBC).


If I remember right, there was an issue with handling autoincremented properties via the generic-level functions that support the autoincrementing style used by MySQL and SQL Server. However, the generic level also supports sequence-style autoincrementing, which fits better with PostgreSQL. We were able to add autoincremented-property support to the generic PostGIS provider without much effort by using the sequence-style functions.
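A sketch of the difference (table and sequence names are made up): with the identity style the engine assigns the key on insert, while with the sequence style the provider can fetch the key itself before inserting, which maps naturally onto PostgreSQL:

    /* Identity style (MySQL):
         CREATE TABLE parcels (id INT AUTO_INCREMENT PRIMARY KEY, ...);
       Sequence style (PostgreSQL):
         CREATE SEQUENCE parcels_id_seq;
         CREATE TABLE parcels (id BIGINT PRIMARY KEY
             DEFAULT nextval('parcels_id_seq'), ...);                 */
    #include <libpq-fe.h>
    #include <cstdlib>

    /* Fetch the next key the sequence-style way; `conn` is assumed
       to be an open libpq connection. Returns -1 on error. */
    long NextParcelId(PGconn* conn)
    {
        PGresult* res = PQexec(conn, "SELECT nextval('parcels_id_seq')");
        long id = -1;
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            id = std::strtol(PQgetvalue(res, 0, 0), nullptr, 10);
        PQclear(res);
        return id;
    }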

Looking through the code, I couldn't see any other MySQL biases in the generic level that get in the way of the PostGIS provider. Also, since 2007, the SQLServerSpatial provider has been developed, proving that the generic level is flexible enough to adapt to other RDBMSs such as SQL Server 2008.

One of the interesting things about PostgreSQL is that it is an object-relational DBMS. For example, a table can be created by sub-classing it from another table. As an experiment, I tried adding table-inheritance support to the generic provider, and it didn't take much effort, so a generic-based provider can accommodate database-specific features.
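For anyone unfamiliar with the feature, this is what table inheritance looks like at the SQL level (table names are made up; `conn` is an open libpq connection):

    /* PostgreSQL table inheritance: a child table inherits its
       parent's columns, and a select on the parent also returns the
       child's rows (use SELECT ... FROM ONLY roads to exclude them). */
    #include <libpq-fe.h>

    void CreateInheritedTables(PGconn* conn)
    {
        PQclear(PQexec(conn,
            "CREATE TABLE roads (id BIGINT, name VARCHAR(64))"));
        PQclear(PQexec(conn,
            "CREATE TABLE highways (lanes INT) INHERITS (roads)"));
    }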

> The experience in Geotools has been, instructively, quite similar.  An
> examination of the actual PostGIS Geotools implementation at this point
> will find a good deal of sub-classing and re-implementation of
> supposedly generic things back down inside the PostGIS datastore.

As mentioned above, the re-implementation of generic things at the specific level is not that pervasive; there is still a lot that happens at the generic level. The schema-retrieval performance enhancements mentioned earlier were picked up almost for free by the generic provider. These types of generic-level improvements are automatically propagated to the providers, without being blocked by excessive specific-level implementations.

> But the long term effect is to make the entire structure
> brittle... what do you do when you find a bug in the abstract database
> level? Fix it, and you could break workarounds throughout the
> implementations.  Leave it, and...

The problem of abstract-level changes breaking the implementations can be mitigated by unit tests. If the tests for all generic-based providers are run regularly, then we'll catch these regressions fairly quickly. Ideally, there shouldn't be workarounds at the specific levels; problems in the abstract level should be fixed as they are encountered rather than worked around. However, I realize this can be very difficult to do when two different teams handle the different levels, in which case the copy/paste option might be advantageous.
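For example, a round-trip test along these lines, run regularly against every generic-based provider, would flag a breaking generic-level change quickly (ConnectTo() stands in for each provider's test-connection setup):

    // Regression-test sketch; ConnectTo() is a placeholder for the
    // per-provider test harness. The command and collection types
    // are from the FDO API.
    #include <Fdo.h>
    #include <cassert>

    FdoIConnection* ConnectTo(const wchar_t* provider); // placeholder

    void TestDescribeSchemaRoundTrip(const wchar_t* provider)
    {
        FdoPtr<FdoIConnection> conn = ConnectTo(provider);
        FdoPtr<FdoIDescribeSchema> cmd =
            static_cast<FdoIDescribeSchema*>(
                conn->CreateCommand(FdoCommandType_DescribeSchema));
        FdoPtr<FdoFeatureSchemaCollection> schemas = cmd->Execute();
        assert(schemas != NULL && schemas->GetCount() > 0);
    }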

> It seems like a far more efficient system would simply have one "well
> structured, high quality" example on a "relatively standard" database,
> and let new implementors do code re-use through simple copy-and-paste.
> Then you could be assured that the implementations will converge on
> quality over time, and that people mucking about in the superclass layer
> cannot accidentally break implementations.

It is certainly possible to break implementations with generic-level changes, but conversely, the specific implementations pick up generic-level improvements for free. The net effect over time should be positive, since the improvements should outweigh the regressions.

From my own experience with copy-and-paste implementations, I've seen the opposite effect: the implementations tend to diverge over time, making it progressively more difficult to propagate improvements from one implementation to the others.

Brent.


From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Jason Birch
Sent: Monday, November 30, 2009 12:18 PM
To: FDO Internals Mail List
Subject: Re: [fdo-internals] FDO PostGIS provider developments

Hi Orest,

I don't think that there's any argument that the current provider is deficient, but at the same time I don't think performance comparisons are all that useful.  The current version is incredibly slow compared to other implementations of PostGIS connectivity (for instance, in uDig, QGIS, MapServer, etc), so six times faster than incredibly slow may not be equal to good.  To use performance as an argument for switching implementations, I'd want to see comparisons against the same data for King.Oracle, SDF, etc.

I think this boils down to a single argument. The way I see it, moving this provider into the Generic RDBMS framework precludes the possibility of future non-ADSK involvement in the development and maintenance of the provider.  I base this on the level of frustration that Mateusz had coming up to speed on the framework initially, and the number of special cases that had to be implemented which culminated in him feeling that in the long run it was better for the community to re-implement from scratch than to continue working within the framework.  Paul's summary of this decision to the list, after several months of painful work (which generated the code you're planning to take over) highlights these problems:

http://n2.nabble.com/fdopostgis-td2050070.html#a2050070

So, you're balancing this argument against immediate cost and long term maintenance for the ADSK developers, which makes a lot of sense from your perspective and is understandable.  This does mean, however, that there is almost no potential for non-ADSK involvement in future development and enhancements to the provider.  By doing this, you are essentially deciding to take development of this provider entirely inhouse, and committing to its future support and enhancement.  This doesn't really bear on this argument but I have to say that my impression has been that there has been more ADSK support for desktop features / enhancements than for the sheer performance required for scalable web mapping.

Anyway.... The FDO (and MapGuide) development communities are still very much in the fledgling state of development and, in my personal opinion, decisions like this one will have a strong influence on whether we have the potential to ever move beyond this stage.

Jason

2009/11/29 Orest Halustchak
In the end, we determined that taking the earlier code base, adding support for the recent fdo interface changes, and completing other parts that weren't finished would take much less time. Also, based on performance comparisons, we would get something that was much faster on inserts and selects, e.g. the select performance is about six times faster and schema describe is about three times faster. We couldn't compare insert times very well because the current provider kept crashing after a certain point and we couldn't insert a large number of features.
At the same time, we are planning to change the connection parameters to separate out the database name from the service name. This will make it easier for users. They can identify the service (e.g. localhost:5432), and then see the available datastores from which they can choose in a UI. Then, PostGIS schema simply will map to FDO schema. The main drawback to this is that any users with existing MapGuide feature sources and layer definitions will have to update them.
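A sketch of the pending-connection pattern this maps onto in the FDO API (the property names here are assumptions, not necessarily the final ones):

    // Connect to the service alone; Open() returns a pending state,
    // the UI lists the available data stores, then the chosen data
    // store is set and the connection is opened fully.
    conn->SetConnectionString(
        L"Service=localhost:5432;Username=postgres;Password=secret");
    if (conn->Open() == FdoConnectionState_Pending)
    {
        FdoPtr<FdoIConnectionInfo> info = conn->GetConnectionInfo();
        FdoPtr<FdoIConnectionPropertyDictionary> props =
            info->GetConnectionProperties();
        props->SetProperty(L"DataStore", L"city_gis"); // user's choice
        conn->Open();
    }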

