[fdo-internals] FDO PostGIS provider developments

Thu Dec 10 14:48:00 EST 2009

Hi Brent,

First of all, as I already wrote in my first emails, I would like to point
out that the priority is to have a good provider, and it is up to those who
will do the work to choose the approach.

I am not sure I understood all the points in your email, but I am certain
that a specifically written provider can always be better than one using a
common generic base. I would also always encourage providers that explore
RDBMS-specific ways of doing things.

Regarding the schema retrieval improvements, the biggest problem was in the
current FDO clients, not in the providers: the clients were wrongly calling
describe schema over and over again. That is not so important for this
topic, though; I just wanted to share my opinion.

Haris

On Thu, Dec 10, 2009 at 2:16 PM, Brent Robinson <brent.robinson at autodesk.com> wrote:

>  Hi Jason,
>
>
>
> A few thoughts on the current discussion.
>
>
>
> I don't think that there's any argument that the current provider is
> deficient, but at the same time I don't think performance comparisons are
> all that useful.  The current version is incredibly slow compared to other
> implementations of PostGIS connectivity (for instance, in uDig, QGIS,
> MapServer, etc), so six times faster than incredibly slow may not be equal
> to good.  To use performance as an argument for switching implementations,
> I'd want to see comparisons against the same data for King.Oracle, SDF, etc.
>
>
>
> I’d agree that the performance comparisons between the current and
> GenericRdbms-based providers are not useful in and of themselves. The more
> important question is the level of effort to tune each provider. As part of
> our recent investigations, we dusted off the generic provider (which hadn’t
> been touched since 2007) and upgraded it to FDO 3.5.  In 2008, some FDO
> enhancements were made to significantly improve MapGuide performance when
> drawing from feature sources with large schemas. These included partial
> DescribeSchema and new GetSchemaNames and GetClassNames commands, which were
> implemented at the GenericRdbms level. In order to take advantage of the
> improvements provided by these enhancements, the generic provider required a
> two-line code change to expose the new commands in the capabilities.
> Supporting these in the current provider would require a complete
> implementation, which would take more effort.
>
>
>
> Performance comparisons against other PostGIS connectivity implementations
> would be very useful and would give us a good indication of how well tuned
> the provider currently is. Another thing we’ve done with other RDBMS
> providers is write small applications that go directly against the RDBMS and
> then compare timings with the provider. This tells us the amount of overhead
> that the provider introduces.
>
>
>
> Comparisons with King.Oracle and SDF would be interesting, but these would
> be apples-to-oranges comparisons since the underlying RDBMSs are different.
>
>
>
> So, you're balancing this argument against immediate cost and long term
> maintenance for the ADSK developers, which makes a lot of sense from your
> perspective and is understandable.
>
>
>
> How this balances out would be the crucial question. The two level (generic
> and specific) approach has some drawbacks:
>
>
>
> · The generic framework is complex and presents developers with a
>   learning curve.
>
> · If each level is maintained by a different development team,
>   extra coordination and communication effort is required, especially when the
>   specific level team requires enhancements to the generic level.
>
>
>
> but so does the copy/paste (separate code bases for each provider) approach:
>
>
>
> · Improvements made for one provider can’t be picked up for free
>   by the other providers. Code porting effort is required for the other
>   providers to realize these improvements.
>
> · Since the generic and specific parts would be intermingled in
>   the provider code, each provider’s source code would be expected to diverge
>   over time, making these code ports progressively more expensive to do.
>
>
>
> When we estimated the remaining work for the two providers, we found the
> amount of work to be significantly less for the generic provider. There were
> outstanding items for the current provider, which were already working in
> the generic provider, due to the functionality it was picking up from the
> generic level.
>
>
>
> This doesn't really bear on this argument but I have to say that my
> impression has been that there has been more ADSK support for desktop
> features / enhancements than for the sheer performance required for scalable
> web mapping.
>
>
>
> This may be true but there still has been considerable work done to tune
> the providers written by ADSK. The original versions of GenericRdbms
> providers, such as MySQL, had some serious performance problems, especially
> in the retrieval of large schemas. However, these providers are much faster
> today. The priority for tuning these providers has been around feature
> select and schema retrieval. Improvements in these areas have helped both
> desktop and web mapping applications.
>
>
>
> Anyway.... The FDO (and MapGuide) development communities are still very
> much in the fledgling state of development and, in my personal opinion,
> decisions like this one will have a strong influence on whether we have the
> potential to ever move beyond this stage.
>
>
>
> This decision strongly influences the PostGIS provider but I’m not sure it
> really goes beyond that. I doubt if it would influence the writers of future
> RDBMS providers very much; they will go with whatever approach best suits
> their circumstances.  Of our current providers, some are based on the
> generic framework, and some are not. Regardless of the approach taken,
> provider developers will have examples to start from.
>
>
>
> The attachment lists some other potential problems with the two level
> approach but I don’t think they will be major issues for the GenericRdbms
> PostGIS provider going forward. Although I’m wandering off the main topic a
> bit, I’d like to go through some of these points:
>
>
>
> Our experience in PostGIS FDO has been that of either (a) having to
> bring in specializations due to slightly different implementations than
> the Generic writer expected
>
>
>
> There was an issue where one of the FDO metaschema columns had a name that
> is reserved in Postgres, and a specialization was needed to resolve it. When
> one such case is hit, it raises the concern that there may be other cases.
> Once there are too many of these specializations, we lose the advantages of
> the two level approach and are only left with the added complexity. However,
> this turned out to be the only case where a significant specialization was
> needed. On second look, we were also able to do some minor fixes to the
> generic level to eliminate the need for this particular specialization.
>
>
>
> (b) inheriting some lowest-common-denominator assumptions that we did not
> really want, brought into Generic because of which specific databases got
> implemented first (MySQL, ODBC).
>
>
>
>
>
> If I remember right, there was an issue with handling autoincremented
> properties via the generic level functions that support the
> autoincrementing style used by MySQL and SQL Server. However, the generic
> level also supports sequence style autoincrementing, which fits better with
> Postgres. We were able to add autoincremented property support to the
> generic PostGIS provider, without much effort, by using the sequence style
> functions.
>
>
>
> Looking through the code, I couldn’t see any other MySQL biases in the
> generic level that get in the way of the PostGIS provider. Also, since 2007,
> the SQLServerSpatial provider has been developed, proving that the generic
> level is flexible enough to adapt to other RDBMSs such as SQL Server 2008.
>
>
>
> One of the interesting things about Postgres is that it is an
> object-relational DBMS. For example, a table can be created by sub-classing
> it from another table. As an experiment, I tried adding table inheritance
> support to the generic provider and it didn’t take much effort, so a
> generic-based provider can accommodate database-specific features.
>
>
>
> The experience in Geotools has been, instructively, quite similar.  An
> examination of the actual PostGIS Geotools implementation at this point
> will find a good deal of sub-classing and re-implementation of
> supposedly generic things back down inside the PostGIS datastore.
>
>
>
> As mentioned above, the re-implementing of generic things in the specific
> level is not that pervasive; there is still a lot that happens at the
> generic level. The schema retrieval performance enhancements, mentioned
> earlier on, were picked up almost for free by the generic provider. These
> types of generic level improvements are getting automatically propagated to
> the providers, without being blocked by excessive specific-level
> implementations.
>
>
>
> But the long term effect is to make the entire structure
> brittle... what do you do when you find a bug in the abstract database
> level? Fix it, and you could break workarounds throughout the
> implementations.  Leave it, and...
>
>
>
> The problem of abstract level changes breaking the implementations can
> be mitigated by unit tests. If the tests for all generic-based providers are
> run regularly then we’ll catch these regressions fairly quickly. Ideally,
> there shouldn’t be workarounds at the specific levels; problems in the
> abstract level should be fixed as they are encountered rather than worked
> around. However, I realize this can be very difficult to do when two
> different teams handle the different levels, in which case the copy/paste
> option might be advantageous.
>
>
>
> It seems like a far more efficient system would simply have one "well
> structured, high quality" example on a "relatively standard" database,
> and let new implementors do code re-use through simple copy-and-paste.
> Then you could be assured that the implementations will converge on
> quality over time, and that people mucking about in the superclass layer
> cannot accidentally break implementations.
>
>
>
> It is certainly possible to break implementations with generic level
> changes but conversely, the specific implementations pick up generic level
> improvements for free. The net effect over time would be positive since
> improvements should outweigh regressions.
>
>
>
> From my own experience with copy and paste implementations, I’ve seen the
> opposite effect; the implementations tend to diverge over time making it
> progressively more difficult to propagate improvements from one
> implementation to the others.
>
>
>
> Brent.
>
>
>
>
>
> *From:* fdo-internals-bounces at lists.osgeo.org [mailto:
> fdo-internals-bounces at lists.osgeo.org] *On Behalf Of *Jason Birch
> *Sent:* Monday, November 30, 2009 12:18 PM
>
> *To:* FDO Internals Mail List
> *Subject:* Re: [fdo-internals] FDO PostGIS provider developments
>
>
>
> Hi Orest,
>
>
>
> I don't think that there's any argument that the current provider is
> deficient, but at the same time I don't think performance comparisons are
> all that useful.  The current version is incredibly slow compared to other
> implementations of PostGIS connectivity (for instance, in uDig, QGIS,
> MapServer, etc), so six times faster than incredibly slow may not be equal
> to good.  To use performance as an argument for switching implementations,
> I'd want to see comparisons against the same data for King.Oracle, SDF, etc.
>
>
>
> I think this boils down to a single argument. The way I see it, moving this
> provider into the Generic RDBMS framework precludes the possibility of
> future non-ADSK involvement in the development and maintenance of the
> provider.  I base this on the level of frustration that Mateusz had coming
> up to speed on the framework initially, and the number of special cases that
> had to be implemented, which culminated in him feeling that in the long run
> it was better for the community to re-implement from scratch than to
> continue working within the framework.  Paul's summary of this decision to
> the list, after several months of painful work (which generated the code
> you're planning to take over) highlights these problems:
>
>
>
> http://n2.nabble.com/fdopostgis-td2050070.html#a2050070
>
>
>
> So, you're balancing this argument against immediate cost and long term
> maintenance for the ADSK developers, which makes a lot of sense from your
> perspective and is understandable.  This does mean, however, that there is
> almost no potential for non-ADSK involvement in future development and
> enhancements to the provider.  By doing this, you are essentially deciding
> to take development of this provider entirely in-house, and committing to its
> future support and enhancement.  This doesn't really bear on this argument
> but I have to say that my impression has been that there has been more ADSK
> support for desktop features / enhancements than for the sheer performance
> required for scalable web mapping.
>
>
>
> Anyway.... The FDO (and MapGuide) development communities are still very
> much in the fledgling state of development and, in my personal opinion,
> decisions like this one will have a strong influence on whether we have the
> potential to ever move beyond this stage.
>
>
>
> Jason
>
>
>
> 2009/11/29 Orest Halustchak
>
> In the end, we determined that taking the earlier code base, adding support
> for the recent FDO interface changes, and completing other parts that
> weren’t finished would take much less time. Also, based on performance
> comparisons, we would get something that was much faster on inserts and
> selects, e.g. the select performance is about six times faster and schema
> describe is about three times faster. We couldn’t compare insert times very
> well because the current provider kept crashing after a certain point and we
> couldn’t insert a large number of features.
>
> At the same time, we are planning to change the connection parameters to
> separate out the database name from the service name. This will make it
> easier for users: they can identify the service (e.g. localhost:5432), and
> then see the available datastores from which they can choose in a UI. Each
> PostGIS schema will then simply map to an FDO schema. The main drawback is
> that any users with existing MapGuide feature sources and layer definitions
> will have to update them.
>