[mapguide-internals] RE: PATCH: Raster stability fixes, ticket#462

Traian Stanev traian.stanev at autodesk.com
Sun Mar 29 16:30:42 EDT 2009


With my changes as described, pooling is disabled for GDAL, like it is for SDF and SHP, using this setting in serverconfig.ini:

DataConnectionPoolExcludedProviders = OSGeo.SDF,OSGeo.SHP,OSGeo.Gdal

So there is no pool size to reach, unless I misunderstand how the pooling works...

I agree that it is important to be stable -- I think my proposed change is as stable as the one you propose, while simplifying the code by removing all the locks and also treating the GDAL provider like a regular provider, and allowing for potentially better performance. 

As you say, my claim of course hinges on GDAL (and the underlying driver) being able to read simultaneously from the same file, using multiple GDALDataSet instances. The GDAL FAQ claims that this scenario should work fine (http://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe) and I have successfully tested it with TIFF files. It is impossible to prove that they all work in all cases -- it's only possible to prove if a driver doesn't work, by providing a counterexample.

Traian


> -----Original Message-----
> From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-
> internals-bounces at lists.osgeo.org] On Behalf Of Haris Kurtagic
> Sent: Sunday, March 29, 2009 2:45 PM
> To: MapGuide Internals Mail List
> Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes,
> ticket#462
> 
> Yes, I am aware of that, I write that in my first email. But, still it
> is semi fix. Depends when you reach pool size.
> 
> GDAL is not thread safe ( when I looked at source code more then year
> ago it was not). According to Frank with 1.6 multithreading is yet to
> be
> confirmed/tested, I hope Frank will comment this.
> I think stable MG is very importnat and I want to make changes which
> are
> proven to be as such .
> 
> But, problem is not only with GDAL provider.
> 
> And eventually even in case as FDO GDAL provider is written now, it
> should work when connections are executed properly in MG.
> 
> Haris
> 
> -----Original Message-----
> From: mapguide-internals-bounces at lists.osgeo.org
> [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Traian
> Stanev
> Sent: Sunday, March 29, 2009 4:54 PM
> To: MapGuide Internals Mail List
> Subject: RE: [mapguide-internals] RE: PATCH: Raster stability
> fixes,ticket#462
> 
> 
> Hi Haris,
> 
> I agree that the connection manager is wrong. However, I used the same
> settings for GDAL as we use for other providers, and which work ok with
> multiple connections. So fixing the connection manager will (probably)
> have no effect on what I tried.
> 
> Why would the provider still be single threaded if I allow multiple
> GDAL
> FDO connections, and remove the global lock from the provider and the
> global raster lock from the MG code? I checked the documentation for
> GDAL and in theory it does allow multiple threads to read from the same
> file at once, so GDAL itself is not single-threaded either.
> 
> It seems to me that fixing what is really a refcounting problem in the
> provider by adding more locks is not the best fix.
> 
> Traian
> 
> ________________________________________
> From: mapguide-internals-bounces at lists.osgeo.org
> [mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Haris
> Kurtagic
> [haris at sl-king.com]
> Sent: Sunday, March 29, 2009 4:54 AM
> To: MapGuide Internals Mail List
> Subject: RE: [mapguide-internals] RE: PATCH:  Raster stability fixes,
> ticket#462
> 
> Hi Traian,
> by just adding ref count to FDO GDA provider you did went into lock,
> with single connection to provider.
> That is where connection manager is wrong, it was constantly waiting
> for
> connection (it is not problem of guard in stylization).
> By enabling connection pooling it is still single thread because
> provider itself is single threaded.
> With only ref count in FDO GDAL , MG will not work properly.
> 
> I think we need to slow down a bit and find a solution for all issues
> we
> have there. It is a connected problem and we need to find solution for
> few related problems.
> 
> - Connection manager not using connection ref count for deciding
> used/unused connections
> - Really allow executing of single threaded providers
> - I have feeling that because some problems previously were tried to
> solve too quickly we ended up with gdal blocked all the way, perhaps
> that is not needed.
> 
> Haris
> 
> -----Original Message-----
> From: mapguide-internals-bounces at lists.osgeo.org
> [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Traian
> Stanev
> Sent: Sunday, March 29, 2009 7:22 AM
> To: MapGuide Internals Mail List
> Cc: FDO Internals Mail List
> Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes,
> ticket#462
> 
> 
> For what it's worth, patching the provider to addref the parent
> connection from the feature reader fixes the problem. This is on top of
> Friday's trunk MapGuide and FDO code, without any other extra fixes. I
> do think it is a bug not to addref the connection from the reader, when
> it in fact depends on it to be alive, in order to get stuff from it at
> a
> later time.
> 
> I am pretty sure the addref fixes the problem, because the test I am
> using is fairly brutal and produces the crash/deadlock in less than 5
> seconds every time I run it with unpatched GDAL provider. With the
> change, I ran it multiple times to completion.
> 
> I am unhappy with the fix though, because it leaves CPU usage by
> mgserver at ~5%, even though I am doing something like 16 simultaneous
> requests... So then I decided to remove the global mutex from inside
> the
> provider -- that didn't make a difference in CPU usage, and the test
> still passed without problems. So that mutex can go. Then I messed with
> the connection pooling settings in serverconfig.ini -- those didn't
> make
> any difference. I tried them all for the GDAL provider and the test
> still passed with 5% CPU usage. I don't think the process is blocked
> for
> I/O because the TIFF I am using is not huge and I also generated an
> overlay pyramid (which I highly recommend to anyone by the way, it
> makes
> a huge performance difference for TIFFs).
> 
> 
> 
> 
> Traian
> 
> 
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-
> > internals-bounces at lists.osgeo.org] On Behalf Of Haris Kurtagic
> > Sent: Saturday, March 28, 2009 4:43 AM
> > To: MapGuide Internals Mail List
> > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > ticket #462
> >
> > Regarding MG raster problems ( or any other provider ) the main
> reason
> 
> > is connection manager.
> > If you would set connection pool size 1 , for sdf provider MG will
> not
> 
> > work properly.
> >
> > The patch Jason submitted, is workaround for issue. It will not cause
> > GDAL to work slower, because provider itself is basically
> > single-threaded now.
> > That was done trough work we were doing years or so ago. I was
> looking
> 
> > at provider back then and realized that making gdal calls guarded
> will
> 
> > make it more stable. Frank then submitted patch with guard on every
> > single gdal library call.
> >
> > I will try to explain what is happing now:
> > 1. Map with more than one gdal layer
> > 2. There are two calls in server code execute query and stylize 3.
> One
> 
> > thread will open connection in execute query and went to stylize
> > procedure 4. Another thread will come into execute query and because
> > it could be another layer it will try to open new connection 5.
> > Connection manager will check pool size ( 1 in gdal case ) and
> because
> 
> > provider didn't inc. reference count to connection (not  way to solve,
> 
> > but anyhow provider needs to increment ) it will delete previous
> > connection and create new one 6. Access violation in first thread
> > while stylization because of deleted connection
> >
> >
> > In shorter version things to change:
> > 1. FDO GDAL provider is not increasing reference count on FDO
> > connection when used in other classes like georasterband , which is
> > later used in query (wrong) 2. connection manager is managing
> > connection pool by using fdo connection reference count ( wrong ) 3.
> > it looks like working for other providers, because connection pool
> > size is 200 and other providers are incrementing connection pointer
> in
> 
> > select command for example, but if there would be provider with
> select
> 
> > command with no reference count to connection, it will crash too 4.
> > Server code is making assumption that pool size for single threaded
> is
> 
> > 1 (not correct), we need to make code change to allow for making
> calls
> 
> > to provider single threaded
> >
> > Haris
> >
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org
> > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> Bruce
> 
> > Dechant
> > Sent: Friday, March 27, 2009 4:55 PM
> > To: MapGuide Internals Mail List
> > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > ticket
> > #462
> >
> > I totally agree - the locking needs to be in the provider.
> >
> > Thanks,
> > Bruce
> >
> >
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org
> > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> > Traian Stanev
> > Sent: Friday, March 27, 2009 9:40 AM
> > To: MapGuide Internals Mail List
> > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > ticket
> > #462
> >
> >
> > It does look like a sledgehammer approach -- what if some day there
> is
> 
> > a raster provider that works ok and doesn't need the locks? If
> > anything, the lock should be in the FDO provider.
> >
> > Traian
> >
> >
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org
> > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Walt
> > Welton-Lair
> > Sent: Friday, March 27, 2009 11:11 AM
> > To: MapGuide Internals Mail List
> > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > ticket
> > #462
> >
> > The memory patch (mapguide_raster_unalloc.5.patch) looks good, and
> > I'll submit that.
> >
> > The stability patch (mapguide_raster_stability.patch) looks
> > reasonable, but I'd like another 1 or 2 people to also look at it.
> > It's significant since it limits MapGuide to processing one raster
> > stylization request at a time, but I guess that's better than having
> a
> 
> > dead lock.  Anyone know where the thread-unsafe code actually is?  Is
> > it in the provider itself?
> > Can the mutex be moved closer to the actual thread-unsafe code?
> >
> > Walt
> >
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org
> > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> Jason
> 
> > Birch
> > Sent: Friday, March 27, 2009 3:10 AM
> > To: mapguide-internals at lists.osgeo.org
> > Subject: [mapguide-internals] PATCH: Raster stability fixes, ticket
> > #462
> >
> > Hi,
> >
> > Haris has been working on the raster stability problem against
> > MapGuide 2.0.2.  I was having the same problems, and was able to
> > duplicate on 2.1.  I have worked with Haris to create patches against
> > trunk that address a couple of the problems and increase stability
> immensely.
> >
> > Could someone please review the patches attached to:
> >
> > http://trac.osgeo.org/mapguide/ticket/462
> >
> > and commit them to trunk if appropriate?
> >
> > My feeling is that the connection management stuff may need some
> > additional work, since limiting the connection pool to 1 for
> > single-threaded providers doesn't appear to have the desired effect.
> > However, getting both of these patches into 2.1 would at least
> improve
> 
> > stability for Raster users.
> >
> > Jason_______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > _______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > _______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > _______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > _______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> _______________________________________________
> mapguide-internals mailing list
> mapguide-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> _______________________________________________
> mapguide-internals mailing list
> mapguide-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> _______________________________________________
> mapguide-internals mailing list
> mapguide-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapguide-internals


More information about the mapguide-internals mailing list