[fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster
stability fixes, ticket #462
Traian Stanev
traian.stanev at autodesk.com
Sun Mar 29 14:46:15 EDT 2009
Hi Steve,
The problem Haris found does not have much to do with thread safety -- you could write code to produce the problem within a single thread. It has to do with a raster feature reader using a weak reference to the FDO connection when it needs to be a strong one (i.e. it should addref the connection). If you release the connection and then try to use the reader, it will crash.
Traian
> -----Original Message-----
> From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-
> bounces at lists.osgeo.org] On Behalf Of Steve Dang
> Sent: Sunday, March 29, 2009 2:22 PM
> To: FDO Internals Mail List; MapGuide Internals Mail List
> Subject: RE: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster
> stability fixes, ticket #462
>
> I'm not surprised if it is related to some FDO pointer reference
> counting problems (i.e. all FdoIDisposable derived classes are not
> thread safe on reference counting). I have run load tests against the
> Autodesk Raster provider using huge data set before. I will try to load
> test the GDAL provider when I have a chance.
>
> Thanks,
> Steve.
> ________________________________________
> From: fdo-internals-bounces at lists.osgeo.org [fdo-internals-
> bounces at lists.osgeo.org] On Behalf Of Traian Stanev
> Sent: Sunday, March 29, 2009 11:17 AM
> To: MapGuide Internals Mail List
> Cc: FDO Internals Mail List
> Subject: [fdo-internals] RE: [mapguide-internals] RE: PATCH: Raster
> stability fixes, ticket #462
>
> I attached a new patch for the provider to the ticket 462, which speeds
> up GDAL FDO connection opening by 95%, in addition to the refcounting
> fix, and the global mutex removal. The global mutex is still needed in
> two places where I did not disable it.
>
> This change allows you to completely disable MG connection pooling for
> the GDAL provider (like for SHP and SDF) and still have very good
> raster speed, since opening a connection is fast now. Also, the global
> raster mutex can be removed from the MG code.
>
> I will continue testing with this configuration, but so far it looks
> stable and fast. Obviously it is impossible to prove that a problem no
> longer exists, when threads are involved, but I will throw some more
> difficult tests at it -- so far I have only tested with a TIFF-based
> layer only. The TIFF is 450 megs or so.
>
> Traian
>
>
> > -----Original Message-----
> > From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-
> > internals-bounces at lists.osgeo.org] On Behalf Of Traian Stanev
> > Sent: Sunday, March 29, 2009 1:29 AM
> > To: MapGuide Internals Mail List
> > Cc: FDO Internals Mail List
> > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > ticket #462
> >
> > I attached the patch for the GDAL provider to ticket 462.
> >
> > Traian
> >
> > > -----Original Message-----
> > > From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-
> > > internals-bounces at lists.osgeo.org] On Behalf Of Traian Stanev
> > > Sent: Sunday, March 29, 2009 1:22 AM
> > > To: MapGuide Internals Mail List
> > > Cc: FDO Internals Mail List
> > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > > ticket #462
> > >
> > >
> > > For what it's worth, patching the provider to addref the parent
> > > connection from the feature reader fixes the problem. This is on
> top
> > of
> > > Friday's trunk MapGuide and FDO code, without any other extra fixes.
> > I
> > > do think it is a bug not to addref the connection from the reader,
> > when
> > > it in fact depends on it to be alive, in order to get stuff from it
> > at
> > > a later time.
> > >
> > > I am pretty sure the addref fixes the problem, because the test I
> am
> > > using is fairly brutal and produces the crash/deadlock in less than
> 5
> > > seconds every time I run it with unpatched GDAL provider. With the
> > > change, I ran it multiple times to completion.
> > >
> > > I am unhappy with the fix though, because it leaves CPU usage by
> > > mgserver at ~5%, even though I am doing something like 16
> > simultaneous
> > > requests... So then I decided to remove the global mutex from
> inside
> > > the provider -- that didn't make a difference in CPU usage, and the
> > > test still passed without problems. So that mutex can go. Then I
> > messed
> > > with the connection pooling settings in serverconfig.ini -- those
> > > didn't make any difference. I tried them all for the GDAL provider
> > and
> > > the test still passed with 5% CPU usage. I don't think the process
> is
> > > blocked for I/O because the TIFF I am using is not huge and I also
> > > generated an overlay pyramid (which I highly recommend to anyone by
> > the
> > > way, it makes a huge performance difference for TIFFs).
> > >
> > >
> > >
> > >
> > > Traian
> > >
> > >
> > > > -----Original Message-----
> > > > From: mapguide-internals-bounces at lists.osgeo.org
> [mailto:mapguide-
> > > > internals-bounces at lists.osgeo.org] On Behalf Of Haris Kurtagic
> > > > Sent: Saturday, March 28, 2009 4:43 AM
> > > > To: MapGuide Internals Mail List
> > > > Subject: RE: [mapguide-internals] RE: PATCH: Raster stability
> fixes,
> > > > ticket #462
> > > >
> > > > Regarding MG raster problems ( or any other provider ) the main
> > > reason
> > > > is connection manager.
> > > > If you would set connection pool size 1 , for sdf provider MG
> will
> > > not
> > > > work properly.
> > > >
> > > > The patch Jason submitted, is workaround for issue. It will not
> > cause
> > > > GDAL to work slower, because provider itself is basically
> > > > single-threaded now.
> > > > That was done trough work we were doing years or so ago. I was
> > > looking
> > > > at provider back then and realized that making gdal calls guarded
> > > will
> > > > make it more stable. Frank then submitted patch with guard on
> every
> > > > single gdal library call.
> > > >
> > > > I will try to explain what is happing now:
> > > > 1. Map with more than one gdal layer
> > > > 2. There are two calls in server code execute query and stylize 3.
> > > One
> > > > thread will open connection in execute query and went to stylize
> > > > procedure 4. Another thread will come into execute query and
> > because
> > > > it could be another layer it will try to open new connection 5.
> > > > Connection manager will check pool size ( 1 in gdal case ) and
> > > because
> > > > provider didn't inc. reference count to connection (not way to
> > solve,
> > > > but anyhow provider needs to increment ) it will delete previous
> > > > connection and create new one 6. Access violation in first thread
> > > > while stylization because of deleted connection
> > > >
> > > >
> > > > In shorter version things to change:
> > > > 1. FDO GDAL provider is not increasing reference count on FDO
> > > > connection when used in other classes like georasterband , which
> is
> > > > later used in query (wrong) 2. connection manager is managing
> > > > connection pool by using fdo connection reference count ( wrong )
> 3.
> > > > it looks like working for other providers, because connection
> pool
> > > > size is 200 and other providers are incrementing connection
> pointer
> > > in
> > > > select command for example, but if there would be provider with
> > > select
> > > > command with no reference count to connection, it will crash too
> 4.
> > > > Server code is making assumption that pool size for single
> threaded
> > > is
> > > > 1 (not correct), we need to make code change to allow for making
> > > calls
> > > > to provider single threaded
> > > >
> > > > Haris
> > > >
> > > > -----Original Message-----
> > > > From: mapguide-internals-bounces at lists.osgeo.org
> > > > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> > > Bruce
> > > > Dechant
> > > > Sent: Friday, March 27, 2009 4:55 PM
> > > > To: MapGuide Internals Mail List
> > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > > > ticket
> > > > #462
> > > >
> > > > I totally agree - the locking needs to be in the provider.
> > > >
> > > > Thanks,
> > > > Bruce
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: mapguide-internals-bounces at lists.osgeo.org
> > > > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> > > > Traian Stanev
> > > > Sent: Friday, March 27, 2009 9:40 AM
> > > > To: MapGuide Internals Mail List
> > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > > > ticket
> > > > #462
> > > >
> > > >
> > > > It does look like a sledgehammer approach -- what if some day
> there
> > > is
> > > > a raster provider that works ok and doesn't need the locks? If
> > > > anything, the lock should be in the FDO provider.
> > > >
> > > > Traian
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: mapguide-internals-bounces at lists.osgeo.org
> > > > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> > Walt
> > > > Welton-Lair
> > > > Sent: Friday, March 27, 2009 11:11 AM
> > > > To: MapGuide Internals Mail List
> > > > Subject: [mapguide-internals] RE: PATCH: Raster stability fixes,
> > > > ticket
> > > > #462
> > > >
> > > > The memory patch (mapguide_raster_unalloc.5.patch) looks good,
> and
> > > > I'll submit that.
> > > >
> > > > The stability patch (mapguide_raster_stability.patch) looks
> > > > reasonable, but I'd like another 1 or 2 people to also look at it.
> > > > It's significant since it limits MapGuide to processing one
> raster
> > > > stylization request at a time, but I guess that's better than
> > having
> > > a
> > > > dead lock. Anyone know where the thread-unsafe code actually is?
> > Is
> > > > it in the provider itself?
> > > > Can the mutex be moved closer to the actual thread-unsafe code?
> > > >
> > > > Walt
> > > >
> > > > -----Original Message-----
> > > > From: mapguide-internals-bounces at lists.osgeo.org
> > > > [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of
> > > Jason
> > > > Birch
> > > > Sent: Friday, March 27, 2009 3:10 AM
> > > > To: mapguide-internals at lists.osgeo.org
> > > > Subject: [mapguide-internals] PATCH: Raster stability fixes,
> ticket
> > > > #462
> > > >
> > > > Hi,
> > > >
> > > > Haris has been working on the raster stability problem against
> > > > MapGuide 2.0.2. I was having the same problems, and was able to
> > > > duplicate on 2.1. I have worked with Haris to create patches
> > against
> > > > trunk that address a couple of the problems and increase
> stability
> > > immensely.
> > > >
> > > > Could someone please review the patches attached to:
> > > >
> > > > http://trac.osgeo.org/mapguide/ticket/462
> > > >
> > > > and commit them to trunk if appropriate?
> > > >
> > > > My feeling is that the connection management stuff may need some
> > > > additional work, since limiting the connection pool to 1 for
> > > > single-threaded providers doesn't appear to have the desired
> effect.
> > > > However, getting both of these patches into 2.1 would at least
> > > improve
> > > > stability for Raster users.
> > > >
> > > > Jason_______________________________________________
> > > > mapguide-internals mailing list
> > > > mapguide-internals at lists.osgeo.org
> > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > > > _______________________________________________
> > > > mapguide-internals mailing list
> > > > mapguide-internals at lists.osgeo.org
> > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > > > _______________________________________________
> > > > mapguide-internals mailing list
> > > > mapguide-internals at lists.osgeo.org
> > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > > > _______________________________________________
> > > > mapguide-internals mailing list
> > > > mapguide-internals at lists.osgeo.org
> > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > > > _______________________________________________
> > > > mapguide-internals mailing list
> > > > mapguide-internals at lists.osgeo.org
> > > > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> > _______________________________________________
> > mapguide-internals mailing list
> > mapguide-internals at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapguide-internals
> _______________________________________________
> fdo-internals mailing list
> fdo-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/fdo-
> internals_______________________________________________
> fdo-internals mailing list
> fdo-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/fdo-internals
More information about the fdo-internals
mailing list