[fdo-internals] RE: SDF 3.0 Memory Leak

Maksim Sestic max at geoinova.com
Fri Jul 18 11:20:05 EDT 2008


Hi Greg,
 
In my experience, even touching the reader with Dim a As String =
reader.GetString("SomeField") will cause memory to start building up.
There's no reference to the underlying unmanaged object that can be released
while traversing the reader. Since I'm dealing with roughly a million
records, there's no way to stop the thrashing. The memory simply remains
reserved even after the reader gets Closed, Disposed, nulled and
GC.Collect()-ed, and the connection is closed/disposed, etc. Maybe there's
no help for it and it's simply down to the managed wrapper, but I'm curious
whether there's any workaround for this.
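 
In C# terms, the sequence I'm running is roughly the following (just a
sketch, not my actual code; reader and connection stand for the usual FDO
managed wrapper objects, and the calls are the ones listed above):

    // Traversal: even a single accessor call per record makes memory grow.
    while (reader.ReadNext())
    {
        string a = reader.GetString("SomeField");
    }

    // Cleanup attempts: none of these return the reserved memory.
    reader.Close();
    reader.Dispose();
    reader = null;

    connection.Close();
    connection.Dispose();

    GC.Collect();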
 
Regards,
Maksim Sestic

  _____  

From: fdo-internals-bounces at lists.osgeo.org
[mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Greg Boone
Sent: Friday, July 18, 2008 17:00
To: Carl Jokl
Cc: FDO Internals Mail List
Subject: [fdo-internals] RE: SDF 3.0 Memory Leak



Why don't you try modifying the benchmark to stop storing all the attribute
values in a single dictionary and see what the side effect is? If the memory
usage drops to an acceptable level then the issue is most likely the
benchmark implementation, not the SDF Provider.

 

Greg

 

From: Carl Jokl [mailto:carl.jokl at keynetix.com] 
Sent: Friday, July 18, 2008 10:07 AM
To: Greg Boone
Subject: RE: SDF 3.0 Memory Leak

 

Testing memory was not the intention of the benchmark, just a side effect.
It was hardly the most professional approach, but to observe the memory
usage I opened Windows Task Manager and watched the memory usage of the
FDOBenchmark process.

 

The benchmark reads entries up to an arbitrary batch size. This is part of
the FDOBatch class. Once the batch size is reached, the time to load that
single batch is saved and all the data loaded in that batch is discarded.
The reason is that if we were to test with a really large file, of the order
of gigabytes, the entries could not all be loaded into memory. Using batches
was supposed to make the test scalable up to larger sizes, on the assumption
that discarding the data in a batch would free up the memory it had
occupied, ready to load another batch. Bear in mind that the benchmark was
just about timing how long data took to load, not (at least in this version)
doing anything with the data.

Yesterday evening, while doing some other MapGuide work that had me looking
into the C++ source code, I noted that the Dispose() function would call the
C++ delete on the native object. This gave me the idea of altering the
benchmark code so that both the FDOBatch and FDOEntry classes would have a
Dispose method which would explicitly go through every MapGuide value object
and call its Dispose method. My thinking was that perhaps the FDO wrappers
to these data objects were being garbage collected while the native
components they wrapped remained in memory, which could then explain the
memory leak. I put this to the test, but having put the code in place to
dispose of all the value objects, the memory usage of the application during
and after the benchmark was still the same.

I also endeavoured to explicitly dispose of any FDO objects as soon as they
were no longer needed, be they connections, feature readers, schemas or
class definitions. Disposing of them explicitly in this way still did not
seem to have any impact on the memory footprint observed through Task
Manager.
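 
For reference, the shape of that change was roughly the following (a sketch
only, not the attached code; FDOBatch and FDOEntry are the benchmark's own
classes, and each wrapped value object is treated here simply as
IDisposable):

    using System;
    using System.Collections.Generic;

    // Sketch: an entry holds the managed wrappers around native FDO values.
    public class FDOEntry : IDisposable
    {
        private readonly List<IDisposable> _values = new List<IDisposable>();

        public void Dispose()
        {
            // Explicitly release the native object behind each wrapper.
            foreach (IDisposable value in _values)
            {
                value.Dispose();
            }
            _values.Clear();
        }
    }

    // Sketch: a batch disposes every entry it loaded before being discarded.
    public class FDOBatch : IDisposable
    {
        private readonly List<FDOEntry> _entries = new List<FDOEntry>();

        public void Dispose()
        {
            foreach (FDOEntry entry in _entries)
            {
                entry.Dispose();
            }
            _entries.Clear();
        }
    }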

 

I am informed by a colleague that Autodesk is in possession of some
profiling tools which could diagnose what is going on with the memory usage
more effectively than I am able to.

 

Regards

 

Carl

 

From: Greg Boone [mailto:greg.boone at autodesk.com] 
Sent: 18 July 2008 14:50
To: Carl Jokl
Subject: RE: SDF 3.0 Memory Leak

 

Excuse my insistence here, but I need to understand the memory read dynamics
of the benchmark. 

 

So, the benchmark opens a 344MB SDF file. It then reads all the features and
stores a copy of all the Attribute and Geometry values for each feature in
memory in a dictionary. Is this the case?

 

What is your expected result in this situation? As I read it, memory usage
will increase as each feature is read and stored in the dictionary, up until
it reaches ~340 MB.

 

Greg

 

From: Carl Jokl [mailto:carl.jokl at keynetix.com] 
Sent: Friday, July 18, 2008 9:43 AM
To: Greg Boone
Subject: RE: SDF 3.0 Memory Leak

 

This was the intent, but on its own doing this isn't of great value. The
idea is to time how long loading takes, so the benchmark performs a loading
operation from an SDF file and times how long it takes to load the contents
of the file, discarding the data once it has loaded. The time here on its
own isn't very useful, but for the benchmark an identical set of data was
stored in a PostGIS database, and the time taken to load the contents of the
SDF file through the SDF FDO provider was compared with the time taken to
load the identical data from the PostGIS database through the PostGIS
provider.

 

The times were then compared to see how retrieval performance differed. I
was expecting PostGIS to be a bit slower, because the SDF provider just had
to read in from a flat file, and this would likely be happening in the same
process as the calling benchmark. By comparison, the PostGIS data is in a
PostGIS database running in its own process and piping data out over TCP/IP.
That was bound to be extra overhead versus the SDF provider.

 

The point of the benchmark was that at Keynetix we are migrating a legacy
MapGuide 6 application to MapGuide Enterprise. As part of this migration,
the question of the best way to store the migrated data came up. The options
are to continue using SDF flat files, use PostGIS, use SQL Server or, if we
really had lots of money, use Oracle. To help make that kind of decision I
was asked to try to compare the speed of data manipulation across the
various data sources. If I had had time I would have covered reading,
writing, querying, etc., as was my original intention. As I went through
development, however, I was put under increasing pressure just to get some
figures back, so I implemented only reading in this version.

 

The benchmark showed that the PostGIS FDO provider was much, much slower
than SDF, far more so than I expected. I was a bit suspicious of this, so I
wrote a test Java program to load the same data directly via JDBC, and also
ran queries against PostGIS with pgAdmin, and both were much, much faster
(the Java program was able to load the data faster than the SDF provider).

 

This, along with some log analysis on PostGIS, suggested that the speed
problems were most likely caused by the quality of the PostGIS FDO provider
implementation.

 

By this point the benchmarks had served their purpose for the most part. The
code was dug out again when my colleague was writing a migration application
to migrate our legacy SDF 2.0 data to SDF 3.0. This also used FDO in parts
but was plagued by the out-of-memory problem. When that was discovered, I
dug out this benchmark application to see if it too had problems with
escalating memory requirements. I found that it did, albeit not to the point
of throwing an out-of-memory exception.

 

I hope that explains what the code was for. I commented out the PostGIS
benchmark, as it will not be of much use to you unless you have a PostGIS
data source to test with, but that provider did not appear to have a memory
leak problem as far as I could tell.

 

Regards

 

Carl  

 

From: Greg Boone [mailto:greg.boone at autodesk.com] 
Sent: 18 July 2008 14:22
To: Carl Jokl
Subject: RE: SDF 3.0 Memory Leak

 

Can I ask what is the ultimate purpose of the Benchmark code? After looking
at the source code, it seems that the benchmark opens the SDF file, reads
all the features, both attributes and geometry values, and stores them in
memory. Is this the intent?

 

Greg

 

From: Carl Jokl [mailto:carl.jokl at keynetix.com] 
Sent: Friday, July 18, 2008 9:04 AM
To: Greg Boone
Subject: SDF 3.0 Memory Leak

 

Greg

 

I attach a copy of the benchmark test program. I apologise that it is not as
well structured as I would like, as I was under pressure to get some results
back quickly, and originally the benchmarks were going to cover reading,
writing, querying, removing, etc. In the end only reading was implemented.
The idea was that there is an FDOBenchmark class. Each specific FDO provider
would have its own subclass which implements setting up an FDO connection to
that source in the way that the specific provider requires, but with the
actual benchmark tests executing in the common base code to make the test as
fair as possible.
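 
In outline, the structure is roughly the following (a sketch of the shape
just described, not the attached code; the actual FDO read loop lives in the
attachment, so it is reduced to a placeholder here):

    using System;
    using System.Diagnostics;

    // Common base class: the timed run lives here so that every provider is
    // measured by exactly the same code path.
    public abstract class FDOBenchmark
    {
        // Provider-specific setup, e.g. opening an SDF or PostGIS connection.
        protected abstract void OpenConnection();
        protected abstract void CloseConnection();

        public TimeSpan RunReadBenchmark()
        {
            OpenConnection();
            Stopwatch watch = Stopwatch.StartNew();
            try
            {
                ReadAllFeatures();
            }
            finally
            {
                watch.Stop();
                CloseConnection();
            }
            return watch.Elapsed;
        }

        // Common read pass shared by all providers: in the real code this
        // iterates the FDO feature reader in batches and discards each batch.
        // Omitted here to keep the sketch short.
        private void ReadAllFeatures()
        {
        }
    }

    // Provider-specific subclass; the constructor mirrors the call shown in
    // BenchmarkRunnerForm.cs further down.
    public class SDF3Benchmark : FDOBenchmark
    {
        private readonly System.IO.FileInfo _source;
        private readonly System.IO.FileInfo _destination;

        public SDF3Benchmark(System.IO.FileInfo source,
                             System.IO.FileInfo destination)
        {
            _source = source;
            _destination = destination;
        }

        protected override void OpenConnection()
        {
            // Open an FDO connection to _source via the SDF provider here.
        }

        protected override void CloseConnection()
        {
            // Close and dispose the SDF connection here.
        }
    }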

 

The individual benchmarks ended up being instantiated in the
BenchmarkRunnerForm code-behind. It was originally intended to be cleaner
than this, with a specific benchmark runner class doing this kind of thing,
but this was a quick shortcut to save time because I was under pressure to
get some results data.

 

There is a line in BenchmarkRunnerForm.cs:

 

_sdfBenchmark = new SDF3Benchmark(new FileInfo("E:\\llWater.sdf"), new
FileInfo("D:\\testing_file.sdf"));

 

You will have to change this line to point to whatever SDF 3 data file you
are going to use to test with, as I don't think you would want me attaching
the 344MB llWater.sdf to this email. The second parameter was for use in the
writing benchmark test, but as that did not get implemented, changing it
does not really matter. I think that all that happens in the code is that it
may check for the presence of the second file and delete it if it exists, so
as to create a fresh destination file as part of the test.

 

If you need anything else please let me know.

 

Carl Jokl

 

Keynetix



