[fdo-internals] RE: SDF 3.0 Memory Leak

Greg Boone greg.boone at autodesk.com
Fri Jul 18 11:00:12 EDT 2008


Why don't you try modifying the benchmark so that it stops storing all the attribute values in a single dictionary, and see what effect that has? If the memory usage drops to an acceptable level then the issue is most likely in the benchmark implementation, not the SDF provider.
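Something along these lines, sketched from memory of the standard FDO .NET select/read pattern (double-check the exact enum and method names against your FDO version; the method name and class name here are illustrative, not taken from your code):

    using OSGeo.FDO.Commands;
    using OSGeo.FDO.Commands.Feature;
    using OSGeo.FDO.Connections;

    // Read every feature but retain nothing, so the provider's memory
    // behaviour is isolated from the benchmark's own bookkeeping.
    static int ReadAndDiscard(IConnection connection, string className)
    {
        ISelect select = (ISelect)connection.CreateCommand(CommandType.CommandType_Select);
        select.SetFeatureClassName(className);
        IFeatureReader reader = select.Execute();
        int count = 0;
        while (reader.ReadNext())
        {
            count++;           // touch the row, store nothing
        }
        reader.Close();
        reader.Dispose();      // release the native reader immediately
        select.Dispose();
        return count;
    }

If the process memory stays flat while that loop runs over the whole 344MB file, the provider is fine and the dictionary is what is holding the memory.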

Greg

From: Carl Jokl [mailto:carl.jokl at keynetix.com]
Sent: Friday, July 18, 2008 10:07 AM
To: Greg Boone
Subject: RE: SDF 3.0 Memory Leak

Testing memory usage was not the intention of the benchmark, just a side effect. It was hardly the most professional approach, but to observe the memory usage I opened Windows Task Manager and watched the memory of the FDOBenchmark process.

The benchmark reads entries up to an arbitrary batch size. This is part of the FDOBatch class. Once the batch size is reached, the time to load that single batch is saved and all the data loaded in that batch is discarded. The reason is that if we were to test with a really large file, of the order of gigabytes, the entries could not all be loaded into memory. Using batches was supposed to make the test scale up to larger file sizes, on the assumption that discarding the data in a batch would free up the memory it had occupied, ready to load another batch. Bear in mind that the benchmark was just about timing how long data took to load, and not (at least in this version) doing anything with the data.

Yesterday evening, while doing some other MapGuide work that had me looking into the C++ source code, I noted that the Dispose() function would call the C++ delete on the native object. This gave me the idea of altering the benchmark code so that both the FDOBatch and FDOEntry classes would have a Dispose() method that explicitly goes through every FDO value object and calls its Dispose() method. My thinking was that perhaps the FDO wrappers to these data objects were being garbage collected while the native components they wrapped remained in memory, which could explain the memory leak.

I put this to the test, but having put the code in place to dispose of all the value objects, the memory usage of the application during and after the benchmark was still the same. I also endeavoured to explicitly dispose of any FDO objects as soon as they were no longer needed, be they connections, feature readers, schemas or class definitions. Disposing of them explicitly in this way still did not seem to reduce the memory footprint observed through Task Manager.
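For reference, the disposal pass I added was roughly this shape (paraphrased from memory rather than copied from the attached source; _entries and _values are simply the fields the benchmark classes store their data in, and LiteralValue is my guess at the common base type of the FDO value wrappers):

    // In FDOBatch: explicitly dispose every entry loaded in this batch
    // before the batch itself is discarded.
    public void Dispose()
    {
        foreach (FDOEntry entry in _entries)
        {
            entry.Dispose();
        }
        _entries.Clear();
    }

    // In FDOEntry: dispose each FDO value wrapper so its native
    // counterpart is deleted rather than left waiting on garbage collection.
    public void Dispose()
    {
        foreach (OSGeo.FDO.Expression.LiteralValue value in _values.Values)
        {
            value.Dispose();
        }
        _values.Clear();
    }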

I am informed by a colleague that Autodesk is in possession of some profiling tools that could diagnose what is going on with the memory usage more effectively than I am able to.

Regards

Carl

From: Greg Boone [mailto:greg.boone at autodesk.com]
Sent: 18 July 2008 14:50
To: Carl Jokl
Subject: RE: SDF 3.0 Memory Leak

Excuse my insistence here, but I need to understand the memory read dynamics of the benchmark.

So, the benchmark opens a 344MB SDF file. It then reads all the features and stores a copy of all the attribute and geometry values for each feature in memory, in a dictionary. Is this the case?

What is your expected result in this situation? As I read it, memory usage will increase as each feature is read and stored in the dictionary, up until memory usage reaches ~340 MB.

Greg

From: Carl Jokl [mailto:carl.jokl at keynetix.com]
Sent: Friday, July 18, 2008 9:43 AM
To: Greg Boone
Subject: RE: SDF 3.0 Memory Leak

That was the intent, but on its own doing this isn't of great value. The idea is to time how long loading takes: the benchmark performs a loading operation from an SDF file and times how long it takes to load the contents of the file, discarding the data once it has loaded. The time on its own isn't very useful, but for the benchmark an identical set of data was stored in a PostGIS database, and the time taken to load the contents of the SDF file through the SDF FDO provider was compared with the time taken to load the identical data from the PostGIS database through the PostGIS provider.

I was expecting PostGIS to be a bit slower, because SDF just has to read from a flat file, and this would likely happen in the same process as the calling benchmark. By comparison, the PostGIS data sits in a PostGIS database running in its own process and piping data out over TCP/IP. That was bound to be extra overhead versus the SDF provider.

The point of the benchmark is that at Keynetix we are migrating a legacy MapGuide 6 application to MapGuide Enterprise. As part of this migration, the question of the best way to store the migrated data came up. The options were to continue using SDF flat files, use PostGIS, use SQL Server or, if we really had lots of money, use Oracle. To help make that kind of decision I was asked to compare the speed of data manipulation across the various data sources. Given the time, I would have benchmarked reading, writing, querying and so on, as was my original intention. As development went on, however, I was put under increasing pressure just to get some figures back, so I implemented only reading in this version.

The benchmark showed that the PostGIS FDO provider was much slower than SDF, far more so than I expected. I was a bit suspicious of this, so I wrote a test Java program to load the same data directly via JDBC, and also ran queries against PostGIS with pgAdmin, and both were much faster (the Java program was able to load the data faster than the SDF provider).

This, together with some log analysis on PostGIS, suggested the speed problems were most likely caused by the quality of the PostGIS FDO provider implementation.

By this point the benchmarks had served their purpose for the most part. They were dug out again when my colleague was writing a migration application to migrate our legacy SDF 2.0 data to SDF 3.0. That application also used FDO in parts, but was plagued by out-of-memory errors. When that was discovered, I dug out this benchmark application to see if it too had problems with escalating memory requirements. I found that it did, albeit not to the point of throwing an out-of-memory exception.

I hope that explains what the code was for. I commented out the PostGIS benchmark, as it will not be of much use to you unless you have a PostGIS data source to test with; that provider did not appear to have a memory leak problem as far as I could tell.

Regards

Carl

From: Greg Boone [mailto:greg.boone at autodesk.com]
Sent: 18 July 2008 14:22
To: Carl Jokl
Subject: RE: SDF 3.0 Memory Leak

Can I ask what the ultimate purpose of the benchmark code is? After looking at the source code, it seems that the benchmark opens the SDF file, reads all the features, both attribute and geometry values, and stores them in memory. Is this the intent?

Greg

From: Carl Jokl [mailto:carl.jokl at keynetix.com]
Sent: Friday, July 18, 2008 9:04 AM
To: Greg Boone
Subject: SDF 3.0 Memory Leak

Greg

I attach a copy of the benchmark test program. I apologise that it is not as well structured as I would like; I was under pressure to get some results back quickly, and originally the benchmarks were going to cover reading, writing, querying, removing and so on. In the end only reading was implemented. The idea is that there is an FDOBenchmark class. Each specific FDO provider has its own subclass, which implements setting up an FDO connection to that source in the way that specific provider needs, while the actual benchmark tests execute in the common base code to make the test as fair as possible.
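In outline it is something like this (a simplified sketch rather than a copy of the attached source; the FeatureAccessManager usage and the SDF connection string are from memory of the FDO .NET API, so check them against your version, and the real SDF3Benchmark constructor takes FileInfo arguments rather than a string):

    using System;
    using System.Diagnostics;
    using OSGeo.FDO.ClientServices;
    using OSGeo.FDO.Connections;

    public abstract class FDOBenchmark
    {
        // Each provider subclass supplies a connection configured for
        // its own data source.
        protected abstract IConnection CreateConnection();

        // The timed read loop lives in the base class so that every
        // provider is measured by exactly the same code.
        public TimeSpan RunReadBenchmark(string featureClassName)
        {
            IConnection connection = CreateConnection();
            connection.Open();
            Stopwatch timer = Stopwatch.StartNew();
            // ... common batched read loop, identical for every provider ...
            timer.Stop();
            connection.Close();
            connection.Dispose();
            return timer.Elapsed;
        }
    }

    public class SDF3Benchmark : FDOBenchmark
    {
        private readonly string _sdfPath;

        public SDF3Benchmark(string sdfPath) { _sdfPath = sdfPath; }

        protected override IConnection CreateConnection()
        {
            IConnection connection = FeatureAccessManager.GetConnectionManager()
                .CreateConnection("OSGeo.SDF");
            connection.ConnectionString = "File=" + _sdfPath + ";ReadOnly=TRUE";
            return connection;
        }
    }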

The individual benchmarks ended up being instantiated in the BenchmarkRunnerForm code-behind. It was originally intended to be cleaner than this, with a specific benchmark runner class doing that kind of thing, but this was a quick shortcut to save time because I was under pressure to get some results data.

There is a line in the BenchmarkRunnerForm.cs:

_sdfBenchmark = new SDF3Benchmark(new FileInfo("E:\\llWater.sdf"), new FileInfo("D:\\testing_file.sdf"));

You will have to change this line to point to whatever SDF 3 data file you are going to use to test with, as I don't think you would want me attaching the 344MB llWater.sdf to this email. The second parameter was for use in the writing benchmark test, but as that did not get implemented, changing it does not really matter. I think all that happens in the code is that it may check for the presence of the second file and delete it if it exists, so as to create a fresh destination file as part of the test.
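If it helps, that destination-file handling would be something like this (a hypothetical sketch of the behaviour described above, not copied from the source):

    using System.IO;

    // Start each write test with a fresh destination file.
    FileInfo destination = new FileInfo("D:\\testing_file.sdf");
    if (destination.Exists)
    {
        destination.Delete();
    }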

If you need anything else please let me know.

Carl Jokl

Keynetix