[mapserver-dev] Query Sandbox: PostGIS

Jim Klassen Jim.Klassen at ci.stpaul.mn.us
Tue Mar 17 10:28:03 EDT 2009


Sorry for the delay, I've been swamped with email.

The patches are separate. Only the last patch needs to be applied.

For me the 8x is a great improvement over where we were. At the moment, the delay seems to be pulling the data from the DB (OCIStmtFetch2). 

I've been thinking along the lines of a feature cache and it would improve performance even more so long as it doesn't eat up too much memory.

In the mean time, I have been playing with creating a separate thread that handles getting the data from Oracle and pre-fetches it ARRAY_SIZE (1024) features at a time into one of two buffers. NextShape and GetShape then just copy from these buffers. The benefit is on the query modes the DB connection is busy nearly 100% of the time which gives about another 10% speedup. There are still downsides which I haven't had time to sort out. I was expecting more of an overall speedup. It appeared to slow down NextShape because mode=map is only half as fast with the patch. I would've expected a gain across the board. So... there is something different in what NextShape vs. GetShape are doing yet (in the single-pass patch) where NextShape is still more efficient by a notable amount. Also, the database doesn't seem to like to be hit that hard and throws ora-600 errors, although the results on the client side appear to be correct.

One thing I discovered is the first pass (NextShapes) ends up waiting for the fetching thread to finish getting the data, while in the second pass (GetShape) the fetching thread ends up waiting for GetShape to be done with the buffer

I suspect this version is going to become more of a learning tool than production code, but if anyone is interested, I'd be willing to post the threaded version, but I don't want people confusing it with something that is anywhere near production ready. Also, it uses pthreads so I'm guessing this won't work on Windows.

I need to investigate further if there is a difference in the timings of NextShape in mode=map vs. the WFS results.

>>> "Steve Lime" <Steve.Lime at dnr.state.mn.us> 03/12/09 8:19 AM >>>
I guess the question is if 8x is fast enough. I was thinking of adding a feature cache 
that could be used in addition to the new code (and would be tunable). Small result
sets would only hit the db once.

Jim, do your patches need to be applied in any order?

Steve

>>> "Jim Klassen" <Jim.Klassen at ci.stpaul.mn.us> 03/12/09 12:21 AM >>>
I made some updates to my previous patch and attached a new patch to #2933.

The new results for the same dataset are:
* trunk takes 6m8s (1m19s in mapserver code)
* prepare takes 1m35s (23.5s in mapserver code)
* single-pass takes 46.5s (26.5s in mapserver code)
* both function=none and function=filter function, although function=none returns one more row than function=filter. function=filter agrees with postgis and shape. I haven't had time to track this down.

Conclusion:
* Single-pass is about 8x faster than trunk.
* In single-pass PostGIS is 2.4x faster than Oracle.

BTW: the bus error in trunk with filter=none appears to be related to a failed malloc. I haven't looked into it enough yet to know for sure what is going on.

>>> "Jim Klassen" <Jim.Klassen at ci.stpaul.mn.us> 03/11/09 6:47 PM >>>
Ok, I feel kind of stupid here in that I should have noticed something was wrong with a file that small. (The DBF of the shape was 300MB). My test dataset is the address points for the City of St. Paul (2D Point with 32 attributes).

Note: single-pass is in ticket #2933, prepare is in ticket #2927, my trunk is at r8765.

It turns out:
* The dataset is 139,000 rows (excluding the ones with null geometry, which mapserver skips)
* The final GML size is 364MB
* The single-pass time to generate that file is 3m17s (1m6s in mapserver the rest waiting for oracle over the network)
* The trunk time to generate the file is 3m21s (1m2s in mapserver) with a bus error at the 100MB mark.
* The trunk+prepare time to generate the file is 1m3s (20s in mapserver) with a bus error at the 100MB mark.
* The PostGIS (from trunk) time to generate the file is 1m21s (17s in mapserver) and finished successfully.
* THe PostGIS (from single-pass) time to generate the file is 19s (14s in mapserver) and finished successfully.
* There is a bug somewhere on using function = none (vs. function = filter which works fine) where the record id seems to be getting mangled causing single-pass to exit early.  Also there appears to still be a bug in trunk given the bus errors.
* MapServer appears to use about 25% cpu and 200MB ram on single-pass vs. 33% cpu and 13MB ram in trunk (according to activity monitor). Shark says most of this is spent in OCIStmtFetch2() in single-pass and OCIStmtExecute() in trunk and vs. 38% cpu and 13MB ram for trunk+prepare.

Conclusion:
* Just looking at the rates, single-pass is about 3.58x faster than trunk, and prepare is about 3.2x faster than trunk.
* For PostGIS, I get single-pass is about 4.3x faster than trunk.
* In single-pass PostGIS is 10.9x faster than Oracle. In trunk PostGIS is 8.9x faster then Oracle.
* There is something different with the oracle at the office vs. the sprint because I wasn't seeing bus errors at the sprint.

>>> Jim Klassen 03/11/09 11:39 AM >>>
I added the OracleSpatial patch to track as #2933.
 
On our 220k point address database the query time went from 6 minutes down to 32 seconds and produced about 360K of GML.

>>> On 3/10/2009 at 10:33 AM, Paul Ramsey <pramsey at opengeo.org> wrote:
I have updated the PostGIS driver in the sandbox to use the new logic.
My timing on outputting 5Mb of GML via WFS query has gone from 12s to
0.5s, a 24x improvement.

Great work Steve!

P.
_______________________________________________
mapserver-dev mailing list
mapserver-dev at lists.osgeo.org 
http://lists.osgeo.org/mailman/listinfo/mapserver-dev

_______________________________________________
mapserver-dev mailing list
mapserver-dev at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapserver-dev

_______________________________________________
mapserver-dev mailing list
mapserver-dev at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/mapserver-dev




More information about the mapserver-dev mailing list