[Geomoose-users] query.php performance advice sought
Len Kne
lkne at houstoneng.com
Tue Jun 22 17:38:10 EDT 2010
Johan
Something seems wrong with the performance when accessing the Postgres DB. You say there are only 4000 records, unless there are a million attributes, seems like that should be a really quick query. Have you tried running the query from the command line or PGAdmin to see how it performs?
If you have a large number of records with a large mapserver template, it is going to take some time to get that data to the client. Could you have a simple template that returns just the syspop attribute and then have a link to make a follow-up query if details are needed?
Another thing to check out is FastCGI, I seen some performance boost using it. http://www.maptools.org/ms4w/index.phtml?page=README_INSTALL_3.0.html#f-fastcgi
Len
-----Original Message-----
From: Johan Forsman [mailto:Johan.Forsman at LA.GOV]
Sent: Tuesday, June 22, 2010 3:06 PM
To: geomoose-users at lists.sourceforge.net
Subject: Re: [Geomoose-users] query.php performance advice sought
Please pardon the prior incomplete post; I inadvertently pressed "send" rather than "save draft". Blame it on the streaming World Cup coverage online. Here is the more complete version.
Preemptive apologies for a long post. I have attempted to collect enough data for the collective to have a chance at theorizing.
I am attempting to use the new query.php to extract a subset of a PostGIS table based on integer values stored in the "syspop" field. My service definition looks like this at the moment (I made a copy of the original query.php and changed the name to allow for troubleshooting without risking destroying the original, hence the different script name):
<service name="advanced_search">
<url>php/query_sdwp.php</url>
<step type="input">
<input type="hidden" name="highlight" value="false"/>
<input type="hidden" name="mode" value="search"/>
<input type="hidden" name="layer0" value="wells/sdwis_wells"/>
<input type="hidden" name="template0" value="itemquery"/>
<input type="hidden" name="fieldname0" value="syspop"/>
<input type="user" name="value0" title="Population is greater than"/>
<input type="hidden" name="comparitor0" value="gt"/>
<input type="hidden" name="operator1" value="and"/>
<input type="hidden" name="fieldname1" value="syspop"/>
<input type="user" name="value1" title="and less than"/>
<input type="hidden" name="comparitor1" value="lt"/>
</step>
</service>
It's not pretty but it works. Sort of. I am having problems with the performance to such extent it's not usable. I have made some attempts to quantify this below. It's not scientific, but it is illustrative.
Relevant stats:
Server hardware is 2 x 2.0 GHz Xeon, 3 GB RAM. It's a repurposed Dell PowerEdge 4600, I think.
Server OS is Ubuntu Server 9.10.
GeoMoose 2.2 in FGS 9.5 with MapServer 5.4.2.
Dataset of interest is a point set with 4100 total records, stored in PostGIS on the same server. An index on the syspop field has no discernible effect on the performance.
During testing I was the only user on the server.
I turned off highlighting so the query would only run once.
I ran two scenarios:
1) syspop >5000 and syspop <10000
2) syspop >0 and syspop <5000
I observed the server process list to determine where the CPU time was being spent.
I uncommented the existing diagnostic lines in query.php to verify the request and the result count for both scenarios and they are both correct.
I increased the php timeout to 10 minutes and the php memory to 128 megs to stop those errors from interfering. I am logging the php errors and file is now clear with no errors.
These are the results for the two scenarios:
Search Layer: wells/sdwis_wells
Template: itemquery
FILTER: syspop > 5000 and syspop < 10000 Total Results: 306 Stop watch time to completion: 13 seconds Time spent by postgres: 7 seconds with CPU at 88% Time spent by httpd: 6 seconds with CPU at 92%
Search Layer: wells/sdwis_wells
Template: itemquery
FILTER: syspop > 0 and syspop < 5000
Total Results: 3323
Stop watch time to completion: 5.5 minutes!
Time spent by postgres: 30 seconds with CPU at 88% Time spent by httpd: 5 minutes with CPU at 92%
With that CPU usage I'd say stuff is going on, but what? SQL-wise this is a very simple query, is it not?
My baseline comparison is that for both scenarios a SQL query from Access 2007 on my workstation on the linked table from the server, or a row filter from PgAdmin, returns the expected results instantaneously. Apples and oranges I know, but perhaps confirmation that the database can work quickly and serve the expected results in no time flat?
I tried a MapServer direct URL query, with a simple template to just list the record IDs in the browser like so:
.../cgi-bin/mapserv?map=/opt/fgs/apps/geomoose2/maps/sdwp/wells/wells_webmaps.map&layers=sdwis_wells&mode=itemnquery&qlayer=sdwis_wells&qitem=syspop&qstring=(syspop > 5000 and syspop < 10000)
This returns the 306 records to the browser window in 3 seconds with barely a blip in the server CPU usage.
Direct URL query with syspop > 0 and syspop < 5000:
Returns the 3323 records to the browser window in 35 seconds with server CPU (postgres) at 85%. A respectable portion of that time is transferring the data to the browser window.
If those URL queries are representative of what GM is doing, then I'd say that the 30-second mark is as fast as MapServer can query the PostGIS and return the results on this system, but what in the world is going on during those other 5 minutes? I know GM building the formatted results with the template, but what else? Is there a way to tell?
Very oddly with the URL queries: The mapfile contains a FILTER "sysstatus = 'Active'" directive to only include the active records. If that directive is absent the URL query works, but if it is present I receive and error: "Content-type: text/html msDrawMap(): Image handling error. Failed to draw layer named 'sdwis_wells'". Depending on what numbers I use for the lower and upper bounds I can get some records returned before the error. MapServer bug? MapServer feature? PEBKAC? Just weird?
I don't know that there is anything I can do, but please feel free to opine at will; I am fresh out of ideas. My coding skills are worse than non-existent; they are dangerous, and regrettably there is no budget for that shiny new 26 GHz Core i13, which surely would address some part of this.
Sorry about the long post. After this, I'm thinking it's noon somewhere...
/Johan.
----------------------
Johan Forsman
Geologist
Safe Drinking Water Program
Louisiana Department of Health and Hospitals Office of Public Health
Telephone: 225.342.7309
Telefax: 225.342.7303
------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Geomoose-users mailing list
Geomoose-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geomoose-users
More information about the Geomoose-users
mailing list