[Geomoose-users] query.php performance advice sought

Johan Forsman Johan.Forsman at LA.GOV
Tue Jun 22 18:00:11 EDT 2010


Hi Len, thanks for the PgAdmin suggestion; I hadn't thought to do the SQL there to test, only checked with a row filter. Here is the result of a simple SELECT query to find all the records with syspop between 0 and 5000.

SELECT gid, pwsid, sysname, pwstype, syspop, parish, dhhregion, sysstatus, 
       other_sysn, sourcetype, sourcecoun, facid, facname, other_facn, 
       wellregid, constr_dat, facstatus, facstatus_, lat, lon, datum, 
       location_d, moreinfo, staticleve, groutingde, welldepth, welldiam, 
       point_x, point_y, wkb_geometry
  FROM webmaps_sdwis_wells
  WHERE (syspop > 0 and syspop < 5000);

Total query runtime: 241 ms.
3294 rows retrieved.

I would submit that postgres is working ok with that runtime.

I tested a brief version of your template suggestion by pointing to a simple html that only returns a record id. With that I get back 3294 rows of records in my "Search results" tab in 43 seconds. With my swanky formatted template it takes 5 minutes and 35 seconds to get the same results back.

That's hard data that seems to suggest that it's the templating that consumes all the time and resources.

Looks like a workaround is prescribed, and I need to learn how to make the right links from the result list to the "more info" template.

Thanks for the help!
/Johan.

> -----Original Message-----
> From: Len Kne [mailto:lkne at houstoneng.com]
> Sent: Tuesday, June 22, 2010 4:38 PM
> To: Johan Forsman; geomoose-users at lists.sourceforge.net
> Subject: RE: query.php performance advice sought
> 
> Johan
> 
> Something seems wrong with the performance when accessing the Postgres DB.
> You say there are only 4000 records, unless there are a million
> attributes, seems like that should be a really quick query.  Have you
> tried running the query from the command line or PGAdmin to see how it
> performs?
> 
> If you have a large number of records with a large mapserver template, it
> is going to take some time to get that data to the client.  Could you have
> a simple template that returns just the syspop attribute and then have a
> link to make a follow-up query if details are needed?
> 
> Another thing to check out is FastCGI, I seen some performance boost using
> it.
> http://www.maptools.org/ms4w/index.phtml?page=README_INSTALL_3.0.html#f-
> fastcgi
> 
> Len
> 
> -----Original Message-----
> From: Johan Forsman [mailto:Johan.Forsman at LA.GOV]
> Sent: Tuesday, June 22, 2010 3:06 PM
> To: geomoose-users at lists.sourceforge.net
> Subject: Re: [Geomoose-users] query.php performance advice sought
> 
> Please pardon the prior incomplete post; I inadvertently pressed "send"
> rather than "save draft". Blame it on the streaming World Cup coverage
> online. Here is the more complete version.
> 
> Preemptive apologies for a long post. I have attempted to collect enough
> data for the collective to have a chance at theorizing.
> 
> I am attempting to use the new query.php to extract a subset of a PostGIS
> table based on integer values stored in the "syspop" field. My service
> definition looks like this at the moment (I made a copy of the original
> query.php and changed the name to allow for troubleshooting without
> risking destroying the original, hence the different script name):
> 
> 	<service name="advanced_search">
> 		<url>php/query_sdwp.php</url>
> 		<step type="input">
> 			<input type="hidden" name="highlight" value="false"/>
> 			<input type="hidden" name="mode" value="search"/>
> 
> 			<input type="hidden" name="layer0"
> value="wells/sdwis_wells"/>
> 			<input type="hidden" name="template0"
> value="itemquery"/>
> 
> 			<input type="hidden" name="fieldname0" value="syspop"/>
> 			<input type="user" name="value0" title="Population is
> greater than"/>
> 			<input type="hidden" name="comparitor0" value="gt"/>
> 
> 			<input type="hidden" name="operator1" value="and"/>
> 
> 			<input type="hidden" name="fieldname1" value="syspop"/>
> 
> 			<input type="user" name="value1" title="and less than"/>
> 			<input type="hidden" name="comparitor1" value="lt"/>
> 		</step>
> 	</service>
> 
> It's not pretty but it works. Sort of. I am having problems with the
> performance to such extent it's not usable. I have made some attempts to
> quantify this below. It's not scientific, but it is illustrative.
> 
> Relevant stats:
> Server hardware is 2 x 2.0 GHz Xeon, 3 GB RAM. It's a repurposed Dell
> PowerEdge 4600, I think.
> Server OS is Ubuntu Server 9.10.
> GeoMoose 2.2 in FGS 9.5 with MapServer 5.4.2.
> 
> Dataset of interest is a point set with 4100 total records, stored in
> PostGIS on the same server. An index on the syspop field has no
> discernible effect on the performance.
> 
> During testing I was the only user on the server.
> 
> I turned off highlighting so the query would only run once.
> 
> I ran two scenarios:
> 1) syspop >5000 and syspop <10000
> 2) syspop >0 and syspop <5000
> 
> I observed the server process list to determine where the CPU time was
> being spent.
> 
> I uncommented the existing diagnostic lines in query.php to verify the
> request and the result count for both scenarios and they are both correct.
> 
> I increased the php timeout to 10 minutes and the php memory to 128 megs
> to stop those errors from interfering. I am logging the php errors and
> file is now clear with no errors.
> 
> These are the results for the two scenarios:
> 
> Search Layer: wells/sdwis_wells
> Template: itemquery
> FILTER: syspop > 5000 and syspop < 10000 Total Results: 306 Stop watch
> time to completion: 13 seconds Time spent by postgres: 7 seconds with CPU
> at 88% Time spent by httpd: 6 seconds with CPU at 92%
> 
> Search Layer: wells/sdwis_wells
> Template: itemquery
> FILTER: syspop > 0 and syspop < 5000
> Total Results: 3323
> Stop watch time to completion: 5.5 minutes!
> Time spent by postgres: 30 seconds with CPU at 88% Time spent by httpd: 5
> minutes with CPU at 92%
> 
> With that CPU usage I'd say stuff is going on, but what? SQL-wise this is
> a very simple query, is it not?
> 
> My baseline comparison is that for both scenarios a SQL query from Access
> 2007 on my workstation on the linked table from the server, or a row
> filter from PgAdmin, returns the expected results instantaneously. Apples
> and oranges I know, but perhaps confirmation that the database can work
> quickly and serve the expected results in no time flat?
> 
> I tried a MapServer direct URL query, with a simple template to just list
> the record IDs in the browser like so:
> 
> .../cgi-
> bin/mapserv?map=/opt/fgs/apps/geomoose2/maps/sdwp/wells/wells_webmaps.map&
> layers=sdwis_wells&mode=itemnquery&qlayer=sdwis_wells&qitem=syspop&qstring
> =(syspop > 5000 and syspop < 10000)
> 
> This returns the 306 records to the browser window in 3 seconds with
> barely a blip in the server CPU usage.
> 
> Direct URL query with syspop > 0 and syspop < 5000:
> Returns the 3323 records to the browser window in 35 seconds with server
> CPU (postgres) at 85%. A respectable portion of that time is transferring
> the data to the browser window.
> 
> If those URL queries are representative of what GM is doing, then I'd say
> that the 30-second mark is as fast as MapServer can query the PostGIS and
> return the results on this system, but what in the world is going on
> during those other 5 minutes? I know GM building the formatted results
> with the template, but what else? Is there a way to tell?
> 
> Very oddly with the URL queries: The mapfile contains a FILTER "sysstatus
> = 'Active'" directive to only include the active records. If that
> directive is absent the URL query works, but if it is present I receive
> and error: "Content-type: text/html msDrawMap(): Image handling error.
> Failed to draw layer named 'sdwis_wells'". Depending on what numbers I use
> for the lower and upper bounds I can get some records returned before the
> error. MapServer bug? MapServer feature? PEBKAC? Just weird?
> 
> I don't know that there is anything I can do, but please feel free to
> opine at will; I am fresh out of ideas. My coding skills are worse than
> non-existent; they are dangerous, and regrettably there is no budget for
> that shiny new 26 GHz Core i13, which surely would address some part of
> this.
> 
> Sorry about the long post. After this, I'm thinking it's noon somewhere...
> /Johan.
> 
> ----------------------
> Johan Forsman
> Geologist
> Safe Drinking Water Program
> Louisiana Department of Health and Hospitals Office of Public Health
> Telephone: 225.342.7309
> Telefax: 225.342.7303
> 
> 
> --------------------------------------------------------------------------
> ----
> ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's
> Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit.  See the prize
> list and enter to win:
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> Geomoose-users mailing list
> Geomoose-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geomoose-users




More information about the Geomoose-users mailing list