[postgis-users] Can I turn off MapServer bounding box check for postgis query?

Kevin Neufeld kneufeld at refractions.net
Wed Jan 21 19:14:38 PST 2009


If EXPLAIN ANALYZE is taking too long, can you just post the EXPLAIN for 
starters (both over a 2day and say 5 day range)?  (You have ANALYZED 
your tables, right?)

At a quick glace I see that you can benefit a little by not computing 
st_centroid twice (once in the SELECT clause and once in the GROUP BY).  
Since you've aliased the centroids in the SELECT clause, you can change 
your GROUP BY to:
GROUP BY o.gid, d.gid, o.area_name, d.area_name, orig, dest

Also, have you considered using table inheritance with constraint 
exclusion?  It sounds like your use case makes for a perfect example.  I 
had a table of 18million linestrings that I subdivided into two 9million 
tables using inheritance.  Since I was typically only querying data in 
one of the tables, the query time halved.

If you typically only query a few days at a time or even within the same 
month, you could set up 1 table for every month this year, maybe one for 
all 2008 data and one for all older data.  In this way, if you 
frequently query the current month, you'll only have to look at 300,000 
entries instead of 4million.  If the main query table is 7% its original 
size, the indexes will be too ... perhaps even resident in memory.  
That's a huge performance gain. 

Cheers,
Kevin

Ben Madin wrote:
> G'day all,
>
> I thought this might be a MapServer List question, but have received 
> no replies so apologise for the cross posting.
>
> I have a table with approx 4 million rows, each with a date and start 
> and end point (and some other information). This table would probably 
> increase by around 300 000 rows per month. The table is indexed and 
> clustered by date, and the table has no specific 'geometry' 
> information, just a code which I link to a fairly simple geometry 
> table (about 150 polygons).
>
> To produce a useful sense of the activity, I am collecting all the 
> start and end points in a given time frame, and making line segments, 
> then aggregating by start and end point, and counting the total number 
> of each line.
>
> EXPLAIN ANALYZE
> SELECT 
> "bucket"::text,asbinary(force_collection(force_2d(route)),'NDR'),mid::text
> from ( SELECT (ogid*1000 + dgid)::int as mid, origtta, desttta, 
> ST_MakeLine(s.orig, s.dest) as route, total_head, 
> ceiling(log(total_head))::int as bucket
> FROM ( SELECT o.gid as ogid, d.gid as dgid, o.area_name as origtta, 
> d.area_name as desttta, st_centroid(o.the_geom) as orig, 
> st_centroid(d.the_geom) as dest, count(n.id) as total_head
> FROM nlis n
> JOIN tailtagarea o ON n.sourceshire = o.shortcode
> JOIN tailtagarea d ON n.destinationshire = d.shortcode
> WHERE sourceshire is not null AND sourceshire not like ' '
> AND destinationshire is not null AND destinationshire not like ' '
> AND transferdate >= '13/06/2008' AND transferdate <= '15/06/2008'
> GROUP BY o.gid, d.gid, o.area_name, d.area_name, 
> st_centroid(o.the_geom), st_centroid(d.the_geom)
> ) AS s
> ) AS foo
>
> in the terminal, this runs in either about 4 or 9 seconds depending on 
> the date range - if it becomes more than about 3 days of reports the 
> query analyzer notes moving to disk based sorts etc.
>
> However, when I run it in MapServer, the following statement is added :
>
> WHERE route && setSRID('BOX3D(105.204213080645 
> -37.0327214193548,143.604244080645 -9.31123074193548)'::BOX3D, 4326 )
>
> and if there is two days of data (maybe 110 aggregate rows) it works 
> OK, but if there is 3 or more days (maybe 130 - 140 aggregate lines) 
> it never completes, and neither does it if I take the final statement 
> and run it directly in psql.
>
> So, I read the documentation, and read about moving the BBOX statement 
> using !BOX!, which I did (using st_within on both the joins, but in 
> fact, this is not the behaviour I want, as it stops any lines being 
> created if they start or end outside the window ( I want to see these 
> ones as well).
>
> Additionally, it didn't stop the final bbox statement being added, 
> which I thought it was meant to?
>
> So can I just trick MapServer into not adding the BBOX?
>
> After aggregation, there is unlikely to be more than about 200 lines 
> to be rendered, so whether or not they are in the map window shouldn't 
> damage rendering performance too much - should it?
>
> Failing that, my option appears to be create a 'lookup' table when new 
> data arrives, which includes the geometry I'm currently joining in the 
> table, and then create the indexes again. Given that the table could 
> be receiving data 24 hours a day, this would seem like a large amount 
> of overhead. 
>
> I am trying to pull about the output from explain, but as the query 
> appears to never complete, it's a bit hard to figure when it isn't 
> finishing.
>
> I would welcome any suggestions... any!!
>
> cheers
>
> Ben
>
>
>
> -- 
>
> Ben Madin
> REMOTE INFORMATION
>
> t : +61 8 9192 5455
> f : +61 8 9192 5535
> m : 0448 887 220
> Broome   WA   6725
>
> ben at remoteinformation.com.au <mailto:ben at remoteinformation.com.au>
>
>
>
> Out here, it pays to know...
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   



More information about the postgis-users mailing list