[postgis-users] Performance question (how many years will this take?)

Paul Ramsey pramsey at refractions.net
Mon May 10 18:46:44 PDT 2004


Wood Brent wrote:
> --- Paul Ramsey <pramsey at refractions.net> wrote:
> 
>>Wood Brent wrote:
>>
>>
>>>select into...
>>>...
>>>...
>>>where intersects(uniq_event.the_geom, chat_cells3.the_geom);
>>
>>Is that really your where clause? You want this:
>>
>>   where intersects(uniq_event.the_geom, chat_cells3.the_geom)
>>   and uniq_event.the_geom && chat_cells3.the_geom
>>
>>So that the spatial index cuts down on the number of comparisons that 
>>have to be made with the expensive exact intersects() function.
> 
> I assume from this that _intersects_ does NOT do a pre-check with the GiST
> bounding box before the actual polygon check?

No, no functions do. If you want to use the index, you *must* use the && 
operator. (Or one of the other operators, please don't.)

> What I have now done is add a 
>     "where cell_id >= $START_ID and $CELL_ID < $END_ID"
> into the sql, and wrap it up in a script which starts at cell_id of 0 and
> increments in batches of 100 cells per iteration until it gets to the maximum
> cell_id. (note: there is a unique index on cell_id which is an integer attr)

Iterating outside the database seems to be incredibly efficient. I did 3 
million fairly complex iteractions (pulling 10-200 records each and 
analyzing them for time series properties) against a 50 million record 
table, with run times of under an hour for the whole analysis. And the 
program was in perl (like all great programs).

> This seems faster & is running without the postmaster process swapping out.
> So I think this will work OK, it's about 30% done already, while I wrote this. 

Good stuff.

P.




More information about the postgis-users mailing list