[Geodata] Tiger - can't get SQL to use indexes in postgres

Mon Jun 23 21:26:32 EDT 2008

Stephen Frost wrote:
> Stephen,
> 
> * Stephen Woodbridge (woodbri at swoodbridge.com) wrote:
>> I loaded up some of the Tiger2007fe data into postgres and I'm trying to  
>> get a select SELECT to use indexes, but it isn't and I don't understand  
>> why. There are three tables, somewhat over indexed at the moment because  
>>  I can't get things to work.
>>
>> I have vacuum analyzed the whole database. Below are the query, explain  
>> analyze plan and table definitions. This is a subset of the query:
>>
>> select count(*) from featnames b, addr c, addrfn d, edges a
>>  where a.tlid=b.tlid and b.tlid=c.tlid and b.linearid=d.linearid and  
>> c.arid=d.arid and a.roadflg='Y';
> 
> I'd start by recommending join-syntax rather than just commas, eg:
> 
> select count(*)
> from
>   featnames b
>   join addr c on (b.tlid = c.tlid)
>   join addrfn d on (b.linearid = d.linearid)
>   join edges a on (b.tlid = a.tlid)
> where
>   a.roadflg = 'Y'
> ;

Thanks, I'll look into this. I found it curious that I could not get 
just two tables to join on an index:

select count(*) from addr a, addrfn b where a.arid=b.arid;

I realize that this requires to a full table or index scan regardless 
because of the count(*), but I thought I should be able to use an index. 
Maybe not, the optimizer might have decided that doing the full table 
scan since it was required would be just as fast as scanning the index. 
I was thinking that doing a merge of two sorted indexes would be faster 
than building a hash and then doing a hash join. But probably it is 
smarter than I am :)

> A better question, really, is- what is it you're really trying to do
> here?  Do you really need a count?  Somehow, I doubt it..  There's

Right, ultimately what I want to do is load the data, join it and write 
some of it out into another format. Basically I have three uses for the 
data ATM, 1) mapping, 2) geocoding and 3) reverse geocoding. Each of 
these need a slightly different view or need some amalgamation of the 
separate views for a single dataset that would work with all three.

> *alot* of edges and whatnot in TIGER, you're going to want to filter it
> down more, one way or another.  I'm also not convinced that your joins
> are set up correctly, honestly.  On tiger_01 I got 51.6M records
> returned from that query, but there's only 1.6M edges, 1.5M featnames,
> and 775K addr.  Seems a bit fishy to me though I don't see anything
> obvious at the moment.

Yeah, exactly! So what I am doing ATM is just exploring the data and the 
relationships, Then I probably need to create some views and try out the 
data for each of the applications involved. And then make scripts to 
automate processing all the data. Also since the data is all changing 
because of the Tiger changes and some changes I want to make, all the 
applications need to be updated as well to deal with the new data <sigh> 
lots of work and as Schuyler said "a slow process".

Thanks,
   -Steve

> 	Thanks,
> 
> 		Stephen