[Geodata] Tiger - can't get SQL to use indexes in postgres
Stephen Woodbridge
woodbri at swoodbridge.com
Mon Jun 23 21:26:32 EDT 2008
Stephen Frost wrote:
> Stephen,
>
> * Stephen Woodbridge (woodbri at swoodbridge.com) wrote:
>> I loaded up some of the Tiger2007fe data into postgres and I'm trying to
>> get a select SELECT to use indexes, but it isn't and I don't understand
>> why. There are three tables, somewhat over indexed at the moment because
>> I can't get things to work.
>>
>> I have vacuum analyzed the whole database. Below are the query, explain
>> analyze plan and table definitions. This is a subset of the query:
>>
>> select count(*) from featnames b, addr c, addrfn d, edges a
>> where a.tlid=b.tlid and b.tlid=c.tlid and b.linearid=d.linearid and
>> c.arid=d.arid and a.roadflg='Y';
>
> I'd start by recommending join-syntax rather than just commas, eg:
>
> select count(*)
> from
> featnames b
> join addr c on (b.tlid = c.tlid)
> join addrfn d on (b.linearid = d.linearid)
> join edges a on (b.tlid = a.tlid)
> where
> a.roadflg = 'Y'
> ;
Thanks, I'll look into this. I found it curious that I could not get
just two tables to join on an index:
select count(*) from addr a, addrfn b where a.arid=b.arid;
I realize that this requires to a full table or index scan regardless
because of the count(*), but I thought I should be able to use an index.
Maybe not, the optimizer might have decided that doing the full table
scan since it was required would be just as fast as scanning the index.
I was thinking that doing a merge of two sorted indexes would be faster
than building a hash and then doing a hash join. But probably it is
smarter than I am :)
> A better question, really, is- what is it you're really trying to do
> here? Do you really need a count? Somehow, I doubt it.. There's
Right, ultimately what I want to do is load the data, join it and write
some of it out into another format. Basically I have three uses for the
data ATM, 1) mapping, 2) geocoding and 3) reverse geocoding. Each of
these need a slightly different view or need some amalgamation of the
separate views for a single dataset that would work with all three.
> *alot* of edges and whatnot in TIGER, you're going to want to filter it
> down more, one way or another. I'm also not convinced that your joins
> are set up correctly, honestly. On tiger_01 I got 51.6M records
> returned from that query, but there's only 1.6M edges, 1.5M featnames,
> and 775K addr. Seems a bit fishy to me though I don't see anything
> obvious at the moment.
Yeah, exactly! So what I am doing ATM is just exploring the data and the
relationships, Then I probably need to create some views and try out the
data for each of the applications involved. And then make scripts to
automate processing all the data. Also since the data is all changing
because of the Tiger changes and some changes I want to make, all the
applications need to be updated as well to deal with the new data <sigh>
lots of work and as Schuyler said "a slow process".
Thanks,
-Steve
> Thanks,
>
> Stephen
More information about the Geodata
mailing list