[postgis-users] Tigerdata for AZ, AS and VI

Paragon Corporation lr at pcorp.us
Thu Dec 15 17:52:00 PST 2011


Ravi,

I thought I optimized those, but I may have missed something.  The best way
to see what is going on is to use

normalize_address  to see how its reading the values.

Anyrate -- please put in a bug ticket for this with some examples you are
trying and how long its taking.

http://trac.osgeo.org/postgis/newticket

Make sure to set the component to "tiger geocoder" in the ticket so it gets
assigned to me.

Thanks,
Regina
http://www.postgis.us


 

> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net 
> [mailto:postgis-users-bounces at postgis.refractions.net] On 
> Behalf Of Ravi ada
> Sent: Thursday, December 15, 2011 12:35 PM
> To: 'PostGIS Users Discussion'
> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
> 
> Thank you. I am able to speed up a bit by launching the query 
> for each state and doing 6 states at a time. I am able to see 
> all processors are being used with a 80-85% memory 
> utilization. However I noticed that geocode function is 
> taking forever to return for some addresses that are like 
> '100 29TH EAST ST.' or 'I-35 HIGHWAY'. Some states have the 
> convention of using numbers for the street names, it takes 
> forever to geocode these addresses. Numbered streets and 
> Highway service roads are the one taking the most time. I 
> wonder why? Any ideas what we need to speed these up?
> 
> Thanks,
> Ravi Ada
> 
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On 
> Behalf Of Stephen Woodbridge
> Sent: Wednesday, December 14, 2011 1:30 PM
> To: postgis-users at postgis.refractions.net
> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
> 
> On 12/14/2011 10:12 AM, Ravi ada wrote:
> > Thanks Steve.
> > I did not get when you said '2. normalize the names as you load the 
> > data', is this the step we need to do manually or load scripts 
> > automatically do this step?
> 
> the load scripts do this automatically.
> 
> > I am normalizing my addresses before querying the reference data 
> > (tiger data).  Please clarify.
> 
> You should use geocode() not geocode_address(), because 
> geocode() will normalize the address in the same way the the 
> reference addresses are normalized. If you split the address 
> into fields and call
> geocode_address() this is NOT the same as normalizing the 
> address. If you want to be able to get good matches, you have 
> to use the same normalize function for both the reference and 
> the input addresses.
> 
> These functions are not about performance they are about 
> using the tool the correct way.
> 
> -Steve
> 
> > Thanks
> > Ravi Ada
> >
> > -----Original Message-----
> > From: postgis-users-bounces at postgis.refractions.net
> > [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of 
> > Stephen Woodbridge
> > Sent: Wednesday, December 14, 2011 8:14 AM
> > To: postgis-users at postgis.refractions.net
> > Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
> >
> > Ravi,
> >
> > The process for geocoding follows this:
> >
> > Load the data:
> > 1. get a reference set of streets (ie: the Tiger data) 2. normalize 
> > the names as you load the data 3. build the indexes you 
> need for the 
> > queries
> >
> > Query for an address:
> > 1. normalize the address on input
> > 2. query the normalized reference
> >
> > Ok, so you know most of this because you have already done 
> it, but the 
> > important part here is the you normalize BOTH the reference 
> data and 
> > the input to a query. This resolves things like:
> >
> > main street != main st
> >
> > because the normalize parses the addresses and converts them into a 
> > normalized standard form so that you can match. Yes it 
> takes time to 
> > normalize the request, but if you don't normalize it, then 
> there is a 
> > good change that you will not match an appropriate street in the 
> > reference
> set.
> >
> > -Steve
> >
> > On 12/14/2011 9:06 AM, Ravi Ada wrote:
> >> Regina,
> >>
> >> Thanks so much for the reply. I ran the 
> >> missing_indexes_generate_script(),
> >> actually it did not return anything, I am assuming all the indexes 
> >> are in place. That may be because I ran install_missing_indexes()
> earlier.
> >> I changed the debug flag in geocode_address and it produced a very 
> >> long query that it runs to geocode the address. I tried to cut and 
> >> paste the query to run the plan, I am getting errors, I 
> will figure 
> >> that
> > out.
> >>
> >> My question is, do we use gecode or geocode_address for faster 
> >> querying? I noticed that geocode_address takes the 
> normalized address 
> >> where as geocode takes address as string parameter. By adding 
> >> additional normalize_address function when doing the 
> geocode_address 
> >> akes
> > it run any faster?
> >>
> >>
> >> Thanks
> >> Ravi Ada
> >>
> >>
> >>
> >>
> >> On Wed, 14 Dec 2011 04:18:27 -0500, Paragon Corporation wrote
> >>>> I just don't understand why the geocode function takes 
> so long to 
> >>>> return the coordinates. I am sure some of you on this list might 
> >>>> have done the batch geocoding millions of addresses. I may be 
> >>>> missing just a simple configuration which might make a 
> whole lot of 
> >>>> difference in the speed. I don't know what it is. I am following 
> >>>> the examples exactly from this link 
> >>>> 
> (http://postgis.refractions.net/documentation/manual-svn/Geocode.ht
> >>>> m
> >>>> l)
> >>>>
> >>>> If someone is familiar with the problem willing to help me using 
> >>>> GoTo Meeting connection to my machine, I can arrange that too. I 
> >>>> just have to move along with my project and meet the 
> deadlines. I 
> >>>> am already delayed, everybody in my team asking me for 
> this everyday.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Ravi Ada
> >>>>
> >>>>
> >>> Ravi,
> >>>
> >>> Sorry been busy with raster stuff so haven't been tuned into this 
> >>> discussion.
> >>>
> >>> 1) The indexes the loader generates are not the only ones needed.
> >>> Initially I was constantly changing the loader script, 
> but since we 
> >>> were changing decisions as we changed code and optimal indexes 
> >>> needed with aeach change required changing indexes, which indexes 
> >>> would be best, I created a function that would put them in rather 
> >>> than bothering with the loader (since a lot of people 
> would already 
> >>> have their data loaded)
> >>>
> >>> Have you tried running that.  I suspect you are just 
> missing indexes 
> >>> as the timings you are getting are what I used to get earlier on.
> >>>
> >>> If you haven't run the update script (which runs this 
> routine anyway)
> >>>    or run this to get generated script for indexes you 
> are missing 
> >>> you should.
> >>>
> >>> 
> http://www.postgis.org/documentation/manual-svn/Missing_Indexes_Gene
> >>> r
> >>> ate_Scr
> >>> ipt.html
> >>>
> >>> 2) There are a couple of other things to note: First 
> address you do 
> >>> around an area can take a lot more time because of the 
> data caching 
> >>> effects in postgresql.  So for the example in the docs 
> you describe.
> >>>
> >>> I can do a geocode of 75 State Street,Boston, MA  -- and if I 
> >>> haven't done any geocoding in a while that takes like 1-3 seconds
> >>>
> >>> Then if I do 80 State Street, Boston, MA -- that subsequent takes 
> >>> anywhere from 60 ms - 150 ms.
> >>> I also don't have all the states loaded since I only 
> needed it for 
> >>> about 6 states.  thought that should just increase the 
> planner time 
> >>> rather than later times.
> >>>
> >>> 3) For debugging performance there is a variable in the 
> >>> geocode_address function called var_debug.  Its false by default, 
> >>> change
> > it to true.
> >>> That spits out the sql being run and is a better sql to 
> pass to the 
> >>> planner to check.
> >>>
> >>> We were hoping to make these debugging features more publically 
> >>> exposed e.g via a config table, but haven't had the time 
> to do that.
> >>>
> >>> Hope this all helps,
> >>> Regina
> >>> http://www.postgis.us
> >>>
> >>> _______________________________________________
> >>> postgis-users mailing list
> >>> postgis-users at postgis.refractions.net
> >>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >>
> >>
> >> Thanks,
> >> Ravi Ada
> >> 918-630-7381
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 





More information about the postgis-users mailing list