[postgis-users] Tigerdata for AZ, AS and VI
Ravi ada
raviada at dobeyond.com
Thu Dec 15 09:34:53 PST 2011
Thank you. I am able to speed up a bit by launching the query for each state
and doing 6 states at a time. I am able to see all processors are being used
with a 80-85% memory utilization. However I noticed that geocode function is
taking forever to return for some addresses that are like '100 29TH EAST
ST.' or 'I-35 HIGHWAY'. Some states have the convention of using numbers for
the street names, it takes forever to geocode these addresses. Numbered
streets and Highway service roads are the one taking the most time. I wonder
why? Any ideas what we need to speed these up?
Thanks,
Ravi Ada
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Stephen
Woodbridge
Sent: Wednesday, December 14, 2011 1:30 PM
To: postgis-users at postgis.refractions.net
Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
On 12/14/2011 10:12 AM, Ravi ada wrote:
> Thanks Steve.
> I did not get when you said '2. normalize the names as you load the
> data', is this the step we need to do manually or load scripts
> automatically do this step?
the load scripts do this automatically.
> I am normalizing my addresses before querying the reference data
> (tiger data). Please clarify.
You should use geocode() not geocode_address(), because geocode() will
normalize the address in the same way the the reference addresses are
normalized. If you split the address into fields and call
geocode_address() this is NOT the same as normalizing the address. If you
want to be able to get good matches, you have to use the same normalize
function for both the reference and the input addresses.
These functions are not about performance they are about using the tool the
correct way.
-Steve
> Thanks
> Ravi Ada
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
> Stephen Woodbridge
> Sent: Wednesday, December 14, 2011 8:14 AM
> To: postgis-users at postgis.refractions.net
> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>
> Ravi,
>
> The process for geocoding follows this:
>
> Load the data:
> 1. get a reference set of streets (ie: the Tiger data) 2. normalize
> the names as you load the data 3. build the indexes you need for the
> queries
>
> Query for an address:
> 1. normalize the address on input
> 2. query the normalized reference
>
> Ok, so you know most of this because you have already done it, but the
> important part here is the you normalize BOTH the reference data and
> the input to a query. This resolves things like:
>
> main street != main st
>
> because the normalize parses the addresses and converts them into a
> normalized standard form so that you can match. Yes it takes time to
> normalize the request, but if you don't normalize it, then there is a
> good change that you will not match an appropriate street in the reference
set.
>
> -Steve
>
> On 12/14/2011 9:06 AM, Ravi Ada wrote:
>> Regina,
>>
>> Thanks so much for the reply. I ran the
>> missing_indexes_generate_script(),
>> actually it did not return anything, I am assuming all the indexes
>> are in place. That may be because I ran install_missing_indexes()
earlier.
>> I changed the debug flag in geocode_address and it produced a very
>> long query that it runs to geocode the address. I tried to cut and
>> paste the query to run the plan, I am getting errors, I will figure
>> that
> out.
>>
>> My question is, do we use gecode or geocode_address for faster
>> querying? I noticed that geocode_address takes the normalized address
>> where as geocode takes address as string parameter. By adding
>> additional normalize_address function when doing the geocode_address
>> akes
> it run any faster?
>>
>>
>> Thanks
>> Ravi Ada
>>
>>
>>
>>
>> On Wed, 14 Dec 2011 04:18:27 -0500, Paragon Corporation wrote
>>>> I just don't understand why the geocode function takes so long to
>>>> return the coordinates. I am sure some of you on this list might
>>>> have done the batch geocoding millions of addresses. I may be
>>>> missing just a simple configuration which might make a whole lot of
>>>> difference in the speed. I don't know what it is. I am following
>>>> the examples exactly from this link
>>>> (http://postgis.refractions.net/documentation/manual-svn/Geocode.ht
>>>> m
>>>> l)
>>>>
>>>> If someone is familiar with the problem willing to help me using
>>>> GoTo Meeting connection to my machine, I can arrange that too. I
>>>> just have to move along with my project and meet the deadlines. I
>>>> am already delayed, everybody in my team asking me for this everyday.
>>>>
>>>>
>>>> Thanks,
>>>> Ravi Ada
>>>>
>>>>
>>> Ravi,
>>>
>>> Sorry been busy with raster stuff so haven't been tuned into this
>>> discussion.
>>>
>>> 1) The indexes the loader generates are not the only ones needed.
>>> Initially I was constantly changing the loader script, but since we
>>> were changing decisions as we changed code and optimal indexes
>>> needed with aeach change required changing indexes, which indexes
>>> would be best, I created a function that would put them in rather
>>> than bothering with the loader (since a lot of people would already
>>> have their data loaded)
>>>
>>> Have you tried running that. I suspect you are just missing indexes
>>> as the timings you are getting are what I used to get earlier on.
>>>
>>> If you haven't run the update script (which runs this routine anyway)
>>> or run this to get generated script for indexes you are missing
>>> you should.
>>>
>>> http://www.postgis.org/documentation/manual-svn/Missing_Indexes_Gene
>>> r
>>> ate_Scr
>>> ipt.html
>>>
>>> 2) There are a couple of other things to note: First address you do
>>> around an area can take a lot more time because of the data caching
>>> effects in postgresql. So for the example in the docs you describe.
>>>
>>> I can do a geocode of 75 State Street,Boston, MA -- and if I
>>> haven't done any geocoding in a while that takes like 1-3 seconds
>>>
>>> Then if I do 80 State Street, Boston, MA -- that subsequent takes
>>> anywhere from 60 ms - 150 ms.
>>> I also don't have all the states loaded since I only needed it for
>>> about 6 states. thought that should just increase the planner time
>>> rather than later times.
>>>
>>> 3) For debugging performance there is a variable in the
>>> geocode_address function called var_debug. Its false by default,
>>> change
> it to true.
>>> That spits out the sql being run and is a better sql to pass to the
>>> planner to check.
>>>
>>> We were hoping to make these debugging features more publically
>>> exposed e.g via a config table, but haven't had the time to do that.
>>>
>>> Hope this all helps,
>>> Regina
>>> http://www.postgis.us
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>> Thanks,
>> Ravi Ada
>> 918-630-7381
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list