[postgis-users] joining "nearest" data

Kirk Wythers kwythers at umn.edu
Wed Feb 14 13:43:57 PST 2007


On Feb 14, 2007, at 3:04 PM, Obe, Regina wrote:

> Yes sorry for the typo.  Oh I kind of see what you are doing now  
> with the observe id after rereading what you had wrote.  So the  
> query may be a little trickier  than I had proposed because I  
> realize now that solars are observation points.
>
> Do you always take these observations from the same sites -- e.g.  
> for some days would you have solar data for a particular site and  
> then for a different day no solar data for that site - but solar  
> data for a different site or the sites are always consistent?   
> Sorry if that's  a stupid question - I'm not too familiar with how  
> this kind of data is collected.

No - not stupid at all. These data are from NOAA. The sites are  
always the same. For example site A, B, C. But they are historic data  
going back to the turn of the century. So that means that some sites  
may only have a certain period of observations (say 1940 - 1970, then  
nothing), some sites have only weather data, some sites have only  
solar data, and some sites have both.

Consider these three sites: A B C

A has weather and solar from 1900 - 2007 (but they are in different  
tables weather and solar)
B has weather from 1951 - 2007 (in the weather table)
C has solar from 1951 - 2007 (in the solar table... but it is pretty  
close location to B)

The goal would be to join the two tables with site A showing weather  
and solar data, Site B showing the site B weather and the solar from  
site C since it is the closest available solar data.





> If the sites you get solar data change from day to day, then you  
> might have to do a compound key join in the mindist query with date  
> and station_id.

No... one site should resolve as being the closest to another site...  
oh... I see the problem, what if one site has the closest available  
solar for some number of years, say up to 1980, then another site  
appears in the record with solar dat from 1990 - present that is  
actually closer than the first site.

This is getting complicated!

>
> Regarding your question about views.  Yes they are almost always  
> slower than caching the data in a table.  I usually try the view  
> out first though since it provides more flexibility and see how  
> slow it is and then resort to a static table if it is too slow.   
> Sometimes the speed difference is not that noticeable especially if  
> you have a clustered index on the station_id field.  Sometimes I  
> think the view can actually be faster because in this case your  
> station id table is probably much smaller than your weather solar  
> data so the number of geometries you'd be going thru would be  
> smaller and the planner would probably come out with a better  
> plan.  If the geometries are repeated in your solar/weather table,  
> then it may perceive those geometries as being different when they  
> are the same so may actually be doing more processing work when  
> doing distance checks.

I asked because original suggestion (with the corrected typos) is  
still running (several hours now) and I even limited it to one  
station and one year where there was both weather and solar data

WHERE sites.station_id = 212248 AND w.year = 1948;

Would this be easier if I created new solar and weather tables that  
included the geometry colum in addition to the station_id columns.  
Something like:

CREATE TABLE solar_full
(
   obs_id text NOT NULL,
   station_id integer,
   longname character varying,
   lat double precision,
   lon double precision,
   thepoint_meter geometry,
   date date,
   "year" smallint,
   "month" smallint,
   "day" smallint,
   doy smallint,
   rad real,
   dss smallint,
   par real,
CONSTRAINT solar_full_pkey PRIMARY KEY (obs_id)
)
WITHOUT OIDS;




>
> ________________________________
>
> From: postgis-users-bounces at postgis.refractions.net on behalf of  
> Kirk Wythers
> Sent: Wed 2/14/2007 3:12 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] joining "nearest" data
>
>
>
> On Feb 14, 2007, at 6:20 AM, Obe, Regina wrote:
>
>
> 	
> 	SELECT w.obs_id,  sites.station_id, sites.longname, sites.lat,  
> sites.lon, sites.thepoint_meter,
> 	w.date, w.year, w.month, w.day, w.doy, w.precip, w.tmin, w.tmax,  
> w.tmean, w.snowfall, w.snowdepth,
> 	s.date, s.year, s.month, s.day, s.doy, s.rad, s.dss, s.par
> 	FROM sites INNER JOIN solar s ON sites.station_id = s.station_id  
> LEFT OUTER JOIN weather w ON w.station_id = sites.station_id
> 	UNION
> 	
> 	SELECT w.obs_id,  sites.station_id, sites.longname, sites.lat,  
> sites.lon, sites.thepoint_meter,
> 	w.date, w.year, w.month, w.day, w.doy, w.precip, w.tmin, w.tmax,  
> w.tmean, w.snowfall, w.snowdepth,
> 	s.date, s.year, s.month, s.day, s.doy, s.rad, s.dss, s.par
> 	FROM sites INNER JOIN
> 	(SELECT s2.station_id, min(distance(s2.thepoint_meter,  
> sol.thepoint_meter))  As mindist
> 	FROM station s2 LEFT JOIN solar nosols ON s2.station_id =  
> nosols.station_id CROSS JOIN vwsolar sol
>
>
> Regina, did you mean FROM sites s2 LEFT JOIN here? There is no  
> "station" table, nor have we aliased one
>
>
>
> 	
> 	
> 	
> 	WHERE nosols.station_id IS NULL
> 	GROUP BY s2.station_id) solmin
> 	    ON sites.station_id = solmin.station_id INNER JOIN vwsolar s  
> ON distance(sites.thepoint_meter, s.thepoint_meter) =  
> solmin.mindistance LEFT JOIN weather w ON w.station_id =  
> sites.station_id)
>
>
>
>
>
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
> <winmail.dat>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list