[postgis-users] Union of 7 datasets

Obe, Regina robe.dnd at cityofboston.gov
Tue Aug 28 09:55:09 PDT 2007


Andreas, 

You would use the SQL UNION predicate like shown below (actually slight correction - it is speedier to use UNION ALL especially when you know there will not be dupiclates since it saves the processing of sorting to get a distinct UNION does an implicit distinct)  - so I have corrected below. 

-  - I happened to insert comments in between which may have confused you, but you should be able to run the whole thing as one statement or if you prefer because of speed issues run each insert separately.

So  would be
INSERT INTO sometable(field1, field2,field3,field4, newgeom)
 SELECT udo.field1, udo.field2, kai.field3, kai.field4, geomunion(udo.the_geom, kai.the_geom) AS newgeom
  FROM udo INNER JOIN kai ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
  UNION ALL
 -- the second select gives you udos that have no kais - your 15
  SELECT udo.field1, udo.field2, NULL As field3, NULL As field4, udo.the_geom AS newgeom
  FROM udo LEFT JOIN kai ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
  WHERE kai.the_geom IS NULL
 UNION ALL
 -- and the 3rd gives you kais that have no udos. - your 40
  SELECT NULL As field1, null As field2, kai.field3, kai.field4, kai.the_geom AS newgeom
  FROM kai LEFT JOIN udo ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
  WHERE udo.the_geom IS NULL;

 

----If FULL JOIN were to work (which in theory it should, but doesn't seem to with Postgis functions  (HINT HINT: would be nice if that worked and can be easily fixed (but sadly I think the issue is deeper than Postgis and Geos) - you could write the above much simpler as)

INSERT INTO sometable(field1, field2,field3,field4, newgeom)
 SELECT udo.field1, udo.field2, kai.field3, kai.field4, COALESCE(geomunion(udo.the_geom, kai.the_geom), udo.the_geom, kai.the_geom) AS newgeom
  FROM udo FULL JOIN kai ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))


Now if you have a lot of these and the tables are very similar in nature and named in a predictable way then the way I usually handle it is to write a pgsql function that dynamically generates the SQL statement to execute either via a FOR loop and then executes the built SQL or set of SQL statements or you could do a similar thing in some scripted language like perl or php.

I take it SQL and pgsql and all that is fairly new to you so it might be worthwhile (even though its a lot of typing) to do it the long cut and paste way if nothing more than an exercise to get a feel of how this all works and visualize the patterns at play.

I don't get the sense that I comprehend your full problem.   Unfortunately I don't have any experience with ArcGIS/ArcView ways of doing things, so I'm not quite sure if there is an equivalent way in PostGIS/PostgreSQL  world of doing the same kind of thing and what exactly that thing is you are doing in ArcGIS.

Union has 3 meanings in PostGIS/PostgreSQL (actually stuff your favorite spatial/DB here - all non-trivial spatial relational dbs behave more or less the same)  

1) unioning of record sets (standard ANSI SQL UNION/UNION ALL) - which is simply a way of stringing together a bunch of selects into a single result set as shown above

2)  unioning of 2 geometry fields like shown above with geomunion

3) Aggregate  geomunion - aggregate variant of the above geomunion function that groups and unions a whole setof geometries together but requires you are grouping by some field or set of fields).


I must also mention there is collect (non-aggregate and aggregate function) which often times is just as effective as the geomunion and in general much faster processor wise.

Hope that helps,

Regina

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net <mailto:postgis-users-bounces at postgis.refractions.net> ] On Behalf Of Andreas Laggner
Sent: Tuesday, August 28, 2007 9:13 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Union of 7 datasets

Obe, Regina schrieb:
>  Sounds like you would have to go with a full join type thing with workaround I described below (last example).  So If I understand you correctly then something like this - 
>
> --the first select gives you those records in both tables your 25 udokai
>  SELECT udo.field1, udo.field2, kai.field3, kai.field4, geomunion(udo.the_geom, kai.the_geom) AS newgeom
>  FROM udo INNER JOIN kai ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
>  UNION
> -- the second select gives you udos that have no kais - your 15
>  SELECT udo.field1, udo.field2, NULL As field3, NULL As field4, udo.the_geom AS newgeom
>  FROM udo LEFT JOIN kai ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
>  WHERE kai.the_geom IS NULL
>  UNION
> -- and the 3rd gives you kais that have no udos. - your 40
>  SELECT nul As field1, null As field2, kai.field3, kai.field4, kai.the_geom AS newgeom
>  FROM kai LEFT JOIN udo ON (udo.the_geom && kai.the_geom AND intersects(udo.the_geom, kai.the_geom))
>  WHERE udo.the_geom IS NULL
>
> Unioned together you should get
> 25 + 15 + 40 = ? 80
>
> Hope that helps,
> Regina
>
>  
UFF - that seems to be a complex question, a comprehensive operation and
a lot to type too!!! Thanks a million - i would have spent many time to
find out this query by myself! Ok..... how can i perform your "UNION": I
would just insert my selects one after another in one new table - will
that work well?
I want to aggregate around 20 datasets this way!? Do you know a method
to operate with more datasets (for example 7) in a effektiv manner and
not to do this three selects 6 times?
AND: Am i totally wrong with my aims? It seems to me that is an exotic
think to do with PostGis, but our projects/problems require to aggregate
datasets in almost all cases (around 90%) and that is exactly what one
of the basic functions "Union" in ArcView already carried out ten years
ago. So i would exspect there is a function to perform this operation
easier!? .....i just wonder.....


cheers      Andreas

> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net <mailto:postgis-users-bounces at postgis.refractions.net> ] On Behalf Of Andreas Laggner
> Sent: Tuesday, August 28, 2007 7:28 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Union of 7 datasets
>
> Obe, Regina schrieb:
>  
>> BASIC TRICK:  If you want to get all records with no matching including those that match - put what you would normally put in your WHERE clause in the JOIN clause and use a LEFT JOIN.
>>    
> ok - it seems to me i need some coaching......Yes, I want to get all
> records with no matching including those that match, but i also want to
> dissect the polygons that matches. I will try to explain again just to
> be sure you understood my aims: I have 20 polygons in udo and 50 in kai.
> 10 from kai are intersecting 5 from udo to 25 new polygons i call udokai
> (because they have attributes from udo AND kai). My result should have:
> 15 polygons with attributes only from udo, 40 polygons with attributes
> only from kai AND 25 with attributes from udo and kai! In most cases i
> want to do such a operation because i want to aggregate (spatial
> correct) different datasets!
> Which example from you fits best for this aim? I did not find any
> information on how left join works on the postgis or postgresql reference...
>  
>> 
>>
>> Unfortunately as I have come across before if you need an either or (if in table 1 or table 2 - ideally you would use a FULL JOIN but for some reason Postgres chokes when you use postgis functions in the FULL JOIN clause for the cases I have tried).  In that case you need a workaround using a set of UNIONS.
>> --------------------------------------------------------------------------
>> Simplest case - get all records in g1 one or union of g1 and g2 that intersect
>> NOTE: COALESCE is an ANSI SQL function that will return the first non-null - when you do a geomunion of a geometry and null you get null which is why we need COALESCE
>>
>> SELECT g1.field1, g1.field2, COALESCE(geomunion(g1.the_geom, g2.the_geom), g1.the_geom) AS newgeom
>> FROM g1 LEFT JOIN g2 ON (g1.the_geom && g2.the_geom AND intersects(g1.the_geom, g2.the_geom))
>>
>> -----------------------------------------------------------------------------------------
>> Either of case - get all geometries in g1 or g2 or union if there is a match - workaround for full joins not working right
>>
>> SELECT g1.field1, g1.field2, geomunion(g1.the_geom, g2.the_geom) AS newgeom
>> FROM g1 INNER JOIN g2 ON (g1.the_geom && g2.the_geom AND intersects(g1.the_geom, g2.the_geom))
>> UNION
>> SELECT g1.field1, g1.field2, g1.the_geom AS newgeom
>> FROM g1 LEFT JOIN g2 ON (g1.the_geom && g2.the_geom AND intersects(g1.the_geom, g2.the_geom))
>> WHERE g2.the_geom IS NULL
>> UNION
>> SELECT g1.field1, g1.field2, g2.the_geom AS newgeom
>> FROM g2 LEFT JOIN g1 ON (g1.the_geom && g2.the_geom AND intersects(g1.the_geom, g2.the_geom))
>> WHERE g1.the_geom IS NULL
>>
>>
>>
>> Hope that helps,
>> Regina
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net <mailto:postgis-users-bounces at postgis.refractions.net> ] On Behalf Of Andreas Laggner
>> Sent: Thursday, August 23, 2007 11:32 AM
>> To: PostGIS Users Discussion
>> Subject: Re: [postgis-users] Union of 7 datasets
>>
>> Obe, Regina schrieb:
>>  
>>    
>>> Oh the g1 g2 .. was just for example - I don't actually call my tables meaningless names like that. 
>>>
>>> You should be doing a join on something or have a where clause unless one of your tables has only one record.  Otherwise you are doing what is called a CROSS JOIN (cartesian product)  which gives you an nxm records where n is the number of records in your first table and m is the number in second table.  This is generally a big NO NO.  In certain rare cases you do want to do something like that, but is usually the exception.
>>>
>>>  
>>>    
>>>      
>> I think the records in my targed table must be added (more or less) and
>> not multiplied! My Aim is a table that contains the areas of all the 7
>> sourcetables and the information which refuges are inside and wich not.
>> Perhaps i must use the intersection!? If i do my query with a gist like
>> this: where t1.the_geom && t2.the_geom; than the operation is very fast
>> (about one minute) but i only have the Polygons covered by BOTH
>> datasets, and i want to have as well those, which are covered by one
>> dataset only!! But my operation without the where clause runs for 4
>> hours now - that shows me there is something wrong  ;-)
>>  
>>    
>>> Its hard for me to tell if you need a cartesian product in this case since I'm not quite sure what for example nature and biosphere represent.  I would guess that is wrong and you should first figure out which sets of say nature records you need to geomunion with biosphere and then join by that field or set of fields.
>>>
>>> It would help a bit if you could provide some sample questions you expect to answer with your statistical analysis.  My guess is you may be better off with more than one table.
>>>
>>>  
>>>    
>>>      
>> Sample question: give me all areas (all polygons) from germany where
>> landuse=arable land and soils=good and precipitation>600 and any (of 7)
>> reserves and so on.......
>> I need the values in my table to calculate the potential yield or other
>> things...
>> And i want to analyse such questions with a statistical software (SAS),
>> so it seems to me i need one table to import in SAS (or to query from
>> SAS directly to the postgresql).
>>
>> Thanks for your help, i will be back in my office in Monday.......Andreas
>>
>>  
>>    
>>> Which structure is best really boils down to what questions you hope to answer because one approach may make one question easy and fast and another question slow and cumbersome.
>>>
>>> Hope that helps,
>>> Regina
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net <mailto:postgis-users-bounces at postgis.refractions.net> ] On Behalf Of Andreas Laggner
>>> Sent: Thursday, August 23, 2007 10:04 AM
>>> To: PostGIS Users Discussion
>>> Subject: Re: [postgis-users] Union of 7 datasets
>>>
>>> Obe, Regina schrieb:
>>>  
>>>    
>>>      
>>>> Andreas,
>>>>
>>>> It would help to know what your table structure looks like and why do you want to put them all in a single geometry?
>>>>  
>>>>    
>>>>      
>>>>        
>>> My table structures are a little bit different. I want to have them in a
>>> single geometry to intersect them with other data and built a large
>>> table to run statistics over it (production site analysis over germany).
>>>  
>>>    
>>>      
>>>> I'm imaging you are you doing something like
>>>>
>>>> SELECT g1.somefield, geomunion(geomunion(g1.the_geom, g2.the_geom), g3.the_geom)
>>>> FROM g1 INNER JOIN g2 on g1.somefield = g2.somefield INNER JOIN g3 on g2.somefield = g3.somefield
>>>> GROUP BY g1.somefield
>>>>
>>>>  
>>>>    
>>>>      
>>>>        
>>> That´s an interesting method with inner join..why go you call your
>>> tables g1. g2. and so on?
>>> That´s my method i am using right now (geomunion 1 to 3 from 6), seems
>>> to be a pedestrian method :-(
>>>
>>> create table natura2000
>>>     (ffh_name character varying(80), ffh_land character varying(3), ffh
>>> smallint, ffh_id smallint,
>>>      spa_name character varying(80), spa_land character varying(3), spa
>>> smallint, spa_id smallint) with oids;
>>> select
>>> addgeometrycolumn('','natura2000','the_geom','31467','MULTIPOLYGON',2);
>>> alter table natura2000 drop constraint enforce_geotype_the_geom;
>>> insert into natura2000
>>>     select
>>> t1.ffh_name,t1.ffh_land,t1.ffh,t1.ffh_id,t2.spa_name,t2.spa_land,t2.spa,t2.spa_id,
>>>             geomunion(t1.the_geom, t2.the_geom)
>>>             from ffh_rep t1, spa_rep t2;
>>>
>>> create table sg71
>>>     (ffh_name character varying(80), ffh_land character varying(3), ffh
>>> smallint, ffh_id smallint,
>>>      spa_name character varying(80), spa_land character varying(3), spa
>>> smallint, spa_id smallint,
>>>      bio_name character varying(70), bio smallint, bio_id smallint) with
>>> oids;
>>> select addgeometrycolumn('','sg71','the_geom','31467','MULTIPOLYGON',2);
>>> alter table sg71 drop constraint enforce_geotype_the_geom;
>>> insert into sg71
>>>     select t1.ffh_name, t1.ffh_land, t1.ffh, t1.ffh_id, t1.spa_name,
>>> t1.spa_land, t1.spa, t1.spa_id,
>>>             t2.name,t2.bio,t2.bio_id,geomunion(t1.the_geom, t2.the_geom)
>>>             from natura2000 t1, biosphere t2;
>>>
>>> create table sg72
>>>        (ffh_name character varying(80), ffh_land character varying(3),
>>> ffh smallint, ffh_id smallint,
>>>     spa_name character varying(80), spa_land character varying(3), spa
>>> smallint, spa_id smallint,
>>>     bio_name character varying(70), bio smallint, bio_id smallint,
>>>     np_name character varying(60), np smallint, np_id smallint) with oids;
>>> select addgeometrycolumn('','sg72','the_geom','31467','MULTIPOLYGON',2);
>>> alter table sg72 drop constraint enforce_geotype_the_geom;
>>> insert into sg72
>>>     select t1.ffh_name, t1.ffh_land, t1.ffh, t1.ffh_id, t1.spa_name,
>>> t1.spa_land, t1.spa, t1.spa_id,
>>>             t1.bio_name,t1.bio,t1.bio_id,t2.np_name,t2.np,t2.np_id,
>>>             geomunion(t1.the_geom, t2.the_geom)
>>>             from sg71 t1, np t2;
>>> AND SO ON......
>>>  
>>>    
>>>      
>>>> or
>>>>
>>>> SELECT g1.somefield, geomunion(gt.the_geom)
>>>> FROM (SELECT somefield, the_geom FROM g1 UNION SELECT somefield, the_geom FROM g2 ...) gt
>>>> GROUP BY gt.somefield
>>>>
>>>>
>>>> If I have 7 different tables that have pretty much the same structure, but for logistical or other technical reasons (such as each has additional attributes distinct from one another), I need to keep them as separate tables, then I usually use inherited tables for that. That way when I need to join all datasets at once, I can simply query the parent table and it will automatically drill down to the child tables. Not sure if that helps more than it confuses your situation.
>>>>
>>>> Then instead of the above I can simply do
>>>>
>>>> SELEG myparenttable.somefield, geomunion(myparenttable.the_geom)
>>>> FROM myparenttable
>>>> GROUP by gh.somefield
>>>>
>>>>
>>>>  
>>>>    
>>>>      
>>>>        
>>> ok - i have to think about your suggestions......that´s my second week
>>> with postgis.
>>> Can you tell me from my SQL-Statements which method will be best? So i
>>> try to understand that one.....
>>>
>>> Thanks for your reply!!!
>>>  
>>>    
>>>      
>>>> Hope that helps,
>>>> Regina
>>>>
>>>> -----Original Message-----
>>>> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net <mailto:postgis-users-bounces at postgis.refractions.net> ] On Behalf Of Andreas Laggner
>>>> Sent: Thursday, August 23, 2007 9:11 AM
>>>> To: PostGis_Mailinglist
>>>> Subject: [postgis-users] Union of 7 datasets
>>>>
>>>> Hi users,
>>>>
>>>> i want to put together 7 datasets to have all the different refuges in
>>>> one table (and in one geometry). Am i doing right with 6 times geomunion
>>>> (that´s much to type with all the attributes) or is there a more
>>>> effective way?
>>>>
>>>> cheers Andreas
>>>>
>>>>  
>>>>    
>>>>      
>>>>        
>>>  
>>>    
>>>      
>>  
>>    
>
>
>  


--
Dipl. Geoökologe Andreas Laggner
Institut für Ländliche Räume (LR)
Bundesforschungsanstalt für Landwirtschaft (FAL)

Institute of Rural Studies
Federal Agricultural Research Centre (FAL)

Bundesallee 50
D-38116 Braunschweig

Tel.: (+49) (0)531 596 5515
Fax: (+49) (0)531 596 5599
E-mail: andreas.laggner at fal.de
Homepage: http://www.lr.fal.de/ <http://www.lr.fal.de/> 

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users <http://postgis.refractions.net/mailman/listinfo/postgis-users> 





-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070828/d42c0b1e/attachment.html>


More information about the postgis-users mailing list