[Gdal-dev] OGR JOIN Support

Frank Warmerdam warmerdam at pobox.com
Mon Feb 17 22:29:50 EST 2003


Folks,

As part of a contract (via DM Solutions), it is desired for MapServer to
support attribute joins for OGR datasources.  I am intending to implement
access to this as a narrow extension to the existing support SQL syntax
supportted by OGR's ExecuteSQL() call.  In order to get decent speed for
attribute joins it is planned to optionally use the xBase 2.0 library
in place of the "simple" dbf support currently used in the shapelib driver.
The xBase based support should make it possible to quickly find the record
matching matching a particular key value when indexes are pre-built for the
columns in question.

I have attached my rough notes on how this would be incorporated.  I am
throwing it out here in case anyone has suggestions or would be interested
in utilizing the mechanism.

The work should be implemented over the next two weeks.

Best regards,

-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent




See http://www.w3schools.com/sql/sql_join.asp for SQL join syntax.


Requirements:

   o Arbitrary OGR->.dbf Joins.
   o Any field type for key.
   o Use xBase library for fast (index based) fetches from joined table.
   o Joined table may be in a different location than the file being joined to.


-----

SQL Syntax (for ExecuteSQL() method on OGRDataSource):

   SELECT <field_list> FROM <primary_table>
      LEFT JOIN ['DataSourceName'.]<join_table> [<join_table_alias>]
      ON <primary_table>.<key_field> = <join_table>.<key_field>
      WHERE <where_query>

<field_list>
   - This may be '*' indicating all fields from primary table, and secondary
     table, including duplication of the keyfield.  Any field name duplication
     will result in the joined tables name being prefix with the join name.
   - Fields from the primary table should not be qualified by a table they
     are from.
   - Fields from the secondary table *should* be qualified by the table they
     are from (ie. 'jointb.id').

Example:

SELECT * FROM cities
   LEFT JOIN 'province.dbf'.province ON cities.prov_id = province.id
   WHERE pop > 100000

SELECT name, pop, prov.name FROM cities
   LEFT JOIN '/u/data/province.dbf'.province prov ON cities.prov_id = prov.id
   WHERE pop > 100000 ORDER BY pop;


Some Details:
  o If no <join_table_alias> is provided, then the name of the join table will
    be used.
  o The ON clause must specificially be of the form primary key = join key.
  o The WHERE clause can only have conditions on fields from the primary record.
    It is evaluated before any joining is done.
  o More than one "LEFT JOIN / ON" clause may be supplied to append from
    multiple sources.
  o This mechanism can only be used against data sources that do not provide
    their own SQL interpretation, so it cannot be used with Oracle or PostGIS.
  o The 'DataSourceName' is a quoted filename for the OGRDataSource holding the
    secondary table.  It may be any OGR supported data source (including
    Oracle/PostGIS), but for the time being it is likely to be terribly
    inefficient for anything but .dbf, Oracle or PostGIS since a query will be
    issues on the joined table for each primary record selected.
  o There is no way to utilize geometry from the joined table.
  o Unlike a real left join only one record will be produced if the primary
    record has more than one join record associated with it.  In this case the
    first joined record matching the criteria will be used and the rest ignored.
  o Any primary records for which no related record in the join table will
    just have the joined fields set to NULL ... the primary record will still
    be returned.
  o The joined data source (when different from the main one) will only be
    kept open as long as the ExecuteSQL() results layer exists.  It won't be
    shared with anything else that is active.

-----

Overall Implementation:

  1) swq.c/swq.h: Parsing of the JOIN syntax.

  2) gdal/ogr/ogrsf_frmts/generic/ogr_gensql.{cpp,h}: Implement the joining.

  3) gdal/ogr/ogrsf_frmts/shape:
      - Incorporate alternate DBF support based on xBase 2.0
      - Implement fast evaluation of *simple* attribute queries of the form
        <field> = <value>, possible also <field> > <value> and <field> < <value>
      - Implement a pseudo-SQL statement to build indexes in xBase fields.

  4) MapServer: modify mapogr.cpp to allow an SQL statement in place of a layer
                number/name in the connection string.  The SQL will be executed
                with OGRDataSource::ExecuteSQL() to create a temporary layer.
      eg. CONNECTION 'spmap.dgn,SELECT * FROM elements'

-----

Future directions:

  o Implement a generic OGR mechanism for holding attribute field indexes that
    could be fairly easily applied to other formats to give similar performance
    characteristics to the xBase library.

  o Reqwrite SQL parser to be much more generic and support more of the multi
    table support from SQL, removing many limitations listed above.






More information about the Gdal-dev mailing list