[SoC] GSoC 2020 Final Report (CQL implementation on pygeoapi)

Farheen Bano farheenbano94 at gmail.com
Mon Aug 31 10:44:51 PDT 2020


This is my final report for GSoC 2020. The updated report can also be found
at the project wiki page. The forked repo of the project can be found here

implementation on pygeoapi

Summer of Code 2020

   - Author - Farheen Bano <https://wiki.osgeo.org/wiki/User:FarheenBano>
   - Mentor 1 - Francesco Bartoli
   - Mentor 2 - Jorge Samuel Mendes de Jesus
   - Organization - OSGeo, <https://www.osgeo.org/> pygeoapi


pygeoapi is a Python server implementation of the OGC API suite of
standards. OGC API standards define modular API building blocks to
spatially enable Web APIs in a consistent way. The OGC API - Features
standard specifies the fundamental API building blocks for interacting with
features. pygeoapi provides the capability for organizations to deploy a
RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. The code is
structured to provide functionality via plugins, so data can be fetched
from any backend, such as remote services or local files.

Querying is one of the fundamental operations performed on a collection of
features. Its purpose is to obtain a subset of the data containing the
feature instances that satisfy some filtering criteria. This project
implements such enhanced filtering criteria in requests to the server. CQL
is used to specify how resource instances in a source collection should be
filtered to identify a result set. CQL suits query operations because it
can be written in a human-readable format, which makes it a good choice for
identifying the subset of resources that should be included in a response
document. Each resource instance in the source collection is evaluated
against a filter expression, and the overall filter expression always
evaluates to true or false. If it evaluates to true, the resource instance
satisfies the expression and is included in the result set. If it evaluates
to false, the resource instance is not in the result set.
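
The true/false semantics described above can be illustrated with a minimal
Python sketch. The sample features, the filter_features helper and the
hand-written expression function are hypothetical and only stand in for a
real CQL expression; this is not how pygeoapi itself is structured.

```python
# Hypothetical sample features in GeoJSON style; the data is illustrative only.
features = [
    {"id": 1, "properties": {"name": "Lake A", "area": 12.5}},
    {"id": 2, "properties": {"name": "Lake B", "area": 3.0}},
    {"id": 3, "properties": {"name": "Pond C", "area": 0.4}},
]


def filter_features(features, expression):
    """Keep only the features for which the expression evaluates to True."""
    return [f for f in features if expression(f["properties"])]


def expression(props):
    """Stand-in for the CQL text: area > 1 AND name LIKE 'Lake%'."""
    return props["area"] > 1 and props["name"].startswith("Lake")


result_ids = [f["id"] for f in filter_features(features, expression)]
print(result_ids)  # [1, 2]
```

Every feature is tested exactly once, and only those for which the
expression is true end up in the result set.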

This project is based on OGC API - Features - Part 3: Common Query Language
document <http://docs.opengeospatial.org/DRAFTS/19-079.html> that defines
the schema for a JSON document that exposes the set of properties or keys
that may be used to construct CQL expressions for pygeoapi.

pygeoapi has various data provider plugins such as OGR, GeoJSON, CSV,
PostgreSQL, SQLite, Elasticsearch etc. Feature filtering had not yet been
implemented at the API level in pygeoapi. Implementing this functionality
gives pygeoapi an advantage in the GIS community, because it helps deliver
appropriate results based on the set of conditions specified by the client,
which increases the client's usage capabilities.

Once the CQL feature filter implementation with JSON encoding is developed,
any combination of bbox, datetime and parameters for filtering on feature
properties will be allowed on pygeoapi. The requirements on these
parameters imply that only features matching all the predicates are in the
result set, i.e., the logical operator between the predicates is 'AND'. The
API definition may be used to determine details, e.g., on filter
parameters, depending on the needs of the client. Such clients are in
general able to use multiple APIs as long as they implement OGC API
before GSoC 2020

   - The pygeoapi package integrates Flask as a web framework for defining
   the API routes/endpoints and for WSGI support. Web Server Gateway Interface
   (WSGI) is a standard for forwarding requests to web applications written in
   Python. pygeoapi supports structured metadata about a deployed
   instance, and is also capable of presenting feature data as structured data.
   - The REST structure and payload are defined using YAML file structures.
   - The API is accessible at the /openapi endpoint.
   - The API page has a REST description as well as integrated clients that can
   be used to send requests to the REST endpoints and see the responses.
   - Each dataset is represented as a REST endpoint under collections.
   - The service collection metadata contains a description of the
   collections provided by the server.
   - The features/items composing the data are aggregated on the /items
   endpoint. At this REST endpoint it is possible to obtain the whole dataset,
   or to restrict the features/items by a numerical limit, bounding box, time
   stamp, or paging (start index).
   - Each feature in the dataset has a specific identifier (note
   that the identifier is not part of the JSON properties).
   - pygeoapi has implemented various data provider plugins.
   - pygeoapi serves custom output formats.
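
As an illustration of the /items restrictions listed above, the following
sketch builds a request URL with the Python standard library. The host, port
and the collection id "lakes" are assumptions made for the example, and the
query parameter names (limit, bbox, datetime, startindex) are assumed to
match the pygeoapi deployment being queried.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # assumed local deployment
COLLECTION = "lakes"  # hypothetical collection id


def items_url(limit=None, bbox=None, datetime=None, startindex=None):
    """Build an /items URL from the optional restrictions."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if bbox is not None:
        # bbox is sent as "minx,miny,maxx,maxy"
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime is not None:
        params["datetime"] = datetime
    if startindex is not None:
        params["startindex"] = startindex
    # Keep commas readable in the bbox value instead of percent-encoding them.
    query = urlencode(params, safe=",")
    url = f"{BASE_URL}/collections/{COLLECTION}/items"
    return f"{url}?{query}" if query else url


url = items_url(limit=10, bbox=(-10, 40, 5, 55), startindex=20)
print(url)
```

The sketch only builds the URL; sending it with an HTTP client is left out so
that the example does not depend on a running server.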

in pygeoapi during GSoC 2020

CQL filter capabilities are added to pygeoapi as an extension to the
software. This implementation allows users to request features using one or
more simple or complex filter expressions, providing enhanced flexibility
and better user control over the response results. The added
implementations are as follows:


   OpenAPI Documentation: Implementation of the CQL extension on pygeoapi
   following OGC standards, and generation of an OpenAPI document with CQL
   specifications. Whether a data provider supports the CQL filter extension
   is decided from the configuration file. The related CQL schema,
   components and filter parameters are added to the document.

   Abstract Syntax Tree for CQL filter expression: Validation of CQL filter
   expressions and generation of an Abstract Syntax Tree (AST) from the
   filter expression, using the lexer and parser from the pycql library.
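
To show what validating a filter expression and turning it into an AST
involves, here is a small self-contained sketch. It is not the pycql lexer
and parser used in the project: it is a hand-written recursive descent
parser for a tiny, assumed subset of CQL (comparisons, AND, OR, NOT and
parentheses), and the nested-tuple AST format is hypothetical.

```python
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>-?\d+(?:\.\d+)?)
      | '(?P<string>[^']*)'
      | (?P<op><=|>=|<>|=|<|>)
      | (?P<paren>[()])
      | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
    )""", re.VERBOSE)


def tokenize(text):
    """Split the filter text into (kind, value) tokens, or raise ValueError."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"invalid CQL near position {pos}")
        pos, kind = match.end(), match.lastgroup
        value = match.group(kind)
        if kind == "number":
            tokens.append(("lit", float(value) if "." in value else int(value)))
        elif kind == "string":
            tokens.append(("lit", value))
        elif kind == "word" and value.upper() in ("AND", "OR", "NOT"):
            tokens.append(("kw", value.upper()))
        elif kind == "word":
            tokens.append(("attr", value))
        else:
            tokens.append((kind, value))
    return tokens


class Parser:
    """Recursive descent parser; precedence is NOT, then AND, then OR."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ValueError(f"unexpected token {tok!r}")
        self.pos += 1
        return tok[1]

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == ("kw", "OR"):
            self.pos += 1
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() == ("kw", "AND"):
            self.pos += 1
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == ("kw", "NOT"):
            self.pos += 1
            return ("not", self.parse_not())
        if self.peek() == ("paren", "("):
            self.pos += 1
            node = self.parse_or()
            self.take("paren", ")")
            return node
        attr = self.take("attr")
        op = self.take("op")
        return (op, attr, self.take("lit"))


def parse(text):
    """Validate the filter text and return its AST as nested tuples."""
    parser = Parser(tokenize(text))
    ast = parser.parse_or()
    if parser.peek() != (None, None):
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return ast


ast = parse("pop > 1000 AND NOT (name = 'x' OR name = 'y')")
print(ast)
```

An invalid expression such as "pop >" raises ValueError, which is the
validation step; a valid one yields a tree that a data provider can walk.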

   CQL for CSV and GeoJSON data providers: Evaluation of the Abstract
   Syntax Tree to filter the feature collections served by the CSV and
   GeoJSON data providers. The pycql library implements evaluation against
   databases through an ORM, but the data providers in pygeoapi do not work
   with an ORM. The evaluation of all the CQL query operations was therefore
   developed from scratch, using an efficient methodology. The evaluated
   output is the response from the API.
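
A rough sketch of such an in-memory evaluation is given below. It assumes the
AST is available as nested tuples, which is a hypothetical format chosen for
the example and not the node classes of pycql, and it only covers
comparisons and logical operators. The sample features are illustrative.

```python
import operator

# Comparison operators of the assumed AST mapped to Python functions.
COMPARISONS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def evaluate(node, properties):
    """Recursively evaluate an AST node against one feature's properties."""
    kind = node[0]
    if kind == "and":
        return evaluate(node[1], properties) and evaluate(node[2], properties)
    if kind == "or":
        return evaluate(node[1], properties) or evaluate(node[2], properties)
    if kind == "not":
        return not evaluate(node[1], properties)
    _, attribute, literal = node
    value = properties.get(attribute)
    if value is None:
        # A missing or null property never satisfies a comparison here.
        return False
    return COMPARISONS[kind](value, literal)


# Hypothetical GeoJSON-style features. Values read from a CSV file would be
# strings and would need type conversion before comparison.
features = [
    {"type": "Feature", "id": "a", "properties": {"name": "Lisbon", "pop": 505000}},
    {"type": "Feature", "id": "b", "properties": {"name": "Porto", "pop": 232000}},
    {"type": "Feature", "id": "c", "properties": {"name": "Faro", "pop": None}},
]

# Assumed AST for: pop > 200000 AND NOT name = 'Porto'
ast = ("and", (">", "pop", 200000), ("not", ("=", "name", "Porto")))

matching_ids = [f["id"] for f in features if evaluate(ast, f["properties"])]
print(matching_ids)  # ['a']
```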

   CQL for SQLite data provider: Evaluation of the Abstract Syntax Tree to
   filter the feature collections served by the SQLite data provider. The AST
   of the CQL filter request is translated into an SQL query, which is then
   sent as a request to the database. The evaluated output from the SQLite
   database is the response from the API.
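
The translation step can be sketched with the standard sqlite3 module. The
nested-tuple AST, the to_sql helper and the lakes table are all hypothetical
and only illustrate the idea; literals are passed as bound parameters and
column names are checked against a known set, since identifiers cannot be
bound as parameters.

```python
import sqlite3

# Comparison operators accepted by this sketch.
SQL_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def to_sql(node, columns, params):
    """Translate an AST node into a WHERE fragment, collecting parameters."""
    kind = node[0]
    if kind in ("and", "or"):
        left = to_sql(node[1], columns, params)
        right = to_sql(node[2], columns, params)
        return f"({left} {kind.upper()} {right})"
    if kind == "not":
        return f"(NOT {to_sql(node[1], columns, params)})"
    if kind not in SQL_OPERATORS:
        raise ValueError(f"unsupported operator: {kind}")
    _, attribute, literal = node
    if attribute not in columns:
        raise ValueError(f"unknown column: {attribute}")
    params.append(literal)  # appended in the same order as the placeholders
    return f'"{attribute}" {kind} ?'


# Hypothetical in-memory table standing in for a provider's data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lakes (id INTEGER PRIMARY KEY, name TEXT, area REAL)")
conn.executemany(
    "INSERT INTO lakes VALUES (?, ?, ?)",
    [(1, "Lake A", 12.5), (2, "Lake B", 3.0), (3, "Pond C", 0.4)],
)

# Assumed AST for: area > 1 AND name <> 'Lake B'
ast = ("and", (">", "area", 1), ("<>", "name", "Lake B"))
params = []
where = to_sql(ast, {"id", "name", "area"}, params)
sql = f"SELECT id FROM lakes WHERE {where} ORDER BY id"
rows = [row[0] for row in conn.execute(sql, params)]
print(where, params, rows)
```

The same approach carries over to other database adapters, with the
placeholder style adjusted to whatever the adapter expects.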

   CQL for PostgreSQL data provider: Evaluation of the Abstract Syntax
   Tree to filter the feature collections served by the PostgreSQL data
   provider. As with the SQLite queries, the AST of the CQL filter request is
   translated into a PostgreSQL query following the syntax of the psycopg2
   database adapter. The query is then sent as a request to the database. The
   evaluated output from the PostgreSQL database is the response from the API.

   CQL predicates: Implementation of the following CQL predicates in
   pygeoapi to support filtering functionality on features: *Simple
   Condition Predicate, Combination Predicate, Not Condition Predicate,
   Between Predicate, Like Predicate, In Predicate, Null Predicate, BBox
   Predicate, Spatial Predicate and Temporal Predicate*.
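
A few of these predicates can be sketched as plain Python functions. The
function names are hypothetical, the LIKE wildcards are assumed to be % (any
run of characters) and _ (exactly one character), and bounding boxes are
assumed to be (minx, miny, maxx, maxy) tuples. This is an illustration of the
semantics, not the code added to pygeoapi.

```python
import re


def like(value, pattern):
    """LIKE predicate: % matches any run of characters, _ matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def between(value, lower, upper):
    """BETWEEN predicate with inclusive bounds."""
    return lower <= value <= upper


def in_list(value, candidates):
    """IN predicate: the value equals one of the listed candidates."""
    return value in candidates


def is_null(properties, attribute):
    """NULL predicate: the attribute is missing or explicitly null."""
    return properties.get(attribute) is None


def bbox_intersects(a, b):
    """BBOX predicate for two (minx, miny, maxx, maxy) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


print(like("Lake Geneva", "Lake%"))                   # True
print(between(5, 1, 10))                              # True
print(in_list("b", ["a", "b"]))                       # True
print(is_null({"name": None}, "name"))                # True
print(bbox_intersects((0, 0, 2, 2), (1, 1, 3, 3)))    # True
```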


Much more can be done with CQL query filters. CQL expressions can also be
written in JSON to specify the filters. Within the scope of GSoC 2020, CQL
is implemented for the CSV, GeoJSON, SQLite and PostgreSQL data providers,
but it can be extended to other data providers that pygeoapi supports, such
as OGR, Elasticsearch and MongoDB.

Deciding which object-oriented design to use for the CQL implementation was
a major challenge, and settling on a proper CQL filtering approach was
difficult in the beginning. As this was a major change in the architecture
of pygeoapi, a thorough analysis of various approaches was needed. After
rigorous discussion with the mentors we decided to implement CQL at the
provider level and not at the collection level, because pygeoapi supports
multiple data providers and not every provider necessarily supports CQL
filter functionality. Also, every provider needs its own approach to
filtering the feature list. The next challenge was the requirement to write
a generic class, CQLHandler, that acts as a plugin for all the data
providers. I am glad that I was able to create a proper architecture for
the CQL implementation on CSV, GeoJSON, SQLite and PostgreSQL.

A tutorial on how to use the added functionality, with detailed stepwise
generation and evaluation of the CQL endpoint, can be found at the following
link. Steps

The operations implemented for CQL extension on pygeoapi can be seen here.
to Repository: pygeoapi openapi-cql-extension branch

*(Pull request waiting to be merged)*
to Project's OSGeo Wiki-page: Wiki-Page Link
documentation of CQL extension: RTD
links related to the project Git Wiki

*Here is a link to the Final Report of GSoC 2020.* Final Report

Thanks and Regards,
Farheen Bano