[gdal-dev] GSoC proposal looking for mentors and suggestions

Even Rouault even.rouault at mines-paris.org
Thu Mar 20 13:13:41 PDT 2014


On Thursday 20 March 2014 05:02:33, Zhang, Shuai wrote:
> Hi All,
> 
> I think I need a mentor to work with me and help me add MongoDB support
> to GDAL. Below is the proposal I wrote; hopefully you will find it worth
> a trial.

This is something I could potentially mentor, but there are already 2 students
interested in other subjects. I'm not sure how many will eventually get
selected by the GSoC program, but I won't be able to mentor 3 people, that's
for sure!

> 
> Thanks,
> shuai
> 
> 
> Title: OGR Driver for MongoDB
> 
> Short description:
> MongoDB, a document database that provides high performance, high
> availability, and easy scalability, can be a good platform for storing
> extremely large spatial datasets and for supporting high-performance
> geo-computation and real-time spatial analysis at large scale. This
> project aims to develop an OGR driver for MongoDB so that applications
> and software based on GDAL, such as QGIS, GeoServer, MapServer, and so
> on, can read and write the spatial data stored in it, and thus to power
> the open source GIS ecosystem with an advanced NoSQL database.
> 
> Describe your idea
> 1. Introduction
> MongoDB, a document database that provides high performance, high
> availability, and easy scalability, can be a good platform for storing
> extremely large spatial datasets and for supporting high-performance
> geo-computation and real-time spatial analysis at large scale. Yet so
> far the GIS field has paid little attention to making the most of its
> strengths. This project aims to develop an OGR driver for MongoDB so that
> applications and software based on GDAL can read and write the spatial
> data stored in it, and thus to power the open source GIS ecosystem with
> an advanced NoSQL database.
> 
> 2. Background
> Since we are living in the era of big data, today's tools and equipment
> for capturing spatial data, at both the mega-scale and the milli-scale,
> are formidable. The magnitude of the resulting data volume is well beyond
> the capability
> of any mainstream geographic information system. Yet we in the GIS field
> have no off-the-shelf solution for managing such massive spatial data.
> Relational spatial databases have been in charge for decades, but now the
> situation seems a little different.
> 
> A shift in computing patterns can be seen throughout the IT industry in
> recent years, and GIS is no exception. In particular, data analytics may
> not be achievable within a reasonable amount of time without resorting to
> high-performance computing strategies. However, relational spatial
> databases are rather slow in these high-performance computing scenarios,
> and often lack the flexible scalability needed to handle a growing
> amount of work.
> 
> Fortunately, several groups are trying to address the problem, and
> MongoDB is an apparent leader in this direction. MongoDB, which has native
> support for geospatial data and uses a document-oriented model, ranks
> fifth in the DB-Engines Ranking of database management systems by
> popularity and is the highest-rated non-relational system. Since version
> 2.4 (released on March 19, 2013), MongoDB has supported a subset of
> GeoJSON geometries, including basic shapes such as points, linestrings
> and polygons.

Good to know. Last time I looked, MongoDB only had support for point
geometries.

> Quite a number of partners working in big data, NoSQL, cloud, mobile and
> high-performance computing have joined the MongoDB ecosystem. Foursquare
> is one featured example: it benefits from MongoDB's support for
> geospatial indexing, which lets it easily query large location-based
> datasets.
> 
> 3. The idea
> MongoDB uses GeoJSON to store spatial data, and GDAL already supports
> access to features encoded in the GeoJSON format, so that code can be
> reused.

As far as I remember, the interface with MongoDB is (was?) a kind of binary
JSON format. Has this changed?

> This project will implement a MongoDB driver following the OGR format
> driver interfaces, with subclasses of OGRSFDriver, OGRDataSource and
> OGRLayer, registered with the OGRSFDriverRegistrar at runtime, so that
> GDAL can use MongoDB as a datasource to access large-scale spatial data.
> 
> 4. Project plan (detailed timeline: how do you plan to spend your summer?)
> The first item on the list is to design the structure of the spatial
> database inside MongoDB. The OGR data model has Datasource, Layer and
> Feature, so accordingly every database in MongoDB is regarded as a
> Datasource, the Collections within it are treated as Layers, and every
> Document is a Feature.

Yes, sounds a bit similar to what was done with CouchDB
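
To make the mapping concrete, here is a minimal sketch in Python. It is only
an illustration: plain dicts stand in for a real MongoDB database, the class
and function names are hypothetical (they are not OGR classes), and the
"geometry" key is an assumed convention for where the GeoJSON geometry lives.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Feature:
    """One MongoDB document seen as an OGR-style feature."""
    fid: Any
    geometry: Any                 # GeoJSON geometry object, or None
    attributes: Dict[str, Any]    # every other key of the document


@dataclass
class Layer:
    """One MongoDB collection seen as an OGR-style layer."""
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class DataSource:
    """One MongoDB database seen as an OGR-style datasource."""
    name: str
    layers: Dict[str, Layer] = field(default_factory=dict)


def datasource_from_database(db_name, collections, geom_key="geometry"):
    """Build the hierarchy from {collection_name: [document, ...]}."""
    ds = DataSource(db_name)
    for coll_name, documents in collections.items():
        layer = Layer(coll_name)
        for doc in documents:
            # "_id" becomes the feature id, geom_key the geometry,
            # and everything else an attribute.
            attrs = {k: v for k, v in doc.items()
                     if k not in ("_id", geom_key)}
            layer.features.append(
                Feature(doc.get("_id"), doc.get(geom_key), attrs))
        ds.layers[coll_name] = layer
    return ds


# Hypothetical sample data standing in for a real database.
sample = {
    "cities": [
        {"_id": 1, "name": "Nanjing",
         "geometry": {"type": "Point", "coordinates": [118.78, 32.06]}},
    ]
}
ds = datasource_from_database("gisdb", sample)
```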

> PostGIS and other spatial databases often use system tables to maintain
> the metadata, but since MongoDB is schema-free, metadata such as the
> spatial reference can be stored within the particular Layer, in this
> case a Collection.
> 
> The most important part of a data format driver is defining how to read
> and write the data format, especially the Open and Create methods of the
> Datasource class. As MongoDB organizes its spatial data in the GeoJSON
> model, the GeoJSON driver already present in GDAL can be reused to encode
> and decode the GeoJSON exchanged with the MongoDB database. Therefore,
> there would be four files in total to implement: ogr_mongo.h,
> ogrmongodriver.cpp, ogrmongodatasource.cpp, and ogrmongolayer.cpp.

The write part should be no problem: a NoSQL database can receive documents
with a fixed structure.
The read part will need to explore all the documents/features to retrieve
their structure and build an OGR FeatureDefinition. This is done in the CouchDB
driver.
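
A rough sketch of that read-side scan, in Python for brevity. This is not the
CouchDB driver's code: the function name, the "geometry" key and the type
widening rule (Integer, then Real, then String) are assumptions made for the
example, and the type names are just strings echoing the usual OGR field
types.

```python
def infer_fields(documents, geom_key="geometry"):
    """Return {field_name: type_name} as the union over all documents."""
    # Conflicting observations widen toward the most general type.
    rank = {"Integer": 0, "Real": 1, "String": 2}

    def classify(value):
        # bool also lands here as Integer, since it subclasses int.
        if isinstance(value, int):
            return "Integer"
        if isinstance(value, float):
            return "Real"
        return "String"

    fields = {}
    for doc in documents:
        for key, value in doc.items():
            # Skip the id, the geometry, and nulls (they carry no type).
            if key in ("_id", geom_key) or value is None:
                continue
            seen = classify(value)
            current = fields.get(key)
            if current is None or rank[seen] > rank[current]:
                fields[key] = seen
    return fields


# Two documents with different structures, as a schema-free store allows.
docs = [
    {"_id": 1, "name": "a", "pop": 10,
     "geometry": {"type": "Point", "coordinates": [0, 0]}},
    {"_id": 2, "pop": 10.5, "height": 3},
]
schema = infer_fields(docs)
```

The cost is one full pass over the collection before the first feature can be
returned, so a real driver would probably want a way to limit or skip the scan.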

> 
> Test Plan
> [1] After the MongoDB driver is compiled into the OGR framework, the
> ogr2ogr utility can be used as the test program to import and export
> spatial data between shapefiles and MongoDB.
> [2] Conduct a parallel transformation process to find out how fast the
> MongoDB driver is compared to the file system and PostGIS.
> 
> Time Line
> 
> May 19 - June 8 (Coding - Phase 1 - 3 weeks)
> Prepare the development environment and bring GDAL and the MongoDB C++
> driver together in C++. Implement OGRMongoDriver, OGRMongoDataSource and
> OGRMongoLayer according to the interfaces defined by OGRSFDriver,
> OGRDataSource and OGRLayer.
> 
> June 9 - June 23 (Coding - Phase 2 - 2 weeks)
> Build MongoDB into the OGR framework, initially supporting the exchange
> of small amounts of spatial data with MongoDB, while fixing bugs.
> 
> June 24 - July 13 (Coding - Phase 3 - 3 weeks)
> Pass the query string (a JSON-style document) for both spatial and
> attribute data to MongoDB to select the requested features. Compile all
> the code, conduct several tests, fix bugs and make it faster.
> 
> July 14 - July 27 (Testing - Phase 1 - 2 weeks)
> Transfer large-scale spatial data to and from MongoDB using ogr2ogr to
> assess the driver's efficiency. Improve its efficiency and fix bugs.
> 
> July 28 - August 10 (Testing - Phase 2 - 2 weeks)
> Conduct a parallel transformation experiment to find out how fast the
> MongoDB driver is compared to the file system and PostGIS, and fix bugs.
> 
> August 11 - August 18 (pencils down)
> Write code documentation, including doxygen comments and
> techbase/userbase articles.

You could mention adding support for spatial filtering.
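
For instance, a rectangular spatial filter could be turned into a MongoDB
query document along these lines. This is only a sketch in Python: the
function name and the "geometry" field name are assumptions, while
$geoIntersects and $geometry are the GeoJSON query operators available since
MongoDB 2.4.

```python
def bbox_to_query(minx, miny, maxx, maxy, geom_key="geometry"):
    """Build a $geoIntersects query document for a rectangular filter."""
    # A GeoJSON polygon ring must be closed: first point == last point.
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],
    ]
    return {
        geom_key: {
            "$geoIntersects": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }


# Hypothetical filter rectangle in longitude/latitude.
query = bbox_to_query(118.0, 31.0, 119.0, 33.0)
```

Note that MongoDB evaluates GeoJSON queries on a sphere, so for large
rectangles the result may differ slightly from a planar envelope test, and
the driver would probably still have to re-check the features it gets back.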

> 
> 5. Future ideas / How can your idea be expanded?
> MongoDB is also an ideal platform for storing massive geo-raster data, so
> the next job would be writing a MongoDB driver for raster datasets.

Hmm, I'm not sure MongoDB is aimed at this... You would probably have to
tile the raster to avoid sending/retrieving huge blobs at once.
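
As a sketch of what such tiling could look like (Python, with a hypothetical
function name and an arbitrary 256-pixel block size), each window below would
be stored as its own small document instead of one huge blob:

```python
def tile_windows(width, height, block=256):
    """Yield (xoff, yoff, xsize, ysize) windows covering a raster."""
    for yoff in range(0, height, block):
        for xoff in range(0, width, block):
            # Edge tiles are clipped to the raster extent.
            yield (xoff, yoff,
                   min(block, width - xoff),
                   min(block, height - yoff))


# A 600 x 300 raster cut into 256 x 256 blocks gives 3 x 2 tiles.
windows = list(tile_windows(600, 300))
```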

> 
> Explain how your SoC task would benefit the OSGeo member project and more
> generally the OSGeo Foundation as a whole:
> MongoDB can be a distributed and parallel NoSQL spatial database with
> high performance, high availability, and easy scalability, and is thus
> extremely suitable for large-scale data-intensive computing. With the
> MongoDB driver implemented in the OGR framework, the whole OSGeo
> ecosystem based on GDAL/OGR will benefit from it and be powered by
> MongoDB.
> 
> Please provide details of general computing experience: (operating systems
> you use on a day-to-day basis, languages you could write a program in,
> hardware, networking experience, etc.)
> During college, I mainly used .NET languages such as C# and VB.NET to
> build GIS software running on the Windows platform. Since then, and since
> the start of my PhD program, most of my work has been done in standard
> C++ on Linux.
> 
> Please provide details of previous GIS experience:
> I have been a GIS student ever since I entered college. Right now I'm a
> Ph.D. candidate in Cartography and Geographic Information System, School of
> Geographic and Oceanographic Sciences, Nanjing University, China, and a
> visiting scholar at Geography & GIScience and NCSA (The National Center
> for Supercomputing Applications), UIUC, IL, USA.
> 
> Please provide details of any previous involvement with GIS programming and
> other software programming:
> [1] Climate Information Management System of Shanxi Province: Outstanding
> Award in ESRI Chinese College Student Software Development Contest, 2009.
> [2] Forest Fire Simulation Model based on Geographic Cellular Automata:
> Third Prize in ESRI Chinese College Student Software Development Contest,
> 2009.
> [3] High Performance Geospatial Computing System: HiGIS (2011-2013),
> supported by the National High Technology Research and Development
> Program of China (863 project), under construction.
> [4] NoSQL Expression of Massive Geospatial Information in the era of Big
> Data (2013-2015), supported by the Scientific Research Foundation of
> Graduate School of Nanjing University, under construction.
> 
> Please tell us why you are interested in GIS and open source software:
> They are powerful and beautiful treasures of humankind, and I want to be
> part of it.
> 
> Please tell us why you are interested in working for OSGeo and the software
> project you have selected:
> It's part of my research, since I have been trying to harness MongoDB to
> support high-performance geo-computing.
> 
> Please tell us why you are interested in your specific coding project:
> I have spent a lot of time over the past three years learning how GDAL
> works and how to use it in high-performance computing applications. So I
> believe GDAL with MongoDB support will greatly benefit my current
> research.
> 
> Would your application contribute to your ongoing studies/degree? If so,
> how?
> Yes. A MongoDB cluster is a good way to handle large quantities of
> spatial data, and if OGR provides a MongoDB driver, many of the tools we
> have developed on top of GDAL can be reused and, powered by MongoDB,
> become much faster.
> 
> Please explain how you intend to continue being an active member of your
> project and/or OSGeo AFTER the summer is over:
> I'll try my best to keep following this thread to make the MongoDB driver
> stable and efficient.
> 
> Do you understand this is a serious commitment, equivalent to a full-time
> paid summer internship or summer job?
> Yes, I understand. I'll give my best.
> 
> Do you have any known time conflicts during the official coding period?
> (May 19 to August 19)
> No, I don't.

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

