[gdal-dev] GSoC proposal looking for mentors and suggestions

Zhang, Shuai shuai at illinois.edu
Wed Mar 19 21:02:33 PDT 2014


Hi All,

I am looking for a mentor to work with me on adding MongoDB support to GDAL.
Below is the proposal I wrote; I hope you find it worth pursuing.

Thanks,
shuai


Title: OGR Driver for MongoDB

Short description:
MongoDB, a document database that provides high performance, high availability, and easy scalability, can be a good platform for storing extremely large spatial datasets and for supporting high-performance geo-computation and real-time spatial analysis at large scale. This project aims to develop an OGR driver for MongoDB that lets GDAL-based applications and software, such as QGIS, GeoServer, and MapServer, read and write spatial data stored in MongoDB, so that the open source GIS ecosystem can take advantage of this NoSQL database.

Describe your idea
1. Introduction
MongoDB, a document database that provides high performance, high availability, and easy scalability, can be a good platform for storing extremely large spatial datasets and for supporting high-performance geo-computation and real-time spatial analysis at large scale. So far, however, the GIS field has paid little attention to making the most of its strengths. This project aims to develop an OGR driver for MongoDB that lets GDAL-based applications and software read and write spatial data stored in MongoDB, so that the open source GIS ecosystem can take advantage of this NoSQL database.

2. Background
We are living in the era of big data: today's tools and equipment capture spatial data at both the mega-scale and the milli-scale in overwhelming volumes. This volume is well beyond the capability of any mainstream geographic information system, yet the GIS field has no off-the-shelf solution for managing such massive spatial data. Relational spatial databases have been in charge for decades, but the situation is now changing.

In recent years a shift in computing patterns has swept the IT industry, and GIS is no exception. In particular, analyzing such data may not be achievable in a reasonable amount of time without high-performance computing strategies. Relational spatial databases, however, tend to be too slow for these high-performance scenarios and often lack the flexible scalability needed to handle a growing workload.

Fortunately, several groups are trying to address this problem, and MongoDB is a clear leader in this direction. MongoDB, a document-oriented database with native support for geospatial data, ranks fifth in the DB-Engines popularity ranking of database management systems and is the highest-ranked non-relational system. Starting with version 2.4 (released on March 19, 2013), MongoDB supports a subset of GeoJSON geometries, including basic shapes such as points, linestrings, and polygons. Quite a number of partners working on big data, NoSQL, cloud, mobile, and high-performance computing have also joined the MongoDB ecosystem. Foursquare is one featured example: it benefits from MongoDB's support for geospatial indexing, which lets it easily query large volumes of location-based data.

3. The idea
MongoDB stores spatial data as GeoJSON, and GDAL already supports reading and writing features encoded in GeoJSON, so that code can be reused. This project will implement a MongoDB driver against the OGR format driver interfaces: subclasses of OGRSFDriver, OGRDataSource, and OGRLayer, registered with the OGRSFDriverRegistrar at runtime, so that GDAL can use MongoDB as a data source for large-scale spatial data.
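To make the class structure concrete, here is a minimal, self-contained C++ sketch. It does not use the real GDAL headers: the base classes below are simplified stand-ins (placed in a hypothetical ogr_sketch namespace) that only borrow the names OGRSFDriver, OGRDataSource, OGRLayer, and OGRSFDriverRegistrar, and their method signatures are assumptions chosen for illustration. The "MongoDB:<database>" connection-string prefix is likewise hypothetical. In GDAL 1.x itself, a driver is typically registered from a Register function (here it might be called RegisterOGRMongo(), also a hypothetical name) through OGRSFDriverRegistrar::GetRegistrar()->RegisterDriver().

```cpp
#include <memory>
#include <string>
#include <vector>

namespace ogr_sketch {

// Simplified stand-ins for the OGR base classes (NOT the real GDAL headers).
class OGRLayer {
public:
    virtual ~OGRLayer() = default;
    virtual const char* GetName() const = 0;
};

class OGRDataSource {
public:
    virtual ~OGRDataSource() = default;
    virtual const char* GetName() const = 0;
    virtual int GetLayerCount() const = 0;
};

class OGRSFDriver {
public:
    virtual ~OGRSFDriver() = default;
    virtual const char* GetName() const = 0;
    // Returns nullptr when the driver does not recognize the name.
    virtual std::unique_ptr<OGRDataSource> Open(const std::string& name, bool update) = 0;
};

// Stand-in registrar: asks each registered driver in turn to open the name.
class OGRSFDriverRegistrar {
public:
    void RegisterDriver(std::unique_ptr<OGRSFDriver> driver) {
        drivers_.push_back(std::move(driver));
    }
    std::unique_ptr<OGRDataSource> Open(const std::string& name, bool update = false) {
        for (auto& d : drivers_) {
            auto ds = d->Open(name, update);
            if (ds) return ds;
        }
        return nullptr;
    }
private:
    std::vector<std::unique_ptr<OGRSFDriver>> drivers_;
};

// One MongoDB collection exposed as an OGR layer.
class OGRMongoLayer : public OGRLayer {
public:
    explicit OGRMongoLayer(std::string collection) : name_(std::move(collection)) {}
    const char* GetName() const override { return name_.c_str(); }
private:
    std::string name_;
};

// One MongoDB database exposed as an OGR data source.
class OGRMongoDataSource : public OGRDataSource {
public:
    explicit OGRMongoDataSource(std::string db) : name_(std::move(db)) {}
    const char* GetName() const override { return name_.c_str(); }
    int GetLayerCount() const override { return static_cast<int>(layers_.size()); }
    // A real driver would populate this by listing the database's collections.
    void AddLayer(const std::string& collection) {
        layers_.push_back(std::make_unique<OGRMongoLayer>(collection));
    }
private:
    std::string name_;
    std::vector<std::unique_ptr<OGRMongoLayer>> layers_;
};

class OGRMongoDriver : public OGRSFDriver {
public:
    const char* GetName() const override { return "MongoDB"; }
    std::unique_ptr<OGRDataSource> Open(const std::string& name, bool /*update*/) override {
        const std::string prefix = "MongoDB:";  // hypothetical connection syntax
        if (name.compare(0, prefix.size(), prefix) != 0) return nullptr;
        // No connection is made in this sketch; the data source starts empty.
        return std::make_unique<OGRMongoDataSource>(name.substr(prefix.size()));
    }
};

}  // namespace ogr_sketch
```

The point the sketch illustrates is that the registrar knows nothing about MongoDB: it simply asks each driver in turn, and the MongoDB driver claims only the names it recognizes.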

4. Project plan (detailed timeline: how do you plan to spend your summer?)
The first item on the list is to design how spatial data is structured inside MongoDB. The OGR data model consists of data sources, layers, and features, so each MongoDB database is treated as a data source, each collection within it as a layer, and each document as a feature. PostGIS and other spatial databases often use system tables to maintain metadata, but because MongoDB is schema-free, metadata such as the spatial reference can be stored within the layer itself, in this case the collection.
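The mapping above can be sketched in a few lines of standard C++. Two pieces are shown, both under stated assumptions: splitting a MongoDB namespace ("database.collection") into an OGR data source name and a layer name, and building a per-collection metadata document that records the spatial reference. The reserved _id value __ogr_metadata__ and the field names layer, srs, and geometry_type are hypothetical choices, not an existing convention; the JSON is built by hand with no string escaping.

```cpp
#include <string>

struct OgrNames {
    std::string dataSource;  // MongoDB database
    std::string layer;       // MongoDB collection
};

// MongoDB namespaces are "<database>.<collection>". Database names cannot
// contain '.', while collection names may, so the first dot separates them.
OgrNames SplitNamespace(const std::string& ns) {
    const auto dot = ns.find('.');
    if (dot == std::string::npos) return {ns, ""};
    return {ns.substr(0, dot), ns.substr(dot + 1)};
}

// Hypothetical reserved _id for the metadata document kept inside each collection.
const char* const kMetadataId = "__ogr_metadata__";

// Builds the layer metadata document; the SRS is stored as an EPSG code here,
// though a real driver might prefer WKT. Inputs are assumed not to need escaping.
std::string MakeLayerMetadata(const std::string& layerName, int epsg,
                              const std::string& geometryType) {
    return std::string("{\"_id\":\"") + kMetadataId +
           "\",\"layer\":\"" + layerName +
           "\",\"srs\":\"EPSG:" + std::to_string(epsg) +
           "\",\"geometry_type\":\"" + geometryType + "\"}";
}
```

For example, SplitNamespace("gisdb.roads.2014") yields the data source "gisdb" and the layer "roads.2014".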

The most important part of a format driver is defining how it reads and writes the format, especially the Open and Create methods of the data source class. Because MongoDB organizes its spatial data with the GeoJSON model, the GeoJSON driver already in GDAL can be reused to encode and decode the GeoJSON fetched from the database. Four files therefore need to be implemented: ogr_mongo.h, ogrmongodriver.cpp, ogrmongodatasource.cpp, and ogrmongolayer.cpp.
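As an illustration of the document layout the driver would read and write, the following sketch serializes a single point feature into a MongoDB document whose geometry field holds GeoJSON (coordinates in [longitude, latitude] order) and whose attributes go into a properties sub-document. The field names and the use of the feature ID as _id are assumptions, PointFeatureToDocument is a hypothetical helper, and a real driver would reuse GDAL's GeoJSON code rather than hand-writing JSON.

```cpp
#include <map>
#include <sstream>
#include <string>

// Serializes one point feature as a MongoDB document. Hypothetical helper:
// no JSON string escaping is done and every attribute is written as a string.
std::string PointFeatureToDocument(long fid, double lon, double lat,
                                   const std::map<std::string, std::string>& props) {
    std::ostringstream out;
    out << "{\"_id\":" << fid
        << ",\"geometry\":{\"type\":\"Point\",\"coordinates\":["
        << lon << "," << lat << "]}"   // GeoJSON order: longitude first
        << ",\"properties\":{";
    bool first = true;
    for (const auto& kv : props) {
        if (!first) out << ",";
        out << "\"" << kv.first << "\":\"" << kv.second << "\"";
        first = false;
    }
    out << "}}";
    return out.str();
}
```

Keeping the geometry in a single top-level GeoJSON field means one 2dsphere index on that field (available since MongoDB 2.4) can serve all spatial queries on the layer.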

Test Plan
[1] Once the MongoDB driver is compiled into the OGR framework, the ogr2ogr utility can serve as the test program for importing spatial data from shapefiles into MongoDB and exporting it back.
[2] Run a parallel conversion benchmark to measure how fast the MongoDB driver is compared with the file system and PostGIS.
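For test plan item [2], a simple harness could time several conversions running in parallel and report throughput. In this sketch, RunParallel and BenchResult are hypothetical names, and the convert callback is a placeholder for one ogr2ogr-style copy of a data slice (to MongoDB, a shapefile directory, or PostGIS) that returns the number of features copied; only the timing and threading logic is shown.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct BenchResult {
    std::size_t features;      // total features copied by all workers
    double seconds;            // wall-clock time for the whole run
    double featuresPerSecond;  // throughput
};

// Runs convert(p) for p in [0, partitions) on one thread each, then reports
// total features and wall-clock throughput.
BenchResult RunParallel(std::size_t partitions,
                        const std::function<std::size_t(std::size_t)>& convert) {
    std::atomic<std::size_t> total{0};
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < partitions; ++p)
        workers.emplace_back([&total, &convert, p] { total += convert(p); });
    for (auto& w : workers) w.join();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    const double secs = elapsed.count();
    const std::size_t n = total.load();
    return {n, secs, secs > 0 ? static_cast<double>(n) / secs : 0.0};
}
```

Running the same harness with callbacks that target each backend keeps the comparison fair, since partitioning and timing are identical and only the storage side changes.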

Time Line

May 19 - June 8 (Coding - Phase 1 - 3 weeks)
Prepare the development environment and bring GDAL and the MongoDB C++ driver together; implement OGRMongoDriver, OGRMongoDataSource, and OGRMongoLayer according to the interfaces defined by OGRSFDriver, OGRDataSource, and OGRLayer.
June 9 - June 23 (Coding - Phase 2 - 2 weeks)
Build the MongoDB driver into the OGR framework, initially supporting the exchange of small spatial datasets with MongoDB, and fix bugs along the way.
June 24 - July 13 (Coding - Phase 3 - 3 weeks)
Pass query strings (JSON-style documents) for both spatial and attribute filters to MongoDB to select the requested features. Compile all the code, run tests, fix bugs, and improve performance.
July 14 - July 27 (Testing - Phase 1 - 2 weeks)
Transfer large spatial datasets to and from MongoDB with ogr2ogr to measure the driver's efficiency; improve performance and fix bugs.
July 28 - August 10 (Testing - Phase 2 - 2 weeks)
Run a parallel conversion experiment to measure how fast the MongoDB driver is compared with the file system and PostGIS, and fix bugs.
August 11 - August 18 (pencils down)
Write documentation, including Doxygen comments and techbase/userbase articles.
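The query-translation step planned for Coding Phase 3 could look roughly like the following sketch, which turns an OGR-style rectangular spatial filter into a MongoDB $geoIntersects query on a geometry field, optionally combined with one attribute equality test on properties.<field>. BuildQuery is a hypothetical helper; it assumes features are stored with a GeoJSON geometry field and a properties sub-document, which is an assumption rather than a settled design. Attribute values are compared as strings and not escaped.

```cpp
#include <sstream>
#include <string>

// Builds a MongoDB query document from a bounding box (minX, minY, maxX, maxY)
// and an optional attribute equality filter.
std::string BuildQuery(double minX, double minY, double maxX, double maxY,
                       const std::string& attrField = "",
                       const std::string& attrValue = "") {
    std::ostringstream q;
    q << "{\"geometry\":{\"$geoIntersects\":{\"$geometry\":"
      << "{\"type\":\"Polygon\",\"coordinates\":[["
      << "[" << minX << "," << minY << "],"
      << "[" << maxX << "," << minY << "],"
      << "[" << maxX << "," << maxY << "],"
      << "[" << minX << "," << maxY << "],"
      << "[" << minX << "," << minY << "]"  // GeoJSON rings must be closed
      << "]]}}}";
    if (!attrField.empty())
        q << ",\"properties." << attrField << "\":\"" << attrValue << "\"";
    q << "}";
    return q.str();
}
```

One caveat worth testing: with a 2dsphere index, polygon edges are interpreted as geodesics (great-circle arcs), so a large query box does not follow lines of constant latitude exactly.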
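For the large-transfer tests in Testing Phase 1, one common way to reduce per-feature overhead is to write features in batches rather than issuing one insert per feature. The sketch below shows only the batching logic; MakeBatches and the batch size are hypothetical, and passing each batch to whatever bulk-insert facility the MongoDB C++ driver offers is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Splits serialized feature documents into batches of at most batchSize, so a
// writer can send one bulk insert per batch instead of one round trip per feature.
std::vector<std::vector<std::string>> MakeBatches(const std::vector<std::string>& docs,
                                                  std::size_t batchSize) {
    if (batchSize == 0) throw std::invalid_argument("batchSize must be positive");
    std::vector<std::vector<std::string>> batches;
    for (std::size_t i = 0; i < docs.size(); i += batchSize) {
        const std::size_t end = std::min(docs.size(), i + batchSize);
        batches.emplace_back(docs.begin() + static_cast<std::ptrdiff_t>(i),
                             docs.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

The batch size is a tuning knob for the efficiency tests: larger batches mean fewer round trips but more memory held per write.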

5. Future ideas / How can your idea be expanded?
MongoDB is also well suited to storing massive geo-raster data, so the next step would be writing a MongoDB driver for raster datasets.

Explain how your SoC task would benefit the OSGeo member project and more generally the OSGeo Foundation as a whole:
MongoDB can serve as a distributed, parallel NoSQL spatial database with high performance, high availability, and easy scalability, and is therefore well suited to large-scale, data-intensive computing. With a MongoDB driver in the OGR framework, the whole OSGeo ecosystem built on GDAL/OGR will benefit and be able to use MongoDB as a backend.

Please provide details of general computing experience: (operating systems you use on a day-to-day basis, languages you could write a program in, hardware, networking experience, etc.)
As an undergraduate, I mainly used .NET languages such as C# and VB.NET to build GIS software for the Windows platform. Since starting my PhD program, most of my work has been done in standard C++ on Linux.

Please provide details of previous GIS experience:
I have studied GIS since I entered college. I am currently a Ph.D. candidate in Cartography and Geographic Information System at the School of Geographic and Oceanographic Sciences, Nanjing University, China, and a visiting scholar at Geography & GIScience and NCSA (the National Center for Supercomputing Applications), UIUC, IL, USA.

Please provide details of any previous involvement with GIS programming and other software programming:
[1] Climate Information Management System of Shanxi Province: Outstanding Award in ESRI Chinese College Student Software Development Contest, 2009.
[2] Forest Fire Simulation Model based on Geographic Cellular Automata: Third Prize in ESRI Chinese College Student Software Development Contest, 2009.
[3] High Performance Geospatial Computing System: HiGIS (2011-2013), supported by the National High Technology Research and Development Program of China (863 project), in progress.
[4] NoSQL Expression of Massive Geospatial Information in the Era of Big Data (2013-2015), supported by the Scientific Research Foundation of the Graduate School of Nanjing University, in progress.

Please tell us why you are interested in GIS and open source software:
They are powerful and beautiful treasures of humankind, and I want to be part of them.

Please tell us why you are interested in working for OSGeo and the software project you have selected:
It is part of my research: I have been trying to harness MongoDB to support high-performance geo-computing.

Please tell us why you are interested in your specific coding project:
I have spent much of the past three years learning how GDAL works and how to use it in high-performance computing applications, so I believe a GDAL with MongoDB support will greatly benefit my current research.

Would your application contribute to your ongoing studies/ degree? If so, how?
Yes. A MongoDB cluster is a good way to handle large quantities of spatial data, and if OGR provides a MongoDB driver, many of the GDAL-based tools we have developed can be reused on top of MongoDB, which we expect to make them much faster.

Please explain how you intend to continue being an active member of your project and/or OSGeo AFTER the summer is over:
I will keep following this work to make the MongoDB driver stable and efficient.

Do you understand this is a serious commitment, equivalent to a full-time paid summer internship or summer job?
Yes, I understand. I will do my best.

Do you have any known time conflicts during the official coding period? (May 19 to August 19)
No, I don't.