[gdal-dev] GSoC proposal looking for mentors and suggestions
Stephen Woodbridge
woodbri at swoodbridge.com
Thu Mar 20 13:50:21 PDT 2014
I believe that OSGeo is expecting every approved student to have a
mentor and a co-mentor this year.
I have been a mentor for the last 5+ years for pgRouting. We have had 2
students most of those years and two mentors with each mentor being the
co-mentor for the other project. This has worked well for us. This
allows one of us to be gone but to still provide coverage for both projects.
The thing that I have found over the years is that it is important to
help your students set realistic and conservative goals especially if
they have not done previous development on the project. If you don't
know the issues then everything seems trivial, and students are
wonderfully optimistic but they rapidly get behind and overwhelmed. We
combat this by having them set minimum goals and stretch goals. The
minimum goals are required to get a passing grade. Think of it as "have to
have" vs "nice to have".
Hope this helps. Feel free to contact me off list if you want to discuss
mentoring more.
-Steve
On 3/20/2014 4:13 PM, Even Rouault wrote:
> Le jeudi 20 mars 2014 05:02:33, Zhang, Shuai a écrit :
>> Hi All,
>>
>> I think I need a mentor to work with me and help me add MongoDB support
>> to GDAL. Below is the proposal I wrote; hopefully you find it worth a
>> trial.
>
> This is something I may potentially mentor, but there are already 2 students
> interested in other subjects. I'm not sure how many will eventually get
> selected by the GSoC program, but I won't be able to mentor 3 people, for
> sure!
>
>>
>> Thanks,
>> shuai
>>
>>
>> Title: OGR Driver for MongoDB
>>
>> Short description:
>> MongoDB, a document database that provides high performance, high
>> availability, and easy scalability, can be a good platform for storing
>> extremely large spatial datasets and for supporting high-performance
>> geo-computation and real-time spatial analysis at large scale. This
>> project aims to develop an OGR driver for MongoDB so that applications
>> and software based on GDAL, such as QGIS, GeoServer, MapServer, and so on,
>> can read and write the spatial data stored in it, thus bringing an
>> advanced NoSQL database to the open source GIS ecosystem.
>>
>> Describe your idea
>> 1. Introduction
>> MongoDB, a document database that provides high performance, high
>> availability, and easy scalability, can be a good platform for storing
>> extremely large spatial datasets and for supporting high-performance
>> geo-computation and real-time spatial analysis at large scale. Yet
>> the GIS field has so far paid little attention to making the most of its
>> strengths. This project aims to develop an OGR driver for MongoDB so that
>> applications and software based on GDAL can read and write the spatial
>> data stored in it, thus bringing an advanced NoSQL database to the open
>> source GIS ecosystem.
>>
>> 2. Background
>> We are living in the era of big data, and today's tools and equipment for
>> capturing spatial data, at both the mega-scale and the milli-scale, are
>> formidable. The resulting data volume is well beyond the capability
>> of any mainstream geographic information system. Yet we in the GIS field
>> have no off-the-shelf solutions to manage such massive spatial data.
>> Relational spatial databases have been in charge for decades, but now the
>> situation seems a little different.
>>
>> A shift in computing patterns can be seen throughout the IT industry in
>> recent years, and GIS is no exception. In particular, data analytics may
>> not be achievable within a reasonable amount of time without resorting to
>> high-performance computing strategies. However, relational spatial
>> databases are rather slow in these high-performance computing
>> scenarios and often lack the flexible scalability needed to handle a
>> growing amount of work.
>>
>> Fortunately, several groups are trying to address the problem, and
>> MongoDB is an apparent leader in this direction. MongoDB, which natively
>> supports geospatial data using a document-oriented model,
>> is in fifth place in the DB-Engines Ranking of database management
>> systems by popularity and is the highest-rated
>> non-relational system. Since version 2.4 (released on March 19, 2013),
>> MongoDB has supported a subset of GeoJSON geometries, including
>> basic shapes such as points, linestrings and polygons.
>
> Good to know. Last time I looked, MongoDB had only support for point
> geometries.
>
>> And quite a number of
>> partners working on big data, NoSQL, cloud, mobile and high-performance
>> computing have joined the MongoDB ecosystem. Foursquare is one featured
>> example: it benefits from MongoDB’s support for geospatial indexing, which
>> allows it to easily query large location-based datasets.
>>
>> 3. The idea
>> MongoDB uses GeoJSON to store spatial data, and GDAL already
>> supports access to features encoded in GeoJSON format, so that code can
>> be reused.
>
> As far as I remember, the interface with MongoDB is (was?) a kind of binary
> JSON format. Has this changed ?
>
>> This project is trying to implement a MongoDB Driver according
>> to the OGR format driver interfaces with subclasses of OGRSFDriver,
>> OGRDataSource and OGRLayer, and registered with the OGRSFDriverRegistrar
>> at runtime, so that GDAL may use MongoDB as a datasource to access large
>> scale spatial data.
>>
>> 4. Project plan (detailed timeline: how do you plan to spend your summer?)
>> The first task is to design the structure of the spatial database inside
>> MongoDB. The OGR data model has Datasource,
>> Layer and Feature, so accordingly every database in MongoDB is regarded as
>> a Datasource, the Collections within it are treated as Layers, and
>> every Document is a Feature.
>
> Yes, sounds a bit similar to what was done with CouchDB
>
>> PostGIS and other spatial databases often
>> use system tables to maintain the metadata, but since MongoDB is
>> schema-free, metadata such as the spatial reference can be stored within
>> the particular Layer, in this case a Collection.
>>
>> The most important part of a data format driver is defining how to read
>> and write the data format, especially the Open and
>> Create methods of the Datasource class. As MongoDB organizes its spatial
>> data in the GeoJSON model, the GeoJSON driver already present in
>> GDAL can be reused to encode or decode the GeoJSON exchanged with the
>> MongoDB database. Therefore, there would be four files in total to
>> implement: ogr_mongo.h, ogrmongodriver.cpp, ogrmongodatasource.cpp, and
>> ogrmongolayer.cpp.
>
> The write part should be no problem: a NoSQL database can receive documents
> with a fixed structure.
> The read part will need to explore all the documents/features to retrieve
> their structure and build an OGR FeatureDefinition. This is done in the CouchDB
> driver.
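
The read-side step described above (scanning every document to discover the fields before a layer definition can be built) might look roughly like the following. This is only a sketch under stated assumptions: documents are plain Python dicts, the geometry lives under a key named "geometry", and the helper name infer_fields and its int/float/str widening rule are hypothetical. It does not reproduce what the CouchDB driver actually does.

```python
from collections import OrderedDict

# Hypothetical helper, not part of GDAL/OGR: build a flat field schema by
# scanning every document, widening a field's type when documents disagree.
_RANK = {"int": 0, "float": 1, "str": 2}


def infer_fields(documents, geometry_key="geometry"):
    """Return an ordered mapping of field name -> type name ("int", "float", "str")."""
    fields = OrderedDict()
    for doc in documents:
        for key, value in doc.items():
            # The geometry is handled separately; None carries no type information.
            if key == geometry_key or value is None:
                continue
            if isinstance(value, (bool, int)):
                kind = "int"
            elif isinstance(value, float):
                kind = "float"
            else:
                kind = "str"  # assumption: anything else is exposed as a string
            previous = fields.get(key)
            if previous is None or _RANK[kind] > _RANK[previous]:
                fields[key] = kind
    return fields


docs = [
    {"name": "a", "pop": 1, "geometry": {"type": "Point", "coordinates": [0, 0]}},
    {"name": "b", "pop": 2.5, "height": 3},
]
schema = infer_fields(docs)
```

Here "pop" ends up as "float" because the second document widens it, and "height" appears even though only one document carries it. The cost is a full pass over the collection before the first feature can be returned.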
>
>>
>> Test Plan
>> [1] After the MongoDB driver is compiled into the OGR framework, the
>> utility ogr2ogr can be used as the test program to import and export
>> spatial data between shapefiles and MongoDB.
>> [2] Conduct a parallel transformation process to measure how fast the
>> MongoDB driver is compared to the file system and PostGIS.
>>
>> Time Line
>>
>> May 19 - June 8 (Coding - Phase 1 - 3 weeks)
>> Prepare the development environment and bring GDAL and the MongoDB C++
>> driver together. Implement OGRMongoDriver, OGRMongoDataSource and
>> OGRMongoLayer according to the interfaces defined by OGRSFDriver,
>> OGRDataSource and OGRLayer.
>>
>> June 9 - June 23 (Coding - Phase 2 - 2 weeks)
>> Build the MongoDB driver into the OGR framework, first supporting the
>> exchange of small amounts of spatial data with MongoDB, while fixing bugs.
>>
>> June 24 - July 13 (Coding - Phase 3 - 3 weeks)
>> Pass the query string (a JSON-style document) for both spatial and
>> attribute data to MongoDB to select features as requested. Compile all
>> the code, conduct several tests, fix bugs and make it faster.
>>
>> July 14 - July 27 (Testing - Phase 1 - 2 weeks)
>> Transfer large-scale spatial data to and from MongoDB using ogr2ogr to
>> measure the driver's efficiency. Improve its efficiency and fix bugs.
>>
>> July 28 - August 10 (Testing - Phase 2 - 2 weeks)
>> Conduct a parallel transformation experiment to measure how fast the
>> MongoDB driver is compared to the file system and PostGIS, and fix bugs.
>>
>> August 11 - August 18 (pencils down)
>> Write code documentation, including doxygen comments and
>> techbase/userbase articles.
>
> You could mention adding support for spatial filtering.
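
As a rough illustration of the spatial filtering mentioned above, a rectangular filter could be translated into a MongoDB query document using the $geoIntersects and $geometry operators, which are available from version 2.4. This is a sketch under stated assumptions: the function name bbox_to_geointersects and the "geometry" field name are hypothetical, and the driver would still need to hand the resulting document to the MongoDB client library.

```python
# Hypothetical helper: turn a bounding box into a MongoDB query document that
# selects documents whose GeoJSON geometry intersects the box.
def bbox_to_geointersects(min_x, min_y, max_x, max_y, geometry_field="geometry"):
    if min_x > max_x or min_y > max_y:
        raise ValueError("bounding box minimums must not exceed maximums")
    # GeoJSON polygon rings are closed: the first and last positions are equal.
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],
    ]
    return {
        geometry_field: {
            "$geoIntersects": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }


query = bbox_to_geointersects(0, 0, 2, 1)
```

One caveat: MongoDB evaluates GeoJSON geometries on a sphere, so the edges of this polygon are not straight lines in longitude/latitude space. The result is therefore only an approximation of a planar bounding-box test.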
>
>>
>> 5. Future ideas / How can your idea be expanded?
>> MongoDB is also an ideal platform for storing massive geo-raster data, so
>> the next job would be writing a MongoDB driver for raster datasets.
>
> Hum, I'm not sure if MongoDB is aimed at this... You would probably have to
> tile the raster to avoid sending/retrieving huge blobs at once
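
The tiling idea above can be sketched as splitting a raster into fixed-size windows, each of which could then be stored as its own document. This is a minimal sketch only: the 256-pixel default and the function name tile_windows are illustrative assumptions, and nothing here touches GDAL or MongoDB.

```python
# Hypothetical helper: enumerate (x_offset, y_offset, width, height) windows
# that cover a raster, so each tile can be stored or fetched on its own.
def tile_windows(width, height, tile_size=256):
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # Edge tiles are clipped to the raster extent.
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


windows = list(tile_windows(600, 300, 256))
```

For a 600 x 300 raster this gives a 3 x 2 grid of six windows, with the right-hand and bottom tiles narrower than 256 pixels.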
>
>>
>> Explain how your SoC task would benefit the OSGeo member project and more
>> generally the OSGeo Foundation as a whole: MongoDB can be a distributed
>> and parallel NoSQL spatial database with high performance, high
>> availability, and easy scalability, thus extremely suitable for large
>> scale data-intensive computing. By implementing the MongoDB Driver in the
>> OGR framework, the whole OSGeo ecosystem based on GDAL/OGR will benefit
>> from it and be powered by MongoDB.
>>
>> Please provide details of general computing experience: (operating systems
>> you use on a day-to-day basis, languages you could write a program in,
>> hardware, networking experience, etc.) During my college time, I mainly
>> used .NET languages such as C# and VB.NET to build GIS software running on
>> the Windows platform. Since then, and since the start of my PhD program,
>> most of my work has been done in standard C++ in a Linux environment.
>>
>> Please provide details of previous GIS experience:
>> I have been a GIS student ever since I entered college. Right now I'm a Ph.D.
>> candidate in Cartography and Geographic Information System, School of
>> Geographic and Oceanographic Sciences, Nanjing University, China, and a
>> visiting scholar at Geography & GIScience and NCSA (The National Center
>> for Supercomputing Applications), UIUC, IL, USA.
>>
>> Please provide details of any previous involvement with GIS programming and
>> other software programming:
>> [1] Climate Information Management System of Shanxi Province: Outstanding
>> Award in ESRI Chinese College Student Software Development Contest, 2009.
>> [2] Forest Fire Simulation Model based on Geographic Cellular Automata:
>> Third Prize in ESRI Chinese College Student Software Development Contest,
>> 2009.
>> [3] High Performance Geospatial Computing System: HiGIS (2011-2013),
>> supported by the National High Technology Research and Development Program
>> of China (863 project), in progress.
>> [4] NoSQL Expression of Massive Geospatial Information in the era of Big
>> Data (2013-2015), supported by the Scientific Research Foundation of
>> Graduate School of Nanjing University, in progress.
>>
>> Please tell us why you are interested in GIS and open source software:
>> They are powerful and beautiful treasures of humankind, and I want to be
>> part of it.
>>
>> Please tell us why you are interested in working for OSGeo and the software
>> project you have selected: It’s part of my research, since I am trying to
>> harness MongoDB to support high performance geo-computing.
>>
>> Please tell us why you are interested in your specific coding project:
>> I spent lots of time in the past three years learning how GDAL works and
>> how to employ it in high performance computing applications. So I
>> believe a new GDAL with MongoDB support will do much good to my current
>> research.
>>
>> Would your application contribute to your ongoing studies/ degree? If so,
>> how? Yes. A MongoDB cluster is a good way to handle large quantities of
>> spatial data, and if OGR provides a MongoDB driver, lots of the tools we
>> developed based on GDAL can be reused and, powered by MongoDB, become much
>> faster.
>>
>> Please explain how you intend to continue being an active member of your
>> project and/or OSGeo AFTER the summer is over: I’ll try my best to keep
>> following this thread to make MongoDB Driver stable and efficient.
>>
>> Do you understand this is a serious commitment, equivalent to a full-time
>> paid summer internship or summer job? Yes, I understand. I’ll give my
>> best.
>>
>> Do you have any known time conflicts during the official coding period?
>> (May 19 to August 19) No, I don't.
>