[GRASS-user] Organizing spatial (time series) data for mixed GIS environments

Sören Gebbert soerengebbert at googlemail.com
Wed Dec 4 09:06:51 PST 2013


Hi Stefan,
there is a FOSS4G presentation online as well:
http://elogeo.nottingham.ac.uk/xmlui/handle/url/288

Best regards
Soeren

2013/12/4 Sören Gebbert <soerengebbert at googlemail.com>:
> Hi Stefan,
>
> 2013/12/3 Blumentrath, Stefan <Stefan.Blumentrath at nina.no>:
>> Dear all,
>>
>>
>>
>> On our Ubuntu server we are about to reorganize our GIS data in order to
>> develop a more efficient and consistent solution for data storage in a mixed
>> GIS environment.
>>
>> By “mixed GIS environment” I mean that we have people working with GRASS,
>> QGIS, PostGIS but also many people using R and maybe the largest fraction
>> using ESRI products; furthermore we have people using ENVI, ERDAS and some
>> others. Only a few people (like me) actually work directly on the server…
>>
>> Until now I stored “my” data mainly in GRASS (6/7) native format which I was
>> very happy with. But I guess our ESRI- and PostGIS-people would not accept
>> that as a standard…
>>
>>
>>
>> However, especially for time series data we cannot have several copies in
>> different formats (tailor-made for each and every software).
>>
>>
>>
>> So I started thinking: what would be the most efficient and convenient
>> solution for storing a large amount of data (e.g. high resolution raster and
>> vector data with national extent plus time series data) in a way that it is
>> accessible for all (at least most) remote users (with different GIS
>> software). As I am very fond of the temporal framework in GRASS 7 it would
>> be a precondition that I can use these tools on the data without
>> unreasonable performance loss. Another precondition would be that users at
>> remote computers in our (MS Windows) network can have access to the data.
>>
>>
>>
>> In general, four options come into my mind:
>>
>> a)      Stick to GRASS native format and have one copy in another format
>>
>> b)      Use the native formats the data come in (e.g. temperature and
>> precipitation comes in zipped ascii-grid format)
>>
>> c)       Use PostGIS as a backend for data storage (raster / vector)
>> (linked by r./v.external.*)
>>
>> d)      Use another GDAL/OGR format for data storage (raster / vector)
>> (linked by r./v.external.*)
>>
>>
>>
>> My question(s) are:
>>
>> What solutions could you recommend or what solution did you choose?
>
> I would suggest using r.external with uncompressed GeoTIFF files for
> raster data. But you have to make sure that external software does not
> modify these files, or if it does, that the temporal framework is
> triggered to update dependent space-time raster datasets.
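> A minimal sketch of that workflow inside a GRASS 7 session (the file
> names, map names and dataset title below are placeholders, not taken
> from this thread):
>
> ```shell
> # Link GeoTIFFs instead of importing them (no copy of the raster data)
> r.external input=/data/temperature_2013_01.tif output=temp_2013_01
> r.external input=/data/temperature_2013_02.tif output=temp_2013_02
>
> # Create a space-time raster dataset and register the linked maps
> t.create type=strds temporaltype=absolute output=temperature \
>     title="Monthly temperature" description="Linked external GeoTIFFs"
> t.register -i type=raster input=temperature \
>     maps=temp_2013_01,temp_2013_02 \
>     start="2013-01-01" increment="1 month"
> ```
>
> If an external tool rewrites one of the linked GeoTIFFs, the registered
> metadata will not notice; you would have to re-register the affected maps.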
>
> In the case of vector data, I would suggest using the native GRASS
> format; hence the vector data needs to be copied. But maybe PostgreSQL
> with topology support would be a solution? I think Martin Landa may
> have an opinion here.
>
>>
>> Who is having experience with this kind of data management challenge?
>
> No experience here from my side.
>
>> How do externally linked data series perform compared to GRASS native?
>
> It will certainly be slower than the native format, but I don't know
> by how much.
>
>>
>>
>> I searched the mailing list a bit and found this:
>> (http://osgeo-org.1560.x6.nabble.com/GRASS7-temporal-GIS-database-questions-td5054920.html)
>> where Sören recommended “postgresql as temporal database backend”. However I
>> am not sure if that was meant only for the temporal metadata and not the
>> rasters themselves…
>
> My recommendation was related to the temporal metadata only. The
> SQLite database will not scale very well for select requests if you
> have more than 30,000 maps registered in your temporal database.
> PostgreSQL will be much faster for select requests, but it performs
> very badly when managing (inserting, updating, deleting) many maps. I
> am not sure what the reason for this is, but in my experience
> PostgreSQL has a scaling problem with many tables. Hence, if you do
> not modify your data often, PostgreSQL is the temporal database
> backend of choice. Otherwise I would recommend SQLite, even if it is
> slower for select requests.
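> To get a feel for the kind of select workload this is about, here is a
> toy model in Python/sqlite3. The schema is deliberately simplified and
> is NOT the real TGRASS schema; it just registers 30,000 "maps" with
> absolute time stamps and runs a time-range select over an index:
>
> ```python
> import sqlite3
> from datetime import date, timedelta
>
> # Toy temporal metadata store (simplified, not the TGRASS schema)
> conn = sqlite3.connect(":memory:")
> cur = conn.cursor()
> cur.execute("""CREATE TABLE raster_register (
>     id TEXT PRIMARY KEY, start_time TEXT, end_time TEXT)""")
>
> # Register 30,000 daily "maps" starting in 1930
> start = date(1930, 1, 1)
> rows = [(f"map_{i:05d}", str(start + timedelta(days=i)),
>          str(start + timedelta(days=i + 1))) for i in range(30000)]
> cur.executemany("INSERT INTO raster_register VALUES (?, ?, ?)", rows)
>
> # An index on start_time is what keeps range selects usable at this size
> cur.execute("CREATE INDEX idx_start ON raster_register (start_time)")
>
> # Select all maps registered in January 2000 (ISO strings sort correctly)
> hits = cur.execute("""SELECT id FROM raster_register
>     WHERE start_time >= '2000-01-01' AND start_time < '2000-02-01'
>     ORDER BY start_time""").fetchall()
> print(len(hits))  # 31 maps, one per day of January 2000
> ```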
>
>> Furthermore in the idea collection for the Temporal framework
>> (http://grasswiki.osgeo.org/wiki/Time_series_development, Open issues
>
> This discussion is pretty old and does not reflect the current
> temporal framework implementation. Please have a look at the new
> TGRASS paper:
> https://www.sciencedirect.com/science/article/pii/S136481521300282X?np=y
> and the Geostat workshop:
> http://geostat-course.org/Topic_Gebbert
>
>> section), limitations were mentioned regarding the number of files in a
>> folder, which could be a problem for file-based storage. The
>> ext2 file system had a “soft” upper limit of about 10-15k files in a
>> single directory, but theoretically many more were possible. Other file
>> systems may allow for more, I guess… Will usage of such big directories
>> (> 10,000 files) lead to performance problems?
>
> Modern file systems should not have problems with many files. I am
> using ext4 and the temporal framework with 100,000 maps without
> noticeable performance issues.
>
>>
>> The “Working with external data in GRASS 7” – wiki entry
>> (http://grasswiki.osgeo.org/wiki/Working_with_external_data_in_GRASS_7)
>> covers the technical part (and to some degree performance issues) very well.
>> Would it be worth adding a part on the strategic considerations / pros and
>> cons of using external data? Or is that too much user and format dependent?
>
> It would be great if you could share your experience with us. :)
>
> Best regards
> Soeren
>
>>
>>
>>
>> Thanks for any feedback or thoughts around this topic…
>>
>>
>>
>> Cheers
>>
>> Stefan
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> grass-user mailing list
>> grass-user at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/grass-user


More information about the grass-user mailing list