[GRASS-user] Organizing spatial (time series) data for mixed GIS environments

Sören Gebbert soerengebbert at googlemail.com
Wed Dec 4 09:01:34 PST 2013


Hi Stefan,

2013/12/3 Blumentrath, Stefan <Stefan.Blumentrath at nina.no>:
> Dear all,
>
>
>
> On our Ubuntu server we are about to reorganize our GIS data in order to
> develop a more efficient and consistent solution for data storage in a mixed
> GIS environment.
>
> By “mixed GIS environment” I mean that we have people working with GRASS,
> QGIS, PostGIS but also many people using R and maybe the largest fraction
> using ESRI products. Furthermore, we have people using ENVI, ERDAS and some
> others. Only a few people (like me) actually work directly on the server…
>
> Until now I stored “my” data mainly in GRASS (6/7) native format which I was
> very happy with. But I guess our ESRI- and PostGIS-people would not accept
> that as a standard…
>
>
>
> However, especially for time series data we cannot have several copies in
> different formats (tailor-made for each and every software).
>
>
>
> So I started thinking: what would be the most efficient and convenient
> solution for storing a large amount of data (e.g. high resolution raster and
> vector data with national extent plus time series data) in a way that it is
> accessible for all (at least most) remote users (with different GIS
> software)? As I am very fond of the temporal framework in GRASS 7, it would
> be a precondition that I can use these tools on the data without
> unreasonable performance loss. Another precondition would be that users at
> remote computers in our (MS Windows) network can have access to the data.
>
>
>
> In general, four options come to mind:
>
> a)      Stick to GRASS native format and have one copy in another format
>
> b)      Use the native formats the data come in (e.g. temperature and
> precipitation come in zipped ascii-grid format)
>
> c)       Use PostGIS as a backend for data storage (raster / vector), linked
> by r./v.external.*
>
> d)      Use another GDAL/OGR format for data storage (raster / vector),
> linked by r./v.external.*
>
>
>
> My question(s) are:
>
> What solutions could you recommend or what solution did you choose?

I would suggest using r.external with uncompressed GeoTIFF files for
raster data. But you have to make sure that external software does not
modify these files or, if it does, that the temporal framework is
triggered to update the dependent space time raster datasets.
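
One way to notice such external modifications is to keep a snapshot of
file sizes and modification times and compare it before the linked maps
are used. The following is only a sketch under assumptions: the helper
names (r_external_command, snapshot, changed_files) are hypothetical, the
"*.tif" naming is assumed, and only the input= and output= options of
r.external are used. How the dependent space time raster datasets are
then updated is left to the caller.

```python
from pathlib import Path


def r_external_command(tif_path, map_name):
    """Return the argument list that links one GeoTIFF as a GRASS raster map."""
    return ["r.external", f"input={tif_path}", f"output={map_name}"]


def snapshot(directory):
    """Record (size, modification time) for every *.tif file in a directory."""
    state = {}
    for path in sorted(Path(directory).glob("*.tif")):
        info = path.stat()
        state[path.name] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(old, new):
    """Return names of files added, removed, or modified between two snapshots."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))
```

The argument list can be passed to subprocess.run inside a GRASS session.
A non-empty result from changed_files would be the signal that the
affected datasets need to be refreshed.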

For vector data I would suggest using the native GRASS format, which
means that the vector data needs to be copied. But maybe PostgreSQL with
topology support will be a solution? I think Martin Landa may have an
opinion here.

>
> Who is having experience with this kind of data management challenge?

No experience here from my side.

> How do externally linked data series perform compared to GRASS native?

It will be slower than the native format for sure. But I don't know
how much slower.

>
>
> I searched the mailing list a bit and found this:
> (http://osgeo-org.1560.x6.nabble.com/GRASS7-temporal-GIS-database-questions-td5054920.html)
> where Sören recommended “postgresql as temporal database backend”. However I
> am not sure if that was meant only for the temporal metadata and not the
> rasters themselves…

My recommendation was related to the temporal metadata only. The
SQLite database will not scale very well for select requests if you
have more than 30,000 maps registered in your temporal database.
PostgreSQL will be much faster for select requests. But PostgreSQL
performs very badly when managing (insert, update, delete) many maps. I
am not sure what the reason for this is, but in my experience
PostgreSQL has a scaling problem with many tables. Hence, if you do not
modify your data often, PostgreSQL is your temporal database backend of
choice. Otherwise I would recommend SQLite, even if it is slower for
select requests.
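
The rule of thumb above can be written down as a small helper. This is
a sketch, not part of GRASS: choose_backend and t_connect_command are
hypothetical names, the 30,000 threshold is taken from the text, and
keeping SQLite for small databases is an assumption. The driver= and
database= options of t.connect are the only GRASS options used, and the
database value in the usage note is a placeholder.

```python
def choose_backend(map_count, modified_often, threshold=30000):
    """Pick a temporal database driver name ("sqlite" or "pg")."""
    if modified_often:
        # Many inserts, updates, and deletes: SQLite, despite slower selects.
        return "sqlite"
    if map_count > threshold:
        # Mostly read-only and large: PostgreSQL answers selects faster.
        return "pg"
    # Assumption: below the threshold SQLite is sufficient.
    return "sqlite"


def t_connect_command(driver, database):
    """Return the argument list that sets the temporal database connection."""
    return ["t.connect", f"driver={driver}", f"database={database}"]
```

For example, t_connect_command(choose_backend(100000, False),
"dbname=tgis") yields the arguments for a PostgreSQL connection, which
can be run with subprocess.run inside a GRASS session.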

> Furthermore in the idea collection for the Temporal framework
> (http://grasswiki.osgeo.org/wiki/Time_series_development, Open issues

This discussion is pretty old and does not reflect the current
temporal framework implementation. Please have a look at the new
TGRASS paper:
https://www.sciencedirect.com/science/article/pii/S136481521300282X?np=y
and the Geostat workshop:
http://geostat-course.org/Topic_Gebbert

> section) limitations were mentioned regarding the number of files in a
> folder, which could be a problem for file-based storage. The
> ext2 file system had a “"soft" upper limit of about 10-15k files in a single
> directory” but theoretically many more were possible. Other file systems
> may allow for more I guess… Will usage of such big directories (> 10,000
> files) lead to performance problems?

Modern file systems should not have problems with many files. I am
using ext4 and the temporal framework with 100,000 maps without
noticeable performance issues.
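
Whether a given file system copes with large directories can be checked
with a quick measurement. The sketch below is an illustration only: the
function name time_directory_listing is hypothetical, it uses empty
files in a temporary directory rather than real raster maps, and it
times only a directory listing, not GRASS itself.

```python
import os
import tempfile
import time


def time_directory_listing(n_files):
    """Create n_files empty files in a temporary directory and time one listing."""
    with tempfile.TemporaryDirectory() as directory:
        for i in range(n_files):
            open(os.path.join(directory, f"map_{i:06d}"), "w").close()
        start = time.perf_counter()
        with os.scandir(directory) as entries:
            count = sum(1 for _ in entries)
        elapsed = time.perf_counter() - start
    return count, elapsed
```

Calling it with, say, 10,000 and 100,000 files on the target file system
gives a rough idea of how listing time grows with directory size.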

>
> The “Working with external data in GRASS 7” – wiki entry
> (http://grasswiki.osgeo.org/wiki/Working_with_external_data_in_GRASS_7)
> covers the technical part (and to some degree performance issues) very well.
> Would it be worth adding a part on the strategic considerations / pros and
> cons of using external data? Or is that too user- and format-dependent?

It would be great if you could share your experience with us. :)

Best regards
Soeren

>
>
>
> Thanks for any feedback or thoughts around this topic…
>
>
>
> Cheers
>
> Stefan
>

