[GRASS-user] Organizing spatial (time series) data for mixed GIS environments

Blumentrath, Stefan Stefan.Blumentrath at nina.no
Thu Dec 5 01:58:10 PST 2013


Hi again

I have now tested the GeoTIFF approach a bit with regard to disc space and performance.
For the disc space test I exported a MODIS dataset over Scandinavia (7629 x 9387 pixels) from GRASS native format (type CELL, 27M compressed on disc) to GeoTIFF with two different data types (Int16 and Float64), each both LZW-compressed and uncompressed.

Results for the Int16 dataset
MODIS_sizetest_compressed.tif (LZW-compressed, Predictor 2): 14M
MODIS_sizetest_uncompressed.tif (uncompressed): 137M

Results for the Float64 dataset
MODIS_sizetest_compressed_Float64.tif (LZW-compressed): 29M
MODIS_sizetest_uncompressed_Float64.tif  (uncompressed): 547M
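
As a cross-check of the figures above: the uncompressed sizes follow directly from the raster dimensions and the bytes per cell, and the compression ratios from the reported file sizes. A small Python sketch of that arithmetic, assuming the reported "M" values are mebibytes and ignoring the small GeoTIFF header overhead:

```python
# Raster dimensions from the test above (columns x rows).
COLS, ROWS = 7629, 9387
BYTES_PER_CELL = {"Int16": 2, "Float64": 8}

# File sizes as reported above, in MiB (assumption: "M" means MiB).
REPORTED = {
    "Int16": {"compressed": 14, "uncompressed": 137},
    "Float64": {"compressed": 29, "uncompressed": 547},
}


def raw_size_mib(dtype):
    """Expected size of the uncompressed pixel data in MiB."""
    return COLS * ROWS * BYTES_PER_CELL[dtype] / 2**20


def compression_ratio(dtype):
    """Reported uncompressed size divided by reported compressed size."""
    sizes = REPORTED[dtype]
    return sizes["uncompressed"] / sizes["compressed"]


for dtype in REPORTED:
    print(f"{dtype}: expected raw size {raw_size_mib(dtype):.1f} MiB, "
          f"compression ratio {compression_ratio(dtype):.1f}")
```

Both expected raw sizes come out slightly below the reported 137M and 547M, which would be consistent with the file sizes having been rounded up to whole megabytes.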

So disc capacity seems to be a factor one should consider, as uncompressed data is in this case roughly 10 to 20 times larger than compressed data...
Maybe we will have to accept that raw data is kept in a less interoperable (GRASS native) format, as the processed results are mainly the ones of interest (and I guess a visual file browser, e.g. in Arc software, will nevertheless almost freeze when opening a folder with hundreds or thousands of files, I do not know?).

I also tried to run performance tests using the time command and r.mapcalc, with external and native format for both input and output (all 4 combinations). The results were not reliable, and performance tests seem to be a bit tricky according to Glynn's post here:
http://osdir.com/ml/grass-development-gis/2010-09/msg00225.html
Does anyone have a suggestion for how the tests Glynn describes could be run technically (by non-C developers) without too much effort?
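
One low-effort approach that might at least reduce the noise is to repeat each run several times and compare the minimum or median wall-clock time instead of a single measurement. This is only a sketch, and not necessarily the kind of test Glynn describes. The Python snippet below does that for an arbitrary command. The r.mapcalc call in the comment uses the hypothetical map names in_map and out_map, and the sketch does not control for disc caching effects:

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Minimum, median and maximum of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Placeholder command so that the sketch runs anywhere. Inside a GRASS
# session one would instead pass something like
#   ["r.mapcalc", "expression=out_map = in_map * 1", "--overwrite"]
# where in_map and out_map are hypothetical map names.
cmd = [sys.executable, "-c", "pass"]
print(summarize(time_command(cmd, repeats=3)))
```

Running this once per input/output combination would give four summaries that can be compared directly. The minimum is usually the least disturbed by other activity on the server.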

Cheers
Stefan

-----Original Message-----
From: grass-user-bounces at lists.osgeo.org [mailto:grass-user-bounces at lists.osgeo.org] On Behalf Of Blumentrath, Stefan
Sent: 4. desember 2013 22:52
To: Sören Gebbert
Cc: grass-user at lists.osgeo.org list
Subject: Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

Hi Sören,

First of all thank you very much for the excellent temporal framework! It is really great work!
Thank you also for your answers. They are already very helpful too!

I will test the solution with external GeoTIFFs.

Updates of the GeoTIFFs by external software are to be expected (possibly by cron jobs), so I have to think about a strategy for updating all downstream space-time datasets that depend on an updated file (decade, year, month, whatever...).
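
The downstream part of such a strategy can be sketched as a walk over a dependency map: given the dataset that was updated, collect everything derived from it, directly or indirectly. The dataset names and the DEPENDENTS mapping below are purely hypothetical, and nothing here is part of the temporal framework itself:

```python
from collections import deque

# Hypothetical dependency map: each dataset lists the datasets derived from it.
DEPENDENTS = {
    "daily": ["monthly"],
    "monthly": ["yearly"],
    "yearly": ["decadal"],
    "decadal": [],
}


def downstream(updated, dependents):
    """Return all datasets that depend, directly or indirectly, on `updated`,
    in breadth-first order (closest dependents first)."""
    found = []
    queue = deque(dependents.get(updated, []))
    while queue:
        name = queue.popleft()
        if name in found:
            continue  # already scheduled via another path
        found.append(name)
        queue.extend(dependents.get(name, []))
    return found


print(downstream("monthly", DEPENDENTS))
```

The breadth-first order means each dataset is listed after the dataset it was reached from, so in a simple chain like this one the result can be processed front to back.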

I'll report back after some first tests...

Best regards
Stefan


-----Original Message-----
From: Sören Gebbert [mailto:soerengebbert at googlemail.com]
Sent: 4. desember 2013 18:02
To: Blumentrath, Stefan
Cc: grass-user at lists.osgeo.org list
Subject: Re: [GRASS-user] Organizing spatial (time series) data for mixed GIS environments

Hi Stefan,

2013/12/3 Blumentrath, Stefan <Stefan.Blumentrath at nina.no>:
> Dear all,
>
>
>
> On our Ubuntu server we are about to reorganize our GIS data in order 
> to develop a more efficient and consistent solution for data storage 
> in a mixed GIS environment.
>
> By “mixed GIS environment” I mean that we have people working with 
> GRASS, QGIS, PostGIS but also many people using R and maybe the 
> largest fraction using ESRI products, furthermore we have people using 
> ENVI, ERDAS and some others. Only few people (like me) actually work 
> directly on the server…
>
> Until now I stored “my” data mainly in GRASS (6/7) native format which 
> I was very happy with. But I  guess our ESRI- and PostGIS-people would 
> not accept that as a standard…
>
>
>
> However, especially for time series data we cannot have several copies 
> in different formats (tailor-made for each and every software).
>
>
>
> So I started thinking: what would be the most efficient and convenient 
> solution for storing a large amount of data (e.g. high resolution 
> raster and vector data with national extent plus time series data) in 
> a way that it is accessible for all (at least most) remote users (with 
> different GIS software). As I am very fond of the temporal framework 
> in GRASS 7 it would be a precondition that I can use these tools on 
> the data without unreasonable performance loss. Another precondition 
> would be that users at remote computers in our (MS Windows) network can have access to the data.
>
>
>
> In general, four options come into my mind:
>
> a)      Stick to GRASS native format and have one copy in another format
>
> b)      Use the native formats the data come in (e.g. temperature and
> precipitation comes in zipped ascii-grid format)
>
> c)       Use PostGIS as a backend for data storage (raster / vector) (linked
> by (r./v.external.*)
>
> d)      Use another GDAL/OGR format for data storage (raster / vector)
> (linked by (r./v.external.*)
>
>
>
> My question(s) are:
>
> What solutions could you recommend or what solution did you choose?

I would suggest using r.external and uncompressed GeoTIFF files for raster data. But you have to make sure that external software does not modify these files, or, if it does, that the temporal framework is triggered to update the dependent space time raster datasets.
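
A minimal sketch of how such modifications could be detected, assuming that a changed modification time is a good enough signal: take a snapshot of the modification times of the linked files and compare it with a later one. The function names are hypothetical and this is not part of the temporal framework:

```python
import os


def snapshot(directory, suffix=".tif"):
    """Map each matching file name in `directory` to its modification time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.lower().endswith(suffix)
    }


def changed_files(old, new):
    """Names of files that are new or whose modification time differs."""
    return sorted(name for name, mtime in new.items() if old.get(name) != mtime)
```

A cron job could store the result of snapshot() between runs and pass the files reported by changed_files() on to whatever updates the dependent datasets. Files that were deleted are not reported by this sketch.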

For vector data I would suggest using the native GRASS format, which means the vector data needs to be copied. But maybe PostgreSQL with topology support will be a solution? I think Martin Landa may have an opinion here.

>
> Who is having experience with this kind of data management challenge?

No experience here from my side.

> How do externally linked data series perform compared to GRASS native?

It will be slower than the native format for sure, but I don't know how much slower.

>
>
> I searched a bit the mailing list and found this:
> (http://osgeo-org.1560.x6.nabble.com/GRASS7-temporal-GIS-database-questions-td5054920.html)
> where Sören recommended “postgresql as temporal 
> database backend”. However I am not sure if that was meant only for 
> the temporal metadata and not the rasters themselves…

My recommendation was related to the temporal metadata only. The SQLite database will not scale very well for select requests if you have more than 30,000 maps registered in your temporal database.
PostgreSQL will be much faster for select requests, but it performs very badly when managing (insert, update, delete) many maps. I am not sure what the reason for this is, but in my experience PostgreSQL has a scaling problem with many tables. Hence, if you do not modify your data often, PostgreSQL is your temporal database backend of choice. Otherwise I would recommend SQLite, even if it is slower for select requests.

> Furthermore in the idea collection for the Temporal framework 
> (http://grasswiki.osgeo.org/wiki/Time_series_development, Open issues

This discussion is pretty old and does not reflect the current temporal framework implementation. Please have a look at the new TGRASS paper:
https://www.sciencedirect.com/science/article/pii/S136481521300282X?np=y
and the Geostat workshop:
http://geostat-course.org/Topic_Gebbert

> section) limitations were mentioned regarding the number of files in a 
> folder, which would be possibly a problem both for file based storage.
> The
> ext2 file system had “"soft" upper limit of about 10-15k files in a 
> single directory” but theoretically many more were possible. Other 
> file systems may allow for more I guess… Will usage of such big 
> directories > 10,000 files lead to performance problems?

Modern file systems should not have problems with many files. I am using ext4 and the temporal framework with 100,000 maps without noticeable performance issues.

>
> The “Working with external data in GRASS 7” – wiki entry
> (http://grasswiki.osgeo.org/wiki/Working_with_external_data_in_GRASS_7)
> covers the technical part (and to some degree performance issues) 
> very well.
> Would it be worth adding a part on the strategic considerations / pros 
> and cons of using external data? Or is that too much user and format dependent?

It would be great if you could share your experience with us. :)

Best regards
Soeren

>
>
>
> Thanks for any feedback or thoughts around this topic…
>
>
>
> Cheers
>
> Stefan
>
>
>
>
>
>
>
>
> _______________________________________________
> grass-user mailing list
> grass-user at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-user