[GRASS-stats] Loading a point-vector table with 466 columns

Sat May 23 10:23:21 EDT 2009

On Sat, 23 May 2009, Nikos Alexandris wrote:

>
> Nikos:
>>> # almost an hour...
>>> Sys.time() ; sample_2 <- readVECT6("sample_2_grid_points") ; Sys.time()
>>> [1] "2009-05-22 23:25:02 CEST"
>>> OGR data source with driver: GRASS
>>> Source:
>>> "/geo/grassdb/peloponnese/evaluation_utm/nik/vector/sample_2_grid_points/head", layer: "1"
>>> with  875  rows and  466  columns
>
>>> Feature type: wkbPoint with 3 dimensions
>>> [1] "2009-05-23 00:22:12 CEST"
>
> Roger:
> --%<---
>> Does plugin=FALSE speed it up or slow it down (that would force the use of
>> a temporary shapefile)?
>
> Yes, it speeds it up.
>
> # with "plugin=FALSE"
> system.time(readVECT6("sample_2_grid_points", plugin=FALSE))
> Exporting 875 points/lines...
> 100%
> 875 features written
> OGR data source with driver: ESRI Shapefile
> Source: "/geo/grassdb/peloponnese/evaluation_utm/nik/.tmp/vertical",
> layer: "sample_2"
> with  875  rows and  466  columns
> Feature type: wkbPoint with 2 dimensions
>   user  system elapsed
> 169.450  24.677 204.882
>
>
> ## there is one difference: wkbPoint with "3" vs "2" dimensions ##
> ## what does this mean (wkbPoint)? OK, I'll look it up in the book ##
>

Three minutes instead of almost an hour suggests that the OGR plugin has
trouble with SQLite as the DB format. So maybe the default for plugin=
should be FALSE, rather than NULL (automatic use of the plugin if present)?

The plugin also creates a fictitious third dimension in the data (for
points at least), which has caused havoc and has led to readVECT6() getting
a pointDropZ= argument. That is why wkbPoint is reported with 3 dimensions
with the plugin and (correctly) with 2 otherwise.

>
>>> # while reading the csv...
>>> Sys.time() ; sample_2 <-
>>> read.csv(file="sample_2_grid_points_table.csv") ; Sys.time()
>>> [1] "2009-05-23 01:39:51 CEST"
>>> [1] "2009-05-23 01:39:52 CEST"
>
> --%<---
>> This is not a fair comparison, because you have to dump the CSV file from
>> the GRASS database first, although it won't take long. What are you using
>> to do that?
>
> # right, it takes some time (<1min)
> # running from within GRASS location
> time db.out.ogr in=sample_2_grid_points
> dsn=/geo/grassdb/peloponnese/R/R_files/sample_2_grid_points_table
> format=CSV
> Exported table
> </geo/grassdb/peloponnese/R/R_files/sample_2_grid_points_table.csv>
>
> real	0m46.845s
> user	0m22.065s
> sys	0m23.637s

OK, thanks, this mirrors the v.out.ogr part of the three-minute timing.

Roger

>
>> Have you considered connecting to the SQLite file directly
>> from R? Are the (2) coordinates present in the table? See:
>>
>> http://cran.r-project.org/web/packages/RSQLite/index.html
>>
>> for direct reading.
>
> I was not aware of RSQLite. If it's straightforward I'll try it today.
> If you mean the x, y coordinates just as normal columns, no, I don't
> require them currently.
>
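
For the direct reading from SQLite suggested above, a minimal sketch could look like the following. It is only an illustration under stated assumptions: the database is a temporary stand-in file rather than the real GRASS SQLite file of the mapset (whose path is not given in this thread), the table is assumed to carry the same name as the vector map, and the column band_1 is hypothetical. Only DBI/RSQLite calls are used.

```r
library(DBI)

# Assumption: a temporary SQLite file stands in for the GRASS attribute
# database; in a real session dbname would point at the mapset's SQLite file.
db_path <- tempfile(fileext = ".db")
con <- dbConnect(RSQLite::SQLite(), dbname = db_path)

# Hypothetical table named like the vector map, with a category column
# and one made-up attribute column.
demo <- data.frame(cat = 1:5, band_1 = c(0.1, 0.4, 0.3, 0.9, 0.5))
dbWriteTable(con, "sample_2_grid_points", demo)

# The direct read itself: list the tables, then pull one into a data frame.
tables <- dbListTables(con)
sample_2 <- dbReadTable(con, "sample_2_grid_points")

dbDisconnect(con)
unlink(db_path)

str(sample_2)
```

This bypasses OGR entirely, so it returns only the attribute table as a data.frame, with no geometry attached.
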
>
> Overview of loading the GRASS attribute table (875 rows, 466 columns) via:
>
> * readVECT6() with plugin=TRUE                         : ~57min
> * readVECT6() with plugin=FALSE                        : ~3min+
> * export from GRASS as CSV (~46sec) + read.csv (1 sec) : ~47sec
>
> Nikos
>
>
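
The figures in the overview can be recomputed from the timings quoted in this thread. The sketch below only redoes that arithmetic; the fixed time zone is an assumption made so that the subtraction is reproducible (the original timestamps are CEST), and the 1 second for read.csv() is the rounded difference between the two Sys.time() stamps.

```r
# Timestamps copied from the readVECT6() run with the plugin.
# Assumption: a fixed tz is used only to make the difference reproducible.
start <- as.POSIXct("2009-05-22 23:25:02", tz = "UTC")
end   <- as.POSIXct("2009-05-23 00:22:12", tz = "UTC")
plugin_secs <- as.numeric(difftime(end, start, units = "secs"))

shapefile_secs <- 204.882   # "elapsed" from system.time() with plugin=FALSE
csv_secs <- 46.845 + 1      # "real" time of db.out.ogr plus ~1 s for read.csv()

timings <- c(plugin = plugin_secs,
             shapefile = shapefile_secs,
             csv = csv_secs)

round(timings / 60, 1)          # each route in minutes
round(timings / csv_secs, 1)    # each route relative to the CSV route
```
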

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no