[GRASS-user] too many categories: buffer overflow

Ken Mankoff mankoff at gmail.com
Tue Jun 18 12:51:07 PDT 2019


Hi Micha and Markus,

On 2019-06-18 at 10:07 -04, Micha Silver <tsvibar at gmail.com> wrote...
> Do you really want a vector polygon map with > 2 billion features?

No, and there are not that many.

% r.info -r basins
  min=-2147474681
  max=2147429730

But I don't have categories from 1 to 2147429730. The values are sparse. I describe my workflow and why I've created these sparse values in more detail below.
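As a quick sanity check on the range above, here is a minimal sketch that parses the two values and flags the negative minimum. It assumes that every intended category is positive, so a negative minimum would point to values that have wrapped past the signed 32-bit limit. The variable names are illustrative only, and the two numbers are copied from the r.info output above rather than read from GRASS.

```shell
# Range reported above, in the key=value form printed by r.info -r.
range_output='min=-2147474681
max=2147429730'

# Pull out the two numbers, tolerating leading spaces.
min=$(printf '%s\n' "$range_output" | sed -n 's/^ *min=//p')
max=$(printf '%s\n' "$range_output" | sed -n 's/^ *max=//p')

# Largest value a signed 32-bit integer can hold.
int32_max=2147483647

# Assumption: all intended categories are positive, so a negative
# minimum is treated as a sign of integer wraparound.
if [ "$min" -lt 0 ]; then
    status="negative minimum: values likely wrapped past $int32_max"
else
    status="range looks plausible"
fi
echo "min=$min max=$max"
echo "$status"
```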

Even though there are far fewer than 2 billion, there should be many basins. This is all of Greenland at 30 m resolution, which is 4.5 billion cells.

Taking a step back, I'm trying to generate unique basin values that match the stream and outlet CAT values. Here is my workflow, which doesn't appear to have any problems when run at 90x90 m resolution (400 million cells) but fails at 30x30 m resolution (about 10x as many, or 4.5 billion cells).

1) Find streams:

r.stream.extract elevation=head threshold=${THRESH} memory=16384 direction=dir stream_raster=streams stream_vector=streams

2) Find outlets. Where streams have outlets, use the same CAT value so the two can be linked in further analysis. But many outlets don't have streams. These need to have unique categories for the next step, when we find basins. This is where my error is. I set the unique value to the cell #, which is > 2 billion on a 30x30 m grid and so no longer fits in a signed 32-bit integer (maximum 2147483647).

r.mapcalc "outlets_all = if(dir < 0, 1, null())"
r.mapcalc "outlets_streams_1 = if((dir < 0) && (not(isnull(streams))), streams, outlets_all)"
### BUG INTRODUCED HERE, setting (eventual) cat to cell number:
r.mapcalc "outlets_streams = if(outlets_streams_1 != 1, outlets_streams_1, max(outlets_streams_1)+1+col()+(max(col())*(row()-1)))"
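The arithmetic behind the flagged line can be sketched in plain shell, without GRASS. The grid dimensions below are hypothetical, chosen only so that cols * rows equals 4.5 billion cells, and the wrapped value assumes two's-complement wraparound, which may not be exactly how the overflow plays out inside r.mapcalc.

```shell
# Hypothetical grid dimensions chosen so that cols * rows = 4.5 billion cells.
cols=50000
rows=90000

# Row-major cell number of the last cell: col + cols * (row - 1).
last_cell=$(( cols + cols * (rows - 1) ))

# Largest value a signed 32-bit integer can hold.
int32_max=2147483647

# Value after two's-complement wraparound to 32 bits (an assumption about
# how the overflow behaves, not something confirmed for r.mapcalc).
wrapped=$(( (last_cell + 2147483648) % 4294967296 - 2147483648 ))

echo "last cell number:   $last_cell"
echo "exceeds int32 max:  $(( last_cell > int32_max ))"
echo "wrapped to 32 bits: $wrapped"
```

Under these assumed dimensions, more than half of the cell numbers fall above the 32-bit limit, so wrapped values can collide with each other or with real stream categories.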

# convert outlets to a vector.
r.out.xyz input=outlets_streams | \
    v.in.ascii input=- output=outlets_streams separator=pipe \
        columns="x int, y int, cat int" x=1 y=2 cat=3

Q: How can I create the outlets_streams vector covering every location where dir < 0 (all outlets), keeping the value of the streams raster wherever streams is defined and assigning a unique value wherever it is not?
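One possible direction, offered as a sketch rather than a tested answer: keep the placeholder value 1 from outlets_streams_1 and replace it with small sequential integers in the text stream, instead of using the cell number. The example below runs on a hypothetical four-line sample in the x|y|value layout; relabel is an illustrative helper name, not a GRASS tool. It also inherits a caveat from the r.mapcalc step above: a real stream with category 1 would be treated as a placeholder.

```shell
# Hypothetical sample of "x|y|value" lines, pipe-separated.
# A value of 1 marks an outlet that has no stream.
sample='100|200|57
130|200|1
160|230|1
190|260|912'

# Read the file twice: pass 1 finds the largest stream category,
# pass 2 replaces each placeholder with the next unused integer.
relabel() {
    awk -F'|' -v OFS='|' '
        NR == FNR { if ($3 > max) max = $3; next }
        $3 == 1   { $3 = ++max }
        { print }
    ' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' "$sample" > "$tmp"
relabeled=$(relabel "$tmp")
rm -f "$tmp"
printf '%s\n' "$relabeled"
```

In the workflow above, the sample would presumably be replaced by the output of r.out.xyz on outlets_streams_1 written to a temporary file, with the relabeled lines piped into the same v.in.ascii call. The new categories then stay just above the largest stream category, far below the 32-bit limit.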


3) Find basins

r.stream.basins -m direction=dir points=outlets_streams basins=basins_all memory=16384 --verbose


4) Absorb small basins

r.clump -d input=basins_all output=basins_nosmall minsize=124
r.mode base=basins_nosmall cover=basins_all output=basins
### BUG APPEARS HERE
r.to.vect -v input=basins output=basins type=area

# drop outlets for absorbed basins.
r.mapcalc "outlets = if(outlets_streams == basins, basins, null())"
r.to.vect -v input=outlets output=outlets type=point

NOTE: I use r.mode instead of r.area because I need to maintain the category value, so that eventual vectors can have linked primary keys. r.area re-assigns categories.


Any advice on how to generate streams, outlets, and basins that all share a linked primary key would be much appreciated.

Thanks,

  -k.

