[GRASS-dev] i.segment: possible to cache results of open_files() for several runs of i.segment?
Moritz Lennert
mlennert at club.worldonline.be
Thu Aug 3 01:40:35 PDT 2017
On 03/08/17 10:11, Markus Metz wrote:
>
>
> On Thu, Aug 3, 2017 at 7:02 AM, Moritz Lennert <mlennert at club.worldonline.be> wrote:
> >
> > On 02/08/17 21:43, Markus Metz wrote:
> >>
> >> Hi Moritz,
> >>
> >> On Wed, Aug 2, 2017 at 2:52 PM, Moritz Lennert <mlennert at club.worldonline.be> wrote:
> >> >
> >> > Hi MarkusM,
> >> >
> >> > Working on segmentation parameter optimization with fairly large
> >> > images, we have stumbled upon some questions (ISTR that we've
> >> > discussed this before, but I cannot find traces of that discussion).
> >> > As a reminder, i.segment.uspo works by looping through a series of
> >> > threshold parameter values, segmenting a series of test regions at
> >> > each parameter value, and then comparing the results in order to
> >> > identify the "optimal" threshold.
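(Aside for readers of the archive: the loop in question boils down to
something like the following grass.script sketch; the group and output
names are invented:

    import grass.script as gs

    # segment the same imagery group at a series of threshold values
    for t in [0.02, 0.05, 0.1, 0.2]:
        # each call re-runs open_files(): the input rasters are read
        # again and fresh seglib temp files are created
        gs.run_command('i.segment',
                       group='mygroup',
                       output='seg_%s' % str(t).replace('.', '_'),
                       threshold=t)
)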
> >> >
> >> > Two issues have popped up:
> >> >
> >> > - One approach we tried was to optimize thresholds separately for
> >> > different types of morphological zones. For each type we have
> >> > several polygons distributed across the image. These polygons are
> >> > used as input for a mask. However, it seems that even if most of
> >> > the image is masked, open_files() takes a long time, as if it reads
> >> > the entire image. Is this expected/normal? Would it be possible to
> >> > reduce the read time when most of the area is masked?
> >>
> >> You could reduce the read time by zooming to the current mask with
> >> g.region zoom=current_mask
> >
> >
> > Yes, but this doesn't help for situations where the mask areas are
> > distributed across the entire image, so that the region will be almost
> > as large as the original image.
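(To make the workflow concrete for the archive, a sketch with invented
map names; as noted, the zoom step only pays off when the masked areas
are spatially clustered:

    import grass.script as gs

    # mask the image with the polygons of one morphological zone type
    gs.run_command('r.mask', vector='zones_type1')
    # shrink the region to the non-NULL cells of the mask raster,
    # as suggested above
    gs.run_command('g.region', zoom='MASK')
    gs.run_command('i.segment', group='mygroup', output='seg_type1',
                   threshold=0.05)
    # remove the mask afterwards
    gs.run_command('r.mask', flags='r')
)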
> >
> >>
> >> >
> >> > - More generally: for every i.segment call, open_files() reads the
> >> > input files and, AFAIU, checks for min/max values and creates the
> >> > seglib temp files (+ possibly other operations). When segmenting
> >> > the same image several times with only the thresholds changing, it
> >> > would seem that most of what open_files() does is repeated in
> >> > exactly the same manner at each call. Would it be possible to cache
> >> > that information somehow and to instruct i.segment to reuse it each
> >> > time it is called on the same image and region?
> >>
> >> The part of open_files() that consumes the most time (and disk
> >> space) is creating temporary files for the input files, the current
> >> region, and the current mask. These files are temporary because too
> >> many things can change between two consecutive runs of any module
> >> using the segment library. First of all, the input files could change
> >> (same name, but different cell values); then, region and mask
> >> settings could change.
> >
> >
> > Agreed, but here I'm talking about the situation where I run
> > i.segment multiple times in a loop with exactly the same input and
> > only the threshold value (and possibly minsize) changing. So we hoped
> > that it would be possible to reuse the segment library files.
>
One problem is that the contents of the temporary files are modified at
runtime and thus cannot be re-used for a new run. This is in order to
save disk space and memory; otherwise, resource requirements would
double if input and output were kept separate.
Ok, understood.
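For the archive, a toy illustration of that pattern (this is not the
actual seglib code, just a disk-backed array updated in place, using
numpy as a stand-in):

    import numpy as np

    # stand-in for a seglib temp file: a disk-backed array of segment
    # ids, initialized with one segment per cell
    seg_ids = np.memmap('seg_ids.tmp', dtype=np.int32, mode='w+',
                        shape=(100, 100))
    seg_ids[:] = np.arange(100 * 100, dtype=np.int32).reshape(100, 100)

    # merging two segments overwrites ids in place: after the run the
    # file holds the final result, not the initial state, so a second
    # run with a different threshold cannot start from it
    seg_ids[seg_ids == 1] = 0
    seg_ids.flush()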
> >
> >>
> >> >
> >> > Just trying to crunch larger and larger images... :-)
> >>
> >> As in, it's working but a bit slow?
> >
> >
> > i.segment is definitely not slow compared to other similar software,
> > but in this specific case of looping, the accumulated time spent in
> > the input-reading phase grows to a significant duration.
>
> Reading input can take some time, but I thought that most of the time
> is spent on the actual segmentation, which takes substantially longer
> than reading the input.
Yes, sure, we were just wondering whether this might be low-hanging
fruit, but I now see it is quite the contrary.
Related to this is the question of parallel processing: we've hit the
issue of several parallel processes all performing the same open_files()
run on the same input at startup. Obviously this quickly becomes a
severe bottleneck when you do not use a parallel file system, but we
also thought that maybe the information could be shared. I now
understand that this is not feasible, since the info will be modified by
each process.
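For reference, our parallel runs look roughly like this (a sketch; the
group and output names are invented, and each worker writes its own
output map):

    import grass.script as gs
    from multiprocessing import Pool

    def segment(threshold):
        # every worker repeats the full open_files() phase on the same
        # input group before the actual segmentation starts
        gs.run_command('i.segment',
                       group='mygroup',
                       output='seg_%s' % str(threshold).replace('.', '_'),
                       threshold=threshold)

    if __name__ == '__main__':
        pool = Pool(4)
        pool.map(segment, [0.02, 0.05, 0.1, 0.2])
        pool.close()
        pool.join()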
> Of course reading input maps does require some time, but I can't see a
> reasonable solution for creating a permanent cache of the input data
> without lots of sanity checks and increased resource requirements. I
> see more potential in the actual segmentation part; maybe this could be
> further optimized.
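Indeed. Just to spell out what those sanity checks would minimally
involve (a hypothetical helper, sketched with grass.script; a real check
would also need the mask, and would still miss cell values changed
without touching the metadata):

    import grass.script as gs

    def cache_is_valid(cached):
        # a permanent cache would at least have to verify that the
        # current region and every input raster are unchanged since
        # the cache was written
        if gs.region() != cached['region']:
            return False
        for rast in cached['inputs']:
            if gs.raster_info(rast) != cached['info'][rast]:
                return False
        return True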
We're always ready to test any such optimizations. But as I said, I
believe GRASS GIS is already quite fast in its region-growing
segmentation...
Moritz