<br><div class="gmail_quote">On Thu, Apr 1, 2010 at 3:24 PM, Glynn Clements <span dir="ltr">&lt;<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<div><br>

Jordan Neumeyer wrote:<br>

<br>

&gt; &gt; &gt; That&#39;s roughly my thought process for how I would go about<br>

&gt; &gt; &gt; parallelizing a module.<br>

&gt; &gt;<br>

&gt; &gt; The main issue with parallelising raster input is that the library<br>

&gt; &gt; keeps a copy of the current row&#39;s data, so that consecutive reads of<br>

&gt; &gt; the same row (as happen when upsampling) don&#39;t re-read the data.<br>

&gt; &gt;<br>

&gt; &gt; For concurrent access to a single map, you would need to either keep<br>

&gt; &gt; one row per thread, or abandon caching. Also, you would need to use<br>

&gt; &gt; pread() rather than fseek()+read().<br>

&gt;<br>

&gt; It sounds like you&#39;re talking about parallelism in I/O from a file or<br>

&gt; database, neither of which is my intent or goal for this project. I will<br>

&gt; parallelize things once they have already been read into memory and the tasks<br>

&gt; are processor-intensive. I wouldn&#39;t want to parallelize any I/O, but if I were<br>

&gt; to optimize I/O, I would make all I/O operations asynchronous, which can<br>

&gt; mimic parallelism in a sense: queuing up chunks of data and then<br>

&gt; processing them as resources become available.<br>

<br>

</div>Most GRASS raster modules process data row-by-row, rather than reading<br>

entire maps into memory. Reading maps into memory is frowned upon, as<br>

GRASS is regularly used with maps which are too large to fit into<br>

memory. Where the algorithm cannot operate row-by-row, use of a tile<br>

cache is the next best alternative; see e.g. r.proj.seg (renamed to<br>

r.proj in 7.0).<br></blockquote><div> </div><div>That makes more sense. So a row is a chunk of the map data? Like the first row of pixels in an image: pixels 1 through width form one row, pixel width + 1 starts the next, and so on. How large are rows generally?<br>


 </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<br>

Holding an entire map in memory is only considered acceptable if the<br>

algorithm is inherently so slow that processing a gigabyte-sized map<br>

simply wouldn&#39;t be feasible, or the access pattern is such that even a<br>

tile-cache approach isn&#39;t feasible.<br>

<br>

In general, GRASS should be able to process multi-gigabyte maps even<br>

on 32-bit systems, and work on multi-user systems where a process<br>

cannot assume that it can use a significant proportion of the system&#39;s<br>

total physical memory.<br></blockquote><div> </div><div>That&#39;s good. I didn&#39;t realize how big the data sets could be. What&#39;s the biggest map you&#39;ve seen?<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<div>

&gt; &gt; It&#39;s more straightforward to read multiple maps concurrently. In 7.0,<br>

&gt; &gt; this case should be thread-safe.<br>

&gt; &gt;<br>

&gt; &gt; Alternatively, you could have one thread for reading, one for writing,<br>

&gt; &gt; and multiple worker threads for the actual processing. However, unless<br>

&gt; &gt; the processing is complex, I/O will be the bottleneck.<br>

&gt; &gt;<br>

&gt;<br>

&gt; I/O is generally a bottleneck anyway. Something is always waiting<br>

&gt; on something else.<br>

<br>

</div>When I refer to I/O, I&#39;m referring not just to read() and write(), but<br>

also the (de)compression, conversion and resampling, i.e. everything<br>

performed by the get/put-row functions. For many GRASS modules, this<br>

takes more time than the actual processing.<br></blockquote><div><br>I can see why, especially for big maps, since it&#39;s all done row by row.<br>So when a GRASS module loads a map, the basic algorithm looks something like this:<br>


1) Read row<br>2) The get-row function does the necessary preprocessing<br>3) Row is cached or held in memory (or does the caching take place after step 4?)<br>4) Row is processed<br>5) Row is displayed/written (or does this happen after a couple of iterations, or after all of them?)<br>


6) Repeat from (1)<br><br>Would it be beneficial/practical to parallelize some of the preprocessing, such as conversion and resampling, before the caching occurs? <br>


<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Finally, the thread title refers to libraries. Very little processing<br>

occurs in the libraries; most of it is in the individual modules. So<br>

there isn&#39;t much scope for &quot;parallelising&quot; the libraries. The main<br>

issue for library functions is to ensure that they are thread-safe.<br>

Most of the necessary work for the raster library has been done in<br>

7.0.<font color="#888888"><br></font></blockquote><div> </div><div>I was referring to the raster modules as a whole; the library is just what the modules share. I&#39;ve changed the title from &quot;Parallelization of Raster and Vector libraries&quot; to &quot;Parallelization of Raster and Vector modules&quot;. <br>


<br>Would I be working on GRASS 6.x or 7.x? Is there a minimum compiler version when using GCC/MinGW? I&#39;m asking because OpenMP tasks (introduced in OpenMP 3.0) are only supported in GCC &gt;= 4.4. They may or may not be useful here, but they can be a valuable tool when you don&#39;t know in advance how much data or how many &quot;tasks&quot; you have, such as when traversing a linked list or a binary tree.<br>


 </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<font color="#888888">

--<br>

</font><div><div></div><div>Glynn Clements &lt;<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>&gt;<br></div></div></blockquote><div><br>~Jordan <br></div></div><br>