<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 9, 2015 at 4:52 PM, Fábio Dias <span dir="ltr"><<a href="mailto:fabio.dias@gmail.com" target="_blank">fabio.dias@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I switched to postgis for data storage and the v.generalize time went<br>

down to 130ish minutes, all processes working in parallel.<br>

<br>

I'm happy now :) Thank you guys very much.<br></blockquote><div><br></div><div>Thanks for reporting this back. What about a blog post, or something like that, on this topic? I believe there are a lot of people interested in some benchmarks.<br></div><div> <br></div><div>Vaclav<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class="im HOEnZb">-=--=-=-<br>

Fábio Augusto Salve Dias<br>

ICMC - USP<br>

<a href="http://sites.google.com/site/fabiodias/" target="_blank">http://sites.google.com/site/fabiodias/</a><br>

<br>

<br>

</span><div class="HOEnZb"><div class="h5">On Tue, Jan 27, 2015 at 8:50 PM, Fábio Dias <<a href="mailto:fabio.dias@gmail.com">fabio.dias@gmail.com</a>> wrote:<br>

> Hi,<br>

><br>

> I kept a cumulative iotop running while the generalization ran.<br>

> No disk IO was involved, just a couple of postgres stats. I believe the OS<br>

> is keeping everything in RAM cache. I don't believe the disk is a<br>

> bottleneck either; it is a 10-disk RAID of 15k rpm disks, so it's really<br>

> fast.<br>

><br>

> I interrupted the processing, moved everything into postgres and<br>

> started over. I'm still loading the shapefiles (one at a time), and<br>

> I'll start the 15 processes as soon as they are loaded. As soon<br>

> as that stabilizes, I'll report back.<br>

><br>

><br>

> On a related note, wouldn't it be interesting to always try to<br>

> simplify a line? As I understood the code, if the simplification fails<br>

> for any reason, the original line is used. Might be interesting to<br>

> have an option that makes the algorithm retry that line, with half the<br>

> threshold, for instance. It's kind of weird for me to see one side of<br>

> something really simplified while the other side stays really complicated :)<br>

><br>

> F<br>

><br>

><br>

> On Tue, Jan 27, 2015 at 7:56 PM, Markus Metz<br>

> <<a href="mailto:markus.metz.giswork@gmail.com">markus.metz.giswork@gmail.com</a>> wrote:<br>

>> On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <<a href="mailto:fabio.dias@gmail.com">fabio.dias@gmail.com</a>> wrote:<br>

>>> Hi,<br>

>>><br>

>>> The machine has 128 GB of RAM. No matter what I do, I can't make a<br>

>>> dent in it. Even with everything cached in RAM (shp files, database,<br>

>>> the whole lot), there is still free memory.<br>

>><br>

>> OK, it's not RAM.<br>

>><br>

>>><br>

>>> I'm asking about the database because the behavior I'm seeing on 'top'<br>

>>> looks like the one you get when mutexes are involved. The processes<br>

>>> don't all go to 100% processing at the same time (and the machine has 64<br>

>>> processors, so no dent there either), except for the v.in.ogr.<br>

>><br>

>> The v.generalize processes should be at 100% while generalizing,<br>

>> unless the disk cannot keep up with multiple simultaneous IO<br>

>> requests. The tables are copied only after the generalization finished<br>

>> (100% reached).<br>

>><br>

>>> What it<br>

>>> looks like is that something is locking each process and they are<br>

>>> taking turns. Considering how 'lite' the sqlite appears to be, and the<br>

>>> weird locking behavior that was mentioned in other threads (I'm not<br>

>>> getting the locked message here... I did, when I was running 2<br>

>>> parallel v.in.ogr), isn't it likely to be the culprit? Should I change<br>

>>> it to a more 'non-lite' system, like postgres for instance?<br>

>><br>

>> That could make sense when running multiple processes in parallel. An<br>

>> alternative would be to create a separate mapset for each process and<br>

>> at the end copy the results back to the main mapset.<br>

>><br>

>> Technically, it is not possible that the new v.generalize version in<br>

>> trunk (G71) is slower than the old version as in relbr70 because the<br>

>> new version updates only those parts of the topology that actually get<br>

>> changed. The old version also updates components that do not get<br>

>> changed, which is quite time-consuming.<br>

>><br>

>> I understand you like to go for the big nail immediately, but maybe it<br>

>> is worth testing first on a smaller sample?<br>

>><br>

>> Markus M<br>

>><br>

>>><br>

>>> F<br>

>>><br>

>>><br>

>>> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz<br>

>>> <<a href="mailto:markus.metz.giswork@gmail.com">markus.metz.giswork@gmail.com</a>> wrote:<br>

>>>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz<br>

>>>> <<a href="mailto:markus.metz.giswork@gmail.com">markus.metz.giswork@gmail.com</a>> wrote:<br>

>>>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <<a href="mailto:fabio.dias@gmail.com">fabio.dias@gmail.com</a>> wrote:<br>

>>>>>> Hi,<br>

>>>>>><br>

>>>>>> Running r64249, with a couple of jobs in parallel using &. It seems<br>

>>>>>> to be considerably slower.<br>

>>>>><br>

>>>>> Very strange. Are you using trunk or GRASS 7.0?<br>

>>>><br>

>>>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.<br>

>>>><br>

>>>>><br>

>>>>>> More than 100h, no 1% printed. To be fair,<br>

>>>>>> I'm not entirely sure I'll see it when it prints, 10 v.generalize<br>

>>>>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr has<br>

>>>>>> been running for almost 100h too. I'm loading the shps directly, as advised<br>

>>>>>> way, way back in this thread.<br>

>>>>><br>

>>>>> What exactly do you mean by "loading shps directly"? For<br>

>>>>> v.generalize, you should import them with v.in.ogr.<br>

>>>>><br>

>>>>> What about memory consumption on your system? With 10 v.generalize + 1<br>

>>>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.<br>

>>>>><br>

>>>>> Markus M<br>

>>>>><br>

>>>>>><br>

>>>>>> AFAIK, no disk is being used, the whole thing is cached (after more<br>

>>>>>> than 24h processing, cumulative iotop shows only a few MB<br>

>>>>>> written/read). I'm no longer using a ramdisk for the grassdata dir.<br>

>>>>>><br>

>>>>>> However, it appears to be considerably slower, probably because of the<br>

>>>>>> parallel running jobs.<br>

>>>>>><br>

>>>>>> My question then would be, considering the thread I saw about sqlite,<br>

>>>>>> should I be using something else as backend? When does it start to make<br>

>>>>>> sense to change it?<br>

>>>>>><br>

>>>>>> F<br>

>>>>>><br>

>>>>>><br>

>>>>>><br>

>>>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <<a href="mailto:neteler@osgeo.org">neteler@osgeo.org</a>> wrote:<br>

>>>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <<a href="mailto:fabio.dias@gmail.com">fabio.dias@gmail.com</a>> wrote:<br>

>>>>>>> ...<br>

>>>>>>>> What would be the best way to do that in parallel? One mapset for each<br>

>>>>>>>> year? Can I run multiple v.generalizes on the same input with<br>

>>>>>>>> different outputs?<br>

>>>>>>><br>

>>>>>>> Yes sure.<br>

>>>>>>><br>

>>>>>>>> My first thought was to run completely separated grass processes for<br>

>>>>>>>> each simplification, but I didn't find a way to make it search<br>

>>>>>>>> something different than .grass / .grass70 for the configuration<br>

>>>>>>>> stuff....<br>

>>>>>>><br>

>>>>>>> Maybe take a look at this approach<br>

>>>>>>> <a href="http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine" target="_blank">http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine</a><br>

>>>>>>><br>

>>>>>>> but even sending different v.generalize jobs to background (&) should<br>

>>>>>>> work if you have enough RAM.<br>

>>>>>>><br>

>>>>>>> markusN<br>

_______________________________________________<br>

grass-user mailing list<br>

<a href="mailto:grass-user@lists.osgeo.org">grass-user@lists.osgeo.org</a><br>

<a href="http://lists.osgeo.org/mailman/listinfo/grass-user" target="_blank">http://lists.osgeo.org/mailman/listinfo/grass-user</a><br>

</div></div></blockquote></div><br></div></div>
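<div>The idea raised earlier in the thread (retrying a line with half the threshold when its simplification fails, instead of keeping the original line) can be sketched roughly as follows. This is only an illustration in plain Python, not v.generalize code: the Douglas-Peucker routine, the function names, the <code>is_acceptable</code> callback (a stand-in for whatever check makes a simplification "fail", such as a topology test) and the <code>max_retries</code> parameter are all assumptions made for the example.</div>

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Line = List[Point]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5


def douglas_peucker(line: Line, threshold: float) -> Line:
    """Plain recursive Douglas-Peucker simplification (illustrative only)."""
    if len(line) < 3:
        return list(line)
    # Find the vertex farthest from the chord between the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(line) - 1):
        d = _point_segment_distance(line[i], line[0], line[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= threshold:
        return [line[0], line[-1]]
    left = douglas_peucker(line[: index + 1], threshold)
    right = douglas_peucker(line[index:], threshold)
    return left[:-1] + right


def simplify_with_retry(
    line: Line,
    threshold: float,
    is_acceptable: Callable[[Line], bool],  # hypothetical validity check
    max_retries: int = 4,                   # hypothetical retry limit
) -> Tuple[Line, Optional[float]]:
    """Try to simplify; on failure retry with half the threshold.

    Returns the simplified line and the threshold that worked, or the
    original line and None if every attempt was rejected.
    """
    current = threshold
    for _ in range(max_retries + 1):
        candidate = douglas_peucker(line, current)
        if is_acceptable(candidate):
            return candidate, current
        current /= 2.0
    return list(line), None


if __name__ == "__main__":
    line = [(0, 0), (1, 0.4), (2, 0), (3, 1.5), (4, 0)]
    # Stand-in check: reject results collapsed to a single segment.
    print(simplify_with_retry(line, 2.0, lambda l: len(l) >= 3))
```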