<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><div>I always keep an open mind. I hate the marketing hype of NoSQL vendors, and of GIS vendors too!</div><div><br></div><div>But when a colleague who is one of the leading data engineers in the visual effects (VFX) industry combines MongoDB and PostGIS, I take note.</div><div><br></div><div>MongoDB has spatial capabilities, and you can use the MongoDB FDW to return query results into PostgreSQL.</div><div><br></div><div>Using Big Data / NoSQL solutions to do some of the simple heavy lifting makes sense - as you say, scalable designs are a must. The datasets are only going to get bigger over time, so IMHO, the quicker we become conversant in polyglot DB design, the better.</div><div><br></div><div>If we think every problem is a nail and therefore PostGIS is the hammer / answer, then innovation (which is about using and combining existing technologies in new ways) will leave us behind.</div><div><br></div><div>The people who use NoSQL tend not to be on the pg mailing list.<br><br>Sent from my iPhone</div><div><br>On 3 Apr 2015, at 7:19 pm, Rémi Cura <<a href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a>> wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr"><div><div>Hey,<br></div>thanks for the answer Mark (I saw your pg_routing mail; I've never used pg_routing, so I asked a co-worker and I'm waiting for his answer)!<br><br></div><div>What I want to achieve is slightly different.<br>It is not a processing issue (one big table, cut into pieces to process it faster through PL/R, PL/Python, etc.).<br><br></div><div>It is a scaling issue!<br></div><div><br> - having several thousand tables (each with millions of rows) that all have a few columns in common (including a geom column).<br><br></div><div>How do you easily query all these tables at once with one simple 
query?<br><br></div><div>For instance, I want to get all geometries of all tables that are within a rectangle.<br><br></div><div>The classical solution is to use UNION ALL:<br></div><div>select * from table_1<br></div><div>union all<br></div><div>select * from table_2 ...<br><br></div><div>It is inefficient and a pain to write.<br><br></div><div>Now Postgres offers partitioning, that is, you build a hierarchy of tables.<br><br></div><div>In this case, you would have one empty father table, and all the thousand tables would be declared as children of the father table.<br></div><div>Now when you write<br></div><div>select * from father;<br></div><div>you in fact query all the child tables.<br><br></div><div>This is all good and working, but it will be inefficient, because each time you look for geometries within a rectangle, you would have to read all the tables (using their indexes).<br></div><div>Of course having thousands of indexes in memory is not possible, so it would be very slow.<br></div><div><br></div><div>Postgres offers a solution for that, which is to declare CHECK constraints on the tables.<br><br></div><div>So you would say, table child_1 is entirely contained in a rectangle R1,<br></div><div>table child_2 is entirely contained in a rectangle R2, etc.<br><br></div><div>That way, when you query the father table asking for all the geometries inside a rectangle R0,<br></div><div>the planner will first check for which tables Ri intersects R0,<br>then it will only consider those tables, instead of considering all the tables.<br><br></div><div>This is the theory. 
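A minimal sketch of this inheritance setup (table names, SRID, and rectangles are illustrative, and - as noted below - the planner did not in practice use these constraints for exclusion):

```sql
-- Hypothetical names; assumes PostGIS is installed.
CREATE TABLE father (id serial, geom geometry(Point, 4326));

-- Each child declares a CHECK constraint pinning its extent to a rectangle Ri.
CREATE TABLE child_1 (
    CHECK (geom && ST_MakeEnvelope(0, 0, 10, 10, 4326))
) INHERITS (father);

CREATE TABLE child_2 (
    CHECK (geom && ST_MakeEnvelope(10, 0, 20, 10, 4326))
) INHERITS (father);

-- With constraint exclusion enabled, the planner should (in theory)
-- skip any child whose Ri does not intersect the query rectangle R0.
SET constraint_exclusion = partition;
SELECT * FROM father
WHERE geom && ST_MakeEnvelope(2, 2, 5, 5, 4326);
```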
In practice, the planner was not using those CHECK constraints.<br><br>Nicolas might have understood why.<br></div><div>If he is correct, it is possible to create a workaround.<br></div><div><br>Cheers,<br></div><div>Rémi-C <br> </div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-04-03 1:16 GMT+02:00 Mark Wynter <span dir="ltr"><<a href="mailto:mark@dimensionaledge.com" target="_blank">mark@dimensionaledge.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Remi<br>
I might be off the mark with what you are trying to achieve.<br>
<br>
One thing I've experimented with, which is allied to vector tiling, is to assign tile IDs to features based on various spatial relationships, and to use the "tile id" to index and "subset" the tables prior to doing other "stuff". Feature IDs mapped to tile IDs start moving into the realm of key-value pairs...<br>
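The tile-ID idea could be sketched like this (table and column names are illustrative, assuming a regular tile grid has already been built):

```sql
-- Assign each feature the ID of the tile it falls in, then index on it.
ALTER TABLE features ADD COLUMN tile_id integer;

UPDATE features f
SET    tile_id = t.id
FROM   tiles t
WHERE  ST_Intersects(f.geom, t.geom);  -- or ST_Within for strict containment

CREATE INDEX ON features (tile_id);

-- "Subsetting" prior to doing other stuff is then a cheap equality scan:
SELECT * FROM features WHERE tile_id = 42;
```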
Moreover, you could parallelise the PostGIS subsetting and map-reduce process using R called from PostGIS via PL/R.<br>
<br>
And to make working with R more efficient, have you seen this package that allows you to manipulate R data frames using SQL syntax?<br>
<br>
<a href="http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/" target="_blank">http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/</a><br>
<br>
Food for thought :)</blockquote></div><br></div>
</div></blockquote></div></body></html>