[postgis-users] Partitionning using geometry
Rémi Cura
remi.cura at gmail.com
Fri Apr 3 01:49:05 PDT 2015
Hey,
thanks for the answer Mark (I saw your pg_routing mail, I never used
pg_routing, so I asked a co_worker and I'm waiting his answer)!
What I want to achieve is slightly different.
It is not a processing issue (one big table, cutting it in pieces to
process it faster trough pl/R, pl/python, etc)
It is a scaling issue !
- having several thousand tables (each millions rows) that all have few
columns in common (including a geom column).
How do you query easily all this table at once with one simple query?
For instance, I want to get all geometries of all table that are within a
rectangle.
The classical solution is to use UNION ALL
select from table_1
union all
select from table_2 ...
It is inefficient and a pain to write.
Now postgres offers partitioning, that is that you build a hierarchy of
tables.
In this case, you would have one empty father table, and all the thousand
table would be declared as child of the father table.
Now when you write
select * from father;
you query in fact all the child tables.
This is all good and working, but it will be inefficient, because each time
you look for geometry within a rectangle, you would have to read all tables
(using their index).
Of course having thousands of indexes in memory is not possible, so it
would be very slow.
Postgres offers a solution for that, which is to declare constraints
(check) on table.
So you would say, table child_1 is entirely contained in a rectangle R1,
table child_2 entirely contained in a rectangle R2, etc.
That way, when you query the father table and asking all the geometry
inside a rectangle R0,
the planner will first check in which tables i Ri intersects R0,
then it will only consider those tables, instead of considering all the
tables.
This is the theory. In practice the planner was not using those constraints
(check).
Nicolas might have understood why.
If he is correct, it is possible to create a workaround
Cheers,
Rémi-C
2015-04-03 1:16 GMT+02:00 Mark Wynter <mark at dimensionaledge.com>:
> Hi Remi
> I might be off the mark with what you are trying to achieve.
>
> One thing I've experimented with, which is allied to vector tiling, is to
> assign tile IDs to features, based on various spatial relationships, and to
> use the "tile id" to index and "subset" the tables prior to doing other
> "stuff". Feature IDs mapped to Tile IDs start moving into the realm of Key
> Value pairs...
> Moreover, you could parallelise the Postgis subsetting and map reduce
> process using R called from Postgis via Pl/r.
>
> And to make working with R more efficient, have you seen this package that
> allows you to manipulate R data frames using SQL syntax....
>
> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/
>
> Food for thought :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150403/a71a3445/attachment.html>
More information about the postgis-users
mailing list