[postgis-users] Iterating through large datasets
Fabio Renzo Panettieri
fpanettieri at xoomcode.com
Fri Sep 24 09:35:51 PDT 2010
Hi, I have a question about performance.
I have to run a large number of validations against multiple tables with
lots of records; each table has approximately 200k rows. They represent
roads, blocks, buildings, land uses, infrastructure, etc.
I also have an "errors" table, where the problems found are stored along
with a simple description of each problem.
Using PL/pgSQL, I have developed a large set of topological rules that
validate the relations between these tables (e.g. "does this block
intersect a building?") and create a new entry in the errors table when
needed.
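A simplified sketch of one such rule (the errors table here has just a
geometry and a description column; the table and column names are only
illustrative):

CREATE OR REPLACE FUNCTION check_block_intersects_building(block_geom geometry)
RETURNS void AS $$
BEGIN
    -- record an error when the given block touches any building footprint
    IF EXISTS (
        SELECT 1 FROM buildings b
        WHERE ST_Intersects(block_geom, b.geom)
    ) THEN
        INSERT INTO errors (geom, description)
        VALUES (block_geom, 'block intersects a building');
    END IF;
END;
$$ LANGUAGE plpgsql;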
My question is: how should I check each rule to get the best
performance?
Currently I fetch the new objects of each table (roads, for example) and
check their relations with the other tables. This way I ensure that the
roads are iterated only once, but I'm not sure it's the best approach.
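Roughly, each pass looks like this (the "validated" flag used to find the
new rows is just a placeholder for however new objects are actually
marked):

CREATE OR REPLACE FUNCTION validate_new_roads()
RETURNS void AS $$
DECLARE
    r roads%ROWTYPE;
BEGIN
    -- iterate the new roads once and apply every rule to each row
    FOR r IN SELECT * FROM roads WHERE NOT validated LOOP
        IF EXISTS (
            SELECT 1 FROM buildings b
            WHERE ST_Intersects(r.geom, b.geom)
        ) THEN
            INSERT INTO errors (geom, description)
            VALUES (r.geom, 'road intersects a building');
        END IF;
        -- further rules against blocks, landuses, etc. go here
    END LOOP;
END;
$$ LANGUAGE plpgsql;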
Off the top of my head, I think I could instead write the rules as
queries that select the errors directly and store them. The only problem
I see with this approach is that all the tables would be iterated many
times, once per rule.
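Each rule would then be a single set-based query along these lines:

-- one INSERT ... SELECT per rule; each rule scans the tables it involves,
-- but the spatial join can use the GiST indexes on the geometry columns
INSERT INTO errors (geom, description)
SELECT bl.geom, 'block intersects a building'
FROM blocks bl
JOIN buildings bu ON ST_Intersects(bl.geom, bu.geom);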
I would like to know what you think: which approach should I use? Or is
there a better way to do this?
--
Fabio R. Panettieri
Lead Software Engineer
http://www.xoomcode.com