[postgis-users] Iterating through large datasets
Fabio Renzo Panettieri
fpanettieri at xoomcode.com
Fri Sep 24 09:35:51 PDT 2010
Hi, I have a question about performance.
I have to run a large number of validations against multiple tables with
lots of records; each table has approximately 200k rows. They represent
roads, blocks, buildings, land uses, infrastructure, etc.
I also have an "errors" table, where the problems found are stored along
with a simple description of each problem.
Using PL/pgSQL, I have developed a large set of topological rules that
validate the relations between these tables (e.g. "does this block
intersect a building?") and create a new entry in the errors table when
needed.
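A simplified sketch of one such rule (the errors table here has just a
geometry and a description column; the table and column names are only
illustrative):

CREATE OR REPLACE FUNCTION check_block_intersects_building(block_geom geometry)
RETURNS void AS $$
BEGIN
    -- record an error when the given block touches any building footprint
    IF EXISTS (
        SELECT 1 FROM buildings b
        WHERE ST_Intersects(block_geom, b.geom)
    ) THEN
        INSERT INTO errors (geom, description)
        VALUES (block_geom, 'block intersects a building');
    END IF;
END;
$$ LANGUAGE plpgsql;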
My question is: how should I check each rule to get the best
performance?
Currently I fetch the new objects of each table (roads, for example) and
check their relations with the other tables. This way I ensure that the
roads are iterated only once, but I'm not sure it's the best approach.
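Roughly, each pass looks like this (the "validated" flag used to find the
new rows is just a placeholder for however new objects are actually
marked):

CREATE OR REPLACE FUNCTION validate_new_roads()
RETURNS void AS $$
DECLARE
    r roads%ROWTYPE;
BEGIN
    -- iterate the new roads once and apply every rule to each row
    FOR r IN SELECT * FROM roads WHERE NOT validated LOOP
        IF EXISTS (
            SELECT 1 FROM buildings b
            WHERE ST_Intersects(r.geom, b.geom)
        ) THEN
            INSERT INTO errors (geom, description)
            VALUES (r.geom, 'road intersects a building');
        END IF;
        -- further rules against blocks, landuses, etc. go here
    END LOOP;
END;
$$ LANGUAGE plpgsql;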
Off the top of my head, I think I could instead write the rules as
queries that select the errors directly and store them. The only problem
I see with this approach is that all the tables would be iterated many
times, once per rule.
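Each rule would then be a single set-based query along these lines:

-- one INSERT ... SELECT per rule; each rule scans the tables it involves,
-- but the spatial join can use the GiST indexes on the geometry columns
INSERT INTO errors (geom, description)
SELECT bl.geom, 'block intersects a building'
FROM blocks bl
JOIN buildings bu ON ST_Intersects(bl.geom, bu.geom);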
I would like to know what you think: which approach should I use? Or is
there a better way to do this?
--
Fabio R. Panettieri
Lead Software Engineer
http://www.xoomcode.com