[postgis-users] Iterating through large datasets

Birgit Laggner birgit.laggner at vti.bund.de
Mon Sep 27 07:12:31 PDT 2010


  Hi Fabio,

I am not sure whether this would really improve the performance of your 
error check, but I can think of a third option: append all tables into 
one, with an identifier for the type of information (e.g. roads, 
blocks, ...). Then perform a self-intersection on the new table and 
write your error log depending on the combination of type identifiers 
found for each intersecting pair.
The advantage is that you can check the relations between all types in 
one step. The disadvantage is that the appended table could be so big 
that the self-intersection performs poorly again.
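
A rough SQL sketch of what I mean (all table and column names here are 
only assumptions: each thematic table is assumed to have a gid key and 
a geom column, and the errors table a text description and a geometry):

-- append the thematic tables into one, tagged with a type identifier
CREATE TABLE all_features AS
    SELECT 'road'     AS ftype, gid, geom FROM roads
    UNION ALL
    SELECT 'block'    AS ftype, gid, geom FROM blocks
    UNION ALL
    SELECT 'building' AS ftype, gid, geom FROM buildings;

CREATE INDEX all_features_geom_idx ON all_features USING gist (geom);

-- one spatial self-join; the pair of type identifiers decides which
-- error message is written to the log
INSERT INTO errors (description, geom)
SELECT a.ftype || ' intersects ' || b.ftype,
       ST_Intersection(a.geom, b.geom)
FROM   all_features a
JOIN   all_features b
       ON  ST_Intersects(a.geom, b.geom)
       AND NOT (a.ftype = b.ftype AND a.gid = b.gid)
WHERE  (a.ftype, b.ftype) IN (('block', 'building'), ('road', 'block'));

Listing each type combination only once in the WHERE clause keeps every 
pair from being reported twice.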

Regards,

Birgit.



On 24.09.2010 18:35, Fabio Renzo Panettieri wrote:
> Hi, I have a question about performance.
>
> I have to run a large number of validations against multiple tables
> with lots of records. Each table has approx. 200k rows.
>
> They represent roads, blocks, buildings, landuses, infrastructure,
> etc...
>
> Also, I have an "errors" table, where the problems found are stored
> along with a simple description of the problem.
>
> With plpgsql I have developed a large set of topological rules that
> validate their relations (e.g. "does a block intersect a building?")
> and create a new entry in the errors table when needed.
>
> My question is, how should I check each rule to ensure the best
> performance?
>
> Currently I get the new objects of each table (roads, for example) and
> check their relations with the others. This way I ensure that roads
> are iterated only once, but I am not sure it is the best way.
>
> Off the top of my head, I think I could write rules that select the
> errors and store them. The only problem I see with this approach is
> that all the tables would be iterated many times.
>
> I would like to know what you think: which approach should I use? Or
> do you know a better way to do this?
>
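
For comparison, a single set-based rule like the "blocks intersects 
building" check mentioned above could look roughly like this (again, 
the names and the exact predicate are only assumptions):

INSERT INTO errors (description, geom)
SELECT 'block ' || bl.gid || ' intersects building ' || bu.gid,
       ST_Intersection(bl.geom, bu.geom)
FROM   blocks    bl
JOIN   buildings bu ON ST_Intersects(bl.geom, bu.geom);

Written as one INSERT ... SELECT per rule, PostgreSQL can use the 
spatial indexes instead of looping over the rows in plpgsql, although 
each rule still scans its tables separately.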


