[postgis-users] Histogram2d formation

Tue Oct 8 12:09:38 PDT 2002

On Tuesday 08 October 2002 11:11 am, David Blasby wrote:
> Here's an example of the histogram creation for BC roads data.
>
> One has a grid size of 20*20, the other 40*40 - see the attached .gif
> files.

Very cool!

> This histogram formation only takes a few seconds!

How many rows are in the dataset? Or, another way to size the computation: if 
it only took a few seconds to calculate the histogram, how long did it take to 
create a GiST index for the same data?

> Next, I'm going to try to find a way of estimating the result set size
> based on a query window.  If anyone has any ideas how to go about doing
> this (esp for very small query windows), I'd like to hear from you.
>
> dave

If you're keeping a count of the rows intersected by each grid cell, then you 
could calculate the sum of the grid cell values that are completely enclosed 
by the query, and then add a linear interpolation for those that partially 
overlap (e.g., area-of-overlap/area-of-cell*value-of-cell).
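
To make that concrete, here is a rough Python sketch of the estimate. The 
function name, the row-major list-of-lists layout, and the (xmin, ymin, xmax, 
ymax) tuples are my own assumptions for illustration, not anything in the 
actual histogram code. Each cell contributes its count scaled by the fraction 
of the cell's area that the query window covers, so fully enclosed cells 
contribute their whole count.

```python
def estimate_rows(counts, extent, query):
    """Estimate the rows hit by `query` from a grid of per-cell counts.

    counts: list of rows; counts[j][i] is the count for column i, row j
    extent: (xmin, ymin, xmax, ymax) covered by the whole grid
    query:  (xmin, ymin, xmax, ymax) of the query window
    """
    gx0, gy0, gx1, gy1 = extent
    qx0, qy0, qx1, qy1 = query
    ny, nx = len(counts), len(counts[0])
    cell_w = (gx1 - gx0) / nx
    cell_h = (gy1 - gy0) / ny

    total = 0.0
    for j in range(ny):
        cy0 = gy0 + j * cell_h
        # Height of the overlap between this grid row and the query.
        overlap_h = min(qy1, cy0 + cell_h) - max(qy0, cy0)
        if overlap_h <= 0:
            continue
        for i in range(nx):
            cx0 = gx0 + i * cell_w
            overlap_w = min(qx1, cx0 + cell_w) - max(qx0, cx0)
            if overlap_w <= 0:
                continue
            # Fraction of the cell covered by the query (1.0 if enclosed).
            fraction = (overlap_w * overlap_h) / (cell_w * cell_h)
            total += counts[j][i] * fraction
    return total


grid = [[4, 8],
        [0, 12]]
whole = estimate_rows(grid, (0, 0, 2, 2), (0, 0, 2, 2))      # 24.0
partial = estimate_rows(grid, (0, 0, 2, 2), (0.5, 0, 1.5, 1))  # 6.0
```

One caveat: if a row is counted in every cell it intersects, a feature that 
spans several cells is counted more than once, so a plain sum like this would 
tend to overestimate for large windows.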

Looking at your .gifs, there seem to be a lot of "blank" areas where the 
histogram is close to zero. Could these be collapsed out with a quad-tree 
implementation that didn't need to represent equal-sized cells?
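
As a sketch of the quad-tree idea (Python, with names I made up, and assuming 
a square grid whose side is a power of two), a block of cells that is entirely 
zero collapses into a single leaf. Collapsing only exactly-zero blocks is a 
simplification on my part; "close to zero" would need a threshold.

```python
def build(counts, x, y, size):
    """Build a quad-tree over the size*size block whose corner is (x, y).

    A leaf is an int (the count for that block); an internal node is a
    list of four children. Blocks that are entirely zero become one leaf.
    """
    if size == 1:
        return counts[y][x]
    half = size // 2
    kids = [
        build(counts, x, y, half),
        build(counts, x + half, y, half),
        build(counts, x, y + half, half),
        build(counts, x + half, y + half, half),
    ]
    # A list never compares equal to 0, so this is true only when all
    # four children are zero-valued leaves.
    if all(k == 0 for k in kids):
        return 0
    return kids


def count_leaves(node):
    """Number of leaves actually stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(k) for k in node)
    return 1


def total(node):
    """Sum of all counts in the tree (unchanged by collapsing zeros)."""
    if isinstance(node, list):
        return sum(total(k) for k in node)
    return node


grid = [[5, 3, 0, 0],
        [2, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
tree = build(grid, 0, 0, 4)
```

For this 4*4 example the tree stores 7 leaves instead of 16 cells, and the 
total count is still 10.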

Another thought... keeping an actual row count for each cell might make the 
histogram larger and also make it difficult to calculate the estimated count 
for each query. However, you may be able to get by with just a few bits for 
each cell. You could compress the actual counts with some sort of log scale 
and then expand them back to approximate counts when summing to arrive at the 
predicted row count.
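
Here is one way the log-scale idea might look in Python. The 4-bit width, the 
power-of-two buckets, and the function names are all assumptions of mine. Each 
code is expanded back to the midpoint of its bucket before summing, since 
adding the log-scale codes directly would not give a usable total.

```python
MAX_CODE = 15  # largest value that fits in 4 bits


def encode(count):
    """Compress a non-negative count into a 4-bit log2 bucket code.

    0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on; counts of 16384 or
    more all share the top code.
    """
    return min(MAX_CODE, count.bit_length())


def decode(code):
    """Approximate count for a code: the midpoint of its bucket.

    The top bucket is really open-ended, so large counts that were
    capped at MAX_CODE are underestimated.
    """
    if code == 0:
        return 0.0
    low = 1 << (code - 1)
    high = (1 << code) - 1
    return (low + high) / 2


def estimate(codes):
    """Predicted row count: decode every cell, then sum."""
    return sum(decode(c) for c in codes)


cells = [0, 1, 5, 1000]
codes = [encode(c) for c in cells]
approx = estimate(codes)
```

With these four cells the true total is 1006 and the decoded estimate is 
774.0, which gives a feel for how much accuracy the few-bits-per-cell approach 
gives up.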