[postgis-users] Histogram2d formation
Andy Turk
andy at streetlight.com
Tue Oct 8 12:09:38 PDT 2002
On Tuesday 08 October 2002 11:11 am, David Blasby wrote:
> Here's an example of the histogram creation for BC roads data.
>
> One has a grid size of 20*20, the other 40*40 - see the attached .gif
> files.
Very cool!
> This histogram formation only takes a few seconds!
How many rows in the dataset? Or another way to size the computation... If it
only took a few seconds to calculate the histogram, how long did it take to
create a GiST index for the same data?
> Next, I'm going to try to find a way of estimating the result set size
> based on a query window. If anyone has any ideas how to go about doing
> this (esp for very small query windows), I'd like to hear from you.
>
> dave
If you're keeping a count of the rows intersected by each grid cell, then you
could sum the values of the grid cells that are completely enclosed by the
query, and then add a linear interpolation for those that partially overlap
(e.g., area-of-overlap/area-of-cell*value-of-cell).
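
A minimal sketch of that estimate, assuming a uniform grid stored as a list of
rows and rectangles given as (xmin, ymin, xmax, ymax). The function name and
data layout are hypothetical, not anything from PostGIS. Each cell contributes
its count scaled by the fraction of the cell's area that the query covers, so
fully enclosed cells contribute their whole count:

```python
def estimate_rows(counts, extent, query):
    """Estimate the rows hit by `query` from a uniform grid histogram.

    counts: list of ny rows, each a list of nx per-cell row counts
    extent, query: rectangles as (xmin, ymin, xmax, ymax)
    """
    ny, nx = len(counts), len(counts[0])
    exmin, eymin, exmax, eymax = extent
    qxmin, qymin, qxmax, qymax = query
    cell_w = (exmax - exmin) / nx
    cell_h = (eymax - eymin) / ny

    total = 0.0
    for j in range(ny):
        cymin = eymin + j * cell_h
        # Height of the overlap between the query and this row of cells.
        oy = min(qymax, cymin + cell_h) - max(qymin, cymin)
        if oy <= 0:
            continue
        for i in range(nx):
            cxmin = exmin + i * cell_w
            # Width of the overlap between the query and this cell.
            ox = min(qxmax, cxmin + cell_w) - max(qxmin, cxmin)
            if ox <= 0:
                continue
            # Fraction of the cell covered by the query (1.0 if enclosed).
            total += counts[j][i] * (ox * oy) / (cell_w * cell_h)
    return total


grid = [[4, 8],
        [0, 2]]
estimate = estimate_rows(grid, (0, 0, 2, 2), (0.5, 0, 1.5, 1))
```

Note that a row spanning several cells is counted once per cell, so the
estimate can run high for large features, and the linear interpolation assumes
rows are spread evenly inside each cell.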
Looking at your .gifs, there seem to be a lot of "blank" areas where the
histogram is close to zero. Could these be collapsed out with a quad-tree
implementation that didn't need to represent equal-sized cells?
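
One way the quad-tree idea might look, as a sketch only: it assumes a square
grid whose side is a power of two, and it collapses a block only when every
cell in it holds exactly the same count. Merging cells that are merely "close
to zero" would need a threshold, which is left out here. The function names
are hypothetical.

```python
def build_quadtree(counts, x0, y0, size):
    """Return an int for a uniform block, else a list of four child nodes."""
    first = counts[y0][x0]
    if all(counts[y][x] == first
           for y in range(y0, y0 + size)
           for x in range(x0, x0 + size)):
        return first  # leaf: per-cell count shared by the whole block
    half = size // 2
    # Children in the order: (x0, y0), (x0+half, y0), (x0, y0+half), (x0+half, y0+half)
    return [build_quadtree(counts, x0, y0, half),
            build_quadtree(counts, x0 + half, y0, half),
            build_quadtree(counts, x0, y0 + half, half),
            build_quadtree(counts, x0 + half, y0 + half, half)]


def count_leaves(node):
    """Number of leaves stored, to compare against the flat grid size."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


grid = [[5, 3, 0, 0],
        [1, 7, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
tree = build_quadtree(grid, 0, 0, 4)
```

For this toy grid the three empty quadrants each collapse to a single leaf, so
the tree stores 7 values instead of 16 cells.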
Another thought... keeping an actual row count for each cell might make the
histogram larger and also make it difficult to calculate the estimated count
for each query. However, you may be able to get by with just a few bits for
each cell. You could compress the actual counts with some sort of log scale
and then multiply the sum at the end to arrive at the predicted row count.
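
A rough sketch of the log-scale idea, assuming 4 bits per cell and a base-2
scale (both choices are assumptions for illustration). Rather than scaling the
sum at the end, this variant decodes each cell's code back to an approximate
count before summing, which is a slightly different technique from the one
described above:

```python
import math

BITS = 4                      # assumed storage per cell
MAX_CODE = (1 << BITS) - 1    # largest code that fits in BITS bits


def encode(count):
    """Compress a row count to a small code on a log2 scale."""
    return min(MAX_CODE, round(math.log2(count + 1)))


def decode(code):
    """Approximate row count represented by a code."""
    return (1 << code) - 1


codes = [encode(c) for c in (0, 3, 100)]
approx_total = sum(decode(code) for code in codes)
```

Counts that are not one less than a power of two come back only approximately
(100 is stored as code 7 and decodes to 127), and anything too large for the
top code is clamped, so this trades accuracy for a much smaller histogram.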