[pdal] writers.gdal: counting all points in a cell

Fri Jan 28 15:32:10 PST 2022

On 1/28/22 13:35, Howard Butler wrote:
>
>> On Jan 28, 2022, at 1:30 PM, Jim Klassen <klassen.js at gmail.com> wrote:
>>
>> There is a comment in GDALGrid.cpp about counting all points if they are in a cell, ignoring the radius for that cell.
>>
>>      // This is a questionable case.  If a point is in a cell, shouldn't
>>      // it just be counted?
>>
>> My experience is that there are times when I want this behavior when I want to know every point is assigned to one and only one cell. Currently, the only options are to set radius so that the circle includes the whole cell (but also includes area outside the cell) or the circle only includes area in the cell, but also skips some area of the cell.
>>
>> There are also times, I want radius to act exactly as it does now, viewing the cell as sampling an area around the centroid of the cell.  I can imagine cases (ex. large cell size with high point density) where a radius smaller than the cell size makes sense. This existing mode also seems to me like it is more representative of the physics of sampling a surface with a laser pulse which has a non-zero cross section than the "just count everything that falls in the cell" mode.
>>
>> So, I want to be able to choose the behavior depending on the my current situation.
>>
>> I propose choosing the "count all points that fall in the cell" mode if the user supplies a radius that is negative and maintaining the current behavior if the radius is 0 or positive?  I'm not sure a zero radius is useful, it is currently supported and potentially could select points exactly aligned with the center of the cell.  A negative radius currently would never select any points (since distance is always >= 0).
>>
>> As a potential optimization, since radius is negative, the updateXXXQuadrant() calls don't select any cells, but do take minimal time in this case.  I'm not sure if there would be an overall benefit of guarding the updateXXXQuadrant() calls with a:
>>      if (m_radius >= 0)
>> for this new case vs the existing case. It probably depends highly on the CPU branch predictor.
>>
>> The following patch implements this idea:
>>
>> diff --git a/io/private/GDALGrid.cpp b/io/private/GDALGrid.cpp
>> index dade8f792..4f319f4ee 100644
>> --- a/io/private/GDALGrid.cpp
>> +++ b/io/private/GDALGrid.cpp
>> @@ -232,8 +232,9 @@ void GDALGrid::addPoint(double x, double y, double z)
>>
>>       // This is a questionable case.  If a point is in a cell, shouldn't
>>       // it just be counted?
>> +    // Just count points in the cell if m_radius is < 0, otherwise act normally
>>       double d = distance(iOrigin, jOrigin, x, y);
>> -    if (d < m_radius &&
>> +    if ((m_radius < 0 || d < m_radius) &&
>>           iOrigin >= 0 && jOrigin >= 0 &&
>>           iOrigin < width() && jOrigin < height())
>>           update(iOrigin, jOrigin, z, d);
> Jim,
>
> I wanted this just the other day too. If you make a PR with this that also includes an option to writers.gdal that defaults to the old behavior for count but allows the user to switch to this one, I would be excited to merge it.
>
> Thanks!
>
> Howard
>
> https://github.com/PDAL/PDAL/issues/3215

I'm happy to make a PR, but I'm not sure I understand.  This doesn't change the default behavior.  Radius defaults to resolution * sqrt(2) which is >= 0 so it will behave as it does currently unless the user explicitly sets a radius to be < 0.  Any positive radius a user sets would also behave as it does now.  My understanding is that min/max/mean/idw/count/stdev all work off the same set of points (determined by the radius).

Looking at the issue referenced, my understanding is count currently reflects the number of points considered in the stats for that pixel (which makes perfect sense to me).  With this change, if radius is set < 0 then count will be the number of points that fall within the cell.  Does that solve the issue or are you asking for a second metric that always just counts the number of points that falls within the cell as opposed to the number of points that are used for the statistic (which is the same as the number of points in the cell in the latter case)?

The one sticky bit would be if someone wanted different radius settings for the different metrics.  That would be more complicated to implement.  (Or I guess they could just chain multiple writers.gdal stages if they want to use different radius settings).