[pdal] Bad allocation error

Howard Butler howard at hobu.co
Wed Jul 20 05:23:09 PDT 2022



> On Jun 29, 2022, at 2:39 PM, Howard Butler <howard at hobu.co> wrote:
> 
> 
> 
>> On Jun 22, 2022, at 6:50 AM, Krasovec, Nina <Nina.Krasovec at zi-mannheim.de> wrote:
>> 
>> Hello everyone,
>> 
>> I would like to assign a value from polygons (stored in a column “class”) to the Classification attribute in a point cloud. Both the point cloud and the geopackage are split into tiles, which means the point clouds usually contain ~15 million points and the geopackages contain ~35,000 polygons or fewer. I tried to process multiple point clouds and one of them worked, while the others always throw the error “PDAL: bad allocation”. All the files are approximately the same size, and I also tested a much smaller area with a smaller gpkg and it still did not work. The only thing that worked was splitting the geopackages into very small files. The LAZ file that was successfully processed used around 15 GB of RAM, while the other files went up to 256 GB and that still wasn’t sufficient. Do you have a suggestion as to what could cause such enormous memory use?
>> Here is an example of a pipeline:
>> 
>> [
>>     {
>>         "type": "readers.las",
>>         "use_eb_vlr": "true",
>>         "filename": "D:/User/Nina/460000_5477000.laz"
>>     },
>>     {
>>         "type": "filters.overlay",
>>         "dimension": "Classification",
>>         "datasource": "D:/User/Nina/460000_5477000.gpkg",
>>         "column": "class",
>>         "where": "Classification == 8"
>>     },
>>     {
>>         "type": "writers.las",
>>         "compression": "true",
>>         "a_srs": "EPSG:25832",
>>         "extra_dims": "all",
>>         "filename": "D:/User/Nina/460000_5477000.laz"
>>     }
>> ]
>> 
> 
> Nina,
> 
> I don't know the specifics of your scenario, but I should think you will want to break your processing up into smaller, per-feature jobs. Make a loop that iterates over each geometry in the geopackage, runs a pipeline for each one that produces a classified LAS file, and then merges them all together at the end:
> 
> 1) Convert your data to COPC format (copc.io) using writers.copc.
> 2) Read each geometry in the geopackage and get its bounds.
> 3) Write a pipeline (or use the Python bindings) that sets the 'bounds' option of readers.copc to those bounds.
> 4) Permute writers.las.filename to correspond to the feature id.
> 5) Merge all of the files together once complete.
> 
> This model means you should only read the points that matter for each polygon you are selecting. It should also parallelize conveniently using whatever batching approach you desire (I see you are on Windows, so that might be some PowerShell magic, IDK).
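> 
> A rough sketch of that loop, using the Python bindings plus OGR, might look like the following (the paths, layer index, and output naming are placeholders lifted from your pipeline, not something I have run against your data):
> 
> import json
> 
> import pdal                     # PDAL Python bindings
> from osgeo import ogr           # GDAL/OGR Python bindings
> 
> SRC = "D:/User/Nina/460000_5477000.copc.laz"   # produced by writers.copc
> GPKG = "D:/User/Nina/460000_5477000.gpkg"
> 
> ds = ogr.Open(GPKG)
> layer = ds.GetLayer(0)
> 
> for feature in layer:
>     fid = feature.GetFID()
>     # OGR envelopes come back as (minx, maxx, miny, maxy)
>     xmin, xmax, ymin, ymax = feature.GetGeometryRef().GetEnvelope()
> 
>     stages = [
>         {
>             "type": "readers.copc",
>             "filename": SRC,
>             # Only points inside this feature's bounding box are read
>             "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])"
>         },
>         {
>             "type": "filters.overlay",
>             "dimension": "Classification",
>             "datasource": GPKG,
>             "column": "class",
>             "where": "Classification == 8"
>         },
>         {
>             "type": "writers.las",
>             "compression": "true",
>             "a_srs": "EPSG:25832",
>             "extra_dims": "all",
>             # One output file per feature id
>             "filename": f"D:/User/Nina/classified_{fid}.laz"
>         }
>     ]
> 
>     pdal.Pipeline(json.dumps(stages)).execute()
> 
> The per-feature outputs can then be merged at the end (step 5), for example with the pdal merge application or filters.merge.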
> 
> Without your data and a test scenario, any explanation of why things are going haywire would just be speculation.
> 
> Howard   
> 

Nina provided me with some example data, and the issue is that her GeoPackage has 33k polygons in it. The way PDAL's overlay code works, every point is tested against every one of those polygons and their edges; there is no index that pre-filters the query.

Luckily, OGR can do this pre-filtering for us, but that capability wasn't exposed in PDAL. I have since added a 'bounds' option to filters.overlay that lets the user have OGR pre-filter the polygons to a bounding box. This will be generally available when PDAL 2.5.0 is released.
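
For example, once 2.5.0 is out, a pipeline along these lines should let OGR do the pre-filtering (the bounds window below is invented to roughly match a 1 km tile and the output filename is a placeholder; only polygons intersecting that window get loaded from the GeoPackage):

import json

import pdal

pipeline = pdal.Pipeline(json.dumps([
    {
        "type": "readers.las",
        "use_eb_vlr": "true",
        "filename": "D:/User/Nina/460000_5477000.laz"
    },
    {
        "type": "filters.overlay",
        "dimension": "Classification",
        "datasource": "D:/User/Nina/460000_5477000.gpkg",
        "column": "class",
        "where": "Classification == 8",
        # New in 2.5.0: pre-filter the datasource to this window
        "bounds": "([460000, 461000], [5477000, 5478000])"
    },
    {
        "type": "writers.las",
        "compression": "true",
        "a_srs": "EPSG:25832",
        "extra_dims": "all",
        "filename": "D:/User/Nina/460000_5477000_classified.laz"
    }
]))
pipeline.execute()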

Howard

https://github.com/PDAL/PDAL/pull/3815


