[pdal] Bad allocation error

Howard Butler howard at hobu.co
Wed Jun 29 12:39:58 PDT 2022



> On Jun 22, 2022, at 6:50 AM, Krasovec, Nina <Nina.Krasovec at zi-mannheim.de> wrote:
> 
> Hello everyone,
>  
> I would like to assign a value from polygons (stored in a column “class”) to the Classification attribute of a point cloud. Both the point cloud and the geopackage are split into tiles, which means the point clouds usually contain ~15 million points and the geopackages contain ~35,000 polygons or fewer. I tried to process multiple point clouds and one of them worked, while the others always throw a “PDAL: bad allocation” error. All the files are approximately the same size, and I also tested with a much smaller area and a smaller gpkg and it still did not work. The only thing that worked was splitting the geopackages into very small files. The LAZ file that was processed successfully used around 15 GB of RAM, while the other files went up to 256 GB and even that wasn’t sufficient. Do you have a suggestion what could be the reason for such enormous memory use?
> Here is an example of a pipeline:
>  
> [
>       {
>             "type":"readers.las",
>             "use_eb_vlr":"true",
>             "filename":"D:/User/Nina/460000_5477000.laz"
>       },
>     {
>         "type":"filters.overlay",
>         "dimension":"Classification",
>         "datasource":"D:/User/Nina/460000_5477000.gpkg",
>             "column": "class",
>             "where": "Classification == 8"
>     },
>       {
>         "type":"writers.las",
>             "compression": "true",
>             "a_srs": "EPSG:25832",
>             "extra_dims":"all",
>         "filename":"D:/User/Nina/460000_5477000.laz"
>     }
> ]
>  

Nina,

I don't know the specifics of your scenario, but I should think you will want to break your processing up into something more specific. Make a loop that iterates over each geometry in the geopackage, runs a pipeline for each one that produces a classified LAS file, and then merges them all together at the end.

1) Convert your data to the COPC format (copc.io) using writers.copc.
2) Read each geometry in the geopackage and get its bounds.
3) Write a pipeline (or use the Python bindings) to set the 'bounds' option of readers.copc to those bounds.
4) Set the writers.las filename to correspond to each feature id.
5) Merge all of the files together once complete (a sketch of this loop follows below).
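
Here is a minimal sketch of that loop using the PDAL Python bindings and GDAL's ogr module. The file paths, the per-feature output naming, and the merge step are assumptions modeled on your original pipeline, so treat it as a starting point rather than a finished script:

import json
import pdal                      # PDAL Python bindings
from osgeo import ogr            # GDAL/OGR, for reading the geopackage

# Hypothetical paths modeled on the original pipeline.
copc_file = "D:/User/Nina/460000_5477000.copc.laz"
gpkg_file = "D:/User/Nina/460000_5477000.gpkg"

# Step 1: one-time conversion of the LAZ tile to COPC.
pdal.Pipeline(json.dumps([
    {"type": "readers.las", "use_eb_vlr": "true",
     "filename": "D:/User/Nina/460000_5477000.laz"},
    {"type": "writers.copc", "filename": copc_file},
])).execute()

# Steps 2-4: loop over the polygons, restricting each read to that feature's bounds.
ds = ogr.Open(gpkg_file)
layer = ds.GetLayer(0)
for feature in layer:
    fid = feature.GetFID()
    # OGR envelopes come back as (minx, maxx, miny, maxy).
    minx, maxx, miny, maxy = feature.GetGeometryRef().GetEnvelope()
    bounds = f"([{minx},{maxx}],[{miny},{maxy}])"

    pdal.Pipeline(json.dumps([
        {"type": "readers.copc", "filename": copc_file, "bounds": bounds},
        {"type": "filters.overlay", "dimension": "Classification",
         "datasource": gpkg_file, "column": "class",
         "where": "Classification == 8"},
        {"type": "writers.las", "compression": "true",
         "a_srs": "EPSG:25832", "extra_dims": "all",
         "filename": f"D:/User/Nina/classified_{fid}.las"},
    ])).execute()

# Step 5: merge the per-feature files once they are all written,
# e.g. with the `pdal merge` application.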

This model means that you should only read the points that matter for each polygon you are selecting. It should also parallelize conveniently using whatever batching approach you prefer (I see you are on Windows, so that might be some PowerShell magic, IDK).
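
If you stay in Python, one way to do that batching is to write each per-feature pipeline to its own JSON file and fan them out to the pdal application with a process pool. This is only a sketch; the directory layout and worker count are made up, so adjust to taste:

import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def run(pipeline_file):
    # Each JSON file here is one per-feature pipeline written out ahead of time.
    subprocess.run(["pdal", "pipeline", pipeline_file], check=True)
    return pipeline_file

if __name__ == "__main__":
    jobs = glob.glob("D:/User/Nina/pipelines/*.json")  # hypothetical location
    with ProcessPoolExecutor(max_workers=4) as pool:
        for done in pool.map(run, jobs):
            print("finished", done)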

It would just be speculation to explain why things are going haywire without your data and a test scenario.

Howard   


