[pdal] Proposal: Remove SQLite reader/writer for PDAL 2.0

Howard Butler howard at hobu.co
Thu Aug 1 11:32:05 PDT 2019



> On Aug 1, 2019, at 1:13 PM, Jed Frechette <jedfrechette at gmail.com> wrote:
> 
> On Wed, 31 Jul 2019 08:28:09 -0500 Howard Butler wrote:
>> On second thought, we'll table the removal until 2.1 to allow a full release cycle for everyone who might be impacted to catch up, but we will still plan to remove these drivers. Again, if you are going to be impacted by this change we would like to hear from you. We don't know of much use of these drivers, and their continued maintenance burden does not seem worth it given their limited use.
> 
> The use case was to round-trip data to PDAL and back from inside a couple of applications that include embedded Python interpreters, but have limited support for other PDAL formats. In particular, although the specific applications we're working with do have some level of
> support for both las and e57, the built in readers and writers don't always handle arbitrary dimensions well, if at all.

I think TileDB is a better choice than SQLite for this task, at least with the upcoming PDAL 2.0 release. Norman Barker supports it through PDAL's support venues, and I believe his firm is available for paid support. It will perform much better, and it supports streaming. The SQLite drivers were a proof of concept that I developed based on our experience with both the Oracle and pgpointcloud drivers, and while they are interesting for database storage of point clouds, they have some significant downsides.

TileDB's Python interface is much better than SQLite's and will give you much more convenient numpy access to the points and attributes. With SQLite you would have to build all of that up yourself, and you would likely need to protect yourself against us making changes to the SQLite storage schema, too.
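To make "build all of that up yourself" concrete, here is a minimal sketch of storing a NumPy structured array in SQLite by hand, using only the standard `sqlite3` module and NumPy. The table layout and the functions `write_points`/`read_points` are hypothetical and are not PDAL's SQLite schema; the point is that the application, not the storage format, has to remember how the raw bytes map back to dimensions.

```python
import json
import sqlite3

import numpy as np


def write_points(conn, table, points):
    """Store a structured array as one BLOB plus a JSON description of its dtype.

    Hypothetical layout for illustration only; this is NOT PDAL's SQLite schema.
    """
    # Record each dimension's name and its NumPy type string (e.g. "<f8").
    schema = json.dumps([(name, points.dtype[name].str) for name in points.dtype.names])
    # Table names cannot be bound as SQL parameters, so `table` must be trusted.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY, schema TEXT, data BLOB)"
    )
    cur = conn.execute(
        f"INSERT INTO {table} (schema, data) VALUES (?, ?)",
        (schema, points.tobytes()),
    )
    return cur.lastrowid


def read_points(conn, table, row_id):
    """Rebuild the structured array from the stored schema and raw bytes."""
    schema, data = conn.execute(
        f"SELECT schema, data FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    dtype = np.dtype([(name, typestr) for name, typestr in json.loads(schema)])
    # frombuffer returns a read-only view of the bytes; copy so callers can modify it.
    return np.frombuffer(data, dtype=dtype).copy()
```

Any change to how the dimensions are described or packed breaks every reader written against the old layout, which is the maintenance burden described above.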

> As an aside, I don't know if it makes sense for PDAL to have a "native" container format or not, but it would be helpful to more
> clearly document which writer/reader pairs can be expected to losslessly round-trip data if PDAL is the only application involved.

We don't really have such a thing. LAS and LAZ with extra_bytes are likely the closest thing we have released at the moment, especially if you stuff metadata into VLRs, but that isn't pure by any means.
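A minimal sketch of a LAS-to-LAS PDAL pipeline that tries to keep extra dimensions and PDAL metadata, built as JSON from Python. The filenames and the helper `las_round_trip_pipeline` are hypothetical; `extra_dims`, `forward`, and `pdal_metadata` are `writers.las` options as I understand the PDAL documentation, so check them against the version you run. Even with all of them set, the round trip is not guaranteed to be lossless.

```python
import json


def las_round_trip_pipeline(src, dst):
    """Build a PDAL pipeline (as JSON text) that rewrites LAS while keeping extra dimensions.

    `src` and `dst` are hypothetical filenames. Verify the writers.las option
    names below against your PDAL version.
    """
    stages = [
        {"type": "readers.las", "filename": src},
        {
            "type": "writers.las",
            "filename": dst,
            "extra_dims": "all",    # write every non-standard dimension as extra bytes
            "forward": "all",       # carry header fields and VLRs over from the input
            "pdal_metadata": True,  # also store PDAL metadata/pipeline JSON in VLRs
        },
    ]
    return json.dumps({"pipeline": stages}, indent=2)
```

With the PDAL Python bindings installed, the resulting string can be passed to `pdal.Pipeline(...)` and run with `.execute()`.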

> With the number of useful filters PDAL already has, I don't think it is unreasonable to start thinking about it as a central part of a processing workflow rather than just a tool for moving data from External Application A to External Application B. The more it takes on
> a central role the more I think it makes sense to be asking "What's the best container format to use with PDAL?" rather than just conforming to whatever might be supported by external applications.

HDF or TileDB might make the most sense as the binary container for this task. Both would give you maximum flexibility, compressed binary storage with universal platform support, and the opportunity to make the data available in other computing environments beyond PDAL. Developing something like this hasn't been a requirement for us, however. You should explore whether the TileDB drivers that Norman Barker recently added are sufficient for this task, but I suspect some of the finer points, such as PDAL's metadata, might not fully survive the trip. It's hard work to get all of that right, and we haven't done it.
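One way to find out whether a given writer/reader pair round-trips losslessly is to compare the point arrays before and after the trip. A minimal sketch, assuming both sides are NumPy structured arrays with one field per dimension (the PDAL Python bindings expose executed pipelines this way through `Pipeline.arrays`) and that the pair preserves point order. `round_trip_differences` is a hypothetical helper, and it deliberately ignores metadata, which is the part least likely to survive.

```python
import numpy as np


def round_trip_differences(before, after):
    """List the ways `after` differs from `before`; an empty list means no point data was lost.

    Assumes both are NumPy structured arrays (one field per dimension) and that
    point order is preserved. Metadata is not compared.
    """
    problems = []
    before_dims = set(before.dtype.names)
    after_dims = set(after.dtype.names)
    for name in sorted(before_dims - after_dims):
        problems.append(f"dropped dimension {name}")
    for name in sorted(after_dims - before_dims):
        problems.append(f"added dimension {name}")
    if len(before) != len(after):
        problems.append(f"point count {len(before)} -> {len(after)}")
        return problems  # per-point comparison is meaningless past this point
    for name in sorted(before_dims & after_dims):
        # A type change can lose precision even when the sample values still match.
        if before.dtype[name] != after.dtype[name]:
            problems.append(f"{name}: type {before.dtype[name]} -> {after.dtype[name]}")
        if not np.array_equal(before[name], after[name]):
            problems.append(f"{name}: values changed")
    return problems
```

Running this over a representative file for each writer/reader pair would be one way to build the "which pairs round-trip" list asked about above.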

Howard

