[pdal] Chunking/streaming support

Jason Overland joverland at lizardtech.com
Tue Jun 7 15:41:09 PDT 2016


Hi Andrew,
A quick follow-up on this.  For now I made three methods on BpfReader public: ready(), processOne(), and done().  This lets me read one point at a time by calling processOne() repeatedly.  I also tried using read() to read a chunk at a time instead of processOne(), but that ended up being a little slower (I had expected it to be faster because of fewer seeks) and involved a little more trickery in setting up the PointTable.

Of course, I'd prefer not to make private methods public willy-nilly as a long-term strategy, but as a quick and dirty solution it seems to get things working for us.  The limitation is that we aren't building a full PDAL pipeline with these methods, just one reader, but that's all we're interested in for now anyway.

A good example of usage of our API is actually the MrsidReader plugin in PDAL.  We're open to sending patches if there's any interest in that.  Is making these methods public (or some equivalent) something PDAL is interested in?

Here's the code I'm using to get points:

class PDALPointReader::Iterator : public PointIterator
{
   CONCRETE_ITERATOR(Iterator);
protected:
   ~Iterator(void)
   {
      m_reader.done(m_fixedPointTable);
   }

public:
   Iterator(void) :
      m_fixedPointTable(10)
   {
   }
   void init(const Bounds &bounds,
             double fraction,
             const PointInfo &pointInfo,
             ProgressDelegate *delegate,
             const char *path)
   {
      assert(path != NULL && *path != '\0');
      PointIterator::init(bounds, fraction, pointInfo, delegate);
      
      m_finished = false;
      pdal::Options options;
      options.add("filename", path);
      m_reader.setOptions(options);
      m_reader.prepare(m_fixedPointTable);
      m_fixedPointTable.finalize();
      m_reader.ready(m_fixedPointTable);
   }
      
   size_t getNextPoints(PointData &points)
   {
      static const char message[] = "reading PDAL file";
      if(m_delegate != NULL)
         m_delegate->updateCompleted(0, message);

      std::vector<pdal::Dimension::Id::Enum> channelsToDimensions;
      for (int i = 0; i < points.getNumChannels(); i++)
      {
         ChannelData& channel = points.getChannel(i);

         pdal::Dimension::Id::Enum dim = m_fixedPointTable.layout()->findDim(channel.getName());
         channelsToDimensions.push_back(dim);
      }

      //We thought using read() would be faster as there's less seeking, but at least in debug mode, using processOne() is faster.
      //Since point mode requires a little less hackery with PointTables, we're going with point mode for now.
/*#if 0
      pdal::FixedPointTable fixedPointTable(points.getNumSamples());
      m_reader.addDimensions(fixedPointTable.layout());
      fixedPointTable.finalize();
      pdal::PointViewPtr pointViewPtr(new pdal::PointView(fixedPointTable));
      
      int cnt = m_reader.read(pointViewPtr, points.getNumSamples());
      int used = 0;
      size_t cancelCount = 0;

      for (int i = 0; i < cnt; i++)
      {
         
         double x = pointViewPtr->getFieldAs<double>(pdal::Dimension::Id::X, i);
         double y = pointViewPtr->getFieldAs<double>(pdal::Dimension::Id::Y, i);
         double z = pointViewPtr->getFieldAs<double>(pdal::Dimension::Id::Z, i);
         if (useSample(x, y, z))
         {
            for (int j = 0; j < points.getNumChannels(); j++)
            {
               ChannelData& channel = points.getChannel(j);
               pdal::Dimension::Id::Enum dim = channelsToDimensions[j];
               // Write at the accepted-sample index so rejected samples leave no gaps.
               switch (dim)
               {
               case pdal::Dimension::Id::X:
                  static_cast<double*>(channel.getData())[used] = x;
                  break;
               case pdal::Dimension::Id::Y:
                  static_cast<double*>(channel.getData())[used] = y;
                  break;
               case pdal::Dimension::Id::Z:
                  static_cast<double*>(channel.getData())[used] = z;
                  break;
               default:
                  {
                     float value = pointViewPtr->getFieldAs<float>(dim, i);
                     static_cast<float*>(channel.getData())[used] = value;
                     break;
                  }
               }
            }

            used++;
         }
         cancelCount += 1;
         if (cancelCount == 4096)
         {
            if (m_delegate != NULL)
            {
               m_delegate->updateCompleted(static_cast<double>(cancelCount), message);

               if (m_delegate->getCancelled())
                  THROW_LIBRARY_ERROR(LTL_STATUS_CORE_OPERATION_CANCELLED)
                  ("operation cancelled.");
            }
            cancelCount = 0;
         }
      }
      return used;
#endif*/


      pdal::PointRef point(m_fixedPointTable, 0);
      int cnt = 0, cancelCount = 0;
      while (!m_finished && cnt < points.getNumSamples())
      {
         m_finished = !m_reader.processOne(point);
         
         
         double x = point.getFieldAs<double>(pdal::Dimension::Id::X);
         double y = point.getFieldAs<double>(pdal::Dimension::Id::Y);
         double z = point.getFieldAs<double>(pdal::Dimension::Id::Z);
         if (useSample(x, y, z))
         {
            for (int j = 0; j < points.getNumChannels(); j++)
            {
               ChannelData& channel = points.getChannel(j);
               pdal::Dimension::Id::Enum dim = channelsToDimensions[j];
               switch (dim)
               {
               case pdal::Dimension::Id::X:
                  static_cast<double*>(channel.getData())[cnt] = x;
                  break;
               case pdal::Dimension::Id::Y:
                  static_cast<double*>(channel.getData())[cnt] = y;
                  break;
               case pdal::Dimension::Id::Z:
                  static_cast<double*>(channel.getData())[cnt] = z;
                  break;
               default:
               {
                  float value = point.getFieldAs<float>(dim);
                  static_cast<float*>(channel.getData())[cnt] = value;
                  break;
               }
               }
            }

            cnt++;
         }
         cancelCount += 1;
         if (cancelCount == 4096)
         {
            if (m_delegate != NULL)
            {
               m_delegate->updateCompleted(static_cast<double>(cancelCount), message);

               if (m_delegate->getCancelled())
                  THROW_LIBRARY_ERROR(LTL_STATUS_CORE_OPERATION_CANCELLED)
                  ("operation cancelled.");
            }
            cancelCount = 0;
         }
         
         
      }
      return cnt;
   }
protected:

   pdal::BpfReader  m_reader;
   pdal::FixedPointTable m_fixedPointTable;
   bool m_finished;
};

Here PointIterator is an abstract base class whose subclasses must implement getNextPoints().
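
For readers who don't know our API, here is a minimal sketch of the calling convention that getNextPoints() has to satisfy: the caller hands over a fixed-size buffer, gets back the number of points written, and keeps calling until it gets 0.  This is not our real interface; SketchIterator, CountingIterator, and drain are hypothetical names, and a plain std::vector<double> stands in for PointData.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the PointIterator contract: each call fills `out`
// with up to out.size() values and returns how many were written.
// A return value of 0 means the source is exhausted.
class SketchIterator
{
public:
    virtual ~SketchIterator() {}
    virtual std::size_t getNextPoints(std::vector<double>& out) = 0;
};

// Toy implementation that yields the values 0, 1, ..., total - 1.
class CountingIterator : public SketchIterator
{
public:
    explicit CountingIterator(std::size_t total) : m_next(0), m_total(total) {}

    std::size_t getNextPoints(std::vector<double>& out)
    {
        std::size_t n = 0;
        while (n < out.size() && m_next < m_total)
            out[n++] = static_cast<double>(m_next++);
        return n;
    }

private:
    std::size_t m_next;   // position to resume from on the next call
    std::size_t m_total;  // total number of values this source will produce
};

// Caller side: pull fixed-size chunks until the iterator reports no more points.
std::size_t drain(SketchIterator& it, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<double> chunk(chunkSize);
    std::size_t total = 0;
    std::size_t n;
    while ((n = it.getNextPoints(chunk)) > 0)
        total += n;
    return total;
}
```

The point of the sketch is that all resume state lives inside the iterator (m_next here, the reader's file position in the real class), so the caller is free to do other work between calls.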

Here are the changes we made in PDAL.  I had to add a bounds check up front in processOne() because of the way we're calling it.  I also found a typo in Dimension.hpp while poking through the code.

diff -r 85fc82f96ded -r 80f391aa4670 xt_lib_pdal/src/PDAL-1.2.0-src/include/pdal/Dimension.hpp
--- a/xt_lib_pdal/src/PDAL-1.2.0-src/include/pdal/Dimension.hpp	Wed May 25 14:38:46 2016 -0700
+++ b/xt_lib_pdal/src/PDAL-1.2.0-src/include/pdal/Dimension.hpp	Wed May 25 15:46:26 2016 -0700
@@ -424,7 +424,7 @@
         return Id::IsPpsLocked;
     else if (s == "STARTPULSE")
         return Id::StartPulse;
-    else if (s == "RELFECTEDPULSE")
+    else if (s == "REFLECTEDPULSE")
         return Id::ReflectedPulse;
     else if (s == "PITCH")
         return Id::Pitch;
diff -r 85fc82f96ded -r 80f391aa4670 xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.cpp
--- a/xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.cpp	Wed May 25 14:38:46 2016 -0700
+++ b/xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.cpp	Wed May 25 15:46:26 2016 -0700
@@ -256,6 +256,8 @@
 
 bool BpfReader::processOne(PointRef& point)
 {
+   if (eof() || (m_index >= m_count))
+      return true;
     switch (m_header.m_pointFormat)
     {
     case BpfFormat::PointMajor:
diff -r 85fc82f96ded -r 80f391aa4670 xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.hpp
--- a/xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.hpp	Wed May 25 14:38:46 2016 -0700
+++ b/xt_lib_pdal/src/PDAL-1.2.0-src/io/bpf/BpfReader.hpp	Wed May 25 15:46:26 2016 -0700
@@ -64,6 +64,10 @@
 
     virtual point_count_t numPoints() const
         {  return (point_count_t)m_header.m_numPts; }
+
+    virtual void ready(PointTableRef table);
+    virtual void done(PointTableRef table);
+    virtual bool processOne(PointRef& point);
 private:
     ILeStream m_stream;
     BpfHeader m_header;
@@ -86,10 +90,8 @@
     virtual QuickInfo inspect();
     virtual void initialize();
     virtual void addDimensions(PointLayoutPtr Layout);
-    virtual void ready(PointTableRef table);
-    virtual bool processOne(PointRef& point);
     virtual point_count_t read(PointViewPtr data, point_count_t num);
-    virtual void done(PointTableRef table);



From: Andrew Bell [mailto:andrew.bell.ia at gmail.com] 
Sent: Monday, May 23, 2016 6:16 PM
To: Jason Overland
Cc: pdal at lists.osgeo.org
Subject: Re: [pdal] Chunking/streaming support

On Mon, May 23, 2016 at 5:02 PM, Jason Overland <joverland at lizardtech.com> wrote:
Hi,
I’m trying to use PDAL’s C++ API to read BPF files and integrate it into our existing API.  In our API we expose an iterator called PointIterator which is constructed from a file and a region (bounding box).  Our PointIterator has a getNextPoints() method which walks the specified region of the point cloud until there are no more points to extract.  To work within this existing API we would like to use a chunking/streaming/stripping/iteration mechanism, i.e. read at most n points at a time from the file, stop and return execution to our API user’s code, and then continue where we left off on the next call to getNextPoints(), rinse and repeat until we’ve read all the points we’re interested in.  I’ve been looking at the Streaming support as exemplified in StreamingTest.cpp but haven’t quite been able to wrap my head around whether or not what I’m trying to achieve is currently possible.

The issue is that PDAL's API works in a manner opposite to yours.  PDAL expects to call your code when a point has been read, rather than the other way around.  PDAL handles the point buffering and so on, relieving your code of the burden.  PDAL wants to run an entire pipeline -- it's not intended as a stand-alone point-by-point reader.  Without seeing your processing code, I can't provide any advice on how to make it work with the existing API.

That said, I understand that this doesn't meet your model and your model isn't unreasonable.  I don't know where doing something like this might fit in our current priority list, but it's probably not too large a task.
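
To make the mismatch concrete, here is a minimal sketch of the two control-flow styles.  It uses none of PDAL's actual API: Pt, pushAll, PullCursor, sumZPush, and sumZPull are hypothetical names, and an in-memory vector stands in for the file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical point type for illustration only.
struct Pt { double x, y, z; };

// Push style: the library owns the loop and calls user code once per point.
void pushAll(const std::vector<Pt>& source,
             const std::function<void(const Pt&)>& onPoint)
{
    for (const Pt& p : source)
        onPoint(p);
}

// Pull style: the caller owns the loop and can stop and resume at will.
class PullCursor
{
public:
    explicit PullCursor(const std::vector<Pt>& source)
        : m_source(source), m_pos(0) {}

    // Copies the next point into `out`; returns false once the source is exhausted.
    bool next(Pt& out)
    {
        if (m_pos >= m_source.size())
            return false;
        out = m_source[m_pos++];
        return true;
    }

private:
    const std::vector<Pt>& m_source;
    std::size_t m_pos;   // resume position, kept between calls
};

// Sums z over every point using the push interface.
double sumZPush(const std::vector<Pt>& source)
{
    double sum = 0.0;
    pushAll(source, [&sum](const Pt& p) { sum += p.z; });
    return sum;
}

// Sums z over at most maxPoints points, then returns control to the caller.
double sumZPull(PullCursor& cursor, std::size_t maxPoints)
{
    double sum = 0.0;
    Pt p;
    for (std::size_t n = 0; n < maxPoints && cursor.next(p); ++n)
        sum += p.z;
    return sum;
}
```

With the push interface the user's callback cannot pause the loop; with the pull interface, calling sumZPull(cursor, 2) twice on the same cursor processes the first two points and then continues from the third.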

-- 
Andrew Bell
andrew.bell.ia at gmail.com

