<div dir="ltr">Thanks for the context!<br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 4 Oct 2022 at 17:17, Daniel Baston <<a href="mailto:dbaston@gmail.com">dbaston@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">
<div>GEOS is pretty tied to the interleaved representation. LIke JTS, it uses Coordinate struct to pass around coordinates, with fields for X, Y, and Z values.
If we wanted to support a separate-array representation, we'd either need to either
change our Coordinate struct to store pointers to the X and Y values,
or remove the Coordinate struct
altogether and have algorithms ask a CoordinateSequence for X and Y values
directly. Both seem like a large effort to eliminate the first of what is likely several copies.<br></div><div><br></div><div>That said, an updated CoordinateSequence could still bring about more incremental improvements in this space:</div><div>- Eliminating the virtual function overhead of coordinate access (this is what the linked PR does)<br></div><div>- Allowing the use of an external buffer instead of the one managed by std::vector, so we could have a zero-copy method of bringing data into GEOS from an interleaved buffer</div><div>- Making Coordinate a bit more generic, so we could avoid storing the Z value (or requiring it in an external buffer) in the cases where we don't want it.</div></div></blockquote><div><br></div><div>I think this last point would certainly be useful indeed. For example with GEOSCoordSeq_copyFromBuffer, this currently can only use the single-memcpy case if you have z coordinates. It would be useful to have an efficient version for the 2D case as well. <br></div><div>I am currently working on an implementation to convert an array of geometries in GeoArrow format to an array of GEOSGeometries, and even with the single copy, this copyFromBuffer is what takes the most time. So to repeat what Howard also just answered, allowing to create a CoordinateSequence from a buffer without copying would be very useful. But for this to be useful, such zero-copy option should ideally also work for 2D data, and not require a buffer including z values.</div><div><br></div><div>Joris<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>Dan</div>
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 29, 2022 at 11:38 AM Joris Van den Bossche <<a href="mailto:jorisvandenbossche@gmail.com" target="_blank">jorisvandenbossche@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>We are currently having a discussion about interleaved (XYXY) vs separate arrays (XX YY) in the GeoArrow context (<a href="https://github.com/geopandas/geo-arrow-spec/pull/26" target="_blank">https://github.com/geopandas/geo-arrow-spec/pull/26</a>),
and I was wondering if the GEOS developers have thoughts on that topic
(whether it was ever specifically discussed if the current approach of
CoordinateArraySequence to store all dimensions interleaved is the best
option for GEOS or if separate arrays could be considered as well). But
so the above already touches upon that topic, and I also noticed related
discussion at <a href="https://github.com/libgeos/geos/pull/674#issuecomment-1245741058" target="_blank">https://github.com/libgeos/geos/pull/674#issuecomment-1245741058</a></div><div><br></div><div>In
general, from the Shapely/GeoPandas side, we are certainly interested
in being able to create CoordinateSequences zero-copy from arrays of
coordinates (the current GEOSCoordSeq_copyFromBuffer_r et al already
helps a lot though, although it would be nice if it could use a single
memcpy for 2D as well (related to underlying memory storage above) or
even avoid this one memcpy). <br></div><div><br></div><div>Best,<br></div><div>Joris</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 24 Aug 2022 at 13:50, Daniel Baston <<a href="mailto:dbaston@gmail.com" target="_blank">dbaston@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>This has come up a few times and despite some pushback from users [1] I agree that the benefits of the simplification probably outweigh the costs. Replacing CoordinateSequence with a concrete implementation is top on my list for the GDAL grant work [2] that I will be available to begin on September 1. My general plan was to</div><div><br></div><div>1. Replace CoordinateSequence with CoordinateArraySequence and check performance improvement. Even without algorithmic improvements like the ones you're talking about, I would expect that devirtualizating Coordinate access, and the consequent enabling of inlining, will provide a good benefit across the board.<br></div><div>2. Replace CoordinateArraySequence with a class backed by a buffer of doubles to support future generalization of Coordinate dimension. This may also allow us to offer zero-copy to clients that happen to use an XYXY representation internally (PostGIS)<br></div><div>3. Experiment with a union or std::variant to sneak in an optimized stack-only implementation for Points<br></div><div><br></div><div>I do wonder if our embrace of XYXY is the best choice here and if XXYY would enable vectorization in some cases. That would be a pretty invasive experiment to perform across the whole library, but maybe we could look at something like OrientationIndex in isolation.<br></div><div><br></div><div>Dan<br></div><div><br></div><div>[1] <a href="https://github.com/libgeos/geos/issues/564" target="_blank">https://github.com/libgeos/geos/issues/564</a></div><div>[2] <a href="https://lists.osgeo.org/pipermail/geos-devel/2022-March/010672.html" target="_blank">https://lists.osgeo.org/pipermail/geos-devel/2022-March/010672.html</a></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 23, 2022 at 3:22 PM Paul Ramsey <<a href="mailto:pramsey@cleverelephant.ca" target="_blank">pramsey@cleverelephant.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">One of the things that tinkering with the SegmenString layer of overlay brought out to me was the extent to which we construct CoordinateSequence almost exclusively out of CoordinateArraySequence. Like, all the time. At yet, because we handle those CoordinateArraySequence at the API level almost exclusively as CoordinateSequence we lose the ability to do some handy optimizations. <br>
<br>
Like, if one were going to (as one does on every single CoordinateSequence that enters the overlay code) <br>
(1) test if there are repeated points and <br>
(2a) remove any if there are<br>
(2b) just return the untouched CoordinateSequence if there aren't<br>
a useful pattern would be for ::hasRepeatedPoints() to return/populate a list of indexes at which repeated points appear and for ::removeRepeatedPoints() to do bulk copies of all the points in between those indexes. This is foreclosed by the CoordinateSequence API, you can play this trick nicely with a std::vector living underneath, but the API doesn't let us see that (in fact) that's what we have 99.9% of the time.<br>
<br>
So, one obvious thing to do would be to remove the virtual methods in CoordinateSequence and pull the implementation up to that level, std::vector and all, and give up on the idea of an abstract interface that we don't actually use. For a handful of use cases, where data access cost is greater than computation cost (area, length, distance(?), some others (?)) this might be "bad" in some theoretical way, but note that currently we still don't actually have that abstract layer in place for a zero copy computation. Removing the virtual methods and inheritance from CoordinateSequence would foreclose an option that (a) we seem unlikely to ever deliver on and (b) has narrow performance benefits even if we did deliver on it.<br>
<br>
Meanwhile, the flip case seems to likely have a *lot* of performance benefits just hanging around waiting to be harvested. Coordinate access without going through the inheritance structure; access to some bulk operations like the repeated points case.<br>
<br>
For the "zero copy" crew, I feel like a big chunk of gains for them could be harvested by ensuring that point-based operations are available and don't require construction of a full Point() object. So things like PreparedGeometry->intersects(x, y). Sure, you still have to copy in your polygon feature and prepare it, but much of the overhead in that would still exist in a "zero copy" paradigm (all the internal index buildings). Meanwhile you'd no longer need to create a full Point() to do a point-in-poly test, and that would hopefully be a big win for most users.<br>
<br>
Random thoughs on a sunny day,<br>
P<br>
<br>
_______________________________________________<br>
geos-devel mailing list<br>
<a href="mailto:geos-devel@lists.osgeo.org" target="_blank">geos-devel@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/geos-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/geos-devel</a><br>
</blockquote></div>
_______________________________________________<br>
geos-devel mailing list<br>
<a href="mailto:geos-devel@lists.osgeo.org" target="_blank">geos-devel@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/geos-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/geos-devel</a><br>
</blockquote></div></div>
_______________________________________________<br>
geos-devel mailing list<br>
<a href="mailto:geos-devel@lists.osgeo.org" target="_blank">geos-devel@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/geos-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/geos-devel</a><br>
</blockquote></div>
_______________________________________________<br>
geos-devel mailing list<br>
<a href="mailto:geos-devel@lists.osgeo.org" target="_blank">geos-devel@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/geos-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/geos-devel</a><br>
</blockquote></div></div>