[postgis-devel] ST_DumpPoints in C
Paul Ramsey
pramsey at opengeo.org
Mon Sep 24 09:29:05 PDT 2012
Nathan,
I'd have to sit with you to talk through the copying, there is
somewhat more of it that is strictly needed in a lot of access paths,
a throwback to our pre-2.0 unaligned heritage. It's now possible to
direct read from underlying gserialized using the
extern const POINT2D* getPoint2d_cp(const POINTARRAY *pa, int n);
function (though you'll note there's only a 2d variant ATM)
I'm sure folks will be very please to have a C dumppoints.
P.
On Mon, Sep 24, 2012 at 5:20 AM, Nathan Wagner <nw at hydaspes.if.org> wrote:
> We met and spoke briefly at the Postgres Open conference in Chicago.
> Following our conversation, I took another look at rewriting
> ST_DumpPoints in C. I've got a preliminary attempt working.
>
> Some output from my working tryout script is below. For creating a temp
> table from a geometry of 1.16 million points, the existing plpgsql
> function takes about 11 seconds. I have gotten that down to around 4
> seconds with my C implementation.
>
> For some reason the plpgsql also takes the same amount of time to
> generate the first few points under a limit restriction, while the C
> function is much faster at that. That is, it seems like the plpgsql
> function takes a fixed amount of time to generate the points that is
> dependent on the size, but not the number of points actually returned.
> I don't know why that would be.
>
> There is an amazing amount of copying and memory allocation going on
> behind the scenes with the lwgeom data structures. I have tried to
> minimize that, but I may be a bit too cozy with the internals. If I had
> a function that would convert a point in a ptarray directly into a
> gserialized, that would eliminate pretty much all of the copying
> internally, other than the final one for the gserialized. I think that
> would speed things up quite a lot also. I do one copy at the top of the
> function to cache the input, I'm not sure that's actually needed, since
> I never write to the incoming data.
>
> A few other optimizations can probably be made, but without profiling it
> to be sure, I don't think it can be made much faster without converting
> directly from ptarray member to gserialized.
>
> Once I clean up the code and make sure it will handle every geometry
> type (I've only been testing against polygons), I will drop it in
> against the latest SVN checkout and make sure it passes the
> regression tests. Once it does, I will put the code somewhere it
> can be found and post to the list. Feel free to forward this email
> if you think someone else would be interested before I get to
> posting it publicly.
>
> Output follows should you care to look. The 'old dumping first ten
> points' and the 'odump' temp table use the plpgsql function,
> the others use the C implementation.
>
> psql -d gis -1 -f try.sql
> DROP EXTENSION
> CREATE EXTENSION
> SET
> st_geometrytype
> -----------------
> ST_Polygon
> (1 row)
>
> SET
> SET
> creating sdump
> Timing is on.
> dumping first 10 points of gid 1
> select *, st_astext(geom) from (
> select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1 limit 10
> ) x
> ;
> path | geom | st_astext
> --------+----------------------------------------------------+-----------------------------
> {1,1} | 0101000020E61000000000000000806640FF5E0A0F9A3F5140 | POINT(180 68.993778)
> {1,2} | 0101000020E61000000000000000806640C154336B29425040 | POINT(180 65.033778)
> {1,3} | 0101000020E61000004293C492F27F66405C8FC2F528425040 | POINT(179.998361 65.03375)
> {1,4} | 0101000020E6100000AF7AC03CE47F66407BD976DA1A425040 | POINT(179.996611 65.032889)
> {1,5} | 0101000020E6100000F10D85CFD67F6640E09EE74F1B425040 | POINT(179.994972 65.032917)
> {1,6} | 0101000020E6100000DD5F3DEE5B7F6640672B2FF99F415040 | POINT(179.979972 65.025389)
> {1,7} | 0101000020E6100000D255BABB4E7F6640672B2FF99F415040 | POINT(179.978361 65.025389)
> {1,8} | 0101000020E6100000E17A14AE477F66409A99999999415040 | POINT(179.9775 65.025)
> {1,9} | 0101000020E61000000BCF4BC5467F6640B9E34D7E8B415040 | POINT(179.977389 65.024139)
> {1,10} | 0101000020E610000005E09F52257F66408FAB915D69415040 | POINT(179.973306 65.022056)
> (10 rows)
>
> Time: 183.578 ms
> old dumping first 10 points of gid 1
> select *, st_astext(geom) from (
> select (public.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1 limit 10
> ) x
> ;
> path | geom | st_astext
> --------+----------------------------------------------------+-----------------------------
> {1,1} | 0101000020E61000000000000000806640FF5E0A0F9A3F5140 | POINT(180 68.993778)
> {1,2} | 0101000020E61000000000000000806640C154336B29425040 | POINT(180 65.033778)
> {1,3} | 0101000020E61000004293C492F27F66405C8FC2F528425040 | POINT(179.998361 65.03375)
> {1,4} | 0101000020E6100000AF7AC03CE47F66407BD976DA1A425040 | POINT(179.996611 65.032889)
> {1,5} | 0101000020E6100000F10D85CFD67F6640E09EE74F1B425040 | POINT(179.994972 65.032917)
> {1,6} | 0101000020E6100000DD5F3DEE5B7F6640672B2FF99F415040 | POINT(179.979972 65.025389)
> {1,7} | 0101000020E6100000D255BABB4E7F6640672B2FF99F415040 | POINT(179.978361 65.025389)
> {1,8} | 0101000020E6100000E17A14AE477F66409A99999999415040 | POINT(179.9775 65.025)
> {1,9} | 0101000020E61000000BCF4BC5467F6640B9E34D7E8B415040 | POINT(179.977389 65.024139)
> {1,10} | 0101000020E610000005E09F52257F66408FAB915D69415040 | POINT(179.973306 65.022056)
> (10 rows)
>
> Time: 10319.984 ms
> creating ndump
> create temp table ndump as select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 3929.143 ms
> creating odump
> create temp table odump as select (public.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 13254.297 ms
> creating nndump
> create temp table nndump as select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 3895.538 ms
> rollback;
> ROLLBACK
> Time: 119.024 ms
>
> --
> nw
More information about the postgis-devel
mailing list