[postgis-devel] ST_DumpPoints in C

Paul Ramsey pramsey at opengeo.org
Mon Sep 24 09:29:05 PDT 2012


Nathan,
I'd have to sit with you to talk through the copying, there is
somewhat more of it that is strictly needed in a lot of access paths,
a throwback to our pre-2.0 unaligned heritage. It's now possible to
direct read from underlying gserialized using the

extern const POINT2D* getPoint2d_cp(const POINTARRAY *pa, int n);

function (though you'll note there's only a 2d variant ATM)

I'm sure folks will be very please to have a C dumppoints.

P.

On Mon, Sep 24, 2012 at 5:20 AM, Nathan Wagner <nw at hydaspes.if.org> wrote:
> We met and spoke briefly at the Postgres Open conference in Chicago.
> Following our conversation, I took another look at rewriting
> ST_DumpPoints in C.  I've got a preliminary attempt working.
>
> Some output from my working tryout script is below.  For creating a temp
> table from a geometry of 1.16 million points, the existing plpgsql
> function takes about 11 seconds.  I have gotten that down to around 4
> seconds with my C implementation.
>
> For some reason the plpgsql also takes the same amount of time to
> generate the first few points under a limit restriction, while the C
> function is much faster at that.  That is, it seems like the plpgsql
> function takes a fixed amount of time to generate the points that is
> dependent on the size, but not the number of points actually returned.
> I don't know why that would be.
>
> There is an amazing amount of copying and memory allocation going on
> behind the scenes with the lwgeom data structures.  I have tried to
> minimize that, but I may be a bit too cozy with the internals.  If I had
> a function that would convert a point in a ptarray directly into a
> gserialized, that would eliminate pretty much all of the copying
> internally, other than the final one for the gserialized.  I think that
> would speed things up quite a lot also.  I do one copy at the top of the
> function to cache the input, I'm not sure that's actually needed, since
> I never write to the incoming data.
>
> A few other optimizations can probably be made, but without profiling it
> to be sure, I don't think it can be made much faster without converting
> directly from ptarray member to gserialized.
>
> Once I clean up the code and make sure it will handle every geometry
> type (I've only been testing against polygons), I will drop it in
> against the latest SVN checkout and make sure it passes the
> regression tests.  Once it does, I will put the code somewhere it
> can be found and post to the list.  Feel free to forward this email
> if you think someone else would be interested before I get to
> posting it publicly.
>
> Output follows should you care to look.  The 'old dumping first ten
> points' and the 'odump' temp table use the plpgsql function,
> the others use the C implementation.
>
> psql -d gis -1 -f try.sql
> DROP EXTENSION
> CREATE EXTENSION
> SET
>  st_geometrytype
> -----------------
>  ST_Polygon
> (1 row)
>
> SET
> SET
> creating sdump
> Timing is on.
> dumping first 10 points of gid 1
> select *, st_astext(geom) from (
> select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1 limit 10
> ) x
> ;
>   path  |                        geom                        |          st_astext
> --------+----------------------------------------------------+-----------------------------
>  {1,1}  | 0101000020E61000000000000000806640FF5E0A0F9A3F5140 | POINT(180 68.993778)
>  {1,2}  | 0101000020E61000000000000000806640C154336B29425040 | POINT(180 65.033778)
>  {1,3}  | 0101000020E61000004293C492F27F66405C8FC2F528425040 | POINT(179.998361 65.03375)
>  {1,4}  | 0101000020E6100000AF7AC03CE47F66407BD976DA1A425040 | POINT(179.996611 65.032889)
>  {1,5}  | 0101000020E6100000F10D85CFD67F6640E09EE74F1B425040 | POINT(179.994972 65.032917)
>  {1,6}  | 0101000020E6100000DD5F3DEE5B7F6640672B2FF99F415040 | POINT(179.979972 65.025389)
>  {1,7}  | 0101000020E6100000D255BABB4E7F6640672B2FF99F415040 | POINT(179.978361 65.025389)
>  {1,8}  | 0101000020E6100000E17A14AE477F66409A99999999415040 | POINT(179.9775 65.025)
>  {1,9}  | 0101000020E61000000BCF4BC5467F6640B9E34D7E8B415040 | POINT(179.977389 65.024139)
>  {1,10} | 0101000020E610000005E09F52257F66408FAB915D69415040 | POINT(179.973306 65.022056)
> (10 rows)
>
> Time: 183.578 ms
> old dumping first 10 points of gid 1
> select *, st_astext(geom) from (
> select (public.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1 limit 10
> ) x
> ;
>   path  |                        geom                        |          st_astext
> --------+----------------------------------------------------+-----------------------------
>  {1,1}  | 0101000020E61000000000000000806640FF5E0A0F9A3F5140 | POINT(180 68.993778)
>  {1,2}  | 0101000020E61000000000000000806640C154336B29425040 | POINT(180 65.033778)
>  {1,3}  | 0101000020E61000004293C492F27F66405C8FC2F528425040 | POINT(179.998361 65.03375)
>  {1,4}  | 0101000020E6100000AF7AC03CE47F66407BD976DA1A425040 | POINT(179.996611 65.032889)
>  {1,5}  | 0101000020E6100000F10D85CFD67F6640E09EE74F1B425040 | POINT(179.994972 65.032917)
>  {1,6}  | 0101000020E6100000DD5F3DEE5B7F6640672B2FF99F415040 | POINT(179.979972 65.025389)
>  {1,7}  | 0101000020E6100000D255BABB4E7F6640672B2FF99F415040 | POINT(179.978361 65.025389)
>  {1,8}  | 0101000020E6100000E17A14AE477F66409A99999999415040 | POINT(179.9775 65.025)
>  {1,9}  | 0101000020E61000000BCF4BC5467F6640B9E34D7E8B415040 | POINT(179.977389 65.024139)
>  {1,10} | 0101000020E610000005E09F52257F66408FAB915D69415040 | POINT(179.973306 65.022056)
> (10 rows)
>
> Time: 10319.984 ms
> creating ndump
> create temp table ndump as select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 3929.143 ms
> creating odump
> create temp table odump as select (public.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 13254.297 ms
> creating nndump
> create temp table nndump as select (dp.st_dumppoints(geom)).* from gshhs.shape_full where gid = 1;
> SELECT 1160953
> Time: 3895.538 ms
> rollback;
> ROLLBACK
> Time: 119.024 ms
>
> --
> nw



More information about the postgis-devel mailing list