[postgis-devel] performance test suite

Thu May 3 01:19:15 PDT 2018

I really like the idea of having a performance framework either in the project
itself or closely associated.

I've previously run some experiments to check the performance impact of
some of my changes while working on issues related to either PostGIS or
Postgres, so here are some thoughts about the frameworks I've used:

* Google benchmark: can be used for liblwgeom/ but not for postgis/, so it's
only a partial solution. It's C++, requires an external library and is
oriented to microbenchmarks. My guess is that creating the test cases and
maintaining them takes a similar effort to adding unit tests from scratch;
some of the microbenchmarks could be taken from already existing unit tests,
but more complex tests would have to come from real-world data.

Here is an example with some tests I made for
lwgeom_mindistance2d_tolerance, lw_arc_center or lw_dist2d_arc_arc:
https://git.io/vpgoI

* pgbench: This is the benchmark framework included in PostgreSQL. I
believe it's meant to test the database under different settings / hardware,
but it can be easily extended to run whatever queries you need.
The functions available are limited, but a bunch of new ones are coming
in PG11.

Here is an example with a script that makes queries for a raster tile given
the table, zoom and bbox:
https://gist.github.com/Algunenano/dc6618e4e08ab8aebcb4c12113114988

I've mainly used pgbench tests plus some shell oneliners to run a bunch of
automated tests and check for performance changes caused by underlying changes
like PG configuration changes or kernel updates (remember Meltdown and
Spectre).
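
For illustration only, the "pgbench plus oneliners" approach could be
sketched in Python along these lines. This is not the script from the gist
above: the helper names are hypothetical, and the parser assumes pgbench's
summary contains lines shaped like "latency average = 1.234 ms" and
"tps = 567.8 (...)".

```python
import re
import subprocess

def pgbench_command(script, dbname, seconds=30, clients=1):
    """Build a pgbench invocation that runs a custom script file."""
    return [
        "pgbench",
        "-n",                # skip vacuuming the standard pgbench tables
        "-f", script,        # custom script instead of the built-in one
        "-T", str(seconds),  # run for a fixed duration
        "-c", str(clients),  # number of concurrent clients
        dbname,
    ]

# Assumed shape of the summary lines; only the first match of each is used.
_LATENCY = re.compile(r"^latency average = ([0-9.]+) ms", re.MULTILINE)
_TPS = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)

def parse_summary(output):
    """Extract average latency (ms) and tps from pgbench's summary text."""
    latency = _LATENCY.search(output)
    tps = _TPS.search(output)
    return {
        "latency_ms": float(latency.group(1)) if latency else None,
        "tps": float(tps.group(1)) if tps else None,
    }

def run_once(script, dbname, **kwargs):
    """Run pgbench once and parse its summary (needs a live database)."""
    result = subprocess.run(
        pgbench_command(script, dbname, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return parse_summary(result.stdout)
```

Something like run_once("tile_query.sql", "bench", seconds=60) would then
give one data point per run; the file and database names here are made up.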

Out of the 2 I think pgbench is the closest to what is being proposed
here, but it certainly requires some work on top to set up tables, run
queries and extract results.

One cool thing about both is that, since both can be set up to repeat the
same thing over and over, it's pretty easy to use a tool like perf to analyze
performance.

I'd also mention that I've found it nearly impossible for results to stay
useful for more than a few weeks, due to underlying changes in the OS. You
could always keep an isolated machine without updates for testing, but as
soon as you make any change (e.g. update the compiler) the results are
invalidated. As Even has also mentioned, I prefer to do a run, make
whatever change/update I want to test, and immediately afterwards make
another run for comparison.
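
As a rough sketch of that back-to-back comparison (the function names and
the 5% noise threshold are assumptions chosen for illustration, not
something measured), the two runs could be compared on their median timing:

```python
from statistics import median

def relative_change(before, after):
    """Relative change of the median timing, "after" vs. "before".

    Positive means the "after" run was slower, negative means faster.
    """
    base = median(before)
    return (median(after) - base) / base

def classify(before, after, threshold=0.05):
    """Label a comparison, treating changes within +/- threshold as noise."""
    change = relative_change(before, after)
    if change > threshold:
        return "slower"
    if change < -threshold:
        return "faster"
    return "unchanged"
```

Using the median rather than the mean keeps a single outlier run from
dominating the comparison; the threshold would need tuning per machine.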

On Wed, May 2, 2018 at 6:44 PM, Regina Obe <lr at pcorp.us> wrote:

> I'm not sure how this "ET phone home" you are proposing would work.  How
> could it send anything anywhere, since whatever the stats collector is would
> need to make an outside network call, which generally is not available in
> the DB?
>
>
>
> Sounds like it might need to be a shared preloaded lib as well as
> some outside scheduled service that sends usage stats.
>
> *From:* postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] *On
> Behalf Of *Darafei "Komяpa" Praliaskouski
> *Sent:* Wednesday, May 02, 2018 3:22 AM
>
> *To:* PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
> *Subject:* Re: [postgis-devel] performance test suite
>
>
>
> I think the biggest part of it is about collecting real world datasets and
> workloads.
>
>
>
> I was thinking about making a system that would serve as error handler for
> things like unexpected GEOS errors that would send offending geometries and
> all versions of all software to some (separate) issue tracker.
>
>
>
> A lot of software asks "do you want to share usage statistics with us?" -
> can we do something similar? If enabled, log number of invocations and
> typical characteristics of calls (number of points in geometries, their
> types, K for KMeans, whatever seems reasonable for the case) and send
> somewhere we can pick it up once a day?
>
>
>
> On Wed, May 2, 2018 at 10:08, Regina Obe <lr at pcorp.us> wrote:
>
> That sounds like a good start, we could also have a folder on postgis.net
> hosting some data files we can use.
>
> I was wondering if pgbench would be of any value.  I honestly haven't
> explored it to know how much flexibility we have in feeding it custom
> queries.  It seems it's possible.
>
> https://www.postgresql.org/docs/10/static/pgbench.html
>
>
>
>
>
> *From:* postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] *On
> Behalf Of *Daniel Baston
> *Sent:* Tuesday, May 01, 2018 4:53 PM
> *To:* PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
>
>
> *Subject:* Re: [postgis-devel] performance test suite
>
>
>
> I agree that this would be very useful, not only for catching regressions
> but also to help us promote the performance improvements that make it into each
> release. We itemize performance improvements in the changelog, but we don't
> generally quantify what they mean for typical use cases. It would be nice
> to say by upgrading to release 2.5, typical point-in-polygon queries are
> improved by 20%, K-means is improved by X%, etc.
>
>
>
> To keep the perl/python to a minimum, could we rely on pg_stat_statements
> to do the bulk of the work for us? So it could be something as simple as:
>
>
>
> 1) a script that loads or generates test data
>
> 2) a SQL file that runs a bunch of queries capturing typical usages of
> PostGIS
>
> 3) something that parses the output of pg_stat_statements
>
>
>
> Dan
>
>
>
> On Tue, May 1, 2018 at 4:42 PM, Regina Obe <lr at pcorp.us> wrote:
>
> Bjorn,
>
>
>
> Oh you are a man after my own heart.  Yes definitely.  Performance testing
> is a very weak spot in our testing.  I hate finding out about this when
> users complain :)
>
>
>
> I think starting it off as a separate project is a good idea but I'd
> really love to see it eventually as part of PostGIS core that, say, we can
> flip on and have enabled for some bots or when we are about to release.
> How we keep a record of timings etc. seems to me a bot-end thing: the
> testing bot reporting to some mothership database.
>
>
>
> As to whether it should be done in perl or something else – to be honest
> Perl scares the shit out of me.  Python sadly I haven't warmed up to
> either.  I always feel like I'm fumbling thru a mine field with both.  Okay
> that's an exaggeration.
>
>
>
> But then again Perl is a dependency we are used to having, so whatever you
> do ideally shouldn't add any crazy dependencies and if additional
> dependencies – a dependency that can run on all platforms.  It's okay to
> have extra dependencies as long as they are not required for regular
> testing.  I think Komzpa already put in some logic for code coverage
> testing via lcov for example, which is fine since it's not a requirement.
>
>
>
>
>
> Thanks,
>
> Regina
>
>
>
>
>
> *From:* postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] *On
> Behalf Of *Paul Ramsey
> *Sent:* Tuesday, May 01, 2018 1:08 PM
> *To:* Björn Harrtell <bjorn at wololo.org>; PostGIS Development Discussion <
> postgis-devel at lists.osgeo.org>
> *Subject:* Re: [postgis-devel] performance test suite
>
>
>
> I think perhaps “do it as a separate project”. It’s going to be complex,
> it’s going to be brittle, it’s going to eventually break and I’d rather not
> have it sitting around broken inside the main source tree. The only way to
> find the regressions is going to be longitudinally testing and keeping
> track of numbers over time, so it’ll be quite a complex piece of work.
>
>
>
> P
>
>
>
> On May 1, 2018, at 10:05 AM, Björn Harrtell <bjorn.harrtell at gmail.com>
> wrote:
>
>
>
> Hi devs,
>
>
>
> In recent times I've been pondering how to make a sensible test
> suite specifically for performance. Hacking/extending run_test.pl to
> accommodate this has been the only suggested path forward, but to me
> it's a dead end, mostly because of perl (sorry).
>
>
>
> The reasons this gap has become apparent to me are:
>
>
>
> 1. My own work on https://trac.osgeo.org/postgis/ticket/4076.
>
>
>
> 2. The recently discovered large performance regression of ST_Union
> tracked by https://trac.osgeo.org/postgis/ticket/4075. Even if it's in
> GEOS and could perhaps be performance tested there, I think it would not be
> wrong to also performance test ST_Union without consideration of underlying
> implementation.
>
>
>
> Any additional thoughts on the subject? Do it in perl or don't do it? :)
>
>
>
> Regards,
>
>
>
> /Björn
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>

-- 

*Raúl Marín Rodríguez *carto.com