[pgpointcloud] Patch min and max

Tue Jun 5 15:37:06 PDT 2018

Hi,
just a quick 2 cents.

Index are not going to help because you read the full table, thus no index
is used.
There are many parameters that can affect the speed of this query
(just for fun, there is a whole wiki page on postgres to explain what
information to provide for "slow query" questions.)

On of this parameter is the version of pgpointcloud you are using.
Another is the max size of a row, because basically you patch is mainly
stored in the postgres TOAST table,
only the beginning is stored in the actual row of your table.
(by the way, TOAST compression should be de-activated on this table).

Bottom line is, even when using toast, you are trying to access
8KB (one row) * 50k (nb of rows), that is about 400 MB (worse case
scenario), it is going to take time.
By the way 1.5sec for 50k rows, and 5sec for 160k rows seems like very nice
scaling to me (linear).

A good solution (depending on your workflow), would be to use a
"proxy/miror/shadow" table
that mirrors the one with patch.
In proxy table, you would maintain (with triggers for instance) one row per
patch in your patch table,
and you would store all kind of useful statistics,
such as bounding boxes (min max etc, ), geom cast to postgis, number of
points, etc etc.
This proxy table should also be heavily indexed;

I used this approach on Billion-range point cloud without trouble.

Cheers,
Remi-C

2018-06-01 8:17 GMT-04:00 Lars <laasunde at hotmail.com>:

> The query being used is "select id, PC_PatchMin(pa, 'x'), PC_PatchMax(pa,
> 'x'),PC_PatchMin(pa, 'y'), PC_PatchMax(pa, 'y') from importdata;".
>
> This information is used to make bounding boxes for *all patches* in a
> table. There is no where clause in our query, we want all rows. Does the
> index expression help "pre-compute" the PC_PatchMin/Max values for quicker
> access or does it only work when using a where clause?
>
>
> ------------------------------
> *Fra:* Karl Pietrzak <kap4020 at gmail.com>
> *Sendt:* fredag 1. juni 2018 12.44
>
> *Til:* laasunde at hotmail.com
> *Kopi:* pgpointcloud at lists.osgeo.org
> *Emne:* Re: [pgpointcloud] Patch min and max
>
> Based on the "Seq Scan on public.importdata", it doesn't look like the
> index is being used. :(
>
> It's almost impossible to replicate this, because we don't have any
> specifics.  If you could show us the exact, complete statements you used to
> create the table, create the index, etc. and the SELECT you are doing, that
> would be very helpful.
>
> On Fri, Jun 1, 2018 at 4:10 AM Lars <laasunde at hotmail.com> wrote:
>
> Karl,
>
> Interesting proposal.
>
> The inital tests did however *not *yield any improved performance.
>
> The table index is described like this (\d+ importdata)
> "patch_min_max_idx" btree (pc_patchmin(pa, 'x'::text), pc_patchmax(pa,
> 'x'::text),pc_patchmin(pa, 'y'::text), pc_patchmax(pa, 'y'::text))
>
> The explain query looks like this;
> Seq Scan on public.importdata (cost=0.00..1402.50 rows=52321 width=132)
> (actual time=3.063..1546.160 rows=52321 loops=1)
> Output: id, _pc_patchstat(pa, 0, 'x'::text), _pc_patchstat(pa, 0,
> 'y'::text), _pc_patchstat(pa, 1, 'x'::text), _pc_patchstat(pa, 1, 'y'::text)
> Buffers: shared hit=795053 read=57146
> Planning time= 0.600ms
> Execution time= 1548.494 ms
> (5 rows)
>
> The "analyze importdata" command has also been invoked but did not make
> any difference.
>
> Any ideas on why the expression does not improve performance? Viewing the
> explain output I notice "_pc_patchstat" which make me think that the query
> is using the index but it might be wishfull thinking on my part.
>
> Kind regards, Lars
>
>
> ------------------------------
> *Fra:* Karl Pietrzak <kap4020 at gmail.com>
> *Sendt:* 31. mai 2018 17:18
> *Til:* laasunde at hotmail.com
> *Kopi:* pgpointcloud at lists.osgeo.org
> *Emne:* Re: [pgpointcloud] Patch min and max
>
> Postgres supports indices on expressions.  https://www.postgresql.org/
> docs/current/static/indexes-expressional.html
>
> Would this feature support your use case?
>
> Something like this:
>
> CREATE INDEX patch_min_and_max_idx ON importdata (PC_PatchMin(pa, 'x'), PC_PatchMax(pa, 'x'),PC_PatchMin(pa, 'y'), PC_PatchMax(pa, 'y'));
>
>
>
> On Thu, May 31, 2018 at 9:46 AM Lars <laasunde at hotmail.com> wrote:
>
> Hello,
>
> The below query takes 1,46 seconds using table with 50 000 patches and at
> least 5 seconds using table with 160 000 patches.
> "select id, PC_PatchMin(pa, 'x'), PC_PatchMax(pa, 'x'),PC_PatchMin(pa,
> 'y'), PC_PatchMax(pa, 'y') from importdata;"
>
> The table size has a significant effect on the query execution speed which
> means the query does not scale well.
>
> How can we make the query more scalable and/or improve query performance?
>
> One approach is to create an additional table that contains min/max values
> for each patch to prevent the "expensive" PC_PatchXXX function. This is
> obviously duplicate information that adds complexity to the system.
>
> Using Windows 10 and PostgreSQL 10.3
>
> Kind regard, Lars
> _______________________________________________
> pgpointcloud mailing list
> pgpointcloud at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pgpointcloud
>
>
>
> --
> Karl
>
>
>
> --
> Karl
>
> _______________________________________________
> pgpointcloud mailing list
> pgpointcloud at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pgpointcloud
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/pgpointcloud/attachments/20180605/74c9f4f9/attachment.html>