[postgis-devel] Issues about the GSoC project

Tue Jul 27 09:01:41 PDT 2021

Hi Paul,

Thanks for your email. Sorry for not being clear.

I have made a draft here in a document(
https://docs.google.com/document/d/1m4oxBAsKCyjAnYmkCmQ0X_ltiid5tliFwF3rtdzlKsc/edit?usp=sharing
)

I used `EXPLAIN(ANALYSE, VERBOSE, BUFFER) to check the test details but I
don't know why `no index` method only hits 18 shared buffers while others,
using `Index Scan`, hit far more than that.
And next I will repeat the test to reach a stable status to get a confident
result.

Best regards,
Han

On Tue, Jul 27, 2021 at 2:55 AM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

> The column names in the table are a little confusing :)
>
> I don't have any thing I can think that would explain the execution time
> difference, so you'll probably need to gather more evidence. Comparing the
> call counts (gproff? callgrind?) might give a clue where the extra work is
> going in the sorted indexes. Actually, if your sort function was in fact
> "desorting" the inputs, that could explain both the extra shared buffer
> hits and the performance... like, if your whole index floats into memory
> then you'd expect lots of buffer hits, and the more work on the index, the
> more hits. So if you had a terrible index you'd get both lots of buffer
> hits and a lot of time spent.
>
> Basically, it shouldn't take a lot of extra steps on a sorted index than
> on a normal gist index, so if there *are* a lot of extra index accesses...
> something is very awry in the sorting.
>
> P.
>
> > On Jul 25, 2021, at 10:21 PM, Han Wang <hanwgeek at gmail.com> wrote:
> >
> > Hi Giuseppe and Hi Regina,
> >
> > After checking the paper of GiST and implementation in Postgres. I think
> the query performance should be considered besides the building process. In
> the larger data test scenario, the building time of different indexes are
> similar because Postgres just hashes the tuples and sorts them and packs
> them into pages, building the tree index from bottom to up. With a bad hash
> order definition, the building process cannot detect the poor index query
> performance. So it is necessary to test the index query performance. I have
> tested the query performance with the `EXPLAIN` operator, using the sql
> scripts like other indexes in the `/regress`. But I am not familiar with
> PL/pgSQL, so I handle the log with some python scripts.
> >
> > In this test, I focus on the buffer hits and execution time of different
> tasks of different index types including `No Index`, `Simple GiST index`,
> `X hash function`, `morton hash function` and `hilbert hash function`. And
> there are some results:
> > Shared buffer hits:
> >  Index        Create Time(ms) <<      &<      &&      &>      >>
> ~=      ~       @       &<|     <<|     |>>     |&>
> > 0     No Index        0       18      18      18      18      18
> 18      18      18      18      18      18      18
> > 1     GiST Index      25.249  40237   46085   3009    40297   46025
>  3009    3009    3009    41994   45295   41934   45355
> > 2     X PreSort Index 8.829   443620  441235  3009    444501  440354
> 3009    3009    3009    441568  442503  440687  443384
> > 3     Morton PreSort Index    16.885  447779  447446  4079    448669
> 446556  4079    4079    4079    445362  449428  444472  450318
> > 4     Hilbert PreSort Index   16.824  446714  444058  3558    447600
> 443172  3558    3558    3558    446072  445394  445186  446280
> > Execution time:
> >
> > Index Create Time(ms) <<      &<      &&      &>      >>      ~=      ~
>      @       &<|     <<|     |>>     |&>
> > 0     No Index        0       567.251 565.720 481.618 561.877 563.452
> 480.128 478.275 478.518 567.144 572.191 563.255 556.281
> > 1     GiST Index      25.249  289.255 293.143 28.838  289.002 291.296
> 28.336  28.597  26.947  295.394 297.556 293.988 299.760
> > 2     X PreSort Index 8.829   440.861 445.630 37.960  439.564 440.535
> 37.594  37.979  37.741  386.662 384.635 385.166 388.832
> > 3     Morton PreSort Index    16.885  421.999 413.427 77.002  422.939
> 412.130 77.415  75.102  76.056  416.205 446.599 410.614 434.613
> > 4     Hilbert PreSort Index   16.824  417.539 415.962 56.583  421.226
> 414.553 56.320  55.600  55.338  416.639 421.243 418.550 417.094
> > The number of shared buffer hits are far bigger than the original one.
> But what confuses me is that the execution times are worse. I am trying to
> figure out why this happened.
> > What's more, I am not very clear about the relationship between query
> performance and the number of shared buffer hits.
> >
> > If you have any questions or suggestions, please let me know.
> >
> > Best regards,
> > Han
> >
> > On Thu, Jul 8, 2021 at 1:32 AM Giuseppe Broccolo <g.broccolo.7 at gmail.com>
> wrote:
> > Hi Han,
> >
> > Il giorno mer 7 lug 2021 alle ore 18:05 Han Wang <hanwgeek at gmail.com>
> ha scritto:
> > Hi Regina and hi Giuseppe,
> >
> > Thanks for your reply!
> >
> > I am now checking the original paper and postgres's implement of gist.
> And now I think the query performance test after building gist index is
> necessary. Because as Darafei mentioned, the fast sorting building method
> may just pack the sorted tuples into pages regardless of which hash
> function it use. The correctness of index may be checked in the runtime
> query. At present, I am working on confirm the io access of hash functions.
> >
> > Feel free to give suggestions or questions.
> >
> > The correctness of the index should be covered by some of the regression
> tests in action for the GiST support in PostGIS - for instance, kNN
> searches etc. - for the moment I am really curious about the I/O access of
> the hash functions. This is something maybe it has not been checked even in
> PostgreSQL, and maybe it would be good to share the results with them as
> well.
> >
> > Keep us updated, and thank you for your help!
> >
> > Giuseppe.
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20210728/d7b82abc/attachment.html>