<div dir="ltr"><div>Hi Paul,</div><div><br></div><div>Thanks for your email. Sorry for not being clear.</div><div><br></div><div>I have made a draft here in a document(<a href="https://docs.google.com/document/d/1m4oxBAsKCyjAnYmkCmQ0X_ltiid5tliFwF3rtdzlKsc/edit?usp=sharing" target="_blank">https://docs.google.com/document/d/1m4oxBAsKCyjAnYmkCmQ0X_ltiid5tliFwF3rtdzlKsc/edit?usp=sharing</a>)</div><div><br></div><div>I used `EXPLAIN(ANALYSE, VERBOSE, BUFFER) to check the test details but I don't know why `no index` method only hits 18 shared buffers while others, using `Index Scan`, hit far more than that. </div><div>And next I will repeat the test to reach a stable status to get a confident result.</div><div><br></div><div>Best regards,</div><div>Han</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 27, 2021 at 2:55 AM Paul Ramsey <<a href="mailto:pramsey@cleverelephant.ca">pramsey@cleverelephant.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The column names in the table are a little confusing :)<br>

<br>

I don't have any thing I can think that would explain the execution time difference, so you'll probably need to gather more evidence. Comparing the call counts (gproff? callgrind?) might give a clue where the extra work is going in the sorted indexes. Actually, if your sort function was in fact "desorting" the inputs, that could explain both the extra shared buffer hits and the performance... like, if your whole index floats into memory then you'd expect lots of buffer hits, and the more work on the index, the more hits. So if you had a terrible index you'd get both lots of buffer hits and a lot of time spent.<br>

<br>

Basically, it shouldn't take a lot of extra steps on a sorted index than on a normal gist index, so if there *are* a lot of extra index accesses... something is very awry in the sorting.<br>

<br>

P.<br>

<br>

> On Jul 25, 2021, at 10:21 PM, Han Wang <<a href="mailto:hanwgeek@gmail.com" target="_blank">hanwgeek@gmail.com</a>> wrote:<br>

> <br>

> Hi Giuseppe and Hi Regina,<br>

> <br>

> After checking the paper of GiST and implementation in Postgres. I think the query performance should be considered besides the building process. In the larger data test scenario, the building time of different indexes are similar because Postgres just hashes the tuples and sorts them and packs them into pages, building the tree index from bottom to up. With a bad hash order definition, the building process cannot detect the poor index query performance. So it is necessary to test the index query performance. I have tested the query performance with the `EXPLAIN` operator, using the sql scripts like other indexes in the `/regress`. But I am not familiar with PL/pgSQL, so I handle the log with some python scripts.<br>

> <br>

> In this test, I focus on the buffer hits and execution time of different tasks of different index types including `No Index`, `Simple GiST index`, `X hash function`, `morton hash function` and `hilbert hash function`. And there are some results:<br>

> Shared buffer hits:<br>

>  Index        Create Time(ms) <<      &<      &&      &>      >>      ~=      ~       @       &<|     <<|     |>>     |&><br>

> 0     No Index        0       18      18      18      18      18      18      18      18      18      18      18      18<br>

> 1     GiST Index      25.249  40237   46085   3009    40297   46025   3009    3009    3009    41994   45295   41934   45355<br>

> 2     X PreSort Index 8.829   443620  441235  3009    444501  440354  3009    3009    3009    441568  442503  440687  443384<br>

> 3     Morton PreSort Index    16.885  447779  447446  4079    448669  446556  4079    4079    4079    445362  449428  444472  450318<br>

> 4     Hilbert PreSort Index   16.824  446714  444058  3558    447600  443172  3558    3558    3558    446072  445394  445186  446280<br>

> Execution time:<br>

> <br>

> Index Create Time(ms) <<      &<      &&      &>      >>      ~=      ~       @       &<|     <<|     |>>     |&><br>

> 0     No Index        0       567.251 565.720 481.618 561.877 563.452 480.128 478.275 478.518 567.144 572.191 563.255 556.281<br>

> 1     GiST Index      25.249  289.255 293.143 28.838  289.002 291.296 28.336  28.597  26.947  295.394 297.556 293.988 299.760<br>

> 2     X PreSort Index 8.829   440.861 445.630 37.960  439.564 440.535 37.594  37.979  37.741  386.662 384.635 385.166 388.832<br>

> 3     Morton PreSort Index    16.885  421.999 413.427 77.002  422.939 412.130 77.415  75.102  76.056  416.205 446.599 410.614 434.613<br>

> 4     Hilbert PreSort Index   16.824  417.539 415.962 56.583  421.226 414.553 56.320  55.600  55.338  416.639 421.243 418.550 417.094<br>

> The number of shared buffer hits are far bigger than the original one. But what confuses me is that the execution times are worse. I am trying to figure out why this happened. <br>

> What's more, I am not very clear about the relationship between query performance and the number of shared buffer hits.<br>

> <br>

> If you have any questions or suggestions, please let me know.<br>

> <br>

> Best regards,<br>

> Han<br>

> <br>

> On Thu, Jul 8, 2021 at 1:32 AM Giuseppe Broccolo <<a href="mailto:g.broccolo.7@gmail.com" target="_blank">g.broccolo.7@gmail.com</a>> wrote:<br>

> Hi Han,<br>

> <br>

> Il giorno mer 7 lug 2021 alle ore 18:05 Han Wang <<a href="mailto:hanwgeek@gmail.com" target="_blank">hanwgeek@gmail.com</a>> ha scritto:<br>

> Hi Regina and hi Giuseppe,<br>

> <br>

> Thanks for your reply!<br>

> <br>

> I am now checking the original paper and postgres's implement of gist. And now I think the query performance test after building gist index is necessary. Because as Darafei mentioned, the fast sorting building method may just pack the sorted tuples into pages regardless of which hash function it use. The correctness of index may be checked in the runtime query. At present, I am working on confirm the io access of hash functions.<br>

> <br>

> Feel free to give suggestions or questions.<br>

> <br>

> The correctness of the index should be covered by some of the regression tests in action for the GiST support in PostGIS - for instance, kNN searches etc. - for the moment I am really curious about the I/O access of the hash functions. This is something maybe it has not been checked even in PostgreSQL, and maybe it would be good to share the results with them as well.<br>

> <br>

> Keep us updated, and thank you for your help!<br>

> <br>

> Giuseppe.<br>

> _______________________________________________<br>

> postgis-devel mailing list<br>

> <a href="mailto:postgis-devel@lists.osgeo.org" target="_blank">postgis-devel@lists.osgeo.org</a><br>

> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-devel</a><br>

<br>

_______________________________________________<br>

postgis-devel mailing list<br>

<a href="mailto:postgis-devel@lists.osgeo.org" target="_blank">postgis-devel@lists.osgeo.org</a><br>

<a href="https://lists.osgeo.org/mailman/listinfo/postgis-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-devel</a><br>

</blockquote></div>