[postgis-devel] Issues about the GSoC project
Han Wang
hanwgeek at gmail.com
Tue Jul 27 09:01:41 PDT 2021
Hi Paul,
Thanks for your email. Sorry for not being clear.
I have made a draft here in a document(
https://docs.google.com/document/d/1m4oxBAsKCyjAnYmkCmQ0X_ltiid5tliFwF3rtdzlKsc/edit?usp=sharing
)
I used `EXPLAIN(ANALYSE, VERBOSE, BUFFER) to check the test details but I
don't know why `no index` method only hits 18 shared buffers while others,
using `Index Scan`, hit far more than that.
And next I will repeat the test to reach a stable status to get a confident
result.
Best regards,
Han
On Tue, Jul 27, 2021 at 2:55 AM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:
> The column names in the table are a little confusing :)
>
> I don't have any thing I can think that would explain the execution time
> difference, so you'll probably need to gather more evidence. Comparing the
> call counts (gproff? callgrind?) might give a clue where the extra work is
> going in the sorted indexes. Actually, if your sort function was in fact
> "desorting" the inputs, that could explain both the extra shared buffer
> hits and the performance... like, if your whole index floats into memory
> then you'd expect lots of buffer hits, and the more work on the index, the
> more hits. So if you had a terrible index you'd get both lots of buffer
> hits and a lot of time spent.
>
> Basically, it shouldn't take a lot of extra steps on a sorted index than
> on a normal gist index, so if there *are* a lot of extra index accesses...
> something is very awry in the sorting.
>
> P.
>
> > On Jul 25, 2021, at 10:21 PM, Han Wang <hanwgeek at gmail.com> wrote:
> >
> > Hi Giuseppe and Hi Regina,
> >
> > After checking the paper of GiST and implementation in Postgres. I think
> the query performance should be considered besides the building process. In
> the larger data test scenario, the building time of different indexes are
> similar because Postgres just hashes the tuples and sorts them and packs
> them into pages, building the tree index from bottom to up. With a bad hash
> order definition, the building process cannot detect the poor index query
> performance. So it is necessary to test the index query performance. I have
> tested the query performance with the `EXPLAIN` operator, using the sql
> scripts like other indexes in the `/regress`. But I am not familiar with
> PL/pgSQL, so I handle the log with some python scripts.
> >
> > In this test, I focus on the buffer hits and execution time of different
> tasks of different index types including `No Index`, `Simple GiST index`,
> `X hash function`, `morton hash function` and `hilbert hash function`. And
> there are some results:
> > Shared buffer hits:
> > Index Create Time(ms) << &< && &> >>
> ~= ~ @ &<| <<| |>> |&>
> > 0 No Index 0 18 18 18 18 18
> 18 18 18 18 18 18 18
> > 1 GiST Index 25.249 40237 46085 3009 40297 46025
> 3009 3009 3009 41994 45295 41934 45355
> > 2 X PreSort Index 8.829 443620 441235 3009 444501 440354
> 3009 3009 3009 441568 442503 440687 443384
> > 3 Morton PreSort Index 16.885 447779 447446 4079 448669
> 446556 4079 4079 4079 445362 449428 444472 450318
> > 4 Hilbert PreSort Index 16.824 446714 444058 3558 447600
> 443172 3558 3558 3558 446072 445394 445186 446280
> > Execution time:
> >
> > Index Create Time(ms) << &< && &> >> ~= ~
> @ &<| <<| |>> |&>
> > 0 No Index 0 567.251 565.720 481.618 561.877 563.452
> 480.128 478.275 478.518 567.144 572.191 563.255 556.281
> > 1 GiST Index 25.249 289.255 293.143 28.838 289.002 291.296
> 28.336 28.597 26.947 295.394 297.556 293.988 299.760
> > 2 X PreSort Index 8.829 440.861 445.630 37.960 439.564 440.535
> 37.594 37.979 37.741 386.662 384.635 385.166 388.832
> > 3 Morton PreSort Index 16.885 421.999 413.427 77.002 422.939
> 412.130 77.415 75.102 76.056 416.205 446.599 410.614 434.613
> > 4 Hilbert PreSort Index 16.824 417.539 415.962 56.583 421.226
> 414.553 56.320 55.600 55.338 416.639 421.243 418.550 417.094
> > The number of shared buffer hits are far bigger than the original one.
> But what confuses me is that the execution times are worse. I am trying to
> figure out why this happened.
> > What's more, I am not very clear about the relationship between query
> performance and the number of shared buffer hits.
> >
> > If you have any questions or suggestions, please let me know.
> >
> > Best regards,
> > Han
> >
> > On Thu, Jul 8, 2021 at 1:32 AM Giuseppe Broccolo <g.broccolo.7 at gmail.com>
> wrote:
> > Hi Han,
> >
> > Il giorno mer 7 lug 2021 alle ore 18:05 Han Wang <hanwgeek at gmail.com>
> ha scritto:
> > Hi Regina and hi Giuseppe,
> >
> > Thanks for your reply!
> >
> > I am now checking the original paper and postgres's implement of gist.
> And now I think the query performance test after building gist index is
> necessary. Because as Darafei mentioned, the fast sorting building method
> may just pack the sorted tuples into pages regardless of which hash
> function it use. The correctness of index may be checked in the runtime
> query. At present, I am working on confirm the io access of hash functions.
> >
> > Feel free to give suggestions or questions.
> >
> > The correctness of the index should be covered by some of the regression
> tests in action for the GiST support in PostGIS - for instance, kNN
> searches etc. - for the moment I am really curious about the I/O access of
> the hash functions. This is something maybe it has not been checked even in
> PostgreSQL, and maybe it would be good to share the results with them as
> well.
> >
> > Keep us updated, and thank you for your help!
> >
> > Giuseppe.
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20210728/d7b82abc/attachment.html>
More information about the postgis-devel
mailing list