[SoC] GSoC 2021 - Week 4 Report - Implement pre-sorting methods before GiST index building

Han Wang hanwgeek at gmail.com
Sun Jul 4 09:12:33 PDT 2021


Hi all,
Here is my Week 4 report. You can also find it at [1]:
Coding Week 4 (28th June - 4th July)
<https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding#CodingWeek428thJune-4thJuly>

*Coding Phase*:

   - Finish hash function
   - Create the FlameGraph for CPU time analysis
   - Keep bonding with the community

*Plans for next week*:

   - Check the performance of the hash function in detail
   - Prepare for evaluation of performance and logic


In summary, with an `O0` build the `Hilbert hash function` takes about 5% of
the total `gist build` time, and the different hash functions appear to have
similar time consumption. It is therefore necessary to use a flame graph to
check the CPU time of both the hash function and the index building that
follows it.
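
For readers unfamiliar with the idea, here is a minimal sketch of what a
Hilbert hash computes. It is the textbook mapping from a 2D grid cell to its
position along a Hilbert curve, not the code in my branch; the function name
`hilbert_index` and the grid size are assumptions made for illustration only.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its
    distance along the Hilbert curve. Textbook algorithm, illustrative only."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sorting cells by this key places spatially close cells near each other,
# which is the property a pre-sort before index building relies on.
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_index(4, c[0], c[1]))
```

In `ordered`, every pair of consecutive cells is a direct grid neighbour,
which is why sorting by such a key keeps nearby geometries together.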

Moreover, as Darafei said:

>  - IO access ("buffers") in explain. Since you're working on changing the
> index creation algorithm the interesting thing will be to minimize the
> number of buffers accessed on the queries after the index is created. This
> is available in the EXPLAIN output and can be checked in the test suite. To
> get a quick idea of how much it will take in realistic amazon deployment,
> assume each 1000 buffers will take 1 second on a fully saturated system
> (your laptop is likely much faster).

It is important to run `EXPLAIN` on queries after the index is built and to
check the number of buffers accessed, which indicates how I/O-bound queries
using the index are.
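
As a rough illustration of Darafei's rule of thumb, the sketch below reads the
shared buffer counts from the text output of `EXPLAIN (ANALYZE, BUFFERS)` and
converts them to an estimated time at 1000 buffers per second. The sample plan
text, the index and table names, and the helper names are all hypothetical;
only the first `Buffers:` line is used, on the assumption that the top plan
node's counts already include those of its children.

```python
import re

# Rule of thumb quoted above: ~1000 buffers per second on a saturated system.
BUFFERS_PER_SECOND = 1000

# Captures the "hit=... read=..." part that follows "Buffers: shared".
SHARED_RE = re.compile(r"Buffers: shared((?: \w+=\d+)+)")


def top_level_buffers(explain_text):
    """Return shared buffers accessed (hit + read) from the first Buffers line."""
    match = SHARED_RE.search(explain_text)
    if match is None:
        return 0
    counts = dict(pair.split("=") for pair in match.group(1).split())
    return int(counts.get("hit", 0)) + int(counts.get("read", 0))


def estimated_seconds(buffers):
    """Convert a buffer count to seconds using the quoted rule of thumb."""
    return buffers / BUFFERS_PER_SECOND


# Hypothetical EXPLAIN output, made up for this example.
SAMPLE = """\
Index Scan using roads_geom_idx on roads  (actual time=0.050..1.200 rows=42 loops=1)
  Index Cond: (geom && '...'::geometry)
  Buffers: shared hit=12 read=238
Planning Time: 0.110 ms
Execution Time: 1.300 ms
"""

if __name__ == "__main__":
    n = top_level_buffers(SAMPLE)
    print(n, "buffers, about", estimated_seconds(n), "s on a saturated system")
```

Comparing this number for the same query across differently built indexes
gives a machine-independent way to compare them.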

If you have any questions or suggestions, please let me know. You can also
reach me on Matrix.

Best regards,
Han