[SoC] [postgis-devel] Introduction and PostGIS-GSoC Project Information
Han Wang
hanwgeek at gmail.com
Tue May 25 06:31:52 PDT 2021
Hi Imre,
Thank you for your reply, and that is a great question!
As I mentioned in the proposal, the priority R-tree has been proven to have
theoretically optimal *query* performance, but it may be too complex and
too slow to build. The current project mainly focuses on the *building*
process of a spatial index. From previous experience, z-order pre-sorting
may be a reasonable trade-off between *building* and *query* performance.
In fact, GeoHash, as you mentioned, is essentially equivalent to the
z-order method [1].
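To make the z-order idea concrete, here is a minimal sketch in Python. It is only an illustration, not the PostGIS implementation: the function names, the 16-bit grid resolution and the sample points are my own assumptions. Each coordinate is quantized onto a grid, and the bits of the two grid indices are interleaved with the longitude bit in the more significant position of each pair, which is the same bit order GeoHash uses.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bit i of x goes to position 2*i + 1 and bit i of y to position 2*i,
    so x is the more significant member of every bit pair.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize a lon/lat pair onto a 2^bits x 2^bits grid and return
    the Morton code of the resulting cell (hypothetical helper)."""
    cells = 1 << bits
    max_cell = cells - 1
    # Clamp so that lon = 180 or lat = 90 still falls in the last cell.
    gx = min(max_cell, int((lon + 180.0) / 360.0 * cells))
    gy = min(max_cell, int((lat + 90.0) / 180.0 * cells))
    return interleave_bits(gx, gy, bits)


# Illustrative sample points as (lon, lat); not taken from any dataset.
points = [(116.31, 39.99), (-0.13, 51.51), (116.40, 39.90), (2.35, 48.86)]
points_sorted = sorted(points, key=lambda p: z_order_key(*p))
```

Sorting by this key tends to place points that are close in space close together in the sorted sequence, which is the property a pre-sorted index build relies on.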
From my perspective, the dataset should be divided into several subgroups
before being put into the index, which is called *partitioning* in this
paper [2], for scalability: a machine's memory cannot hold that much data
at once. The data in each subset is then sorted in a pre-defined order such
as z-order. The paper mentions many algorithms for deciding where to cut
the dataset into subsets, but at present I have chosen to implement a
trivial one that cuts the dataset evenly into small subsets, because the
performance of the different methods depends heavily on the size of the
dataset and the memory of the machine, which may need fine tuning of the
kind many HPC engineers do.
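A rough Python sketch of this two-step scheme is below, under my own assumptions: the chunk size, the function names and the integer grid cells are placeholders, and the "trivial" partitioning is simply cutting the input sequence into consecutive chunks of equal size. Each chunk is then sorted independently by a z-order key, so only one chunk needs to be in memory at a time.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Cell = Tuple[int, int]  # integer grid cell (x, y); an assumption for this sketch


def morton_key(cell: Cell, bits: int = 8) -> int:
    """Z-order key of a grid cell: interleave the bits of x and y."""
    x, y = cell
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def partition_evenly(items: Sequence[Cell], chunk_size: int) -> Iterator[List[Cell]]:
    """Cut the input into consecutive chunks of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def presort_partitions(
    items: Sequence[Cell],
    chunk_size: int,
    key: Callable[[Cell], int] = morton_key,
) -> List[List[Cell]]:
    """Partition evenly, then sort each partition by the given key."""
    return [sorted(chunk, key=key) for chunk in partition_evenly(items, chunk_size)]


# Illustrative input: six grid cells split into two chunks of three.
cells = [(3, 3), (0, 0), (2, 1), (1, 2), (0, 1), (3, 0)]
sorted_chunks = presort_partitions(cells, chunk_size=3)
```

Smarter partitioning strategies, such as those discussed in the paper, would replace `partition_evenly` while leaving the per-chunk sort unchanged.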
To conclude, the whole project is a research- and test-based program. There
are two major problems: *subset partitioning* and *internal pre-sorting*.
First I want to implement a z-order pre-sorting method and test it on a
small random dataset and on some real data from OSM that my mentors
recommended. Then I will study subset partitioning.
I am very happy to receive your suggestions! I hope this answers
your question. Feel free to ask me more or give me further suggestions! I am
looking forward to hearing from you!
Best regards,
Han
[1] https://en.wikipedia.org/wiki/Geohash
[2] http://www.cse.cuhk.edu.hk/~taoyf/paper/vldb18-sfc.pdf
On Tue, May 25, 2021 at 3:14 PM Imre Samu <pella.samu at gmail.com> wrote:
> Hi Han,
>
> Thank you for working on this topic! :)
>
> > Feel free to give me suggestions or ask me anything! I
>
> my question:
> - in the research paper (2018, mentioned in your proposal),
> "partitioning" is mentioned multiple times.
> Do you have any plan for adding spatial partitioning to the test
> cases?
>
> comment:
>
> As far as I know, in the OpenStreetMap world the current
> clustering/"pre-sorting" method is GeoHash-based sorting:
> https://www.paulnorman.ca/blog/2016/06/improving-speed-with-reclustering/
>
> https://www.paulnorman.ca/blog/2016/05/improve-your-st-geohash-sorting-with-these-three-simple-tricks/
> Maybe you can borrow some ideas for your benchmark. :)
>
> Thanks,
> Imre
>
>
> Han Wang <hanwgeek at gmail.com> wrote (on Tue, May 25, 2021, 4:19):
>
>> Hi all,
>>
>> My name is Han WANG. I am a first-year graduate student majoring in GIS
>> at Peking University, and will get my Master's degree in 2023. This is
>> my GitHub (https://github.com/HanwGeek) and my LinkedIn (
>> https://www.linkedin.com/in/hanwgeek/). I am interested in all
>> cool things, and it is very exciting to join the open source community! My
>> research interests include massive spatio-temporal data management and
>> analysis. Currently, I am working on a machine learning project based on
>> big trajectory data, which is stored in a PostgreSQL database and managed
>> by PostGIS.
>>
>> My project is to *implement a pre-sorting method for PostGIS data
>> types before building a GiST index*. Previous research on building GiST
>> indexes in PostgreSQL has shown that pre-sorting the data reduces the
>> time needed to build an index, and this new feature will be added in
>> PostgreSQL 14. So it is necessary to apply it to some basic geometry
>> data types. My initial proposal is here (
>> https://docs.google.com/document/d/1_mY_F2hPDk3vmXH5PPp2z9BuQWt-ZMORk6KxtdVQ3HY/edit?usp=sharing
>> ).
>>
>> I am excited to make a contribution to the open source community. Feel
>> free to give me suggestions or ask me anything! I am looking forward to
>> hearing from you all!
>>
>> Best regards,
>> Han
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>