[postgis-devel] PointM(x,y,t) and gist_geometry_ops_nd

Tue May 17 00:46:01 PDT 2022

Dear all

Regarding spatio-temporal indexing, seminal papers are the recurrent
surveys from Walid G. Aref

https://www.cs.purdue.edu/homes/aref/RecentPapers.html

Ahmed R. Mahmood, Sri Punni, Walid G. Aref:
Spatio-temporal access methods: a survey (2010 - 2017). GeoInformatica
23(1): 1-36 (2019)

Long-Van Nguyen-Dinh, Walid G. Aref, Mohamed F. Mokbel:
Spatio-Temporal Access Methods: Part 2 (2003 - 2010). IEEE Data Eng. Bull.
33(2): 46-55 (2010)

Mohamed F. Mokbel, Thanaa M. Ghanem, Walid G. Aref:
Spatio-Temporal Access Methods. IEEE Data Eng. Bull. 26(2): 40-49 (2003)

A great introduction is Chapter 7 in

Ralf Hartmut Güting, Markus Schneider
Moving Objects Databases, Morgan Kaufmann, 2005

https://www.amazon.com/Objects-Databases-Kaufmann-Management-Systems/dp/0120887991

Regards

Esteban

On Mon, May 16, 2022 at 5:54 PM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

> Thank you Darafei for sharing your experience. I'm glad this is not a new
> and exciting problem, but rather a well experienced one.
>
> So, I did try out using Mercator instead of geographics to give the
> spatial dimension some more weight, and it resulted in a better performing
> index (about 700ms vs 1000ms before for the same query) but still way less
> good than using a multi-key on gist.
>
> I also timed out just using a timestamp index and a spatial index
> separately, and letting the system bitmap them together, and: surprise!
> this ended up being the actual fastest option of them all. So "do nothing
> special and just add single indexes on the columns of interest" won the
> whole race.
>
> This feels a little counter-intuitive, but maybe not. Maybe the gains
> available from an explicit spatial-temporal approach have quite a high
> floor already.
>
> Anyways, I was thinking about what such an explicit approach might look
> like.
>
> * put the XYT into a fixed-length key to get rid of the expense of the
> varlena key
> * compute the penalty of cost proportionally instead of absolutely... so a
> given new key will expand an existing node by N% in the time dimensin and
> M% in the spatial dimensions. That stills leaves a magic number to balance
> time and space, but it at least doesn't just wash out due to something
> arbitrary like the input unit size
>
> That still leaves splits to be calculated but maybe there's a similar
> proportional approach that can be applied.
>
> Anyone know any good papers on spatio-temporal indexing?
>
> P
>
> > On May 12, 2022, at 1:45 PM, Дорофей Пролесковский <me at komzpa.net>
> wrote:
> >
> > Hi,
> >
> > Thanks for the summon. I think the question is much better analyzed in
> the MobilityDB working group, so Esteban please feel free to add the
> findings you have to the discussion :)
> >
> > I would like to see the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) from your
> queries so that we're sure that there is indeed an over-scanning somewhere
> and not a runaway Bitmap scan.
> >
> > Indeed the first thing is that the penalty will be calculated in a
> non-proportional way. What you get is essentially a 2D index that
> occasionally gets indexed internally by Z too. In the past designs I've
> mitigated this by using SRID=3857 which is nice for that kind of data,
> since it looks good and also assumes things travel 1 meter per second which
> is not far from the truth.
> >
> > Second thing is our GiST rework adventure's finding by Aliaskandr
> Kalenik on how Postgres handles multi-column gist indexes page split: after
> a split on first column it checks if the resulting pages have tuples that
> can have penalty=0 when inserted into either page and re-runs the split for
> them based on second column. I suspect this will be happening a lot in AIS
> data. Nothing similar happens to dimensions in 2D or ND gist implementation
> and may be a way for improvement.
> >
> > Third one that comes to mind is that our ND implementation is not
> getting as much love as our 2D gist implementation: picksplit there is
> still not updated to Alexander Korotkov's method from 2012, there is no
> universal strategy for sorting build since it's shared between XYZ and
> lon-lat-ele projections, and it is generally a varlen thing with a lot of
> conditional loops on dimensions (which are still up to 4). For fair
> comparison, can you create (geom_nd, T) index on it with geometry being 3D
> with M forced to 0 and check what is the pure difference between 2D and ND
> ops?
> >
> > Fourth thing is that people with that kind of data don't want
> point-based indexing of movement tracks, if someone disappears for 10
> minutes I still want to see where they were before and after that if I'm
> doing a slice. You want short tracks that stitch together. MobilityDB has
> that abstraction, and on that kind of data picksplit will work better and
> gist tree will be nicer, since you now can't just split at arbitrary Z/M
> and get nice separated buckets of data on first try.
> >
> > Also in real data there will be an ID of vehicle and that usually will
> also be in the indexing partition.
> >
> > Hope this helps.
> >
> > Darafei.
> >
> > чт, 12 мая 2022 г. в 22:54, Paul Ramsey <pramsey at cleverelephant.ca>:
> > All (but especially Darafei),
> >
> > I grabbed some AIS data (7M ship movements) to do up a blog post on
> indexing spatio-temporal data and some of the quirks therein.
> >
> > The data were long/lat/timestamp, as one would expect. I built a simple
> 2D index and did some queries that included an ST_Intersects() and a time
> range, where the filter was such that the result set was quite selective (a
> few thousand points) and it ran (predictably) slowly, as it had to choose
> either a temporal or a spatial index, and then plow through the remaining
> results in brute force.
> >
> > The naive approach took about 4s.
> >
> > Then I created a new table with XYT data, by doing PointM(long, lat,
> extract(epoch from ts)). Then I built an nd index on that table.
> >
> > Then I built a XYT query box that was the same as my original filter.
> And I ran it on the new table.
> >
> > It was faster than the naive approach, but only by about 4x. It took
> about 1s.
> >
> > Then I added the btree_gist extension and built a multi-key index on
> timestamp and geometry using the 2d ops. Running the same query as the
> naive approach, I got a 40ms return time. So, definitely this is the
> winning approach.
> >
> > However, I am left with the nd XYT approach  being really slow, and
> wanting to mentally model why.
> >
> > Here's my guess:
> >
> > In terms of data range, you have longitude, which in my data is no more
> than about 10 units wide. And similar for latitude. And then I have the
> time range, which is on a 1 second basis for a day so 86400 units.
> >
> > The nd penalty function looks at how the bounds volume and "edge"
> metrics are affected by adding new points. Given the huge difference in
> dimensional size, no matter what data is being added, the T dimension is
> always going to dominate. So the index will be made up of splits on T, over
> and over and over, until the variance in XY finally starts to get closer to
> the variance in T (when we're talking about time slices of a second or so).
> >
> > So the performance of queries on this index is going to be pretty bad
> for time ranges of more than a couple seconds, since the index will provide
> no spatial filtering at all on larger time ranges.
> >
> > Basically the moral of the story is "you cannot expect an ND index on
> non-spatial dimensions mixed with spatial dimensions to work unless you
> massage your dimensions to have proportions that match your expected query
> pattern".
> >
> > That's my story.
> >
> > Does that sound right to you?
> >
> > P
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20220517/43a3d094/attachment.htm>