[fdo-internals] FDO RFC 16 - FDO Provider for SQLite

Traian Stanev traian.stanev at autodesk.com
Wed Mar 26 09:54:24 EDT 2008


Yes, in a use case where the file is opened and closed a lot, without the ability to keep any cache context in memory, a spatial index which is also in a file (but not necessarily the same file) would be better. This is not the case, at least with the FDO client apps that I know of, so for now there is no need to serialize the spatial index. It would be easy to do so if there is such a need in the future.

Traian


From: fdo-internals-bounces at lists.osgeo.org [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Kenneth, GEOGRAF A/S
Sent: Wednesday, March 26, 2008 7:25 AM
To: FDO Internals Mail List
Subject: Re: [fdo-internals] FDO RFC 16 - FDO Provider for SQLite

I see.

I did not know about the column count/index problem for SQLite.
I can see that in a purely MapGuide context, there would be great benefit from inital parsing of all geometry.
I was thinking in a more general context, where the database would open/close regularly.
In such a context, creating a search tree from scratch seems like a big deal.

Thank you for clearing that up.



Regards, Kenneth, GEOGRAF A/S


Traian Stanev skrev:
>>I can't figure out how to extract/compute the BBOX from an SQLite BLOB, without parsing the geometry.

You are right, there is no other way to get the bounding box.

>>I belive that it is faster to read the BBOX values rather than a potentially large BLOB field.

This is debatable. One thing about SQLite is that its read performance is *VERY* sensitive to how many columns there are in the table. If the BBOX columns are the last ones in a 50 column table, and the geometry is the first column, you can probably compute the bounds faster than you can offset all the way to the end of the row. On the other hand, if you put the BBOX columns first, you handicap access to all other columns by 4 columns. Also, in a huge segment of GIS data (namely point data), computing the bounds from the BLOB would be faster than getting it from the BBOX columns.

>>I agree that it would use some space to add the columns, but adding the columns would enable a fast filtering of unused features.
>>As for the 4 times index, one could merely add index for the x values, and then use sequential filtering for the y values.

Yes, such a scheme is possible and you would reduce the spatial search to two B-Tree searches. It also means you end up with a linear worst case search, if your search bounds is really wide but not very tall for example. This sounds like a lousy deal for adding ~40 bytes' worth of stuff to every row, not counting the index tables, of which you need two (one for minx and one for maxx). Index tables would cost something like 10 bytes per row at least (I am not sure exactly how SQLite implements indexes, but you have to figure it has at least the double value in there, plus a row offset for it).


>> Is your internal spatial index algorithm absolutely without benefit from added BBOX columns?

It computes bounding boxes and the spatial index on the fly, during the first table scan after the FDO connection is open. This will play well with FDO clients like MapGuide which keep connections pooled. In my benchmarks, computing bounding boxes has not been a big overhead. The cost of the spatial index in its prototype form, per feature, is roughly 20 bytes in RAM.

So to sum up, you can add BBOX columns to your SQLite databases if it helps you with spatial searches. The provider does not need them, so it will not require them. It will also not complain if they are there - it would treat them as regular columns.


Traian







From: fdo-internals-bounces at lists.osgeo.org<mailto:fdo-internals-bounces at lists.osgeo.org> [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Kenneth, GEOGRAF A/S
Sent: Tuesday, March 25, 2008 4:54 PM
To: FDO Internals Mail List
Subject: Re: [fdo-internals] FDO RFC 16 - FDO Provider for SQLite

I can't figure out how to extract/compute the BBOX from an SQLite BLOB, without parsing the geometry.
I belive that it is faster to read the BBOX values rather than a potentially large BLOB field.

I agree that it would use some space to add the columns, but adding the columns would enable a fast filtering of unused features.
As for the 4 times index, one could merely add index for the x values, and then use sequential filtering for the y values.

Is your internal spatial index algorithm absolutely without benefit from added BBOX columns?




Regards, Kenneth, GEOGRAF A/S


Traian Stanev skrev:

It's a waste of space to store BBOX. It would cost at least 32 bytes per feature, not counting the indexes.

Also, you can easily compute the bounding box and add those 4 columns very quickly when you need them, given that SQLite is pretty fast to read from. Indexing on the 4 BBOX columns would make spatial queries have to compute the intersection of the results of 4 tree searches. SQLite is fast, but it's not *that* fast.

Anyway, the nice thing about this format is that you can create the BBOX columns anyway, and use them. The FDO provider will not be using it though, for performance reasons.


Traian



From: fdo-internals-bounces at lists.osgeo.org<mailto:fdo-internals-bounces at lists.osgeo.org> [mailto:fdo-internals-bounces at lists.osgeo.org] On Behalf Of Kenneth, GEOGRAF A/S
Sent: Tuesday, March 25, 2008 7:12 AM
To: FDO Internals Mail List
Subject: Re: [fdo-internals] FDO RFC 16 - FDO Provider for SQLite

I just re-read the RFC.

What are your reasons for not storing BBOX releated data in the data table?


I would suggest adding four columns, like minx, maxx, miny, max (perhaps name them via. the geometry_columns table?).
With that approach, and index on the pairs would move the BBOX query load/optimization into the DBMS.
BBOX queries are simple, and does not require reading the actual column data.

Your extended spatial index algorithm might read these values to produce the search tree.

Another suggestion, would be to store the search tree in a blob column.
That would enable a much faster load of data.

To avoid the requirement that all data updaters know the algorithm, it might be legal to just clear the BLOB data on updates.

I wrote a short paper on a similar mechanism some time ago, with a friend:
http://www.hexad.dk/opensource/spatialdbms.pdf






Regards, Kenneth, GEOGRAF A/S


Traian Stanev skrev:
Hello,

I just posted FDO RFC #16. Fresh off the virtual presses (printed on 100% recycled electrons, for you green-minded folks):

http://trac.osgeo.org/fdo/wiki/FDORfc16


Thanks,
Traian










________________________________


















_______________________________________________

fdo-internals mailing list

fdo-internals at lists.osgeo.org<mailto:fdo-internals at lists.osgeo.org>

http://lists.osgeo.org/mailman/listinfo/fdo-internals










________________________________












_______________________________________________

fdo-internals mailing list

fdo-internals at lists.osgeo.org<mailto:fdo-internals at lists.osgeo.org>

http://lists.osgeo.org/mailman/listinfo/fdo-internals








________________________________






_______________________________________________

fdo-internals mailing list

fdo-internals at lists.osgeo.org<mailto:fdo-internals at lists.osgeo.org>

http://lists.osgeo.org/mailman/listinfo/fdo-internals


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osgeo.org/pipermail/fdo-internals/attachments/20080326/4989fc38/attachment-0001.html


More information about the fdo-internals mailing list