[postgis-users] Light Weight Geometry (LWGEOM) Proposal

Mark Cave-Ayland m.cave-ayland at webbased.co.uk
Mon Feb 23 08:39:45 PST 2004


Hi Ralph/Dave/Listers,

(lots cut)

> If the bounding box is just an optimization so it doesn't need to be
> calculated for each row during a select where a non geometric index is
> being used, then a float representation should be just as good, so long
> as it's inflated enough that it covers the loss in precision. Seems you
> always need to use a 'real' operator after a bounding box operation
> anyway. Thus cutting the size in half appears to have no unwarranted
> side effects except perhaps a very small amount of false positive
> matches.

OK, it seems like I didn't explain myself clearly enough. I think that
the idea of using floats in the index itself would not present any
problems (like you say, we can inflate the bounding box to handle this
case), but my main concern here is that if someone hands us a dataset
which uses the full accuracy of a double and we import this data into
another database which uses floats via WKB, then we could introduce
rounding errors into the resulting dataset and the user would be none
the wiser.
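
To make both points concrete, here is a minimal Python sketch (not
PostGIS code; every function name in it is made up for illustration). It
shows a double losing digits when squeezed through a 32-bit float, and
one possible way of "inflating" a float bounding box so that it still
covers the original double extent. It assumes finite coordinates that
fit in the float32 range.

```python
import struct


def to_float32(x):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def step_float32(f, direction):
    """Return the float32 neighbour of f one step up (+1) or down (-1)."""
    bits = struct.unpack("<i", struct.pack("<f", f))[0]
    if bits < 0:
        # Negative floats are sign-magnitude; map them onto a monotonic scale.
        bits = -(bits & 0x7FFFFFFF)
    bits += direction
    if bits < 0:
        # Map back to the sign-magnitude bit pattern.
        bits = (-bits) | 0x80000000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float32_at_most(x):
    """Largest float32 value that is <= x (round towards -infinity)."""
    f = to_float32(x)
    return f if f <= x else step_float32(f, -1)


def float32_at_least(x):
    """Smallest float32 value that is >= x (round towards +infinity)."""
    f = to_float32(x)
    return f if f >= x else step_float32(f, +1)


def inflate_bbox(xmin, ymin, xmax, ymax):
    """Float32 box that is guaranteed to contain the double box."""
    return (float32_at_most(xmin), float32_at_most(ymin),
            float32_at_least(xmax), float32_at_least(ymax))


x = 123456.789012345  # hypothetical coordinate using most of a double
print("stored as double:  ", repr(x))
print("after float32 trip:", repr(to_float32(x)))
print("inflated box:      ", inflate_bbox(x, x, x, x))
```

Rounding the minimums down and the maximums up is harmless for an index,
because the worst case is an extra false positive. Applying the plain
`to_float32` step to the coordinates themselves is the silent data
change I am worried about.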

The ideas that everyone on the list has brought up are definitely worth
pursuing; it's just that perhaps I look at it from a different
perspective :). The things I would consider to be important are:

i)   we can still transfer dumps across different architectures
ii)  we can warn the user if the accuracy of the data has been
     compromised
iii) we don't have to maintain multiple geometric functions per type
iv)  we can convert between different types (i.e. float->double,
     double->float, double->int32, bounding box->no bounding box etc.)
     by using an UPDATE statement, i.e. it wouldn't require recompiling
     PostGIS from source again.


I would be interested to see how much reducing the size of the geometry
type improves data access time on a large dataset - perhaps someone
could throw together a quick hack using the smaller WKB geometry to give
an idea of the level of improvement. The feedback from this would tell
us what gains we could realistically expect.

> As for the 'dump' it is problematic, the WKB type field is 32 bits, so
> perhaps if the user has specified a column that isn't compliant a non
> compliant WKB (or not so well known binary) is dumped with some
> 'extended' types.
>
> The main idea here is, if a float or an int32 is high enough
> resolution for the application, why force a dataset that is twice as
> large to be 'compliant'. The user knows best what they want to do, and
> how they want to do it. Using a couple of int32's I can specify any
> point on the planet down to a 10cm square area, what do the extra 8
> bytes give me if I don't need them? I personally am looking at a
> dataset of about 14gb in shapefiles. A little loss of resolution for
> speed is an option I would like to have, even if I'm non compliant. I
> can also think of other examples like cad applications - integer data
> is probably preferable.

Remember there are other ways of doing this. So far we are looking to
improve access to our data by breaking it down into tiles (smaller areas
= fewer tuples visited + smaller indexes). The plan is to store
information about each tile as a row in a tile table, including
information about its extents. When we do a spatial query, what we hope
to do is calculate a spatial intersection of the query extent against
the tile table extents using the && operator, which should then return a
row for each tile we need to query. We then use this to create a SELECT
that does a UNION of the spatial queries for all tiles to return the
combined data set to feed into mapserver (effectively this is like
having tablespaces based on extents the hard way until it gets added to
PostgreSQL).
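
As a rough illustration of the tiling idea, the Python sketch below does
the extent test on the client side and builds the combined SELECT. It is
only a sketch under assumptions: the tile table is represented by a
plain dict, the tile table names (tile_a etc.), the the_geom column and
all the function names are hypothetical, and the BOX3D literal may need
adjusting for your PostGIS version. It also uses UNION ALL rather than a
plain UNION, on the assumption that no feature is stored in more than
one tile, so no duplicate removal is needed.

```python
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Extent, b: Extent) -> bool:
    """Bounding-box overlap test, the same idea as the && operator."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def tiles_for(query: Extent, tiles: Dict[str, Extent]) -> List[str]:
    """Names of the tile tables whose extents overlap the query extent."""
    return [name for name, ext in tiles.items() if overlaps(query, ext)]


def build_union_query(query: Extent, tiles: Dict[str, Extent]) -> str:
    """Build one SELECT per matching tile and glue them with UNION ALL.

    Table names are interpolated directly, so they must come from a
    trusted tile table and never from user input.
    """
    box = "'BOX3D({0} {1}, {2} {3})'::box3d".format(*query)
    parts = [
        "SELECT the_geom FROM {0} WHERE the_geom && {1}".format(name, box)
        for name in tiles_for(query, tiles)
    ]
    return "\nUNION ALL\n".join(parts)


# Hypothetical tile table: one row (here, one dict entry) per tile.
example_tiles = {
    "tile_a": Extent(0, 0, 10, 10),
    "tile_b": Extent(10, 0, 20, 10),
    "tile_c": Extent(0, 10, 10, 20),
}
print(build_union_query(Extent(8, 2, 12, 5), example_tiles))
```

In a real setup the overlap test would be done in the database against
the tile table itself; the point here is only the shape of the
generated query.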

Ralph/Dave, you might like to try an approach like this on the large
datasets you are working on to see if it improves things for you.

> It seems that WKT would be an easy go-between for non compliant types.
> Then to get a 'conforming' table you can create a view that takes it
> between a WKT and then to a conformant WKB with a function that can
> create WKB from WKT and a set of flags saying the desired extended
> format.

Also remember that it may be possible to introduce errors into the data
by converting from WKB->WKT->WKB repeatedly.
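
A small Python sketch of how this can happen, assuming the text side
prints coordinates with a fixed number of significant digits (the two
helper functions are made up for illustration and are not PostGIS's own
WKT code). With 15 digits the double does not survive the trip through
text; with 17 digits it does.

```python
def to_wkt_point(x, y, digits):
    """Format a point as WKT using a fixed number of significant digits."""
    return "POINT(%.*g %.*g)" % (digits, x, digits, y)


def from_wkt_point(wkt):
    """Parse the coordinates back out of a 'POINT(x y)' string."""
    body = wkt[wkt.index("(") + 1:wkt.rindex(")")]
    x, y = body.split()
    return float(x), float(y)


# Two doubles that need more than 15 significant digits to be exact.
x = 0.1 + 0.2
y = 1.0 / 3.0

for digits in (15, 17):
    wkt = to_wkt_point(x, y, digits)
    back = from_wkt_point(wkt)
    print(digits, wkt, "exact" if back == (x, y) else "changed")
```

So whether the round trip is safe depends entirely on how many digits
the WKT writer emits, which is something the user normally never sees.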


Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446






