[postgis-users] Re: MySQL 4.1 GIS Functionality

Tue Aug 5 10:17:33 PDT 2003

In order to retain context, I'll not edit out any of the document, 
which means the nesting is all now +1. :)

On Tuesday, August 5, 2003, at 09:49 AM, Paul DuBois wrote:

> Sorry for this delayed response to a message that was posted some
> weeks ago. I only recently became aware of the original message,
> because I subscribed after June 19, 2003 when it was posted.
>
> Thanks to Paul Ramsey (and Dave Blasby?) for taking a look at

The both of us. Dave did the lion's share of the write up.

> MySQL's GIS support and giving it a test. I have a couple of pieces
> of information in reply from Alexey Botchkov, one of the MySQL AB
> developers.  Replies to him can be sent through me, either at
> paul at kitebird.com or paul at mysql.com.  Or you can write to Alexey
> directly at hf at mysql.com, but please also cc: me so that I can
> follow the discussion.  Thanks.
>
> The doubly quoted sections below are by Paul Ramsey, the singly quoted
> sections are by Alexey. I have also added a few comments of my own.
>
>>>  * Spatial data (geometry) only works with the MyISAM table type.
>>
>> We need a kind of RTree indexes implemented in InnoDB to make spatial
>> extentions working with transactional bases.
>
> To confirm: Spatial column types indeed currently are supported only
> for MyISAM tables.
>
>>
>>>  * Trying to create create an InnoDB table with a geometry column, 
>>> will
>>>  crash the server and muddle the system tables.
>>
>> Yes, we should prevent creating spatial fields in InnoDB now.
>
> This means that you now will get an error if you try to create a
> table other than a MyISAM table that includes spatial columns.
> For example:
>
> mysql> CREATE TABLE t (g GEOMETRY) TYPE = INNODB;
> ERROR 1178 (42000): The storage engine for the table doesn't support 
> GEOMETRY
>
> Both Alexey and I were surprised that Paul experienced a server crash
> and muddled system tables. (By "system tables", I assume what was meant
> was the grant tables in the mysql database?  If not, can you clarify?)
> Alexey suspects this was a problem in an earlier version that has been
> corrected.

Dave can perhaps clarify. It has been a couple months. As I recall, the 
system thought we had a table which did not in fact exist (because the 
creation had failed).

> By the way, what was the test system (or systems)?

Linux 2.4 (Mandrake 8) on a dual AMD 2000+ w/ 2Gb RAM.

>
>>
>>>  * The MySQL spatial index is not null-safe. Marking the geometry 
>>> column
>>>  as "NOT NULL" is required in order to build a MySQL spatial index.  
>>> The
>>>  PostGIS spatial index is null safe.
>>
>> We're working on this.
>
> To confirm: Spatial columns included in an index must indeed currently
> be declared NOT NULL.
>
>>
>>>  * We did three test queries to compare the retrieval times on the
>>>  spatial index: first with a very small bounding box (enclosing 6
>>>  records), then with a medium box (70K records), then with a huge box
>>>  (all 200K records).
>>
>> When we tried PostGIS against MySQL we got almost equal productivity 
>> on
>> selects.
>> Can i get data you used for these tests?

I'll have to check. I fear that the data we used was proprietary to one 
of our clients (but convenient to us). We can create a new similar test 
set from a non-proprietary data source.

After publishing the note, we had some feedback from people to the 
effect that we were probably seeing cache effects between the two 
systems.  The PostGIS index is smaller (alot smaller) so it was cached 
in memory (my correspondent pointed out that our 2ms result was faster 
than an expected disk seek time).

So I repeated the (small area select) test in two ways. First I forced 
both systems to read off the disk (by cold starting both systems and 
feeding them the initial query right away). And they both gave the same 
times, about 20ms.

But cold reading off the disk is not normal operational behavior. So I 
ensured that MySQL had enough (300Mb) shared memory space to cache the 
whole index (which at 22Mb was actually larger than the table from 
which it was derived) and then ran the test. The cold search for both 
systems remained 20ms, as the indexes were pulled off the disk. After 
that postgis (150Mb of shared memory) was doing 2-3ms searches, and the 
mysql searches varied from 10-20ms.

>>
>>>  * MySQL spatial supports only 2D geometries.
>>
>> We don't have any plans here.
>>
>>>  * This String/Geometry duality causes the structure of the code to
>>>  fairly complex. Geometry processing is done mostly in the String
>>>  handling section of the code base, but also appears in the SQL 
>>> parser
>>>  and other scattered parts of the system.  We have no idea why they 
>>> did
>>>  this, but it sure makes it difficult to follow the logic in the 
>>> system.
>>
>> We couldn't find a better way to put OpenGIS recomendation in MySQL 
>> parser.
>> Probably there's better solution. We plan to review this part of the 
>> code.
>>
>>>  * We did not look too deeply at the indexing source code, but it in 
>>> the
>>>  myisam/ directory, so we assume it is either built inside a myisam
>>>  structure or is only useable on myisam tables.  Either way, 
>>> transactions
>>>    and ACID guarantees will be lacking.
>>
>> I don't know the InnoDB code, but I suspect that implementation of
>> spatial indexes will differ significally here.
>>
>> Regards.
>> HF
>
>
> Some additional comments in reply to other points in Paul's original
> message:
>
>
>> * MySQL stores the spatial data internally as WKB. Retrieving WKB 
>> should
>> be very fast, but we were not able to test that directly at this time.
>
> The internal structure is closer to WKB + SRID.
>
>
>> * The MySQL code contains an assumption in the WKB handling that all 
>> WBK
>> will be little endian. If you run MySQL spatial on a big endian 
>> machine
>> (Sparc, PowerPC, PA-RISC) you will find things go amiss when you try 
>> to
>> send WKB to little endian machines (x86), because your big endian
>> machine will be creating invalid data (big endian data incorrectly
>> flagged as little endian). Hopefully MySQL will fix this soon.
>
> I'm not sure about this. For example, I've directly copied the
> files corresponding to tables containing GIS data from Mac OS X
> (big-endian) to Red Hat on Intel (little-endian) and been able to
> use them with no problem.
>
> If you discovered a problem in this area, I'm sure the developers
> would like to see a repeatable test case so that they can address it.
> Thanks.
>

We would have to do some work for that. Our prices are reasonable :) 
But compiling on Solaris asking for WKB (AsBinary()) and then checking 
the results for validity against the WKB spec should be sufficient.

>>
>> * Spatial types are stored as WKB Strings in the MySQL database.  They
>> seem to have allocated a header on these for the SRID, but access to
>> this is unimplemented.
>
> I'm unclear what this means.  There is an SRID() function for obtaining
> the SRID of a geometry value.

Our mistake then.
>
>> * 4.1 is still just the "testing" release, but we did find it
>> surprisingly brittle.  Our attempts to create geometries in InnoDB
>> tables failed (fine) but created inconsistent system tables in the
>> process (not fine).  Spatial indexes seemed impossible to create, 
>> until
>> we realized the creation command seemed be sensitive to the case of 
>> the
>> referenced table name.
>
> This is most likely a manifestation of MySQL's general behavior that
> table (and database) names are case sensitive on machines with case
> sensitive filesystems, unless you run the server with the
> --lower-case-table-names=1 option.

Yes, one of our in-house MySQL experts pointed out that we were going 
to look a little foolish about mentioning that as a "problem" and we 
do. I don't know what the SQL92 spec says about case sensitivity. It 
was a surprise at the time :)

>>
>> * The MySQL developers made a real effort to follow the OpenGIS
>> specification, sometimes to their own detriment. There are four
>> different MBR comparison functions, with the correct (and confusing)
>> OpenGIS-style function names (MBRContains, MBRTouches, MBRIntersects,
>> MBRWithin). Intersects and Touches are subtly different things.
>
> Can you amplify more on what "to their own detriment" means?
> That it would have been better to implement these as operators
> rather than as functions?

It means that I think the OGC spec for naming the predicates is a 
fairly confusing, and that is about all it means. The MySQL guys 
implemented the OGC naming scheme correctly for their bbox operators 
(although they did not have to, it was a nice touch of consistency) and 
because the spec is confusing, using it is detrimental (but probably 
still a good idea). Clear as mud? It is just that the scheme is 
confusing. Our complete spatial predicates will use the same scheme 
(it's the spec, dammit!), and we'll have the same problems with it.  Do 
you know what Intersects() means? Touches()? :) Subtle enough 
differences that you basically need diagrams to explain the damn things.

      Paul Ramsey
      Refractions Research
      Email: pramsey at refractions.net
      Phone: (250) 885-0632