[postgis-users] Re: MySQL 4.1 GIS Functionality

Tue Aug 5 10:25:09 PDT 2003

At 10:17 -0700 8/5/03, Paul Ramsey wrote:
>In order to retain context, I'll not edit out any of the document, 
>which means the nesting is all now +1. :)

Okay, I'll do the same.

>
>On Tuesday, August 5, 2003, at 09:49 AM, Paul DuBois wrote:
>
>>Sorry for this delayed response to a message that was posted some
>>weeks ago. I only recently became aware of the original message,
>>because I subscribed after June 19, 2003 when it was posted.
>>
>>Thanks to Paul Ramsey (and Dave Blasby?) for taking a look at
>
>The both of us. Dave did the lion's share of the write up.
>
>>MySQL's GIS support and giving it a test. I have a couple of pieces
>>of information in reply from Alexey Botchkov, one of the MySQL AB
>>developers.  Replies to him can be sent through me, either at
>>paul at kitebird.com or paul at mysql.com.  Or you can write to Alexey
>>directly at hf at mysql.com, but please also cc: me so that I can
>>follow the discussion.  Thanks.
>>
>>The doubly quoted sections below are by Paul Ramsey, the singly quoted
>>sections are by Alexey. I have also added a few comments of my own.
>>
>>>>  * Spatial data (geometry) only works with the MyISAM table type.
>>>
>>>We need a kind of RTree indexes implemented in InnoDB to make spatial
>>>extentions working with transactional bases.
>>
>>To confirm: Spatial column types indeed currently are supported only
>>for MyISAM tables.
>>
>>>
>>>>  * Trying to create create an InnoDB table with a geometry column, will
>>>>  crash the server and muddle the system tables.
>>>
>>>Yes, we should prevent creating spatial fields in InnoDB now.
>>
>>This means that you now will get an error if you try to create a
>>table other than a MyISAM table that includes spatial columns.
>>For example:
>>
>>mysql> CREATE TABLE t (g GEOMETRY) TYPE = INNODB;
>>ERROR 1178 (42000): The storage engine for the table doesn't support GEOMETRY
>>
>>Both Alexey and I were surprised that Paul experienced a server crash
>>and muddled system tables. (By "system tables", I assume what was meant
>>was the grant tables in the mysql database?  If not, can you clarify?)
>>Alexey suspects this was a problem in an earlier version that has been
>>corrected.
>
>Dave can perhaps clarify. It has been a couple months. As I recall, 
>the system thought we had a table which did not in fact exist 
>(because the creation had failed).
>
>>By the way, what was the test system (or systems)?
>
>Linux 2.4 (Mandrake 8) on a dual AMD 2000+ w/ 2Gb RAM.

Did you compile MySQL yourself or use a precompiled distribution?
(Reason I ask is that we have observed some problems with GCC
and GIS columns unless optimization is turned off, or at least down
to -O1.)

>
>>
>>>
>>>>  * The MySQL spatial index is not null-safe. Marking the geometry column
>>>>  as "NOT NULL" is required in order to build a MySQL spatial index.  The
>>>>  PostGIS spatial index is null safe.
>>>
>>>We're working on this.
>>
>>To confirm: Spatial columns included in an index must indeed currently
>>be declared NOT NULL.
>>
>>>
>>>>  * We did three test queries to compare the retrieval times on the
>>>>  spatial index: first with a very small bounding box (enclosing 6
>>>>  records), then with a medium box (70K records), then with a huge box
>>>>  (all 200K records).
>>>
>>>When we tried PostGIS against MySQL we got almost equal productivity on
>>>selects.
>>>Can i get data you used for these tests?
>
>I'll have to check. I fear that the data we used was proprietary to 
>one of our clients (but convenient to us). We can create a new 
>similar test set from a non-proprietary data source.
>
>After publishing the note, we had some feedback from people to the 
>effect that we were probably seeing cache effects between the two 
>systems.  The PostGIS index is smaller (alot smaller) so it was 
>cached in memory (my correspondent pointed out that our 2ms result 
>was faster than an expected disk seek time).
>
>So I repeated the (small area select) test in two ways. First I 
>forced both systems to read off the disk (by cold starting both 
>systems and feeding them the initial query right away). And they 
>both gave the same times, about 20ms.
>
>But cold reading off the disk is not normal operational behavior. So 
>I ensured that MySQL had enough (300Mb) shared memory space to cache 
>the whole index (which at 22Mb was actually larger than the table 
>from which it was derived) and then ran the test. The cold search 
>for both systems remained 20ms, as the indexes were pulled off the 
>disk. After that postgis (150Mb of shared memory) was doing 2-3ms 
>searches, and the mysql searches varied from 10-20ms.

You could also enable MySQL's query cache, in which case the second and
subsequent execution of a query should yield almost instantaneous
results. :-)

>
>>>
>>>>  * MySQL spatial supports only 2D geometries.
>>>
>>>We don't have any plans here.
>>>
>>>>  * This String/Geometry duality causes the structure of the code to
>>>>  fairly complex. Geometry processing is done mostly in the String
>>>>  handling section of the code base, but also appears in the SQL parser
>>>>  and other scattered parts of the system.  We have no idea why they did
>>>>  this, but it sure makes it difficult to follow the logic in the system.
>>>
>>>We couldn't find a better way to put OpenGIS recomendation in MySQL parser.
>>>Probably there's better solution. We plan to review this part of the code.
>>>
>>>>  * We did not look too deeply at the indexing source code, but it in the
>>>>  myisam/ directory, so we assume it is either built inside a myisam
>>>>  structure or is only useable on myisam tables.  Either way, transactions
>>>>    and ACID guarantees will be lacking.
>>>
>>>I don't know the InnoDB code, but I suspect that implementation of
>>>spatial indexes will differ significally here.
>>>
>>>Regards.
>>>HF
>>
>>
>>Some additional comments in reply to other points in Paul's original
>>message:
>>
>>>* MySQL stores the spatial data internally as WKB. Retrieving WKB should
>>>be very fast, but we were not able to test that directly at this time.
>>
>>The internal structure is closer to WKB + SRID.
>>
>>>* The MySQL code contains an assumption in the WKB handling that all WBK
>>>will be little endian. If you run MySQL spatial on a big endian machine
>>>(Sparc, PowerPC, PA-RISC) you will find things go amiss when you try to
>>>send WKB to little endian machines (x86), because your big endian
>>>machine will be creating invalid data (big endian data incorrectly
>>>flagged as little endian). Hopefully MySQL will fix this soon.
>>
>>I'm not sure about this. For example, I've directly copied the
>>files corresponding to tables containing GIS data from Mac OS X
>>(big-endian) to Red Hat on Intel (little-endian) and been able to
>>use them with no problem.
>>
>>If you discovered a problem in this area, I'm sure the developers
>>would like to see a repeatable test case so that they can address it.
>>Thanks.
>>
>
>We would have to do some work for that. Our prices are reasonable :) 
>But compiling on Solaris asking for WKB (AsBinary()) and then 
>checking the results for validity against the WKB spec should be 
>sufficient.
>
>>>
>>>* Spatial types are stored as WKB Strings in the MySQL database.  They
>>>seem to have allocated a header on these for the SRID, but access to
>>>this is unimplemented.
>>
>>I'm unclear what this means.  There is an SRID() function for obtaining
>>the SRID of a geometry value.
>
>Our mistake then.
>>
>>>* 4.1 is still just the "testing" release, but we did find it
>>>surprisingly brittle.  Our attempts to create geometries in InnoDB
>>>tables failed (fine) but created inconsistent system tables in the
>>>process (not fine).  Spatial indexes seemed impossible to create, until
>>>we realized the creation command seemed be sensitive to the case of the
>>>referenced table name.
>>
>>This is most likely a manifestation of MySQL's general behavior that
>>table (and database) names are case sensitive on machines with case
>>sensitive filesystems, unless you run the server with the
>>--lower-case-table-names=1 option.
>
>Yes, one of our in-house MySQL experts pointed out that we were 
>going to look a little foolish about mentioning that as a "problem" 
>and we do. I don't know what the SQL92 spec says about case 
>sensitivity. It was a surprise at the time :)
>
>>>
>>>* The MySQL developers made a real effort to follow the OpenGIS
>>>specification, sometimes to their own detriment. There are four
>>>different MBR comparison functions, with the correct (and confusing)
>>>OpenGIS-style function names (MBRContains, MBRTouches, MBRIntersects,
>>>MBRWithin). Intersects and Touches are subtly different things.
>>
>>Can you amplify more on what "to their own detriment" means?
>>That it would have been better to implement these as operators
>>rather than as functions?
>
>It means that I think the OGC spec for naming the predicates is a 
>fairly confusing, and that is about all it means. The MySQL guys 
>implemented the OGC naming scheme correctly for their bbox operators 
>(although they did not have to, it was a nice touch of consistency) 
>and because the spec is confusing, using it is detrimental (but 
>probably still a good idea). Clear as mud? It is just that the 
>scheme is confusing. Our complete spatial predicates will use the 
>same scheme (it's the spec, dammit!), and we'll have the same 
>problems with it.  Do you know what Intersects() means? Touches()? 
>:) Subtle enough differences that you basically need diagrams to 
>explain the damn things.

Gotcha.  Thanks.

>
>      Paul Ramsey
>      Refractions Research
>      Email: pramsey at refractions.net
>      Phone: (250) 885-0632
>
>
>_______________________________________________
>postgis-users mailing list
>postgis-users at postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-users