[postgis-users] MySQL 4.1 GIS Functionality
Paul Ramsey
pramsey at refractions.net
Thu Jun 19 14:42:08 PDT 2003
Dave and I have spent a little time over the last few days investigating
the spatial capabilities which have been added to the MySQL database as
of the 4.1 version. 4.1 is the current "testing" version, and 4.0 is the
current "shipping" version.
1. Notes from Testing
---------------------
* Spatial data (geometry) only works with the MyISAM table type. MySQL
supports multiple tables types, with different performance
characteristics. MyISAM is supposed to have high performance, but no
transactional capability. So that means that you cannot have
transactionally managed spatial updates.
* Trying to create create an InnoDB table with a geometry column, will
crash the server and muddle the system tables. The systems tables must
be MyISAM, since this is an effect of a failed transaction.
* The Spatial Index is almost certainly either tied to MyISAM or stored
in a MyISAM-type structure. At best this leaves your index vulnerable
to ACID transactional issues.
* We used a 200K segment road database to test data handling.
* Data load time for the test table was significantly faster in MySQL.
To load 200K rows it took 34 seconds on MySQL and 71 seconds on PgSQL.
MyISAM tables are noted for being fast at digesting nontransactional
inserts. PgSQL has to manage the inserts within a transactional context.
* Spatial index creation took about 14 seconds in MySQL and 9 seconds in
PostGIS. This is probably because MySQL uses the quadatic pick-split
algorthym and PostGIS uses a linear-time version.
* The MySQL spatial index is not null-safe. Marking the geometry column
as "NOT NULL" is required in order to build a MySQL spatial index. The
PostGIS spatial index is null safe.
* We did three test queries to compare the retrieval times on the
spatial index: first with a very small bounding box (enclosing 6
records), then with a medium box (70K records), then with a huge box
(all 200K records).
* To search for a very small area (6 records) in the 200,000 row table
took about 20 milliseconds on MySQL and about 2 milliseconds on PostGIS.
* To search for a medium sized area (70,000 records) in the 200,000 row
table took about 1.03 seconds in MySQL and 0.25 seconds in PostGIS.
* To search for a very large area (all records) in the 200,000 row table
took about 2.25 seconds in MySQL and 0.75 seconds in PostGIS.
* MySQL spatial supports only 2D geometries.
* MySQL stores the spatial data internally as WKB. Retrieving WKB should
be very fast, but we were not able to test that directly at this time.
* Because the internal representation of MySQL geometries is simple WKB,
un-indexed bounding box searches are fairly slow. The database must
calculate the bounding box for each feature in order to do the comparisons.
2. Notes from Examining the Source Code
---------------------------------------
* The MySQL code contains an assumption in the WKB handling that all WBK
will be little endian. If you run MySQL spatial on a big endian machine
(Sparc, PowerPC, PA-RISC) you will find things go amiss when you try to
send WKB to little endian machines (x86), because your big endian
machine will be creating invalid data (big endian data incorrectly
flagged as little endian). Hopefully MySQL will fix this soon.
* Spatial types are stored as WKB Strings in the MySQL database. They
seem to have allocated a header on these for the SRID, but access to
this is unimplemented.
* Inside the MySQL database, the geometries are stored in exactly the
same type as String. The system does an Internal String -> Geometry
conversion (which very easy because the Geometry type is just WKB).
* This String/Geometry duality causes the structure of the code to
fairly complex. Geometry processing is done mostly in the String
handling section of the code base, but also appears in the SQL parser
and other scattered parts of the system. We have no idea why they did
this, but it sure makes it difficult to follow the logic in the system.
* We did not look too deeply at the indexing source code, but it in the
myisam/ directory, so we assume it is either built inside a myisam
structure or is only useable on myisam tables. Either way, transactions
and ACID guarantees will be lacking.
* MySQL support for user defined functions has some limitations as well.
It appears that in order to create a user defined function, the
programmer must allocate the memory for the results before seeing the
return objects. For large and variable sized things like geometries,
this could be problematic. There might be work-arounds for this.
* MySQL does not offer any real support for object-relational
functionality. PostGIS is built entirely on PostgreSQL’s
object-relational structure, making it very isolated from the postgresql
development and very easy to maintain. In fact, the only PgSQL-related
maintenance to PostGIS has occurred because of significant improvements
to the GIST index.
3. General Impressions
----------------------
* 4.1 is still just the "testing" release, but we did find it
surprisingly brittle. Our attempts to create geometries in InnoDB
tables failed (fine) but created inconsistent system tables in the
process (not fine). Spatial indexes seemed impossible to create, until
we realized the creation command seemed be sensitive to the case of the
referenced table name.
* The MySQL developers made a real effort to follow the OpenGIS
specification, sometimes to their own detriment. There are four
different MBR comparison functions, with the correct (and confusing)
OpenGIS-style function names (MBRContains, MBRTouches, MBRIntersects,
MBRWithin). Intersects and Touches are subtly different things.
* There are no full predicates in this implementation, the predicate
functions are all stubs against bounding box operations.
* There are no operators either (Intersection, Union, etc).
* Basically, it is a start. With some data loading tools and a little
more stability for general purpose stuff, it could be a passable web
serving platform.
--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey at refractions.net
| Phone: (250) 885-0632
\_
More information about the postgis-users
mailing list