[GRASS-dev] Re: new v.in.geonames: problems with UTF-8 Unicode text

Ivan Shmakov ivan at theory.asu.ru
Mon Jul 7 12:29:52 EDT 2008


>>>>> Glynn Clements <glynn at gclements.plus.com> writes:

 >>> IIRC, the main problem is that sqlite3_column_decltype() only works
 >>> for actual columns, not expressions, subselects etc, but the code
 >>> in question has to be able to describe the format of rows returned
 >>> by arbitrary SELECT statements, not just tables.

 >>> So it falls back to sqlite3_column_type(), which returns the type
 >>> of the column's data. but that only works if you have a valid
 >>> row. If a select doesn't return any rows, you lose. Also,
 >>> sqlite3_column_type() only understands null, integer, float, text
 >>> and blob types.

 >> I wonder, how it's done for other RDBMS?

[...]

 > Operators such as || which operate on strings normally accept any
 > operand types. Anything that isn't a string will be converted to one.
 > But the result is always a string.

 > Typically, an operator is selected according to its operand types,
 > and the operator's result type determines the type of the result
 > column.  E.g. suppose that you have:

 > SELECT a + b FROM foo;

 > If a and/or b are REAL then the operator would be REAL * REAL -> REAL
 > and the result would be REAL. If a and b were both INT then the
 > operator would be INT * INT -> INT and the result would be INT.

	From the above I could conclude that the other RDBMS are
	generally representing the results of expressions in SELECTs as
	having one type out of a narrow set of types (in absence of
	explicit type casts.)  This way, SQLite doesn't look much
	different to those.

	Or, to put it differently, I see the discussed problem as not
	being specific to SQLite.

	I suspect however, that there may be a problem with
	sqlite3_column_type () describing the value of the field of a
	given row as being of the NULL type (as it may be possible for
	the other row to have a value of a different type in the same
	field.)  (And if sqlite3_column_type ()'s result could be even
	more volatile, then it is the problem.)

 >>> Apart from needing to fall-back to the data type for expressions,
 >>> another problem is that it coerces the decltype to one of SQLite's
 >>> limited set of types, then converts that to one of the DB_SQL_*
 >>> types, losing information in the process.

 >> That doesn't feel sound.

 >>> However, I don't know if it does this for a specific reason. I
 >>> suspect the only way to find out is to try it and see what breaks.

 > One issue is that SQLite doesn't validate the column types. It uses
 > some heuristics to determine the affinity type, but ultimately it
 > allows any sequence of words along with an optional size specifier,
 > e.g.:

 > CREATE TABLE foo ( i INT, t TEXT, b ROAST BEEF(99) );

	Of course.  So, it becomes the user's responsibility to make
	sure that the definition of the table in question is compatible
	with the RDBMS to which it's copied.

	I guess that much the same way, once the table is defined to use
	any types specific to an RDBMS implementation, it cannot be
	readily copied to any other RDBMS.

	The only SQLite-specific problem that I see there is that it
	doesn't actually treat the type name as a constraint on the
	values put into that field.  But then, ``garbage in, garbage
	out'' is the guiding principle.



More information about the grass-dev mailing list