[GRASS-dev] new v.in.geonames: problems with UTF-8 Unicode text

Glynn Clements glynn at gclements.plus.com
Tue Jul 1 12:55:24 EDT 2008


Markus Neteler wrote:

> > After which, the file appears to import without any problems.
> >
> > I have committed a fix to G_tokenize(), and also enlarged the buffers
> > in v.in.ascii to 4000 bytes (although removing fixed limits altogether
> > would be better).
> 
> Excellent, now also the Polish file is getting imported.
> This closes
> http://trac.osgeo.org/grass/ticket/202
> 
> But:
> when using the SQLite driver, it cuts the "alternatename" field at
> 256 chars due to a hardcoded limitation in describe.c. I don't
> remember why not varchar() length is used there.

SQLite doesn't have VARCHAR. If you create a VARCHAR column, the
driver translates it to TEXT. The size you specify when the table is
created isn't stored anywhere.

Actually, SQLite doesn't really have column types at all. It remembers
the declaration type, although the SQLite driver doesn't use this; it
just returns the value types corresponding to the first row of a
"SELECT * FROM <table>" query.

[BTW, it appears that the PostgreSQL driver has a bug regarding TEXT
columns. get_column_info() (db/drivers/postgres/describe.c) doesn't
fill in the size field for TEXT columns, but describe_table() assumes
that it will always be filled in.]

> What about this patch?

> -           fsize = 255;
> +           fsize = 99999;

That will make every text column equivalent to VARCHAR(99999). That
may be problematic if you e.g. copy data to a "real" RDBMS (or any
file format which uses fixed-size fields), as it will create
VARCHAR(99999) columns, potentially wasting vast amounts of memory.

I would suggest lowering it to e.g. 1000. Or adding an environment
variable.

In the longer term, maybe the SQLite driver should try to store the
information which SQLite itself omits.

-- 
Glynn Clements <glynn at gclements.plus.com>


More information about the grass-dev mailing list