[Benchmarking] Pgsql set up

Martin Daly Martin.Daly at cadcorp.com
Mon Aug 9 04:24:30 EDT 2010


> The problem is now clear to me, thanks for following up. The data in
> the DBF files is in fact kind of corrupt. That is, it is in a mix of
> encodings. Most of it appears to be UTF8, but there are also some
> LATIN1 characters in there too, and non-ASCII LATIN1 bytes are not
> valid UTF8. So you can try to load using UTF8 and the loader will fail
> when it hits an invalid byte. Or you can load using LATIN1, and the
> UTF8 characters will end up as random-looking high-bit characters in
> the database.

Is it possible that the pre-merged source DBF files had a mixture of different codepages/encodings, and this was missed in the merge?

I'm not suggesting that anyone start again, just that if we can find out why the data we are using is like this, then we can explain away the duff labels.
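
For anyone wanting to see the two failure modes described in the quoted text, here is a minimal Python sketch. The sample value and the try_decode helper are made up for illustration; they are not taken from the benchmark data or from the loader.

```python
# Hypothetical sample value, encoded both ways (not from the benchmark DBFs).
utf8_bytes = "M\u00e1laga".encode("utf-8")      # b'M\xc3\xa1laga'
latin1_bytes = "M\u00e1laga".encode("latin-1")  # b'M\xe1laga'


def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Loading as UTF8: the LATIN1-encoded value is rejected outright.
as_utf8 = [try_decode(b, "utf-8") for b in (utf8_bytes, latin1_bytes)]

# Loading as LATIN1: nothing fails, but the UTF8 value comes out garbled.
as_latin1 = [try_decode(b, "latin-1") for b in (utf8_bytes, latin1_bytes)]

print(as_utf8)
print(as_latin1)
```

Running something like this over the pre-merged source files, one file at a time, might show whether the stray encoding is confined to particular sources.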

> Soooo.... not sure where that leaves us. I just checked the latest
> version of shp2pgsql in SVN and we don't have support for handling
> corrupt strings in UTF8 right now. I guess that could be my
> contribution to the benchmarking effort. In the meanwhile, I'm pleased
> to see OSM continuing their strong devotion to standards.

This is national mapping agency data, not that newfangled OSM stuff.
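
On handling the corrupt strings: one possible approach, sketched in Python, is to decode each value as UTF8 and fall back to LATIN1 only when that fails. This is an assumption about what a fix could look like, not a description of anything shp2pgsql does, and decode_mixed is a hypothetical name.

```python
def decode_mixed(raw, fallback="latin-1"):
    """Decode raw bytes as UTF-8 if they are valid UTF-8, else use the fallback.

    Assumption: each value is wholly in one encoding. A value that mixes
    both encodings would be decoded entirely with the fallback.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Hypothetical attribute values: UTF-8, LATIN1 and plain ASCII.
values = [b"M\xc3\xa1laga", b"M\xe1laga", b"Madrid"]
cleaned = [decode_mixed(v) for v in values]
print(cleaned)
```

The fallback is a heuristic: a LATIN1 string whose bytes happen to form valid UTF8 would be misread, although that should be rare in place names.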

Martin