[postgis-devel] shp2pgsql - What's Worse, Slow Load, or Bad Types

Markus Schaber schabios at logi-track.com
Tue Jun 22 07:05:54 PDT 2004


Hi, Paul,

On Thu, 17 Jun 2004 11:19:02 -0700
Paul Ramsey <pramsey at refractions.net> wrote:

> Right now, shp2pgsql figures out what types to use for columns based
> on the DBF header. Trouble is, the field size defined in the header is
> often much larger than that needed for the data. People use large 
> headers "for safety", etc. The result, particularly for integers, is 
> that we are getting bigint column types a lot more often than we need
> to. The only way to "fix" this would be to scan the whole input file
> for the maximum values of all integer fields, and then set the field
> type appropriately.
> 
> Problem worth addressing, or bugbear?
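
For reference, the scan Paul describes might look roughly like the
following minimal sketch in Java (it assumes the integer values have
already been parsed out of the .dbf file; the actual DBF reader is
out of scope here):

public class IntTypeScan {

    /* Map the observed value range to the narrowest PostgreSQL
       integer type that can hold it. */
    static String pickIntegerType(long min, long max) {
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE)
            return "smallint";   // 2 bytes
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE)
            return "integer";    // 4 bytes
        return "bigint";         // 8 bytes
    }

    public static void main(String[] args) {
        // Stand-in for an integer column scanned out of the DBF file.
        long[] gids = {1, 42, 70000};
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long v : gids) {    // single pass over all values
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        System.out.println("gid -> " + pickIntegerType(min, max));
        // prints: gid -> integer
    }
}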

I would generally avoid any 'automagic' tweaking of the data types, as
you don't know whether the customer wants to add more shape files to
the same table later and would then hit the smaller data type's limit.

Currently, we have the NavTeQ data sets, which come as a set of approx.
40 CDs. When inserting the first file, I don't want to end up with a
bad schema just because that particular shape file happens to contain
no gid value wider than 32 bits.

However, it might be interesting to define such mappings in a config
file (in fact, we are just working on a solution like this in Java
using GeoTools), so that we can say, e.g., for all street.shp files:
map the field highway (which may be a one-character string encoding
yes or no) to boolean, and map the field areaid (which appears to be a
string) to bigint.
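
A minimal sketch of what such a config file could look like, assuming
a plain Java properties format (the file name and keys here are purely
illustrative, not anything shp2pgsql or GeoTools defines today):

import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

// street.shp.properties (hypothetical):
//   highway=boolean
//   areaid=bigint
public class FieldTypeMapping {
    public static void main(String[] args) throws IOException {
        Properties mapping = new Properties();
        try (FileReader in = new FileReader("street.shp.properties")) {
            mapping.load(in);
        }
        // Fall back to the header-derived type if no override exists.
        String areaidType = mapping.getProperty("areaid", "varchar");
        System.out.println("areaid -> " + areaidType);  // -> bigint
    }
}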

Thanks for your patience,
Markus

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios at logi-track.com | www.logi-track.com


