[postgis-users] shp2pgsql and schemas PROPOSED PLAN

strk strk at keybit.net
Wed Dec 17 08:19:38 PST 2003


m.cave-ayland wrote:
> Hi strk,
> 
> Some thoughts in reply to your post:
> 
> > -----Original Message-----
> > From: postgis-users-bounces at postgis.refractions.net 
> > [mailto:postgis-users-bounces at postgis.refractions.net] On 
> > Behalf Of strk
> > Sent: 16 December 2003 08:12
> > To: PostGIS Users Discussion; postgis-devel at postgis.refractions.net
> > Cc: David Blasby; Paul Ramsey
> > Subject: Re: [postgis-users] shp2pgsql and schemas
> > 
> > 
> > Changing the AddGeometryColumn will break compatibility with 
> > dumps created by old installations. It might be wiser to
> > have a polyphormic function with a different number of args
> > for schema-aware behaviours.
> 
> It looks as if this will be the case anyway, since looking at Page 40 of
> the OGC spec the current PostGIS implementation of AddGeometryColumn()
> is missing the schema parameter (even though the database name is
> entered into the geometry_columns table). Also, I'm not convinced that
> changing AddGeometryColumn would break dumps, since dumps just restore
> the geometry_columns table and the appropriate columns without calling
> the AddGeometryColumn function itself.
> 
> Also bear in mind that most users simply want to restore a complete
> database dump in the case of hardware/disk failure and so this becomes
> irrelevant as the version of PostGIS restored will be exactly the same
> one that was dumped, although this has never worked for me without
> manual intervention :)

Sorry - by "dump" I meant output from shp2pgsql.
The pg_dump output - as you say - is not a big deal since it will
restore old postgis version as well. But files produced by shp2pgsql
contain calls to AddGeometryColumn using a parameter reported to be
( and assured to be ) the name of the database it was intended to be
fed to. What I do when in need of upgrading is upgrade postgresql, then
feed again all the shp2pgsql outputs kept around for just this purpose.
I might be crazy in doing so but I have a good backup level.

Since database name was required when I run shp2pgsql it is now embedded
in all these sql files. I'd like to be able to keep using them with new
postgis installation. To avoid confusion with old and new shp2pgsql output
we need a way to distinguish between them, so declaring current 
function obsolete (as you suggest below) and introducing a new function 
(with one more argument probably) could allow for detection of old dumps
and special handling of it (warn user about obsolescenze, discard of 
first parameter).

> 
> > Moreover since now the f_table_schema column is filled with
> > the wrong value we might require a call to fix_geometry_columns
> > making it also cleanup those values when having the same value
> > of the database name. On the other hand, this could do the wrong
> > thing if somebody is using the schema-aware version with schemas
> > that are *really* named as the database.
> 
> See above....
> 
> For every dump/restore of PostGIS I've ever done, I've had to manually
> re-order the pg_dump catalogue in order to get it to restore correctly.
> Normally I do something like this:
> 
> 	1. Take pg_dump of existing database
> 	2. Drop database
> 	3. Recreate database
> 	4. Use pg_restore -l to generate a list file
> 	5. Remove all references to procedural languages from the list
> file
> 	(because they never occur in the right order during normal
> restore)
> 	6. Remove all references to PostGIS functions from the list file
> 	(because they never occur in the right order during normal
> restore)
> 	7. Move all index/trigger creation commands to the end of the
> list file
> 	8. Createlang any procedural languages in the new database
> 	9. Reload postgis.sql
> 	10. pg_restore -s (schema only - this tests that I've got the
> restore order right 	w/o attempting to load the data)
> 	11. pg_restore (data and indexes)
> 
> 
> So basically, if I decide to upgrade the version of PostGIS at the same
> time (as opposed to restoring the same version), then after all the
> manual work that goes into restoring the DB it makes no difference to
> run a PostgisUpdate() or whatever :)

I agree, this would be the case for the pg_dump way as opposed to my
way described above.

> 
> The logic behind this could be quite simple:
> 
> 	For each row in the geometry_columns table:
> 		Does the table f_schema_name.f_table_name exist in this
> database?
> 			Yes, leave the entry as it is
> 			No, set f_schema_name to 'public'
> 	Next

For the No branch I would set f_schema_name to null (we might want to use
the search_path).

> 
> The only case I could see this not working is if someone had tweaked the
> geometry_columns table by hand, but if they can do that, I'm sure they
> can fix any resulting problems that may occur :)
> 
> Thinking more about it, the one function that concerns me is find_srid()
> as it is used in nearly all of our queries :(. Perhaps a polymorphic
> version of this should also be created, so using the old version will
> raise notice that it is deprecated and assume the default schema to be
> 'public'.

I think find_srid can be coded in such a way to avoid obsolescenze.
It might "suggest" to run PostgisUpdate in case it finds an entry 
with an unknown schema in f_schema_name.

> 
> Finally if a polymorphic version of AddGeometryColumn() were created, I
> would expect a similar behaviour for the 'old' version, i.e. raise a
> notice saying that the old version is deprecated in place of the new
> version, and that by default the table will be assumed to be in the
> public schema.
> 
> > Yet another possibility is adding a new column to specify wheter
> > or not we are dealing with schema-aware records. Old records will
> > have a 'null' value in this new column, while new one will have
> > something. Note that this will require loading an 
> > "postgis_upgrade.sql"
> > script to enlarge the geometry_columns definition.
> 
> My general experience tells me that adding flags like this is a bad
> idea, especially given the huge amount of knowledge currently required
> to restore a dump - it can only make the application more unreadable,
> and saves those countless hours of head-scrathing months later when
> you've added a geometry column and none of the geometric operators work
> with it...

Let's try without.

So, my plan is:

	1) Make an AddGeometryColumn accepting one more argument (catalogue)
	   to be the next "stable" implementation of it. Note that the
	   catalogue argument will be checked to be NULL or a warning will
	   be issued stating that catalogue support is not present.

	2) Update current AddGeoemtryColumn() so to discard "database"
	   argument and warn about its obsolescenze and availability
	   of newer function.

	3) Update shp2pgsql to allow for schema definition and call the
	   new AddGeometryColumn, removing "database" reference from
	   short help message and external documentations.

	4) Update fix_geometry_columns to also check for these "wrong"
	   schema errors.

	5) Update find_srid to consider new layout and warn if something
	   weird is found "suggesting" to run fix_geometry_columns.

Comments welcome.

--strk;




More information about the postgis-users mailing list