[postgis-devel] Loader cleanup

strk at refractions.net strk at refractions.net
Wed Apr 6 03:44:15 PDT 2005


I've finished the cleanup step.
I didn't change anything in the algorithm, just:
	- made main() much more concise
	- fixed a bug in -w handling (for hwgeom)

Now I'd go on with:

	- GID omission.

	  Current loader explicitly set the gid when NOT
	  in append mode. I'd make it *always* act like that
	  and rely on the 'serial' type. Do you see any
	  problem with that ?

	- TRANSACTION

	  Current loader breaks insert lines in multiple
	  transaction blocks. It has been noted that this
	  should speed things up, but I'm afraid it would
	  add half-shapes (skipping the bogus transaction
	  block and continue with the rest). Would it be
	  ok for you wrapping the whole code in a single
	  transaction relying on dump mode (-D) for speed
	  reasons ?

	- NULL shapes

	  Current loader checks shapefile and exits with
	  an error if all shapes are null (or contain no
	  vertexes). This requires two shapefile scans
	  (check scan, load scan). Since the load scan
	  copes cleanly with null shapes what about dropping
	  the first scan allowing for a all-nulls shapefile
	  to produce an all-nulls geometry features table ?
	  (attributes would still be loaded!)

	- PREPARE mode

	  I think we all know about this ;)

	- FIELD TYPE handling

	  This is Marks' thread. Didn't check it out yet.

CVS contains work done so far.

--strk;

On Wed, Apr 06, 2005 at 12:25:56PM +0200, strk at refractions.net wrote:
> Markus, I'm doing the big cleanup myself.
> Please stop producing patches or I'll feel too guilty :P
> I'll put the -p switch in.
> 
> --strk;
> 
> On Tue, Apr 05, 2005 at 10:44:22PM +0200, Markus Schaber wrote:
> > Hi, @all,
> > 
> > Here's another version of the patch. Apart from the documentation
> > updates, it tries to be as less invasive as possible. I even did not
> > re-indent the if()-encapsulated main loop.
> > 
> > Markus
> 
> > Index: doc/postgis.xml
> > ===================================================================
> > RCS file: /home/cvs/postgis/postgis/doc/postgis.xml,v
> > retrieving revision 1.135
> > diff -u -r1.135 postgis.xml
> > --- doc/postgis.xml	5 Apr 2005 08:00:07 -0000	1.135
> > +++ doc/postgis.xml	5 Apr 2005 20:39:38 -0000
> > @@ -1283,13 +1283,23 @@
> >            </varlistentry>
> >  
> >            <varlistentry>
> > +            <term>-p</term>
> > +
> > +            <listitem>
> > +              <para>Only produces the table creation SQL code, without adding 
> > +              any actual data. This can be used if you need to completely
> > +              separate the table creation and data loading steps.</para>
> > +            </listitem>
> > +          </varlistentry>
> > +
> > +          <varlistentry>
> >              <term>-D</term>
> >  
> >              <listitem>
> > -              <para>Creates a new table and populates it from the Shape file.
> > -              This uses the PostgreSQL "dump" format for the output data and
> > -              is much faster to load than the default "insert" SQL format. Use
> > -              this for very large data sets.</para>
> > +              <para>Use the PostgreSQL "dump" format for the output data. This 
> > +              can be combined with -a, -c and -d. It is much faster to load
> > +              than the default "insert" SQL format. Use this for very large data
> > +              sets.</para>
> >              </listitem>
> >            </varlistentry>
> >  
> > @@ -1334,6 +1344,8 @@
> >  
> >          </variablelist>
> >  
> > +        <para>Note that -a, -c, -d and -p are mutually exclusive.</para>
> > +
> >          <para>An example session using the loader to create an input file and
> >          uploading it might look like this:</para>
> >  
> > Index: doc/man/shp2pgsql.1
> > ===================================================================
> > RCS file: /home/cvs/postgis/postgis/doc/man/shp2pgsql.1,v
> > retrieving revision 1.3
> > diff -u -r1.3 shp2pgsql.1
> > --- doc/man/shp2pgsql.1	5 Apr 2005 13:43:50 -0000	1.3
> > +++ doc/man/shp2pgsql.1	5 Apr 2005 20:39:38 -0000
> > @@ -26,8 +26,14 @@
> >  Creates a new table and populates it from the Shape file. This is the default mode.
> >  
> >  .TP 
> > +\fB\-p\fR
> > +Only produces the table creation SQL code, without adding any actual data. This can
> > +be used if you need to completely separate the table creation and data loading steps.
> > +
> > +.TP 
> >  \fB\-D\fR
> > -Use the PostgreSQL "dump" format for the output data. This can be combined with -d, -a and -c and is much faster to load than the default "insert" SQL format. Use this for very large data sets.
> > +Use the PostgreSQL "dump" format for the output data. This can be combined with -a, -c and -d.
> > +It is much faster to load than the default "insert" SQL format. Use this for very large data sets.
> >  
> >  .TP 
> >  \fB\-s\fR <\fISRID\fR>
> > @@ -47,6 +53,9 @@
> >  Note that this will introduce coordinate drifts and will drop
> >  M values from shapefiles.
> >  
> > +.LP
> > +Note that -a, -c, -d and -p are mutually exclusive.
> > +
> >  .SH "EXAMPLES"
> >  .LP 
> >  An example session using the loader to create an input file and uploading it might look like this:
> > Index: loader/README.shp2pgsql
> > ===================================================================
> > RCS file: /home/cvs/postgis/postgis/loader/README.shp2pgsql,v
> > retrieving revision 1.3
> > diff -u -r1.3 README.shp2pgsql
> > --- loader/README.shp2pgsql	4 May 2002 22:44:04 -0000	1.3
> > +++ loader/README.shp2pgsql	5 Apr 2005 20:39:38 -0000
> > @@ -38,19 +38,23 @@
> >  
> >  The options are as follows:
> >  
> > -(-a || -c || -d) these options are mutually exclusive.
> > +(-a || -c || -d || -p) these options are mutually exclusive.
> >  
> >    -a    Append mode. Do not delete the target table or try to create
> >          a new table, simple insert the data into the existing table.
> >          A table will have to exist for this to work, it is usually
> > -        used after a create mode as been run once.(mutually exclusive
> > -	with -c and -d)
> > +        used after a create mode as been run once or after -p. (mutually
> > +        exclusive with -c, -d and -p)
> >    -c    Create mode. This is the default mode is no other is specified.
> >  	Create a new table and upload the data into that table.
> > -	(mutually exclusive with -a and -d)
> > +	(mutually eclusive with -a, -d and -p)
> >    -d    Delete mode. Delete the database table named <tablename>, then
> >  	create a new one with that name before uploading the data into
> > -	the new empty database table.(mutually exclusive with -a and -c)
> > +	the new empty database table. (mutually exclusive with -a, -c 
> > +        and -p)
> > +  -p    Prepare mode. Read the table schema from the shape file and 
> > +        create the new table, but do not insert any data. (mutually
> > +        exclusive with -a, -c and -d)
> >  
> >    -D Dump. When inserting the data into the table use 'dump' format.
> >  	Dump format is used by PostgreSQL for large data dumps and 
> > Index: loader/shp2pgsql.c
> > ===================================================================
> > RCS file: /home/cvs/postgis/postgis/loader/shp2pgsql.c,v
> > retrieving revision 1.84
> > diff -u -r1.84 shp2pgsql.c
> > --- loader/shp2pgsql.c	4 Apr 2005 20:51:26 -0000	1.84
> > +++ loader/shp2pgsql.c	5 Apr 2005 20:39:40 -0000
> > @@ -532,9 +532,9 @@
> >  	printf("BEGIN;\n");
> >  
> >  	//if opt is 'a' do nothing, go straight to making inserts
> > -	if(opt == 'c' || opt == 'd') create_table();
> > +	if(opt == 'c' || opt == 'd' || opt == 'p') create_table();
> >  
> > -	if (dump_format){
> > +	if (dump_format && opt != 'p'){
> >  		if ( schema )
> >  		{
> >  			printf("COPY \"%s\".\"%s\" %s FROM stdin;\n",
> > @@ -555,8 +555,9 @@
> >   *   MAIN SHAPE OBJECTS SCAN
> >   * 
> >   **************************************************************/
> > -	for (j=0;j<num_entities; j++)
> > -	{
> > +	if (opt != 'p') { /*only if we do not have prepare mode*/
> > +	    for (j=0;j<num_entities; j++)
> > +	    {
> >  		//wrap a transaction block around each 250 inserts...
> >  		if ( ! dump_format )
> >  		{
> > @@ -646,12 +647,12 @@
> >  		
> >  		SHPDestroyObject(obj);	
> >  
> > -	} // END of MAIN SHAPE OBJECT LOOP
> > +	    } // END of MAIN SHAPE OBJECT LOOP
> >  
> >  
> > -	if ((dump_format) ) {
> > +	    if ((dump_format) ) {
> >  		printf("\\.\n");
> > -
> > +	    }
> >  	} 
> >  
> >  	free(col_names);
> > @@ -660,7 +661,7 @@
> >  		if ( schema )
> >  		{
> >  			printf("\nALTER TABLE ONLY \"%s\".\"%s\" ADD CONSTRAINT \"%s_pkey\" PRIMARY KEY (gid);\n",schema,table,table);
> > -			if(j > 1)
> > +			if(j > 1 && opt != 'p')
> >  			{
> >  				printf("SELECT setval ('\"%s\".\"%s_gid_seq\"', %i, true);\n", schema, table, j-1);
> >  			}
> > @@ -668,7 +669,7 @@
> >  		else
> >  		{
> >  			printf("\nALTER TABLE ONLY \"%s\" ADD CONSTRAINT \"%s_pkey\" PRIMARY KEY (gid);\n",table,table);
> > -			if(j > 1){
> > +			if(j > 1 && opt != 'p'){
> >  				printf("SELECT setval ('\"%s_gid_seq\"', %i, true);\n", table, j-1);
> >  			}
> >  		}
> > @@ -783,13 +784,14 @@
> >  	fprintf(stderr, "OPTIONS:\n");
> >  	fprintf(stderr, "  -s <srid>  Set the SRID field. If not specified it defaults to -1.\n");
> >  	fprintf(stderr, "\n");
> > -	fprintf(stderr, "  (-d|a|c) These are mutually exclusive options:\n");
> > +	fprintf(stderr, "  (-d|a|c|p) These are mutually exclusive options:\n");
> >  	fprintf(stderr, "      -d  Drops the table , then recreates it and populates\n");
> >  	fprintf(stderr, "          it with current shape file data.\n");
> >  	fprintf(stderr, "      -a  Appends shape file into current table, must be\n");
> >  	fprintf(stderr, "          exactly the same table schema.\n");
> >  	fprintf(stderr, "      -c  Creates a new table and populates it, this is the\n");
> >  	fprintf(stderr, "          default if you do not specify any options.\n");
> > +	fprintf(stderr, "      -p  Prepare mode, only creates the table\n");
> >  	fprintf(stderr, "\n");
> >  	fprintf(stderr, "  -g <geometry_column> Specify the name of the geometry column\n");
> >  	fprintf(stderr, "     (mostly useful in append mode).\n");
> > @@ -1372,7 +1374,7 @@
> >  	int curindex=0;
> >  	char  *ptr;
> >  
> > -	while ((c = getopt(ARGC, ARGV, "kcdaDs:g:iW:w")) != EOF){
> > +	while ((c = getopt(ARGC, ARGV, "kcdapDs:g:iW:w")) != EOF){
> >                 switch (c) {
> >                 case 'c':
> >                      if (opt == ' ')
> > @@ -1392,6 +1394,12 @@
> >                      else
> >                           return 0;
> >                      break;
> > +	       case 'p':
> > +                    if (opt == ' ')
> > +                         opt ='p';
> > +                    else
> > +                         return 0;
> > +                    break;
> >  	       case 'D':
> >  		    dump_format =1;
> >                      break;
> 
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel



More information about the postgis-devel mailing list