[GRASS5] [PATCHES]: pg.in.dbf

Thierry Laronde tlaronde at polynum.com
Sun Oct 12 17:05:49 EDT 2003


Hello,

I have fixed several problems and made ---I think--- some improvements.
Some bugs were serious ones. In particular, the `init' function in
pgdump.c was not initializing correctly a structure causing random 
data to be passed to PG. 
I have not looked in the bug base, but these should close some bugs
referenced there.

Here is the ChangeLog :


main.c:
	- Added option to set the DELIMITER to a user defined character
	value (only used for super-user when using COPY command);
	-> Now PgDumpFromDBF takes 3 arguments: filename, mode and delimiter
	- PgDumpFromDBF proto removed from main.c and added to 
	pgdump.h;
	- include <string.h> for strlen
	- include "pgdump.h" for PgDumpFromDBF proto
	- Default delimiter now changed to '\t';
	- Suppressed bogus parm structure ;
	- no_rattle -> normal_user: more explicit, match PgDumpFromDBF and
	... well the way the assignement was made was queer: changed too!
	- Added comments

pgdump.h:
	- created for PgDumpFromDBF proto

pgdump.c:
	-XXX in the `init' function used to initialize the my_string struct
	the initial string pointed to by the data member was not initialized
	to NULL. 
	Hence, the use of the function `append', not taking into account 
	the length of the string and using `strcat' directly concatenated 
	the random data found where my_string.data was pointing until it 
	found some '\0'
	=> Now my_string.data points to a null string at initialization;
	- include "pgdump.h" for PgDumpFromDBF proto
	- modified PgDumpFromDBF: now takes 4 arguments:
		filename, normal_user, delimiter, null_string
	- modified DBFDumpASCII: now takes 3 arguments:
		DBFHandle, File pointer, delimiter

-dbfopen.c:
	- the filename of the DBF file supposed to be opened was
	subject of extension changes, `strcmp' being used as if it was a
	matching function returning 0 if failed or not zero on success. 
	And indeed this is almost exactly the contrary.
	Furthermore, the combination of two conditions made the first 
	test succeeds in all cases, transforming every extension in ".dbf".

	-Last but not least, _without renaming or copying the file that
	was read not written_ this was this modified version of the 
	filename that was given to `fopen' resulting in a
	failure in every case except when the change to ".dbf" was a 
	null one, that is when the filename was already ending with ".dbf".


Cheers,
-- 
Thierry Laronde (Alceste) <tlaronde at polynum.org>
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
-------------- next part --------------
diff -uN pg.in.dbf.old/ChangeLog pg.in.dbf/ChangeLog
--- pg.in.dbf.old/ChangeLog	Thu Jan  1 01:00:00 1970
+++ pg.in.dbf/ChangeLog	Sun Oct 12 22:14:40 2003
@@ -0,0 +1,46 @@
+main.c:
+	- Added option to set the DELIMITER to a user defined character
+	value (only used for super-user when using COPY command);
+	-> Now PgDumpFromDBF takes 3 arguments: filename, mode and delimiter
+	- PgDumpFromDBF proto removed from main.c and added to 
+	pgdump.h;
+	- include <string.h> for strlen
+	- include "pgdump.h" for PgDumpFromDBF proto
+	- Default delimiter now changed to '\t';
+	- Suppressed bogus parm structure ;
+	- no_rattle -> normal_user: more explicit, match PgDumpFromDBF and
+	... well the way the assignement was made was queer: changed too!
+	- Added comments
+
+pgdump.h:
+	- created for PgDumpFromDBF proto
+
+pgdump.c:
+	-XXX in the `init' function used to initialize the my_string struct
+	the initial string pointed to by the data member was not initialized
+	to NULL. 
+	Hence, the use of the function `append', not taking into account 
+	the length of the string and using `strcat' directly concatenated 
+	the random data found where my_string.data was pointing until it 
+	found some '\0'
+	=> Now my_string.data points to a null string at initialization;
+	- include "pgdump.h" for PgDumpFromDBF proto
+	- modified PgDumpFromDBF: now takes 4 arguments:
+		filename, normal_user, delimiter, null_string
+	- modified DBFDumpASCII: now takes 3 arguments:
+		DBFHandle, File pointer, delimiter
+
+-dbfopen.c:
+	- the filename of the DBF file supposed to be opened was
+	subject of extension changes, `strcmp' being used as if it was a
+	matching function returning 0 if failed or not zero on success. 
+	And indeed this is almost exactly the contrary.
+	Furthermore, the combination of two conditions made the first 
+	test succeeds in all cases, transforming every extension in ".dbf".
+
+	-Last but not least, _without renaming or copying the file that
+	was read not written_ this was this modified version of the 
+	filename that was given to `fopen' resulting in a
+	failure in every case except when the change to ".dbf" was a 
+	null one, that is when the filename was already ending with ".dbf".
+
diff -uN pg.in.dbf.old/dbfopen.c pg.in.dbf/dbfopen.c
--- pg.in.dbf.old/dbfopen.c	Sun Oct 12 22:21:52 2003
+++ pg.in.dbf/dbfopen.c	Sun Oct 12 19:37:20 2003
@@ -185,7 +185,6 @@
     DBFHandle		psDBF;
     uchar		*pabyBuf;
     int			nFields, nRecords, nHeadLen, nRecLen, iField;
-    char	        *pszDBFFilename;
 
 /* -------------------------------------------------------------------- */
 /*      We only allow the access strings "rb" and "r+".                  */
@@ -195,36 +194,16 @@
         return( NULL );
     
 /* -------------------------------------------------------------------- */
-/*	Ensure the extension is converted to dbf or DBF if it is 	*/
-/*	currently .shp or .shx.						*/    
-/* -------------------------------------------------------------------- */
-    pszDBFFilename = (char *) malloc(strlen(pszFilename)+1);
-    strcpy( pszDBFFilename, pszFilename );
-    
-    if( strcmp(pszFilename+strlen(pszFilename)-4,".shp")
-        || strcmp(pszFilename+strlen(pszFilename)-4,".shx") )
-    {
-        strcpy( pszDBFFilename+strlen(pszDBFFilename)-4, ".dbf");
-    }
-    else if( strcmp(pszFilename+strlen(pszFilename)-4,".SHP")
-             || strcmp(pszFilename+strlen(pszFilename)-4,".SHX") )
-    {
-        strcpy( pszDBFFilename+strlen(pszDBFFilename)-4, ".DBF");
-    }
-
-/* -------------------------------------------------------------------- */
 /*      Open the file.                                                  */
 /* -------------------------------------------------------------------- */
     psDBF = (DBFHandle) calloc( 1, sizeof(DBFInfo) );
-    psDBF->fp = fopen( pszDBFFilename, pszAccess );
+    psDBF->fp = fopen( pszFilename, pszAccess );
     if( psDBF->fp == NULL )
         return( NULL );
-
+	
     psDBF->bNoHeader = FALSE;
     psDBF->nCurrentRecord = -1;
     psDBF->bCurrentRecordModified = FALSE;
-
-    free( pszDBFFilename );
 
 /* -------------------------------------------------------------------- */
 /*  Read Table Header info                                              */
diff -uN pg.in.dbf.old/main.c pg.in.dbf/main.c
--- pg.in.dbf.old/main.c	Sun Oct 12 22:21:52 2003
+++ pg.in.dbf/main.c	Sun Oct 12 22:25:22 2003
@@ -1,28 +1,40 @@
 #include <stdio.h>
 #include <stdlib.h>
+#include <string.h>
 #include "gis.h"
 #include "shapefil.h"
 #include "glocale.h"
+#include "pgdump.h"
 
 /******************************************************************/
  /*02/2000 dbf dump to Postgres
     *   Alex Shevlakov sixote at yahoo.com
     ***************************************************************** */
-int PgDumpFromDBF(char *, int);
+/*
+ * 10/2003 Thierry Laronde tlaronde at polynum.org
+ *	- WARNING the default delimiter is now a tabulation, not a comma!
+ *	But this make only sense for PG super-user since this is only used
+ *	for COPY FROM command;
+ *	- added option "delimiter" to allow the explicit definition of 
+ *	delimiter (character) in replacement of the default '\t' ;
+ * 	- added option "null" (null_string) to allow the explicit definition
+ *	of a string that PG should consider the expression of a NULL value;
+ *	- change no_rattle to normal_user to let the variable match the name
+ *	found in PgDumpFromDBF and... replace the "special" assignement
+ *	- misc name changes and simplifications
+ *	- added comments
+ */
+
+#define ERR_OPTION	1
 
 int main(int argc, char *argv[])
 {
-
-
-    int no_rattle;
+    int normal_user;
     char *infile;
+	char delim;	/* Character to use as the DELIMITER */
+	short length;	/* Used when parsing the DELIMITER string */
 
-
-    struct
-    {
-	struct Option *input, *dumpmode;
-    }
-    parm;
+	struct Option *input, *dumpmode, *delimiter, *null_string;
 
     /* Are we running in Grass environment ? */
 
@@ -30,35 +42,84 @@
 
     /* define the different options */
 
-    parm.input = G_define_option();
-    parm.input->key = "input";
-    parm.input->type = TYPE_STRING;
-    parm.input->required = YES;
-    parm.input->description = _("Name of .dbf file to be imported");
-
-
-    parm.dumpmode = G_define_option();
-    parm.dumpmode->key = "dumpmode";
-    parm.dumpmode->type = TYPE_STRING;
-    parm.dumpmode->required = NO;
-    parm.dumpmode->description =
+	/* Get the dbf input file name */
+    input = G_define_option();
+    input->key = "input";
+    input->type = TYPE_STRING;
+    input->required = YES;
+    input->description = _("Name of DBF file to be imported");
+
+	/* The method to insert new values will change whether you are 
+	 * PG super-user or not  */
+    dumpmode = G_define_option();
+    dumpmode->key = "dumpmode";
+    dumpmode->type = TYPE_STRING;
+    dumpmode->required = NO;
+    dumpmode->description =
 	_("Admin/normal user dump mode (Default = Postgres super-user)");
-    parm.dumpmode->options = "admin,normal";
-    parm.dumpmode->answer     = "admin";
-
+    dumpmode->options = "admin,normal";
+    dumpmode->answer     = "admin";
 
+	/* If the fields one imports have the default delimiter embedded
+	 * in them, using this delimiter is doomed to make PG abort. So
+	 * allow the change.
+	 */
+	delimiter = G_define_option();
+	delimiter->key = "delimiter";
+	delimiter->type = TYPE_STRING;
+	delimiter->required = NO;
+	delimiter->description =
+		_("Delimiter (character) to use");
+	delimiter->answer = "\\t";
+	
+	/* 
+	 * Allow the specification of a NULL string
+	 */
+	null_string = G_define_option();
+	null_string->key = "null";
+	null_string->type = TYPE_STRING;
+	null_string->required = NO;
+	null_string->description =
+		_("String to use as NULL");
+	null_string->answer = "";
     /* get options and test their validity */
 
     if (G_parser(argc, argv))
 	exit(-1);
 
-    infile = parm.input->answer;
-    no_rattle = (int) parm.dumpmode->answer;
-
-
-    PgDumpFromDBF(infile, no_rattle);
+    infile = input->answer;
 
+	/* Transform the delimiter string in a single char */
+	length = strlen(delimiter->answer);
+	if (length == 1)
+		delim = delimiter->answer[0];
+	else if (length == 2 && delimiter->answer[0] == '\\') 
+		switch (delimiter->answer[1]) {
+		case 'a': delim = '\a'; break;
+		case 'b': delim = '\b'; break;
+		case 'f': delim = '\f'; break;
+		case 'n':
+			fprintf(stderr, _("Since we are feeding PG with lines, \\n is not a great idea my friend!\n"));
+			exit(ERR_OPTION);
+		case 'r': delim = '\r'; break;
+		case 't': delim = '\t'; break;
+		case 'v': delim = '\v'; break;
+		default:
+		fprintf(stderr,_("Invalid escaped sequence given for the delimiter!\n"));
+		exit(ERR_OPTION);
+	} else {
+		fprintf(stderr, 
+		  _("The delimiter must be a character (escaped sequences are OK)!\n"));
+		exit(ERR_OPTION);
+	}
+
+	/* To be or not to be (a super-user...) */
+	if (!strcmp(dumpmode->answer, "admin"))
+		normal_user = 0; /* Not human, superman! */
+	else 
+		normal_user = 1; /* sigh... Yes, just an average one */
 
+    PgDumpFromDBF(infile, normal_user, delim, null_string->answer);
 
     exit(0);
 }
diff -uN pg.in.dbf.old/pgdump.c pg.in.dbf/pgdump.c
--- pg.in.dbf.old/pgdump.c	Sun Oct 12 22:21:52 2003
+++ pg.in.dbf/pgdump.c	Sun Oct 12 22:08:23 2003
@@ -20,6 +20,18 @@
  * DEALINGS IN THE SOFTWARE.
  ******************************************************************************
  *
+ * 10/2003 Thierry Laronde <tlaronde at polynum.org> 
+	- change PgDumpFromDBF in order to specify a custom DELIMITER
+	- change DBFDumpASCII to take a parameter for specifying the
+	delimiter
+	- include "pgdump.h" for PgDumpFromDBF proto
+	- change "float4" to "numeric" in order to not loose in precision
+	XXX The infos I have found tell that DBF numeric field can be up
+	to 20 digits long; numeric by default is (30,6) BUT can the numeric
+	field of the DBF have more than a 6 scale length? In this case a
+	safe way would be to define the conversion to numeric(40,20) but
+	isn't it a bit too much?
+ *
  * 12/2000 Federico Ponchio ponhio at dm.unipi.it (minor changes to memory 
  *  allocation for inserting in normal mode)
  * 02/2000 Alex Shevlakov sixote at yahoo.com	
@@ -35,6 +47,7 @@
 #include "shapefil.h"
 #include <libpq-fe.h>
 #include "glocale.h"
+#include "pgdump.h"
 
 typedef unsigned char uchar;
 
@@ -54,6 +67,11 @@
 	fprintf(stderr, _("Failed to allocate new memory.\n"));
 	exit(-1);
     }
+	/* Initialize to null string or `strcat' used in `append' will 
+	 * concatenate every random data available here till it
+	 * finds some '\0' !
+	 */
+	*str->data = '\0'; 
     str->len = 1;
     str->totlen = 128;
     return (0);
@@ -108,10 +126,10 @@
 /************************************************************************/
 /*                          DBFDumpASCII()                          	*/
 /*                                                                      */
-/*     		 Dumps DBF to comma-separated list. 			*/
+/*     		 Dumps DBF to DELIMITER-separated list. 			*/
 /************************************************************************/
 
-int DBFDumpASCII(DBFHandle psDBF, FILE * fp)
+int DBFDumpASCII(DBFHandle psDBF, FILE * fp, char delim)
 {
     int nRecordOffset;
     uchar *pabyRec;
@@ -123,6 +141,10 @@
     static int nStringFieldLen = 0;
     static char single_line[4096] = "";
 
+	/* transform the delim char in a string for insertion via %s */
+	char delim_string[2]; 
+	delim_string[0] = delim;
+	delim_string[1] = '\0';
 
     for (hEntity = 0; hEntity < psDBF->nRecords; hEntity++) {
 
@@ -177,9 +199,9 @@
 	    if (!iField)
 		snprintf(tmp_buf, 1024, "%s", pszStringField);
 	    else if (iField == psDBF->nFields - 1)
-		snprintf(tmp_buf, 1024, ",%s\n", pszStringField);
+		snprintf(tmp_buf, 1024, "%s%s\n", delim_string, pszStringField);
 	    else
-		snprintf(tmp_buf, 1024, ",%s", pszStringField);
+		snprintf(tmp_buf, 1024, "%s%s", delim_string, pszStringField);
 
 
 	    strncat(single_line, tmp_buf, strlen(tmp_buf));
@@ -200,7 +222,7 @@
     return 0;
 }
 
-int PgDumpFromDBF(char *infile, int normal_user)
+int PgDumpFromDBF(char *infile, int normal_user, char delim, const char *null_string)
 {
 
     DBFHandle hDBF;
@@ -208,6 +230,7 @@
 
     int i, j;
     char *dbname, *pp;
+					
 
     struct my_string SQL_create;
     struct my_string SQL_insert;
@@ -223,12 +246,16 @@
     FILE *fp;
     char *tmpfile_nm;
 
+	/* transform the delim char in a string for insertion via %s */
+	char delim_string[2]; 
+	delim_string[0] = delim;
+	delim_string[1] = '\0';
+
     init(&SQL_create);
     init(&SQL_insert);
     init(&chunks);
     init(&fldstrng);
 
-
     /* Check DATABASE env variable */
     if ((dbname = G__getenv("PG_DBASE")) == NULL) {
 	fprintf(stderr,
@@ -280,7 +307,7 @@
 		fld = "int8";
 	    break;
 	case 2:
-	    fld = "float4";
+	    fld = "numeric";
 	    break;
 	case 3:
 	    G_fatal_error(_("Invalid field type - bailing out"));
@@ -383,7 +410,7 @@
 	    exit(-1);
 	}
 
-	DBFDumpASCII(hDBF, fp);
+	DBFDumpASCII(hDBF, fp, delim);
 
 	fclose(fp);
 
@@ -392,7 +419,11 @@
 	append(&SQL_insert, name);
 	append(&SQL_insert, " from '");
 	append(&SQL_insert, tmpfile_nm);
-	append(&SQL_insert, "' using delimiters ','");
+	append(&SQL_insert, "' using delimiters '");
+	append(&SQL_insert, delim_string);
+	append(&SQL_insert, "' with null as '");
+	append(&SQL_insert, null_string);
+	append(&SQL_insert, "'");
 
 	fprintf(stdout, _("Executing %s\n"), SQL_insert.data);
 
diff -uN pg.in.dbf.old/pgdump.h pg.in.dbf/pgdump.h
--- pg.in.dbf.old/pgdump.h	Thu Jan  1 01:00:00 1970
+++ pg.in.dbf/pgdump.h	Sun Oct 12 21:17:33 2003
@@ -0,0 +1,6 @@
+#ifndef PGDUMP_H
+#define PGDUMP_H
+
+int PgDumpFromDBF(char *filename, int normal_user, char delim, const char *null_string);
+
+#endif
-------------- next part --------------
--- /home/tlaronde/src/grass50_exp_2003_09_06/html/html/pg.in.dbf.html	Fri May 24 05:12:22 2002
+++ html/html/pg.in.dbf.html	Sun Oct 12 23:00:44 2003
@@ -17,20 +17,36 @@
 <b>pg.in.dbf</b>
 <br><b>pg.in.dbf help</b>
 <br><b>pg.in.dbf input</b><i>=name</i> [<b>dumpmode</b><i>=name</i>]
+[<b>delimiter</b><i>=char</i>] [<b>null</b><i>=string</i>]
 <h2>
 DESCRIPTION</h2>
 <i>pg.in.dbf</i> imports a dBase database table into PostgreSQL for later
 use with GRASS-RDBMS modules.
-<p>You can also choose to dump data from the dbf file into text format
-file for further use. In such case,
-<br>you should type in the full Unix path to the new text file location
-when prompted.
-<p>The program can run in two modes: 'admin' is when the GRASS user has
-a superuser account in Postgres,
+
+<p>The program can run in two modes: 
+<br>
+<br>'admin' is when the GRASS user has a superuser account in Postgres,
 <br>'normal'&nbsp; - when the Postgres user is normal (not administrator).
 You should choose the mode adequately.
 <br>In the latter case, large files can take much time to import (as each
 line would be inserted separately).
+<p>When you have PostgreSQL admin privileges, the import of the DBF
+database is done via the <tt>COPY FROM</tt> command. This allows to
+tune several options :
+<br>
+<br>You will be asked if you want to save the dump of the data 
+from the dbf file into text format. In this case you <b>MUST</b>
+give an absolute path name, and you need, obviously, to have 
+the corresponding rights ;
+<br>
+<br>You can also specify a custom <b>delimiter</b> to be used to
+separate the fields (the default for the COPY FROM command is the
+tabulation);
+<br>
+<br>And finally you can specify a custom <b>null</b> string i.e. a
+string PostgreSQL has to interpret as a NULL value (the default is the
+empty string, meaning that void fields will be considered null).
+
 <h2>
 COMMAND LINE OPTIONS</h2>
 
@@ -42,13 +58,31 @@
 <b>input=name</b></dt>
 
 <dd>
-Name of .dbf file to be imported (example: 'simple.dbf'</dd>
+Name of .dbf file to be imported (example: 'simple.dbf')</dd>
 
 <dt>
 <b>dumpmode=name</b></dt>
 
 <dd>
-Admin/normal user dump mode (Default = Postgres super-user)</dd>
+Admin/normal user dump mode [Default = Postgres super-user]</dd>
+
+<dt>
+<b>delimiter=char</b>
+</dt>
+
+<dd>
+Specify char as the delimiter (escaped sequences are allowed). 
+<br>[Default = tabulation '\t']
+</dd>
+
+<dt>
+<b>null=string</b>
+</dt>
+
+<dd>
+Specify the string PostgreSQL will have to consider as a NULL value.
+[Default = empty string ""]
+</dd>
 
 <br>&nbsp;
 <p>&nbsp;</dl>
@@ -70,5 +104,6 @@
 <h2>
 AUTHOR</h2>
 Alex Shevlakov (sixote at yahoo.com)
+<br>Some additions and fixes by Thierry Laronde (tlaronde at polynum.org)
 </body>
 </html>


More information about the grass-dev mailing list