<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Parsing CSV files is one of the nastiest computing problems around. Very frequently, CSV files will have unparsable lines. Ogr2ogr is not going to solve this problem for you - it will only consume Correctly Formatted CSV files.<div><br></div><div>Before you can even get around to handling the problem of unparsable lines, oftentimes a character set conversion is required. There are, unfortunately, way too many folks who publish CSV files, shapefiles, and SQL dumps which contain UTF-8 multibyte encoding sequences saved in ISO-8859-1 file encoding. I need to run iconv on roughly 90% of the shapefiles I load with shp2pgsql - generally any shapefile (or CSV or SQL dump) which was produced by a North American or Western European person which contains international data. This class of folk seem to believe that since ISO 8859-1 or ISO 8859-15 works for their own character set, it works for the entire world. In 2013, there is absolutely no reason for anyone on this planet to be encoding in something other than UTF-8 - disk space and bandwidth is cheap enough now and in the areas of the world where it's not yet cheap enough, UTF-8 is the only choice anyway.</div><div><br></div><div>What causes unparsable lines in CSV? Quotes where there aren't supposed to be, missing quotes,missing fields, ambiguously utilised and unescaped delimiter characters, etc. Manual correction is difficult when you are handling, for example, an 80 thousand line file.</div><div><br></div><div>Here is a tool I wrote to fix CSV files from one particularly nasty source. It changes a file delimited by commas into a file delimited by tabs, as well as correcting a whole host of other common problems. I have found that it works quite well, in general, for multiple sources of nastily encoded CSV files.</div><div><br></div><div><br></div><div>Use it like this:</div><div><br></div><div>fix-csv.pl nasty.csv > fixed.csv</div><div><br></div><div><br></div><div><br></div><div>#!/usr/bin/perl -w</div><div><br></div><div><div>while (<>)</div><div>{</div><div> # 1. remove ^M</div><div> $_ =~ s/\r//g;</div><div><br></div><div> # 2. change commas at beginning of line to tabs</div><div> $_ =~ s/^,/\t/;</div><div><br></div><div> # 3. change "," to "\t" ("tab")</div><div> $_ =~ s/","/"\t"/g;</div><div><br></div><div> # 4. change ", to \t (tab)</div><div> $_ =~ s/",/\t/g;</div><div><br></div><div> # 5. change ," to \t (tab)</div><div> $_ =~ s/,"/\t/g;</div><div><br></div><div> # 6. change \t, to \t\t (double tab)</div><div> $_ =~ s/\t,/\t\t/g;</div><div><br></div><div> # 7. change \t, to \t\t (double tab)</div><div> $_ =~ s/\t,/\t\t/g;</div><div><br></div><div> # 8. change \t, to \t\t (double tab)</div><div> $_ =~ s/\t,/\t\t/g;</div><div><br></div><div> # 9. change \t, to \t\t (double tab)</div><div> $_ =~ s/\t,/\t\t/g;</div><div><br></div><div> # 10. remove quotes</div><div> $_ =~ s/"//g;</div><div><br></div><div> print $_;</div><div>}</div><div><br></div></div><div><br></div><div>-mike</div><div><br></div><div><br></div><div><br><div><br></div><div><div><br></div><div><br></div><div><br><div><div>On Apr 13, 2013, at 10:41 PM, Nathan Hemenway <<a href="mailto:nhemenway@kksbolash.com">nhemenway@kksbolash.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<div text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">As Richard Greenwood noted, ogr2ogr
works great for importing CSV files into Postgres tables.<br>
In fact, your CSV file does not necessarily even need to have any
geometry related columns for this to work.<br>
<br>
It is all documented here very nicely:<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.gdal.org/ogr/drv_csv.html">http://www.gdal.org/ogr/drv_csv.html</a><br>
<br>
<br>
<br>
On 4/13/2013 5:54 AM, Margie Roswell wrote:<br>
</div>
<blockquote cite="mid:CAPAc6WFQdCtRtu5t2shxLAc=br5mJXBMd+XyZiCkogZOX4PupA@mail.gmail.com" type="cite">I figured out that COPY is used to import a file into
a table.<br>
<br>
(Actually, even though I don't speak a word of Portuguese, a
Portuguese video did a great job of showing copying first into a
temp table: <a moz-do-not-send="true" href="https://www.youtube.com/watch?v=CwsnPPub9v4">https://www.youtube.com/watch?v=CwsnPPub9v4</a>
)<br>
<br>
But the shp2pgsql thread yesterday got me thinking: to import a
shapefile, they've created a utility so that <i>we don't have to
set up the structure of the table in advance</i><br>
<br>
Is there something similar on the CSV side?<br>
<br>
My guess is that <a moz-do-not-send="true" href="http://www.safe.com/solutions/for-databases/postgis/">http://www.safe.com/solutions/for-databases/postgis/</a><br>
might have something, but I can't quite put my finger on it.<br>
<br>
Details on that? <br>
<br>
Also, I'm sure there's a fee for that. Are there any other
strategies for making the table creation more efficient, when
importing a file to a table?<br>
<br>
I suppose I could copy and paste the field names from the top row
in the original Excel spreadsheet, and then manually reformat them
into a CREATE NEW TABLE statement by adding all the field types.
What strategies (like the shp2pgsql utility?) reduce the pain of
importing a text file?<br>
<br>
Margie<br>
<br clear="all">
<div>
<div>--</div>
<div><a moz-do-not-send="true" href="http://farmbillprimer.org/" target="_blank">http://FarmBillPrimer.org</a></div>
<a moz-do-not-send="true" href="http://www.baltimoreurbanag.org/" target="_blank">http://www.BaltimoreUrbanAg.org</a> (Please
send events; This site is hungry.)<br>
<a moz-do-not-send="true" href="http://www.excellentnutrition.org/" target="_blank">http://www.ExcellentNutrition.org</a><br>
<a moz-do-not-send="true" href="http://www.packtpub.com/drupal-5-views-recipes/book" target="_blank">http://www.packtpub.com/drupal-5-views-recipes/book</a><br>
</div>
<br>
<br>
<div class="gmail_quote">On Fri, Apr 12, 2013 at 6:14 PM, David
Rush <span dir="ltr"><<a moz-do-not-send="true" href="mailto:david@rushtone.com" target="_blank">david@rushtone.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Total noob to PostgreSQL and PostGIS here.
Trying to follow examples from the Obe+Hsu book (1st Ed) in
using shp2pgsql from the command line to import some tiger
county data.
<div><br>
</div>
<div>I ran this:</div>
<div><br>
</div>
<div>shp2pgsql -s 4269 -g geom_4269 -W LATIN1
c:/users/david/downloads/tl_2012_us_county/tl_2012_us_county.shp
<a href="http://public.us">public.us</a>_counties psql -h localhost -U postgres -p 5432
-d mygisdb <br>
</div>
<div>
<br>
</div>
<div>Thanks to an archive of this list that led me to add
the "-W LATIN1" param (it was failing with an error w/out
it).</div>
<div><br>
</div>
<div>Now the command runs for several minutes, spitting out
mostly zillions of hex digits, with no overt errors. Last
line it spits out is "COMMIT;".</div>
<div><br>
</div>
<div>But when I go into psql, I can't find the
<a href="http://public.us">public.us</a>_counties table that I thought I just added
created:</div>
<div><br>
</div>
<div>
<div>mygisdb=# select * from <a href="http://public.us">public.us</a>_counties;</div>
<div>ERROR: relation "<a href="http://public.us">public.us</a>_counties" does not exist</div>
<div>LINE 1: select * from <a href="http://public.us">public.us</a>_counties;</div>
<div> ^</div>
<div>mygisdb=# select table_schema, table_name,table_type
from information_schema.tables where</div>
<div>table_schema not in
('pg_catalog','information_schema');</div>
<div> table_schema | table_name | table_type</div>
<div>--------------+-------------------+------------</div>
<div> public | geography_columns | VIEW</div>
<div> public | geometry_columns | VIEW</div>
<div> public | spatial_ref_sys | BASE TABLE</div>
<div> ch01 | lu_franchises | BASE TABLE</div>
<div> ch01 | fastfoods | BASE TABLE</div>
<div>(5 rows)</div>
<div><br>
</div>
<div>Poking around with pgAdmin III I can't find in
anywhere, either.</div>
<div><br>
</div>
<div>Is the new table us_counties hiding somewhere? Or
did it quietly fail? Or what?</div>
<span class="HOEnZb"><font color="#888888">
<div><br>
</div>
<div>David</div>
</font></span></div>
</div>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
<a moz-do-not-send="true" href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>
<a moz-do-not-send="true" href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br>
<br>
</blockquote>
</div>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
postgis-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a>
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
.nathan.</pre>
</div>
_______________________________________________<br>postgis-users mailing list<br><a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users<br></blockquote></div><br></div></div></div></body></html>