[GRASS-dev] [GRASS GIS] #2692: v.in.ascii does not handle text in quotes
GRASS GIS
trac at osgeo.org
Thu Jun 18 00:54:29 PDT 2015
#2692: v.in.ascii does not handle text in quotes
-------------------------+-------------------------------------------------
Reporter: wenzeslaus | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 7.0.1
Component: Default | Version: svn-trunk
Resolution: | Keywords: CSV, doublequote, singlequote, text delimiter
CPU: Unspecified | Platform: Unspecified
-------------------------+-------------------------------------------------
Comment (by mlennert):
Replying to [comment:6 glynn]:
> Replying to [comment:3 glynn]:
>
> > For that, an explicit state machine is likely to be more legible than ad-hoc logic.
>
> Please test attachment:tokenise.diff
Great, thanks!
I propose two small changes (attached as tokenise_corrected.diff). The first fixes what seems to be just a typo in case A_END_RECORD: "*q++ - '\0';" (presumably meant to be "*q++ = '\0';"). The second is needed because, when we are in state AFTER_QUOTE and reach a delimiter, we have to go back to state S_START. Otherwise, if the next field again starts with a quote, that quote is treated as the second quote of an escaped (doubled) pair.
Using the following example:
{{{
echo "123|123|1|test1|'test2'|'\"test3\"'|'test''4'" | v.in.ascii in=- out=testtext text=singlequote --o
}}}
With your patch:
{{{
> v.db.select testtext
cat|int_1|int_2|int_3|str_1|str_2|str_3|str_4
1|123|123|1|test1|test2|'"test3"|'test'4t''4'
}}}
With the correction:
{{{
> v.db.select testtext
cat|int_1|int_2|int_3|str_1|str_2|str_3|str_4
1|123|123|1|test1|test2|"test3"|test'4
}}}
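For reference, the corrected state machine described above can be sketched roughly as follows. This is not the code from tokenise.diff or tokenise_corrected.diff (neither is reproduced here). It is a minimal, self-contained sketch assuming in-place rewriting of the input buffer and the convention that a doubled quote inside a quoted field stands for one literal quote. Only S_START and AFTER_QUOTE are named in the ticket; the other state names and the function tokenize_quoted() are hypothetical.

```c
#include <assert.h>

/* Hypothetical state names: only S_START and AFTER_QUOTE appear in the
 * ticket; this is a sketch, not the code in tokenise.diff. */
enum tok_state { S_START, S_IN_FIELD, S_IN_QUOTE, S_AFTER_QUOTE };

/* Split buf in place on delim. Fields may be wrapped in quote; inside a
 * quoted field, delim is literal and a doubled quote stands for one quote.
 * Stores up to max_fields pointers into fields and returns the field count,
 * or -1 for an unterminated quote, text after a closing quote, or too many
 * fields. */
int tokenize_quoted(char *buf, char delim, char quote,
                    char **fields, int max_fields)
{
    enum tok_state state = S_START;
    char *p = buf;     /* read position */
    char *q = buf;     /* write position */
    char *start = buf; /* first character of the current field */
    int n = 0;

    for (;;) {
        /* Output never overtakes input, so rewriting in place is safe. */
        assert(q <= p);
        char c = *p++;
        int end_field = 0, done = 0;

        switch (state) {
        case S_START:
        case S_IN_FIELD:
            if (c == '\0')
                end_field = done = 1;
            else if (c == delim)
                end_field = 1;
            else if (c == quote && state == S_START)
                state = S_IN_QUOTE;    /* opening quote is dropped */
            else {
                *q++ = c;              /* quotes mid-field are literal */
                state = S_IN_FIELD;
            }
            break;
        case S_IN_QUOTE:
            if (c == '\0')
                return -1;             /* unterminated quoted field */
            if (c == quote)
                state = S_AFTER_QUOTE; /* closing quote, or first of a pair */
            else
                *q++ = c;              /* delimiters are literal here */
            break;
        case S_AFTER_QUOTE:
            if (c == quote) {
                *q++ = quote;          /* doubled quote -> one literal quote */
                state = S_IN_QUOTE;
            }
            else if (c == '\0')
                end_field = done = 1;
            else if (c == delim)
                end_field = 1;         /* state is reset to S_START below */
            else
                return -1;             /* text after a closing quote */
            break;
        }

        if (end_field) {
            if (n >= max_fields)
                return -1;
            *q++ = '\0';               /* terminate the field in place */
            fields[n++] = start;
            start = q;
            state = S_START;           /* the next field may open with a quote */
        }
        if (done)
            return n;
    }
}
```

Applied to the example line above with delim '|' and quote '\'', this yields the seven fields 123, 123, 1, test1, test2, "test3" and test'4, the same values as the corrected output. Without the `state = S_START;` reset, a field following a quoted field would begin in S_AFTER_QUOTE, so its opening quote would be emitted as a literal quote, as in the output with the original patch.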
>
> An external library may be worth using for improved fault-tolerance (CSV is a rather loose "standard", to say the least). But any such dependency should be
> a. on specific modules (e.g. v.in.ascii), not lib/gis (i.e. G_tokenize), and
> b. an optional alternative to G_tokenize(), i.e. modules should still compile and work if the library isn't available.
>
> Python is far too heavyweight a dependency for such a task.
Well, I had thought about a new module, v.in.ascii2/v.in.csv, based on the Python csv module. Since Python is a dependency anyway, this shouldn't be a problem at the module level. But I think your patch solves this particular bug, and we can leave the handling of more complex CSV files to other tools, which people can use to prepare the data for v.in.ascii.
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2692#comment:7>
GRASS GIS <http://grass.osgeo.org>