[GRASS-dev] Python and script header definitions for modules

Tue Jun 15 20:53:20 PDT 2021

On Sun, Jun 13, 2021 at 11:02 AM Huidae Cho <grass4u at gmail.com> wrote:

>
> Option 3 is readily available with no changes in the core library, which
> is great. It just takes mental effort to add the extra comment (# noqa:
> E501, especially this number, I'll keep forgetting it). Does it even solve
> no space between # and %, I hope?
>

In that case, you can write `#%` and no checks will care. When it is in a
docstring, we could drop the `#` part too. Or you somehow mark the
beginning and optionally the end and you can drop `%` or `% ` too, getting
close to YAML in the basic syntax (not in the value nesting or indentation).

You need `# noqa: E501` only when your lines are long, which is not always
the case. It should/would be part of the submitting guidelines for Python
code, so there would be no harm done if one needs to review those while
writing a module. :-)
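
For illustration, option 3 could look roughly like this in a script. Only
the `"""` ... `"""  # noqa: E501` wrapping comes from the proposal; the
module description is made up, and `header_lines` is a hypothetical helper
that merely shows the definition stays accessible from Python:

```python
"""
# %module
# % description: Computes a hypothetical index from a raster map; this description is intentionally longer than the line length limit.
# % keyword: raster
# %end
"""  # noqa: E501


def header_lines(docstring):
    """Return the parser definition lines found in a docstring."""
    return [
        line.strip()
        for line in docstring.splitlines()
        # Keep only the lines that carry the parser marker.
        if line.strip().startswith("# %")
    ]
```

In such a script, `header_lines(__doc__)` would return the four definition
lines from the module docstring.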

> Like option 5 as well, but can we embed YAML code as Python comments, not
> as a separate file, which I don't like. Maybe, as option 6, we write a YAML
> file and create a small utility that translates it to the current parser
> format in option 3 and replace it in the Python script? Maybe, then, some
> developers just write header definitions manually without YAML at all and
> running the extra script is an additional burden.
>

I'm not sure which option you prefer, or even whether you want to write
YAML at all, but YAML can be part of a comment, a docstring, or a separate
file. Keeping it in the Python file has its advantages: you always know you
have the file and you can programmatically access it from there. Generating
it and placing it into the file is quite doable, but R packages work in
such a way that many of the files in your package repo are generated, and
it becomes quite hard to navigate, I think, unless you are very familiar
with it.

The Python script itself actually calls the g.parser module, so with a
proper g.parser interface and wrappers in grass.script, we can be quite
flexible in what is accepted without modifying g.parser that much. I'm
thinking here of having the YAML as a (any type of) string in Python and
then passing it to `grass.script.parse_cli_yaml()`. Let's call this option
7. It is kind of 3 and 5 together, but with most of the YAML burden in
Python and maybe slower parsing. The nice thing is that you need the same
YAML-to-current-parser conversion as for option 6, so doing the conversion
part does not commit you to one solution or the other.
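
A minimal sketch of that conversion part, under stated assumptions: the
function name `to_parser_header` and the layout of the input dictionary are
hypothetical, and the dictionary stands in for whatever a YAML loader would
return for the definition. Nothing here is existing grass.script API.

```python
def to_parser_header(definition):
    """Convert a parsed interface definition to `# %` header lines.

    The layout of `definition` is an assumption made for this sketch:
    a dict mapping a section name ("module", "option", "flag") to a list
    of dicts with key-value pairs.
    """
    lines = []
    for section, entries in definition.items():
        for entry in entries:
            lines.append(f"# %{section}")
            for key, value in entry.items():
                # Repeatable items such as keywords are given as lists.
                values = value if isinstance(value, list) else [value]
                for item in values:
                    lines.append(f"# % {key}: {item}")
            lines.append("# %end")
    return "\n".join(lines)


definition = {
    "module": [{"description": "Example module", "keyword": ["raster", "example"]}],
    "option": [{"key": "input", "type": "string", "required": "yes"}],
}
print(to_parser_header(definition))
```

The same function would serve option 6 (write the result back into the
script) and option 7 (pass the result on to g.parser at run time).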

> Option 4 may not be too bad. We can just concatenate any lines with more
> than a certain number (4?) of leading spaces to the current definition.
>

From what I have seen in the source code, this is not a trivial change.

In terms of syntax, this again gets closer to YAML, which would put `|` in
place of the value on the first (zeroth) line and typically use 2 spaces of
indentation for the lines that follow.
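
To make the concatenation rule from option 4 concrete, here is a rough
sketch in Python. It is not how g.parser works (the non-trivial part is
changing the actual parser); the function name, the `# %` marker handling,
and the threshold of 4 spaces counted after the marker are all assumptions
for illustration:

```python
MARKER = "# %"


def join_continuations(lines, min_indent=4):
    """Append lines indented by at least `min_indent` spaces to the previous one."""
    joined = []
    for line in lines:
        if not line.startswith(MARKER):
            joined.append(line)
            continue
        body = line[len(MARKER):]
        # Count the spaces between the marker and the text.
        indent = len(body) - len(body.lstrip(" "))
        if joined and body.strip() and indent >= min_indent:
            # Deeply indented line: treat it as a continuation.
            joined[-1] = joined[-1] + " " + body.strip()
        else:
            joined.append(line)
    return joined


header = [
    "# %option",
    "# % key: method",
    "# % description: Interpolation method used",
    "# %     when resampling the input",
    "# %end",
]
print("\n".join(join_continuations(header)))
```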

> My biggest complaint is "#-space-%-space-key:-space-value". I just have to
> type too many spaces manually.
>

If you write it like that, it seems really frightening, but `# % key:
value` doesn't look that bad. I'm copy-pasting these a lot and I think a
lot of people do, so a space here and there is not a big deal. Since the
space between % and key is optional, you can also see a lot of
inconsistencies related to that. For typing, I don't like the whole `# % `
piece, with or without spaces.

I think we would design the YAML structure so that a typical line would be
`  key: value`, i.e., "<space><space>key:<space>value" assuming it is in a
separate file or string, not a comment.

> We could write a small script that handles this, but again, that's an
> extra effort. Is it possible to create hooks so we checkout "#%" and commit
> "#-space-%" automatically?
>

Perhaps a sed one-liner is all that is needed? This looks like what I used
to convert things in the source code:

sed -i 's/^#%/# %/g' */*.py

Significantly simpler than transitioning to YAML in a docstring, but that
would have many other benefits.
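
If the conversion had to run from a hook or another Python tool rather than
from the shell, the same substitution could be sketched like this. The
function name is made up for illustration; the regular expression mirrors
the sed expression above:

```python
import re


def add_space_after_hash(text):
    """Rewrite lines starting with `#%` so that they start with `# %`."""
    # MULTILINE makes ^ match at the start of every line, as in sed.
    return re.sub(r"^#%", "# %", text, flags=re.MULTILINE)


print(add_space_after_hash("#%module\n#% key: input\n#%end\n"))
```

Lines that already start with `# %` do not match the pattern and are left
unchanged, so running it repeatedly is harmless.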

> Maybe, even long definitions can be handled in the same way?
>

Seems like too much work, too fragile. I'm afraid that in any scenario,
people will have to break their lines manually as they need to do, e.g.,
with strings in Python and C.

Best,
Vaclav

>
> On Tue, Apr 20, 2021 at 11:11 PM Vaclav Petras <wenzeslaus at gmail.com>
> wrote:
>
>>
>> I would like to disable the check in each file just for the specific
>> block of code, however, this is not possible because Flake8 does not allow
>> disabling for blocks of code. Requiring the lines to be shorter won't work
>> either because the descriptions item needs to be long. This leaves us with
>> the following options:
>>
>> 1. Use per-file ignores to disable the warning and keep adding the files
>> which need it. This makes the Flake8 configuration larger over time while
>> our goal is to make it smaller over time. It also leaves the warning
>> disabled for (other code in) these files.
>>
>> 2. Add inline Flake8 ignore comment to the offending line. This will make
>> the line a little longer, but it would be a good solution for normal Python
>> code. However, in case of the script header, we would need to teach
>> g.parser to understand trailing comments inside the relevant fields so that
>> the Flake8 ignore comment does not leak into the user interface description.
>>
>> 3. According to the PEP 257 - Docstring Conventions document, the
>> 'docstring of a script (a stand-alone program) should be usable as its
>> "usage" message.' I don't think sticking something like our parser
>> instructions into the docstring was what the authors had in mind.
>> Additionally, it is not used like this either as far as I can tell.
>> However, it would solve our issue. Just adding `"""` before the definition
>> and adding `"""  # noqa: E501` after that disables the warning for the
>> definition. The nice bonus is that we comply with PEP 257 by providing
>> module docstring and by describing its interface there (in some way). The
>> docstring presence is checked by Pylint's C0111 "Missing ... docstring".
>>
>> 4. We modify the parser so that at least some of the items can have
>> multiple lines. However, the parser is currently quite line-oriented and
>> the cost-benefit ratio may be low.
>>
>> 5. We change the script header definition format to some existing format
>> that can break lines, i.e. allowing multi-line values. A clear candidate
>> for the format is YAML or rather its simpler subset. This would have
>> additional benefits of making the format a standard format. Which in turn
>> would be beneficial for other things, e.g., for easier learning of the
>> syntax. Combining this with option 3, we could drop the `# %` part to make
>> the YAML more readily readable.
>>
>