[gdal-dev] GSoC Proposal Idea: Strengthening GDAL Python Stub Generation via Runtime Consistency Validation

Mon Mar 9 08:21:39 PDT 2026

I also wanted to ask whether GDAL is planning to participate under the
OSGeo GSoC 2026 umbrella. I noticed there isn't a GDAL-specific ideas page
yet and wanted to check before finalizing my proposal. I have also shared
my introduction and proposal direction on the OSGeo Discourse here:
https://discourse.osgeo.org/t/introduction-sionigdha-sadhukhan-gsoc-2026-proposal-gdal-python-stub-hardening-runtime-validation-type-coverage/152757

If any maintainer would be open to mentoring or giving feedback on the
proposal draft, I would be very grateful.

On Tue, 3 Mar 2026 at 18:46, Sionigdha Sadhukhan <snigdha.lee75 at gmail.com>
wrote:

> Hello GDAL developers,
>
> Over the past weeks, while contributing to GDAL and working on Python
> binding-related issues and PRs, I have been studying the current Python
> stub generation pipeline in detail. In particular, I explored the docstub
> integration and the implementation in _analysis.py, _docstrings.py, and
> _stubs.py, along with recent PRs related to docstring cleanup and stub
> generation.
>
> From examining the code, I understand that:
>
>    - .pyi files are generated entirely from docstrings using a custom Lark
>      grammar.
>    - Type resolution is handled through TypeMatcher and import
>      reconstruction.
>    - Unresolved types fall back to _typeshed.Incomplete.
>    - There is currently no mechanical validation step ensuring that
>      generated stubs remain consistent with the actual runtime callable
>      signatures produced by SWIG.
>
> This means the stub layer is structurally decoupled from the runtime
> bindings, and drift anywhere along the chain
>
> C++ → SWIG → Python runtime → docstrings → generated stubs
>
> is possible in principle without any automated detection.
>
> For GSoC, I would like to explore a project focused on hardening and
> modernizing this pipeline through runtime–stub consistency validation and
> stricter enforcement mechanisms.
>
> A possible scope could include:
>
> *Runtime–Stub Signature Validator*
>
>    - Import osgeo modules and inspect public callables using
>      inspect.signature().
>    - Parse generated .pyi files.
>    - Detect mismatches in parameter names, counts, defaults, and return
>      presence.
>    - Produce structured reports of inconsistencies.
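
To make the validator idea above more concrete, here is a minimal sketch of the comparison step. It is not code from GDAL or docstub: the helper names (stub_params, runtime_params, find_mismatches) and the sample functions are hypothetical, and it only compares parameter names for top-level functions. Plain Python functions and an inline stub string stand in for the osgeo modules and the generated .pyi files.

```python
import ast
import inspect


def stub_params(stub_source: str) -> dict[str, list[str]]:
    """Map each top-level function in a .pyi source to its parameter names."""
    result = {}
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.FunctionDef):
            a = node.args
            names = [p.arg for p in a.posonlyargs + a.args]
            if a.vararg:
                names.append("*" + a.vararg.arg)
            names += [p.arg for p in a.kwonlyargs]
            if a.kwarg:
                names.append("**" + a.kwarg.arg)
            result[node.name] = names
    return result


def runtime_params(func) -> list[str]:
    """Parameter names of a runtime callable, in declaration order."""
    names = []
    for p in inspect.signature(func).parameters.values():
        if p.kind is p.VAR_POSITIONAL:
            names.append("*" + p.name)
        elif p.kind is p.VAR_KEYWORD:
            names.append("**" + p.name)
        else:
            names.append(p.name)
    return names


def find_mismatches(namespace: dict, stub_source: str) -> list[tuple]:
    """Return (name, problem, stub_params, runtime_params) tuples."""
    report = []
    for name, expected in stub_params(stub_source).items():
        func = namespace.get(name)
        if func is None:
            report.append((name, "missing at runtime", expected, None))
            continue
        try:
            actual = runtime_params(func)
        except (TypeError, ValueError):
            # inspect.signature() cannot introspect every callable.
            report.append((name, "no introspectable signature", expected, None))
            continue
        if actual != expected:
            report.append((name, "parameter mismatch", expected, actual))
    return report


# Hypothetical stand-ins for runtime callables and a generated stub.
def open_dataset(path, mode="r"): ...
def close_dataset(handle): ...

STUB = '''
def open_dataset(path: str, access: str = ...) -> object: ...
def close_dataset(handle: object) -> None: ...
def removed_helper(x: int) -> int: ...
'''

report = find_mismatches(
    {"open_dataset": open_dataset, "close_dataset": close_dataset}, STUB
)
for entry in report:
    print(entry)
```

In this toy case the report flags open_dataset (the stub says "access", the runtime says "mode") and removed_helper (present in the stub only). A real validator would also need to handle classes and methods, default values, and callables whose signatures cannot be introspected.
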
>
> *Stricter Stub Generation Mode*
>
>    - Optionally fail (or emit stronger diagnostics) on unresolved types
>      instead of silently aliasing to _typeshed.Incomplete.
>    - Provide measurable metrics on annotation coverage and unresolved types.
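
As a rough illustration of the coverage metric above, the sketch below counts annotation slots (parameters and return values) in a stub and how many are unannotated or mention Incomplete. The function names and the sample stub are hypothetical and not part of docstub. A strict mode could then simply fail when either count is non-zero.

```python
import ast


def mentions_incomplete(annotation: ast.expr) -> bool:
    """True if the annotation refers to Incomplete anywhere inside it."""
    for node in ast.walk(annotation):
        if isinstance(node, ast.Name) and node.id == "Incomplete":
            return True
        if isinstance(node, ast.Attribute) and node.attr == "Incomplete":
            return True
    return False


def annotation_coverage(stub_source: str) -> dict[str, int]:
    """Count annotation slots and how many are unannotated or Incomplete."""
    total = unannotated = incomplete = 0
    for node in ast.walk(ast.parse(stub_source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = node.args
        params = a.posonlyargs + a.args + a.kwonlyargs
        if a.vararg:
            params.append(a.vararg)
        if a.kwarg:
            params.append(a.kwarg)
        # self/cls are conventionally left unannotated, so skip them.
        slots = [p.annotation for p in params if p.arg not in ("self", "cls")]
        slots.append(node.returns)
        for annotation in slots:
            total += 1
            if annotation is None:
                unannotated += 1
            elif mentions_incomplete(annotation):
                incomplete += 1
    return {"slots": total, "unannotated": unannotated, "incomplete": incomplete}


# Hypothetical stub content, for illustration only.
STUB = '''
from _typeshed import Incomplete

def load(path: str, flags: int = ...) -> Incomplete: ...
def count() -> int: ...
def convert(dest, src: Incomplete, **kwargs) -> Incomplete: ...
'''

stats = annotation_coverage(STUB)
print(stats)

# A strict mode could turn the metric into a pass/fail result.
strict_ok = stats["unannotated"] == 0 and stats["incomplete"] == 0
print("strict check passed" if strict_ok else "strict check failed")
```

Tracking these counts per module over time would give a simple, measurable target: the numbers should only go down.
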
>
> *CI Integration*
>
>    - Integrate validation checks into CI to prevent silent drift over time.
>    - Keep the approach incremental and compatible with the existing
>      docstring-driven workflow.
>
> The goal would not be to redesign SWIG bindings or replace the current
> system, but to introduce a validation and enforcement layer that increases
> confidence in typing correctness, IDE support, and long-term
> maintainability of the Python bindings.
>
> Before developing this into a formal proposal, I would really appreciate
> feedback on:
>
>    - Whether runtime–stub consistency validation aligns with current Python
>      binding priorities.
>    - Whether there are known constraints or prior efforts in this direction.
>    - Whether this scope would be appropriate and realistic for a GSoC
>      project.
>
> Thank you very much for your time. I would be happy to refine or narrow
> this idea based on feedback.
>
> Best regards,
> Sionigdha
>