[gdal-dev] GSoC Proposal Idea: Strengthening GDAL Python Stub Generation via Runtime Consistency Validation
Sionigdha Sadhukhan
snigdha.lee75 at gmail.com
Mon Mar 9 08:21:39 PDT 2026
I also wanted to ask whether GDAL is planning to participate in GSoC 2026
under the OSGeo umbrella. I noticed there isn't a GDAL-specific ideas page
yet and wanted to check before finalizing my proposal. I have also shared
my introduction and proposal direction on the OSGeo Discourse here:
https://discourse.osgeo.org/t/introduction-sionigdha-sadhukhan-gsoc-2026-proposal-gdal-python-stub-hardening-runtime-validation-type-coverage/152757
If any maintainer would be open to mentoring or giving feedback on the
proposal draft, I would be very grateful.
On Tue, 3 Mar 2026 at 18:46, Sionigdha Sadhukhan <snigdha.lee75 at gmail.com>
wrote:
> Hello GDAL developers,
>
> Over the past few weeks, while contributing to GDAL and working on Python
> binding-related issues and PRs, I have been studying the current Python
> stub generation pipeline in detail. In particular, I explored the docstub
> integration and the implementation in _analysis.py, _docstrings.py, and
> _stubs.py, along with recent PRs related to docstring cleanup and stub
> generation.
>
> From examining the code, I understand that:
>
> - .pyi files are generated entirely from docstrings using a custom Lark
>   grammar.
> - Type resolution is handled through TypeMatcher and import
>   reconstruction.
> - Unresolved types fall back to _typeshed.Incomplete.
> - There is currently no mechanical validation step ensuring that
>   generated stubs remain consistent with the actual runtime callable
>   signatures produced by SWIG.
>
> This means the stub layer is structurally decoupled from the runtime
> bindings, and drift at any step of the chain
>
> C++ → SWIG → Python runtime → docstrings → generated stubs
>
> can go undetected, since nothing checks for it automatically.
>
> For GSoC, I would like to explore a project focused on hardening and
> modernizing this pipeline through runtime–stub consistency validation and
> stricter enforcement mechanisms.
>
> A possible scope could include:
>
> *Runtime–Stub Signature Validator*
>
> - Import osgeo modules and inspect public callables using
>   inspect.signature().
> - Parse generated .pyi files.
> - Detect mismatches in parameter names, counts, defaults, and return
>   presence.
> - Produce structured reports of inconsistencies.
>
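As a rough illustration of the validator idea above (a minimal sketch, not GDAL's actual tooling), the comparison can be built from the standard library alone: `ast` parses the stub text and `inspect.signature()` reads the runtime callables. The helper names `stub_params`, `runtime_params`, and `compare`, and the toy `Open`/`Close` functions, are hypothetical stand-ins for the osgeo modules. Only parameter names are compared here; defaults and return annotations are left out. SWIG wrappers may expose only generic `*args` signatures or none at all, which this sketch reports rather than resolves.

```python
import ast
import inspect


def stub_params(stub_source):
    """Map each top-level function in a .pyi source string to its parameter names."""
    result = {}
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.FunctionDef):
            a = node.args
            names = [p.arg for p in a.posonlyargs + a.args]
            if a.vararg:
                names.append("*" + a.vararg.arg)
            names += [p.arg for p in a.kwonlyargs]
            if a.kwarg:
                names.append("**" + a.kwarg.arg)
            result[node.name] = names
    return result


def runtime_params(obj):
    """Return parameter names of a runtime callable, or None if not introspectable."""
    try:
        sig = inspect.signature(obj)
    except (TypeError, ValueError):
        return None
    names = []
    for p in sig.parameters.values():
        if p.kind is p.VAR_POSITIONAL:
            names.append("*" + p.name)
        elif p.kind is p.VAR_KEYWORD:
            names.append("**" + p.name)
        else:
            names.append(p.name)
    return names


def compare(namespace, stub_source):
    """Return (name, problem, stub_params, runtime_params) tuples for each mismatch."""
    report = []
    for name, expected in stub_params(stub_source).items():
        obj = namespace.get(name)
        if obj is None:
            report.append((name, "missing at runtime", expected, None))
            continue
        actual = runtime_params(obj)
        if actual is None:
            report.append((name, "no introspectable signature", expected, None))
        elif actual != expected:
            report.append((name, "parameter mismatch", expected, actual))
    return report


# Hypothetical runtime functions standing in for a real module's namespace.
def Open(filename, mode=0): ...
def Close(ds): ...


STUB = """
def Open(path: str, mode: int = ...) -> object: ...
def Close(ds: object) -> None: ...
def Missing() -> None: ...
"""

report = compare({"Open": Open, "Close": Close}, STUB)
for entry in report:
    print(entry)
```

Against a real module, the namespace could be taken from `vars(module)` and the stub text read from the generated .pyi file; methods on classes would need an extra walk over `ast.ClassDef` nodes.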
> *Stricter Stub Generation Mode*
>
> - Optionally fail (or emit stronger diagnostics) on unresolved types
>   instead of silently aliasing to _typeshed.Incomplete.
> - Provide measurable metrics on annotation coverage and unresolved types.
>
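One possible way to measure coverage and enforce a strict mode, sketched under the assumption that the check runs on the generated .pyi text rather than inside docstub itself: count every parameter and return slot, how many carry an annotation, and how many of those mention `Incomplete`. The function names `annotation_coverage` and `check_strict` and the `max_incomplete` threshold are hypothetical.

```python
import ast


def _mentions_incomplete(annotation):
    """True if the annotation expression refers to Incomplete anywhere."""
    for n in ast.walk(annotation):
        if isinstance(n, ast.Name) and n.id == "Incomplete":
            return True
        if isinstance(n, ast.Attribute) and n.attr == "Incomplete":
            return True
    return False


def annotation_coverage(stub_source):
    """Count annotation slots (parameters and returns) in a .pyi source string."""
    slots = annotated = incomplete = 0
    for node in ast.walk(ast.parse(stub_source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = node.args
        params = a.posonlyargs + a.args + a.kwonlyargs
        # self/cls are conventionally left unannotated, so they are skipped.
        anns = [p.annotation for p in params if p.arg not in ("self", "cls")]
        anns += [v.annotation for v in (a.vararg, a.kwarg) if v is not None]
        anns.append(node.returns)
        for ann in anns:
            slots += 1
            if ann is None:
                continue
            annotated += 1
            if _mentions_incomplete(ann):
                incomplete += 1
    return {"slots": slots, "annotated": annotated, "incomplete": incomplete}


def check_strict(stub_source, max_incomplete=0):
    """Raise ValueError when more than max_incomplete slots resolve to Incomplete."""
    stats = annotation_coverage(stub_source)
    if stats["incomplete"] > max_incomplete:
        raise ValueError(
            f"{stats['incomplete']} unresolved annotations (limit {max_incomplete})"
        )
    return stats


STUB = """
from _typeshed import Incomplete
def Open(path: str, flags: Incomplete = ...) -> Incomplete: ...
def Close(ds) -> None: ...
"""

stats = annotation_coverage(STUB)
print(stats)  # {'slots': 5, 'annotated': 4, 'incomplete': 2}
```

Calling `check_strict(STUB)` raises `ValueError` for this input. Starting with a permissive `max_incomplete` and lowering it over time would keep the check compatible with an incremental rollout.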
> *CI Integration*
>
> - Integrate validation checks into CI to prevent silent drift over time.
> - Keep the approach incremental and compatible with the existing
>   docstring-driven workflow.
>
> The goal would not be to redesign SWIG bindings or replace the current
> system, but to introduce a validation and enforcement layer that increases
> confidence in typing correctness, IDE support, and long-term
> maintainability of the Python bindings.
>
> Before developing this into a formal proposal, I would really appreciate
> feedback on:
>
> - Whether runtime–stub consistency validation aligns with current Python
>   binding priorities.
> - Whether there are known constraints or prior efforts in this direction.
> - Whether this scope would be appropriate and realistic for a GSoC
>   project.
>
> Thank you very much for your time. I would be happy to refine or narrow
> this idea based on feedback.
>
> Best regards,
> Sionigdha
>