[QGIS-Developer] QGIS repository management

Julien Cabieces julien.cabieces at oslandia.com
Wed Oct 18 00:01:38 PDT 2023


> Is there anything I've missed?


A missing external ressource (OTB, oracle binaries...) that we get with curl. This one for
instance recently :
https://github.com/qgis/QGIS/actions/runs/6549790513/job/17793674746?pr=54963

Maybe we could store those ressources somewhere (a github repository for
instance?) to avoid those errors.

> network access manager test: depends on the badssl external website,
>  which goes down occasionally. We could potentially run this site
>  ourselves as part of the CI setup, but that's trading the dependence
>  on an external service for added CI maintenance burden on ourselves.
>

That's what we do for webdav and minio tests :
https://github.com/qgis/QGIS/blob/master/.docker/docker-compose-testing.yml#L14

I don't know much about bad ssl returned response but is it that much
complicated to configure ?

Regards,
Julien


> On Tue, Oct 17, 2023 at 9:19 PM Nyall Dawson via QGIS-Developer <qgis-developer at lists.osgeo.org> wrote:
>
>  >
>  > IMHO, the main issue here is that the CI is too often broken for no
>  > reasons related to the PR content. And most of the time recently, it's
>  > because of one of the mingw jobs.
>
>  I suggest we move this conversation into more practical areas and
>  focus instead on the actual CI issues.
>
>  In my experience there's a couple of main issues here:
>
>  1. The mingw build fails often with the error:
>
>  cd /__w/QGIS/QGIS/build_mingw64/resources && /usr/bin/cmake -E copy
>  /__w/QGIS/QGIS/resources/srs6.db
>  /__w/QGIS/QGIS/build_mingw64/resources/srs.db
>  make[2]: write error: stdout
>
>  It always succeeds if you restart the job though. I don't know what's
>  causing this... a disk space issue? (but then why would it succeed on
>  the following attempt?)
>
> mingw runs on ubuntu-latest within a fedora container. ubuntu images come with plenty of bloat preinstalled that can be easily deleted.
> cf. https://github.com/easimon/maximize-build-space/blob/master/action.yml#L114-L130
> I just don't know how to do that if a "container" is specified as the delete action will need to happen on the base system.
>  
>  
>  2. The msys2 build can take HOURS (5+ in some cases). The qt 5/ qt6
>  builds also see occasional extremely slow builds.
>
>  I think the root cause of this is that we've exhausted all the
>  available space allowed for caching objects on github. All the builds
>  are trying to cache things to speed up the workflows, but the
>  different caches are randomly getting deleted because together they've
>  well exceeded the total allowed size. The solutions I could see here
>  are:
>
>  - approaching github to ask nicely for more space
>  - dropping one or more workflows. Maybe we could combine the msys2 and
>  mingw workflow into one to cut out one workflow. But that's blocked
>  because the msys2 workflow currently has a lot of things disabled
>  (including python), so it's currently not a suitable replacement for
>  the mingw builds which give end users an easy way to test PRs on
>  windows.
>
> What is the advantage of the msys2 workflow?
> It does "some build on windows" but IIUC does not replicate the actual windows build using OSGEO4W. Is there something else that it brings to
> the mix?
>  
>  
>  3. Random test failures. Main culprits are:
>  - ept provider test (but I **think** I've fixed this with
>  improvements/fixes made over the 3.34 bug fixing period). At least, I
>  don't recall seeing this in the last week, when it used to be a daily
>  random fail.
>  - raster file writer test: randomly fails for an unknown reason. The
>  test passes valgrind/asan fine, so I suspect the random failures are
>  caused by some interplay with other tests modifying data in the test
>  data folder. More work is needed.
>  - raster analysis utils test: same as above.
>  - network access manager test: depends on the badssl external website,
>  which goes down occasionally. We could potentially run this site
>  ourselves as part of the CI setup, but that's trading the dependence
>  on an external service for added CI maintenance burden on ourselves.
>
>  4. Docker/ubuntu repository fails. This happens occasionally -- the
>  external repos will have transient connectivity issues. Not much we
>  can do here except manually re-running the jobs.
>
>  Is there anything I've missed?
>
> There is also the occasional "QGIS is not compatible with the bleeding edge [sip, name your dependency] version that [rolling distro] has
> decided to start shipping today".
>
> Matthias
>  
>  
>  Nyall
>
>  >
>  > Like proposed by Matthias, I would be very much in favor in modifying
>  > our CI jobs to rely on non-rolling release distribution.
>  >
>  > Regards,
>  > Julien
>  >
>  >
>  > > In my experience, the peer reviews have proven to be an effective tool to improve the code quality.
>  > > I think it can be explained with a four eyes principle. The first two eyes are involved in writing the code, the second pair of eyes that
>  validates
>  > > needs to be a trusted pair.
>  > > For a code base of the maturity of QGIS, this seems to me to be justified and reasonable.
>  > >
>  > > As for who can do a review, a potential conflict of interest has been raised. This arises most noticeable within a single company, but can
>  also
>  > > very well be the case whenever a collaboration occurs between multiple individuals or if a well-connected client is involved. In my
>  experience, in
>  > > most cases people will first ask the person that they trust to be most qualified to review a specific area of the code, regardless the
>  organisation
>  > > they work for. In some cases, this may be your colleagues. What I and others have done in the past in such cases, is to leave an
>  additional review
>  > > window of a couple of days before merging to give everyone the opportunity to jump in if they wanted to, just to be sure to play it fair.
>  In my
>  > > experience, this worked quite nicely, but I am also open to other approaches.
>  > >
>  > > As for failing tests, I think one possibly low hanging fruit would be to switch all CI workflows to be based on non-rolling distributions.
>  And we
>  > > could potentially also be a bit more strict with disabling flaky tests or making them optional.
>  > >
>  > > Best wishes
>  > > Matthias
>  > >
>  > > On Tue, Oct 17, 2023 at 9:47 AM Sandro Santilli via QGIS-Developer <qgis-developer at lists.osgeo.org> wrote:
>  > >
>  > >  On Tue, Oct 17, 2023 at 03:08:34PM +1300, Nyall Dawson via QGIS-Developer wrote:
>  > >  > On Tue, 17 Oct 2023 at 03:41, Sandro Santilli <strk at kbt.io> wrote:
>  > >  > >
>  > >  > > On Mon, Oct 16, 2023 at 09:59:35AM +1300, Nyall Dawson via QGIS-Developer wrote:
>  > >  > >
>  > >  > > > If you flip the situation, you'll see that yes, you do have trust!
>  > >  > > >
>  > >  > > > - a complete stranger CANNOT approve their own changes
>  > >  > > > - a complete stranger CANNOT approve other stranger's changes
>  > >  > > > - a complete stranger CANNOT approve an approved member's changes
>  > >  > > >
>  > >  > > > vs
>  > >  > > >
>  > >  > > > - an approved member CANNOT approve their own changes
>  > >  > > > - an approved member CAN approve a complete stranger's changes
>  > >  > > > - an approved member CAN approve a another approved member's changes
>  > >  > >
>  > >  > > That's partial trust. I'm trusted to be able to judge someone else's
>  > >  > > work but not to judge my own work !
>  > >  >
>  > >  > Ok, it's partial trust. What's wrong with that?
>  > >
>  > >  I feel I'm not explaining myself correctly.
>  > >  I do believe in peer review, and I'm happy for others to look at the
>  > >  changes I propose. I'd love everyone to look at them!
>  > >
>  > >  The word "partial" was not appropriate, probably, it's more about
>  > >  "inconsistency".
>  > >
>  > >  > In short: I'm more than OK with only partial trusting everyone. We ALL
>  > >  > make mistakes, so let's stick with the processes we have which reduce
>  > >  > the risk of these mistakes hitting end users..
>  > >
>  > >  I know I make mistakes, but I'm able to merge changes proposed by a
>  > >  random contributor, where my own review is enough.
>  > >
>  > >  I'm FULLY trusted in that case, and this is conflicting with not
>  > >  being so when it's me proposing the changes.
>  > >
>  > >  Why should I be able to merge in that case then ? Is it because
>  > >  2 people review the code ? Me and the proponent ? Then why the
>  > >  random contributor review would not count on MY prs ?
>  > >
>  > >  I hope you see the discrepancy here.
>  > >
>  > >  --strk;
>  > >
>  > >    Libre GIS consultant/developer
>  > >    https://strk.kbt.io/services.html
>  > >  _______________________________________________
>  > >  QGIS-Developer mailing list
>  > >  QGIS-Developer at lists.osgeo.org
>  > >  List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  > >  Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  > >
>  > > _______________________________________________
>  > > QGIS-Developer mailing list
>  > > QGIS-Developer at lists.osgeo.org
>  > > List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  > > Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  >
>  >
>  > --
>  >
>  > Julien Cabieces
>  > Senior Developer at Oslandia
>  > julien.cabieces at oslandia.com
>  > _______________________________________________
>  > QGIS-Developer mailing list
>  > QGIS-Developer at lists.osgeo.org
>  > List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  > Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  _______________________________________________
>  QGIS-Developer mailing list
>  QGIS-Developer at lists.osgeo.org
>  List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>  Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer


-- 

Julien Cabieces
Senior Developer at Oslandia
julien.cabieces at oslandia.com


More information about the QGIS-Developer mailing list