CI failures visible on GitHub (affecting contributors)

Hi,

I see that the 11.8 branch (https://github.com/MariaDB/server/commits/11.8/), which is supposed to be fully green and passing CI as it is soon to be released as a new LTS version, actually isn't consistently green. Out of the latest 10 CI runs the job passing rates are:

12/12 11/12 12/12 12/12 8/8 10/12 8/12 7/8 8/8 11/12

Questions:
- What is the current plan to get CI to be consistently green?
- How is it possible that a commit is pushed on the 11.8 branch even though CI is not passing? Isn't branch protection preventing it?
- If there is a way to get commits on the 11.8 branch with failures bypassing branch protection, are all developers committed to following up on their commits and fixing them before they do any other work? Or what is the plan for ensuring post-push test failures are fixed?
- The jobs buildbot/amd64-debian-11-msan-clang-16 and continuous-integration/appveyor/branch seem to be the most common failing ones. Can they be removed now to make the CI consistently green? And can full branch protection be applied so that they, or any other jobs, can't be added back unless they are actually green?
- Why is the CI occasionally running 8 jobs instead of 12?

I also see plenty of red at https://buildbot.mariadb.org/#/grid?branch=11.8. Hopefully we get as much green as possible for the start of the 11.8 maintenance so that any potential regressions in future stable maintenance releases of the 11.8 series are easier to catch. Most of the issues seem to be test issues and not real MariaDB bugs.

I have also done a bunch of testing on Debian QA systems for MariaDB 11.8 and I have only found a couple of small issues, so overall 11.8 is looking good!

- Otto

On Sat, 12 Apr 2025 at 01:37, Otto Kekäläinen via developers <developers@lists.mariadb.org> wrote:
Hi,
I see that the 11.8 branch (https://github.com/MariaDB/server/commits/11.8/), which is supposed to be fully green and passing CI as it is soon to be released as a new LTS version, actually isn't consistently green.
and our releases are meant to be 100% free of bugs if you ask our users too.
Out of the latest 10 CI runs the job passing rates are:
12/12 11/12 12/12 12/12 8/8 10/12 8/12 7/8 8/8 11/12
Questions:
- What is the current plan to get CI to be consistently green?
Get more help, perhaps with more funding. Or:
* hide more errors
* do less development
* reduce testing/delivery scope

- How is it possible that a commit is pushed on the 11.8 branch even though CI is not passing? Isn't branch protection preventing it?
Branch protection scope doesn't cover every builder - only the status checks explicitly marked as required can block a push.
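As a rough illustration of that scope limitation, the sketch below (not from the actual repository configuration; the token handling and the use of the Python requests library are assumptions) lists which status contexts branch protection requires on the 11.8 branch and compares them with the contexts reported on the branch head, so a context such as buildbot/amd64-debian-11-msan-clang-16 only blocks pushes if it appears in the required list:

# Hypothetical sketch: compare required status checks with the checks
# that actually report on the branch head. Assumes GITHUB_TOKEN is set
# with sufficient (typically admin-level) access to read protection.
import os
import requests

OWNER, REPO, BRANCH = "MariaDB", "server", "11.8"
headers = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# Status contexts that branch protection actually enforces (may be a subset).
required = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}"
    "/protection/required_status_checks",
    headers=headers,
).json().get("contexts", [])

# All commit statuses reported on the branch head (buildbot, appveyor, ...).
combined = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits/{BRANCH}/status",
    headers=headers,
).json()
reported = {s["context"]: s["state"] for s in combined.get("statuses", [])}

for context, state in sorted(reported.items()):
    enforced = "required" if context in required else "not required"
    print(f"{state:8} {enforced:12} {context}")

Anything that shows up as "not required" can be red without stopping a push to the branch.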
- If there is a way to get commits on the 11.8 branch with failures bypassing branch protection, are all developers committed to following up on their commits and fixing them before they do any other work? Or what is the plan for ensuring post-push test failures are fixed?
It's all manual, and no roles are defined, because of false positives that take effort to test (and even report), let alone fix. Automating this would lead to annoyance/filtering.
- The jobs buildbot/amd64-debian-11-msan-clang-16
It is about to get updated to clang 20 and to run tests on non-debug builds, which should time out less (a rough sketch of such a build step follows after this question).
and continuous-integration/appveyor/branch seem to be the most common
Not sure why it's still there (MDBF-879 - yes, my fault - MDEV-12386), let alone in GH status.
failing ones.
Can they be removed now to make the CI consistently green? And can full branch protection be applied so that they, or any other jobs, can't be added back unless they are actually green?
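To make the MSAN builder update mentioned above a little more concrete, here is a rough, hypothetical buildbot-style sketch of a non-debug clang MSAN build; the step names, the clang version, and the -DWITH_MSAN cmake switch are assumptions and are not taken from the real amd64-debian MSAN builder config:

# Hypothetical buildbot factory fragment for an MSAN build with a newer clang.
# Not the real builder config; option values are assumptions except where noted.
from buildbot.plugins import steps, util

msan_factory = util.BuildFactory()
msan_factory.addStep(steps.ShellCommand(
    name="cmake-msan",
    command=[
        "cmake", ".",
        "-DCMAKE_C_COMPILER=clang-20",        # clang 20, per the update above
        "-DCMAKE_CXX_COMPILER=clang++-20",
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",  # non-debug build, to time out less
        "-DWITH_MSAN=ON",                     # assumed MemorySanitizer switch
    ],
))
msan_factory.addStep(steps.Compile(command=["make", "-j", "8"]))
msan_factory.addStep(steps.ShellCommand(
    name="mtr",
    command=["perl", "mysql-test/mysql-test-run.pl", "--parallel=4", "--force"],
))

A RelWithDebInfo build keeps optimizations on, which is what should make the MSAN test runs time out less than the current debug runs.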
- Why is the CI occasionally running 8 jobs instead of 12?
Probably some machine unavailability, and jobs were killed. There are more than 12 jobs in total.
I also see plenty of red at https://buildbot.mariadb.org/#/grid?branch=11.8. Hopefully we get as much green as possible for the start of the 11.8 maintenance so that any potential regressions in future stable maintenance releases of the 11.8 series are easier to catch.
Most of the issues seem to be test issues and not real MariaDB bugs.
Some were indeed real bugs - I looked at 10.11 as the first bug fixed there:

* MDEV-36587 - Marko just committed to 10.11, which is looking a lot greener on upgrade tests
* MDEV-36591 - will resolve the RHEL9-and-compatible and Ubuntu 20.04 install/upgrade tests
* MDBF-1038 - CentOS Stream 9 missing lzo libraries

galera on Debian 12 11.8:
* galera.MDEV-35748 - re-record output result - PR welcome (a re-record sketch follows at the end of this mail)
* MDEV-35946 - pinged on MDEV
* galera.galera_wsrep_schema_detached - probably a timeout - no existing report - feel free to create one.

* amd64-ubuntu-2404-clang18-asan - many occasional rpl failures; also found two others in the history, a crypto (connection) memory leak and an innodb memory leak (will file bugs for those now). The rpl failures might just be running out of time on something internal to rpl. Haven't reproduced them while testing clang 20 locally yet.
* s390x - encryption and two test cases that probably have bug reports

Probably env/test-case issues:
* PAM failure - MDBF-1036
* RPM major upgrade - MDBF-1040 <https://jira.mariadb.org/browse/MDBF-1040>
* Fedora 41 - galera failure - can't install - https://buildbot.mariadb.org/#/grid?branch=mariadb-4.x MDEV-35108
* openeuler 2304 - mirror out of date
* msan.alter_table - timing out - is a debug-only test - all threads on locks looks suspicious. The MSAN update uses a non-debug build so it will become green, but maybe not resolved.
* asan on innodb.innodb-index - in aria, on a page read during shutdown - odd that there is no bug report - common - need to test with the updated ASAN/UBSAN builder - MDBF-741/MDBF-740
* Fedora 39 - EOL - will be removed in the next update
* valgrind - was fixed, and then reverted.
* sles 1506 - MDBF-1041 <https://jira.mariadb.org/browse/MDBF-1041> (amd64-sles-1506-rpm-autobake-minor-upgrade-all)
* freebsd-14 - galera provider update coming soon (currently on 26.4.19)
* amd64-ubuntu-2004-eco-php - haven't had time to work out and fix the php incompatibilities - and no-one has volunteered to help
* https://buildbot.mariadb.org/#/builders/ppc64le-debian-12-deb-autobake-minor... - no idea - hoping you can help

us.mirhosting.net is 10 days out of date - https://mirror.mariadb.org/mirrorstats - asked Faustin

I have also done a bunch of testing on Debian QA systems for MariaDB 11.8 and I have only found a couple of small issues, so overall 11.8 is looking good!
- Otto
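As a side note on the galera.MDEV-35748 item in the list above (re-record output result - PR welcome), a contributor could regenerate the expected result roughly as sketched below; the tree path is a placeholder, the new output of course has to be verified as actually correct, and galera tests additionally need a working wsrep provider set up:

# Hypothetical sketch: re-record the expected result of a single mtr test
# from an already-built MariaDB source tree (the path is an assumption).
import subprocess

subprocess.run(
    [
        "perl", "./mysql-test-run.pl",
        "--record",            # rewrite the .result file with the current output
        "galera.MDEV-35748",   # suite.testname, as referenced in the list above
    ],
    cwd="/path/to/mariadb/mysql-test",
    check=True,
)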

Hi!

On Mon, 14 Apr 2025 at 20:03, Daniel Black <daniel@mariadb.org> wrote:
...
Questions:
- What is the current plan to get CI to be consistently green?
Get more help, perhaps with more funding.
Or:
* hide more errors
* do less development
* reduce testing/delivery scope
I see there is now a lot of activity in https://jira.mariadb.org/browse/MDEV-36647

Having extra focus on this right now is of course good and helps make MariaDB 11.8 a solid release. Hopefully people also have some ideas brewing on potential process/policy changes, so that it is clear what developers should do in the future if/when tests regress and GitHub-published CI results go from green to red next time.

Hi Otto,

On Wed, Apr 30, 2025 at 12:02 AM Otto Kekäläinen via developers <developers@lists.mariadb.org> wrote:
...
I see there is now a lot of activity in https://jira.mariadb.org/browse/MDEV-36647
There is some activity indeed. The scope of this effort is still to be defined. My initial aim was fixing failures that affect builders participating in pull request checks.

The way I see it currently: every month, identify 10 or so of the most annoying sporadic failures and make sure they get fixed within a month. Currently I ask Elena to make a report based on buildbot logs, although I'm happy to accept lists from my fellow colleagues. The failure has to be sporadic and has to fail often (a few times a day at least) according to the bb cross-reference.

Since this issue was created (Apr 19), 4 issues have been fixed, 1 reviewed, and many were analysed. I hope I will have enough resources to keep it rolling.
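To illustrate the "fails a few times a day" criterion above, here is a rough sketch of how a monthly top-10 could be pulled out of an exported failure list; the failures.csv file and its columns are hypothetical stand-ins for whatever the buildbot cross-reference report actually provides:

# Hypothetical sketch: rank the most frequent sporadic failures over a month.
# Assumes failures.csv with columns date,builder,test_name (not a real export).
import csv
from collections import Counter
from datetime import date, timedelta

cutoff = date.today() - timedelta(days=30)
counts = Counter()

with open("failures.csv", newline="") as f:
    for row in csv.DictReader(f):
        if date.fromisoformat(row["date"]) >= cutoff:
            counts[row["test_name"]] += 1

# A failure a few times a day adds up to roughly 100+ hits a month.
for test, n in counts.most_common(10):
    print(f"{n:5}  {test}")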
Having extra focus on this right now is of course good and helps make MariaDB 11.8 a solid release. Hopefully people also have some ideas brewing on potential process/policy changes, so that it is clear what developers should do in the future if/when tests regress and GitHub-published CI results go from green to red next time.
My thoughts... There are hundreds of sporadic failures in our tests. Some fail once a year, others every hour. Some are test issues, others real bugs. The more often a test is failing, the higher priority it should receive.

Previous attempts at similar efforts were essential to keep things in a decent state. However, they didn't seem to follow a systematic approach of prioritizing frequent failures. There has to be somebody who is constantly monitoring the situation and approaching the appropriate developers. It is inefficient to expect this from all team members; it works better if the team stays concentrated on their assignments.

Normally CI doesn't go badly red anymore. A large part of MariaDB core developers have switched to using GitHub pull requests for their daily work, which means they have to pass CI checks before they can push. There are exceptions though, when commits get pushed directly and do bring persistent failures (which are usually fixed promptly). We should get this fixed indeed.

Let's see how this effort goes and adjust accordingly.

Regards,
Sergey