...
I see there is now a lot of activity in
https://jira.mariadb.org/browse/MDEV-36647
There is some activity indeed. The scope of this effort is yet to be defined. My
initial aim was to fix failures that affect builders participating in pull
request checks.
The way I see it currently: every month we identify the 10 or so most annoying
sporadic failures and make sure they get fixed within a month. Currently I ask
Elena to make a report based on buildbot logs, although I'm happy to accept
lists from my colleagues. A failure has to be sporadic and has to occur
often (a few times a day at least) according to the bb cross-reference.
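As a rough illustration of the selection rule above, here is a minimal sketch.
It is not the actual report script: the function name, the input format (one
test name per observed sporadic failure), the threshold of 2 failures per day
standing in for "a few times a day", and the test names in the usage example
are all assumptions made for the example.

```python
from collections import Counter


def top_sporadic_failures(failures, days, min_per_day=2.0, limit=10):
    """Rank frequently failing tests (hypothetical helper).

    failures    -- iterable of test names, one entry per observed sporadic failure
    days        -- length of the observation window in days
    min_per_day -- assumed cut-off for "fails often"
    limit       -- how many tests to pick per month
    """
    counts = Counter(failures)
    ranked = [
        (name, n / days)
        for name, n in counts.most_common()  # most frequent first
        if n / days >= min_per_day           # drop rare failures
    ]
    return ranked[:limit]


# Made-up input: three tests observed over a 30-day window.
observed = ["main.a"] * 90 + ["rpl.b"] * 300 + ["innodb.c"] * 3
for name, rate in top_sporadic_failures(observed, days=30):
    print(f"{name}: {rate:.1f} failures/day")
```

With this made-up input, rpl.b (10 failures/day) and main.a (3 failures/day)
are selected, while innodb.c (0.1 failures/day) falls below the cut-off.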
Since this issue was created (Apr 19), 4 issues have been fixed, 1 reviewed, and
many analysed. I hope I will have enough resources to keep it rolling.
Having extra focus on this right now is of course good and helps make
MariaDB 11.8 a solid release. Hopefully also people have some ideas
brewing on potential process/policy changes so that it is clear what
developers should do in the future if/when tests regress and GitHub
published CI results go from green to red next time.
My thoughts...
There are hundreds of sporadic failures in our tests. Some fail once a year,
others every hour. Some are test issues, others are real bugs.
The more often a test fails, the higher the priority it should receive.
Previous efforts of this kind were essential to keeping things in a decent
state. However, they didn't seem to follow a systematic approach of prioritizing
frequent failures.
There has to be somebody who constantly monitors the situation and approaches
the appropriate developers. It is inefficient to expect this from all team
members. It is better if the team can concentrate on their assignments.
Normally CI doesn't go badly red anymore. A large part of the MariaDB core
developers have switched to using GitHub pull requests for their daily work,
which means they have to pass CI checks before they can push.
There are exceptions though, when commits get pushed directly and bring
persistent failures (which are usually fixed promptly). We should indeed
get this fixed.
Let's see how this effort goes and adjust accordingly.
Regards,
Sergey