Hi!
At least the job amd64-fedora-38-last-N-failed is always failing, and it would not have found its way into the codebase if a protected branch policy required that all commits have a passing CI status before being pushed/merged.
This is not correct, unfortunately.
You are assuming that CI status is consistent for a given commit. If this were the case, getting a green CI would be easy, just a matter of discipline, as you say.
The problem is test cases that fail sporadically; that is, they normally pass but fail at random in a small percentage of runs. Branch protection will do nothing to prevent these failures from entering the tree. It just makes developers waste time clicking "retry" on the builders to try to get lucky on another test run.
A later paragraph in my original email on August 2nd stated that it is not just a rare random thing:
When I now look at https://github.com/MariaDB/server/commits/11.6/ I see a red cross on all commits since July 8th (last green one was 44af9bf).
Some failures may be sporadic, for sure. I would still argue that applying branch protection to require CI to be green will *also* help with the random ones, in the same way: gatekeeping acts as a forcing function and raises the bar on test-related code quality.
To get the failures fixed, someone has to spend the considerable time and effort required to debug the failure and understand and fix the issue.
Yes, somebody has to, and the motivation to do so is not there if there is no reward for doing so. Every single project I have been involved in that applied branch protection and required CI to always be green had some initial pain, but soon afterwards saw radically improved CI quality. By quality I mean those projects had far fewer CI failures seep into the main branch, including fewer random failures. Allowing failures causes alert fatigue, and more failures start to seep in. Having a clear gating policy and requiring all tests to pass forces everyone to quickly agree on which tests should actually be in the CI to begin with, and to make sure they are well maintained. With the policy in place, basically every developer is motivated to participate in maintaining tests, as otherwise no developer can do anything during a red mainline event.

- Otto