Hi Otto,

On Tue, Aug 6, 2024 at 6:45 AM Otto Kekäläinen via developers <developers@lists.mariadb.org> wrote:
> A later paragraph in my original email on August 2nd stated that it is not just a rare random thing:
>
> When I now look at https://github.com/MariaDB/server/commits/11.6/ I see a red cross on all commits since July 8th (last green one was 44af9bf).

As far as I can tell, there are several contributing factors to this problem. One is that there are two CI systems. The pull request workflow uses https://buildbot.mariadb.org, which is mostly based on Docker images and runs a somewhat newer version of the Buildbot software than https://buildbot.mariadb.net, which is based on virtual machines.

As you write, many core developers do not use GitHub pull requests and seem to pay attention to buildbot.mariadb.net only, ignoring any failures that would occur on buildbot.mariadb.org. For some reason, buildbot.mariadb.net has been configured so that some platforms only build "main branches". If a development branch breaks things on some platform that is not covered for "non-main branches" on buildbot.mariadb.net, then such failures would typically be ignored until the change reaches the main branch. Worse, many developers do not watch the main branch status at all, before or after the commit.

Paying attention to buildbot.mariadb.org would lead to a better result, because each change will be scheduled on each builder. However, not all builders are created equal:

* Only some builders are mandatory for branch protection.
* Only some of the mandatory and non-mandatory builders report status to GitHub.
* There are builders that are "invisible" to GitHub, mainly visible in the "grid view", say, https://buildbot.mariadb.org/#/grid?branch=10.11 or https://buildbot.mariadb.org/#/grid?branch=refs%2Fpull%2F3030%2Fmerge for https://github.com/MariaDB/server/pull/3030/.

Many reviewers and developers seem to be unaware that you should also pay attention to such "hidden failures". Some could also think that some ISAs such as POWER or IBM Z (s390x) are "exotic" and not worth any attention.
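
To make the grid view links above a bit more concrete, here is a minimal Python sketch of how such URLs can be constructed and how "hidden failures" could be picked out of a set of results. The URL pattern is copied from the two links above. Everything else (the function names, the shape of the input data, and the builder names other than amd64-debian-12-asan-ubsan) is hypothetical and only for illustration; it does not call the Buildbot or GitHub APIs.

```python
from urllib.parse import quote

# Base of the grid view, taken from the links quoted above.
GRID_BASE = "https://buildbot.mariadb.org/#/grid"


def grid_url(branch: str) -> str:
    """Return the grid view URL for a branch name or a Git ref."""
    # safe="" makes quote() encode "/" as %2F, matching the pull request link.
    return f"{GRID_BASE}?branch={quote(branch, safe='')}"


def pr_grid_url(pr_number: int) -> str:
    """Return the grid view URL for the merge ref of a GitHub pull request."""
    return grid_url(f"refs/pull/{pr_number}/merge")


def hidden_failures(results: dict[str, bool], reported_to_github: set[str]) -> list[str]:
    """List failed builders whose status never shows up on GitHub.

    results maps a builder name to True (passed) or False (failed);
    reported_to_github holds the names of builders that report status.
    Both inputs are assumed to be collected by hand or by some other tool.
    """
    return sorted(
        name
        for name, passed in results.items()
        if not passed and name not in reported_to_github
    )


if __name__ == "__main__":
    print(grid_url("10.11"))
    print(pr_grid_url(3030))
    # Hypothetical results; only the asan-ubsan builder name is a real one.
    results = {
        "builder-a": True,
        "builder-b": False,
        "amd64-debian-12-asan-ubsan": False,
    }
    print(hidden_failures(results, reported_to_github={"builder-a", "builder-b"}))
```

Here pr_grid_url(3030) yields the same link as the one given above for pull request 3030, and the last call lists only the failing builder that does not report to GitHub.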

> Allowing failures causes alert fatigue and more failures start to seep in.

Analogous to https://en.wikipedia.org/wiki/Broken_windows_theory one would tend to ignore any failures for a given platform. The reasoning would go, say: https://buildbot.mariadb.org/#/builders/588 (amd64-debian-12-asan-ubsan) always seems to fail, therefore I will ignore it. It might help if experimental builders were separated in the grid view.

Based on the currently latest failure https://buildbot.mariadb.org/#/builders/588/builds/7718 this may be an issue with that particular builder. However, without a deeper investigation I would not claim so, because based on https://jira.mariadb.org/browse/MDEV-26272 and many related tickets I know that clang -fsanitize=undefined is much stricter than the corresponding GCC option.

> Having a clear gating policy and requiring all tests to pass forces everyone to quickly agree on what tests actually should be in the CI to begin with, and make sure they are well maintained. With the policy in place, basically every developer is motivated to participate in maintaining tests as otherwise during a red mainline event no developer can do anything.

I agree.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc