When automated tests seem to pass or fail at random, we say they are unstable, non-deterministic, or flaky. Beyond being an annoyance, fixing them is critical to the success of any software product, as I will explain later.
Flaky tests present themselves in different ways. For example, a couple of weeks ago, I encountered an automated test that would randomly fail in my team’s repository. The test was supposed to check that we could filter a list of users by their name. It was something like this:
# Only u1 has an explicit name; u2 and u3 rely on the factory defaults.
let!(:u1) { FactoryBot.create(:user, name: "Bill") }
let!(:u2) { FactoryBot.create(:user) }
let!(:u3) { FactoryBot.create(:user) }

it "filters users by name" do
  all_users = users.get_all_from_db
  filtered_users = all_users.filter_by_name("Bill")
  expect(filtered_users.length).to eq(1)
end
This test uses FactoryBot to create the database records it needs. We pass arguments when we need specific values and rely on the factory defaults otherwise. In this case, we only needed to specify the name of the first user to exercise the filter.
Simple as it is, this test sometimes failed because the filter returned two users instead of one.
In my experience, automated tests are usually unstable for one of two reasons: the tests are not properly isolated (they depend on other tests, components, or the environment), or non-deterministic operations run either in the code under test or in the test itself.
Given how the test was written, I was confident the issue was not one of dependencies, so I started looking for non-deterministic operations. The code under test was immaculate; the test itself, however, was not.
It’d be very boring (and sometimes inconvenient) if all test users had the same name. For this reason, our user factory sets the default name using Faker, a convenient utility that can generate names, emails, and other useful strings at random.
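For reference, the user factory looked roughly like this. The exact attributes are an assumption on my part; the important detail is that the default name comes from Faker:

FactoryBot.define do
  factory :user do
    # Default: a randomly generated first name.
    name { Faker::Name.first_name }
  end
end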
To set a name, Faker randomly selects one from a pre-defined list. It turns out that the list contains seven entries that could match our filter:
# Male names:
Bill
Billie
Billy
# Female names:
Billi
Billie
Billy
Billye
When the test initialises u2 and u3, their names are selected at random. In most cases the factory picks names that don't match the filter, Antonia and Stephen, for example. But sometimes it picks one (or even two) of the names above. The probability of that happening is ~0.25%, which, given all the builds running in our pipeline, works out to about one failure per week.
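If you want to sanity-check a number like that, a quick simulation is enough. This is only a rough sketch under two assumptions: that the factory default comes from Faker::Name.first_name, and that filter_by_name does a simple substring match on "Bill".

require 'faker'

# Count how often at least one of the two randomly named users
# would also match the "Bill" filter.
trials = 100_000
flaky_runs = trials.times.count do
  Array.new(2) { Faker::Name.first_name }.any? { |name| name.include?("Bill") }
end

puts format("~%.2f%% of runs would flake", 100.0 * flaky_runs / trials)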
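As for the fix, the simplest option is to remove the randomness that the assertion depends on. A minimal sketch (not necessarily the exact change we shipped) is to give the other users explicit names that can never match the filter:

let!(:u1) { FactoryBot.create(:user, name: "Bill") }
let!(:u2) { FactoryBot.create(:user, name: "Antonia") }
let!(:u3) { FactoryBot.create(:user, name: "Stephen") }

The factory defaults are still useful elsewhere; in this test, determinism matters more than variety.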
Trust
In my opinion, the biggest issue with unstable tests is trust: flaky tests undermine our confidence in the entire automated test suite.
Once, I worked for a company whose main product had so many flaky tests that re-running them would still produce failures, just not always the same ones. When a test failed, we could not tell whether the failure was real or just flakiness.
In the end, nobody even tried to confirm whether test failures were legitimate or false alarms. Failures were assumed to be flakiness and simply ignored. We were driving blind.
It’s not a secret that we make mistakes as we write software. It is thanks to automated tests that we can prevent some of these mistakes from becoming bugs in production. If we let automated test instability grow out of control, we’ll eventually be doomed to either go back to manual testing or spend a lot of time working on incidents.
Whenever possible, we should fix flaky tests as we encounter them, or at least document them so the team can prioritise fixing them later. These simple steps can have a hugely positive impact on our teams.
Happy coding!
José Miguel
Share this article if you find it useful, and follow me on LinkedIn to be notified of new articles.

