Playing on Hard
Hillel on Tests

First, read Hillel’s “Some tests are stronger than others”:

An integration test crossing modules A and B will imply some unit tests in A and some unit tests on B. If an integration test can decompose into a set of unit tests, it must be stronger than any individual test in that set. Similarly, an end to end test can be decomposed into integration tests.

And also:

If you already know there’s an error, a weaker test can be more useful than a stronger test, because it localizes where the bug is more. If you’re trying to determine correctness, though, stronger tests are better.
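
To make the “stronger” relation concrete, here is a minimal Python sketch. The functions `parse_amount`, `apply_discount`, and `discounted_price` are hypothetical and exist only for illustration; they are not taken from Hillel’s post. The integration test exercises both modules in one assertion, while each unit test pins down a single module.

```python
def parse_amount(text: str) -> int:
    """Hypothetical module A: parse a decimal string such as "12.50" into cents."""
    units, _, cents = text.strip().partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(units) * 100 + int((cents + "00")[:2])


def apply_discount(cents: int, percent: int) -> int:
    """Hypothetical module B: apply an integer percentage discount, rounding down."""
    return cents * (100 - percent) // 100


def discounted_price(text: str, percent: int) -> int:
    """Crosses modules A and B."""
    return apply_discount(parse_amount(text), percent)


def test_integration():
    # Stronger: exercises parsing and discounting together.
    # If it fails, it does not say which of the two modules is at fault.
    assert discounted_price("12.50", 20) == 1000


def test_unit_parse():
    # Weaker: covers module A only, but a failure points straight at it.
    assert parse_amount("12.50") == 1250


def test_unit_discount():
    # Weaker: covers module B only.
    assert apply_discount(1250, 20) == 1000
```

When everything passes, the integration test alone covers more ground than either unit test. Once it fails, the two unit tests are the quicker way to find out which side broke.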

I’m going to quickly detail my intuitions/convictions here:

  • E2E:

    • Most useful for proving the overall, current correctness of the system. They cannot prove the full correctness of individual components, because each component sits behind other layers and every layer narrows the range of inputs that can reach the next one. That limitation matters especially for security.
    • They come very late in the development cycle and are often slow, cumbersome, and flaky.
    • They often offer little detail on what went wrong. The underlying issue is usually obscured by all the layers and by the needs of the UX.
  • Integration:

    • Most useful for testing correctness at boundaries, including security. Any internal component may become an external component at any time, either intentionally or through errors. Hence all components must enforce the correctness of their inputs at their boundaries.
    • Earlier in the development process and faster than E2E, they are the second layer of testing, run only after the unit tests have passed and the changes have been committed. On larger systems, though, they are still too slow to be performed within the innermost development loop.
    • They offer a lot of detail about the boundaries but often little about the inside.
  • Unit tests:

    • Most useful for proving behavior and testing the boundaries of individual “units”, whatever that may mean.
    • They are part of the innermost development feedback loop and thus accelerate development the most. The best way to fix anything is to write a test that reproduces it and then make that test pass (this is not TDD, which I don’t find at all useful for application, as opposed to library, development).
    • They offer the most detail but are often too “unitized” when used in application (vs. library) development. My rule of thumb is to mock only non-deterministic behaviors and to prefer testing a set of such units (sub-components?) rather than a single unit in isolation. This matches Hillel’s “mock-removal” example: anything that is proved without a mock is necessarily stronger than something proved with a mock.
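
The rule of thumb about mocks can be sketched roughly as follows, in Python. Every name here (`SessionService`, `SessionStore`, `FakeClock`) is hypothetical and made up for illustration. The only thing replaced in the test is the clock, because wall-clock time is non-deterministic; the store is deterministic, so the real one is used and the two units are tested together.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Session:
    user: str
    expires_at: float


class SessionStore:
    """Hypothetical in-memory store. Deterministic, so the test uses the real one."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def put(self, token: str, session: Session) -> None:
        self._sessions[token] = session

    def get(self, token: str) -> Optional[Session]:
        return self._sessions.get(token)


class SessionService:
    """Hypothetical service. The clock is injected because time is non-deterministic."""

    def __init__(self, store: SessionStore,
                 clock: Callable[[], float] = time.time,
                 ttl: float = 60.0) -> None:
        self._store = store
        self._clock = clock
        self._ttl = ttl

    def login(self, user: str) -> str:
        token = f"token-{user}"  # kept deterministic for the sketch; not a secure token
        self._store.put(token, Session(user, self._clock() + self._ttl))
        return token

    def current_user(self, token: str) -> Optional[str]:
        session = self._store.get(token)
        if session is None or self._clock() >= session.expires_at:
            return None
        return session.user


class FakeClock:
    """The only test double: a clock the test can move forward by hand."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def test_session_expires_after_ttl():
    clock = FakeClock()
    service = SessionService(SessionStore(), clock=clock, ttl=60.0)
    token = service.login("alice")
    assert service.current_user(token) == "alice"
    clock.advance(59.0)
    assert service.current_user(token) == "alice"
    clock.advance(1.0)  # exactly at the expiry time
    assert service.current_user(token) is None
```

Had `SessionStore` been mocked as well, the test would only show that the service calls the store as expected, not that the two actually work together, which is a weaker claim.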

Last modified on 2023-06-19