• "Standard" testing's problems:
    1. It requires you to have prior knowledge of the expected output
    2. Updating the expected output can be labor-intensive
    3. Your inputs may not cover the total range of valid inputs; combinatorial explosion
  • Property-based testing (randomly generate inputs and assert they all pass some invariant) solves these all, but introduces new problems:
    1. Your test is now a black box -- while you don't need prior knowledge of the expected output, you now have NO knowledge of the expected output
    2. If the code is broken in a way that still fulfills the property, you won't detect it
  • Property-based testing effectively replaces the constraint of needing prior knowledge of expected output with needing prior knowledge of the PROPERTIES of the expected output (kind of like the integral of the output). And outside of code with nice mathematical properties this can be hard.
  • Metamorphic testing is similar to property testing, but instead of "for all values of x in range y, f(x) maintains some property p", it tests "given a fixed input x, for all transformations g in some set of transformations, f(x) == f(g(x))." The given example is speech-to-text, with the transforms being speed adjustments, pitch adjustments, etc. applied to the input audio. The transforms should not affect the transcribed output.
  • Segura, S. et al. (2017) Metamorphic Testing of RESTful Web APIs gives some practical examples using API endpoints (especially search / filtering). We currently test controllers by asserting that the controller called with some params gives some predetermined output. The metamorphic approach would instead assert things like "the output of search A is equal to the output of search A + an additional irrelevant filter," or "the output of search A, which is expected to return <50 records, does not change when the page size is set to 50, 100, or 200."
  • Examples of metamorphic testing patterns:
    • Equality -- the example described above (f(x) == f(g(x))). An example could be a search with fixed sort order.
    • Equivalence -- a relaxed form of the above (Set(f(x)) == Set(f(g(x)))). An example could be testing an ES index on a search with no fixed sort order.
    • Subset -- f(g(x)) is a subset of f(x). "Any search with X + 1 filters will always return a subset of the search with X filters." "A Yelp search in a 1 mile radius will always return a subset of the same Yelp search in a 5 mile radius."
    • Disjoint -- f(x) & f(g(x)) is empty. "A search for Gov ID verifications should never return entries that overlap with a search for Selfie verifications."
    • Difference -- Given x and y that differ on some attributes a[] but are equal on other attributes b[], f(x) and f(y) should only differ on the same attributes. "Given a Driver License and the same Driver License with birthday modified, all extractions and checks should be identical EXCEPT for ones that concern birthday."
  • IMO this still has flaws similar to standard testing's issues 1 and 3: you still need to determine up front the set of transformations to test, and you still have combinatorial explosion issues. It also has similar black box issues to property-based testing. Overall, metamorphic testing seems more appropriate for something like fuzz testing / acceptance testing, vs. something that's part of a standard test suite?
  • This post has a super interesting idea of using git worktree to check out a "known good" branch, and then having metamorphic specs compare outputs across branches. Seems like it'd work most smoothly with libraries; applications would have additional complications.
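
The equality, equivalence, subset, and disjoint patterns above can be sketched against a toy in-memory search. This is only an illustration under stated assumptions: `Record`, `make_records`, `search`, and `check_metamorphic_relations` are hypothetical names invented for the sketch, not anything from the Segura et al. paper or from a real controller or search API.

```python
import random
from dataclasses import dataclass

KINDS = ["gov_id", "selfie"]   # hypothetical verification kinds
CITIES = ["SF", "NYC", "LA"]   # hypothetical extra filter values


@dataclass(frozen=True)  # frozen, so records are hashable and can go in sets
class Record:
    id: int
    kind: str
    city: str


def make_records(seed, n=40):
    """Randomly generate fewer than 50 records, so one page holds them all."""
    rng = random.Random(seed)
    return [Record(i, rng.choice(KINDS), rng.choice(CITIES)) for i in range(n)]


def search(records, kind=None, city=None, page_size=50, sort=True):
    """Hypothetical system under test: filter, optionally sort by id, paginate."""
    hits = [
        r for r in records
        if (kind is None or r.kind == kind) and (city is None or r.city == city)
    ]
    if sort:
        hits.sort(key=lambda r: r.id)
    return hits[:page_size]


def check_metamorphic_relations(records, seed=0):
    base = search(records, kind="gov_id")

    # Equality: with < 50 hits, a larger page size must not change the output.
    for page_size in (100, 200):
        assert search(records, kind="gov_id", page_size=page_size) == base

    # Equivalence: without a fixed sort order, storage order may change the
    # order of the results but not the set of results.
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    assert set(search(shuffled, kind="gov_id", sort=False)) == set(
        search(records, kind="gov_id", sort=False)
    )

    # Subset: adding a filter can only narrow the results.
    assert set(search(records, kind="gov_id", city="SF")) <= set(base)

    # Disjoint: searches on mutually exclusive kinds never overlap.
    assert set(base).isdisjoint(search(records, kind="selfie"))


for seed in range(20):
    check_metamorphic_relations(make_records(seed), seed)
```

Note that no expected output is hard-coded anywhere. That is also the weakness: a `search` that always returned an empty list would satisfy all four relations, which is the black box problem described above.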


Chesterton's fence

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, "I don't see the use of this; let us clear it away." To which the more intelligent type of reformer will do well to answer: "If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it."

Hyrum's Law

(From the SWE book)
If you are maintaining a project that is used by other engineers, the most important lesson about “it works” versus “it is maintainable” is what we’ve come to call Hyrum’s Law:
With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
In our experience, this axiom is a dominant factor in any discussion of changing software over time. It is conceptually akin to entropy: discussions of change and maintenance over time must be aware of Hyrum’s Law just as discussions of efficiency or thermodynamics must be mindful of entropy. Just because entropy never decreases doesn’t mean we shouldn’t try to be efficient. Just because Hyrum’s Law will apply when maintaining software doesn’t mean we can’t plan for it or try to better understand it. We can mitigate it, but we know that it can never be eradicated.


