Saturday, July 29, 2017

Prevent infinitely many bugs with this one simple trick

Here's a way to quickly find a bug in any legacy code:

Step 1: Find a case-insensitive string comparison.

boost::remove_erase_if(attributeNameAndValuePairs, [](const auto& nameAndValue) {
    return boost::iequals(, "foo");   // <-- here

Step 2: Trace the use of those strings to find somewhere they are compared with the default comparison. Bug!

if (attributeName == "bar") // <-- bug!

Sometimes it's less obvious:

return std::find(allowedAttributes.cbegin(), allowedAttributes.cend(), != allowedAttributes.cend(); <-- hard to see bug!

I've written this kind of bug many times, and so has everyone else, judging by how often I see it. Recently I wrote it again, and decided to dig a little deeper instead of just fixing this one new case.

Ways to see the problem

Is this a testing problem?

Should we have caught this bug sooner by testing more carefully?

I could write a unit test, asserting that this function handles casing correctly, for each of the dozen or so places that could get this wrong. But if I can remember to write that test, then I can remember to write the code the correct way. Both my code and my unit tests are limited by imagination. How do I catch all the cases?

Getting a second person to do the testing can help: they'll see things I don't see, and that is fantastic. But still, testing is only as good as the imagination of the tester. Also, while that may help us catch the bug before we ship, I want to catch the bug before I check in. 

Test Feedback

The real power of Test-Driven Development is not in running the tests to catch defects, but the feedback the tests give you about the design of your code (if you can hear that feedback). What are tests telling us in this case?

With the current design, to eliminate this kind of bug means finding all the places we deal with attribute names, and writing a test. If there's 1000 places, I need 1000 tests. This is a test's way of saying "your business rule is duplicated all over the place". We can look at this as a DRY problem, and can improve the design by refactoring to eliminate that duplication.

Primitive Obsession

We could also look at this as a Primitive Obsession problem: dealing with strings instead of a type that represents a concept from the business domain. Here, the concept is "Attribute Name". The Whole Value pattern says "make and use new objects that represent the meaningful quantities of your business".


Yet another way to look at this problem is from the point of view as "test-as-spec". I want my unit tests to make sense to a person familiar with the problem domain.

The test I want to write is "Attribute names are case-insensitive". With the original design, there's no way to do that, both because the rule is written in many places in the code, and because the rule is embedded in other functions which implement some other business rule.

In order to get test-as-spec, the rule will need a single, canonical home, decoupled from the rest of the system.


Based on any of the above points of view, we can create a type that represents an attribute name:

struct AttributeName
    std::string Value;

And extract the equality check:

bool operator==(const AttributeName& lhs, const AttributeName& rhs)
    return boost::iequals(lhs.Value, rhs.Value);

And now we can write a single test, named after the business rule that it describes (test-as-spec):

void AttributeNamesAreCaseInsensitive()
    CPPUNIT_ASSERT(AttributeName{"foo"} == AttributeName{"FOO"});

Then we replace any instance of std::string attributeName with AttributeName attributeName and let the compiler tell us what to fix next. Keep chasing compiler errors until they are all gone.

Is this code perfect?

No, but it's better: all the places that compared attribute names before are now slightly simpler / easier to read / easier to write / easier to test.

It's not impossible to write a case-sensitivity bug for attribute names, it's now much harder to do the wrong thing, and much easier to do the right thing.

In the process of carrying out this refactoring I found another instance of this kind of attribute name case-sensitivity bug, which testing had not yet caught. Double win!

What do you think?


William Lee said...

Regarding Test Feedback, combined with Eric’s latest blog “TDD is for pros”( . I think there should be more discussion on “test design feedback”. Lacking a defined way to “listen to design feedback”, it makes me feel like how to draw an owl ( . So, any plan to write a blog about how to identify typical test feedback and how to fix it? 😉

-William Li

Jay Bazuzi said...

Yeah, what you describe is an important concern.

I'd like to shift the question from "how we detect design feedback in tests" do "how do we detect design feedback from anywhere?"

That's partly the purpose of this blog post, to present some ideas about detecting and responding to design feedback. I'll keep writing.

Pairing, even with someone that's is not a refactoring expert, will help both of you get better.

Tim Burris said...

I think this is a great idea but it seems to be unpossible in C# since string is baked into the language and the framework so deeply. I can get the class to work standalone, but I can't figure out a clean serialization story, plus I lose interning and the reference type passed by value semantics. :(

Jay Bazuzi said...

Tim, I learned how to do this in C# first!

- Make the Value field readonly.

- Use

I've never had a problem making serialization work. What protocol are you working in?

Tim Burris said...

Xml is the primary use case.

If I try to use CaseInsensitiveString as an XmlAttribute or XmlText, I get "Cannot serialize member... XmlAttribute/XmlText cannot be used to encode complex types."

If I try to use CaseInsensitiveString as an XmlElement, I get no value unless I expose the string-behind as a public property, which is undesirable, and in which case the value gets serialized with an extra layer of indirection that I don't want.

Jay Bazuzi said...

A few abstract thoughts; we can make them more concrete later:

- I don't have much used for a class called CaseInsensitiveString, with the possible exception of allowing them in a private field of a value class.

- I'm talking about classes used within the application core and at its ports, not when interfacing to an external dependency. [DataContract]s aren't really C#...

Tim Burris said...

My thought is that CaseInsensitiveString would be a base class encapsulating the case insensitive functionality, then there could exist subclasses representing the specific business logic properties.

In my case we have several distinct kinds of data that each want case insensitive comparison semantics: server name, (NTFS) path, several kinds of named identifiers. It doesn't make sense to reimplement the casing functionality once for each kind of property; that's barely an improvement over implementing it once for each comparison operation.

In my case, xml documents on disk and their in-memory representation are an intrinsic part of the application. The binaries would be useless without the data we distribute with them.

Tim Burris said...

I think I can do what I need by implementing IConvertible.

Jay Bazuzi said...

A case-insensitive string base class is exactly what I want to avoid.

I'm OK with:

class ServerName
CaseInsensitiveString Value;

Tim Burris said...

Why do you want to avoid a base class, or for that matter avoid providing CaseInsensitiveString for general/public use? I recognize that it would probably be best to avoid sending such a type across an interop boundary, but I'm building an application not a framework.

I'm abandoning efforts to get XmlSerializer to convert my type. My next attempt will be some codegen to implement a pattern like:
CaseInsensitiveString ServerName { get; private set; }
string _xml_ServerName { get { return ServerName.ToString(); } set { ServerName = new CaseInsensitiveString(value); } }

Jay Bazuzi said...

You're making me dig in to the reasons behind my sense of design, which is a good thing...

I try to save inheritance for is-a relationships. It may not be clear why I think that applies here, and inheriting implementation sure is convenient.

The classes I am advocating for are parts of the domain model, and do not generally belong in code that uses C# as a way to write the inputs to another system, like XmlSerializer or Entity Framework. It would be easier to explain this in real code, if you have a project you can share on GitHub.