Tuesday, October 18, 2016

Pinning Tests

I wrote this on the C2 Wiki, with the hopes that other people would help improve it. But that site is down now, so I'm posting it here:

Definition: A simple-minded automated test that locks down the behavior of existing code that otherwise is not well-tested, as a safety net while refactoring.

Example: Run some code and collect logs as a baseline. Each time you make a change, run the program again and compare the logs against the baseline. As long as there is no difference, you have some confidence that things are still working.
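A minimal sketch of such a test, using NUnit (Program.Run and the baseline file name are stand-ins for whatever your system provides):

using System.IO;
using NUnit.Framework;

public class PinningTests
{
    [Test]
    public void OutputMatchesBaseline()
    {
        var log = new StringWriter();
        Program.Run(log);   // stand-in: run the system, capturing its log

        // Any difference from the baseline means behavior changed.
        Assert.AreEqual(File.ReadAllText("baseline.log"), log.ToString());
    }
}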

Pinning tests can make it safer to refactor, but they can never make refactoring completely safe, because you'll forget important cases in your pinning tests. For safety, use #3 or #4 from Various Definitions of "Refactoring". Pinning tests are a safety net, just in case.

The most important features of pinning tests are:

  1. An obvious, definitive pass or fail result.
  2. Good coverage. Professional testers get really good at this; ask them to help.
  3. Speed: faster is better, so you can run them often.

Non-requirements for pinning tests:

Robustness. Professional testers get really good at making robust tests that work on different computers, at different screen resolutions, or across UI changes. Ask them to refrain - these tests are short-lived, and the behavior of the system won't be changing (by definition of "Refactoring").

You don't need to run your pinning tests in every environment that you ship. For a GUI, it's fine to record mouse clicks and keystrokes.

Long-lived. The goal is to hold behavior constant for just long enough to ReFactor.

Clean code. Hacking the test together is OK. For example,

  • Use the C preprocessor to redirect troublesome API calls to write to a log instead.
  • Edit your HOSTS file to hijack access to a network resource.


Tuesday, September 6, 2016

Proposed Refactoring: Introduce Parameter in Lambda

Given a lambda with a captured local variable,
  1. Add a new parameter to the lambda
  2. Inside the lambda, replace uses of the local with uses of the new parameter
  3. Where the lambda is called, pass in the local.
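For example (a sketch of my own; the names are illustrative):

// Before: the lambda captures the local 'taxRate'.
decimal taxRate = GetTaxRate();
Func<decimal, decimal> withTax = amount => amount * (1 + taxRate);
decimal total = withTax(subtotal);

// After: 'taxRate' becomes a parameter, passed at the call site.
decimal taxRate = GetTaxRate();
Func<decimal, decimal, decimal> withTax = (amount, rate) => amount * (1 + rate);
decimal total = withTax(subtotal, taxRate);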


I believe this is a refactoring: that is, I believe the transformation has no effect on the behavior of the code. But I'm not completely certain.

This operation is not allowed if the value of the local is changed inside the lambda.

This is almost the same operation as Introduce Parameter.

Sunday, September 4, 2016

Proposed refactoring: extract and execute lambda

Given a statement block, wrap it in a lambda assigned to an Action variable, and execute it immediately.
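For example (an illustrative fragment of my own):

// Before:
var lines = File.ReadAllLines(path);
Console.WriteLine(lines.Length);

// After: the statement block is wrapped in an Action and executed immediately.
Action action = () =>
{
    var lines = File.ReadAllLines(path);
    Console.WriteLine(lines.Length);
};
action();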


I believe this is a refactoring: that is, I believe the transformation has no effect on the behavior of the code. But I'm not completely certain.

I think a similar recipe for expressions is equally valid, using a Func<> instead of an Action.

This is almost the same operation as Extract Method.

Tuesday, July 5, 2016

"pure unit test" vs. "FIRSTness"

Sometimes we categorize tests into groups like "pure unit test", "focused integration test", "end-to-end-test", etc. That's a fine approach, and useful for a lot of cases.

For example, I find that pure unit tests are extremely valuable in giving me feedback about my code design, especially coupling and duplication. I don't even have to run the tests to get that value! Other types of tests have their value, but they don't give me that feedback.

Another categorization I sometimes find useful is based on the FIRST Properties of Unit Tests. You should read that link for the full story, but I'll summarize here:

Fast
Isolated (tests have a single reason to fail. One aspect of behavior = one test)
Repeatable (same result every time)
Self-verifying (tests report an unambiguous pass/fail)
Timely (each test is created just before it is needed)

It's common for programmers to have one set of tests that they run with every edit-build-test cycle on their dev machine. They might have another set they run to validate each checkin before it merges in to source control. Another that runs nightly or weekly. Another that runs before each release.

I've noticed that the decision about which tests fit in each of these buckets is less about "unit" vs. "integration" and more about "FIRS" (without the T). That is, if a test is fast and its results are reliable and useful, programmers will tend to run it more often. If a test is slow, or its results require investigation, they will tend to run it less often.

Ideally, I'd like to see 99.9% of tests run in 1ms or less, be perfectly repeatable, give a clear pass/fail, and make it obvious on failure which aspect of which desired behavior is not right. You should strive for that. But today, given the tests you have, you may find value in bucketing your tests as I've described.

Thursday, June 23, 2016

How to document your build process for an open source C# project

As an Open Source contributor...

I find an interesting open source project that I want to contribute to. I fork/clone the repository to my machine. Then I have to figure out how to build it.

Is there a solution file? Or a script?

I try something and the build fails. Do I need a certain SDK or Visual Studio feature installed? Which version?

I get it to build and then I try to run the tests. 1/3rd of them fail, because they are looking for something that isn't installed on my machine.

If I'm lucky (!?) I find a document in the repository that claims to be build instructions, but it is jumbled and clearly out of date. I try to follow it, but something I need to install is no longer available, or not compatible with my version of Windows. Will a newer version of that thing work OK?

Uggh, what a mess.

As an Open Source maintainer...

I put together a cool little project in my spare time and post it online. It's simple and straightforward to build and run tests. 

Then a contributor complains that they can't build it. What information could possibly be missing? It's simple and straightforward, right? I write a small text file explaining the obvious instructions. The contributor tries to follow it but is even more confused. I don't have time for this.

Uggh, what a mess.

A solution

My solution is AppVeyor. I treat AppVeyor as the reference build environment.

Here's how:
  1. https://ci.appveyor.com/
  2. New Project, select your project.
  3. Settings -> Do what you need to get a green build + tests
  4. Settings -> Export YAML. Add it to your repo.
  5. Delete the AppVeyor project
  6. New Project again, but this time configure nothing. It will use the settings from your repo.
  7. Confirm that build + tests are still green

Now the instructions for how to build + run tests are in source control. Anyone can read them. There won't be any missing details. If a dependency changes, I won't miss updating the instructions, because AppVeyor will report my build is broken. 

No more mess.

Saturday, June 11, 2016

An example of good engineering

I often advocate for good engineering practices and the path to Zero Bugs. Talking about these things is great and all, but concrete examples are important. I recently published an open source project that I think is a good example of this kind of work.

I hope you will copy some of the ideas to use in your own projects. You can read the source here: https://github.com/JayBazuzi/ValueTypeAssertions

Great tests

Code Coverage is a dumb measure, especially in this case. There are very few branches in the code; two tests would hit 100% coverage. 

In this project, you can pick any line of code and modify it to be incorrect, and you'll get a test failure that tells you exactly what is wrong. That is much more valuable than any code coverage number.

I can't guarantee that it has 0 bugs, but I can say that every type of bug I have ever imagined or experienced in this code is covered by a test. 

The tests are organized like a spec, using namespaces/folders to organize tests the same way as if you were writing a spec. Each name indicates what aspect of the system's behavior is being covered. 

The tests are super-fast, which makes the edit-build-test cycle a happy experience. 

ReSharper

ReSharper settings are included in the repository. All sources have been formatted with these R# settings. This makes it easy to keep formatting / style consistent. 

If a random person on the internet decides to make a contribution, I don't have to explain the project's style - they can just let ReSharper take care of that. 

ReSharper Code Inspections are 100% clean, further helping keep the code clean and consistent.

AppVeyor

Every Pull Request is automatically validated by AppVeyor, including build + unit tests.

C# Warn-as-error is turned on for the AppVeyor build. I believe it's important to have 0 warnings - either heed the warning if it matters, or disable the warning if it doesn't. But I don't want to slow down my edit-build-test cycle just because of a warning, so I don't set warn-as-error on the desktop. But I do set it in AppVeyor, to ensure that all changes have 0 warnings before they hit master.

AppVeyor runs ReSharper Code Inspections, again ensuring there are 0 issues before merging to master. This is especially important because not everyone has ReSharper.

The AppVeyor web site lets you edit build settings online. It's a convenient way to tune the settings. Once I had them just right, I downloaded the appveyor.yml file and added it to the repository. Then I deleted my AppVeyor project and recreated it from scratch, to ensure that no online edits were required -- everything is in the repo. If anyone wants to fork this project on GitHub and set up the same build, that will be easy.

NuGet

Each AppVeyor build produces a NuGet package, which means we know that there aren't any problems in the .nuspec file or anything like that.

When a commit is merged to master, a special AppVeyor build runs to generate an "official" nuget package which is then automatically uploaded to the nuget.org package repository. (The API key is encrypted). AppVeyor automatically updates the version number, and it includes a "-beta" tag so no one expects it to hold to any Semantic Versioning guarantees. 

When semver becomes important for the project, I will implement a one-touch release process to NuGet with non-beta version numbers.

The project itself

The whole purpose of this project is to help you get a step closer to zero bugs. 

It embodies all I have ever learned about how to implement equality in C#. Everything I have read; every mistake I have made; every mistake I can imagine making. It makes it easy to eliminate a class of errors: "C# class with incomplete or incorrect equality implementation".

It reduces the barrier to addressing Primitive Obsession, which means fewer bugs in the rest of your system, too.

The project is small

This is quite a small project. You may think that your codebase, being far larger and more complex, would not be amenable to this kind of engineering. I admit that I haven't proven otherwise. And even if you believe it would be possible and valuable to do it on your big project, you may not see how to map these ideas from here to there. Sorry.

But in some ways, the fact that it is small is part of its success. I have found a single need and satisfied that need in a single package. You can adopt this package without taking on any other requirements - no opinionated framework here. It adheres to the Single Responsibility Principle. It does what is needed and nothing else. Any time you can make a project do that, it's a win.

ValueTypeAssertions

The Problem

Primitive Obsession is one of the most pervasive code smells out there. You can address it by moving a primitive in to a simple class.
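For example (a minimal sketch):

public class NtfsPath
{
    private readonly string _value;

    public NtfsPath(string value)
    {
        _value = value;
    }
}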


I call the resulting class a "value type", but don't confuse it with C#'s notion of a value type, which doesn't get its own heap allocation, is passed by value to other methods, and is a source of bugs if it's mutable. I mean "a type that represents a value in some domain".
If you want to implement equality on that class, there are a lot of tricky details that are easy to get wrong, at least in C#. For example:
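(The snippets that follow are my own sketches, assuming a Point class with public int X and Y.)

public override bool Equals(object obj)
{
    var other = (Point)obj;   // throws InvalidCastException when obj is not a Point
    return X == other.X && Y == other.Y;
}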


This will throw an exception when trying to cast the Bar to a Point. So you try to fix it:
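(Sketch, continuing the Point example:)

public override bool Equals(object obj)
{
    if (obj.GetType() != GetType()) return false;   // throws when obj is null
    var other = (Point)obj;
    return X == other.X && Y == other.Y;
}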


This will throw when trying to call null.GetType(). Uggh.

You probably want to override operator==() as well.
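(Sketch:)

public static bool operator ==(Point left, Point right)
{
    return Equals(left, right);
}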


The compiler tells you to implement operator!=() to go with it, so you copy/paste and change the method name:
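(Sketch:)

public static bool operator !=(Point left, Point right)
{
    return Equals(left, right);   // copy/paste error: the check is not negated
}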


Oops, you forgot to negate the check. Bug.

If the value in question is a case-insensitive identifier of some sort, it's important that the GetHashCode() is implemented correctly. Don't do this:
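(Sketch, for a class wrapping a case-insensitive string in _value:)

public override int GetHashCode()
{
    // Wrong: "foo.txt" and "FOO.TXT" compare equal but get different hash codes.
    return _value.GetHashCode();
}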


Maybe you want to implement IEquatable<>, too, and you better get these details right there, too.

Many programmers don't test these details at all, or they test a few but not all, and they have to repeat the same set of tests each time they introduce a new class. If you discover a new rule (ToString() should follow equality, right?) you have to update all the tests.

Prior Art

Assertion libraries typically have an equality assertion. For example, in NUnit:

    Assert.AreEqual( new Point(7,8), new Point(7,8) );

That is insufficient. It only tells you that one of the equality checks you've written is correct, and doesn't catch all the other cases listed above.

The Solution

ValueTypeAssertions addresses all the mistakes I have ever made, or seen made, or can imagine when implementing equality in C#. Grab it from NuGet, and write a unit test like this:

    ValueTypeAssertions.HasValueEquality(new NtfsPath("foo.txt"), new NtfsPath("foo.txt"));


This says "these two objects should equal, in every way that C# recognizes equality".

  ValueTypeAssertions.HasValueInequality(new NtfsPath("foo.txt"), new NtfsPath("bar.txt"));


Which says the same thing about not being equal.

If some part of your value should be case insensitive, just add another assertion:

  ValueTypeAssertions.HasValueEquality(new NtfsPath("foo.txt"), new NtfsPath("FOO.TXT"));

If you wrap two values, assert the combinations:

  ValueTypeAssertions.HasValueInequality(new Point(1, 2), new Point(1, 8));
  ValueTypeAssertions.HasValueInequality(new Point(1, 2), new Point(0, 2));

You can find the source on GitHub.

Feedback

Do you find this useful?

What change would make it more useful to you?

Is there a name for this that would be more obvious?

Wednesday, May 11, 2016

Extract Method introduces a bug in this corner case

I rely on automated Extract Method to do the right thing, regardless of test coverage. This is a key part of attacking legacy code that lacks tests. But if Extract Method introduces a subtle bug, then I can't rely on it.

Here's the code:
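(A sketch of the shape of the problem, using a mutable struct; the details of the original repro may differ:)

struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

[Test]
public void MutationSurvivesExtractMethod()
{
    var counter = new Counter();

    // Extract this block: the new method receives 'counter' by value,
    // so the mutation is lost unless the tool declares the parameter 'ref'.
    counter.Increment();

    Assert.AreEqual(1, counter.Value);
}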


As it is, the test passes. If you extract the indicated block, then the test fails. Extract Method should add a `ref` to the parameter on the new method.

This repros with VS2013, VS2015, and ReSharper 8, 9, and 10.

Saturday, May 7, 2016

Examples of tiny test-induced design damage

Imagine you are trying to write a unit test for some code, but you're finding it difficult.

Maybe there's some complex detail in the middle of a method that is not relevant to the current test, and wouldn't it be nice to disable that bit of code just for the purpose of the test? Maybe you could add an optional boolean parameter to the method, which when set causes the detail to be skipped.

With the exception of getting legacy code under test to support you when refactoring, I see this as a bad thing, making the code worse just for the sake of testing.

Here's my list so far:
  • method marked 'internal' for testing
  • method marked 'virtual' for testing
  • method overload for testing
  • additional optional method parameter, only used for testing
  • public field that is only modified under test, to change behavior for testing
  • public field that is only read by tests
  • function replaced with mutable delegate field, only mutated for testing
Yes, TDD is about letting tests influence your design, but not in this way!

So how do you tell the difference? Here are a few ways:

  • Will this be used for both testing and in production? 
  • Do you feel the urge to add a comment saying why you did this?
  • If you removed the tests, would you keep this design?
  • Your own design sense. Do you think the design is better?
What to do about it?

Usually the desire to do this indicates that your class/function/module whatever is doing too much. 

Maybe you need to extract a class. If it's not obvious what belongs in the class, you might need to extract some methods first, to put in that new class.

A really common case is primitive obsession, like if the method deals with some string. If you move the string in to a new class, and then move that "deals with the string" code in to the class, then the class is small and easy to test and your code has improved. This is Whole Value.

Maybe there's something at the beginning or end of the method that talks to an external system, and that is making testing difficult. You could move those lines to the caller, and the method becomes testable.

I'd like to find some concrete examples.

Friday, May 6, 2016

Mob Programming conference 2016

Resources

Mobbing time lapse – a full day in 3 minutes

Woody Zuill keynote – how they found mobbing

Some Helpful Observations for successful Mob Programming (short slide deck)

People

Some of the people who I was glad to see at the conference:
  • Woody Zuill. Manager of the Hunter mob that discovered mobbing, and instigator of the #NoEstimates discussion
  • Llewellyn Falco. Creator of ApprovalTests, Teaching Kids Programming, credited with “strong-style” pair programming.
  • Nancy Van Schooenderwoert. Led a team of newbies to fantastic results, and wrote about it: http://www.leanagilepartners.com/library/Vanschooenderwoert-EmbeddedNumbers.pdf

There were around 50 people total, including people from Cornwall, Sweden, Denmark, and Finland.

Location

It was held at Microsoft’s New England Research and Development Center (“NERD Center”), right next to MIT. My cardkey didn’t work on the doors, though.

The 3 days beforehand were the Agile Games Conference, in the same space.

Structure

2 keynotes:
  • Woody Zuill on how they discovered mob programming
  • Llewellyn Falco on the science of mob programming (why it works)
4 mob programming workshops

2 open space slots.

The workshops were a chance to participate in a mob under the guidance of a mobbing expert. There were workshops at the introductory, intermediate, and advanced levels of mobbing.

Take-aways

The conference was less about teaching/learning, and more about experience. As such, most of my take-aways don’t fit in to an email. Hopefully I can facilitate these experiences for others.

Woody explicitly does not recommend mobbing. The important things he sees, which led to the discovery of mobbing + great results:

  • Kindness, consideration, and respect
  • The people doing the work should choose how they do the work
  • Turn up the good (work on making good things happen more, rather than fixing bad things – the bad things tend to melt away)

Mob programming is a skill. Don’t expect amazing results right at the outset.

At the conference I had the opportunity to experience mobbing at various levels, and this gave me a glimpse of what expert mobbing would look like. I can now see how that way of working would produce those amazing results.

A coworker asked me to get the answer to the question “what is the ideal mob size?” The answer I found is largely about reframing the question:

If a team is not skilled at mobbing, then you won’t get great results, regardless of mob size. An expert mobbing team will be able to work well with 4 people or with 14. Get people that have each of the skills/knowledge/talents that will be needed, so they don’t get blocked.

I can now teach you to differentiate a male house sparrow from a male song sparrow, in less than a second.

Monday, April 11, 2016

Definitions of "Zero Bugs"

I am writing in response to this tweet:
A common definition of "Bug" is "Code that does not work according to spec." I see this as a deliberately narrow definition to cope with (coddle!) too many bugs. I want to come back to that, but first some definitions of Zero:
  • The normal known bug count is 0. Switch from counting bugs to counting days/months between bugs.
  • For every bug we've ever seen, we know that that class of bug will never happen again.
  • We no longer need a find-and-fix cycle before shipping a feature.
  • A mindset shift, from "bugs are inevitable" to "bugs are, uhh, evitable".
  • An ideal to aim for, which informs how we work each day.
  • A state where the rules of the game have changed, and we discard the protocols and cautions we had put in place to manage bugs.
As we approach Zero, you can change your definition of "Bug" to something much broader: anything that disappoints or surprises anyone.
Are any of these definitions the same as "no customer will ever find a bug in this code, ever"? No, but that hardly matters. You certainly shouldn't let that be an excuse to argue that Zero Bugs is impossible instead of deciding to start down the path to #BugsZero.

Thursday, February 18, 2016

BugsZero @ Agile Open Northwest 2016

Neo: What are you trying to tell me? That I can catch all my bugs in testing?  
Morpheus: No, Neo. I'm trying to tell you that when you're ready, you won't have to. 
(paraphrased)

TLDR: You already know how to do it; no heroics required; go for low hanging fruit; start now.

Typically when I mention the idea of No Bugs to people, they respond with doubt and disbelief. They think I'm nuts, or they think I'm defining "bug" in a very narrow way, or that it could only be possible in some very specific context (no schedule pressure, a simple problem domain, greenfield development, etc.).

What is a bug?

The definition of bug I am using is very broad: anything that disappoints or surprises anyone.

The only people that use narrow definitions of bugs are the people who have lots of bugs. This is a coping technique that is unnecessary when you have no bugs.

If I wrote my code correctly, but something I depend on broke and now my site is down, is that a bug? Yes.

If the developer implemented code according to spec, but the spec was wrong, is that a bug? Of course it is.

I don't care about categorizing bugs. It's just bugs.

If you ever ask "does X count as a bug", the answer should be "yes".

When is a bug?

Are we only talking about bugs that customers see? What if it's caught during testing?

I measure "bug injection" when the change is checked in to source control. When it escapes the developer's machine. In GitHub it would be when a pull request is merged in to master. I like this definition because it lets me lean on unit tests, static analysis, lint, etc. in an automated CI system.

Arlo wishes he could measure even earlier - if it gets typed in to the editor, it counts as a bug. More on that later...

What is zero?

At the AONW session Arlo asked the room how many bugs people currently have open in their bug tracking system. Answers looked like:
  • 1700
  • 250
  • 200
  • 200
  • 100

Then he asked Brian Geihsler about a project he was on. The answer had a very different shape:

  • 3 days to 3 weeks between bugs

(They also measured # of stories delivered between bugs.)

And then he asked Chris Lucian:

  • 12-18 months between bugs

Changing the rules

Are these zero? My inner mathematician says no, but my inner project manager says yes. If you can measure days between bugs, that changes the rules:

  • You no longer need to get the most expensive people in a room to triage bugs.
  • You never need to argue about whether something is a bug.
  • You never need to choose between fixing a bug and writing a feature. 
  • You can ship whenever you want.
How is this possible?

It's not about testing. It's about addressing the causes of bugs.

Where do bugs come from?

Bugs happen when a human makes an incorrect decision. 

The human brain is really good at making decisions, and doesn't let a lack of information get in the way. Even worse, it doesn't tell you that it's making a decision based on a lack of information. It just makes the decision and feels confident about it. Worse still, you have a limited short term memory, so even if the information you need is available to you, it may not all fit, but you won't know it.

Here are some ways that code can set you up to make bad decisions:
  • A variable is named "taxReturn" when it represents a "tax refund" (code that lies)
  • A variable is named "txRfnd" when it represents a "tax refund" (abbrs. obfuscate)
  • Two variables representing the same idea are named differently (unnecessary synonym)
  • One idea is expressed in more than one place
  • A function that is very long
  • Whitespace/indentation doesn't match the parse tree (Python wins here!)
Some examples outside the code:
  • A dependency broke (add automated checking that the dependency still works)
  • I wrote a feature the customer doesn't want (pair with a customer)
How to get to zero bugs?

This is my favorite take-away from the AONW session: there's no secret. You already know how to get there. 

You already know how to get a little better. Rename a variable. Automate a step in your release process. Pair program on a kata for an hour. You can probably think of a dozen small improvements that you could make right now.

Each time there's a bug, look for some way you can avoid that class of issue. Pick the low-hanging fruit. The easiest, quickest, safest change that you know you can execute and get benefit from right away. Don't be ambitious. Do pick something that has been trouble recently.

Do it again. Keep iterating. 

How long will it take?

Assume it will take about 2 years to get to Zero Bugs. 

That means you need to progress 1% towards your goal each week. I know you know how to get 1% better right now.

It's a choice.

Now that you know how to stop writing bugs, the responsibility rests on your shoulders. If you're still writing bugs 2 years from now, it's because you decided to keep writing bugs.

Start now.

Wednesday, December 23, 2015

My ideal edit/build/test/commit/deploy/etc. system

There's a ton of variation out there in how teams set up the pipeline from "edit code" to "live in production". I want to talk about my ideal, to use as a reference point in further discussion.

TL;DR: When a change is pushed to master, it is proven ready for production.

"pushed to master" is equivalent to "makes it off a development machine".

It's common in Git to make multiple commits locally before pushing them up to the official repository. I am fine with those local commits not all passing tests. It's the "push" or "merge" that matters.

I take the term "master" from popular Git usage, but that's not important - it could be "trunk" or "Main" or whatever.

"Proven" here can mean a bunch of things. Obviously, it includes passing unit tests. It also includes compilation, so I will lean on the compiler. It also includes static analysis, which I will extend to eliminate classes of bugs.

It's important that this "proving" process be super fast, so that I never hesitate to run it. If it's slow, I'll want to separate the slow and fast parts, and require the only fast parts to be run on every change. The slow parts might run every few changes, or every night, or whatever, which means I don't know that master is always ready for production. So I look for ways to make it all super fast.

Sometimes a bug will slip through, and be caught by manual testing, or production monitoring, or by a customer. When this happens, I look for some way I can improve my "proving" to eliminate this class of bugs forever. In this way, over time my confidence in "ready for production" steadily grows.

Some teams have an "in development" branch, where changes can go before master, so that they can be shared between developers even if they're not production ready. In my ideal model, I don't need that. I use vertical slicing, safe refactoring, feature flags, etc. to be able to commit my changes quickly. My branches are short-lived. If my changes pass tests, I push them to master, and I'm done.

Some teams have an "in test" branch, where they'll take a snapshot of what's in master, and then run a testing pass before going to production (with some iteration for making additional fixes). In my ideal model, I don't need that. If my changes pass tests, I push them to master, and they're ready for production.

Ideally, there's an automated system that runs these builds + tests against proposed changes and then pushes them to master if they pass. In TFS they call this "gated checkin"; some people call it "Continuous Integration". The important thing is that you know for sure that master is always green - the validation always passes.

I want to reinforce the point that this is an ideal. I don't expect you to get there tomorrow. But I do want you to agree that this is both valuable and feasible, and start working towards this ideal today. Each step you take in this direction will make things a little better. You'll get there eventually.


And don't do something irresponsible like delete all your integrated tests, or fire your QA staff. Start moving towards this ideal, but keep your old process around until you can demonstrate that it is no longer giving you value.

Sunday, December 6, 2015

Types of integration/integrated test

I've noticed that people often use these terms interchangeably.

And when I look at the kinds of tests they're talking about, I see a bunch of different things. Each of these things is worth considering separately, but we lack crisp terminology for them. (I've touched on this before.)

1. Testing class A through B


2. Testing class A, but B is incidentally along for the ride


3. I have tested classes A and B separately, but now I want to test that they work together. 

That is, that they integrate correctly.


4. My business logic is testable in isolation, but then I have an adapter for each external system; I test these adapters against the real external system. I call this a focused integration test, and it happens when I use Ports/Adapters/Simulators.

5. I have unit tested bits of my system to some degree, but I don't have confidence that it's ready to ship until I run it in a real(ish) environment with real(ish) load. 

6. I am responsible for one service; you are responsible for another; our customers only care that they work together. We deploy our services to an integration environment, and run end-to-end tests there.



Every "Extract Method" starts with minus 1 points

Eric Gunnerson once wrote about the idea that, in programming language design, every potential language feature starts with "minus 100 points":
Every feature starts out in the hole by 100 points, which means that it has to have a significant net positive effect on the overall package for it to make it into the language. Some features are okay features for a language to have, they just aren't quite good enough to make it into the language.
Once a feature makes it in to a programming language, it's in there forever. If you later realize it could have been better if done a little differently, you're stuck. Features tend to join to create combinatoric complexity, so each feature you add now means potentially big costs down the line.



When refactoring, I say "Every 'Extract Method' starts with minus 1 points".

The default negative reflects the cost of looking in two places to understand your program, where previously everything was in one place. The extracted method has to provide some additional value to justify its existence.

If the new method lets you eliminate duplication, add points.

If the new method is poorly named (worse than good / accurate / honest), subtract points. If the name more clearly expresses intent, add points.

If the calling method is now easier to follow, add points.

It's not a very high bar, but if you can't get to positive territory before merging to master, throw away the refactoring.


Sunday, September 13, 2015

Unit testing microskills

In response to my Why we Test posts, George Dinwiddie had this to say:
The connection between why and how is important, but the details are not obvious. I'll pick a few values that people hope (unit) tests might offer, and give my thoughts on how to practice testing to deliver this value. (This is certainly not a complete analysis of the subject.)
prevent regressions due to future work
Most people pick up on this one right away: as long as you can get a green bar before making changes, and another green bar when you're done, your tests catch bugs before they get checked in. Great!

Speed, readability, and granularity of tests aren't as important as good coverage. They don't even have to be unit tests - any tests will do. Reliability with a clear pass/fail result is important, so that bug-induced test failures actually get recognized.

If a piece of code is a completely obvious expression of a business requirement, you still need to write a test for it, since the tests call out the intentional behavior.

"prevent regressions" does not appear to require test-first. In fact, teams that focus on this value tend to write many of their tests afterwards. Because the code isn't written for testability, it's hard to test (duh). Either we don't bother testing it, or we bend over backwards writing horrible tests that are hard to understand, and lock down implementation details, making future refactoring harder.
a safety net during refactoring
Readability and granularity of tests aren't as important as good coverage and speed. Slow tests mean you won't run as often, which means you won't catch mistakes as quickly, which makes refactoring more expensive. That changes the cost/value/risk equation for refactoring, so you won't refactor as often.

Test speed includes any time spent analyzing results and rerunning flaky tests, so make test results obvious and rock-solid.

Many organizations are nervous about the risk of bugs from refactoring, even though they tolerate bugs from feature work. In that context, great coverage is particularly important for the refactoring safety net.

In an effort to improve coverage, teams that focus on the refactoring safety net will often test implementation details, including breaking encapsulation and injecting mocks to access those details. In the process, they lock down those details, making refactoring more difficult. That's Irony Number One.

Getting proper coverage, for both "prevent regressions" and "refactoring safety net" can be difficult. Applying the Three Rules of TDD is an effective way to get the coverage that you actually need. As long as you avoid testing implementation details, you'll necessarily have to decouple your code to make this happen. So you'll naturally end up with a code base that is at least moderately well-factored, even before you try to use the tests as a refactoring safety net. That's Irony Number Two.
make DRY problems visible
DRY problems become visible in TDD when you find yourself writing the same test repeatedly. My favorite example is file path case insensitivity in Windows. Consider:

    if (Path.GetExtension(fileName) == ".cs")

There's a bug here: if the file is named ".CS" then I want the software to work the same as ".cs". I can fix it locally, by switching to a case-insensitive string comparison. And I diligently write a test for it. But then tomorrow I write another file extension check in another piece of code, and I write another test. I may end up with a thousand expressions of this rule, and (if diligent) a thousand corresponding unit tests.

The rule I'm trying to test here is "File extensions are case-insensitive". I want to have exactly one test that describes and enforces that rule. Which means that rule must be expressed in exactly one place. That's DRY.

The correct response to "I'm testing this idea multiple times" is "extract the duplicated behavior from all the places it's used, and merge them to one place, and test that one place."
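For example (a sketch; the names are mine):

// The rule "file extensions are case-insensitive" lives in exactly one place...
static class FileExtension
{
    public static bool Is(string fileName, string extension)
    {
        return string.Equals(
            Path.GetExtension(fileName),
            extension,
            StringComparison.OrdinalIgnoreCase);
    }
}

// ...so exactly one test describes and enforces the rule:
[Test]
public void FileExtensionsAreCaseInsensitive()
{
    Assert.IsTrue(FileExtension.Is("Program.CS", ".cs"));
}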

Note that test execution time is irrelevant here; you don't ever have to run your tests to get this value! However, responding to this design feedback leads to code that is factored in a way such that tests are naturally very fast (Irony Number Three!).

Readability is important: you have to be able to read the test to understand what requirement it's describing, to be able to detect the duplication.

Granularity is important: tests must each describe exactly one requirement, or the duplication won't be visible.

DRY reduces bugs, as it eliminates the risk of updating only 999 of the 1000 places a rule is expressed. DRY (along with Great Names / Coupling / Cohesion) is far more effective at eliminating bugs in shipped software than tests that are intended to catch bugs. (Irony Number Four)




Saturday, September 12, 2015

Why we test, Part 8: Because we are competent professionals

#15 in my list of reasons why we (unit) test, which I learned from James Shore:
Refactoring without tests is inherently unsafe, because of the risk of introducing bugs. As a professional, I would never take such risks. Therefore, I would only refactor when I know I have good tests. In this way, TDD makes refactoring possible.
I may not be representing his idea with perfect fidelity; for that I apologize.

My comments:
  1. There is a class of programming languages* for which there exist reliable refactoring tools. With these tools I can safely refactor even without tests.
  2. The reliable tools work by following a recipe. If a human follows the same recipe carefully, they'll get the same result. That would work in strongly typed languages that lack good tooling.
  3. Plenty of people who make their careers as programmers ("professionals") do sloppy work, but not those who are competent.
  4. The tests have to be good. If you only write tests when it's easy, they won't give you enough protection. The only way I know to get this kind of test coverage is if you strictly follow the Three Rules of TDD.
  5. When naive** TDDers aim for 100% test coverage, they go to extreme lengths in their tests, including bad mocks*** and test cases that don't correspond to any business value. These common problems lock down implementation, which makes refactoring far more difficult; the opposite of Jim's goal.
* It's C# and Java

** most programmers

*** mocking is fantastic for Tell, Don't Ask, and problematic without TDA.

Sunday, August 23, 2015

My ideal backlog

Problem:

There are two ways people seem to want to use a backlog:

A) To sort by priority, so the next thing we do is the most important thing to do next.

B) To make sure we don't forget anything important.

In both cases, the cost and value get worse as the list grows. Good ideas that are halfway down the list will get duplicated by mistake, but with different phrasing, so the duplication is not obvious. Sorting, de-duping, and understanding the items gets more expensive, but none of that effort actually creates any business value.

I see a lot of teams with backlogs that would take a year to work through, if no new ideas came along. And of course new ideas always come along, at least if you're working on anything that matters.

Since items come in to the backlog faster than they go out, the list steadily grows, and most ideas never leave the backlog. People start to believe that the backlog is where good ideas go to die.

Solution:

Keep the backlog short.

7 items seems ideal, because you can keep them all in your head long enough to understand the whole list.

When a new idea appears, compare it to the current backlog, and ask "is this item higher priority than any of the items currently on the list?" If not, then let it go. Don't worry about forgetting. Trust that if it becomes more important, it will grab your attention again, and can be added to the list at that time. More likely, you'll think of something even more awesome, and do that instead. That's a good thing: doing the more awesome things before the less awesome things.

Alternate Solution:

In many organizations, my proposal won't fly. People come to the team with requests, and would be upset if you said "It's not in our top 7, so we're letting it go."

In that case, keep two lists. The first list is the stuff you're going to do next (today/this sprint/whatever), and only has a few items on it. The second list is the bucket of possible future ideas, and can be any size. Spend as little time as possible grooming the second list.

When a new idea appears, compare it to the "To Do Next" list, and ask "is this higher priority than any of the items currently on the list?" If not, put it on the "Possible Future Ideas" list. Tell the requester that your idea is "on the backlog," and will be weighed against other items on the backlog when planning future releases. They'll understand that if you didn't do their idea, it's because something even better happened.

Sidebar: Hold prioritization very lightly.

We prioritize work by considering the estimated cost and value of that work. Both types of estimates are notoriously unreliable. You may believe you're working on the next most important thing, but you're probably wrong in some way that you can't know yet.

If you start working on an item, stay open to discovering that you should actually be doing something else. As Woody Zuill says:
This is another reason to slice work very thinly. The smaller the item, the sooner you can get to the point where you learn what you should really be doing, and the more likely it is that this current item will get completed and deliver some value before switching to your new discovery.



Sunday, July 19, 2015

GetRouteData() in ASP.NET WebApi

I've been trying to get System.Web.Http.HttpRouteCollection.GetRouteData() to work in ASP.NET WebApi recently, and had a hard time of it. In ASP.NET MVC it's really easy, but there are additional details I couldn't figure out in WebApi. There was even a detailed set of answers on StackOverflow, but when I tried them, they all failed in ways that didn't make sense to me.

And now I have seen it work, so I want to document it. Here's what I did:

  1. In VS 2013, New Project -> Web, ASP.NET Web Application
  2. Select WebAPI. Check "Add unit tests".
  3. Add the following unit test:
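A test along these lines exercises GetRouteData() (a sketch, not the exact code from the repository; the route template and URI are illustrative):

using System.Net.Http;
using System.Web.Http;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class RouteTests
{
    [TestMethod]
    public void GetRouteData_FindsTheDefaultRoute()
    {
        var config = new HttpConfiguration();
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{id}",
            defaults: new { id = RouteParameter.Optional });
        config.EnsureInitialized();

        // The request needs an absolute URI for route matching to work.
        var request = new HttpRequestMessage(
            HttpMethod.Get, "http://localhost/api/products/1");

        var routeData = config.Routes.GetRouteData(request);

        Assert.IsNotNull(routeData);
        Assert.AreEqual("products", routeData.Values["controller"]);
    }
}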

And here's a Git repository with the complete working solution.

(Thanks to this blog post for unblocking me.)

Thursday, May 14, 2015

The relationship between DRY and Coupling

I think that the DRY principle is a subset of* "Low Coupling".

DRY & Coupling:

If one rule is expressed in two places in your code (violating DRY), and you want to change the rule, you must edit both places. This is coupling.

byte[] OpenFile(string fileName)
{
    // Is it our file type?
    if (Path.GetExtension(fileName) == ".foo") ...

void AutoSaveFile(byte[] contents)
{
    var path = Path.Combine(directory, DateTime.Now.ToString("dd_MM_yyyy") + ".foo");

If we decide to change our file extension to the much more reasonable ".bar", then we must edit both.

*possibly equivalent to

The Prime Refactoring

I used to believe that the two most important refactorings were Extract Method and Rename. The way they deliver value and the way they are used are quite different, which makes them hard to compare, so I figured they had equal value.

Recently I've decided that Rename is slightly more urgent, if not more important. It is the first refactoring to learn; the first to teach; the first to apply. (Just slightly)

The problem is code that lies to you. It says it's doing one thing, but actually it's doing another. You either have to think really hard to figure that out (slow) or you misunderstand the code and write bugs.

Fix that first. It may lack cohesion, have tight coupling, and lots of duplication, but first introduce good names. Rename to make the code stop lying to you.

(Soon afterwards, start using Extract Method to give you more things to name.)

Monday, April 6, 2015

"good" names - a minbar

In code, naming things well is incredibly powerful. Names help with expressing intent, increasing cohesion, and identifying duplication.

Bad naming can do a lot of damage. Names that lie, mislead, or obfuscate will confuse a programmer, or at least make her work harder to get the job done.

I think a name is "good" when you don't have to examine what is behind the name to know what it does. It doesn't have to add additional value, it just has to avoid obfuscation. For example:

void AThenB()
{
    A();
    B();
}

If you see AThenB() in code, you'll know exactly what it does. Not a great name, but not a damaging name, either.

This is the minimum bar when naming a new entity in code. It's not a hard bar to meet. You can often do way better. But never check in any code that doesn't meet this bar.

JBrains calls it "accurate names".

Arlo Belshee calls this "tweetable names":




Wednesday, March 18, 2015

The zeroth rule of software estimating

I realized that before even the first rule of software estimating must come:
Know why you are estimating.
We take it for granted that software estimating is something we must do. For many people, this is obvious. But when we start talking about why we estimate, I see many different answers. Perhaps it is not so obvious after all.

Some of the answers I have heard:

  1. To decide which work to do next.
  2. To decide how many items to start working on in an iteration.
  3. To decide how many people to hire.
  4. To sync up long-lead work (e.g. marketing).
  5. To evaluate and reward the performance of individuals.
  6. To evaluate and reward the performance of teams.
  7. To measure the impact of changes in process, tools, technical debt, etc.
  8. As a lever to push people to work harder.
It's common to choose more than one. This can produce really wacky results.

Whatever your reasons are, it's worth understanding them deeply. Is that something you really need? Is this approach really going to give you that result? Are there other ways that are more effective? 

Thursday, February 26, 2015

The second rule of software estimation

The more error there is in your estimates, the less precise you must be.

That's based on my past experience with being wrong a lot, and seeing other people be wrong a lot. If I tell you I can write a feature in a day, and sometimes I'm right, and sometimes it takes a month, then there's no reason to differentiate between 5-hour and 6-hour features when estimating.

I suspect that powers-of-n is a good model for many teams, where n depends on some combination of team familiarity with the code, technical debt, domain complexity, etc.

A statistician could certainly give some guidance here. Something about standard deviations.

A lot of teams like to use Fibonacci numbers for their estimates, which seems weird to me. Why is this a good sequence? Why jump from 1 to 2 (a 100% increase) then to 3 (a 50% increase)? Can you really tell a 2 and a 3 apart, reliably enough to be useful?

In Fibonacci, the next number is the sum of the last two - about 1.6 times the last number, in the same ballpark as the factor of 2 you get from doubling. I doubt your estimates are reliable enough that the difference will matter. And powers of two are culturally familiar in software, easy to remember, and easy for programmers to add.

See also: the first rule.

Tuesday, February 24, 2015

The first rule of software estimating

Take a list of pieces of work you might do. Stories, features, products, I don't care. Find two that are the same size. Approximately.

Do them both. Measure how long they took. Did they come out the same?

If you can't reliably recognize two items as being the same size, then nothing else in estimation will work for you. It all builds on this.

How I write "contract tests"

This comes up in conversation often enough that I want to write it down.

Context:

My code talks to an external dependency that is awkward to use in unit tests.

I can refactor most of my code to eliminate the dependency. (See DEP and Whole Value). But I still have some code that talks to the external dependency. I wrap the dependency with an adapter (see Ports-and-Adapters) of significant thickness and abstraction (see Mimic Adapter). In test, I replace the real dependency with a legitimate but simplified test double (see Simulators).

Problem:

I can't be certain that my simulator has fidelity with my real system. They may behave differently, allowing my tests to pass when my system has a bug. (This is a common problem with mocks.)

Solution:

Write one set of tests for the port, running the tests against both the real and simulated implementation.

In C#:
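(A sketch of the shape; the store interface and implementations are stand-ins for your port and adapters:)

using System.Collections.Generic;
using NUnit.Framework;

public interface IFileStore
{
    void Write(string key, string contents);
    string Read(string key);
}

// The contract: every implementation of the port must pass these tests.
public abstract class FileStoreContractTests
{
    protected abstract IFileStore CreateStore();

    [Test]
    public void WrittenContentsCanBeReadBack()
    {
        var store = CreateStore();
        store.Write("greeting", "hello");
        Assert.AreEqual("hello", store.Read("greeting"));
    }
}

public class SimulatedFileStore : IFileStore
{
    private readonly Dictionary<string, string> _files = new Dictionary<string, string>();
    public void Write(string key, string contents) { _files[key] = contents; }
    public string Read(string key) { return _files[key]; }
}

[TestFixture]
public class SimulatedFileStoreTests : FileStoreContractTests
{
    protected override IFileStore CreateStore()
    {
        return new SimulatedFileStore();
    }
}

[TestFixture]
public class RealFileStoreTests : FileStoreContractTests
{
    protected override IFileStore CreateStore()
    {
        return new RealFileStore(@"\\server\share");   // stand-in for the real adapter
    }
}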


Tests on the simulator are fast enough to run with every build.

Tests on the real system may be slow; they may require awkward setup; they may cost real dollars to run. You may decide to run them only in your CI or once per sprint or whatever. Since adapters are relatively stable, that can be OK.



Tuesday, February 17, 2015

Bug metrics

Metrics are tricky. Plenty of ink has been spilled on that topic, so I'll leave it for now.

Around bugs, I know of 4 interesting metrics:
  • A: Count of active bugs
  • B: Time to fix
  • C: Fix rate
  • D: Injection rate
When I want to sound like I understand queuing theory, I call them Peak / Latency / Throughput / Load.

(I'm ignoring the disconnect between what we can measure and what is true. For example, bugs in the system that are impacting customers but are not currently tracked by the team. See http://jbazuzicode.blogspot.com/2014/11/measuring-bug-latency.html)

Customers only care about A and B.

Companies that I have worked at often give a lot of attention to A. For example, I've seen "Bug Hell", where any dev (or any team) with more than a certain number of active bugs must stop working on features until the bug count is lowered. 

In the orgs I'm familiar with, we tend to go immediately from A to C, with bad consequences. Focusing on C means devs will tend to choose narrower fixes; they'll allow tech debt to accumulate; they'll forego testing; they'll fix cheap bugs before important bugs; they'll work when tired; they'll multitask. The inevitable bug bounce will be higher. This is all bad for customers; it's bad for business.

Getting B (latency) down is great, but it's not always directly actionable. You can prioritize bug fixes before feature work. You can strictly assign bugs back to the devs that created them, throttling the most prolific bug creators.

I see D (injection rate) as being a valuable thing to focus on (although it's difficult to measure). As you write fewer bugs, A and B will get better, which is good for customers. And C will become irrelevant.

Because A->C is such a deeply ingrained habit in our corporate culture, if you don't want that to happen, you have to actively exert effort to take things in a different direction. Every time someone says "we have N bugs", make sure they also say "remember to treat each bug as a learning experience - what can we do to make sure this kind of bug doesn't happen again?" and never say "we fixed M bugs this week."

(Thanks to Bill Hanlon for putting a lot of these ideas out there.)

using MS Fakes safely

MS Fakes can generate something called "Stubs", which can override virtuals, and "Shims", which can override anything, including statics and members of sealed classes.

If you decide to use them, I recommend using these rules:

Only generate the fakes you care about

Use Disable, Clear, and !.

<StubGeneration Disable="true"/>

<ShimGeneration>
  <Clear/>
  <Add FullName="Foo.Bar!" />
</ShimGeneration>

Enable diagnostics:

<Fakes xmlns="http://schemas.microsoft.com/fakes/2011/" Diagnostic="true">

Treat Fakes warnings as errors


Sadly, there's no easy way to do this. Edit:

C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\Fakes\Microsoft.QualityTools.Testing.Fakes.targets

In the BuildFakesAssemblies target, the GenerateFakes task sets the FakesMessages property, which you always want to be blank, so add:

<Error Condition="'@(FakesMessages)' != ''" Text="Error generating fakes" />

Saturday, February 14, 2015

Write your own unit test "framework"

If you haven't already done it, I recommend you try writing your own unit testing framework. Actually, do it several times, in several different ways.

The existing unit testing packages are sizable pieces of software, and I'm not recommending you spend weeks on this effort. Keep it simple. In fact, the bare minimum to get started with TDD is almost nothing:


Sure, there is value in automatic test discovery, in rich asserts, running all tests even when one fails, reporting, etc. But you don't have to have those things to get started. (Remember this next time you are away from WiFi and have a programming idea.)

Starting from this point, experiment with different ways to write a unit test framework. Some ideas to consider:
  • What's the #1 feature you miss the most in the above example?
  • A natural way to extend asserts in to your domain.
  • How easy is it to make the mistake of writing a test that never gets run?
  • If my tests are super-fast, how much overhead is there in test discovery and reporting?
  • Reporting that points directly to the site of the failure.
  • How much boilerplate does a developer have to write?
  • Test discovery: reflection ([Test]), inline functions (describe(()=>{})), or something else?
  • If you only supplied one built-in assert, would it be "Assert True", "Assert Equals", or something else? What are the implications?
  • Try both traditional asserts (AssertFoo(result...)) and fluent asserts (Assert.That(result).IsFoo(...)).
Let me know what you find.

Thursday, January 29, 2015

Exceptions are a primitive type

I hold that Exception is a primitive type, and so using one directly in your code is a common example of Primitive Obsession.

The antidote is to wrap the primitive in a Whole Value. It's a pretty straightforward transformation when your code looks like this:
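(Illustrative:)

throw new Exception(string.Format("Could not find file {0}", fileName));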

- Make a new exception class, typically nested at the current scope.
- Name it based on the message text.
- Parameters to the message become parameters to the constructor and properties on the new class.
- Override the "Message" property to hold the string.Format() call.

Like this:
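(A sketch; the class and property names are mine:)

public class FileNotFoundError : Exception
{
    public string FileName { get; private set; }

    public FileNotFoundError(string fileName)
    {
        FileName = fileName;
    }

    public override string Message
    {
        get { return string.Format("Could not find file {0}", FileName); }
    }
}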
Like all good design moves, this helps testing.
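For example, a pair of tests might look like this (hypothetical; 'loader' stands in for whatever code throws):

[Test]
public void LoadThrowsWhenTheFileIsMissing()
{
    var exception = Assert.Throws<FileNotFoundError>(() => loader.Load("missing.txt"));
    Assert.AreEqual("missing.txt", exception.FileName);
}

[Test]
public void FileNotFoundErrorMessageNamesTheFile()
{
    var exception = new FileNotFoundError("missing.txt");
    Assert.AreEqual("Could not find file missing.txt", exception.Message);
}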

Note that in the 2nd example, I'm separating "I expect this exception with these properties" from "the exception should be able to format itself like this". There's a nice separation of concerns.


Wednesday, January 14, 2015

Why I write horrible code. (And so can you!)

EDIT: I may have been too subtle.

Some readers think this is a list of excuses for writing bad code. It is not. Instead, I want to analyze the reasons I have written bad code in the past, so that I can look for ways to make future code better. I want to acknowledge my own limitations, so that I can find ways to compensate. I also believe that many programmers have similar challenges, and may be able to learn from this analysis. Furthermore, I hope that by hearing about my imperfections, you can become less afraid of sharing yours, and that can open up opportunities for your growth.



Today I overheard a friend say something like "Who would write code like this? How could they think it was a good idea?"

I've written a lot of bad code, which makes me a kind of reluctant expert on the topic. It's possible that I'm just worse than average, but I've seen some great programmers write bad code too.

Here are the reasons I can see:

  1. Expediency

This is the most common reason that programmers cite. As in "I could spend some additional time to make this code more beautiful, but we need this change right away." 

I agree that the value of our work is time-sensitive, so delivering it sooner is better. And I agree that we are not being paid for the beauty of our code, but only for the value delivered to customers.

However, encoded in that statement are certain beliefs, about the cost to make the code more beautiful, how much better the code could possibly be, the value offered by that better code, and the risks in getting there. I say "belief" because I think they could vary by programmer, project, technology, market, organization, etc. I'll try to cover these beliefs as I go.

  2. Good design is unfamiliar

While all programmers have suffered from poorly-designed code, well-designed code is all too rare. We may know what we hate about this code, but we have a hard time knowing what "great" would look like. My college professors talked about "low coupling and high cohesion", but that conversation was always in the abstract - I didn't know how to make sure my code actually had those attributes.

I've often thought I knew a great design for something, only to discover that I missed many important details. By the time I get my code to the point where I can use it, I have compromised the design so much that it's not the huge win I was hoping for. I believe most programmers have had similar experiences. This feeds back into the belief that attempting to make code beautiful won't give much return.

  3. We don't know what we need yet

When I start on a programming task, I usually have a bunch of questions I can't correctly answer yet:
    • What does my customer really need from my program?

    • Will the feature I have in mind really meet that need?

    • What is the true behavior of the externals I intend to depend on? (Do they have the capabilities I need? How do I call these APIs correctly? Do they scale? Are they reliable? Any bugs that will sting me?)

    • What is a good design for my code, based on answers to the above?

    • What future work will be difficult because of design decisions I make now?
Whatever I write, I will soon discover that I was wrong about my answers to these questions, and my design is no longer well-suited to the new answers. If I worked hard on that design, that hard work is wasted. If I work hard to revise the design, I may discover tomorrow that my new answers are wrong, too, so the revised design is also waste. This means I should take shortcuts to get my work done and in use, so I can get that feedback sooner and more cheaply.

Of course, when I finally get the feature right, customers will not be interested in paying me to go back and rewrite it for no reason.

I used to think this meant that instead of working on good designs, I should learn how to work in poorly-designed code, getting great at analyzing it in the debugger and finding minimal fixes. Now I know how to refactor.

  4. We don't know how to refactor

One time you tried to clean up a mess in the code, and you broke something. Your boss yelled at you. Customers were unhappy. You had to work extra hours to fix things up. Now you're wiser, and when someone says "I want to refactor this", you say "only a little, and only if you have great tests, and only if there's plenty of time." Which means it seldom happens. So we don't get any practice refactoring.

But refactoring is key: if you don't know what good design looks like (in general or specific), then the only way to get a good design is to start with a bad one and refactor your way to good.

More generally, remember that it's up to you to invest in your own skills. Refactoring isn't inherently slow or risky, but learning refactoring and other skills takes time and temporarily reduces your performance. You can't count on your employer to cover that, but it still matters.

  5. Too-big steps

Suppose you decide to clean up that code mess, once and for all. Part-way through, you get interrupted. Maybe the live site goes down and you have to fix it, and that eats up the rest of your day. And tomorrow you have to work on some important new feature. By the time you get back to the cleanup, much of your work is no longer valid.

The antidote is to work tiny and get done. Do the smallest cleanup you can, check it in, and get back to work. Don't aim for "good", just for "better". Make things a little better each day. See Two Minutes to Better Code.

  6. We don't know what we're missing

So you're a smart programmer. Fueled by caffeine and isolated by headphones, you can get your job done. The code you work in is a mess, but you're still delivering value. Sure, you wish the code was nicer, but how much difference would it really make? Is it really worth the investment?

If you're only accustomed to working in code that is a mess, you're in no position to make this judgement. I know that is hard to accept. Really well-designed code doesn't just make things better; it makes things different, in ways that just aren't visible from the old way of doing things. For example:
  • No need to track bugs in a database, because there are no bugs.
  • No need to keep a list of future work (product backlog), because you can just pivot as needed.
  • Easy to test everything with super-fast unit tests, because everything is appropriately decoupled.
  • Ship at will, because you can verify ready-to-ship in a matter of minutes.
  • Any complexity in the code indicates an opportunity to reduce essential complication, since there is no accidental complication. (See 7 minutes, 26 seconds for definitions)
If you've never seen this, it sounds impossibly far-fetched. A pipe dream. So of course you wouldn't invest the effort required to get there. (You probably believe that most of your system's complexity is essential; you're wrong again. Sorry.)

  7. We incorrectly compare short-, medium-, and long-term impact

Code mess creates a drag on development. As development gets slower, pressure increases. You take a shortcut. The mess gets worse. A vicious cycle. Exponential growth of the mess. (See Nobody Ever Gets Credit for Fixing Problems that Never Happened.)

In the (very) short term, we can deliver value sooner by taking shortcuts.

In the medium term, we will deliver features more slowly. Less value to customers = bad business.

In the long term, the cost of new features is so great that you must throw things away and rewrite, which you should never do. This isn't "pie in the sky" thinking; this is "we want to stay in business for more than 5 years".

  8. We don't ask for help

Even when my programming is going really well, as soon as another person sees my work, they'll notice a problem that I missed. Each person can offer a different kind of insight into the design. I can learn a lot from that.

So turn that dial up, from code reviews, to pair programming, to mobbing.

  9. The code is just too horrible

How fast you learn something is heavily dependent on how fast you can iterate.

If you don't know what great design looks like, and you're not already good at refactoring, and your code is really really horrible, and your build takes forever, and your tests are crap, then every step you take will go extremely slowly.

If this is your situation, you could practice your skills in side projects and code katas, or you could switch jobs. Develop those design and refactoring skills in a better environment, then come back to this legacy code when you're ready for that challenge.

Thursday, January 8, 2015

Saff Squeeze on recursive code with NCrunch

NCrunch makes all unit testing better, but there's something cool that happens when combining it with the Saff Squeeze, and something even cooler when the code under test is recursive.

In case you missed Kent Beck's Saff Squeeze:

The Saff Squeeze, as I call it, works by taking a failing test and progressively inlining parts of it until you can't inline further without losing sight of the defect. Here's the cycle:
    1. Inline a non-working method in the test.
    2. Place a (failing) assertion earlier in the test than the existing assertions.
    3. Prune away parts of the test that are no longer relevant.
    4. Repeat.
(I add Step 0: make a copy of the failing test.)

NCrunch helps because you can quickly see how far into the test you get before an assert fails. If the code under test is recursive, repeatedly inline the recursive call until NCrunch's code coverage dots show uncovered code. Now your test does its job without any recursion, and you can continue to apply the Saff Squeeze as normal.
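For example, here's what one squeeze step might look like, on hypothetical recursive code with a failing test somewhere below:

    using NUnit.Framework;

    class Node
    {
        public Node Child;
    }

    static class Trees
    {
        public static int Depth(Node node)
        {
            if (node == null) return 0;
            return 1 + Depth(node.Child);
        }
    }

    [TestFixture]
    public class DepthTests
    {
        // Step 0: a copy of the failing test.
        [Test]
        public void Depth_of_a_two_node_chain_is_two()
        {
            var root = new Node { Child = new Node() };
            Assert.AreEqual(2, Trees.Depth(root));
        }

        // One squeeze later: Depth is inlined once, so the first step of
        // the recursion sits in the test itself, and NCrunch's coverage
        // dots show whether the remaining Trees.Depth call is reached.
        [Test]
        public void Depth_of_a_two_node_chain_is_two_squeezed()
        {
            var root = new Node { Child = new Node() };
            int depth = (root == null) ? 0 : 1 + Trees.Depth(root.Child);
            Assert.AreEqual(2, depth);
        }
    }

Each time you inline the recursive call again in the squeezed copy, the coverage dots tell you whether the remaining call is still reached; once it isn't, the recursion is gone from the test and the normal cycle continues.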

Wednesday, January 7, 2015

Why we Test, part 7: The Dead Horse

I've seen a wide range of practices that practitioners claimed were TDD. (Arlo Belshee identifies 7.) Obviously, outcomes vary.

People claimed that TDD was or was not effective in some way based on those results. To make it worse, I see wide variation in the stated purpose of TDD. If we don't see the same purpose, then we aren't measuring effectiveness the same way. For example:

  1. ensure correctness of new code
  2. prevent regressions due to future work
  3. point me directly at my mistake
  4. be fast enough to run often
  5. a safety net during refactoring
  6. the only way to be sure my tests are comprehensive
  7. to stop me from writing code I don't need
  8. make code coupling obvious
  9. make DRY problems visible
  10. support cohesion
  11. create the context for entering a Flow state
  12. regular rewards as I make progress
  13. confidence (possibly false!) that my program will work
  14. explain to another human what my code is intended to do

(I sometimes group these into Bugs, Design Feedback, Psychological Benefits, and Specification.)

If you start by practicing TDD a certain way, and see it succeed at one of the above, you'll be tempted to argue that is the "true purpose" of TDD.

If you start with a belief about the true purpose of TDD, and select a practice that doesn't do that, you'll think TDD doesn't work. (See We Tried Baseball...)

I say all this because I hope people will shift to an "all of the above" mindset, and adjust their understanding and practice to make that happen.

Why we Test, part 6: BDD vs. TDD

I've seen BDD advocates say "BDD is just TDD done right." (e.g. here and here) They seem to be saying "It's important to write your unit tests at the appropriate level of abstraction, using language from the problem domain, phrased for a human reader. Domain experts (e.g. users, business analysts) should be able to read, and perhaps write the tests."

More recently, I've seen "TDD is just BDD done right." (e.g. here and here) These people seem to be saying "It's important to use your unit tests to drive the design of your code. BDD is missing that important action." I think they're noticing that BDD doesn't include a Red-Green-Refactor cycle.

I think they're both right. Striving to write tests for humans provides the best guidance for refactoring, carrying the Ubiquitous Language deep into the system and improving DRY and Cohesion along the way.

Tuesday, January 6, 2015

Why we Test, part 5: Two reasons for regression testing

In Part 1, I wrote: "I count on tests to catch mistakes before our customers do" and "Having tests means I can refactor safely".

In both cases, I want tests to catch my mistakes, but I now realize I should consider these separately.

Regression

In the first case I'm relying on the tests to confirm that I have written my code correctly, and that future changes don't break existing functionality. I have written bugs plenty often, and I'm looking to the tests to tell me about them. I'm definitely going to keep writing new features and fixing old bugs and shipping software.

When we decide to change old functionality, we'll want to change the old tests. So they should be malleable to provide their value. They should be readable and granular, so when they fail I can decide whether to change the product or change the test.

Pinning

In the second case, my decision about whether to refactor is heavily influenced by whether I have those tests. In a legacy (i.e. tightly coupled) system without good tests, most people will just leave things as-is instead of refactoring.

If I'm just looking for a safety net while I refactor, I can use Pinning Tests. These tests don't need to be malleable, since the product behavior is not changing. If they are fast, they don't need to be granular, since I can run them really often. They do need to be very reliable. They need to cover as many cases as I can manage, but only in the sections of code I'm touching. It's OK if the tests are ugly, since I'm just going to delete them at the end of my refactoring session (when my decoupled code is amenable to unit testing).
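For example, a throwaway pinning test can be as crude as this (Pricing.Calculate and the baseline file are hypothetical):

    using System.IO;
    using System.Text;
    using NUnit.Framework;

    [TestFixture]
    public class PricingPinningTests
    {
        // Ugly, broad, and disposable: sweep a lot of inputs and compare
        // against a baseline file captured before the refactoring started.
        [Test]
        public void Pin_price_calculation()
        {
            var results = new StringBuilder();
            for (int quantity = 0; quantity <= 100; quantity++)
                for (int tier = 1; tier <= 5; tier++)
                    results.AppendLine(string.Format("{0},{1}: {2}",
                        quantity, tier, Pricing.Calculate(quantity, tier)));

            Assert.AreEqual(
                File.ReadAllText("pricing-baseline.txt"),
                results.ToString());
        }
    }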

(In this context, when I say "refactor", I don't mean "a highly disciplined process of changing code without changing behavior, according to a recipe", or "using a high-fidelity automated tool that will safely change code without changing behavior". You could say I mean "tiny rewrites.")

Sunday, January 4, 2015

Simplest possible Git workflow

I'm working with a group that is getting ready to transition to Git, from a traditional centralized version control system.

Something I learned while working to revitalize endangered languages is that the first lesson should get students working in the new system, for real, as soon as possible. Applying that to Git, I want to ask: what's the minimum to get you started using Git in a real way, without creating a mess that is difficult to clean up later?

A friend complained to me that every time he asks a Git expert a question about Git, the response starts with, "Well, first you have to understand how Git works". I want to offer a workflow that does not require understanding how Git works.

In Scott Chacon's "Introduction to Git" video, he says if you're comfortable with a version control system that is not Git, you're going to hate Git. How can we get around that?

Can I create a microverse where Git is simple and easy to understand, and yet still comprehensive and self-consistent?

"in a real way" for us means "a team of people with a central 'official' repository, that anyone can push to", so I can't ignore remotes/pulling/pushing for now, but I can ignore pull requests. A single person working alone on a single machine can simplify even further than what I describe here.

Prerequisites

I assume that an expert is available to set things up and teach these basics. I advise the expert to avoid talking about any additional details of Git, no matter how juicy.

Whether you choose rebase or merge (linear or non-linear history in master) is up to your expert. If you want to use rebase later, you should use it now, to avoid "creating a mess that is difficult to clean up later", at the cost of expanding "minimum to get you started". Personally, I like rebase.

In our case, everyone sets up their development environments in the same way, and we're using Windows. We push these settings to every machine:

    git.exe config push.default simple
    git.exe config pull.rebase true
    git.exe config core.autocrlf true
    git.exe config core.safecrlf true
    git.exe config rebase.autosquash true
    git.exe config core.editor '"%ProgramFiles%\Windows NT\Accessories\wordpad.exe"'
    git.exe config merge.conflictstyle diff3


and we assert that git config user.name and git config user.email are set. (Briefly: push.default simple pushes only the current branch; pull.rebase keeps history linear; autocrlf and safecrlf handle Windows line endings safely; autosquash tidies fixup! commits during interactive rebase; and diff3 shows the common ancestor in conflict markers.)

The expert should create the central repository, instruct everyone on cloning it, and help maintain the .gitignore.

Simplest Development Workflow

We can treat Git like an old-fashioned centralized system with a single branch. Let everyone work in master. (Branches are awesome, but understanding them is more than the newbie is ready for.)

You only need these Git commands:

> git pull
When you want to update your machine with the latest from the central repository.

> git add FILENAME
When you create a new file

> git status
> git diff
To see what changes you have pending (ignore the difference between staged and unstaged changes, but watch out for unstaged adds)

> git commit -a
> git pull
> git push
When you like your changes and want to share them with the world

> git reset --hard
> git clean -fd
When you don't like your changes

> git log
To see what has been done

The biggest risk I see here is if there's a merge conflict when you pull before pushing. Stand by to help people through that the first time.

Release Workflow

Release from master. If your team needs time to stabilize master before you can release, make everyone stop what they're doing and focus on completing the release. When you are done, add a tag, then let everyone get back to work.

What's next?

As needs arise, you can build on this model. A dev can start making multiple commits before pushing, or work in a feature branch and merge it, without anyone else needing to learn something new. So you can grow incrementally.

You'll probably want to use branches for releases pretty soon.

At some point, you'll need to have a big conversation about the underlying model of Git, and what rebasing means, etc. Put that off as long as you can, and then go deep.

I find gitk helps people visualize what is happening as things get more interesting.

Tuesday, December 30, 2014

Why We Test, part 4: Specification

Warning: this post doesn't feel great to me. A bit disorganized and ill-balanced. But I needed to get it out for completeness. Feedback welcome, as always!

Llewellyn Falco (here) and Arlo Belshee (here) both talk about the way that tests can provide the value of "specification": that tests can explain to another human what the program is supposed to do.

How does this compare to the values of catching bugs, informing design, and psychological reward? There's a subtle way in which focusing your energy on specification is hugely important.

What does it mean for a test to be a good spec?
  • Name of a test is business value oriented.
  • Expresses a single example of a business rule.
  • Uses terminology from the problem domain.
  • Meant to be read by a human. (programmer, not customer)
  • At the appropriate level of abstraction for a human reader.
  • Test doesn't make any non-business-value demands.
This gets you the design feedback you need. You can only meet the goal of "test as spec" if you listen to the design feedback. You can't have a bunch of setup code (including mocks); that would distract from the core message of the test. The correct terminology gets pushed into the system under test. These short, simple, straightforward tests are only possible when code is decoupled, and when each business rule is expressed in exactly one place (DRY).
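For example, a test that strives to be a spec might read like this (the domain and every name in it are invented for illustration):

    using System;
    using NUnit.Framework;

    [TestFixture]
    public class LateFeeSpecs
    {
        // One business rule, one example, domain vocabulary,
        // and no setup beyond what the rule itself needs.
        [Test]
        public void A_rental_returned_two_days_late_accrues_a_dollar_per_day()
        {
            var rental = Rental.DueOn(new DateTime(2015, 1, 10));

            Money fee = rental.LateFeeIfReturnedOn(new DateTime(2015, 1, 12));

            Assert.AreEqual(Money.Dollars(2), fee);
        }
    }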

Simple, decoupled tests are inherently fast.

When they fail, they tell you clearly why the failure matters.

It also gets you the comprehensive safety net: if you're focusing your attention on writing and satisfying the spec, all your code will have purpose and will be covered by tests.

Because the test only makes demands for business value, you are free to refactor without unnecessarily breaking tests.

This value appears when you focus on how tests are read, but it also pays off when tests are written and run.


Monday, December 29, 2014

Why We Test, part 3: Psychology

Warning: I think this is a bit of a crappy blog post. I needed to get the ideas out there for completeness, but I haven't thought through this part thoroughly.

In "TDD as the crack cocaine of software", Jef Claes talks about the way that TDD (with really fast tests) can create the preconditions for Flow. Flow is emotionally rewarding, so that becomes its own reason to write tests this way, in addition to the desirable outcomes of catching bugs and informing design.

Other psychological (not technical) reasons to do TDD or other types of testing:

  • Confidence. 
Knowing that I have tests makes me feel safe that I can make changes and ship working software. 

Note that this confidence may be false! For example, it's false confidence if I base it on reported code coverage, even though code coverage often does not correlate with quality. (Even worse, if I ignore other quality-ensuring activities because I focus my attention on code coverage, my quality will suffer while my confidence increases.) It's very tempting to celebrate coverage numbers. Don't do it.

Interestingly, following The Three Rules of TDD will tend to result in very high coverage and good quality.
  • Incrementalism
TDD with a tiny Red/Green/Refactor cycle helps you take small steps. You get the feeling of making progress all the time. You're always a minute or so away from reverting to a green bar.
  • Tracking progress/status
If I get interrupted, I can look at my most recent unit tests to remind myself what I was doing. I can write my next failing test as a note to my future self about what I wanted to do next. (While I'm away, I'll probably change my mind, but the note is still valuable.)

Also, if I commit each passing test, another programmer can read the history to see the path I took. But I'm getting off topic for this post.

Maybe you can think of more examples, or a better way to organize these ideas - let me know!