Thursday, February 26, 2015

The second rule of software estimation

The more error there is in your estimates, the less precise you must be.

That's based on my past experience with being wrong a lot, and seeing other people be wrong a lot. If I tell you I can write a feature in a day, and sometimes I'm right, and sometimes it takes a month, then there's no reason to differentiate between 5-hour and 6-hour features when estimating.

I suspect that powers-of-n is a good model for many teams, where n depends on some combination of team familiarity with the code, technical debt, domain complexity, etc.

A statistician could certainly give some guidance here. Something about standard deviations.

A lot of teams like to use Fibonacci numbers for their estimates, which seems weird to me. Why is this a good sequence? Why jump from 1 to 2 (a 100% increase) and then to 3 (a 50% increase)? Can you really tell a 2 and a 3 apart, reliably enough to be useful?

In Fibonacci, the next number is "twice the average of the last two numbers", which is pretty close to "twice the last number". I doubt your estimates are reliable enough that the difference will matter. And powers of two are culturally familiar in software, easy to remember, and easy for programmers to add.
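To put rough numbers on that (my arithmetic, not part of the original argument): each Fibonacci number is about 1.6 times the previous one, so the buckets 1, 2, 3, 5, 8, 13 step up by roughly 60% each, while powers of two (1, 2, 4, 8, 16) step up by 100%. If a "one day" estimate can turn into a month, both step sizes are well inside your error bars, so pick the scale that's easier to work with.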

See also: the first rule.

Tuesday, February 24, 2015

The first rule of software estimating

Take a list of pieces of work you might do. Stories, features, products, I don't care. Find two that are the same size. Approximately.

Do them both. Measure how long they took. Did they come out the same?

If you can't reliably recognize two items as being the same size, then nothing else in estimation will work for you. It all builds on this.

How I write "contract tests"

This comes up in conversation often enough that I want to write it down.

Context:

My code talks to an external dependency that is awkward to use in unit tests.

I can refactor most of my code to eliminate the dependency (see DEP and Whole Value). But I still have some code that talks to the external dependency. I wrap the dependency with an adapter (see Ports-and-Adapters) of significant thickness and abstraction (see Mimic Adapter). In tests, I replace the real dependency with a legitimate but simplified test double (see Simulators).

Problem:

I can't be certain that my simulator has fidelity with my real system. They may behave differently, allowing my tests to pass when my system has a bug. (This is a common problem with mocks.)

Solution:

Write one set of tests for the port, running the tests against both the real and simulated implementation.

In C#:
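(A sketch with made-up names: the contract lives in an abstract test class, and each implementation - the simulator and the real adapter - gets a small derived test class.)

using System.Collections.Generic;
using System.IO;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// The port my code depends on (made-up example).
public interface IFileStore
{
    void Save(string key, string contents);
    string Load(string key);
}

// The contract: one set of tests, written against the port only.
public abstract class FileStoreContract
{
    protected abstract IFileStore CreateSubject();

    [TestMethod]
    public void Load_returns_what_was_saved()
    {
        var subject = CreateSubject();
        subject.Save("greeting", "hello");
        Assert.AreEqual("hello", subject.Load("greeting"));
    }
}

// Fast: runs against the simulator on every build.
[TestClass]
public class InMemoryFileStoreContractTests : FileStoreContract
{
    protected override IFileStore CreateSubject() { return new InMemoryFileStore(); }
}

// Slower: runs against the real adapter, perhaps only in CI.
[TestClass]
public class RealFileStoreContractTests : FileStoreContract
{
    protected override IFileStore CreateSubject() { return new RealFileStore(); }
}

// A trivial simulator, included so the sketch is self-contained.
public class InMemoryFileStore : IFileStore
{
    private readonly Dictionary<string, string> data = new Dictionary<string, string>();
    public void Save(string key, string contents) { data[key] = contents; }
    public string Load(string key) { return data[key]; }
}

// The real adapter, sketched here as files in a temp directory.
public class RealFileStore : IFileStore
{
    private readonly string directory = Path.Combine(Path.GetTempPath(), "filestore-contract-tests");

    public void Save(string key, string contents)
    {
        Directory.CreateDirectory(directory);
        File.WriteAllText(Path.Combine(directory, key), contents);
    }

    public string Load(string key)
    {
        return File.ReadAllText(Path.Combine(directory, key));
    }
}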


Tests on the simulator are fast enough to run with every build.

Tests on the real system may be slow; they may require awkward setup; they may cost real dollars to run. You may decide to run them only in your CI or once per sprint or whatever. Since adapters are relatively stable, that can be OK.
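One way to make that split (my own habit, not something from this post): keep the fixture names distinct and filter by name, so the everyday run skips the real-system tests. For example, with vstest.console:

vstest.console.exe MyTests.dll /TestCaseFilter:"FullyQualifiedName~InMemory"

while CI runs the whole assembly, real-system tests included.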



Tuesday, February 17, 2015

Bug metrics

Metrics are tricky. Plenty of ink has been spilled on that topic, so I'll leave it for now.

Around bugs, I know of 4 interesting metrics:
  • A: Count of active bugs
  • B: Time to fix
  • C: Fix rate
  • D: Injection rate
When I want to sound like I understand queuing theory, I call them Peak / Latency / Throughput / Load.
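(One more connection, my gloss rather than the post's: in steady state, Little's Law ties these together - active bugs ≈ injection rate × time to fix, i.e. A ≈ D × B - which is one way to see why improving D and B is what actually shrinks A.)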

(I'm ignoring the disconnect between what we can measure and what is true. For example, bugs in the system that are impacting customers but are not currently tracked by the team. See http://jbazuzicode.blogspot.com/2014/11/measuring-bug-latency.html)

Customers only care about A and B.

Companies that I have worked at often give a lot of attention to A. For example, I've seen "Bug Hell", where any dev (or any team) with more than a certain number of active bugs must stop working on features until the bug count is lowered. 

In the orgs I'm familiar with, we tend to go immediately from A to C, with bad consequences. Focusing on C means devs will tend to choose narrower fixes; they'll allow tech debt to accumulate; they'll forgo testing; they'll fix cheap bugs before important bugs; they'll work when tired; they'll multitask. The inevitable bug bounce will be higher. This is all bad for customers; it's bad for business.

Getting B (latency) down is great, but it's not always directly actionable. You can prioritize bug fixes before feature work. You can strictly assign bugs back to the devs that created them, throttling the most prolific bug creators.

I see D (injection rate) as being a valuable thing to focus on (although it's difficult to measure). As you write fewer bugs, A and B will get better, which is good for customers. And C will become irrelevant.

Because jumping straight from A to C is such a deeply ingrained habit in our corporate culture, you have to actively push in a different direction if you don't want that to happen. Every time someone says "we have N bugs", make sure they also say "remember to treat each bug as a learning experience - what can we do to make sure this kind of bug doesn't happen again?" - and never say "we fixed M bugs this week."

(Thanks to Bill Hanlon for putting a lot of these ideas out there.)

using MS Fakes safely

MS Fakes can generate "Stubs", which can override virtual and interface members, and "Shims", which can divert calls to just about anything, including statics and members of sealed classes.
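For a concrete picture of what a shim does, here's the usual DateTime.Now detour (a sketch assuming you've generated a fakes assembly for mscorlib that includes a shim for System.DateTime):

using System;
using Microsoft.QualityTools.Testing.Fakes;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ShimExample
{
    [TestMethod]
    public void DateTime_Now_can_be_detoured()
    {
        // Shims only take effect inside a ShimsContext.
        using (ShimsContext.Create())
        {
            // Divert the static DateTime.Now getter for the duration of the context.
            System.Fakes.ShimDateTime.NowGet = () => new DateTime(2015, 2, 17);

            Assert.AreEqual(new DateTime(2015, 2, 17), DateTime.Now);
        }
    }
}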

If you decide to use them, I recommend using these rules:

Only generate the fakes you care about

Use Disable, Clear, and !.

<StubGeneration Disable="true"/>

<ShimGeneration>
  <Clear/>
  <Add FullName="Foo.Bar!" />
</ShimGeneration>

Enable diagnostics:

<Fakes xmlns="http://schemas.microsoft.com/fakes/2011/" Diagnostic="true">

Treat Fakes warnings as errors


Sadly, there's no easy way to do this. Edit:

C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\Fakes\Microsoft.QualityTools.Testing.Fakes.targets

In the BuildFakesAssemblies target, the GenerateFakes task sets FakesMessages, which you always want to be empty, so add:

<Error Condition="'@(FakesMessages)' != ''" Text="Error generating fakes" />

Saturday, February 14, 2015

Write your own unit test "framework"

If you haven't already done it, I recommend you try writing your own unit testing framework. Actually, do it several times, in several different ways.

The existing unit testing packages are sizable pieces of software, and I'm not recommending you spend weeks on this effort. Keep it simple. In fact, the bare minimum to get started with TDD is almost nothing:
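Something like this is already enough to start (a minimal sketch, with a made-up Add function standing in for the code under test):

using System;

static class MinimalTests
{
    // The code under test, just for the example.
    static int Add(int a, int b) { return a + b; }

    // The entire "framework": one assert that throws on failure.
    static void AssertTrue(bool condition)
    {
        if (!condition) throw new Exception("Assertion failed");
    }

    static void Adding_two_numbers_gives_their_sum()
    {
        AssertTrue(Add(2, 3) == 5);
    }

    static void Main()
    {
        // "Test discovery" is calling each test by hand.
        Adding_two_numbers_gives_their_sum();
        Console.WriteLine("All tests passed.");
    }
}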


Sure, there is value in automatic test discovery, in rich asserts, running all tests even when one fails, reporting, etc. But you don't have to have those things to get started. (Remember this next time you are away from WiFi and have a programming idea.)

Starting from this point, experiment with different ways to write a unit test framework. Some ideas to consider:
  • What's the #1 feature you miss the most in the above example?
  • A natural way to extend asserts into your domain.
  • How easy is it to make the mistake of writing a test that never gets run?
  • If my tests are super-fast, how much overhead is there in test discovery and reporting?
  • Reporting that points directly to the site of the failure.
  • How much boilerplate does a developer have to write?
  • Test discovery: reflection ([Test]), inline functions (describe(()=>{})), or something else?
  • If you only supplied one built-in assert, would it be "Assert True", "Assert Equals", or something else? What are the implications?
  • Try both traditional asserts (AssertFoo(result...)) and fluent asserts (Assert.That(result).IsFoo(...)); there's a rough sketch of the two styles after this list.
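Here's what that last comparison can look like (made-up names, int-only to keep it short):

using System;

static class Check
{
    // Traditional style: a free-standing assert function.
    public static void AssertEqual(int expected, int actual)
    {
        if (expected != actual)
            throw new Exception(string.Format("Expected {0} but got {1}", expected, actual));
    }

    // Fluent style: wrap the actual value, then chain checks off it.
    public static Subject That(int actual) { return new Subject(actual); }

    public class Subject
    {
        private readonly int actual;
        public Subject(int actual) { this.actual = actual; }

        public Subject IsEqualTo(int expected)
        {
            AssertEqual(expected, actual);
            return this;  // returning 'this' is what makes chaining possible
        }
    }
}

// Usage:
//   Check.AssertEqual(5, result);
//   Check.That(result).IsEqualTo(5);

The fluent style reads more naturally and extends well into your domain, at the cost of more plumbing per assert.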
Let me know what you find.