Jay Bazuzi's coding blog
Sunday, November 19, 2017
Port And Transport And Port
I use Ports-and-Adapters to abstract away my application’s interactions with external systems. I bend the dependency’s interface to the shape that I want for my domain. This makes it easier to think about my code and to unit test it.
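Here's a minimal sketch of what that bending can look like, assuming a C# system that notifies customers through some external email gateway (the names are illustrative, not from a real project):

using System.Net.Http;
using System.Threading.Tasks;

// The port: an interface shaped for my domain, not for the external system.
public interface INotifyCustomer
{
    Task OrderShipped(string orderId);
}

// The adapter: translates the domain-shaped call into the external system's API.
public class EmailGatewayAdapter : INotifyCustomer
{
    private readonly HttpClient _client;

    public EmailGatewayAdapter(HttpClient client) => _client = client;

    public Task OrderShipped(string orderId) =>
        _client.PostAsync("/notifications/send",
            new StringContent($"Your order {orderId} has shipped."));
}

Domain code depends only on INotifyCustomer, which is what makes it easy to think about and to unit test.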
New Blog
I'm experimenting with a new blog. I hope it will be easier to write better technical posts if I can use Markdown.
Check it out: http://jay.bazuzi.com/
Sunday, October 1, 2017
Fast tests for integration points
Ports-and-Adapters is a good design approach for separating business logic from external dependencies, aka Mine vs. Thine.
Like all good designs, Ports-and-Adapters makes things more testable. Everything is tested in a tight edit/build/test cycle except for the "real" adapters. The "real" adapters don't change very much, so we test them on a slower cadence.
"Don't change very much" isn't very reassuring though. I don't think about the real adapters much, but I at least want something to tell me that my real adapters aren't changing. If they need to change, grab my attention so I can run the focused integration tests.
Arlo Belshee suggested record/replay tests. Here's an example, in C# with HTTP:
Record-and-passthrough integration testing
Test -> HttpClient: HTTP Request
HttpClient -> Pass-through Recorder: HTTP Request (record the request)
Pass-through Recorder -> Some service: HTTP Request
Some service -> Pass-through Recorder: HTTP Response (record the response)
Pass-through Recorder -> HttpClient: HTTP Response
HttpClient -> Test: HTTP Response
While developing the adapter we run "focused integration tests", testing the adapter against the real dependency. For each test we record the HTTP requests and responses.
Since these tests are slow/flaky/expensive we don't run them in the edit/build/test cycle, but only when actively working on the adapter.
Verify-and-replay isolated testing
Test -> HttpClient: HTTP Request
HttpClient -> Player/Verifier: HTTP Request (verify the request against the recording)
Player/Verifier -> HttpClient: HTTP Response (the recorded response)
HttpClient -> Test: HTTP Response
While doing development on the rest of the system, or while refactoring in the real adapter, we run the real adapter against the recorded messages. This test tells us that the adapter's behavior hasn't changed (in any way that we test for), without the speed/reliability/cost of talking to the real service.
#NoMocks
This is not a mock. How so? And why not use a mock? A mock encodes what we know about the thing we're mocking. We write our understanding in code. If our understanding doesn't match the real service, our tests can pass but the system will fail in production.
New requirements mean extending the mock. As the mock grows, it needs good design to keep from becoming unmaintainable. This recorder is cheap to extend: write a new test, run it, save the results.
Code
[Gist: eb5b5fc548d253155ad113ba60898ba8]
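The gist isn't rendered here, but a minimal sketch of the record side, assuming an HttpClient-based adapter (RecordingHandler and RecordedInteraction are my names, not the gist's):

using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class RecordedInteraction
{
    public string Request;
    public string Response;
}

// Record-and-passthrough: forward every request to the real service and keep a transcript.
public class RecordingHandler : DelegatingHandler
{
    public List<RecordedInteraction> Transcript { get; } = new List<RecordedInteraction>();

    public RecordingHandler(HttpMessageHandler realHandler) : base(realHandler) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var interaction = new RecordedInteraction { Request = request.ToString() };
        var response = await base.SendAsync(request, cancellationToken);
        interaction.Response = await response.Content.ReadAsStringAsync();
        Transcript.Add(interaction);
        return response;
    }
}

Give the adapter new HttpClient(new RecordingHandler(new HttpClientHandler())) while running the focused integration tests, then save the transcript. The player/verifier for the isolated tests does the inverse: match each incoming request against the transcript and return the saved response.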
Friday, August 18, 2017
Refactor a lot, but only when it's appropriate
Don't just refactor for fun. Refactor in service of delivering business value. If there's some terrible code that you never need to touch, then there's no reason to change it. Leave it terrible.
So, when are the right times to refactor?
- When you're changing code. Refactor to make it well-designed for its new purpose.
- When you're reading code. Every time you gain some understanding, refactor to record that understanding. Lots of renames.
- When you're afraid of code. If there's code you should be changing or reading, but you avoid it because it's such a mess, then you should definitely refactor it.
Note that this refactoring is a small improvement each time, not a dramatic major rewrite. The goal is Better, not Good.
Wednesday, August 16, 2017
1* Agile is not nothing
Sometimes when people learn about the Agile Fluency Model they think of the first zone as "not very good". We should of course be aiming for one of the higher zones, right?
Maybe. You have to figure out which is the best zone for your context. Focus on Value is awesome in itself. As described by Arlo Belshee:
- Plan and work in units of value, not in technical components.
- Deliver their work by unit of value, even when that cross-cuts with the technical design.
- Regularly focus on the 20% of the work with the highest value and drop the rest.
- Regularly inspect and adapt. Own their own people, practices, platform, and data, and improve them constantly.
It's also not easy. Many teams are not there and would need a lot of work to get there. Don't take it for granted.
Wednesday, August 2, 2017
You really should get rid of those strings.
If you're looking for something to improve in your code, the #1 thing is not in this blog post. It's Naming is a Process.
The #2 might be eliminating the Primitive Obsession of string parameters to functions. The short recipe is:
1. Pick a string parameter to some method.
2. Write a new trivial value type* that contains that string.
3. Fix the method.
4. Fix all callers.
5. Let the type propagate.
You can often figure out the name of the new type from the name of the parameter:
string attributeName; // <-- suggests an `AttributeName` type
string previousZipCode; // <-- suggests a `ZipCode` type
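As a sketch, the "trivial value type" from step 2 can start out as nothing more than a wrapper (illustrative only; equality and behavior can come later as it grows into a Whole Value):

using System;

public sealed class ZipCode
{
    public string Value { get; }

    public ZipCode(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }
}

// A signature like `void UpdateAddress(string previousZipCode)`
// becomes `void UpdateAddress(ZipCode previousZipCode)`.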
Why?
- It gives you a place to hang behaviors, eventually growing into a Whole Value.
- Eliminates bugs where you pass a CustomerId where only an OrderId is allowed (assuming strong types)
This isn't a strict rule. Don't go out and do this to every string parameter you have right now. Do it a bit at a time, in code you already need to change or understand, when you're pretty sure what the new class should be.
*Value Type in the DDD sense and not in the C# sense
Saturday, July 29, 2017
Prevent infinitely many bugs with this one simple trick
Here's a way to quickly find a bug in any legacy code:
Step 1: Find a case-insensitive string comparison.
boost::remove_erase_if(attributeNameAndValuePairs, [](const auto& nameAndValue) {
    return boost::iequals(nameAndValue.name, "foo"); // <-- here
});
Step 2: Trace the use of those strings to find somewhere they are compared with the default comparison. Bug!
if (attributeName == "bar") // <-- bug!
Sometimes it's less obvious:
return std::find(allowedAttributes.cbegin(), allowedAttributes.cend(), attribute.name) != allowedAttributes.cend(); // <-- hard to see bug!
I've written this kind of bug many times, and so has everyone else, judging by how often I see it. Recently I wrote it again, and decided to dig a little deeper instead of just fixing this one new case.
Ways to see the problem
Is this a testing problem?
Should we have caught this bug sooner by testing more carefully?
I could write a unit test, asserting that this function handles casing correctly, for each of the dozen or so places that could get this wrong. But if I can remember to write that test, then I can remember to write the code the correct way. Both my code and my unit tests are limited by imagination. How do I catch all the cases?
Getting a second person to do the testing can help: they'll see things I don't see, and that is fantastic. But still, testing is only as good as the imagination of the tester. Also, while that may help us catch the bug before we ship, I want to catch the bug before I check in.
Test Feedback
The real power of Test-Driven Development is not in running the tests to catch defects, but the feedback the tests give you about the design of your code (if you can hear that feedback). What are tests telling us in this case?
With the current design, to eliminate this kind of bug means finding all the places we deal with attribute names, and writing a test. If there's 1000 places, I need 1000 tests. This is a test's way of saying "your business rule is duplicated all over the place". We can look at this as a DRY problem, and can improve the design by refactoring to eliminate that duplication.
Primitive Obsession
We could also look at this as a Primitive Obsession problem: dealing with strings instead of a type that represents a concept from the business domain. Here, the concept is "Attribute Name". The Whole Value pattern says "make and use new objects that represent the meaningful quantities of your business".
Test-as-Spec
Yet another way to look at this problem is from the point of view of "test-as-spec". I want my unit tests to make sense to a person familiar with the problem domain.
The test I want to write is "Attribute names are case-insensitive". With the original design, there's no way to do that, both because the rule is written in many places in the code, and because the rule is embedded in other functions which implement some other business rule.
In order to get test-as-spec, the rule will need a single, canonical home, decoupled from the rest of the system.
Solution
Based on any of the above points of view, we can create a type that represents an attribute name:
struct AttributeName
{
    std::string Value;
};
And extract the equality check:
bool operator==(const AttributeName& lhs, const AttributeName& rhs)
{
    return boost::iequals(lhs.Value, rhs.Value);
}
And now we can write a single test, named after the business rule that it describes (test-as-spec):
void AttributeNamesAreCaseInsensitive()
{
    CPPUNIT_ASSERT(AttributeName{"foo"} == AttributeName{"FOO"});
}
Then we replace any instance of std::string attributeName with AttributeName attributeName and let the compiler tell us what to fix next. Keep chasing compiler errors until they are all gone.
Is this code perfect?
No, but it's better: all the places that compared attribute names before are now slightly simpler / easier to read / easier to write / easier to test.
It's not impossible to write a case-sensitivity bug for attribute names, but it's now much harder to do the wrong thing, and much easier to do the right thing.
In the process of carrying out this refactoring I found another instance of this kind of attribute name case-sensitivity bug, which testing had not yet caught. Double win!
What do you think?
Saturday, May 6, 2017
Releases per bug
Traditional teams count "# of active bugs" and "# of bugs fixed per week" and the like. These drive the wrong behaviors, rewarding create/find/fix over eliminating the underlying causes of bugs.
As the bug injection rate approaches zero, you can shift how you work and how you think about bugs. A few metrics that I like for BugsZero teams:
- # of releases since the last bug
- # of user stories completed since the last bug
- # of user stories shipped since the last bug
Slightly better every day
If you have legacy code, you should strive to make things slightly better every day. (Sometimes referred to as the scout rule).
There's a lot of nuance in here that is important to understand.
Things
Only improve what matters for the work you're doing today.
- Reading. If you need to read some code, record your understanding of that code by renaming something or extracting a method.
- Writing. If you need to change some code, refactor it to make the change you're doing a little easier and safer.
- Not changing. If the right place for this change is in module A, but you know better than to touch A because it's so incredibly terrible, and instead you hack the change into module B: go refactor a little bit in A so that maybe in the future you can put the next change where it belongs.
- Tools. Anything that affects your ability to get work done is potentially in scope. For example, if you find yourself waiting for the build a lot, then do something to optimize the build.
Slightly
You find the code you need to work in. It's a mess. You know how you would like it to work. You've been itching to rewrite it for a while, and then it would be pretty good and not suck.
Don't.
Making it all the way good would take too long. You need to get your work done. And there are other parts of the system that need attention, too - focusing on this code would mean leaving that other code in a bad state.
Make it only slightly better now. Trust that if you need to touch it again tomorrow, that's when you'll make it slightly better again.
Better
Extracting a method can make code better. But the feature or bug fix you're putting in will make things worse. To actually leave the code better than you found it, you must make more improvement than you degrade it.
Code quality is difficult to measure, but we can measure things like cyclomatic complexity and lines of code and build/test/deploy duration, and I do mean that these should improve over time, even as the system gains more capabilities and delivers more business value.
Every day
There are exceptions. Maybe you need to get this bug fix done in a hurry, and can't see a quick, safe way to improve the code. But most of the time, things are better at the end of the day than at the beginning. And there will be days when things are on fire; but most of the time, things are better at the end of the week than at the beginning.
Friday, March 10, 2017
Three kinds of code
I propose a refactoring "Extract Open Source Project".
We build software systems to some purpose. But when I read code, I see that some of that code directly serves that purpose while other code does not. I see three categories:
Features
This is the stuff you and your customers care about. It's the reason your software system exists. In an e-commerce system, that's code that says "when a customer uses a discount code, the discount is applied to the order."
If you learn about code smells, great names, and duplication, and then refactor with those in mind, you'll find some code that is explicitly the feature and some that is not. That leads to:
Utilities
Code that helps you write code, but has nothing to do with the problem domain you're working in.
It's often written in the middle of the rest of your code, but as you refactor to improve readability and reduce duplication, it can become visible. For example, consider this refactoring sketch:
[Gist: ae5bd627269cb00ceaa9f9c3d3294dea]
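Since the gist doesn't render here, a hypothetical example in the same spirit: once the feature code is refactored for readability, a helper falls out that knows nothing about the domain.

using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderReport
{
    // Feature: describe each customer's discount.
    public static string Summarize(IEnumerable<(string Customer, decimal Discount)> discounts) =>
        JoinLines(discounts.Select(d => $"{d.Customer}: {d.Discount:P0}"));

    // Utility: nothing about e-commerce here. A candidate for "Extract Open Source Project".
    private static string JoinLines(IEnumerable<string> lines) =>
        string.Join(Environment.NewLine, lines);
}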
If you refactor mercilessly, you'll end up with a lot of this stuff. It's not part of the value you offer, and it would be useful to the programming community. Factor it out to be an open source project and share with the world.
Some examples of this are Boost in C++ and Rails in Ruby.
Domain Libraries
You'll also have some code that is specific to your domain, but is not the feature you're creating. This is code that lets you describe your feature. A library for building features in this domain. Maybe a DSL. In that e-commerce system, it might be a class like "Money".
While this is not the value you are offering, it is key to offering that value. You get to decide whether to release it as open source (so other people can build more systems in the same domain), or keep it under wraps (so your competition has to build their own).
Monday, February 20, 2017
Test-as-spec and assertion syntax
I like to say that tests should, first and foremost, be a human-readable spec. Let's look at what that can mean for how we write assertions.
Suppose we're writing a card game, and we want to assert that a deck of cards is sorted the way you'd find them when you first open the box. (I'm using this simple example as a proxy for the kinds of more complex problems that we see in legacy code. It's up to you to map these ideas to that context.)
An approach I see in a lot of code is to iterate over the cards to assert. Perhaps something like:
[Gist: aa1a17188812f640eba4d94495de892e]
This kind of code makes it obvious that an AssertEquals would be valuable, so that on failure you can see the expected and actual values in the test results.
If this test fails, you only know about one incorrect card. If there are more, you won't know until you fix the current error and rerun the test.
A richer assertion library might offer AssertSorted. It could even take a set of 1 or more sort key selectors. The result might look like:
[Gist: aa1a17188812f640eba4d94495de892e]
(That's C++ lambda syntax, if you haven't seen it before).
Both of these approaches are "computer science" solutions - they work in the solution domain, and use the language of computer code. If I want my test to be a human readable spec, I need to use the language of the problem domain. I could take a step in that direction by extracting a method, giving:
[Gist: aa1a17188812f640eba4d94495de892e]
But we're also doing TDD. In TDD, we want the tests to give us feedback about the design of the code. And this test is saying "the notion of being sorted is missing from the code under test". Taking an intuitive leap, the class that should hold that notion is a "deck of cards", which is also missing from the code under test. That leads to:
[Gist: aa1a17188812f640eba4d94495de892e]
I like the improvements to the design of the code and the way the test reads, but I am sad to lose the ability to provide a detailed report when this assertion fails. I'm not sure how I would fix that, or if it would ever actually matter.
It's interesting to me that we're back to the `bool`-only assertion from the first example.
Saturday, February 18, 2017
Micro-ATDD
I strive to make all my tests be both microtests and acceptance tests, an idea I learned from Arlo Belshee.
When I say this to people, they are usually confused first, then doubtful when I explain. I don't think I'm ready to address the doubt, but maybe I can address the confusion today.
Microtests
Coined by GeePaw Hill (see his article for his original definition), a microtest is like a unit test, but it has all the qualities I wish all unit tests had. It's fast and focused. It answers the question "does this little piece of code do what I intend it to do?"
Because microtests talk directly to the system under test, they are written in terms of the SUT.
It's obvious that a microtest can only be used on parts of the code that are simple and decoupled and isolated. An integration test is never a microtest.
Acceptance Test
An acceptance test describes expected software behaviors from the point of view of a user or other stakeholder. It answers the question "does this system meet the requirements I expect it to meet?"
Because acceptance tests are written in conversation with that user, they are written in the language of that user. They are organized like a spec.
My ideal tests
I want tests that hold to all of the above. My ideal tests are super fast, 100% reliable, simple, isolated, written in the language of the user, and easy to read. (This requires the SUT to be decoupled, cohesive, well-named, and DRY - properties I already want.)
Every test is both an acceptance test and a microtest.
The doubt
The usual objection I hear is "while isolated unit tests tell you about each of the little pieces, you still need some kind of integration test to confirm that all the parts work when you put them together."
Well, that's true for most programs, but it's only necessary because of how your code is organized. "parts work when you put them together" means "the desired behaviors of the program (the acceptance criteria) are emergent properties of the system". But we know how to refactor. If two parts of the system need to work together, we can put them together in the code, and then use a microtest to assert that desired behavior.
Friday, February 17, 2017
AONW2017: Amazing Distributed Teams
My new job involves teams that are distributed over a bunch of locations, mostly on the West Coast of the USA. Each team has people at multiple locations.
I went to Agile Open NorthWest 2017 with the question "How can we make distributed teams awesome?" Here's what we came up with:
There are a bunch of known good practices to help distributed teams not suck too much. Doing them won't get us to "awesome", but at least we can get up to "not sucky". So let's start by writing down these practices:
- Communicate a lot
- Don't let remote people be 2nd-class. Make everyone equally remote, even if some are in the same building.
- Chat room for all communication, even within an office
- Experienced people who don't need to learn as much are at less of a disadvantage when remote*
- Because remote pairing is more tiring, be deliberate about taking breaks.
- use Pomodoro
- Do your homework before coming to meetings, so you don't need to rely as much on awkward VTC communication
- Show up to meetings on time
- Don't let people get blocked on questions. If someone raises a question in chat, don't leave them hanging.
- Have the whole team mob for 1 hour to start the day, to get alignment on the day's work
- Synchronize time of work
- Meet face-to-face regularly. Have a budget to bring people together.
- Many companies save money by having people work from home. Direct some of that savings to equipment, travel, etc.
- Consider paying out-of-pocket to make remote more awesome, and then ask for reimbursement if it helps.
- Remember that patience online is short, and accommodate that fact.
- Create team agreements - they're at least as important as for teams that sit together
- Recognize the expression of Conway's Law: limited communications affect software architecture
- Telepresence robots can help. Make sure they are human size (short robots get treated like children)
- Retro often, with a relentless focus on the things that make remoting difficult
- Build the team
- Play games together remotely (poker, Halo)
- Friday beers in VTC
Yet-unsolved problems that generally make remote work suck:
- There is no good remote whiteboard
- Estimating remotely is particularly bad (plug for #NoEstimates)
- When tools/tech stop working right when we need them, we have a bad time
- It's hard to influence the org beyond the team / hard to influence culture
And then we looked at advantages to remote work / distributed teams - how they can be better than teams that sit together:
- 50 people can write on a Google Doc at once, while only a couple people can write on a whiteboard at once.
- Better ergonomics are possible. No crowding around a single screen.
- Remote breaks are real breaks. When you step away, no one can reach you. Go outside!
- You have access to a broader pool of talent.
- It increases diversity, even compared to the same people sitting together
- Enables a 24-hour development cycle
- Can accommodate people in new ways. For example, a person with a partial hearing loss can turn up their headphone volume, instead of asking everyone to remember to speak up.
- Can accommodate varying communication styles
We measure how awesome a team is with two questions:
- Are we delivering (steadily increasing) value?
- Are people happy?
*I don't think this is true, but it came up in the session, so I put it here in the list.
Thursday, February 9, 2017
Safely extract a method in any C++ code
Moved here for better formatting: http://jay.bazuzi.com/Safely-extract-a-method-in-any-C++-code/
Tuesday, October 18, 2016
Pinning Tests
I wrote this on the C2 Wiki, with the hopes that other people would help improve it. But now that site is down, so I'm posting it here:
Definition: A simple-minded automated test that locks down the behavior of existing code that otherwise is not well-tested, as a safety net while refactoring.
Example: Run some code and collect logs as a baseline. Each time you make a change, run the program again and compare the logs against the baseline. As long as there is no difference, you have some confidence that things are still working.
Pinning tests can make it safer to refactor. (Pinning tests can never make refactoring completely safe, because you'll forget important cases in your pinning tests. For safety, use #3 or #4 from Various Definitions of "Refactoring".) Pinning tests are a safety net, just in case.
The most important features of pinning tests are:
- Give an obvious, definitive pass or fail result.
- Good coverage. Professional testers get really good at this; ask them to help.
- Faster is better, so you can run them often.
Non-requirements for pinning tests:
Robustness. Professional testers get really good at making robust tests that work on different computers, or at different screen resolutions, or across UI changes. Ask them to refrain - these tests are short lived, and the behavior of the system won't be changing (by definition of "Refactoring").
You don't need to run your pinning tests in every environment that you ship. For a GUI, it's fine to record mouse clicks and keystrokes.
Long-lived. The goal is to hold behavior constant for just long enough to ReFactor.
Clean code. Hacking the test together is OK. For example,
- Use the C preprocessor to redirect troublesome API calls to write to a log instead.
- Edit your HOSTS file to hijack accessing a network resource.
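For example, a minimal log-comparison pinning test might look like this in C# with NUnit (RunScenarioAndCaptureLog is a hypothetical stand-in for whatever exercises your code and collects its output):

using System.IO;
using NUnit.Framework;

[TestFixture]
public class PinningTests
{
    [Test]
    public void Behavior_matches_the_recorded_baseline()
    {
        // Hypothetical helper: run the legacy code and return its log output.
        string actual = LegacySystem.RunScenarioAndCaptureLog();

        // The baseline was captured once, before refactoring started, and checked in.
        string baseline = File.ReadAllText("baseline.log");

        // Any difference means the refactoring changed behavior.
        Assert.AreEqual(baseline, actual);
    }
}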
Tuesday, September 6, 2016
Proposed Refactoring: Introduce Parameter in Lambda
Given a lambda with a captured local variable,
- Add a new parameter to the lambda
- Inside the lambda, replace uses of the local with uses of the new parameter
- Where the lambda is called, pass in the local.
[Gist: 9bb6b3066f60fb7aaa7ad1f39af3cd1d]
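The gist isn't rendered here; a before/after sketch of the transformation, using an invented discount example:

using System;

static class Example
{
    // Before: the lambda captures the local variable `discount`.
    static decimal Before()
    {
        decimal discount = 0.1m;
        Func<decimal, decimal> applyDiscount = price => price * (1 - discount);
        return applyDiscount(100m);
    }

    // After: `discount` is a parameter of the lambda, passed in where the lambda is called.
    static decimal After()
    {
        decimal discount = 0.1m;
        Func<decimal, decimal, decimal> applyDiscount = (price, d) => price * (1 - d);
        return applyDiscount(100m, discount);
    }
}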
I believe this is a refactoring: I believe that this transformation has no effect on the behavior of the code. But I'm not completely certain.
This operation is not allowed if the value of the local is changed inside the lambda.
This is almost the same operation as Introduce Parameter.
Sunday, September 4, 2016
Proposed refactoring: extract and execute lambda
Given a statement block, wrap it in a lambda assigned to an Action variable, and execute it immediately.
[Gist: 21a92bc8770caf19bf24812dff938822]
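The gist isn't rendered here; a sketch of the transformation on an invented statement block:

using System;

static class Example
{
    // Before: a plain statement block.
    static void Before()
    {
        var message = "hello";
        Console.WriteLine(message);
    }

    // After: the same block wrapped in an Action and executed immediately.
    static void After()
    {
        Action action = () =>
        {
            var message = "hello";
            Console.WriteLine(message);
        };
        action();
    }
}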
I believe this is a refactoring: I believe that this transformation has no effect on the behavior of the code. But I'm not completely certain.
I think a similar recipe for expressions is equally valid, using a `Func<>` instead of an `Action`.
This is almost the same operation as Extract Method.
Tuesday, July 5, 2016
"pure unit test" vs. "FIRSTness"
Sometimes we categorize tests into groups like "pure unit test", "focused integration test", "end-to-end-test", etc. That's a fine approach, and useful for a lot of cases.
For example, I find that pure unit tests are extremely valuable in giving me feedback about my code design, especially coupling and duplication. I don't even have to run the tests to get that value! Other types of tests have their value, but they don't give me that feedback.
Another categorization I sometimes find useful is based on the FIRST Properties of Unit Tests. You should read that link for the full story, but I'll summarize here:
- Fast
- Isolated (tests have a single reason to fail. One aspect of behavior = one test)
- Repeatable (same result every time)
- Self-verifying (tests report an unambiguous pass/fail)
- Timely (each test is created just before it is needed)
It's common for programmers to have one set of tests that they run with every edit-build-test cycle on their dev machine. They might have another set they run to validate each checkin before it merges into source control. Another that runs nightly or weekly. Another that runs before each release.
I've noticed that the decision about which tests fit in each of these buckets is less about "unit" vs. "integration" and more about "FIRS" (without the T). That is, if a test is fast and the results are reliable and useful, programmers will tend to run them more often. If a test is slow, or results require investigation, they will tend to run them less often.
Ideally, I'd like to see 99.9% of tests run in 1ms or less, be perfectly repeatable, with a clear pass/fail, and for each failure to make it obvious what aspect of what desired behavior is not right. You should strive for that. But today, given the tests you have, you may find value in bucketing your tests as I've described.
Thursday, June 23, 2016
How to document your build process for an open source C# project
As an Open Source contributor...
I find an interesting open source project that I want to contribute to. I fork/clone the repository to my machine. Then I have to figure out how to build it.
I try something and the build fails. Do I need a certain SDK or Visual Studio feature installed? Which version?
I get it to build and then I try to run the tests. 1/3rd of them fail, because they are looking for something that isn't installed on my machine.
If I'm lucky (!?) I find a document in the repository that claims to be build instructions, but it is jumbled and clearly out of date. I try to follow it, but something I need to install is no longer available, or not compatible with my version of Windows. Will a newer version of that thing work OK?
Uggh, what a mess.
As an Open Source maintainer...
I put together a cool little project in my spare time and post it online. It's simple and straightforward to build and run tests.
Then a contributor complains that they can't build it. What information could possibly be missing? It's simple and straightforward, right? I write a small text file explaining the obvious instructions. The contributor tries to follow it but is even more confused. I don't have time for this.
Uggh, what a mess.
A solution
My solution is AppVeyor. I treat AppVeyor as the reference build environment.
Here's how:
- https://ci.appveyor.com/
- New Project, select your project.
- Settings -> Do what you need to get a green build + tests
- Settings -> Export YAML. Add it to your repo.
- Delete the AppVeyor project
- New Project again, but this time configure nothing. It will use the settings from your repo.
- Confirm that build + tests are still green
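The exported appveyor.yml then captures the whole recipe. For a simple C# project it might look roughly like this (a hypothetical example, not any particular project's export):

version: 1.0.{build}
image: Visual Studio 2015

build_script:
  - nuget restore MyProject.sln
  - msbuild MyProject.sln /p:Configuration=Release /verbosity:minimal

test_script:
  - vstest.console.exe MyProject.Tests\bin\Release\MyProject.Tests.dll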
Now the instructions for how to build + run tests are in source control. Anyone can read them. There won't be any missing details. If a dependency changes, I won't miss updating the instructions, because AppVeyor will report my build is broken.
No more mess.
Saturday, June 11, 2016
An example of good engineering
I often advocate for good engineering practices and the path to Zero Bugs. Talking about these things is great and all, but concrete examples are important. I recently published an open source project that I think is a good example of this kind of work.
I hope you will copy some of the ideas to use in your own projects. You can read the source here: https://github.com/JayBazuzi/ValueTypeAssertions
Great tests
Code Coverage is a dumb measure, especially in this case. There are very few branches in the code; two tests would hit 100% coverage.
In this project, you can pick any line of code and modify it to be incorrect, and you'll get a test failure that tells you exactly what is wrong. That is much more valuable than any code coverage number.
I can't guarantee that it has 0 bugs, but I can say that every type of bug I have ever imagined or experienced in this code is covered by a test.
The tests are organized like a spec, using namespaces/folders to organize tests the same way as if you were writing a spec. Each name indicates what aspect of the system's behavior is being covered.
The tests are super-fast, which makes the edit-build-test cycle a happy experience.
ReSharper
ReSharper settings are included in the repository. All sources have been formatted with these R# settings. This makes it easy to keep formatting / style consistent.
If a random person on the internet decides to make a contribution, I don't have to explain the project's style - they can just let ReSharper take care of that.
ReSharper Code Inspections are 100% clean, further helping keep the code clean and consistent.
AppVeyor
Every Pull Request is automatically validated by AppVeyor, including build + unit tests.
C# Warn-as-error is turned on for the AppVeyor build. I believe it's important to have 0 warnings - either heed the warning if it matters, or disable the warning if it doesn't. But I don't want to slow down my edit-build-test cycle just because of a warning, so I don't set warn-as-error on the desktop. I do set it in AppVeyor, to ensure that all changes have 0 warnings before they hit master.
AppVeyor runs ReSharper Code Inspections, again ensuring there are 0 issues before merging to master. This is especially important because not everyone has ReSharper.
The AppVeyor web site lets you edit build settings online. It's a convenient way to tune the settings. Once I had them just right, I downloaded the appveyor.yml file and added it to the repository. Then I deleted my AppVeyor project and recreated it from scratch, to ensure that no online edits were required -- everything is in the repo. If anyone wants to fork this project on GitHub and set up the same build, that will be easy.
NuGet
Each AppVeyor build produces a NuGet package, which means we know that there aren't any problems in the .nuspec file or anything like that.
When a commit is merged to master, a special AppVeyor build runs to generate an "official" nuget package which is then automatically uploaded to the nuget.org package repository. (The API key is encrypted). AppVeyor automatically updates the version number, and it includes a "-beta" tag so no one expects it to hold to any Semantic Versioning guarantees.
When semver becomes important for the project, I will implement a one-touch release process to nuget with non-beta version numbers.
The project itself
The whole purpose of this project is to help you get a step closer to zero bugs.
It embodies all I have ever learned about how to implement equality in C#. Everything I have read; every mistake I have made; every mistake I can imagine making. It makes it easy to eliminate a class of errors: "C# class with incomplete or incorrect equality implementation".
It reduces the barrier to addressing Primitive Obsession, which means fewer bugs in the rest of your system, too.
The project is small
This is quite a small project. You may think that your codebase, being far larger and more complex, would not be amenable to this kind of engineering. I admit that I haven't proven otherwise. And even if you believe it would be possible and valuable to do it on your big project, you may not see how to map these ideas from here to there. Sorry.
But in some ways, the fact that it is small is part of its success. I have found a single need and satisfied that need in a single package. You can adopt this package without taking on any other requirements - no opinionated framework here. It adheres to the Single Responsibility Principle. It does what is needed and nothing else. Any time you can make a project do that, it's a win.
ValueTypeAssertions
The Problem
Primitive Obsession is one of the most pervasive code smells out there. You can address it by moving a primitive into a simple class.
[Gist: 58e39edaba3f6aea753f5ea5d0484c95]
I call the resulting class a "value type", but don't confuse it with C#'s notion of a value type, which doesn't get its own heap allocation, is passed by value to other methods, and is a source of bugs if it's mutable. I mean "a type that represents a value in some domain". If you want to implement equality on that class, there are a lot of tricky details that are easy to get wrong, at least in C#. For example:
[Gist: 4ae17ec794f49665844e3b3e01614867]
This will throw an exception when trying to cast the Bar to a Point. So you try to fix it:
[Gist: 939c8c7d570e7a7476659fcb75ae2658]
This will throw when trying to call null.GetType(). Uggh.
You probably want to override operator==() as well.
[Gist: 0a70ef42a2ec6c24e6bb6425defa058e]
The compiler tells you to implement operator!=() to go with it, so you copy/paste and change the method name:
[Gist: e6fad485cf86e0b9832f6ebccae2dcad]
Oops, you forgot to negate the check. Bug.
If the value in question is a case-insensitive identifier of some sort, it's important that the GetHashCode() is implemented correctly. Don't do this:
[Gist: 8d025d68706998bf091445454a2e4c74]
Maybe you want to implement IEquatable<>, too, and you better get these details right there, too.
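For reference, here's roughly the shape a complete implementation ends up taking for a case-insensitive value like NtfsPath (a sketch of the pattern, not code from the library):

using System;

public sealed class NtfsPath : IEquatable<NtfsPath>
{
    private readonly string _value;

    public NtfsPath(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public bool Equals(NtfsPath other) =>
        other != null &&
        string.Equals(_value, other._value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => Equals(obj as NtfsPath);

    // A case-insensitive value needs a case-insensitive hash code.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(_value);

    public static bool operator ==(NtfsPath left, NtfsPath right) => Equals(left, right);

    public static bool operator !=(NtfsPath left, NtfsPath right) => !Equals(left, right);
}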
Many programmers don't test these details at all, or they test a few but not all, and they have to repeat the same set of tests each time they introduce a new class. If you discover a new rule (ToString() should follow equality, right?) you have to update all the tests.
Prior Art
Assertion libraries typically have an equality assertion. For example, in NUnit: Assert.AreEqual( new Point(7,8), new Point(7,8) );
That is insufficient. It only tells you that one of the equality checks you've written is correct, and doesn't catch all the other cases listed above.
The Solution
ValueTypeAssertions addresses all the mistakes I have ever made, or seen made, or can imagine when implementing equality in C#. Grab it from NuGet, and write a unit test like this: ValueTypeAssertions.HasValueEquality(new NtfsPath("foo.txt"), new NtfsPath("foo.txt"));
This says "these two objects should equal, in every way that C# recognizes equality".
ValueTypeAssertions.HasValueInequality(new NtfsPath("foo.txt"), new NtfsPath("bar.txt"));
Which says the same thing about not being equal.
If some part of your value should be case insensitive, just add another assertion:
ValueTypeAssertions.HasValueEquality(new NtfsPath("foo.txt"), new NtfsPath("FOO.TXT"));
If you wrap two values, assert the combinations:
ValueTypeAssertions.HasValueInequality(new Point(1, 2), new Point(1, 8));
ValueTypeAssertions.HasValueInequality(new Point(1, 2), new Point(0, 2));
You can find the source on GitHub.
Feedback
Do you find this useful? What change would make it more useful to you?
Is there a name for this that would be more obvious?
Wednesday, May 11, 2016
Extract Method introduces a bug in this corner case
I rely on automated Extract Method to do the right thing, regardless of test coverage. This is a key part of attacking legacy code that lacks tests. But if the Extract Method introduces a subtle bug, then I can't rely on it.
Here's the code:
[Gist: d1375358582079e8d2f9bf09e33017e5]
As it is, the test passes. If you extract the indicated block, then the test fails. Extract Method should add a `ref` to the parameter on the new method.
This repros with VS2013, VS2015, and ReSharper 8, 9, and 10.
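The gist isn't rendered here, but the shape of the problem is an extracted block that writes to a value the caller still uses; when the tool passes it by value instead of by `ref`, behavior silently changes. A contrived sketch (not the original repro):

public class Example
{
    public int Before()
    {
        var total = 0;
        // --- block to extract ---
        total = total + 1;
        // ------------------------
        return total; // 1
    }

    public int After()
    {
        var total = 0;
        Increment(total);   // without `ref int total`, this increments a copy
        return total;       // 0 -- the behavior changed
    }

    private static void Increment(int total)
    {
        total = total + 1;
    }
}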
Saturday, May 7, 2016
Examples of tiny test-induced design damage
Imagine you are trying to write a unit test for some code, but you're finding it difficult.
Maybe there's some complex detail in the middle of a method that is not relevant to the current test, and wouldn't it be nice to disable that bit of code just for the purpose of the test? Maybe you could add an optional boolean parameter to the method, which when set causes the detail to be skipped.
With the exception of getting legacy code under test to support you when refactoring, I see this as a bad thing, making the code worse just for the sake of testing.
Here's my list so far:
- method marked 'internal' for testing
- method marked 'virtual' for testing
- method overload for testing
- additional optional method parameter, only used for testing
- public field that is only modified under test, to change behavior for testing
- public field that is only read by tests
- function replaced with mutable delegate field, only mutated for testing
Yes, TDD is about letting tests influence your design, but not in this way!
So how do you tell the difference? Here are a few ways:
- Will this be used for both testing and in production?
- Do you feel the urge to add a comment saying why you did this?
- If you removed the tests, would you keep this design?
- Your own design sense. Do you think the design is better?
What to do about it?
Usually the desire to do this indicates that your class/function/module whatever is doing too much.
Maybe you need to extract a class. If it's not obvious what belongs in the class, you might need to extract some methods first, to put in that new class.
A really common case is primitive obsession, like if the method deals with some string. If you move the string in to a new class, and then move that "deals with the string" code in to the class, then the class is small and easy to test and your code has improved. This is Whole Value.
Maybe there's something at the beginning or end of the method that talks to an external system, and that is making testing difficult. You could move those lines to the caller, and the method becomes testable.
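A small sketch of that move, with invented names:

using System.Collections.Generic;
using System.Linq;

public class Order
{
    public decimal Amount { get; set; }
}

public interface IOrderDatabase
{
    IEnumerable<Order> LoadOrders(string customerId);
}

public static class Billing
{
    // Before: the method both fetches and computes, so testing it needs the external system.
    public static decimal TotalOwed(string customerId, IOrderDatabase db)
    {
        var orders = db.LoadOrders(customerId); // talks to an external system
        return orders.Sum(o => o.Amount);
    }

    // After: the caller does the fetch; the calculation is trivially testable.
    public static decimal TotalOwed(IEnumerable<Order> orders)
    {
        return orders.Sum(o => o.Amount);
    }
}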
I'd like to find some concrete examples.
Friday, May 6, 2016
Mob Programming conference 2016
Resources
- Mobbing time lapse – a full day in 3 minutes
- Woody Zuill keynote – how they found mobbing
- Some Helpful Observations for successful Mob Programming (short slide deck)
People
Some of the people who I was glad to see at the conference:
- Woody Zuill. Manager of the Hunter mob that discovered mobbing, and instigator of the #NoEstimates discussion
- Llewellyn Falco. Creator of ApprovalTests, Teaching Kids Programming, credited with “strong-style” pair programming.
- Nancy Van Schooenderwoert. Led a team of newbies to fantastic results, and wrote about it: http://www.leanagilepartners.com/library/Vanschooenderwoert-EmbeddedNumbers.pdf
There were around 50 people total, including people from Cornwall, Sweden, Denmark, and Finland.
Location
It was held at Microsoft’s New England Research and Development Center (“NERD Center”), right next to MIT. My cardkey didn’t work on the doors, though.
The 3 days beforehand were the Agile Games Conference, in the same space.
Structure
2 keynotes:
- Woody Zuill on how they discovered mob programming
- Llewellyn Falco on the science of mob programming (why it works)
2 open space slots.
The workshops were a chance to participate in a mob under the guidance of a mobbing expert. There were workshops at the introductory, intermediate, and advanced levels of mobbing.
Take-aways
The conference was less about teaching/learning, and more about experience. As such, most of my take-aways don’t fit into an email. Hopefully I can facilitate these experiences for others.
Woody explicitly does not recommend mobbing. The important things he sees, which led to the discovery of mobbing + great results:
- Kindness, consideration, and respect
- The people doing the work should choose how they do the work
- Turn up the good (work on making good things happen more, rather than fixing bad things – the bad things tend to melt away)
Mob programming is a skill. Don’t expect amazing results right at the outset.
At the conference I had the opportunity to experience mobbing at various levels, and this gave me a glimpse of what expert mobbing would look like. I can now see how that way of working would produce those amazing results.
A coworker asked me to get the answer to the question “what is the ideal mob size?” The answer I found is largely about reframing the question:
If a team is not skilled at mobbing, then you won’t get great results, regardless of mob size. An expert mobbing team will be able to work well with 4 people or with 14. Get people that have each of the skills/knowledge/talents that will be needed, so they don’t get blocked.
I can now teach you to differentiate a male house sparrow from a male song sparrow, in less than a second.
Monday, April 11, 2016
Definitions of "Zero Bugs"
I am writing in response to this tweet:
"@arlobelshee @moshjeier @jaybazuzi @lisacrispin @dwhelan I agreed for some def'n of terms. Please link to your def'n of those terms." — mheusser (@mheusser) April 11, 2016
A common definition of "Bug" is "Code that does not work according to spec." I see this as a deliberately narrow definition to cope with (coddle!) too many bugs. I want to come back to that, but first some definitions of Zero:
- The normal known bug count is 0. Switch from counting bugs to counting days/months between bugs.
- For every bug we've ever seen, we know that that class of bug will never happen again.
- We no longer need a find-and-fix cycle before shipping a feature.
- A mindset shift, from "bugs are inevitable" to "bugs are, uhh, evitable".
- An ideal to aim for, which informs how we work each day.
- A state where the rules of the game have changed, and we discard the protocols and cautions we had put in place to manage bugs.
Are any of these definitions the same as "no customer will ever find a bug in this code, ever"? No, but that hardly matters. You certainly shouldn't let that be an excuse to argue that Zero Bugs is impossible instead of deciding to start down the path to #BugsZero.
"Anything surprising, confusing or disappointing anyone." — Arlo Belshee (@arlobelshee) April 11, 2016
Thursday, February 18, 2016
BugsZero @ Agile Open Northwest 2016
Neo: What are you trying to tell me? That I can catch all my bugs in testing?
Morpheus: No, Neo. I'm trying to tell you that when you're ready, you won't have to.
(paraphrased)
TLDR: You already know how to do it; no heroics required; go for low hanging fruit; start now.
Typically when I mention the idea of No Bugs to people, they respond with doubt and disbelief. They think I'm nuts, or they think I'm defining "bug" in a very narrow way, or that it could only be possible in some very specific context (no schedule pressure, a simple problem domain, greenfield development, etc.).
What is a bug?
The definition of bug I am using is very broad: anything that disappoints or surprises anyone.
The only people that use narrow definitions of bugs are the people who have lots of bugs. This is a coping technique that is unnecessary when you have no bugs.
If I wrote my code correctly, but something I depend on broke and now my site is down, is that a bug? Yes.
If the developer implemented code according to spec, but the spec was wrong, is that a bug? Of course it is.
I don't care about categorizing bugs. It's just bugs.
If you ever ask "does X count as a bug", the answer should be "yes".
When is a bug?
Are we only talking about bugs that customers see? What if it's caught during testing?
I measure "bug injection" when the change is checked in to source control. When it escapes the developer's machine. In GitHub it would be when a pull request is merged in to master. I like this definition because it lets me lean on unit tests, static analysis, lint, etc. in an automated CI system.
Arlo wishes he could measure even earlier - if it gets typed in to the editor, it counts as a bug. More on that later...
What is zero?
At the AONW session Arlo asked the room how many bugs people currently have open in their bug tracking system. Answers looked like:
- 1700
- 250
- 200
- 200
- 100
Then he asked Brian Geihsler about a project he was on. The answer had a very different shape:
- 3 days to 3 weeks between bugs
(They also measured # of stories delivered between bugs.)
And then he asked Chris Lucian:
- 12-18 months between bugs
Changing the rules
Are these zero? My inner mathematician says no, but my inner project manager says yes. If you can measure days between bugs, that changes the rules:
- You no longer need to get the most expensive people in a room to triage bugs.
- You never need to argue about whether something is a bug.
- You never need to choose between fixing a bug and writing a feature.
- You can ship whenever you want.
How is this possible?
It's not about testing. It's about addressing the causes of bugs.
Where do bugs come from?
Bugs happen when a human makes an incorrect decision.
The human brain is really good at making decisions, and doesn't let a lack of information get in the way. Even worse, it doesn't tell you that it's making a decision based on a lack of information. It just makes the decision and feels confident about it. Worse still, you have a limited short term memory, so even if the information you need is available to you, it may not all fit, but you won't know it.
Here are some ways that code can set you up to make bad decisions:
- A variable is named "taxReturn" when it represents a "tax refund" (code that lies)
- A variable is named "txRfnd" when it represents a "tax refund" (abbrs. obfuscate)
- Two variables representing the same idea are named differently (unnecessary synonym)
- One idea is expressed in more than one place
- A function that is very long
- Whitespace/indentation doesn't match the parse tree (Python wins here!)
Some examples outside the code:
- A dependency broke (add automated checking that the dependency still works)
- I wrote a feature the customer doesn't want (pair with a customer)
How to get to zero bugs?
This is my favorite take-away from the AONW session: there's no secret. You already know how to get there.
You already know how to get a little better. Rename a variable. Automate a step in your release process. Pair program on a kata for an hour. You can probably think of a dozen small improvements that you could make right now.
Each time there's a bug, look for some way you can avoid that class of issue. Pick the low-hanging fruit. The easiest, quickest, safest change that you know you can execute and get benefit from right away. Don't be ambitious. Do pick something that has been trouble recently.
Do it again. Keep iterating.
How long will it take?
Assume it will take about 2 years to get to Zero Bugs.
That means you need to progress 1% towards your goal each week. I know you know how to get 1% better right now.
It's a choice.
Now that you know how to stop writing bugs, the responsibility rests on your shoulders. If you're still writing bugs 2 years from now, it's because you decided to keep writing bugs.
Start now.
Wednesday, December 23, 2015
My ideal edit/build/test/commit/deploy/etc. system
There's a ton of variation out there in how teams set up the pipeline from "edit code" to "live in production". I want to talk about my ideal, to use as a reference point in further discussion.
TL;DR: When a change is pushed to master, it is proven ready for production.
"pushed to master" is equivalent to "makes it off a development machine".
It's common in Git to make multiple commits locally before pushing them up to the official repository. I am fine with those local commits not all passing tests. It's the "push" or "merge" that matters.
I take the term "master" from popular Git usage, but that's not important - it could be "trunk" or "Main" or whatever.
"Proven" here can mean a bunch of things. Obviously, it includes passing unit tests. It also includes compilation, so I will lean on the compiler. It also includes static analysis, which I will extend to eliminate classes of bugs.
It's important that this "proving" process be super fast, so that I never hesitate to run it. If it's slow, I'll want to separate the slow and fast parts, and require only the fast parts to be run on every change. The slow parts might run every few changes, or every night, or whatever, which means I don't know that master is always ready for production. So I look for ways to make it all super fast.
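One way to make that split concrete is to tag the slow tests so the per-change run can exclude them. A minimal sketch, assuming MSTest; the category name and test names are made up:

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class PaymentAdapterTests
{
    // Excluded from the per-change run by a test filter such as TestCategory != "Slow".
    [TestMethod, TestCategory("Slow")]
    public void Charges_a_test_card_against_the_sandbox_gateway()
    {
        // slow, real-dependency test body
    }

    // Uncategorized tests are fast and run on every change.
    [TestMethod]
    public void Builds_a_well_formed_charge_request()
    {
        // fast, isolated test body
    }
}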
Sometimes a bug will slip through, and be caught by manual testing, or production monitoring, or by a customer. When this happens, I look for some way I can improve my "proving" to eliminate this class of bugs forever. In this way, over time my confidence in "ready for production" steadily grows.
Some teams have an "in development" branch, where changes can go before master, so that they can be shared between developers even if they're not production ready. In my ideal model, I don't need that. I use vertical slicing, safe refactoring, feature flags, etc. to be able to commit my changes quickly. My branches are short-lived. If my changes pass tests, I push them to master, and I'm done.
Some teams have an "in test" branch, where they'll take a snapshot of what's in master, and then run a testing pass before going to production (with some iteration for making additional fixes). In my ideal model, I don't need that. If my changes pass tests, I push them to master, and they're ready for production.
Ideally, there's an automated system that runs these builds + tests against proposed changes and then pushes them to master if they pass. In TFS they call this "gated checkin"; some people call it "Continuous Integration". The important thing is that you know for sure that master is always green - the validation always passes.
I want to reinforce the point that this is an ideal. I don't expect you to get there tomorrow. But I do want you to agree that this is both valuable and feasible, and start working towards this ideal today. Each step you take in this direction will make things a little better. You'll get there eventually.
And don't do something irresponsible like delete all your integrated tests, or fire your QA staff. Start moving towards this ideal, but keep your old process around until you can demonstrate that it is no longer giving you value.
Sunday, December 6, 2015
Types of integration/integrated test
I've noticed that people often use these terms interchangeably.
And when I look at the kinds of tests they're talking about, I see a bunch of different things. Each of these things is worth considering separately, but we lack crisp terminology for them. (I've touched on this before.)
1. Testing class A through B
(Embedded gist: cc384ef8739df7f39ca7)
2. Testing class A, but B is incidentally along for the ride
(Embedded gist: a816a0e33d2e6fc2b593)
3. I have tested classes A and B separately, but now I want to test that they work together.
That is, that they integrate correctly.
(Embedded gist: 6581d5b326653cdd8b52)
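The gists don't render here, so here is a rough sketch of what kinds 1-3 can look like, using invented stand-ins for "class A" and "class B" (MSTest; all names hypothetical):

using Microsoft.VisualStudio.TestTools.UnitTesting;

public class B { public int Double(int x) { return x * 2; } }

public class A
{
    private readonly B b;
    public A(B b) { this.b = b; }
    public int Quadruple(int x) { return b.Double(b.Double(x)); }
    public string Describe() { return "A wrapping a B"; }
}

[TestClass]
public class IntegrationTestKinds
{
    // 1. Testing A through B: the assertion only passes if B's behavior is correct.
    [TestMethod]
    public void Quadruple_goes_through_the_real_B()
    {
        Assert.AreEqual(12, new A(new B()).Quadruple(3));
    }

    // 2. Testing A, with B incidentally along for the ride: B is constructed, but nothing about it is checked.
    [TestMethod]
    public void Describe_does_not_depend_on_B()
    {
        Assert.AreEqual("A wrapping a B", new A(new B()).Describe());
    }

    // 3. A and B are unit tested separately; this test only checks the seam between them.
    [TestMethod]
    public void A_and_B_work_together()
    {
        Assert.IsTrue(new A(new B()).Quadruple(1) > 0);
    }
}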
4. My business logic is testable in isolation, but then I have an adapter for each external system; I test these adapters against the real external system. I call this a focused integration test, and it happens when I use Ports/Adapters/Simulators.
5. I have unit tested bits of my system to some degree, but I don't have confidence that it's ready to ship until I run it in a real(ish) environment with real(ish) load.
6. I am responsible for one service; you are responsible for another; our customers only care that they work together. We deploy our services to an integration environment, and run end-to-end tests there.
Every "Extract Method" starts with minus 1 points
Eric Gunnerson once wrote about the idea that, in programming language design, every potential language feature starts with "minus 100 points":
Every feature starts out in the hole by 100 points, which means that it has to have a significant net positive effect on the overall package for it to make it into the language. Some features are okay features for a language to have, they just aren't quite good enough to make it into the language.
Once a feature makes it into a programming language, it's in there forever. If you later realize it could have been better if done a little differently, you're stuck. Features tend to join to create combinatoric complexity, so each feature you add now means potentially big costs down the line.
When refactoring, I say "Every 'Extract Method' starts with minus 1 points".
The default negative reflects the cost of looking in two places to understand your program, where previously everything was in one place. The extracted method has to provide some additional value to justify its existence.
If the new method lets you eliminate duplication, add points.
If the new method is poorly named (worse than good / accurate / honest), subtract points. If the name more clearly expresses intent, add points.
If the calling method is now easier to follow, add points.
It's not a very high bar, but if you can't get to positive territory before merging to master, throw away the refactoring.
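A small hypothetical illustration of the scoring:

// Before: the same date-stamp rule appears twice, and nothing names what it means.
string reportName = "report_" + DateTime.Now.ToString("dd_MM_yyyy") + ".txt";
string backupName = "backup_" + DateTime.Now.ToString("dd_MM_yyyy") + ".txt";

// After: the extraction starts at minus 1 (one more place to look), but it
// eliminates the duplication and the name expresses intent, so it ends up positive.
string DateStampedName(string prefix)
{
    return prefix + "_" + DateTime.Now.ToString("dd_MM_yyyy") + ".txt";
}

reportName = DateStampedName("report");
backupName = DateStampedName("backup");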
Sunday, September 13, 2015
Unit testing microskills
In response to my Why we Test posts, George Dinwiddie had this to say:
Good collection of "whys." I'm also looking for "hows."
— George Dinwiddie (@gdinwiddie) September 12, 2015
The connection between why and how is important, but the details are not obvious. I'll pick a few values that people hope (unit) tests might offer, and give my thoughts on how to practice testing to deliver this value. (This is certainly not a complete analysis of the subject.)
prevent regressions due to future work
Most people pick up on this one right away: as long as you can get a green bar before making changes, and another green bar when you're done, your tests catch bugs before they get checked in. Great!
Speed, readability, and granularity of tests aren't as important as good coverage. They don't even have to be unit tests - any tests will do. Reliability with a clear pass/fail result is important, so that bug-induced test failures actually get recognized.
If a piece of code is a completely obvious expression of a business requirement, you still need to write a test for it, since the tests call out the intentional behavior.
"prevent regressions" does not appear to require test-first. In fact, teams that focus on this value tend to write many of their tests afterwards. Because the code isn't written for testability, it's hard to test (duh). Either we don't bother testing it, or we bend over backwards writing horrible tests that are hard to understand, and lock down implementation details, making future refactoring harder.
a safety net during refactoring
Readability and granularity of tests aren't as important as good coverage and speed. Slow tests mean you won't run as often, which means you won't catch mistakes as quickly, which makes refactoring more expensive. That changes the cost/value/risk equation for refactoring, so you won't refactor as often.
Test speed includes any time spent analyzing results and rerunning flaky tests, so make test results obvious and rock-solid.
Many organizations are nervous about the risk of bugs from refactoring, even though they tolerate bugs from feature work. In that context, great coverage is particularly important for the refactoring safety net.
In an effort to improve coverage, teams that focus on the refactoring safety net will often test implementation details, including breaking encapsulation and injecting mocks to access those details. In the process, they lock down those details, making refactoring more difficult. That's Irony Number One.
Getting proper coverage, for both "prevent regressions" and "refactoring safety net" can be difficult. Applying the Three Rules of TDD is an effective way to get the coverage that you actually need. As long as you avoid testing implementation details, you'll necessarily have to decouple your code to make this happen. So you'll naturally end up with a code base that is at least moderately well-factored, even before you try to use the tests as a refactoring safety net. That's Irony Number Two.
make DRY problems visible
DRY problems become visible in TDD when you find yourself writing the same test repeatedly. My favorite example is file path case insensitivity in Windows. Consider:
if (Path.GetExtension(fileName) == ".cs")
There's a bug here: if the file is named ".CS" then I want the software to work the same as ".cs". I can fix it locally, by switching to a case-insensitive string comparison. And I diligently write a test for it. But then tomorrow I write another file extension check in another piece of code, and I write another test. I may end up with a thousand expressions of this rule, and (if diligent) a thousand corresponding unit tests.
The rule I'm trying to test here is "File extensions are case-insensitive". I want to have exactly one test that describes and enforces that rule. Which means that rule must be expressed in exactly one place. That's DRY.
The correct response to "I'm testing this idea multiple times" is "extract the duplicated behavior from all the places it's used, and merge them to one place, and test that one place."
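A minimal sketch of what that one place could look like (the class and method names are invented):

using System;
using System.IO;

static class SourceFiles
{
    // The single home of the "file extensions are case-insensitive" rule.
    public static bool HasExtension(string path, string extension)
    {
        return string.Equals(Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase);
    }
}

Every call site asks SourceFiles.HasExtension(fileName, ".cs"), and a single test can pin the rule down, e.g. asserting that HasExtension("PROGRAM.CS", ".cs") is true.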
Note that test execution time is irrelevant here; you don't ever have to run your tests to get this value! However, responding to this design feedback leads to code that is factored in a way such that tests are naturally very fast (Irony Number Three!).
Readability is important: you have to be able to read the test to understand what requirement it's describing, to be able to detect the duplication.
Granularity is important: tests must each describe exactly one requirement, or the duplication won't be visible.
DRY reduces bugs, as it eliminates the risk of updating only 999 of the 1000 places a rule is expressed. DRY (along with Great Names / Coupling /Cohesion) is far more effective at eliminating bugs in shipped software than tests that are intended to catch bugs. (Irony Number Four)
Saturday, September 12, 2015
Why we test, Part 8: Because we are competent professionals
#15 in my list of reasons why we (unit) test, which I learned from James Shore:
Refactoring without tests is inherently unsafe, because of the risk of introducing bugs. As a professional, I would never take such risks. Therefore, I would only refactor when I know I have good tests. In this way, TDD makes refactoring possible.
I may not be representing his idea with perfect fidelity; for that I apologize.
My comments:
- There is a class of programming languages* for which there exist reliable refactoring tools. With these tools I can safely refactor even without tests.
- The reliable tools work by following a recipe. If a human follows the same recipe carefully, they'll get the same result. That would work in strongly typed languages that lack good tooling.
- Plenty of people who make their careers as programmers ("professionals") do sloppy work, but not those who are competent.
- The tests have to be good. If you only write tests when it's easy, they won't give you enough protection. The only way I know to get this kind of test coverage is if you strictly follow the Three Rules of TDD.
- When naive** TDDers aim for 100% test coverage, they go to extreme lengths in their tests, including bad mocks and test cases that don't correspond to any business value. These common problems lock down implementation, which makes refactoring far more difficult; the opposite of Jim's goal.
** most programmers
*** mocking is fantastic for Tell, Don't Ask, and problematic without TDA.
Sunday, August 23, 2015
My ideal backlog
Problem:
There are two ways people seem to want to use a backlog:
A) To sort by priority, so the next thing we do is the most important thing to do next.
B) To make sure we don't forget anything important.
In both cases, the cost and value get worse as the list grows. Good ideas that are 1/2-way down the list will get duplicated by mistake, but with different phrasing, so the duplication is not obvious. Sorting, de-duping, and understanding the items gets more expensive, but none of that effort actually creates any business value.
I see a lot of teams with backlogs that would take a year to work through, if no new ideas came along. And of course new ideas always come along, at least if you're working on anything that matters.
Since items come in to the backlog faster than they go out, the list steadily grows, and most ideas never leave the backlog. People start to believe that the backlog is where good ideas go to die.
Solution:
Keep the backlog short. 7 items seems ideal, because you can keep them all in your head long enough to understand the whole list.
When a new idea appears, compare it to the current backlog, and ask "is this item higher priority than any of the items currently on the list?" If not, then let it go. Don't worry about forgetting. Trust that if it becomes more important, it will grab your attention again, and can be added to the list at that time. More likely, you'll think of something even more awesome, and do that instead. That's a good thing: doing the more awesome things before the less awesome things.
Alternate Solution:
In many organizations, my proposal won't fly. People come to the team with requests, and would be upset if you said "It's not in our top 7, so we're letting it go."
In that case, keep two lists. The first list is the stuff you're going to do next (today/this sprint/whatever), and only has a few items on it. The second list is the bucket of possible future ideas, and can be any size. Spend as little time as possible grooming the second list.
When a new idea appears, compare it to the "To Do Next" list, and ask "is this higher priority than any of the items currently on the list?" If not, put it on the "Possible Future Ideas" list. Tell the requester that their idea is "on the backlog," and will be weighed against other items on the backlog when planning future releases. They'll understand that if you didn't do their idea, it's because something even better happened.
Sidebar: Hold prioritization very lightly.
We prioritize work by considering the estimated cost and value of that work. Both types of estimates are notoriously unreliable. You may believe you're working on the next most important thing, but you're probably wrong in some way that you can't know yet.
If you start working on an item, stay open to discovering that you should actually be doing something else. As Woody Zuill says:
#AgileMaxim 1: It is in the doing of the work that we discover the work that we must do. Doing exposes reality. http://t.co/4WbMeKnqWF
— Woody Zuill (@WoodyZuill) July 30, 2015
This is another reason to slice work very thinly. The smaller the item, the sooner you can get to the point where you learn what you should really be doing, and the more likely it is that this current item will get completed and deliver some value before switching to your new discovery.
Sunday, July 19, 2015
GetRouteData() in ASP.NET WebApi
I've been trying to get System.Web.Http.HttpRouteCollection.GetRouteData() to work in ASP.NET WebApi recently, and had a hard time of it. In ASP.NET MVC it's really easy, but there are additional details I couldn't figure out in WebApi. There was even a detailed set of answers on StackOverflow, but when I tried them, they all failed in ways that didn't make sense to me.
And now I have seen it work, so I want to document it. Here's what I did:
- In VS 2013, New Project -> Web, ASP.NET Web Application
- Select WebAPI. Check "Add unit tests".
- Add the following unit test:
(Embedded gist: f06775e1c92131352ed5)
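The gist doesn't render here; this is a sketch of the kind of test involved, using the default route the WebApi template registers (the actual gist may differ in its details):

using System.Net.Http;
using System.Web.Http;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class RouteTests
{
    [TestMethod]
    public void GetRouteData_matches_the_default_api_route()
    {
        // Register the same default route the WebApi project template sets up.
        var config = new HttpConfiguration();
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{id}",
            defaults: new { id = RouteParameter.Optional });

        var request = new HttpRequestMessage(HttpMethod.Get, "http://localhost/api/products/5");

        var routeData = config.Routes.GetRouteData(request);

        Assert.IsNotNull(routeData);
        Assert.AreEqual("products", routeData.Values["controller"]);
        Assert.AreEqual("5", routeData.Values["id"]);
    }
}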
And here's a Git repository with the complete working solution.
(Thanks to this blog post for unblocking me.)
Thursday, May 14, 2015
The relationship between DRY and Coupling
I think that the DRY principle is a subset of* "Low Coupling".
DRY & Coupling:
If one rule is expressed in two places in your code (violating DRY), and you want to change the rule, you must edit both places. This is coupling.
byte[] OpenFile(string fileName)
{
    // Is it our file type?
    if (fileName.Extension == ".foo") ...

void AutoSaveFile(byte[] contents)
{
    path = Path.Combine(directory, DateTime.Now.ToString("dd_MM_yyyy") + ".foo");
If we decide to change our file extension to the much more reasonable ".bar", then we must edit both.
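One way to collapse that coupling into a single point of change (a sketch; the class name is invented):

static class FooFormat
{
    // The extension rule now lives in exactly one place.
    public const string Extension = ".foo";
}

OpenFile compares against FooFormat.Extension and AutoSaveFile appends it, so switching to ".bar" becomes a one-line edit.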
*possibly equivalent to
The Prime Refactoring
I used to believe that the two most important refactorings were Extract Method and Rename. The way they deliver value and the way they are used are quite different, which makes them hard to compare, so I figured they had equal value.
Recently I've decided that Rename is slightly more urgent, if not more important. It is the first refactoring to learn; the first to teach; the first to apply. (Just slightly)
The problem is code that lies to you. It says it's doing one thing, but actually it's doing another. You either have to think really hard to figure that out (slow) or you misunderstand the code and write bugs.
Fix that first. It may lack cohesion, have tight coupling, and lots of duplication, but first introduce good names. Rename to make the code stop lying to you.
(Soon afterwards, start using Extract Method to give you more things to name.)
Monday, April 6, 2015
"good" names - a minbar
In code, naming things well is incredibly powerful. Names help with expressing intent, increasing cohesion, and identifying duplication.
Bad naming can do a lot of damage. Names that lie, mislead, or obfuscate will confuse a programmer, or at least make her work harder to get the job done.
I think a name is "good" when you don't have to examine what is behind the name to know what it does. It doesn't have to add additional value, it just has to avoid obfuscation. For example:
void AThenB() { A(); B(); }
If you see AThenB() in code, you'll know exactly what it does. Not a great name, but not a damaging name, either.
This is the minimum bar when naming a new entity in code. It's not a hard bar to meet. You can often do way better. But never check in any code that doesn't meet this bar.
JBrains calls this "accurate names".
Arlo Belshee calls this "tweetable names":
A fn is tweetable iff I can tweet the name sans context and you will write exactly the same body. Untweetable code is crap.
— Arlo Belshee (@arlobelshee) December 12, 2014
Wednesday, March 18, 2015
The zeroth rule of software estimating
I realized that something must come before even the first rule of software estimating:
Know why you are estimating.
We take it for granted that software estimating is something we must do. For many people, this is obvious. But when we start talking about why we estimate, I see many different answers. Perhaps it is not so obvious after all.
Some of the answers I have heard:
- To decide which work to do next.
- To decide how many items to start working on in an iteration.
- To decide how many people to hire.
- To sync up long-lead work (e.g. marketing).
- To evaluate and reward the performance of individuals.
- To evaluate and reward the performance of teams.
- To measure the impact of changes in process, tools, technical debt, etc.
- As a lever to push people to work harder.
It's common to choose more than one. This can produce really wacky results.
Whatever your reasons are, it's worth understanding them deeply. Is that something you really need? Is this approach really going to give you that result? Are there other ways that are more effective?
Thursday, February 26, 2015
The second rule of software estimation
The more error there is in your estimates, the less precise you must be.
That's based on my past experience with being wrong a lot, and seeing other people be wrong a lot. If I tell you I can write a feature in a day, and sometimes I'm right, and sometimes it takes a month, then there's no reason to differentiate between 5-hour and 6-hour features when estimating.
I suspect that powers-of-n is a good model for many teams, where n depends on some combination of team familiarity with the code, technical debt, domain complexity, etc.
A statistician could certainly give some guidance here. Something about standard deviations.
A lot of teams like to use Fibonacci numbers for their estimates, which seems weird to me. Why is this a good sequence? Why jump from 1 to 2 (a 100% increase) then to 3 (a 50% increase)? Can you really tell a 2 and a 3 apart, reliably enough to be useful?
In Fibonacci, the next number is "twice the average of the last two numbers", which is pretty close to "twice the last number". I doubt your estimates are reliable enough that the difference will matter. And powers of two are culturally familiar in software, easy to remember, and easy for programmers to add.
See also: the first rule.
Tuesday, February 24, 2015
The first rule of software estimating
Take a list of pieces of work you might do. Stories, features, products, I don't care. Find two that are the same size. Approximately.
Do them both. Measure how long they took. Did they come out the same?
If you can't reliably recognize two items as being the same size, then nothing else in estimation will work for you. It all builds on this.
How I write "contract tests"
This comes up in conversation often enough that I want to write it down.
Context:
My code talks to an external dependency that is awkward to use in unit tests.
I can refactor most of my code to eliminate the dependency. (See DEP and Whole Value). But I still have some code that talks to the external dependency. I wrap the dependency with an adapter (see Ports-and-Adapters) of significant thickness and abstraction (see Mimic Adapter). In test, I replace the real dependency with a legitimate but simplified test double (see Simulators).
Problem:
I can't be certain that my simulator has fidelity with my real system. They may behave differently, allowing my tests to pass when my system has a bug. (This is a common problem with mocks.)
Solution:
Write one set of tests for the port, running the tests against both the real and simulated implementation.
In C#:
(Embedded gist: 3122a1f93b2eb7596d93)
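Since the gist doesn't render here, a minimal sketch of the shape (MSTest; the port, the simulator, and the adapter names are all invented):

using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// A hypothetical port for an awkward external dependency.
public interface IFileStore
{
    void Save(string key, string contents);
    string Load(string key);
}

// The simulator: a legitimate but simplified implementation.
public class InMemoryFileStore : IFileStore
{
    private readonly Dictionary<string, string> files = new Dictionary<string, string>();
    public void Save(string key, string contents) { files[key] = contents; }
    public string Load(string key) { return files[key]; }
}

// One set of tests, written only against the port.
public abstract class FileStoreContract
{
    protected abstract IFileStore CreateStore();

    [TestMethod]
    public void Saved_contents_can_be_loaded_back()
    {
        var store = CreateStore();
        store.Save("greeting", "hello");
        Assert.AreEqual("hello", store.Load("greeting"));
    }
}

// Fast: runs in every edit/build/test cycle.
[TestClass]
public class SimulatorContractTests : FileStoreContract
{
    protected override IFileStore CreateStore() { return new InMemoryFileStore(); }
}

// Slow/expensive: the same tests, run against the real adapter on a slower cadence.
// [TestClass]
// public class RealAdapterContractTests : FileStoreContract
// {
//     protected override IFileStore CreateStore() { return new RealFileStoreAdapter(/* real connection details */); }
// }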
Tests on the simulator are fast enough to run with every build.
Tests on the real system may be slow; they may require awkward setup; they may cost real dollars to run. You may decide to run them only in your CI or once per sprint or whatever. Since adapters are relatively stable, that can be OK.
Tuesday, February 17, 2015
Bug metrics
Metrics are tricky. Plenty of ink has been spilled on that topic, so I'll leave it for now.
Around bugs, I know of 4 interesting metrics:
- A: Count of active bugs
- B: Time to fix
- C: Fix rate
- D: Injection rate
When I want to sound like I understand queuing theory, I call them Peak / Latency / Throughput / Load.
(I'm ignoring the disconnect between what we can measure and what is true. For example, bugs in the system that are impacting customers but are not currently tracked by the team. See http://jbazuzicode.blogspot.com/2014/11/measuring-bug-latency.html)
Customers only care about A and B.
Companies that I have worked at often give a lot of attention to A. For example, I've seen "Bug Hell", where any dev (or any team) with more than a certain number of active bugs must stop working on features until the bug count is lowered.
In the orgs I'm familiar with, we tend to go immediately from A to C, with bad consequences. Focusing on C means devs will tend to choose narrower fixes; they'll allow tech debt to accumulate; they'll forego testing; they'll fix cheap bugs before important bugs; they'll work when tired; they'll multitask. The inevitable bug bounce will be higher. This is all bad for customers; it's bad for business.
Getting B (latency) down is great, but it's not always directly actionable. You can prioritize bug fixes before feature work. You can strictly assign bugs back to the devs that created them, throttling the most prolific bug creators.
I see D (injection rate) as being a valuable thing to focus on (although it's difficult to measure). As you write fewer bugs, A and B will get better, which is good for customers. And C will become irrelevant.
Because A->C is such a deeply ingrained habit in our corporate culture, if you don't want that to happen, you have to actively exert effort to take things in a different direction. Every time someone says "we have N bugs", make sure they also say "remember to treat each bug as a learning experience - what can we do to make sure this kind of bug doesn't happen again?" and never say "we fixed M bugs this week."
(Thanks to Bill Hanlon for putting a lot of these ideas out there.)