Sunday, November 17, 2013

Logging an Item

MSBuild items seem like they should be almost the same as properties, but they differ in some odd ways. Time to add support for them.

Most of the time was just spent iterating with MSBuild, trying to understand what it expected, and how it would respond to different inputs. I'm glad I'm encapsulating that here in this class, so the rest of my Task code doesn't have to worry about it.

Loading Gist 7523672

EDIT: After posting this, I noticed a bug. I had hard-coded the name on line 7 of LogProperty. Some TDD gurus insist that you write more than one test before removing hard-coded results. If I had done that, I would have caught that bug before committing my code. I'm tired, so I'm looking for shortcuts; if I had a pair, I bet we wouldn't have let that happen.


Refactoring the test

All the TDD gurus admonish us to keep our test code well-factored.* I know I'll want to write more tests soon, so now is a good time to clean up the test I have. Let's refresh ourselves with the starting code:

Loading Gist 7519763

This is a job for Extract Method. Most of the effort goes into picking the right places to slice the method into bits.

It's easier to know where to slice when you have a second (or third!) example, because the duplication is easier to see. On the other hand, refactoring 3 is more work than refactoring 1. I took a quick stab at writing a new test, saw the duplication, deleted it, and then refactored. Call this an educated guess.

Here's what I ended up with:

Loading Gist 7522819

I'm curious about this new assert. Some TDD gurus mention custom asserts as one element of advanced TDD, so I am going to pay attention here and see if anything interesting happens.
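
The gist has the committed code; as a sketch, a custom assert for this kind of test might look something like the following. The name `BuildAssert.OutputIs` and the logger type are my stand-ins, not necessarily what's in the gist:

```csharp
// Hypothetical custom assert: build an in-memory project and compare the
// captured log output to an expected string, in one intention-revealing call.
using Microsoft.Build.Construction;
using Microsoft.Build.Evaluation;
using Microsoft.Build.Framework;
using NUnit.Framework;

static class BuildAssert
{
    public static void OutputIs(string expected, ProjectRootElement projectRoot)
    {
        var logger = new StringLogger();                      // collects high-importance messages
        new Project(projectRoot).Build(new ILogger[] { logger });
        Assert.AreEqual(expected, logger.Output);
    }
}
```

A test then reads as "build this project; expect this output," with all the MSBuild plumbing hidden behind the assert.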

Next up, a new feature, with a new test. Let's see if my refactoring works out.

*Arlo Belshee goes a step further: don't worry about duplication between tests as much as making the tests read cleanly. They should say exactly what you want to test, no more, no less. He says tests should be "WET" - Write Explicit Tests, while product code should be "DRY" - Don't Repeat Yourself.


Refactoring the logging

Now that the tests are working OK, I want to turn my attention back to the other goal: cleaner code for task parameter logging. Here's the current implementation:

Loading Gist 7517928

I want to refactor this to generalize it. I'm not going to add additional tests just yet, even though maybe I should. At least I have enough test coverage that I don't have to spend time manually testing as I go.

I used very small steps, but the outline is:
  1. Extract Method LogInputs, push it up to base class LoggingTask
  2. Replace the hard-coded "Foo" and this.Foo with reflection via this.GetType().GetProperty("Foo").Name and .GetValue(this).
  3. Extract method LogInput, taking a PropertyInfo
  4. Replace GetProperty() with a loop over GetProperties(), filtering by the [Required] attribute.
  5. Extract Method GetPropertyInfos and HasAttribute.
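
The committed code lives in the gists below, but as a rough sketch, the end state of steps 1-5 looks something like this (assuming the base class derives from MSBuild's `Task` so it gets a `Log` helper; names follow the outline above):

```csharp
// Sketch of the refactored base class after steps 1-5.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public abstract class LoggingTask : Task
{
    // Step 1: extracted from the original task and pushed up to the base class.
    protected void LogInputs()
    {
        Log.LogMessage(MessageImportance.High, "  Inputs:");
        foreach (PropertyInfo property in this.GetPropertyInfos())
        {
            this.LogInput(property);
        }
    }

    // Step 3: one parameter at a time, via reflection (step 2).
    void LogInput(PropertyInfo property)
    {
        Log.LogMessage(MessageImportance.High, "    {0} = {1}",
            property.Name, property.GetValue(this));
    }

    // Steps 4 & 5: find task parameters by the [Required] attribute.
    IEnumerable<PropertyInfo> GetPropertyInfos()
    {
        return this.GetType().GetProperties()
            .Where(p => p.HasAttribute<RequiredAttribute>());
    }
}

static class PropertyInfoExtensions
{
    // Step 5: the HasAttribute extension.
    public static bool HasAttribute<T>(this PropertyInfo property) where T : Attribute
    {
        return property.GetCustomAttributes(typeof(T), false).Any();
    }
}
```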
Here's the result:

Loading Gist 7521986

As an indication that I'm on the right track, see how small the test task is:

Loading Gist 7521995

You will also need this extension:

Loading Gist 7522018

I don't really like that the [Required] attribute is used to find parameters; that's not quite the right rule. But I'll come back to that later.


Creating the MSBuild project in-memory

Now that we're no longer running the MSBuild command line, the next isolating refactoring is to create the project file in memory, too. To do that, I used ProjectRootElement:

Loading Gist 7519460
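
In case the gist doesn't load, the basic ProjectRootElement moves look something like this (a minimal sketch; `LoggingTestTask` is a stand-in name for the task under test):

```csharp
// Minimal sketch: build an MSBuild project entirely in memory.
using Microsoft.Build.Construction;

var projectRoot = ProjectRootElement.Create();

// Point MSBuild at the assembly containing the custom task...
projectRoot.AddUsingTask(
    "LoggingTestTask", typeof(LoggingTestTask).Assembly.Location, null);

// ...then invoke the task from a target.
var target = projectRoot.AddTarget("Go");
var task = target.AddTask("LoggingTestTask");
task.SetParameter("Foo", "hi");
```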

Note this depends on the following extension method, which I introduced to reduce duplication:

Loading Gist 7519477

I have an urge to write an extension method to replace AddTask which will ensure that the project has a UsingTask already. In fact, I'll go ahead and do that, so you can see the transformation.

Loading Gist 7519648

I also made AddUsingTask generic:

Loading Gist 7519677

Which allows the following simpler test:

Loading Gist 7519763


Running MSBuild in-proc in a unit test

In the previous step, I executed my code under test (a custom MSBuild task) by running MSBuild as a separate process, with a hard-coded path to the EXE, passing a project file on the command line, and capturing standard output. An ideal unit test runs the system-under-test (SUT) in isolation, and this is far from isolated. That's fine in spike mode, but over time I want to refactor my tests to improve isolation.

To take a step in that direction, I'm switching from Process.Start to using the MSBuild APIs directly. It took a while to figure out a way that works, but now I've got one:

Loading Gist 7519022
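
In outline (the gist has the details), building in-proc looks something like this, with output captured through the custom logger described next rather than through stdout. `StringLogger` is my stand-in name for that logger:

```csharp
// Sketch: evaluate and build an in-memory project via the MSBuild APIs.
using Microsoft.Build.Construction;
using Microsoft.Build.Evaluation;
using Microsoft.Build.Framework;

var projectRoot = ProjectRootElement.Create();
// ... add UsingTask, Target, and Task elements here ...

var project = new Project(projectRoot);
var logger = new StringLogger();
bool succeeded = project.Build(new ILogger[] { logger });
// logger now holds the captured output; assert against it.
```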

This requires a custom logger to gather the output, which I wrote like this:

Loading Gist 7519004

This new logger is pretty specialized - high-importance messages only. I have an urge to generalize it by letting the importance be a parameter, but for now I'm applying YAGNI and leaving it as simple as I can.
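
If the gist doesn't load, the shape of such a logger is roughly this (a sketch, not the committed code):

```csharp
// Sketch of a logger that collects only high-importance messages.
using System.Text;
using Microsoft.Build.Framework;

class StringLogger : ILogger
{
    readonly StringBuilder output = new StringBuilder();
    public string Output { get { return this.output.ToString(); } }

    public void Initialize(IEventSource eventSource)
    {
        // Subscribe to message events; keep only the high-importance ones.
        eventSource.MessageRaised += (sender, e) =>
        {
            if (e.Importance == MessageImportance.High)
            {
                this.output.AppendLine(e.Message);
            }
        };
    }

    public void Shutdown() { }
    public string Parameters { get; set; }
    public LoggerVerbosity Verbosity { get; set; }
}
```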


Running MSBuild in a unit test

My aim now is to write a simple, fast, reliable test that will execute my custom MSBuild task. Currently, I can run it with the command line:

PS> msbuild .\My.proj /v:d

One challenge with testing it this way is that the output has a lot of non-repeatable text, including timestamps. Just to make things easier, I'll elevate the logging importance to "high".

Log.LogMessage(MessageImportance.High, "  Foo = {0}", this.Foo);

Then I'll run with minimal verbosity, and nologo to clean up the output:

PS> msbuild .\My.proj /v:m /nologo
  Inputs:
    Foo = hi

Now I need to actually write a test. The simplest test I can think of:

Loading Gist 7518446

Note that I wrote the expected output as empty, to see the exact actual output (which is already known good, since I'm writing the test after the fact). I'll grab that actual output, make it the expected value, and the test will pass. That way of working is the basis of Llewellyn Falco's ApprovalTests, which I may switch to in a future step. But for now, make the test pass & check in:

            const string expected = @"  Inputs:
    Foo = hi
";


Spiking a custom MSBuild Task

My team has a bunch of MSBuild tasks. Most of them write their input parameters to the log, like this:

Loading Gist 7517928

As of recent versions of MSBuild, this logging already happens automatically, but only in diagnostic logs. Those logs are so big as to be unusable for us, so we want this information to appear in the detailed (or perhaps normal) logs.

Today, I want to explore two ideas:

  1. Using TDD on MSBuild tasks, and
  2. Removing duplication of this logging code.

I'll start with a spike, to confirm that I do indeed know how to write & run a custom MSBuild task. The task implementation is presented above. Next, I write a small MSBuild project:

Loading Gist 7518019

And then run it on the command line (the 2nd run is with diagnostic verbosity, so you can see what the built-in logging looks like):

Loading Gist 7518058

Great, I know how to do something. :-)

Next up, turn this into a test.


Sunday, September 15, 2013

TDDing to ILock

My guiding star: when faced with a difficult TDD problem, refactor to make it easy. There are many techniques for doing so (I only know a handful).

What if we use dependency injection?

1. Wrap the `lock` keyword in a custom class.
2. Pass in that custom object
3. In tests, pass a mock of that custom object instead.

Here's the resulting implementation of Counter:

    class Counter
    {
        readonly ILock @lock;
        public Counter(ILock @lock)
        {
            this.@lock = @lock;
        }
        internal int MoveNext()
        {
            using (this.@lock.Acquire())
            {
                return CurrentValue++;
            }
        }
        public int CurrentValue { get; private set; }
    }

One way of looking at what I've done here: I've addressed the Primitive Obsession around the _syncRoot object in the conventional pattern. You can't get much more primitive than System.Object!
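
With the lock injected, a test can finally demand the locking directly. Here's a sketch, hand-rolling a recording mock; the repo's MockLock may well differ:

```csharp
// Sketch: verify that MoveNext acquires and releases the injected lock.
using System;
using NUnit.Framework;

class RecordingLock : ILock
{
    public int AcquireCount;
    public int ReleaseCount;

    public IDisposable Acquire()
    {
        this.AcquireCount++;
        return new Releaser(this);
    }

    class Releaser : IDisposable
    {
        readonly RecordingLock parent;
        public Releaser(RecordingLock parent) { this.parent = parent; }
        public void Dispose() { this.parent.ReleaseCount++; }
    }
}

[TestFixture]
class CounterTests
{
    [Test]
    public void MoveNextAcquiresAndReleasesTheLock()
    {
        var @lock = new RecordingLock();
        new Counter(@lock).MoveNext();
        Assert.AreEqual(1, @lock.AcquireCount);
        Assert.AreEqual(1, @lock.ReleaseCount);
    }
}
```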



Note that I explicitly called out which steps were RED, GREEN, and REFACTOR. 

5b36285 REFACTOR: Rename `MyLock` -> `Lock`
c149f15 REFACTOR: Rename `Lock()` -> `Acquire()`
66518d7 GREEN: Simple `Lock()` and `IsLocked`
9b27a8c RED: LockedLockShouldBeLocked

Also, as the code currently stands (https://github.com/JayBazuzi/TddLocks/tree/IDisposableLockToken/LocksExperiment1/LocksExperiment1), there are 3 implementations of ILock:

  • MonitorLock, which attempts to duplicate the behavior of the `lock` keyword in C#, according to the language specification.
  • MockLock, which makes it easy to confirm that code is locking as expected.
  • SingleThreadedLock, which actually doesn't lock at all.

I created the different Lock implementations with TDD. All 3 pass the exact same set of unit tests.
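
For reference, the interface implied by Counter's `using (this.@lock.Acquire())`, plus a Monitor-based implementation, might look something like this. This is a sketch inferred from the code above; the repo's versions may differ:

```csharp
using System;
using System.Threading;

interface ILock
{
    // Returns a token; disposing it releases the lock.
    IDisposable Acquire();
}

// Duplicates the behavior of the `lock` keyword: Monitor.Enter on acquire,
// Monitor.Exit when the returned token is disposed.
class MonitorLock : ILock
{
    readonly object syncRoot = new object();

    public IDisposable Acquire()
    {
        Monitor.Enter(this.syncRoot);
        return new Token(this.syncRoot);
    }

    class Token : IDisposable
    {
        readonly object syncRoot;
        public Token(object syncRoot) { this.syncRoot = syncRoot; }
        public void Dispose() { Monitor.Exit(this.syncRoot); }
    }
}
```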

Fun stuff.

Conventional threadsafety

The conventional approach is to use the `lock` keyword to wrap operations that are sensitive to threading.

    class Counter
    {
        readonly object _syncRoot = new object();
        int i = 0;
        internal int GetValue()
        {
            lock (_syncRoot)
            {
                return i++;
            }
        }
    }
It's popular, and there's nothing wrong with it per se, but my goal is to use TDD. I can write a test that motivates the non-lock parts of this code, but I don't know how to write a test that demands the locking.


TDD and multithreaded code

I recently watched a video of a talk by Venkat Subramaniam called "Test Driving and Unit Testing Thread Safety". http://www.agilealliance.org/resources/learning-center/test-driving-and-unit-testing-thread-safety

My first understanding of the title of the talk was that Venkat was going to describe a novel way to test code for thread safety. That understanding makes sense if you think of unit testing as a way to measure & ensure the quality of code.

However, I keep having to remind myself that TDD is not primarily a testing activity; quality assurance is not its primary goal. The primary purpose of TDD is design - to help you write well-designed code. That works because you can't test things in isolation if coupling is high.

Any time you're doing TDD and have code you don't know how to test in isolation, you have 3 options:
  • Don't test it.
  • Test it in context, in an integration test / manual testing / etc.
  • Refactor.
That last one is the gift TDD offers, and that's what Venkat was recommending in his presentation.

As I was watching Venkat code, I realized that what he was creating looked very familiar. I went back and found this blog post from 9 years ago: http://blogs.msdn.com/b/jaybaz_ms/archive/2004/05/06/127480.aspx. What's interesting to me here is that we came up with that code without doing TDD. We were only coding for fun, and we didn't understand TDD at the time. We just refactored mercilessly and that's what we ended up with.

I've also been reading Arlo Belshee's blog. He talks about mocks as a code smell, and the pattern of using simulators instead of mocks for external dependencies.

Thinking about Venkat's talk, Arlo's blog, and my old blog post, I decided to attack this problem again as a kind of kata. Specifically, I wanted to practice some ideas I've been studying recently:

  • Strict RED/GREEN/REFACTOR
  • Work Tiny.
  • NCrunch.
  • Letting TDD drive the design.
  • Treating mocks as an invitation to develop multiple useful implementations

Problem statement

Write a threadsafe counter. It supplies sequential integers to callers from multiple threads.

That is, make this threadsafe:

    class Counter
    {
        int i = 0;
        internal int GetValue()
        {
            return i++;
        }
    }

Wednesday, June 16, 2010

Single-line blocks: A radical view

For a long time I've been a staunch advocate of using braces around single-line blocks in C-like languages:

            // OK
            if (foo)
            {
                return 1;
            }


            // Bad
            if (foo)
                return 1;

The obvious reason is to protect against this kind of bug:

            // very Bad
            if (foo)
                Debug.WriteLine("returning 1"); // added later
                return 1;

There's actually a bug in Visual Studio that we shipped because of this kind of mistake. If memory serves, it was in VS 2002. The repro:
  1. Start debugging.
  2. Open the watch window
  3. Make a Remote Desktop connection to the session running the debugger
Result: 

  • The font in the watch window changes to the system font (big & ugly).
The problem here was a bit of code in DllMain of a debugger DLL where a single-line block didn't have braces, and then another line was added later. Turns out DllMain gets called when connecting in an RDP session. The Watch Window font got deleted prematurely. This was in the early days of Terminal Services / RDP, so it took us a little while to get clueful (opposite of clueless!).

It would have helped if we auto-formatted our code, but we didn't like what the C++ auto-formatter did by default, so we didn't run it.

Since then, I have argued that all single-line blocks must have braces, to prevent this kind of bug.

Later I started reading about Extreme Programming, and its radical views on simplicity. I decided that simple, single-line blocks are a nice goal, because they mean your code is simple. So I now have an exception to my rule: a single-line block can skip the braces if 1) it is on the same line as the if/for/whatever statement, and 2) the whole line is short. In fact, I consider this state of affairs particularly desirable.

So:
            // Good
            if (foo) return 1;
            else if (bar) return 2;
            else return 0;


Tuesday, January 6, 2009

A way to manage extension methods in C#

I like C#'s extension methods, but I'm concerned about "polluting the namespace". I'm not the only one. I'm especially concerned about extensions to "object", as they appear everywhere. They can be useful, but wow that's a big impact. How to make valuable, wide-reaching extension methods possible without affecting lots of code that doesn't need the extension?

I've come up with an approach that helps me be picky about which extension methods are available in which contexts. That helps me manage how much pollution I am willing to accept. It also means that most of the extension methods available in a given context are the ones relevant to the code I'm trying to write in that context.

Example:

namespace MyProject.Extensions.TypeExtensions
{
    static class _
    {
        public static bool DerivesFrom(this Type type, Type baseType)
        {
            while (type != null)
            {
                if (type.BaseType == baseType)
                {
                    return true;
                }

                type = type.BaseType;
            }

            return false;
        }
    }
}

Which is normally used like this:

    using Extensions.TypeExtensions;

    var typesDerivedFromFoo = from type in typeof(foo).Assembly.GetTypes()
                              where type.DerivesFrom(typeof(Foo))
                              select type;


Consequences

* This approach works best if files are small, which means that my classes need to be small, too. If all your code is in a few files, there's no point to my method.

* All my extension methods are in the Extensions namespace.

* It's not friendly in languages that don't have extension methods. (You have to use the full name of the extension type):

Extensions.TypeExtensions._.DerivesFrom(type, baseType)

This isn't beautiful, but let's compare it to what you'd write if you didn't use extension methods at all:

TypeExtensions.DerivesFrom(type, baseType)

There is only an extra "_." and an extra "Extensions.", which isn't that much.

* My approach prevents you from using these methods as if they weren't extension methods, the way we did before:

using Extensions.TypeExtensions;
_.DerivesFrom(type, baseType);

Because "_." is ambiguous as soon as you bring in more than one extension.

* While C# requires that extension methods be defined in a class, the language doesn't care what the name of the class is. My approach reflects that, by giving the class a "nothing" name (and using the containing namespace as the visible name).

* The word "Extensions" appears twice in the namespace name, which is repetitive, redundant, says something that was already said. I'd rather call it "Extensions.Type", but that means the name of the type I'm extending ("Type") isn't available - I have to say "System.Type". Maybe that's a better choice.

Tuesday, November 11, 2008

Version control that scales down: Mercurial

When working on a small software project for myself, I still want version control, but I don't want the overhead of, say, TFS.
 
I decided to give Mercurial a try, and it seems to scale down very well.  Here's what I have done:
 
  1. Install TortoiseHG
  2. Open a PowerShell Prompt
  3. Go to the directory that contains my source
  4. hg init # create a new "repository"
  5. hg addremove # add all files in the current directory
  6. hg revert # avoid adding files you don't want
  7. hg commit
 
A good idea is to create a .hgignore file to tell it what files you don't want to add, if there are some.  For a small C#/NUnit project, mine looks like this right now:
syntax:glob
*.suo
*.user
bin\
obj\
TestResult.xml
NUnit.VisualState.xml
 
Then, to add this file, do hg addremove, then hg commit.
 
Then, you just edit your files as needed.  When you're ready to commit, you do hg addremove, hg commit.
 
I don't have to worry about keeping my filesystem and my project in sync with my source control.  It just does it.
 
In another directory I have some PowerShell scripts.  Since I already had Mercurial installed, it was easy: hg init, hg addremove, hg commit.  Tada, it's under version control. 
 
Things I like about Mercurial:
 
  • No server setup
  • No need to decide where on the server your files should live
  • No need to "check out" a file before you can edit it - adding version control doesn't interrupt your existing workflow.
  • Works offline just as well as online
 
Mercurial has a lot more power if you want to scale up, with rich branching & merging.  You can create a branch for an experiment or a 1-off bug fix release, all offline.  That's cool, but right now that's not important for me: I really just want to track my changes.
 
The main thing I wish for, and it's pretty minor, is PowerShell cmdlets.  They're really nice to work with.  I don't expect them to appear any time soon.
 
People often look to version control software to provide backups of their source code.  I only have Mercurial on one machine right now, and standard guidance says I should put a small Mercurial server on another machine (my Windows Home Server seems like a good choice), and "push" my changes on to it regularly.  I'm not doing that, because my changes are backed up in other ways:
 
  • The directory that contains all my source code projects is in a Windows Live Mesh folder.  That means it's backed up, even the full history, unless I accidentally delete the whole thing.
  • Every night my computer is backed up to my Windows Home Server.  Even if I delete by accident, I won't lose much.
 
I'm not sure how well Mesh & Mercurial will get along.  If I make edits on computer A, and Mesh syncs to computer B, I could commit from B, and that will get synced back to A.  That isn't what Mercurial is intended to do, but I can't think of a reason it wouldn't work.  The idea of being able to go computer-hopping without having to commit, push or pull, merge, and commit each time first seems attractive.
 

Backup up your pending changes (TFS)

More than once I have destroyed my data by accident. I've certainly lost more data this way than any other. Version control is great for a lot of reasons; having a backup is just one of them. But if I have changes that aren't checked in, they are at risk.

When I was working with TFS for version control, I wrote this PowerShell script to back up all my changes to shelvesets. TFS has the ability to enumerate all workspaces, so it's easy: you just run it once and it will back them all up. The shelveset names begin with "ZZZ" to sort them to the end of the shelveset list. It makes sense to run this as a nightly scheduled task.

The script is also interesting as an example of how to manipulate TFS from PowerShell, via the TFS APIs, instead of trying to parse textual output:
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.Client")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.VersionControl.Client")

$localWorkspaceInfos = [Microsoft.TeamFoundation.VersionControl.Client.Workstation]::Current.GetAllLocalWorkspaceInfo() | where { $_.Computer -eq $env:COMPUTERNAME }

"Found {0} workspaces to back up" -f $localWorkspaceInfos.Count | Write-Verbose

Warning: it has been a year+ since I last ran this script, and I don't have access to TFS these days to test it. I think it works, but YMMV.