Sunday, September 28, 2014

Getting started in legacy code

This tweet got me thinking about how I make commits:
I started to reply in tweets, but decided I wanted a lot more than 140 characters.

Formatting

When I come on to a new project, one of the first things I like to do is bulk format all code. I don't care what the formatting is (your conventions are fine with me), as long as it's consistent an automated. Ongoing cost should be near zero.

What is refactoring?

A refresher on definitions of refactoring:
  1. Cleaning up the mess while working on something else.
  2. Cleaning up the mess while not working on something else.
  3. Highly disciplined manual application of a recipe for known-safe behavior-idempotent code transformation.
  4. Automated code transformation with a known-safe tool
Arlo Belshee argues that only #3 and #4 deserve to be called "refactoring". He calls #1 and #2 "rewrites", albeit highly localized. All the problems with rewrites still apply, just on a much smaller scale.

I'm inclined to agree with him, but most of the people I work with are happy to call #1 and #2 refactoring as well. I'm not interested in arguing about semantics, but I do find each of these definitions distinctly valuable.

I always advocate for removing barriers to Refactoring #4. They're unlikely to introduce bugs. They almost always make the code better (while most changes make the code worse). They're usually really easy to reverse if we change our minds later.

Refactoring #4 in traditional environments

My current project is sadly trapped in a non-distributed version control system, without lightweight branching. Code reviews are generally required before each change, but we make an exception for known-safe automated refactorings. For example, I might make a commit with this description:
R#: Extract Method `C.Foo`
The R# prefix indicates that this is a known-safe refactoring executed with ReSharper. The `back-tick` syntax is taken from Markdown. I check this in with the minimum of ceremony.

With practice, I have learned to make dramatic shifts in the clarity & structure of code using only the automated refactorings in ReSharper.

It's still my responsibility to make sure I don't break anything. Some ReSharper code transformations are not true refactorings. It's on me to know the difference, and to use the tool safely.

Usually I'll do this with a specific goal in mind: a feature or bug fix. Once the code is really nicely factored in that area, the behavior change is simple and obvious. I can send that out for code review by itself; doing that CR is really easy. (Sometimes people think I only ever work on easy problems!)

I will Refactor this way when I first come in to an area that I plan to work on. As I understand the code, I'll execute these refactorings to express my new understanding, checking in as I go. I'm much more conservative here, because I don't understand overall the intended design, and don't want to naively introduce an inappropriate structure. I pick the most valuable refactorings that I see. In this mode, if I get interrupted part-way through, at least I have made things better. (Thanks to Woody Zuill and Llewellyn Falco for that insight.)

Because I make these changes quickly, I rarely have to merge with someone else's changes. If they have a hard time merging with my changes, we can easily throw mine away and recreate them.

Modern environments

When I get to use Git or other modern source control, that allows an even better workflow.

First, I make a local branch.

Then I'll refactor as above, where each commit goes in separately. I still use prefixes like `R#:` to call those out. I can go a lot faster, because I can postpone any validation until the end of the process.

If I'm working on a feature or fix, goes in the same branch.

If I'm in a code-review-required situation, this pull request is easy to review if you look at each commit separately. 

If every change in the branch is cohesive, I'll merge like this:

> git rebase master
> git checkout master
> git merge --no-ff --no-commit FOO
> git commit    # provide a summary description

This gives a nice-looking commit history:

> git log --oneline --decorate --graph
*   ae593a5 (HEAD, master) Fix blah
|\
| * 67fa3f5 (FOO) Fix bug #2
| * fa27d00 R#: Rename `X` -> `Y`
| * 3a77461 R#: Extract Method `C.Bar`
|/

Just because I have a private branch doesn't mean I want to spend a lot of time there, though. No long-lived parallel development.

Refactoring #3

I'm still looking for a good way to call these out. What I want to tell you is "I didn't intend to change any behavior here. If you're searching commit history for a deliberate behavior change, look elsewhere. If you see a behavior change here, it was not deliberate." In the past I have used "REFACTOR: " as a prefix for this kind of commit.

other tags

Formatting changes are really dull. No one will ever want to read them. They affect many lines of code, so separating them out is really valuable. I commit them with a "FORMATTING: " prefix.

Sometimes I only change a unit test - not product code - and so there's no way I could be introducing a product code bug. Then I might prefix with "TEST: " (although if I extract a method from a unit test, maybe I tag that as a refactoring - still undecided).


No comments: