Thursday, February 9, 2017

Safely extract a method in any C++ code

Let me set the scene:

You're reading some code. Some old, gnarly C++ code. The method you're looking at is 5000 lines long. It's clearly really important. You need to understand it, but it's such a mess.

You notice a 1000 line block of code that seems to stand apart from the rest in some way. Your intuition as a programmer says "this chunk of code is different from the rest of the method, and it might make sense as a new method."

If you could Extract Method, that would help. It would let you shrink the problem space down as you endeavor to understand the code. The parameter list would make it obvious which variables are relevant to this block, and which are not - the data flow would become more clear.

Maybe you use CLion, or ReSharper, or Visual C++, or Visual Assist, and you try its built-in Extract Method feature. But these tools are far from perfect. They often introduce a behavior change, like copying a parameter instead of passing by reference. If you're lucky you get a compile error and back up; if you're unlucky, the behavior change is subtle and you've just introduced a bug.

With that in mind, I offer this recipe for C++ Extract Method. Follow the recipe exactly, and you can be sure the code will work the same after as before, or the recipe will kick you out, saying you can't extract this method. You won't have to do a bunch of careful analysis - we'll lean on the compiler for that. (If it ever does introduce a behavior change for you, that's a bug in the recipe. Let me know and we'll fix it.)

Thanks to Llewellyn Falco and Arlo Belshee who said the right things at the right time to make this recipe pop in to existence, and my coworkers at Tableau Software for filling in a bunch of details. They deserve 99% of the credit.

The recipe.

If you get to the end of step 1, the refactoring is possible - it will produce a valid result.

This recipe only works on whole blocks (surrounded by braces) or a single for/while/if statement. Consider using Introduce Block to get braces around the code you want to extract.

You need C++11 or later. You only need it for the duration of this refactoring, for this one file. So if you can't upgrade your whole project right now, that's fine. Get a C++11 compiler running on this file, refactor, and revert the tools.

Each step is a safe micro-refactoring. You can check in at the end of each step.

The underlying principle is Tennent's Correspondence Principle.

1. Introduce a lambda

Surround the block in question with:

 [&]() {
  // original code
 }();

If you haven't seen this syntax before, check out the C++ lambda docs.

Compile the file. Possible errors:
  • not all control paths return a value. You have an early return. Back up and either Eliminate Early Return/Continue/Break or extract something different.

  • a break/continue statement may only be used within.  You have a break/continue. Back up and either Eliminate Early Return/Continue/Break or extract something different.
Check the new lambda for any return statements. If there are any returns and it's obvious that all code paths return, then add a return statement. If there are any returns and it's not obvious that all code paths return, then back up and either Eliminate Early Return/Continue/Break or try extracting something different.

2. Introduce Variable on the lambda

i.e.
 [&]() {
  // ...
 }();

becomes:

 auto Applesauce = [&]() {
  // ...
 };
 Applesauce();

Compile to make sure you didn't typo.

3. Set the return type

Set the return type on the lambda (even if it's `void`). In Visual Studio, the tooltip over `auto` will tell you the type.

i.e.:
 auto Applesauce = [&]() -> SOMETYPE {
  // ...
 };

Compile to make sure you got the return type correct.

4. Capture explicitly

Replace `[&]` with `[this]` (or `[]` in a free function) and compile.

For each error about a variable that must be captured:
  • Copy the variable name
  • Paste it in to the capture list, prefixed with `&`
  • Repeat until green.
i.e.:
 auto Applesauce = [this, &foo]() -> void {
    // something with foo...
 };

The order of the capture list will influence the order of the parameters of the final function. If you want the parameters in a particular order, now is a good time to reorder the capture list.

5. Convert captures to parameters

For each captured local variable (except `this`)
  • Go to the definition of the variable
  • Copy the variable declaration (e.g. `Column* pCol`)
  • Paste in to the lambda parameter list
  • Make the parameter const and by-reference
  • Remove the variable from the capture list
  • Pass the variable in to the call
  • Compile.
i.e.:
 Foo foo = ...
 Bar* bar = ...
 auto Applesauce = [this, &foo, &bar]() -> void {
    // something with foo and bar...
 };
 Applesauce();
becomes

 Foo foo = ...
 Bar* bar = ...
 auto Applesauce = [this](Foo& foo, Bar*& bar) -> void {
    // something with foo and bar...
 };
 Applesauce(foo, bar);
Note: even pointers must be passed by reference.

6. Try to eliminate `this` capture

  • Remove `this` from the capture list
  • Compile
  • If the compile fails, undo

7. Convert lambda to function

If `this` is captured, use 7A.
If `this` is not captured, use 7B.

7A. Convert this-bound lambda to member function

  • Cut the lambda statement and paste it outside the current function
  • Remove `= [this]`
  • Copy the signature line
  • Add `SomeClass::`
  • In the header file, add the function declaration in a private section.
  • Compile 
i.e.:
 auto SomeClass::Applesauce(Foo& foo, Bar*& bar) -> void {
// ... };

Note: this is using new C++11 syntax for functions. You may want to convert to the old syntax, like this:

 void SomeClass::Applesauce(Foo& foo, Bar*& bar) {
  // ...
 };

7B. Convert non-this Lambda to free function

  • Cut the lambda statement and paste it above the current function.
  • Remove `= []`
  • Wrap it in an unnamed namespace
  • Compile
If the free function uses typedefs/aliases or classes nested in the original class, convert the free function to a private static function of the original class (7A).

i.e.:

namespace {
 auto Applesauce(Foo& foo, Bar*& bar) -> void {
// ... };
}

Note: this is using new C++11 syntax for functions. You may want to convert to the old syntax, like this:
namespace {
 void Applesauce(Foo& foo, Bar*& bar) {
// ... };
}

2 comments:

Arne Mertz said...

Thanks for the detailed recipe! I've written something similar but way shorter a while ago (see below). You mention setting the return type, but if what you factor out was a block in the beginning, then there is none. If it wasn't a block, then introducing the lambda (or any block) might result in compile errors when variables that are created in the new block are used after it. These together make up the return type of the lambda and the new function.



https://arne-mertz.de/2016/10/large-legacy-applications-tools/#Example_factor_out_a_function

Jay Bazuzi said...

Cool Arne.

I may be misunderstanding the concern you bring up. The recipe demands a block (or a scoping statement) to avoid the problems of changing variable scope and lifetime. But blocks can return a value. For example: https://gist.github.com/JayBazuzi/9b5b2aeb95368b10decbb4c9706a88ae

I do think there's something that needs fixing in the recipe. There are multiple cases but they are all a bit muddled today:

1. No return statements, no problem.
2. Return in a `void` method.
3. Return a value in a non-`void` method.

If there is a return, you have to make sure that all paths return, and in #2 the compiler won't help you.

Also, a previous iteration of the recipe could extract an expression, not just a block of statements, but I removed that for simplicity. We'll have to fill that in later.