Time to Bring Back the Toolsmith?

Good old The Mythical Man-Month. Here’s Fred Brooks’s description of one of the roles I have been identifying, and quite often find myself in, lately:

The Toolsmith: File-editing, text-editing, and interactive debugging services are now readily available, so that a team will rarely need its own machine and machine-operating crew. But these services must be available with unquestionably satisfactory response and reliability; and the surgeon must be sole judge of the adequacy of the service available to him. He needs a toolsmith, responsible for ensuring this adequacy of the basic service and for constructing, maintaining, and upgrading special tools — mostly interactive computer services — needed by his team. Each team will need its own toolsmith, regardless of the excellence and reliability of any centrally provided service, for his job is to see to the tools needed or wanted by his surgeon, without regard to any other team’s needs. The tool-builder will often construct specialized utilities, catalogued procedures, macro libraries.

You don’t usually see “Toolsmith” on a business card (I reckon I never have), but I am starting to recognize it as a very distinct role in agile teams. More often than not, collective code ownership distributes this role across the team, so everyone ends up taking care of the development environment, and the point of this post is to bring this forward for discussion. I see collective ownership of the tools and development environment as a good thing, but not in the ad hoc way I have seen on most projects I’ve worked on.

It’s important to keep in mind that the development environment for a project is, essentially, a system in production. It needs just as much attention and support as any other production system, and it’s fairly easy to spot the lack of it: painfully long and quite often broken builds, continuous integration systems not properly set up, manual steps sprinkled over otherwise automatable tasks, inconsistent settings and software setups on developer workstations, and a long list of manual steps to set up a new one.

I have come to the conclusion that every single step added to a continuously repeated task in the development process not only burdens the developers but also sharply increases the chance of failure. If the only thing you have to do before checking your code in to continuous integration is run one build script, there’s not much you can do to screw it up except not run it. Add another couple of steps to that, and the chances of something going wrong compound with each one.
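To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each manual step is forgotten independently with the same probability; the 5% figure is made up for illustration, not measured on any project, and the function name is my own.

```python
def chance_of_slip(steps: int, miss_rate: float) -> float:
    """Probability that at least one of `steps` manual steps is forgotten.

    Assumes every step is missed independently with probability `miss_rate`,
    so the chance of getting all of them right is (1 - miss_rate) ** steps.
    """
    return 1.0 - (1.0 - miss_rate) ** steps


# Illustrative only: 0.05 is an assumed miss rate, not real data.
for steps in (1, 3, 5, 10):
    print(f"{steps:2d} steps -> {chance_of_slip(steps, 0.05):.1%} chance of a slip")
```

Under those assumptions, one step goes wrong about one time in twenty, while ten steps go wrong roughly four times in ten. The chance of a clean run shrinks geometrically with every step you add.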

The difference between:

run build

and

clean-database
start-webserver
run build

is that you get a false failure every time you forget to run clean-database or start-webserver. Then you have to run it all over again, this time blaming yourself for being such a git. By this point, your attention span is gone and you’re thinking about a trip to the coffee machine. If this is currently happening on your team, you need a toolsmith.
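One of the first things a toolsmith might do is fold those steps into a single entry point. What follows is only a sketch: clean-database, start-webserver and run build are the placeholder commands from the example above, not real tools, and the run_all helper and its runner parameter are names I made up for illustration.

```python
import subprocess
import sys

# Placeholder commands from the example above; substitute your project's own.
STEPS = [
    ["clean-database"],
    ["start-webserver"],
    ["run", "build"],
]


def run_all(steps, runner=subprocess.call):
    """Run each step in order and stop at the first non-zero exit code.

    `runner` takes a command as a list of strings and returns its exit
    code; it defaults to subprocess.call and can be swapped for a fake.
    """
    for step in steps:
        print("==> " + " ".join(step))
        code = runner(step)
        if code != 0:
            print("step failed: " + " ".join(step), file=sys.stderr)
            return code
    return 0
```

Saved as a script that ends with sys.exit(run_all(STEPS)), this gives the team one command to remember, and a forgotten step becomes impossible rather than merely unlikely.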