Use AI _in_ a tool not _as_ a tool

Or to be more exact, use a coding agent in a tool. This is my new starting point every time I begin to think of a tool I want to implement.

Being part of the pipeline

A couple of weeks ago I thought of creating an OpenCode agent that would check my uncommitted changes, construct a commit message and finally make the actual commit.

The first version was ok but I wanted to make a few changes. First I wanted the agent to follow certain guidelines, so I edited the agent file. Then I wanted to change the way it constructed the git command, so I edited the agent file. Finally I wanted to support the usage of GitButler, so I edited the agent file.

At that point I realized that I was violating the single responsibility principle big time. No matter the change, I kept editing the same component. And it hit me: fascinated by all the things I was able to do with a coding agent, I had failed to apply good engineering practices to the construction of tools.

Small coherent components that do one thing and do it well

When we write code we tend to break it into small modules, classes, functions that have one responsibility and provide a lean API. This way we can reuse components, combine them in different ways and replace them easily.

No need to do the opposite when it comes to tooling. The terminal has led the way by having small tools that do one thing (ls, grep, cat etc) and can be combined, by using pipes, into an entire workflow. My tools need to embrace that as well.

ai-commit.sh

So I broke my agent’s workflow into distinct components:

Create a single prompt by collecting all changes. This can be done by a bash script.
Feed the prompt to an agent that is configured to create a title and a message. This can be done by OpenCode with a custom agent.
Get the agent’s output and use it to make a commit using the appropriate tool. This can be done by a bash script.

I created the two scripts and also wrote one more that ties everything together: ai-commit.sh

In case you are wondering why Hemingway (or Hemi to his friends) is in the picture, it’s just the name I gave to what was left of the original agent: hemi.md
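The three stages above can be sketched as small functions with narrow interfaces. This is a minimal Ruby sketch, not the actual ai-commit.sh: the function names, the prompt wording and the "title, blank line, body" reply format are all assumptions, and the agent is a stub standing in for the OpenCode call so the sketch runs without one.

```ruby
# Stage 1: collect the changes into a single prompt (a bash script in the post).
def build_prompt(diff)
  "Write a commit title and message for these changes:\n\n#{diff}"
end

# Stage 2 is the agent. Any callable that maps a prompt to text will do;
# this stub stands in for the OpenCode agent and always gives the same reply.
def stub_agent(_prompt)
  "Add retry to fetcher\n\nRetries failed requests up to three times."
end

# Stage 3a: split the reply into a title and a body.
# Assumed format: the first paragraph is the title, the rest is the body.
def parse_reply(reply)
  title, body = reply.strip.split(/\n\s*\n/, 2)
  { title: title.strip, body: body.to_s.strip }
end

# Stage 3b: build the commit command for the chosen tool.
# Only git is shown; another tool would be one more branch here, not an agent edit.
def commit_command(commit, tool: :git)
  raise ArgumentError, "unsupported tool: #{tool}" unless tool == :git

  command = ["git", "commit", "-m", commit[:title]]
  command += ["-m", commit[:body]] unless commit[:body].empty?
  command
end

# The glue: each stage only sees the previous stage's output.
def ai_commit(diff, agent: method(:stub_agent), tool: :git)
  commit_command(parse_reply(agent.call(build_prompt(diff))), tool: tool)
end

p ai_commit("diff --git a/fetcher.rb b/fetcher.rb")
```

The command is only built here, not executed; passing it to `system(*command)` would run it. Changing the guidelines, the command construction or the commit tool now touches a different function each time.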

Pinky and Brain v2

Three months ago I wrote about Pinky and Brain, my agent/subagent duo that was helping me plan and execute a task while being in the loop. I was using it quite often but whenever I did I also saw the number of my available Copilot requests decreasing fast! So I started avoiding it and preferred writing code manually like a caveman.

My setup was a testament to using a coding agent as a tool. It was doing everything. Apart from planning, which it should be doing, it was also looping through tasks, delegating work, making commits and asking for the user’s approval to continue looping. Many of these operations were new requests (x3 because of Opus).

So I sat down and broke that too:

Planning is still being done by an agent (Brain). Only this time it is not tied to a model and it is very restricted. It can only save the created tasks to beads, which will be loaded as a skill.
Execution is still being done by an agent (Pinky). Only this time it is even simpler. It is asked to just follow instructions, nothing else.
Everything else is part of a bash script that (a) uses bd to get the next task, (b) provides its description as a prompt to Pinky, (c) closes the task when Pinky returns, (d) uses ai-commit.sh to create a commit, (e) loops again.

I’ve been using it for a couple of weeks now and I’m sure I am consuming fewer requests.
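The split between deterministic glue and agent calls can be sketched as below. It is a Ruby illustration, not the actual bash script: `next_task`, `run_pinky`, `close_task` and `commit` are hypothetical callables standing in for bd, the Pinky agent and ai-commit.sh. The point it shows is that only one step per task would cost a model request.

```ruby
# Runs tasks until the queue is empty and returns how many agent calls were made.
# Every dependency is injected, so the loop itself contains no AI at all.
def run_tasks(next_task:, run_pinky:, close_task:, commit:)
  agent_calls = 0
  while (task = next_task.call)          # (a) ask the tracker for the next task
    run_pinky.call(task[:description])   # (b) the only step that uses a request
    agent_calls += 1
    close_task.call(task[:id])           # (c) mark the task as done
    commit.call                          # (d) commit the work
  end                                    # (e) loop again
  agent_calls
end

# In-memory stand-ins so the sketch runs without bd or an agent.
queue = [
  { id: "task-1", description: "Add the Person class" },
  { id: "task-2", description: "Render the people list" }
]
closed = []

calls = run_tasks(
  next_task: -> { queue.shift },
  run_pinky: ->(description) { puts "Pinky: #{description}" },
  close_task: ->(id) { closed << id },
  commit: -> { puts "commit created" }
)
puts "#{calls} agent calls for #{closed.size} tasks"
```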

Relax, take a step back and start from the business logic

Another year of Advent of Code and this time I decided to participate. Not only that but I thought it would be a great way to teach myself Ruby.
This means that my train of thought gets interrupted quite often since for every little thing I come up with I have to stop and search how Ruby does it.

Day 3 – Gear Ratios

In today’s challenge the target is to calculate the sum of all numbers that are adjacent, even diagonally, to a symbol. The input looks like this:

467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..

and a symbol is everything that is not a number or a dot (.).

Head first

Seeing this input, combined with the fact that I don’t know Ruby, threw me into a crazy rabbit hole where I was searching for two-dimensional arrays at one point and parsing strings at another.

Then I remembered that the adjacent part includes diagonals too, so I dropped everything and started thinking of how I would combine numbers from one line with symbols from another.

This is getting big. Should I start smaller? Should I try to approach this in a 2×2 array? Should I do this or that? Chaos!

Start from the business logic

Thankfully, after taking a break and drinking some water, I realized that my need to answer all my unknowns had gotten the best of me and I was viewing things wrong.

It does not matter how the input looks. It is just that, an input. I shouldn’t start from there.
It does not matter what/how Ruby does things. It is just a tool.

What matters is the business logic and in this case it’s quite simple:

Our business entities are Numbers and Symbols.
Our business logic dictates that a Number is next to a Symbol if it lies in the area that surrounds it.

Translating this to code:

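	# Wrapped in a module (Day3 is an illustrative name) because a top-level
	# `class Symbol` would reopen Ruby's built-in Symbol, which cannot be
	# created with `new`.
	module Day3
	  class Symbol
	    attr_reader :value, :row, :column

	    def initialize(value, row, column)
	      @value = value
	      @row = row
	      @column = column
	    end
	  end

	  class Number
	    attr_reader :value, :row, :starting_column, :ending_column

	    def initialize(value, row, starting_column, ending_column)
	      @value = value
	      @row = row
	      @starting_column = starting_column
	      @ending_column = ending_column
	    end

	    # A number touches a symbol when it is at most one row away and its
	    # column span overlaps the symbol's column widened by one on each side.
	    def is_next_to?(symbol)
	      return false if (@row <= symbol.row - 2) || (@row >= symbol.row + 2)
	      return false if (@ending_column <= symbol.column - 2) || (@starting_column >= symbol.column + 2)
	      true
	    end
	  end
	end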


made things so much simpler:

	class GearRatios
	  def initialize(numbers, symbols)
	    @numbers = numbers
	    @symbols = symbols
	  end

	  def sum
	    @numbers
	      .filter { |number| @symbols.any? { |symbol| number.is_next_to?(symbol) } }
	      .sum { |number| number.value }
	  end
	end


After writing and testing the business logic all I had left to do was to write the code that will produce our lists for numbers and symbols. In this case it just happens to be a two-dimensional array with string values.
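That remaining piece, the input adapter, could look roughly like the sketch below. It is not the code from the original solution: it assumes the input arrives as one multi-line string, and `PartNumber`, `Mark` and `parse_schematic` are hypothetical names, with the two Structs standing in for the Number and Symbol classes.

```ruby
# Illustrative stand-ins for the business entities (same fields as in the post).
PartNumber = Struct.new(:value, :row, :starting_column, :ending_column)
Mark = Struct.new(:value, :row, :column)

# Turns the raw puzzle text into the two lists the business logic works with.
def parse_schematic(text)
  numbers = []
  marks = []
  text.each_line(chomp: true).with_index do |line, row|
    # Every run of digits is one number; remember where it starts and ends.
    line.scan(/\d+/) do |digits|
      match = Regexp.last_match
      numbers << PartNumber.new(digits.to_i, row, match.begin(0), match.end(0) - 1)
    end
    # Anything that is neither a digit nor a dot is a symbol.
    line.scan(/[^\d.]/) do |char|
      marks << Mark.new(char, row, Regexp.last_match.begin(0))
    end
  end
  [numbers, marks]
end

numbers, marks = parse_schematic("467..114..\n...*......\n")
p numbers.map(&:to_a) # [[467, 0, 0, 2], [114, 0, 5, 7]]
p marks.map(&:to_a)   # [["*", 1, 3]]
```

Because the adapter only produces lists of entities, swapping the string for a file or a two-dimensional array changes this function and nothing in the business logic.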

Start from what matters

Being overwhelmed or not, relax, take a step back and start from what matters.

Presentation is important but it shouldn’t drive an approach since it might change often. Input is also important but it shouldn’t matter if we are dealing with a database, a web service or the file system.

Start from the business, make it work and then try to figure out how everything else can be plugged in.

Don’t share constants between production and test code

Building upon my previous post and the trick of being specific about the values the code respects, one pattern I’ve noticed that can easily lead to many false-positive tests is sharing a constant value between production and test code.

If the test code reads the value from the production code, any change made by mistake will not affect the test, which will continue to pass!

21 years of age

Let’s say that we have two services, one checks if a customer can enter a casino and the other if she can buy alcohol. For both cases the law states that the minimum legal age is 21 years old.

The code has a configuration file, a domain and two modules, one for each service:

	// Production code:

	// configuration
	object Config {
	    const val MIN_LEGAL_AGE = 21
	}

	// domain
	class Person(val age: Int)

	// entrance module
	fun canEnterCasino(person: Person): Boolean {
	    return person.age >= Config.MIN_LEGAL_AGE
	}

	// alcohol module
	fun canBuyAlcohol(person: Person): Boolean {
	    return person.age >= Config.MIN_LEGAL_AGE
	}

	// Test code:

	// entrance module
	fun `a customer can enter the casino when she is older than 21 years of age`() {
	    val twentyOneYearOld = Person(Config.MIN_LEGAL_AGE)

	    val actual = canEnterCasino(twentyOneYearOld)

	    assertTrue(actual)
	}

	// alcohol module
	fun `a customer can buy alcohol when she is older than 21 years of age`() {
	    val twentyOneYearOld = Person(Config.MIN_LEGAL_AGE)

	    val actual = canBuyAlcohol(twentyOneYearOld)

	    assertTrue(actual)
	}


As you can see the tests consume the minimum age directly from the production code but the test suite passes, life is good.

Then one day the law changes and the minimum legal age for entering a casino drops to 20 years! It is a simple change, not much of a challenge for the old-timers, so the task is given to the new teammate, a junior software engineer who does not know all the modules yet.
She sees the test, changes the value in the name to 20, sees the config, changes the constant’s value to 20, runs the test suite, everything passes, life is good! Only that it isn’t, because the casino’s software now allows selling alcohol to 20-year-olds!

Keep them separate

If the test code did not use the production code’s constant

	// Production code:

	// configuration
	object Config {
	    const val MIN_LEGAL_AGE = 20
	}

	// domain
	class Person(val age: Int)

	// entrance module
	fun canEnterCasino(person: Person): Boolean {
	    return person.age >= Config.MIN_LEGAL_AGE
	}

	// alcohol module
	fun canBuyAlcohol(person: Person): Boolean {
	    return person.age >= Config.MIN_LEGAL_AGE
	}

	// Test code:

	// entrance module
	fun `a customer can enter the casino when she is older than 20 years of age`() {
	    val twentyYearOld = Person(20)

	    val actual = canEnterCasino(twentyYearOld)

	    assertTrue(actual) // passed
	}

	// alcohol module
	fun `a customer can buy alcohol when she is older than 21 years of age`() {
	    val twentyOneYearOld = Person(21)

	    val actual = canBuyAlcohol(twentyOneYearOld)

	    assertTrue(actual) // passed: 21 >= 20 still holds
	}

	// alcohol module, one year below the legal age: this is the test that catches the change
	fun `a customer cannot buy alcohol when she is younger than 21 years of age`() {
	    val twentyYearOld = Person(20)

	    val actual = canBuyAlcohol(twentyYearOld)

	    assertFalse(actual) // failed: the shared constant is now 20
	}


then, after changing the constant’s value, the test suite would fail, alerting the software engineer that something has broken and forcing her to figure it out and craft another solution.
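One possible shape for that other solution, sketched in Ruby rather than the Kotlin of the listings above: each rule gets its own constant, and the checks keep spelling out the ages as literals. The constant and method names are assumptions made for the illustration.

```ruby
# Each law gets its own constant, so changing one cannot silently change the other.
module Config
  MIN_CASINO_ENTRY_AGE = 20
  MIN_ALCOHOL_AGE = 21
end

Person = Struct.new(:age)

def can_enter_casino?(person)
  person.age >= Config::MIN_CASINO_ENTRY_AGE
end

def can_buy_alcohol?(person)
  person.age >= Config::MIN_ALCOHOL_AGE
end

# The checks use literal ages, never Config, and cover both sides of each
# boundary, so an accidental edit to a constant makes one of them fail.
checks = {
  "a 20 year old can enter the casino" => can_enter_casino?(Person.new(20)),
  "a 19 year old cannot enter the casino" => !can_enter_casino?(Person.new(19)),
  "a 21 year old can buy alcohol" => can_buy_alcohol?(Person.new(21)),
  "a 20 year old cannot buy alcohol" => !can_buy_alcohol?(Person.new(20))
}
checks.each { |name, passed| puts "#{passed ? 'ok  ' : 'FAIL'} #{name}" }
```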

Don’t force your objects to construct what they need

Let’s say we have an object that handles instances of Person. For example PeopleScreen:

	// Person.kt
	class Person(
	    val name: String,
	    val surname: String
	)

	// PeopleScreen.kt
	class PeopleScreen(
	    private val people: List<Person>
	) {

	    fun render() {
	        people.forEachIndexed { index, person ->
	            println("${index + 1}. ${person.name}, ${person.surname}")
	        }
	    }
	}

	// Usage:
	fun main() {
	    val people = listOf(
	        Person("Joe", "Dow"),
	        Person("Jill", "Doe"),
	        Person("Jack", "Black")
	    )

	    val screen = PeopleScreen(people)
	    screen.render()
	}


PeopleScreen renders instances of Person so this should be the only format we provide to it. Let me explain.

Forced construction

There is a new flow that ends in opening PeopleScreen but all the information for the list of people is in a Map<String, String>. There is no reason to alter PeopleScreen in order to support this new format:

	// Person.kt
	class Person(
	    val name: String,
	    val surname: String
	)

	// PeopleScreen.kt
	class PeopleScreen(
	    private val people: List<Person>
	) {

	    constructor(people: Map<String, String>) : this(
	        people.map { entry -> Person(entry.key, entry.value) }
	    )

	    fun render() {
	        people.forEachIndexed { index, person ->
	            println("${index + 1}. ${person.name}, ${person.surname}")
	        }
	    }
	}

	// Usage:
	fun main() {
	    val people = mapOf(
	        "Joe" to "Dow",
	        "Jill" to "Doe",
	        "Jack" to "Black"
	    )

	    val screen = PeopleScreen(people)
	    screen.render()
	}


Here the new format is passed through an overloaded constructor, but the same applies if we use a setter method.

Why we shouldn’t do it

We could argue that by doing so we tie the object to each special format, making the code hard to maintain and scale, but the real reason is that we violate the SRP, since PeopleScreen will have more than one reason to change: one if something changes in the way we render, and two if something changes in Person’s construction.

What we should do

We should keep PeopleScreen only consuming Person and move all transformations to their own objects allowing a coordinator to transform and pass data around.
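A minimal sketch of that arrangement, written in Ruby rather than Kotlin. `PersonMapper` and the `open_people_screen` coordinator are hypothetical names, not part of the original code. The screen only ever sees Person objects, and the knowledge of the map format lives in one small object.

```ruby
Person = Struct.new(:name, :surname)

# Renders people and nothing else; it has no idea where they came from.
class PeopleScreen
  def initialize(people)
    @people = people
  end

  def render(out = $stdout)
    @people.each_with_index do |person, index|
      out.puts "#{index + 1}. #{person.name}, #{person.surname}"
    end
  end
end

# Owns the knowledge of the name => surname format.
module PersonMapper
  def self.from_hash(entries)
    entries.map { |name, surname| Person.new(name, surname) }
  end
end

# The coordinator transforms the data and passes it along.
def open_people_screen(entries, out = $stdout)
  PeopleScreen.new(PersonMapper.from_hash(entries)).render(out)
end

open_people_screen({ "Joe" => "Dow", "Jill" => "Doe", "Jack" => "Black" })
```

If the map format changes, only `PersonMapper` changes; if the rendering changes, only `PeopleScreen` does.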

Good practices: First write the test then fix the bug

You get a report about a bug. You open the app, follow the steps to reproduce it and, as mentioned in the report, your app is misbehaving. What’s next?

You can either dig into the code immediately and fix the bug, or you can reproduce the bug again, only this time in a test. The second. Always go with the second option.

Here is why:

From now on you will have a regression test.
Meaning that if a change in the code breaks what you fixed, you’ll get notified by the test suite and not by your users.
It keeps you focused / You know when you are finished.
This is a benefit you get from TDD in general. When the test passes the bug is fixed and you can move to your next task. Also, since you have to make the test pass, anything else that popped up during your research for the bug can wait (I usually write it down on a notepad I keep next to my keyboard).
You get a better understanding of the code.
By trying to write the test you get a better knowledge of how things are connected and communicate. Especially if you are new to a project, this will boost your understanding significantly.
You discover more corner cases.
There are times when, by writing this one test and seeing what inputs a class/function can have, you wonder how the app will behave under certain values. Finish the task at hand and then add a test for each case you want to explore. You might end up solving more bugs!
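As a small illustration of the first point, here is what reproducing a report in a test can look like. The bug is hypothetical (a 21-year-old refused entry because the check used `>` instead of `>=`), and the check is kept framework-free so it runs with plain Ruby; in a real project it would live in the test suite.

```ruby
MIN_LEGAL_AGE = 21

# After the fix. The reported bug was `age > MIN_LEGAL_AGE`, which refused
# anyone who was exactly 21.
def can_enter_casino?(age)
  age >= MIN_LEGAL_AGE
end

# Written before touching the production code: it failed against the buggy
# comparison, passes now, and stays in the suite as a regression test.
def test_a_21_year_old_can_enter_the_casino
  raise "regression: a 21 year old was refused entry" unless can_enter_casino?(21)
end

test_a_21_year_old_can_enter_the_casino
puts "regression test passed"
```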