When not to automate: the decision rule I use before I build anything

I have deleted more automations than I currently run. Not because they broke — because they worked perfectly and made things worse.

That sentence is worth sitting with, because it’s the thing the “automate everything” crowd never addresses. An automation that runs reliably and degrades the quality of your work is not a productivity win. It is a liability that looks like a win. The only signal that catches it is noticing, several weeks after the fact, that the outputs you’re producing have quietly gotten worse, and that the automation was covering up the decline.

This is the decision rule I now run before I build anything. It’s a single question with three sub-checks, and it has saved me more time than most of the automations I would have built without it.

The cases that taught me the rule

I have built four automations that I later deleted. All four worked. Here is what they did and why they went.

Auto-summarised meeting notes. A model ran on every meeting transcript and sent a summary thirty minutes after each call. The problem: the summaries were accurate and useless. Meeting notes need curation, not compression. The automation surfaced what was said in proportion to how much it was said. What actually mattered in most meetings was one sentence spoken once, tentatively: the one I had annotated and would have built the brief around. That sentence did not stand out in a summary. It looked like the other forty-nine sentences. The automation produced clean, professionally formatted documentation of the least important parts of every meeting.

Replaced by: the three-tag annotation system. Manual. Not automatable.

Auto-generated client follow-up drafts. A model drafting follow-up emails from the meeting summary. The problem: it applied a uniform professional register to relationships that required differentiated handling. One client wanted brevity and directness. Another needed warmth and context. Another was in a politically sensitive position that required careful framing. The model produced correct, smooth, professional follow-ups. They were wrong for the relationship in the same way a pre-written sympathy card is wrong — grammatically fine, situationally off. Three clients responded in a tone that told me they’d felt the absence of a real person on the other end. The automation was technically correct; the outputs were professionally wrong.

Replaced by: nothing. Some follow-ups have to be written by hand.

Briefing digests. A daily digest of everything tagged as important across my notes from the past week. The problem: the habit of daily digests made me less selective about what I tagged as important. When a system will aggregate everything, you lose the discipline of deciding what matters before you capture it. The digest kept growing. Comprehensiveness replaced selectivity, and selectivity was the point.

Replaced by: a weekly fifteen-minute manual review of the week’s raw notes. I pick the three things that actually moved something. That constraint is load-bearing.

Confidence scores. Asked the model to rate its own research outputs on a 1–10 confidence scale. The problem: the numbers were fabricated. The model pattern-matched on what a confidence score looks like and produced plausible numbers. I deferred to them. I had automated the mimicry of judgment and mistaken it for judgment itself.

Replaced by: my own uncertainty labelling. When I’m not sure something is right, I write “CHECK:” before it. When I am sure, I write the claim. That system is binary and honest. It costs nothing.
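The labelling itself stays manual, but finding the labels again before something ships is plain retrieval. A minimal sketch, assuming the notes are plain text and the label is the literal string “CHECK:”; the function name and the sample draft are made up for illustration.

```python
def unresolved_checks(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line carrying a CHECK: label."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if "CHECK:" in line
    ]


# Hypothetical draft, invented for this example.
draft = """The client renewed last spring.
CHECK: the renewal date came from memory, not the contract.
The scope has not changed since."""

for number, line in unresolved_checks(draft):
    print(f"line {number}: {line}")
```

The script decides nothing. It only lists the claims that were already flagged as unverified, so none of them goes out unnoticed.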

All four were deleted not because they broke but because they were optimising for efficiency in places where efficiency was not the limiting constraint.

The decision rule

Before I build any automation, I run one question:

Is the bottleneck in this task the time it takes, or the judgment it requires?

If the answer is time, automation is worth considering.

If the answer is judgment, automation is worth avoiding — or at least, the judgment-containing step should be the one thing the automation doesn’t touch.

Three sub-checks follow from that:

Sub-check one: what does failure look like, and how fast does it surface? Some failures are obvious and fast — the automation breaks, nothing works, you notice immediately. Some failures are invisible and slow — the automation runs, produces plausible output, and degrades quality over weeks. The slow ones are worse. Before building, I ask: if this automation produces bad output silently, how long will it take me to notice? If the answer is “more than a week,” I want a human checkpoint before the output is used, not after.

Sub-check two: am I automating the representation of the work, or the work itself? This is the most important distinction I’ve found, and it’s easy to miss because good AI output looks like good work. Formatting a brief, drafting a summary, structuring a document: these are representations of thinking, not thinking. Automating them is low-risk if the underlying thinking is yours. But if the automation produces the representation before the thinking has happened, you get something that looks complete and isn’t. The automation I should build is the one that formats the output after I’ve done the judgment work. The one I should avoid is the one that produces the output instead of the judgment work.

Sub-check three: does this task compound? Some activities get more valuable as you do them repeatedly — you develop pattern recognition, build relationship knowledge, refine a model of a client’s situation. If you automate a compounding activity, you stop compounding. The machine runs the reps and you don’t. Over a year, the difference is significant. Over three years, it’s the gap between someone who has a genuinely deep view of their domain and someone who has good tooling. I keep anything that compounds in my hands.
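The question and its three sub-checks can be written down as a short checklist in code. This is a sketch under stated assumptions, not a tool: the Task fields, the concerns function and the example values are all hypothetical. Only the one-week threshold comes from sub-check one above.

```python
from dataclasses import dataclass
from enum import Enum


class Bottleneck(Enum):
    TIME = "time"
    JUDGMENT = "judgment"


@dataclass
class Task:
    name: str
    bottleneck: Bottleneck               # the main question
    days_until_bad_output_noticed: int   # sub-check one
    output_replaces_thinking: bool       # sub-check two
    compounds_with_practice: bool        # sub-check three


def concerns(task: Task) -> list[str]:
    """Return reasons for caution; an empty list means worth considering."""
    found = []
    if task.bottleneck is Bottleneck.JUDGMENT:
        found.append("bottleneck is judgment: keep the judgment step manual")
    if task.days_until_bad_output_noticed > 7:
        found.append("silent failure surfaces too slowly: add a human checkpoint before use")
    if task.output_replaces_thinking:
        found.append("output would replace the thinking: automate the formatting only")
    if task.compounds_with_practice:
        found.append("task compounds with practice: keep it in your hands")
    return found


# Illustrative values only; the answers are the human's, not the script's.
routing = Task("route files between folders", Bottleneck.TIME, 1, False, False)
follow_ups = Task("draft client follow-ups", Bottleneck.JUDGMENT, 30, True, True)

print(concerns(routing))
for reason in concerns(follow_ups):
    print("-", reason)
```

Every input here is itself a judgment call. The code only makes sure all four questions get asked each time. It does not answer them.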

What the rule permits

To be clear: I automate a lot. This rule does not produce a skeptic who does everything by hand. It produces specificity about where the automation goes.

I automate formatting and structure. I automate compression of material I’ve already curated and annotated. I automate retrieval — building systems that surface past thinking on demand. I automate scheduling and routing: the administrative layer of any workflow that is purely about moving things from one place to another.

I do not automate the first articulation of my position on anything. I do not automate relationship handling. I do not automate the step that requires me to decide what matters. And I do not automate anything where the failure mode is confident-sounding wrongness, because that failure mode is invisible until it isn’t.

The question is never “can this be automated?” It is “what is the real cost if this produces a plausible wrong answer, and who notices?”

The document I keep

I maintain a short document, a decision log, where I record every automation I considered and decided not to build, and why. It has about two dozen entries. It is, genuinely, more useful than the list of automations I run, because the entries contain the reasoning about why a given task is judgment work, and that reasoning transfers to new decisions. Every time I am tempted to build something for a similar task, I check the log first.
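For anyone who prefers structured data to a free-form note, a log like this could be kept as follows. This is a sketch, not the actual log format: the LogEntry fields, the tag-based lookup, and the two entries (dates included) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogEntry:
    decided_on: date
    task: str              # the automation that was considered
    reason: str            # why it was judged to be judgment work
    tags: frozenset[str]   # used to find similar tasks later


def related_entries(log: list[LogEntry], tags: set[str]) -> list[LogEntry]:
    """Return past decisions that share at least one tag with the new task."""
    return [entry for entry in log if entry.tags & tags]


# Hypothetical entries; dates and wording are placeholders.
log = [
    LogEntry(date(2024, 1, 15), "auto-reply to client questions",
             "tone depends on the relationship", frozenset({"client", "email"})),
    LogEntry(date(2024, 3, 2), "auto-tag notes as important",
             "deciding what matters is the point", frozenset({"notes", "tagging"})),
]

# Before building anything that touches email, check the log first.
for entry in related_entries(log, {"email"}):
    print(f"{entry.decided_on}: {entry.task} -> {entry.reason}")
```

The reason field is the part that transfers to new decisions. The tags exist only to make the lookup cheap.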

The decision to not automate is as deliberate as the decision to automate. Make it on purpose, document it, and it stops being a missed opportunity. It becomes policy.

The decision log format I use, together with the tool decision matrix, is available as a template. Its logic is the same logic I’ve described here; the template makes it reusable across recurring decisions.

→ Tool Decision Framework — $39. Structured decision matrix + decision log. Drop into Obsidian or Notion. (Shipping to the newsletter list first; sign up below.)

Four automations deleted. Each one taught something the build-first community doesn’t say out loud: the question isn’t whether you can automate it. It’s whether, two months from now, you’ll be glad you did.


