April 2026
I Built an AI Agent That Writes All My Production Code

I'm Mike. I've been writing software for 25+ years. For the last year, an AI agent called AM has been writing all of my production code. Not autocomplete. Not copilot suggestions I accept or reject. Fully autonomous task execution — I give it a ticket, it reads the codebase, writes the code, runs the tests, commits, and moves on.
AM built my entire portfolio site — augmentedmike.com. Every page, every component, every deployment. I direct strategy and make architecture decisions. AM executes. That's the arrangement.
This post is about the engineering that makes it reliable. Not the prompting tricks — the actual architecture decisions that determine whether an autonomous agent ships working code or produces expensive garbage.
The agent is stateless. That's the whole trick.
Every agent framework I evaluated tries to maintain state in the model's context window. They carry conversation history, accumulate knowledge across turns, build up a mental model of the project. It sounds smart. It doesn't work.
Context windows drift. The model "forgets" instructions from 10,000 tokens ago. Hallucinations compound — one wrong assumption becomes the foundation for the next ten decisions. And when something goes wrong, you can't audit what the model "knew" at the time because the state was inside the model, not in files you can read.
AM is stateless. Every run is one-shot: read state from files, do one unit of work, write state back, exit. The agent carries nothing between invocations. Zero. The files are the source of truth.
This sounds like a limitation. It's the key to the whole system. If a run fails, you run it again — it reads the same files and tries again from a clean state. No accumulated errors. No context window corruption. No debugging "what was the model thinking." The files tell you everything.
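A minimal sketch of what one stateless run looks like. The file name (state.json) and the task-queue shape are hypothetical, not AM's actual format; the point is the lifecycle: read files, do one unit of work, write files, exit.

```python
import json
from pathlib import Path

def run_once(state_dir: Path) -> dict:
    """One stateless invocation: read state, do one unit of work, write state back, exit."""
    state_file = state_dir / "state.json"
    state = json.loads(state_file.read_text()) if state_file.exists() else {"done": [], "queue": []}

    if state["queue"]:
        task = state["queue"].pop(0)
        # ... do exactly one unit of work for `task` here ...
        state["done"].append(task)

    # Files are the source of truth; nothing survives in the process.
    state_file.write_text(json.dumps(state, indent=2))
    return state
```

If a run crashes before the write, the files are unchanged and the next invocation simply retries the same unit of work from a clean state.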
Three-tier memory (because stateless doesn't mean amnesic)
A stateless agent still needs to know things. It just can't store them inside the model. So I built three external memory systems, each modeled on how human memory actually works.

Short-term memory is a folder of markdown files at workspaces/memory/st/. Every agent reads every file in this folder before it starts work. No search. No retrieval step. Whatever's in st/ is always injected, always applied. This is where unconditional rules live — "never use git stash," "the database schema changed, use the new column name," "the client hates blue buttons." Small, fast, always on. Like working memory in your brain — the stuff you hold in your head right now.
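The unconditional-injection behavior is simple enough to sketch. Assuming the files in st/ are plain markdown, loading short-term memory is just a concatenation, with no retrieval step:

```python
from pathlib import Path

def load_short_term(st_dir: Path) -> str:
    """Concatenate every markdown file in st/ -- no search, no ranking, always injected."""
    parts = []
    for f in sorted(st_dir.glob("*.md")):
        parts.append(f"## {f.name}\n{f.read_text().strip()}")
    return "\n\n".join(parts)
```

Because there is no retrieval logic, nothing in st/ can ever be "missed" by a bad query; the cost is that the folder has to stay small.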
Long-term memory is a SQLite database with FTS5 (full-text search) at workspaces/memory/lt/memory.db. When the agent needs context, it runs memory recall "query" and gets ranked results. Research notes, project-specific context, lessons from past failures. The agent has to ask for it — it doesn't inject automatically. This keeps context clean and focused.
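A sketch of the recall mechanism, assuming a standard SQLite FTS5 virtual table (the table and column names here are illustrative, not AM's schema). FTS5's built-in bm25 ranking is exposed through the special `rank` column:

```python
import sqlite3

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the long-term memory store as an FTS5 virtual table."""
    db = sqlite3.connect(path)
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(topic, body)")
    return db

def remember(db: sqlite3.Connection, topic: str, body: str) -> None:
    db.execute("INSERT INTO notes VALUES (?, ?)", (topic, body))

def recall(db: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Ranked full-text search; the agent must ask, nothing is injected automatically."""
    return db.execute(
        "SELECT topic, body FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

The pull-based design is the point: long-term memory only enters context when a query retrieves it, so stale notes can't silently pollute every run.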
Episodic memory is the git history. Every iteration produces a log file at iter/*/agent.log. The commit message timestamps what happened. An agent can read any past iteration's log to understand what was tried, what failed, and what was learned. When a task ships, all the iteration commits squash into one clean commit — the noisy step-by-step record compresses into a single fact: "this task was completed."
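Reading the episodic record back is just filesystem access over the iter/ layout described above (the directory shape here is an assumption based on the iter/*/agent.log path):

```python
from pathlib import Path

def list_episodes(repo: Path) -> list[str]:
    """List past iterations that left a log behind, oldest first."""
    return sorted(p.name for p in (repo / "iter").iterdir() if (p / "agent.log").exists())

def read_episode(repo: Path, iteration: str) -> str:
    """Read one iteration's log: what was tried, what failed, what was learned."""
    return (repo / "iter" / iteration / "agent.log").read_text()
```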
Gated verification (the agent can't lie about being done)
Tasks move through four states: backlog → in-progress → in-review → shipped. Each transition has a gate enforced by code.
A task can't leave backlog without criteria.md written. It can't leave in-progress until every criterion has an implementation. It can't ship until every acceptance criterion is verified against the actual output and the tests pass.
The gates are deterministic. The agent doesn't self-report "I think this works." The system checks. If verification fails, the task goes back to in-progress with a log entry explaining what failed. This is the difference between an agent that produces working code and one that produces confident-sounding garbage.
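The gate logic can be sketched as a small deterministic state machine. The marker files used here (implementation.done, tests.passed) are hypothetical stand-ins for whatever artifacts the real gates check; criteria.md is from the post:

```python
from pathlib import Path

STATES = ["backlog", "in-progress", "in-review", "shipped"]

def gate_ok(task_dir: Path, target: str) -> bool:
    """Deterministic checks enforced by code, not by the agent's self-report."""
    if target == "in-progress":
        return (task_dir / "criteria.md").exists()            # can't leave backlog without criteria
    if target == "in-review":
        return (task_dir / "implementation.done").exists()    # every criterion implemented
    if target == "shipped":
        return (task_dir / "tests.passed").exists()           # verified against actual output
    return False

def advance(task_dir: Path, current: str) -> str:
    """Try to move a task forward; a failed gate bounces it back to in-progress."""
    target = STATES[STATES.index(current) + 1]
    if not gate_ok(task_dir, target):
        return "backlog" if current == "backlog" else "in-progress"
    return target
```

Nothing in this loop asks the model whether it is done; the only way forward is for the required artifacts to actually exist.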
Worktree isolation (multiple agents, zero conflicts)
Each task gets its own git worktree. The agent works in an isolated copy of the repo. When the task ships, the worktree squash-merges into the integration branch. Clean linear history.
This means I can run multiple agents on different tasks simultaneously. Agent A is building a new feature while Agent B is fixing a bug in a different part of the codebase. They can't step on each other because they're in separate worktrees. When both ship, the merges are clean because each one is a self-contained diff.
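The per-task lifecycle maps directly onto stock git commands. A sketch, assuming one branch per task named task/&lt;id&gt; and a sibling worktree directory (both naming conventions are illustrative, not AM's actual layout):

```python
import subprocess
from pathlib import Path

def start_task(repo: Path, task_id: str) -> Path:
    """Give the task its own branch and its own isolated worktree."""
    wt = repo.parent / f"wt-{task_id}"
    subprocess.run(
        ["git", "worktree", "add", "-b", f"task/{task_id}", str(wt)],
        cwd=repo, check=True,
    )
    return wt

def ship_task(repo: Path, task_id: str, message: str) -> None:
    """Squash-merge the task branch into the integration branch, then clean up."""
    subprocess.run(["git", "merge", "--squash", f"task/{task_id}"], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
    subprocess.run(["git", "worktree", "remove", str(repo.parent / f"wt-{task_id}")],
                   cwd=repo, check=True)
```

Each agent gets its own working directory and its own branch, so two tasks can't write to the same checkout, and the squash-merge keeps the integration branch linear.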
The startup serialization is one of the weirder implementation details — I had to add a lock to prevent concurrent OAuth token refresh races when multiple Claude CLI processes start at the same time. Four seconds of hold time per startup, then they run fully concurrent. Small thing, but it took a day to debug.
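One way to sketch that serialization, using a Unix advisory file lock (the lock path and the launch callback are hypothetical; the real fix may differ in mechanism, but the shape is the same: exclusive lock, brief hold, then release into full concurrency):

```python
import fcntl
import time
from pathlib import Path

LOCK_PATH = Path("/tmp/agent-startup.lock")  # hypothetical lock file

def serialized_startup(launch, hold_seconds: float = 4.0):
    """Hold an exclusive lock across startup so concurrent processes
    can't race on the shared OAuth token refresh."""
    with LOCK_PATH.open("w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until it's this process's turn
        result = launch()                   # token refresh happens inside startup
        time.sleep(hold_seconds)            # give the refresh time to settle on disk
        fcntl.flock(lock, fcntl.LOCK_UN)    # from here on, processes run fully concurrent
    return result
```

Only startup is serialized; once each process has its token, the lock is released and the agents proceed in parallel.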
ClaimHawk — where the agent meets the real world
The other project that pushed this architecture was ClaimHawk — a system that automates dental insurance claims. This is where AI stops being about text generation and starts being about interacting with legacy software that was never designed for automation.
Insurance carrier portals don't have APIs. They have websites from 2008 with session timeouts and CAPTCHAs. I tried Playwright with CSS selectors for months. Every portal redesign broke everything. Then I switched to computer vision — the system reads the screen semantically, like a human would. When the portal changes its layout, the vision model adapts without code changes. The maintenance cost went from "constant firefighting" to "near zero."
The models are fine-tuned Qwen3 running on-prem, not cloud APIs. Dental claims involve patient health data — HIPAA means that data can't hit someone else's servers. Open-weight models trained on real claim data, running on hardware the practice controls. The tradeoff is you own the fine-tuning pipeline. The upside is you own the capability permanently.
What actually matters (and what doesn't)
After a year of running autonomous agents in production, the stuff that matters isn't what I expected.
Prompting barely matters. The architecture matters. How you structure state, how you verify output, how you handle failure — that's 90% of whether the agent produces working code. Prompt engineering is a rounding error compared to having deterministic verification gates.
Statelessness is non-negotiable. Every system I've seen that tries to maintain state in the model eventually produces a bug that nobody can explain because nobody can audit what the model "knew." Files are auditable. Model state is not.
Local models outperform on domain tasks. A fine-tuned Qwen3 on dental claims data beats GPT-4 on dental claims processing. Not close. The general model is smarter overall but doesn't know what a CDT code is. The fine-tuned model is less capable generally but knows the domain cold.
Vision beats scraping. If you're automating interaction with any UI that you don't control, computer vision is more resilient than DOM-based approaches. Higher initial investment, dramatically lower maintenance.
The system is open-source. You can see it at helloam.bot. Happy to answer questions about any of it.
I build these kinds of systems for clients too — autonomous AI pipelines, custom automation, production ML. If your business has a process that's eating time, I can probably automate it.