How I built an AI agent system to run my side projects

10.06 2026, 13 minutes read time

TL;DR: Agentic AI systems can be built for many purposes. This article is about how Fortytwo’s COO Remi Vandemir built an agent system that manages all the side projects he doesn’t have time to manage himself.

Learn how you can build something that is truly helpful, efficient, and gets the job done.

Table Of Contents

How I built an AI agent system to run my side projects
The challenge: too many ideas and not enough time
Obsidian as the shared knowledge base for AI agents
The main parts of the AI side-project system
The Improver: the agent that turns missions into pull requests
Why scheduled mission rotation keeps the agents useful
Skip conditions: teaching the agent when not to open a PR
The output layer: reviewing AI-generated work as normal GitHub PRs
The Devil: an adversarial AI reviewer for plans, PRs, and proposals
Why every generative agent needs a reviewer that challenges it
The Innovation Channel: using agents to propose new product ideas
From proposal to backlog to pull request
The Schedule: running the agent system with launchd
The results: 41 agent runs, 14 PRs, and 28 product ideas in one week
Choosing the right model for each type of agent work
What surprised me about running autonomous agents on side projects
My side-project graveyard has gotten a night shift

The challenge: too many ideas and not enough time

A few months after The Deep Thought weekend experiment, I ran into a problem at home that may feel familiar: too many ideas, not enough hours, and a ~/projects/ folder full of good intentions. My Obsidian vault was full of “next step” lists I rarely returned to, and every project had enough momentum to feel alive, but not enough structure to move forward without me remembering to push it. There were seventeen folders in there, half without a commit in the last month or more, three of them with real users who would notice if something broke.

So, over a few late evenings, I built a personal operations layer for my side projects.

It picks up work, runs missions, opens PRs, writes status reports, proposes new ideas, and keeps going while I sleep. This is the story of what runs, what surprised me, and what I would change.

The agents run in a contained layer on top of infrastructure that was secured first. That foundation is what makes the system useful rather than reckless, and I will come back to it in a separate post, because it deserves more space than a disclaimer.

This post is about my actual build.

Obsidian as the shared knowledge base for AI agents

Everything starts with my Obsidian vault, which holds the same kind of material that would otherwise live in Notion: project notes, decisions, daily logs, implementation details, lessons learned, rough plans, and small pieces of operational wisdom. The difference from Notion is that it all exists as plain markdown on disk, a detail that matters more than I knew when I first started out.

Markdown on disk means every agent on my machine can read and write to the same knowledge base without authentication flows, schemas, custom APIs, or fragile integrations. For agents that do not have direct filesystem access, I put an MCP server in front of the Obsidian vault, with tools like vault_search, vault_project_notes, and vault_add_log. I also added streamable HTTP over Tailscale, so that my laptop, phone, and home server can all talk to the same brain.
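To illustrate why plain markdown keeps this simple, here is a minimal sketch of what a tool like vault_search could do under the hood. It is not the actual MCP server: the function signature, the case-insensitive substring matching, and the result shape are all assumptions, and the MCP wiring is left out entirely.

```python
from pathlib import Path


def vault_search(vault_dir: str, query: str, limit: int = 10) -> list[dict]:
    """Hypothetical search: case-insensitive substring match over .md files."""
    needle = query.lower()
    hits = []
    for path in sorted(Path(vault_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append({
                    "note": str(path.relative_to(vault_dir)),
                    "line": line_no,
                    "text": line.strip(),
                })
                if len(hits) >= limit:
                    return hits
    return hits
```

Because the vault is only a directory of text files, the whole "integration" is a directory walk. No schema or authentication step sits between the agent and the notes.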

The Obsidian vault self-maintains through hooks. When I type something Claude Code interprets as “I just learned X,” a UserPromptSubmit hook drops a flag for later capture. When an edit creates a wiki link to a note that does not exist, a PostToolUse hook warns me. A weekly miner scans transcripts and surfaces “wisdom candidates” like gotchas, rules of thumb, and lessons I can promote into the permanent knowledge base. The point is that the vault is always there, always writable and continuously shared by every part of the system.
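The dangling-link warning can be sketched as a small check that a PostToolUse hook might call after an edit. This is an assumption-laden illustration, not the hook used here: the regular expression, the function name, and the rule of matching on note file names only are all hypothetical, and the hook plumbing is omitted.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures Target.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def missing_link_targets(note_text: str, vault_dir: str) -> list[str]:
    """Return wiki-link targets in note_text that have no matching .md file."""
    existing = {p.stem.lower() for p in Path(vault_dir).rglob("*.md")}
    missing = []
    for match in WIKI_LINK.finditer(note_text):
        target = match.group(1).strip()
        # Links may carry a folder prefix; compare on the note name only.
        name = target.split("/")[-1].lower()
        if name not in existing and target not in missing:
            missing.append(target)
    return missing
```

A hook built on this would print the returned targets as a warning and exit, leaving the edit itself untouched.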

My command deck: the local dashboard for the agent fleet

On top of the vault, I built a cockpit named My Command Deck. The app itself is called Brain-portal. It is a Next.js 16 app pinned to http://brutus/brain on my Tailnet, and it is LAN-only because I do not want this on the open internet. The stack is deliberately ordinary: App Router, Tailwind 4, the Fortytwo Babel design system, and JetBrains Mono for code.

The Hub: a live overview of PRs, failures, and active agent runs

The Hub is the home page of the system and gives me the state of everything at a glance: which failures need attention, how many PRs are open, and which parts of the fleet are idle. Below that is an activity feed. A right rail shows what the agents are doing right now, a live ticker counts up while a run is in flight, and Cmd-K searches the entire vault in milliseconds.

The main parts of the AI side-project system

Before getting into detail, it helps to separate the system into its main parts.

The Improver is the agent that does the implementation work. It takes a project and a mission, investigates what needs to be done, makes plans, edits the code, runs checks, and opens a PR.

The Missions define the kind of work the Improver should look for. They keep the system from doing the same type of maintenance work every time.

The Devil is the adversarial reviewer. It does not write code, but reviews plans, PRs, and proposals for weak assumptions, risks, and likely failure modes.

The Innovation Channel is the idea generator. It reads project context and recent activity, then proposes new things a project could do.

The Schedule keeps the system running without relying on me to remember it. It controls when missions run, how often they run, and how failures are surfaced.

The Output Layer is deliberately ordinary. Work comes back as GitHub PRs, proposals, status updates, and morning briefs, so I can review it in the same way I review human work.

The Improver: the agent that turns missions into pull requests

The Improver is the agent that carries out implementation work across the active projects. It takes a project and a mission, then runs through the full loop: discovery, planning, testing, criticism, execution, and finally a pull request. The mission gives the run its shape, so the agent isn’t just looking for any possible change for the sake of change, but rather for a particular kind of useful and directed change.

The mission types: maintenance, security, UX, refactoring, features, Sentry fixes, and dependency audits

The current missions cover the recurring work I want to do across the projects:

Maintain handles small improvements, test gaps, and dependency updates.

Security looks for RLS holes, secrets in client bundles, and input validation issues.

Ux-polish checks accessibility, mobile breakage, loading states, and rough edges.

Refactor handles structural cleanups under roughly 600 lines.

Feature picks a GitHub issue labeled good first issue and implements it.

Sentry-fix looks at the top Sentry error and attempts a focused fix.

Deps-audit runs npm audit and checks outdated dependencies.

Why scheduled mission rotation keeps the agents useful

Each project has a weekly mission schedule where each day has its own mission run:

– Weeknights run maintain
– Saturdays run security
– Sundays alternate between ux-polish and deps-audit
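The rotation above can be expressed as a small lookup. This is a sketch under stated assumptions: "weeknights" is read as Monday through Friday, and the Sunday alternation is keyed on ISO week parity, which is one possible rule and not necessarily the one used here.

```python
from datetime import date


def mission_for(day: date) -> str:
    """Map a calendar day to the mission named in the weekly rotation."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday <= 4:
        return "maintain"
    if weekday == 5:
        return "security"
    # Assumption: Sundays alternate by ISO week number parity.
    iso_week = day.isocalendar()[1]
    return "ux-polish" if iso_week % 2 == 0 else "deps-audit"
```

Week parity keeps the function stateless, at the cost of an occasional repeat at a year boundary when an odd-numbered last week is followed by week 1.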

The rotation matters because, without it, the system tends to drift toward the type of work that is easiest to find. If every run is “maintain”, many PRs eventually become small test additions or dependency nudges. Those can be useful, but they are not the whole picture. The mission schedule forces the agent to inspect different parts of each project and keeps the work from clustering around the same familiar task.

Skip conditions: teaching the agent when not to open a PR

The Improver also has explicit skip conditions for each mission. If security runs several times on fortytwo-babel and skips every time, that is still useful information: the mission found no obvious security issue worth a change. I want the system to be comfortable returning without a diff when there is no change worth making.
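One way to make "returning without a diff" a first-class result is to model the skip explicitly. The sketch below is hypothetical throughout: the Candidate and Outcome types, the decide function, and the size budget (borrowed from the refactor mission's rough 600-line limit) are illustrations, since the actual skip conditions are not spelled out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    summary: str
    lines_changed: int


@dataclass
class Outcome:
    action: str  # "open_pr" or "skip"
    reason: str


def decide(candidates: list[Candidate], max_lines: int = 600) -> Outcome:
    """Return a skip when nothing clears the bar, instead of forcing a diff."""
    if not candidates:
        return Outcome("skip", "no candidate worth shipping")
    # Hypothetical size budget: drop candidates that are too large to review.
    small_enough = [c for c in candidates if c.lines_changed <= max_lines]
    if not small_enough:
        return Outcome("skip", "every candidate exceeds the size budget")
    return Outcome("open_pr", small_enough[0].summary)
```

Recording the skip reason next to the run makes a streak of skips readable as a signal, not as a string of failures.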

The output layer: reviewing AI-generated work as normal GitHub PRs

The output from the Improver is a normal GitHub PR, and I review it like any other PR. Roughly 60% of what it ships gets merged. The other 40% either needs human judgment that the agent does not have or solves a problem that was not worth solving.

I still read every PR, but reviewing a finished diff usually takes minutes, while writing it myself would often take hours. That makes the trade useful even when a significant share of the work is rejected. The rejected PRs are an important part of the system, because they show where the agent’s judgment ends and mine begins.

The Devil: an adversarial AI reviewer for plans, PRs, and proposals

The Devil is the adversarial reviewer in the fleet. It pushes back on plans, proposals, and changes before I commit to them: it stress-tests assumptions, points out risks, and lists failure modes I may not have considered. Its output is objections, not code.

I use it in three places: on Improver PRs that touch anything risky, before I merge them; on Innovation proposals I am tempted to approve, before they move into the build queue; and on my own plans, when I am about to spend more than thirty minutes on something I have not pressure-tested.

Why every generative agent needs a reviewer that challenges it

The Devil often catches issues I would otherwise have found later, after spending time going in the wrong direction. Sometimes its objections are wrong, but they are still useful because they force better decisions. When one agent generates something, another agent should be allowed to challenge it.

The Innovation Channel: using agents to propose new product ideas

The Innovation Channel is responsible for proposing new work rather than fixing existing issues. Every twelve hours, a build-something-new mission runs on a selected set of projects. It reads the project’s vault note, looks at recent activity from the brain, and writes one proposal describing something the project could do that it does not do today.

At this stage it does not write code; it only proposes.

I expected many of the proposals to be obvious, such as writing a README, adding a settings page, or creating an export button. Some were simple, but several were more useful than I expected. One project’s vault note produced a proposal for an export format I had not considered. Another suggested a panel I built the next weekend. A few have been strong enough that I wanted them in the product right away.

This has become one of the more valuable parts of the system because it gives me ideas from a different angle. The goal is not to ship features without review. The goal is to surface ideas I might not generate myself when I am tired, too close to the project, or focused on the wrong problem.

From proposal to backlog to pull request

Proposals land in Brain/Innovation-proposals and appear in the portal under /innovation. I read them with my morning coffee and approve the ones worth building. Approved proposals drop into a Kanban backlog, and on its next feature mission, The Improver can pull the top item from the board and ship it as a PR. That gives the system a clean path from idea, to backlog, to branch, to review.

The Schedule: running the agent system with launchd

Everything runs on launchd, macOS’s native scheduler. That gives me proper logging and restart-on-crash through KeepAlive, so a job that dies mid-run can come back without custom wrapper scripts. The cadence is straightforward: triage runs nightly, the Improver runs every few hours per active project, the Innovation Channel runs twice a day on a selected subset, and digests and transcript mining run weekly.
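For readers who have not written a launchd job, here is a minimal sketch that generates one with Python's standard plistlib. The keys (Label, ProgramArguments, StartCalendarInterval, KeepAlive, StandardOutPath, StandardErrorPath) are real launchd keys, but the label, command, times, and log path in the usage line are made-up placeholders, and the real jobs may be configured differently.

```python
import plistlib


def build_job_plist(label: str, command: list[str], hour: int, minute: int,
                    log_path: str) -> bytes:
    """Build an XML launchd job definition (all values are illustrative)."""
    job = {
        "Label": label,
        "ProgramArguments": command,
        # Run once a day at the given local time.
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        # Restart only after a non-zero exit. Per the launchd.plist man page,
        # this key also implies RunAtLoad.
        "KeepAlive": {"SuccessfulExit": False},
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)
```

Usage, with hypothetical names: `build_job_plist("com.example.improver.maintain", ["/usr/local/bin/improver", "--mission", "maintain"], 2, 0, "/tmp/improver.log")` returns bytes that can be written to a .plist file and loaded with launchctl.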

How the portal shows job status, failures, and upcoming runs

Every job is a .plist in ~/Library/LaunchAgents/, and the portal parses those plists directly into the Schedule page. From there, I can see when each job last ran, when it will run next, and which one failed most recently. If I unplug for a week, the system keeps going, and when I come back, the morning brief gives me a triage list instead of leaving me to reconstruct what happened.
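Reading the job definitions back is equally short. The sketch below is in Python for brevity (the portal itself is a Next.js app), and the function name, the optional label prefix filter, and the returned fields are assumptions. A plist only describes the schedule; last-run and failure status have to come from another source, which is not covered here.

```python
import plistlib
from pathlib import Path


def read_jobs(agents_dir: str, prefix: str = "") -> list[dict]:
    """Parse every .plist in agents_dir into a row for a schedule view."""
    jobs = []
    for path in sorted(Path(agents_dir).glob("*.plist")):
        with path.open("rb") as handle:
            data = plistlib.load(handle)  # handles XML and binary plists
        label = data.get("Label", path.stem)
        if not label.startswith(prefix):
            continue
        jobs.append({
            "label": label,
            "command": data.get("ProgramArguments", []),
            "calendar": data.get("StartCalendarInterval"),
            "interval_seconds": data.get("StartInterval"),
        })
    return jobs
```

Filtering by a label prefix is one way to keep unrelated LaunchAgents out of the view, assuming the fleet's jobs share a naming convention.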

The portal itself runs as a launchd job too, restarting if it crashes, with a file watcher refreshing the UI whenever anything in the vault changes. That keeps the page live without manual reloads and makes the system feel more like an operating surface than a static dashboard.

The results: 41 agent runs, 14 PRs, and 28 product ideas in one week

Last week, the Improver ran 41 times across nine active projects. It opened 14 PRs. Nine were merged, and five were closed without merging: four because the change was not worth it, and one because the agent got it wrong.

The distinction between seventeen projects and nine active projects is deliberate. Seventeen is the inbox. Nine is the workload. I point the fleet at the projects where it earns its keep and leave the rest alone on purpose.

The Innovation Channel proposed 28 ideas. I read all of them and marked four as worth building. Two are in the Improver queue, one I am building myself, and one I am still thinking about.
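The ratios behind those counts are worth making explicit. The snippet below uses only the numbers quoted in this section and derives the rates from them.

```python
# Counts as reported for the week.
runs, prs, merged = 41, 14, 9
closed_not_worth_it, closed_agent_wrong = 4, 1
proposals, approved = 28, 4

pr_rate = prs / runs                # share of runs that produced a PR
merge_rate = merged / prs           # share of PRs that were merged
approval_rate = approved / proposals

print(f"runs that opened a PR: {pr_rate:.0%}")    # 34%
print(f"PRs merged:            {merge_rate:.0%}")  # 64%
print(f"proposals approved:    {approval_rate:.0%}")  # 14%
```

About two thirds of the runs ended without a PR, and the 64% merge rate for the week lines up with the "roughly 60%" figure quoted earlier.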

Choosing the right model for each type of agent work

The models are matched to the job. Claude Opus handles hard reasoning, such as planning, ambiguous fixes, and security audits; Sonnet handles routine code edits; and Haiku handles fast loops that check things every few minutes. There is also a hard stop that kills every job if a run spirals. That has not happened yet, but I would rather have that control available before I need it.
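That matching can be written down as a plain routing table. This is a sketch only: the task-kind names and the fallback choice are hypothetical, and the values are model tiers as named above, not concrete API model identifiers.

```python
# Task kinds are illustrative labels; tiers follow the split described above.
MODEL_FOR_TASK = {
    "planning": "opus",
    "ambiguous-fix": "opus",
    "security-audit": "opus",
    "routine-edit": "sonnet",
    "fast-check": "haiku",
}


def pick_model(task_kind: str, default: str = "sonnet") -> str:
    """Return the model tier for a task kind (the default is an assumption)."""
    return MODEL_FOR_TASK.get(task_kind, default)
```

Keeping the mapping in one table makes the cost decision reviewable in a single place, and a mission only has to declare its task kind.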

What surprised me about running autonomous agents on side projects

The Improver works roughly the way I designed it to work. The Innovation Channel surprised me the most: I expected obvious proposals, but instead got ideas that are often better than my own, because the agent is not anchored to whatever I happen to be thinking about.

I act on only a handful of the ideas it comes up with, and it took me a while to understand that “no candidate worth shipping” is a valid result.

The bottleneck for creation has become judgment, time, and taste, and this is the greater lesson to be learnt when working with agent systems. Ideas are no longer a scarce resource, and not all ideas have the right to live. An hour of Opus producing nothing is better than a week of reviewing PRs that should never have been opened. The model has to be matched to the work, the mission has to be narrow enough to evaluate, and the agent has to be allowed to stop.

An agent will happily generate motion forever, and the system only works when motion is not treated as progress by default.

My side-project graveyard has gotten a night shift

None of the pieces of my side-project system are remarkable on their own: there’s a vault, a scheduler, an agent that picks work, an adversarial reviewer, a proposal queue, and a portal to watch it all.

What is remarkable is that it is running my side projects without me, and that the work comes back in a form I understand: notes, proposals, PRs, failures, and status reports.

Side projects used to be a graveyard for things I almost finished. Now they are a graveyard with a night shift.
