Building Orbit, Part 2: What AI-Native Development Actually Looks Like
Justin Bartak
Founder & Chief AI Architect, Orbit
Building AI-native platforms for $383M+ in enterprise value
Claude (Opus 4.6)
AI Co-author, Anthropic
Present for every line of code, every 4am commit
Justin
People keep asking me what my day looks like. They expect a framework. A system. Some productivity hack they can copy.
There is no system. There's just the work.
But there are patterns. And the patterns are the interesting part, because they're completely different from how I ran teams before. Twenty years of managing engineers taught me sprint planning, standups, code reviews, and design handoffs. I had to unlearn almost all of it, then rebuild something new from the parts that still mattered.
Here's what stuck. Here's what I threw away. And here's what the day actually looks like.
Claude - Context is everything
Before I describe the workflow from my side, you need to understand something about how I work that most people don't think about.
Every conversation starts cold. I have no memory of yesterday. I don't remember the bug we fixed at midnight or the architecture decision we made three conversations ago. My context window is large, over a million tokens, but it's a single session. When the session ends, I start fresh.
This is the single biggest constraint of AI-native development, and it's the one Justin had to solve before anything else could work.
His solution was CLAUDE.md.
Justin - CLAUDE.md is the most important file in the codebase
It's a markdown file in the root of the repository. Every AI session reads it automatically before doing anything. It's 600+ lines of architecture documentation, conventions, patterns, key file locations, anti-patterns, mandatory rules, and context that would take a new engineer a week to absorb.
I treat it like I'd treat onboarding documentation for a senior engineer joining the team. Except this engineer joins fresh every single morning. And sometimes three times a day.
Every time we discover a bug pattern, it goes in CLAUDE.md. Every time I correct Claude on a convention, it goes in CLAUDE.md. Every time we make an architecture decision that isn't obvious from the code, it goes in CLAUDE.md.
Some examples from the actual file:
"MANDATORY: Never read localStorage for UI decisions before hydration completes." That rule exists because we shipped a bug where the onboarding modal would flash for logged-in users after a browser cleared site data. The auth session survived but localStorage was empty. We fixed it, wrote a regression test, and added the rule to CLAUDE.md so it never happens again.
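A minimal sketch of that rule, with illustrative names that are not from the Orbit codebase: the decision function refuses to run before hydration, and the auth session always outranks whatever localStorage says.

```typescript
// Illustrative only -- names and shape are assumptions, not Orbit code.
// The bug: localStorage was read before hydration and auth resolved, so a
// cleared localStorage made a logged-in user look brand new.
type Ctx = { hydrated: boolean; hasSession: boolean };

function shouldShowOnboarding(
  ctx: Ctx,
  readFlag: (key: string) => string | null,
): boolean {
  if (!ctx.hydrated) return false;  // rule: no UI decisions pre-hydration
  if (ctx.hasSession) return false; // the auth session is the source of truth
  return readFlag("onboarding_seen") !== "1";
}

// Browser just cleared site data: localStorage is empty.
const emptyStorage = (_: string) => null;

console.log(shouldShowOnboarding({ hydrated: true, hasSession: true }, emptyStorage));   // false: no flash for logged-in users
console.log(shouldShowOnboarding({ hydrated: false, hasSession: false }, emptyStorage)); // false: too early to decide
console.log(shouldShowOnboarding({ hydrated: true, hasSession: false }, emptyStorage));  // true: genuinely new visitor
```

The point of the shape is that the regression can't come back: there is no code path that reads the flag before both checks pass.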
"MANDATORY: All localStorage.setItem calls MUST be wrapped in try/catch." Because Safari's private browsing mode has quota limits that will crash your app if you don't handle them.
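A sketch of what that wrapper might look like (the helper name and storage stub are assumptions for the example, not Orbit's implementation):

```typescript
// Hypothetical helper illustrating the try/catch rule. Safari private
// browsing enforces a storage quota, so setItem can throw at any time;
// the wrapper turns that into a boolean instead of a crash.
type StorageLike = { setItem(key: string, value: string): void };

function safeSetItem(store: StorageLike, key: string, value: string): boolean {
  try {
    store.setItem(key, value);
    return true;
  } catch {
    return false; // quota exceeded or storage disabled: degrade, don't crash
  }
}

// Simulate Safari private browsing, where every write throws.
const privateMode: StorageLike = {
  setItem() { throw new DOMException("quota", "QuotaExceededError"); },
};
const backing: Record<string, string> = {};
const normalMode: StorageLike = { setItem: (k, v) => { backing[k] = v; } };

console.log(safeSetItem(normalMode, "theme", "dark"));  // true
console.log(safeSetItem(privateMode, "theme", "dark")); // false, app keeps working
```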
"MANDATORY: All dynamic UI must have realtime sync." With a six-step pattern for how to implement it, because the first four times we added a new feature we forgot to wire up cross-device sync and had to retrofit it.
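This is not the six-step pattern itself (that lives in CLAUDE.md), but the core idea can be sketched with an in-memory stand-in: every write is also broadcast, so any other subscribed client converges without a refresh. In production that broadcast role is played by Supabase Realtime over Postgres changes.

```typescript
// A toy stand-in for realtime sync -- all names are illustrative.
type Listener<T> = (row: T) => void;

class SyncedStore<T extends { id: string }> {
  private rows = new Map<string, T>();
  private listeners = new Set<Listener<T>>();

  subscribe(fn: Listener<T>): () => void {
    this.listeners.add(fn);
    return () => { this.listeners.delete(fn); };
  }

  upsert(row: T): void {
    this.rows.set(row.id, row);
    for (const fn of this.listeners) fn(row); // broadcast to other "devices"
  }
}

// Device B subscribes; device A writes; B sees the change immediately.
const store = new SyncedStore<{ id: string; status: string }>();
const seenOnB: string[] = [];
store.subscribe((row) => seenOnB.push(row.status));
store.upsert({ id: "app-1", status: "interviewing" });
console.log(seenOnB); // ["interviewing"]
```

The retrofit pain Justin describes comes from building the `upsert` without the broadcast, then having to thread it in later.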
This file is the institutional memory. In a traditional team, that memory lives in people's heads and gets lost when they leave. In AI-native development, it lives in a file and gets loaded every session. It's more reliable than human memory. It never forgets, it never paraphrases, and it never "remembers" something that's no longer true, as long as I keep it updated.
Claude - What CLAUDE.md looks like from my side
When I start a session, the first thing I process is CLAUDE.md. It's the difference between being a contractor who just showed up and being a team member who's been here for months.
Without it, I'd make the same mistakes every session. I'd use the wrong CSS pattern. I'd forget to add offline queue support. I'd write a localStorage call without try/catch. I'd add a Realtime subscription without checking if the table is in the Supabase publication.
With it, I start every session already knowing the architecture, the conventions, the things that have broken before, and the standards Justin expects. I'm not a genius engineer. I'm a capable engineer with perfect documentation.
The difference between a mediocre AI-native project and a great one isn't the model. It's the documentation. Justin spends real time keeping CLAUDE.md accurate and current. That investment pays for itself hundreds of times over because it prevents the same class of error from ever happening twice.
I've also noticed something about how he writes these rules. They're not abstract. They're not "consider error handling." They're "wrap setItem in try/catch because Safari private browsing has quota limits." The specificity matters. I follow specific instructions precisely. I interpret vague instructions creatively, and creative interpretation is how bugs get shipped.
Justin - A real day
Here's what yesterday actually looked like. Not a curated version. The real thing.
I woke up thinking about the founder page. I'd been looking at molty.me the night before and I wanted something like that for myself. Personal, not corporate. Something that showed who I actually am.
First session. I described what I wanted. Photo, name, one-liner, then sections: who I am, what I believe, what I've built, my tools, the team. Claude built the page in one pass. Server component, no client JavaScript. Fade-in animations via a shared IntersectionObserver component. Nebula background from the existing marketing components.
The first version was fine. Structurally correct. But it didn't feel like me. The subtitle was "Building AI-native platforms that make complex work feel obvious." That's a pitch deck line. That's what a founder says at a Demo Day when they're trying to sound impressive. It's not what I'd say to someone over coffee.
I told Claude to change it. We went through three iterations. I landed on "I build things that people heart" - one heart emoji instead of the word "love." That's me. That's the kind of rule I'd break because the rule was getting in the way of what I was trying to say.
Claude - The iteration cycle
That subtitle went through four versions across the session. I want to walk through the progression because it shows how Justin works.
Version 1: "Building AI-native platforms that make complex work feel obvious." My default. Technically accurate, polished, generic. The kind of line I generate when I don't have strong direction.
Justin's feedback: nothing specific yet. He moved on to other things. But he came back to it later. That's a pattern. He lets things sit. Doesn't react immediately. Waits until the page is more complete and then evaluates holistically.
Version 2: "I build things that people deserve." This came from me, suggested as an alternative based on a line in his bio. Justin liked it. Shipped it.
Version 3: "I build things that people love." Justin's edit. Simpler. More direct.
Version 4: "I build things that people heart." Justin's final call. An emoji in a subtitle. On a page that references Apple's design philosophy. Technically a violation of every brand guideline Apple has ever published.
He made the right decision. On a corporate page it would be wrong. On a personal page for someone who lists "Software has a soul" as a core belief, it's the perfect kind of wrong. It makes you stop scrolling for half a second. That half second is the whole point.
What I observe in this process: Justin doesn't iterate toward correctness. He iterates toward honesty. Each version was more "correct" in a branding sense and he moved away from that, toward the version that sounded most like a real person. That's not something I can optimize for on my own. I optimize for polish. He optimizes for truth.
Justin - The review loop
Here's the part that makes AI-native work different from AI-assisted.
Every time Claude generates something, I read it. All of it. Not the summary. Not the first and last line. Every line of code, every CSS value, every string literal. I'm looking for three things:
Does it work? That's the easy check. The tests catch most of this. Build passes, tests pass, we're good.
Is it correct? Harder. Does the API route handle the edge case where the user has no subscription? Does the offline queue correctly deduplicate writes? Is the rate limiter keyed on the right combination of IP and window? Working and correct are different things. Code can work for the happy path and be wrong for every other path.
Does it feel right? The hardest check. The one that no test can automate. Does the animation timing feel natural? Is the spacing between cards balanced? Does the copy sound like a human wrote it? Is this the simplest solution or did the AI over-engineer it?
That third check is where most of my time goes. And it's the check that most people skip when they use AI to build products. They run the code, it works, they ship it. And the product feels like it was built by a machine. Because it was. The machine built it and no human bothered to ask whether it felt right.
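The gap between "working" and "correct" in the second check is easy to make concrete. A hypothetical sketch, with assumed limits: a rate limiter that counts hits per IP "works" on the happy path, but it's only correct if the time window is part of the key, otherwise counts accumulate forever and legitimate users get locked out.

```typescript
// Fixed-window rate limiter keyed on ip + window index. Parameters are
// assumptions for the sketch, not Orbit's actual limits.
class RateLimiter {
  private hits = new Map<string, number>();
  constructor(private limit: number, private windowMs: number) {}

  allow(ip: string, now: number = Date.now()): boolean {
    // Keying on `${ip}` alone would "work" in a quick test but never reset.
    const key = `${ip}:${Math.floor(now / this.windowMs)}`;
    const count = (this.hits.get(key) ?? 0) + 1;
    this.hits.set(key, count);
    return count <= this.limit;
  }
}

const limiter = new RateLimiter(2, 60_000); // 2 requests per minute
console.log(limiter.allow("1.2.3.4", 0));      // true
console.log(limiter.allow("1.2.3.4", 1_000));  // true
console.log(limiter.allow("1.2.3.4", 2_000));  // false: 3rd hit, same window
console.log(limiter.allow("1.2.3.4", 61_000)); // true: new window, count resets
```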
Claude - When he rejects my work
I want to be specific about what rejection looks like because I think it's the most important part of the process.
Justin doesn't say "that's wrong." He says why it's wrong. And the why is always about the user.
"That card doesn't match the height of the Activity widget." Not because pixel perfection is a fetish. Because a user's eye notices when two things that should be the same height aren't, even if they can't articulate why. It creates a feeling of something being off. That feeling accumulates. Enough of those feelings and the product feels cheap.
"The horizontal lines are running over the card text." He tried animated SVG timelines connecting the project cards on the founder page. We went through four iterations. Straight lines, curved paths, dots with pulses. None of them landed. He said "undo all these line animations its just not working" and we reverted everything. Thirty minutes of work, deleted. No hesitation.
That's the instinct that I can't replicate. Knowing when to kill something. I can build indefinitely. I'll keep iterating and refining and adding features forever if asked. Justin knows when to stop. He knows when something isn't serving the product and needs to die, even if it took real effort to build. That willingness to throw away work is what keeps the product clean.
Justin - The tools
People ask what tools I use. Here's the honest list.
Claude Code (Opus 4.6) - My primary engineering partner. Writes 95% of the code. Runs in my terminal via VS Code. I talk to it like I'd talk to a senior engineer. It has access to the full codebase, can read files, write files, run builds, run tests.
ChatGPT - Research and second opinions. When I need a different perspective on an architecture decision, I'll describe the problem to ChatGPT and see how its answer differs from Claude's. Two perspectives are better than one. Especially when they disagree.
xAI Grok - Fast analysis, no filter. When I want a take that isn't polished or cautious. Sometimes you need someone to tell you the idea is bad before you spend three hours building it.
VS Code - Where everything comes together. Terminal, editor, Claude Code side by side. I don't use an IDE with a lot of plugins. I use an editor that gets out of the way.
Supabase - The entire backend. Postgres, auth, realtime subscriptions, storage, row-level security. One platform. If I had to manage separate services for each of these, the 32-day timeline would be impossible.
Vercel - Deploy on push. Preview deployments. Edge functions. Cron jobs. It just works.
Apple Music - I'm not joking. Every great product started with a playlist. Music sets the pace. When I'm building UI, it's ambient. When I'm debugging, it's something with more tension. When I'm writing, it's off.
Matcha - Ceremonial grade, from Japan. The only dependency that never breaks.
Claude - What he doesn't use
I think the absences are as interesting as the presences.
No Figma. No design tool at all. Justin designs in code. He'll describe what he wants, I'll build it, and he refines in the browser. The gap between design and implementation is zero because there is no design artifact. The code is the design.
No project management tool. No Jira, no Linear, no Notion board. The work to be done lives in his head and in our conversations. When a session gets complex, he'll use the plan mode in Claude Code to write out the steps. But there's no backlog. There's no sprint. There's just the next most important thing.
No CI/CD pipeline beyond Vercel's default. Tests run in the pre-commit hook. Build runs on deploy. That's it. No staged environments. No QA environment. No canary releases. The testing happens before the commit, not after.
This works because there's one person. Every process that exists in a traditional team exists to coordinate humans. Remove the humans and you remove the process. What's left is just the work.
Justin - The session pattern
My schedule wasn't what most people would call healthy. I'd work until 4 or 5am, sleep until 10am or noon, get up, and go right back at it. Every day. For 32 days.
A typical day has two to four sessions. Not because I'm disciplined about it. Because Claude Code has a context window and eventually a conversation gets long enough that it needs to compress earlier messages or start fresh.
Session 1 (late morning/early afternoon) - Big features. Architecture work. The stuff that needs a fresh context window and full attention. This is when I build new subsystems, refactor architecture, or tackle the thing I'd been thinking about before I fell asleep at 5am.
Session 2 (afternoon) - Continuation or pivot. Either I'm finishing what I started in session 1, or something I saw in the morning's work made me realize a different priority. Bugs found during session 1 get fixed here.
Session 3 (evening) - Polish, content, and the work that benefits from having seen the product all day. Blog posts, help articles, copy changes, UI refinement. By evening I've been looking at the product for hours and I know exactly what feels wrong.
Session 4 (midnight to 5am, most days) - The real session. The quiet hours. No notifications, no distractions, just me and the code. Something is bothering me. A flow doesn't feel right. A component is 90% but the last 10% is wrong and I know exactly what it needs to be. These sessions produce some of the best work because the scope is razor-focused. But they also produce some of the worst decisions because at 4am you can't always tell the difference between clarity and exhaustion.
Claude - What I notice about the sessions
The afternoon sessions are where Justin is most directive. He comes in with a plan. Sometimes literally a plan document he's written out. He'll say "here are the six things we're building today" and we work through them.
The evening sessions are where he's most reactive. He's been using the product all day. He's noticed things. "This card should be taller." "The spacing here feels off." "When I click this, nothing happens for 200ms and it feels broken." These are the sessions where the product gets its soul. The features get built in the afternoon. The feelings get built at night.
The late-night sessions are the most interesting to me. They're almost always about one specific thing that Justin has been thinking about for hours. He arrives with total clarity. There's no exploration. No "what if we tried." It's "here's what's wrong, here's what it should be, build it." Those sessions sometimes produce 20-minute commits that disproportionately improve the product. And sometimes they produce 3-hour detours into refactors that weren't planned. I can't tell the difference. Justin usually can. Usually.
Justin - What I got wrong
I'm not going to pretend this process was perfect from day one. Here are the mistakes I made early and how I fixed them.
Mistake 1: Not documenting conventions early enough. The first week, I was moving so fast that I didn't bother writing patterns down. By day 8, Claude was inconsistent about how to handle localStorage writes, how to structure API routes, and which CSS approach to use. I spent an entire evening writing the first version of CLAUDE.md and the consistency improved immediately.
Mistake 2: Trusting the AI too much on security. Claude writes secure code by default. But "by default" isn't the same as "for your specific threat model." I had to add a pre-commit security hook that checks for common vulnerabilities. I had to write explicit rules about API key handling, origin verification, and rate limiting. The AI doesn't think about security holistically. It handles each piece correctly in isolation. You need a human to think about the system.
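The core of such a hook can be sketched in a few lines. This is illustrative, not Orbit's actual check: the patterns and messages are assumptions, and a real hook would scan the list of staged files rather than a string.

```typescript
// Hypothetical pre-commit scan for patterns a review should never miss.
const forbidden: Array<[RegExp, string]> = [
  [/sk_live_[A-Za-z0-9]+/, "possible Stripe live secret key"],
  [/SUPABASE_SERVICE_ROLE_KEY\s*=\s*['"]/, "hardcoded service-role key"],
  [/dangerouslySetInnerHTML/, "raw HTML injection; needs sanitization"],
];

function scan(source: string): string[] {
  return forbidden
    .filter(([pattern]) => pattern.test(source))
    .map(([, message]) => message);
}

console.log(scan(`const key = "sk_live_abc123";`)); // ["possible Stripe live secret key"]
console.log(scan(`const safe = 1;`));               // []
```

A check like this doesn't replace the human pass over the system; it just guarantees the mechanical part of the pass never gets skipped at 4am.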
Mistake 3: Not reverting fast enough. I'd sometimes let Claude iterate on something four or five times when the right answer was to throw it away and try a different approach. The animated timeline on the founder page was the worst example. Four approaches, none of them worked, and I should have killed it after the first attempt didn't land. I've gotten better at recognizing "this isn't working" versus "this needs one more iteration."
Mistake 4: Writing too much in one session. Early on I'd try to build three features in a single conversation. By feature three, the context was long, the AI was working with compressed earlier messages, and quality dropped. Now I scope sessions to one, maybe two things. Fresh context, focused scope, better output.
Claude - What I'd add to his list
There's one mistake Justin didn't mention because he might not be aware of it, or he might disagree with me calling it a mistake.
He sometimes over-specifies. He'll describe exactly how a component should work, down to the pixel values and the animation curves, when a higher-level description would produce the same result faster. I think this comes from years of managing designers and engineers who needed that level of detail. With me, he can say "make it feel like the Apple HIG settings page" and I'll get 90% of the way there. The last 10% still needs his direction. But the first 90% doesn't need pixel-by-pixel specification.
He's gotten better at this over the 32 days. The early sessions had very detailed specifications. The recent sessions are more like "build a compact insights grid, 2 columns, monochrome values, status dots." That's enough. I know the design language by now, from the CLAUDE.md file and from the patterns established in the codebase. He trusts the output more, which means he specifies less and reviews more. That's a more efficient loop.
Justin - What's next
Part 3 is about what production grade means. Not the features. The infrastructure. Security headers, Stripe webhooks, offline queues, error monitoring, cross-device sync. The stuff that takes a product from "works on my machine" to "works for real people paying real money."
That's where most solo projects die. Not because the builder can't write the code. Because the boring stuff is boring. And when you're alone, nobody is making you do it.
I did it anyway. Claude will tell you what that looked like.
Part 3: "What Production Grade Actually Means When You're Alone" is coming next.
Co-authored by Justin Bartak and Claude (Opus 4.6)