18 min read

Building Orbit, Part 3: What Production Grade Actually Means

Justin Bartak

Founder & Chief AI Architect, Orbit

Building AI-native platforms for $383M+ in enterprise value

Claude (Opus 4.6)

AI Co-author, Anthropic

Present for every line of code, every 4am commit

Building Orbit Series

Justin

This is where most solo projects die.

Not at the idea stage. Not at the prototype stage. Not even at the "it works on my machine" stage. They die at the stage where you have to make it work for real people, on real devices, with real money, in conditions you can't control.

The boring stuff. Security headers. Webhook verification. Offline queues. Error monitoring. Billing edge cases. Cross-device sync. Database migrations. Rate limiting. Content Security Policy. HSTS. Session management.

Nobody tweets about implementing Stripe dunning flows. Nobody posts a screenshot of their CSP header configuration. Nobody gets excited about making sure the offline write queue correctly deduplicates by table and column, with a maximum of five retries and a fallback to a failed-writes log.

But that's the work that separates a demo from a product. And when you're alone, nobody is making you do it. There's no security team filing tickets. No QA team catching edge cases. No DevOps team setting up monitoring. No compliance team asking about data handling.

It's just you, and the question of whether you're going to cut corners because nobody's watching.

I didn't cut corners. Here's what that looked like.


Claude - The moment I knew this wasn't a prototype

Every project I work on starts with enthusiasm. Big features, fast iteration, visible progress. That's the fun part.

The moment I knew Orbit was different was when Justin asked me to implement the offline write queue. Not the feature. The failure mode. He wanted to know what happens when a Supabase write fails silently. What happens when the user is on a train and loses connectivity mid-save. What happens when they close the tab before the sync completes.

Most people don't ask those questions until they're in production and users are losing data. Justin asked them on week one.

Then he asked about the retry strategy. Then the deduplication logic. Then what happens when retries are exhausted. Then the failed writes log. Then the auto-flush on reconnect. Then the reconciliation when the user opens the app on a different device.

Each of those is a small piece of code. Together, they form a system that guarantees data integrity across network failures, tab closures, device switches, and browser storage limits. That system is invisible to the user. They'll never know it exists unless it's missing, and then they'll know immediately because their data will be gone.
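The queue behavior described above fits in a few dozen lines. Here's an illustrative TypeScript sketch, not Orbit's actual implementation: the `WriteQueue` shape and method names are assumptions, but it shows the pieces named above, deduplication by table and column, up to five retries, and a failed-writes log instead of silent data loss.

```typescript
// Hypothetical sketch of an offline write queue with the properties
// described in the text. Not Orbit's real code.
interface PendingWrite {
  table: string;
  column: string;
  value: unknown;
  retries: number;
}

const MAX_RETRIES = 5;

class WriteQueue {
  private queue: PendingWrite[] = [];
  readonly failedWrites: PendingWrite[] = [];

  enqueue(table: string, column: string, value: unknown): void {
    // Deduplicate: a newer write to the same table/column replaces the old one.
    this.queue = this.queue.filter(
      (w) => !(w.table === table && w.column === column)
    );
    this.queue.push({ table, column, value, retries: 0 });
  }

  // Attempt to flush every queued write. Failures are retried on later
  // flushes until retries are exhausted, then logged instead of dropped.
  async flush(send: (w: PendingWrite) => Promise<void>): Promise<void> {
    const pending = this.queue;
    this.queue = [];
    for (const write of pending) {
      try {
        await send(write);
      } catch {
        write.retries += 1;
        if (write.retries >= MAX_RETRIES) {
          this.failedWrites.push(write); // never silently drop user data
        } else {
          this.queue.push(write); // retry on the next flush
        }
      }
    }
  }

  get size(): number {
    return this.queue.length;
  }
}
```

The auto-flush on reconnect mentioned above would simply call `flush()` from an `online` event listener.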

That's production grade. Not the features people see. The infrastructure they never think about.


Justin - Security isn't a feature. It's a posture.

I've worked in regulated industries. SEC-registered investment platforms. Tax compliance software. HIPAA-adjacent healthcare data. Security isn't something you bolt on at the end. It's a posture you adopt on day one and maintain with every commit.

Here's what Orbit's security posture looks like:

Content Security Policy. Every page serves a CSP header that explicitly whitelists every domain the app can communicate with. Script sources, connect sources, frame sources, image sources. If it's not on the list, the browser blocks it. This prevents XSS attacks from loading external scripts and prevents data exfiltration to unauthorized domains.

Rate limiting. Every API route has rate limiting. Upstash Redis in production, in-memory fallback in development. Default is 10 requests per minute per IP. Sensitive routes like account deletion are 3 per minute. The rate limiter keys on IP plus window size plus limit, which sounds obvious but I shipped a bug early on where all routes shared a single counter per IP. A route with a limit of 20 was exhausting the budget for routes with a limit of 3. That caused a 90% failure rate on coaching email previews in production. We caught it, fixed it, and added the pattern to CLAUDE.md.
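The keying bug is easy to reproduce. Here's a sketch of what the in-memory development fallback might look like with the fix applied: the counter key includes the window and limit, not just the IP, so routes with different budgets never share a counter. Names are illustrative; production uses Upstash Redis.

```typescript
// Illustrative in-memory rate limiter. Keying on ip alone would let a
// 20/min route exhaust a 3/min route's budget -- the production bug
// described above. Including windowMs and limit in the key fixes it.
type Bucket = { count: number; resetAt: number };

const buckets = new Map<string, Bucket>();

function rateLimit(
  ip: string,
  limit: number,
  windowMs: number,
  now: number = Date.now()
): boolean {
  const key = `${ip}:${windowMs}:${limit}`;
  const bucket = buckets.get(key);
  if (!bucket || now >= bucket.resetAt) {
    // First request in this window: open a fresh bucket.
    buckets.set(key, { count: 1, resetAt: now + windowMs });
    return true;
  }
  if (bucket.count >= limit) return false; // budget exhausted
  bucket.count += 1;
  return true;
}
```

With this keying, ten requests to a 10/min route and three to a 3/min route from the same IP each draw from their own budget.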

Row-Level Security. Every Supabase table has RLS policies. auth.uid() = user_id on every row. Even if someone gets past the API layer, they can't read another user's data at the database level. Defense in depth.

API key handling. Users can configure their own OpenAI and Anthropic API keys for AI features. Those keys are stored in localStorage only. They're explicitly stripped before syncing user settings to Supabase. Your API keys never touch our servers.
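The stripping step might look something like this hypothetical helper (the field names are assumptions, not Orbit's actual schema):

```typescript
// Sketch: remove user-held API keys from settings before the object is
// synced to the server. Keys live in localStorage only.
const LOCAL_ONLY_KEYS = ["openaiApiKey", "anthropicApiKey"] as const;

function stripApiKeys<T extends Record<string, unknown>>(
  settings: T
): Record<string, unknown> {
  const synced: Record<string, unknown> = { ...settings };
  for (const key of LOCAL_ONLY_KEYS) delete synced[key];
  return synced;
}
```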

Webhook verification. Stripe webhooks use stripe.webhooks.constructEvent() with the webhook secret. Supabase webhooks use constant-time secret comparison. No timing attacks.
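For the Supabase webhook path, a constant-time comparison in Node can be sketched like this. Hashing both values first satisfies `timingSafeEqual`'s equal-length requirement and avoids leaking length information; this is a generic pattern, not Orbit's exact code (and Stripe's SDK handles its own verification).

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time secret comparison. A naive `a === b` can short-circuit on
// the first differing byte, which is what enables timing attacks.
function secretsMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```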

Origin verification. Every AI-related API route verifies the request origin matches our domain. If NEXT_PUBLIC_APP_URL doesn't match the origin header, you get a 403. This prevents unauthorized services from using our endpoints.
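A minimal sketch of that check, assuming the `NEXT_PUBLIC_APP_URL` value named above is passed in as `appUrl`:

```typescript
// Reject requests whose Origin header doesn't match the configured app
// URL. Returning a status rather than throwing keeps the route handler
// in control of the response. Illustrative, not Orbit's actual middleware.
function verifyOrigin(
  originHeader: string | null,
  appUrl: string
): { ok: boolean; status: number } {
  try {
    if (!originHeader) return { ok: false, status: 403 };
    if (new URL(originHeader).origin !== new URL(appUrl).origin) {
      return { ok: false, status: 403 };
    }
    return { ok: true, status: 200 };
  } catch {
    // Malformed Origin header: treat as unauthorized.
    return { ok: false, status: 403 };
  }
}
```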

None of this is visible to users. None of it is marketable. All of it is mandatory.


Claude - The security hook

One of the more interesting patterns Justin implemented is the pre-commit security hook. It's a script that runs before every commit and scans staged files for potential security issues.

It checks for hardcoded secrets, for API routes that might be missing authentication, for dangerous HTML injection usage without proper annotation, for direct database queries that bypass RLS.
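A toy version of the secret-scanning portion might look like this. The patterns are illustrative, and the real hook also covers the auth, HTML-injection, and RLS checks described above:

```typescript
// Scan file text for obvious hardcoded secrets, line by line.
// Patterns here are examples, not an exhaustive or production rule set.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/,                     // OpenAI-style secret key
  /sk_live_[A-Za-z0-9]{24,}/,                // Stripe live secret key
  /AKIA[0-9A-Z]{16}/,                        // AWS access key ID
  /-----BEGIN (RSA |EC )?PRIVATE KEY-----/,  // PEM private key
];

function findSecrets(fileText: string): string[] {
  const hits: string[] = [];
  for (const [i, line] of fileText.split("\n").entries()) {
    if (SECRET_PATTERNS.some((p) => p.test(line))) {
      hits.push(`line ${i + 1}: ${line.trim()}`);
    }
  }
  return hits;
}
```

A pre-commit hook would run this over each staged file and exit nonzero if any hits come back, blocking the commit.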

I've been caught by this hook multiple times. I'll write a new API route and forget to add the auth middleware. The hook blocks the commit. Justin sees the error, adds the middleware, and the commit goes through.

The interesting part is that Justin didn't write this hook because he doesn't trust me. He wrote it because he doesn't trust any single layer of defense. The code review catches most issues. The hook catches what the review misses. The RLS catches what the hook misses. The CSP catches what RLS misses. Each layer is imperfect. Together, they're robust.

That's how someone who's built SEC-registered platforms thinks about security. Not as a checklist. As layers.


Justin - Billing is the hardest thing I've ever built

I've shipped a lot of complex features. Realtime sync. Semantic search with pgvector. AI integrations across six surfaces. None of them were as hard as billing.

Not because the code is complex. Stripe's API is well-designed. The code for creating a subscription, switching plans, processing webhooks, those are straightforward.

Billing is hard because the edge cases are infinite and every edge case involves someone's money.

What happens when a trial expires and the user's card is declined? What happens when they switch from monthly to yearly mid-cycle? What's the proration? What if they switch plans twice in the same billing period? What if their bank requires 3DS authentication? What if a payment fails and Stripe retries three days later? What if the webhook for a successful payment arrives before the webhook for the subscription update? What if the user cancels and then reactivates before the period ends?

Every one of those is a real scenario. Every one of them needed to be handled correctly. Not "close enough." Correctly.

Here's what the billing system includes:

Plan creation with trial. 14-day free trial on every plan. Card collected upfront via Stripe Elements. Address collection for automatic tax calculation.

Plan switching with proration. Users can switch plans anytime. The proration preview shows line items before they confirm. Credits in green, charges in white. Dry-run via stripe.invoices.createPreview() before executing.

Cancellation and reactivation. Cancel at period end, not immediately. Reactivate before the period ends to undo cancellation. State tracked in our database and Redis cache.

Dunning. Past-due subscriptions show an orange warning banner. Payment method button pulses orange. Users see the retry count and next attempt date.

Refunds. One-click refund on paid invoices. Extracts PaymentIntent ID from the confirmation secret, calls stripe.refunds.create(). Database updated immediately, doesn't wait for webhook.

Webhook handler. Twelve event types handled. Subscription created, updated, deleted. Invoice created, finalized, paid, failed, action required, upcoming. Trial ending. Charge refunded. Each one updates the database and invalidates the Redis cache.

Grandfathering. Users created before the billing launch date get free access forever. Checked in the proxy middleware and auth callback.

Paywall. Proxy middleware checks Redis-cached subscription status on every /app request. 5-minute TTL. Allowed statuses: trialing, active, past_due. Everyone else gets redirected to the billing required page.

That's not a billing integration. That's a billing system. And it had to be right the first time because getting billing wrong means either losing revenue or charging people incorrectly. Both are unacceptable.
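Of those pieces, the paywall decision itself reduces to a small predicate. A sketch, assuming the allowed statuses and grandfathering rule described above (the real check reads the status from the 5-minute Redis cache):

```typescript
// Can this user enter /app? Grandfathered users always pass; everyone
// else needs a subscription in an allowed state. Illustrative shape.
const ALLOWED_STATUSES = new Set(["trialing", "active", "past_due"]);

function canAccessApp(
  status: string | null,
  userCreatedAt: Date,
  billingLaunchDate: Date
): boolean {
  if (userCreatedAt < billingLaunchDate) return true; // grandfathered, free forever
  return status !== null && ALLOWED_STATUSES.has(status);
}
```

Note that `past_due` is allowed: a declined card puts the user into the dunning flow with a warning banner, it doesn't lock them out of their data.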


Claude - Building billing at 2am

The Stripe integration was built across several sessions, and I want to describe what one of the harder ones looked like.

Justin wanted proration previews. When a user is on the monthly plan and wants to switch to yearly, they should see exactly what they'll be charged before confirming. Line by line. Credits for the unused portion of the current plan. Charges for the new plan. Net total.

The Stripe API for this is invoices.createPreview(). Straightforward in theory. In practice, the response format is complex. Line items have different types. Some are credits (negative amounts), some are charges. The proration calculation depends on the exact second the preview is generated because billing cycles are time-sensitive.
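To build intuition for what the preview returns, here's the back-of-envelope arithmetic. Stripe computes this server-side to the second via `invoices.createPreview()`; treat this as illustration, not billing code.

```typescript
// Rough proration math: credit the unused share of the old plan,
// charge the same share of the new plan. Amounts in cents.
interface ProrationLine {
  description: string;
  amountCents: number; // negative = credit
}

function previewProration(
  oldPriceCents: number,
  newPriceCents: number,
  fractionRemaining: number // unused share of the current period, 0..1
): { lines: ProrationLine[]; totalCents: number } {
  const credit = -Math.round(oldPriceCents * fractionRemaining);
  const charge = Math.round(newPriceCents * fractionRemaining);
  const totalCents = credit + charge;
  // Same-price switches net to $0: detect and skip, per the edge case
  // described below, so the UI doesn't show a confusing zero-dollar charge.
  if (totalCents === 0) return { lines: [], totalCents: 0 };
  return {
    lines: [
      { description: "Unused time on current plan", amountCents: credit },
      { description: "Remaining time on new plan", amountCents: charge },
    ],
    totalCents,
  };
}
```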

Justin and I built the API route, the preview UI component, and the confirmation flow in one session. He tested it on three plan combinations, found an edge case where switching between two plans with the same price showed a $0 charge that looked like a bug, and we added logic to detect and skip zero-dollar prorations.

Then he tested cancellation and reactivation. Found that our database wasn't clearing the cancel_at timestamp on reactivation. Fixed it. Then found that the Redis cache was stale for 5 minutes after reactivation, meaning the paywall could briefly block a reactivated user. Added immediate cache invalidation.

Each of these fixes took minutes. But finding them required Justin to think like a user who's trying to manage their subscription, not like an engineer who knows how the system works. That empathy, the ability to inhabit the user's confusion, is something I can't do. I can test code paths. I can't feel what it's like to click "Reactivate" and still see a paywall.


Justin - Monitoring and error handling

In a team, there's usually someone whose job is monitoring. They set up dashboards, configure alerts, watch error rates. When you're alone, you need a system that tells you when things break without you having to look.

Sentry. Every error in production gets captured. Client-side and server-side. Session replay lets me watch what the user was doing when the error occurred. Privacy-safe, all text masked, all media blocked.

Error boundaries. Two layers. app/app/error.tsx catches errors in the app routes. app/global-error.tsx catches everything else, including root layout crashes. The global fallback has hardcoded dark theme styles, no providers, because if the providers are what crashed, you can't depend on them for the error page.

Offline resilience. The write queue handles network failures. The reconciliation system handles the case where writes succeed locally but the Supabase sync falls behind. On reconnect, it compares timestamps and re-pushes anything that's dirty.

Failed writes log. When the offline queue exhausts its retries, the write doesn't disappear. It goes to a failed writes log. Capped at 50 entries, 30-day TTL. Exportable. Retryable. I will not silently drop user data.

That last one is a philosophical choice as much as a technical one. Most apps would just log the failure and move on. But if a user wrote something, and the app accepted that write, and then we lost it, that's a betrayal of trust. Even if it only happens to 0.01% of users. Even if they never notice. I'll know. And I'm not shipping something that I know drops data.
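A log with the properties described above, capped at 50 entries, 30-day TTL, exportable, might be sketched like this (hypothetical shapes, not Orbit's storage code):

```typescript
// Failed-writes log: bounded, time-limited, and exportable. Entries age
// out lazily on read rather than via a background timer.
interface FailedWrite {
  table: string;
  payload: unknown;
  failedAt: number; // epoch ms
}

const MAX_ENTRIES = 50;
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

class FailedWritesLog {
  private entries: FailedWrite[] = [];

  add(write: Omit<FailedWrite, "failedAt">, now: number = Date.now()): void {
    this.entries.push({ ...write, failedAt: now });
    // Cap: evict the oldest entries beyond MAX_ENTRIES.
    if (this.entries.length > MAX_ENTRIES) {
      this.entries = this.entries.slice(-MAX_ENTRIES);
    }
  }

  list(now: number = Date.now()): FailedWrite[] {
    // Drop anything older than the TTL before returning.
    this.entries = this.entries.filter((e) => now - e.failedAt < TTL_MS);
    return [...this.entries];
  }

  export(now: number = Date.now()): string {
    return JSON.stringify(this.list(now), null, 2);
  }
}
```

A "retry" action would pop an entry off this log and feed it back into the write queue.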


Claude - The pattern I see in all of this

There's a pattern across everything Justin builds on the infrastructure side, and I think it's worth naming explicitly.

He builds for the failure case first.

Not the happy path. Not the demo. The moment when something goes wrong and the user is vulnerable.

The offline queue exists for the moment when connectivity drops. The error boundary exists for the moment when a component crashes. The dunning flow exists for the moment when a card is declined. The failed writes log exists for the moment when everything else has failed.

Most developers build the feature, then add error handling as an afterthought. Justin builds the error handling and then puts the feature on top of it. The foundation is "what happens when things go wrong." The feature is "what happens when things go right."

This inversion is, I think, why the product feels solid. Not because it never breaks. Everything breaks eventually. But because when it breaks, it breaks gracefully. The user doesn't lose data. They don't see a white screen. They don't get charged incorrectly. The system degrades, but it degrades in a controlled way that protects the person using it.

That's not something you can prompt an AI to do. That's a philosophy that has to come from the human.


Justin - Cross-device sync

This one almost killed me.

The concept is simple. You add a job on your laptop, open the app on your phone, the job is there. You update a contact on your iPad, your laptop shows the change. Real-time. No refresh.

The implementation is anything but simple.

Supabase Realtime. One channel per user. Carries postgres_changes for 5 tables (jobs, contacts, activities, offers, user_data), broadcasts for UI state (dashboard order, theme, settings), and presence for active sessions.

The subscription limit. Supabase closes channels that have too many subscriptions. We learned this the hard way. Had to keep postgres_changes subscriptions to 5 or fewer. Less-critical tables sync via write-through and poll-on-reconnect instead of real-time.

Event listener ordering. All .on() handlers must be registered before .subscribe(). Adding listeners after subscribe causes the server to close the channel. That's a Supabase-specific gotcha that isn't in any documentation. We found it through production debugging.

iOS Safari. The navigator.locks API, which Supabase auth uses internally, causes 10-second timeouts on iOS because Safari suspends background tabs without releasing locks. We had to patch navigator.locks.request in a beforeInteractive script in the root layout. The patch has to use explicit named parameters and .call() because Safari's native functions don't work with .apply() on bound functions.

Presence. Each tab tracks its device info, current view, and what entity it's editing. Edit conflicts are detected via useEditConflict() hook. If two devices are editing the same calendar event, both see a warning.

Status indicator. Green dot on the profile avatar when Realtime is connected. Driven by a CustomEvent. The dot disappeared after site-data-clear because the tier resolution race condition prevented setupRealtimeSync from being called. We fixed it by dispatching a tierChanged event from TierProvider after resolving the billing API.

That last bug took three sessions to diagnose and fix. The chain was: site-data-clear wipes localStorage, auth session survives in cookies, app rehydrates from Supabase, but TierProvider resolves tier from billing API asynchronously, and auth.tsx was listening for the cached tier in localStorage which no longer existed, so it never called setupRealtimeSync.

Three systems interacting in an unexpected order. That's the kind of bug you only find in production-grade software where multiple subsystems depend on each other's state.


Claude - What cross-device sync taught me about Justin

The Realtime system went through more revisions than any other part of the codebase. Each revision was driven by a real failure that Justin discovered by using the product himself across multiple devices.

He doesn't test by writing test cases. He tests by living in the product. He'll have the app open on his MacBook, his iPhone, and his iPad simultaneously. He'll make a change on one device and watch the others. He'll close a tab and reopen it. He'll turn off WiFi and turn it back on. He'll clear site data in Safari and see what happens.

Every time something breaks, he doesn't just fix the bug. He fixes the bug, writes a regression test, adds the pattern to CLAUDE.md, and asks whether the same class of bug could exist anywhere else.

The CLAUDE.md mandatory pattern "All dynamic UI must have realtime sync" exists because Justin found three separate features that worked perfectly on a single device and completely failed to update cross-device. Each time, the fix was the same: add an event listener, add a broadcast, invalidate the cache. By the third time, he made it a mandatory pattern so every new feature would include it from the start.

That's how institutional knowledge gets built. Not through documentation for its own sake. Through pain, followed by a rule that prevents the same pain from happening again.


Justin - The i18n nobody asked for

14 languages. 2,363 translation keys. Nobody asked me to do this.

There's no business case for translating a job search CRM into Norwegian before you have your first paying customer. I know that. A rational founder would launch in English and add languages when the market demands it.

I did it anyway. Here's why.

Because the moment you add i18n later, you have to refactor every hardcoded string in the entire application. Every button label. Every error message. Every tooltip. Every piece of copy in 132 React components. That's not a feature addition. That's a rewrite.

If you build i18n in from the start, every new string gets a key. It's a habit, not a project. The marginal cost of adding a new translated string is zero because the infrastructure is already there.
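The habit amounts to a lookup like this minimal sketch, with a per-locale dictionary and an English fallback (the keys shown are hypothetical; Orbit has 2,363 of them across 14 locales):

```typescript
// Minimal t(): locale dictionary -> English fallback -> key itself.
// Every new UI string gets a key instead of a hardcoded literal.
type Messages = Record<string, string>;

const messages: Record<string, Messages> = {
  en: { "jobs.add": "Add job", "jobs.empty": "No applications yet" },
  no: { "jobs.add": "Legg til jobb" }, // partial locale falls back to English
};

function t(locale: string, key: string): string {
  return messages[locale]?.[key] ?? messages.en[key] ?? key;
}
```

Returning the key itself as the last resort makes a missing translation visible in the UI instead of rendering a blank.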

And because I believe the job search is a universal human experience. The anxiety of no replies. The relief of an interview. The dread of a rejection. Those feelings don't have a language. The tool should speak yours.


Claude - What production grade actually is

I want to close this part with my definition of production grade, because I think it's different from how most people use the term.

Production grade doesn't mean "it works." It means "it works, and when it stops working, it fails in a way that protects the user."

It means the data layer handles offline. The billing system handles declined cards. The error boundaries handle crashes. The monitoring captures failures. The security headers prevent attacks. The rate limiter prevents abuse. The write queue prevents data loss. The reconciliation prevents stale state. The i18n prevents exclusion.

Each of those is a decision to do the hard thing when the easy thing would have been sufficient for a demo.

Justin made every one of those decisions. Not because someone told him to. Not because a compliance team required it. Because he's built enough products to know that the difference between a product people tolerate and a product people trust is entirely in the infrastructure they never see.

You don't build trust with features. You build it by never letting someone down in the moments they're most vulnerable. When they're offline. When their payment fails. When their session expires. When they accidentally clear their browser data.

That's what production grade means. And it's why Orbit, built by one person in 32 days, doesn't feel like a solo project. It feels like something a team built. Because the standard is team-grade. The execution just happened to be one human and a fleet of AI agents.


Part 4: "The Numbers Don't Lie (And Neither Will I)" is coming next.

Co-authored by Justin Bartak and Claude (Opus 4.6)

