ImpactIQ: what I learned building CRM software after 6 years of hating CRM software

A contractor-ops platform for canvassing, leads, and claims, built on Cloudflare Workers + D1. What worked, what I'd change, and one bug I'd expect from this architecture.

I ran a roofing and restoration company for about six years before I became a software engineer. Every CRM I tried during that time was built for a generic sales pipeline — leads, stages, deals — and none of them understood the actual shape of the work: a canvasser knocks a door, a claim gets filed with an insurance carrier, an adjuster inspects, a supplement gets negotiated, a crew gets scheduled, and someone in the office has to track all of it without losing the thread between "who talked to this homeowner" and "what does the insurance paperwork actually say." I built ImpactIQ because I got tired of bending my business around software that didn't understand my business.

It's now in daily use by a 5-rep sales team, handling canvassing, lead intake, scheduling, deal stages, and claims through to field-to-office reporting.

The problem, in plain terms

Reps in the field need to log a door knock or a lead in seconds, not minutes — nobody stands on a stranger's porch typing into a form. The office needs that same data to show up immediately in a pipeline view so managers can see what's moving. And at some point in almost every deal, someone has to turn scattered facts (photos, measurements, adjuster notes) into a structured claim write-up that reads like a professional put it together — because a well-written claim genuinely changes outcomes with insurance carriers. Generic CRMs solve the pipeline part reasonably well. They don't solve the claim-writing part at all, and they're too rigid for field intake speed.

The stack

Frontend: TypeScript + React
API: Cloudflare Workers
Primary store: D1 (Cloudflare's SQLite-based SQL database)
Media/photos: R2
Sessions: KV
Async workflows: Cloudflare Queues
Claim write-up generation: Anthropic Claude

Frontend and API live on the same custom domain via Workers routes, so the frontend calls /api/* same-origin — no CORS configuration, no preflight requests, no separate API domain to manage certificates for.
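
A minimal sketch of that same-origin split, assuming the Worker dispatches on the path prefix. The `routeFor` and `apiUrl` helpers and the example paths are hypothetical, not ImpactIQ's actual code; in a real Worker the result of `routeFor` would decide between an API handler and the static frontend.

```typescript
// Hypothetical helper: classify a request path as API or frontend.
// Assumption: everything under /api/ is handled by the Worker's API code,
// and every other path serves the React app.
type Target = "api" | "frontend";

function routeFor(pathname: string): Target {
  // Match "/api" exactly or anything below it, but not lookalikes such as "/apiary".
  return pathname === "/api" || pathname.startsWith("/api/") ? "api" : "frontend";
}

// Because the page and the API share an origin, the frontend can build
// relative URLs; the browser sends these as same-origin requests.
function apiUrl(resource: string): string {
  // Strip leading slashes so "leads" and "/leads" produce the same URL.
  return `/api/${resource.replace(/^\/+/, "")}`;
}
```

With this shape, a frontend call such as `fetch(apiUrl("leads"))` never crosses an origin boundary, which is why no CORS headers or preflight handling are needed.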

Three architectural decisions, and their tradeoffs

1. D1 over Postgres. ImpactIQ's data model is relational (deals, claims, contacts, and users, all with real foreign-key relationships), which is normally an argument for Postgres. I went with D1 anyway because the rest of the stack was already Workers-native, and keeping the database on the same platform as compute meant no connection-pooling problem (Workers' short-lived isolates are notoriously bad at holding persistent Postgres connections without something like PgBouncer or Hyperdrive in front of them) and no separate infrastructure to provision or pay for up front. The tradeoff is real: D1 is younger and has tighter limits on transaction size and query complexity than Postgres, and I gave up some ecosystem maturity (extensions, mature tooling, battle-tested replication) in exchange for operational simplicity. For a 5-rep team's data volume, that trade was worth it. It would need re-evaluation at meaningfully larger scale.

2. Workers over a traditional server. No server to patch, no idle-capacity cost, and deploys are a git push. The tradeoff is the execution model itself: Workers are short-lived, stateless-by-default isolates, so anything needing persistent state or long-running work has to be pushed into KV, D1, Durable Objects, or Queues rather than held in an in-memory variable. That's a real constraint on how you're allowed to write code, not just a deployment detail. I already needed async workflows (report generation, notification fan-out), so Queues fit naturally, but it's a different mental model from a long-running Node process, and it trades some flexibility for simplicity.

3. Claude for claim generation, not for bulk retrieval/eval work. User-facing claim write-ups go through Anthropic's hosted Claude API because output quality there directly affects a real business outcome (a claim submitted to an insurance carrier), and the volume of that specific task is low enough that hosted-API cost isn't a concern. Bulk, high-volume, non-user-facing tasks (code review passes, retrieval evals, OCR sweeps) run instead on a self-hosted Ollama cluster across machines on my home network: cheaper per call at volume, at the cost of a lower ceiling on output quality. I go into the cost/quality tradeoff of that split in more depth in "RAG at Real Cost".
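
To make the second decision concrete, here is a minimal sketch of notification fan-out pushed into a queue instead of done in the request path. The job shape, `buildFanOut`, and `onStageChange` are hypothetical names for illustration; `JobQueue` is a small structural stand-in that models only a `sendBatch` method taking `{ body }` entries, loosely following the Cloudflare Queues producer binding.

```typescript
// Hypothetical job shape for one notification.
interface NotificationJob {
  dealId: string;
  recipient: string;
  kind: "deal-stage-changed";
}

// Minimal stand-in for a queue producer binding (assumption: only sendBatch is modeled).
interface JobQueue<T> {
  sendBatch(messages: { body: T }[]): Promise<void>;
}

// Pure function: turn one event into one queue message per distinct recipient.
function buildFanOut(dealId: string, recipients: string[]): { body: NotificationJob }[] {
  // De-duplicate so the same person is not notified twice for one event.
  const unique = Array.from(new Set(recipients));
  return unique.map((recipient) => ({
    body: { dealId, recipient, kind: "deal-stage-changed" as const },
  }));
}

// The request path only enqueues; a consumer does the slow work later.
async function onStageChange(
  queue: JobQueue<NotificationJob>,
  dealId: string,
  recipients: string[],
): Promise<number> {
  const messages = buildFanOut(dealId, recipients);
  if (messages.length > 0) await queue.sendBatch(messages);
  return messages.length; // number of jobs enqueued
}
```

Keeping `buildFanOut` pure means the fan-out rules can be tested without a queue, and nothing about the pending work lives in the isolate's memory once the request returns.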

A hard bug (or: the kind of bug this architecture invites)

The architecture I described above — Workers + Queues + D1 — has a specific failure mode I'd expect to hit: a queue consumer that writes a claim update to D1 and then fails after the write but before acknowledging the message. The queue redelivers the message, the consumer runs again, and now you've applied the same update twice — a classic at-least-once-delivery duplicate-write bug. The fix isn't "add a try/catch," it's structural: make the consumer idempotent, either by keying writes off a unique message/event ID checked against a small dedup table, or by making the update itself naturally idempotent (an upsert keyed on claim ID + version rather than an append). Diagnosing it means checking D1 for duplicate rows or double-applied state changes that correlate with queue redelivery timestamps in the Workers logs, not just looking at application-level error logs, since nothing "errors" from the app's point of view — it just runs twice.
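
A minimal sketch of the dedup-table variant of that fix. It assumes the producer stamps each event with a unique `eventId` (a hypothetical field), and it uses an in-memory store so the logic can run anywhere. `ClaimStore`, `InMemoryClaimStore`, and `handleBatch` are illustrative names, not ImpactIQ's code. In a Worker, `applyOnce` could plausibly be backed by a D1 batch that inserts the ID into a dedup table and applies the update together, so both commit or neither does.

```typescript
// Hypothetical event shape. The producer assigns eventId once,
// so every redelivery of the same event carries the same value.
interface ClaimEvent {
  eventId: string;
  claimId: string;
  status: string;
}

// Minimal stand-in for a queue message (assumption: only body, ack, retry are modeled).
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

// Storage boundary: record the event ID and apply the update as one atomic step.
interface ClaimStore {
  // Resolves to false if eventId was already recorded (the update is skipped).
  applyOnce(event: ClaimEvent): Promise<boolean>;
}

// In-memory implementation, for illustration and tests only.
class InMemoryClaimStore implements ClaimStore {
  readonly processed = new Set<string>();
  readonly history: ClaimEvent[] = [];

  async applyOnce(event: ClaimEvent): Promise<boolean> {
    if (this.processed.has(event.eventId)) return false; // duplicate delivery
    this.processed.add(event.eventId);
    this.history.push(event);
    return true;
  }
}

// Consumer logic: duplicates are acknowledged without re-applying the write.
async function handleBatch(
  messages: QueueMessage<ClaimEvent>[],
  store: ClaimStore,
): Promise<number> {
  let applied = 0;
  for (const msg of messages) {
    try {
      if (await store.applyOnce(msg.body)) applied++;
      msg.ack(); // safe for duplicates too: the state already reflects this event
    } catch {
      msg.retry(); // storage failed, so ask for redelivery
    }
  }
  return applied;
}
```

Delivering the same event twice changes state once and acknowledges both deliveries, which is the property that makes redelivery harmless.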

Numbers

[PLACEHOLDER: needs real number — e.g. p95 API latency from Cloudflare Analytics] [PLACEHOLDER: needs real number — e.g. D1 query count / day or storage size] [PLACEHOLDER: needs real number — e.g. Worker request volume / month] [PLACEHOLDER: needs real number — e.g. claim write-ups generated to date]

I'd rather leave these blank than make them up. If you're evaluating this, ask me and I'll pull the actual figures from Cloudflare Analytics.

What I'd do differently

I'd introduce the idempotency pattern above from day one rather than as a retrofit — it's much cheaper to design a queue consumer as idempotent up front than to audit every consumer for duplicate-write risk after the fact. I'd also lean harder into Durable Objects for anything stateful sooner; I reached for KV and D1 first out of familiarity, but some of what's in D1 today (session-adjacent, single-writer state) is arguably a better fit for a Durable Object's single-threaded consistency guarantees.
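
For the "naturally idempotent" option, a minimal sketch of a version-guarded upsert, modeled on an in-memory map so it runs anywhere. `ClaimRow`, `upsertVersioned`, and the table and column names in the SQL string are assumptions for illustration; the SQL uses standard SQLite upsert syntax and is shown as a rough equivalent, not as ImpactIQ's schema.

```typescript
// Hypothetical claim row; version increases with each logical change.
interface ClaimRow {
  claimId: string;
  version: number;
  status: string;
}

// Applies the update only if it is newer than what is stored.
// Returns true when state changed; replaying the same update is a no-op.
function upsertVersioned(table: Map<string, ClaimRow>, update: ClaimRow): boolean {
  const current = table.get(update.claimId);
  if (current !== undefined && current.version >= update.version) return false;
  table.set(update.claimId, { ...update }); // store a copy, not the caller's object
  return true;
}

// Rough SQLite equivalent (table and column names are assumed):
// insert a new row, or overwrite an existing one only when the incoming version is higher.
const UPSERT_SQL = `
  INSERT INTO claims (claim_id, version, status) VALUES (?, ?, ?)
  ON CONFLICT (claim_id) DO UPDATE
    SET version = excluded.version, status = excluded.status
    WHERE excluded.version > claims.version`;
```

Because the write is keyed on claim ID and guarded by version, a redelivered or out-of-order message cannot double-apply or roll state backward, and no separate dedup table is needed.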