A vision model screens every lead listing photo for framing problems the moment it lands, sends only the genuinely borderline ones to a human, and auto-emails the right Merchandising Specialist and Regional Manager with the photo, the specific issue, and reshoot instructions.
One person, 2,000+ photos a day, eyeballed by hand.
Every day, someone at Redline opens the day's batch of 2,000+ first photos — the lead listing photo for each used car on a dealer lot — and looks at each one to decide whether it's framed correctly. When a photo is bad, they manually compose an email to the Merchandising Specialist (MS) who shot it and that MS's Regional Manager (RM), so the car can be reshot and re-uploaded.
This is the kind of work that is both critically important and soul-crushing to do by hand. A bad lead photo is the first thing a buyer sees; it directly affects how fast a car sells. But asking a human to make the same yes/no framing judgment two thousand times before lunch guarantees three things:
The honest failure mode isn't that the reviewer is bad at it — it's that attention is finite. By photo 1,500, the bar quietly moves. Two reviewers won't reject the same set. And the reshoot email is a second manual job stacked on top of the first: find the MS, find their RM, attach the photo, describe what's wrong, hit send. That second job is exactly the part that gets dropped when the day gets busy — so bad photos stay live longer than anyone wants.
A vision model with an explicit rubric, a confidence band that routes borderline cases to a human, and a templated reshoot email.
We don't ask a model "is this a good photo?" and trust a vibe. We give it the same checklist the human uses, score each item, and combine those into an accept / reject / not-sure decision. The model never silently invents a standard — it answers a fixed rubric, and we can read its reasons.
Each first photo is checked against a small, concrete set of pass/fail criteria. These are the things a Redline reviewer is actually looking for:
This list is the spec we tune against, and it's the first thing we'll confirm with Redline so the model is scored on exactly the rules the team enforces today.
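As a rough illustration of "a fixed rubric the model must answer", the sketch below validates a structured answer against a known list of criteria and refuses anything that adds, drops, or renames one. The two criterion ids, the JSON shape, and the function names are assumptions made for the example; the real list is whatever gets confirmed with Redline.

```python
import json
from dataclasses import dataclass

# Hypothetical criterion ids. The real rubric is the list confirmed with Redline.
RUBRIC = ("vehicle_fully_in_frame", "three_quarter_front_angle")


@dataclass(frozen=True)
class CriterionResult:
    passed: bool
    confidence: float  # 0.0 to 1.0, as reported by the scorer
    reason: str        # the model's stated reason, kept so a human can read it


def parse_rubric_answer(raw: str) -> dict[str, CriterionResult]:
    """Parse the scorer's JSON answer and reject anything off-rubric."""
    data = json.loads(raw)
    if set(data) != set(RUBRIC):
        raise ValueError(f"answer does not match the rubric: {sorted(data)}")
    results = {}
    for name in RUBRIC:
        item = data[name]
        if not isinstance(item["passed"], bool):
            raise ValueError(f"{name}: 'passed' must be true or false")
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"{name}: confidence out of range")
        results[name] = CriterionResult(item["passed"], confidence, str(item["reason"]))
    return results


def failed_criteria(results: dict[str, CriterionResult]) -> list[str]:
    """Rubric items that failed, in rubric order."""
    return [name for name in RUBRIC if not results[name].passed]
```

Keeping the reason string next to each verdict is what makes the decision auditable later, both in the review queue and in the reshoot email.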
There are two honest ways to build the scorer, and they trade off the same way:
The path is: ship (A), the rubric-prompted vision-LLM, fast; use it to generate and confirm labels; then quietly swap individual checks over to (B), a dedicated trained classifier, where it's clearly better and cheaper. Redline sees a working system the whole time.
Redline already has the most valuable thing for this project: a history of human decisions. Every photo previously flagged-and-reshot is a labeled "reject"; the vast majority that passed are "accepts." We pull a few thousand of each, have the current reviewer confirm a clean validation slice (a few hundred images they re-judge carefully against the rubric), and use that as ground truth. That validation set is what we measure precision and recall against — and what catches the model drifting.
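For reference, here is a small sketch of how precision and recall on the reject class could be computed against the validation slice. The label strings and the function name are illustrative assumptions, not an existing Redline format.

```python
def reject_precision_recall(truth, predicted):
    """Precision and recall for the 'reject' class.

    truth and predicted are equal-length sequences of 'accept' / 'reject'.
    Returns (precision, recall); either is None when its denominator is zero.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = fn = 0
    for t, p in zip(truth, predicted):
        if p == "reject" and t == "reject":
            tp += 1  # model rejected a photo the reviewer also rejected
        elif p == "reject" and t == "accept":
            fp += 1  # false reject: a good photo flagged for reshoot
        elif p == "accept" and t == "reject":
            fn += 1  # missed problem: a bad photo let through
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall
```

Precision here answers "of the photos we told someone to reshoot, how many really needed it"; recall answers "of the photos that needed a reshoot, how many did we catch".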
This is the part that decides whether MSs trust the system or learn to ignore it. The model returns a confidence; we split it into three lanes:
per-photo verdict
├─ HIGH-confidence PASS → silently accept, log it
├─ HIGH-confidence FAIL → queue reshoot email (MS + RM)
└─ BORDERLINE (uncertain) → human review queue
     reviewer clicks Accept / Reject
     → that click becomes a new training label
The whole point is that the human now only looks at the borderline few percent instead of all 2,000 — and every click they make feeds back as a fresh label. The system gets more confident over time, so the review queue shrinks. We tune the band's width to hit our error targets, not to hit a tidy number.
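The three lanes can be sketched as a small routing function. The threshold values below are placeholders, not tuned numbers, and the `Lane` and `route` names are made up for the example; the two thresholds are kept separate so the pass and fail lanes can be tuned independently.

```python
from enum import Enum


class Lane(Enum):
    AUTO_PASS = "auto_pass"        # silently accept and log
    AUTO_FAIL = "auto_fail"        # queue the reshoot email
    HUMAN_REVIEW = "human_review"  # borderline: send to the review queue


def route(verdict: str, confidence: float,
          pass_threshold: float = 0.90,   # placeholder value
          fail_threshold: float = 0.95) -> Lane:  # placeholder value
    """Map a model verdict ('pass' or 'fail') and its confidence to a lane."""
    if verdict not in ("pass", "fail"):
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict == "pass" and confidence >= pass_threshold:
        return Lane.AUTO_PASS
    if verdict == "fail" and confidence >= fail_threshold:
        return Lane.AUTO_FAIL
    # Anything the model is not sure about goes to a person.
    return Lane.HUMAN_REVIEW
```

Widening or narrowing the band is then just a matter of moving the two thresholds, which is what gets tuned against the validation set.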
We will deliberately bias the system toward precision on rejects. A false reject (telling an MS to reshoot a perfectly good photo) is the expensive mistake: it wastes a field rep's drive time and, worse, teaches them the tool cries wolf. Once an MS believes that, they ignore every email — and the project is dead even if it's technically accurate.
The borderline lane is what lets us be aggressive on precision without dropping real problems on the floor: anything we're not sure about goes to a person rather than being auto-rejected or auto-passed.
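One way to turn "bias toward precision on rejects" into a number is to pick the auto-reject threshold from the validation set. This is a sketch under assumptions: the input is a list of `(confidence, is_true_fail)` pairs for photos the model called FAIL, and the target precision is supplied by whoever owns the error budget (no target value is fixed here).

```python
def lowest_threshold_for_precision(scored, target_precision):
    """Lowest confidence cutoff whose auto-rejects meet the precision target.

    scored: list of (confidence, is_true_fail) for photos the model called FAIL,
            where is_true_fail is the reviewer's ground-truth judgment.
    Returns the cutoff, or None if no cutoff reaches the target.
    """
    for threshold in sorted({confidence for confidence, _ in scored}):
        # Everything at or above the cutoff would be auto-rejected.
        selected = [is_fail for confidence, is_fail in scored if confidence >= threshold]
        precision = sum(selected) / len(selected)
        if precision >= target_precision:
            return threshold
    return None
```

Choosing the lowest cutoff that still meets the target keeps as many true fails as possible in the automatic lane; everything below the cutoff falls into the review queue rather than being dropped.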
When a photo is a confirmed fail (auto or human), the system composes a templated email to the MS and CCs their RM. It includes the photo itself, the VIN/stock context, which rubric items failed in plain language, and concrete reshoot instructions for that specific problem (e.g. "back up ~6 feet and shoot the front-driver 3-quarter angle so the full car fits"). Generic "reshoot this" emails get ignored; specific ones get acted on.
SUBJECT: Reshoot needed — [Year Make Model] · Stock #[####] · [Dealer]
Hi [MS first name],
The first photo for this vehicle needs a reshoot before it lists well:
• Vehicle is cut off on the right (rear bumper out of frame)
• Angle is closer to a flat side than the standard 3-quarter front
How to fix:
• Step back ~6 ft and frame the full car with margin on all sides
• Stand off the front-driver corner for the 3-quarter front angle
[ photo thumbnail ]
Reshoot & re-upload when you're next on the lot.
— Redline Merchandising QA (automated) · RM: [RM name] cc'd
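A minimal sketch of how the template above could be filled in, using Python's standard `email.message.EmailMessage`. The criterion ids, the `GUIDANCE` table, the sender address, and the function signature are all assumptions for illustration, and the subject punctuation is simplified; actual sending (for example through SES) is left out.

```python
from email.message import EmailMessage

# Hypothetical rubric ids mapped to (plain-language problem, reshoot instruction).
GUIDANCE = {
    "vehicle_fully_in_frame": (
        "Vehicle is cut off (part of the car is out of frame)",
        "Step back ~6 ft and frame the full car with margin on all sides",
    ),
    "three_quarter_front_angle": (
        "Angle is closer to a flat side than the standard 3-quarter front",
        "Stand off the front-driver corner for the 3-quarter front angle",
    ),
}


def build_reshoot_email(ms_name, ms_email, rm_name, rm_email,
                        vehicle, stock, dealer, failed,
                        sender="qa@example.com"):  # placeholder sender address
    """Compose the reshoot email to the MS with the RM on CC."""
    if not failed:
        raise ValueError("no failed criteria: nothing to reshoot")
    problems = "\n".join(f"  - {GUIDANCE[c][0]}" for c in failed)
    fixes = "\n".join(f"  - {GUIDANCE[c][1]}" for c in failed)
    body = (
        f"Hi {ms_name},\n\n"
        "The first photo for this vehicle needs a reshoot before it lists well:\n"
        f"{problems}\n\n"
        "How to fix:\n"
        f"{fixes}\n\n"
        "Reshoot & re-upload when you're next on the lot.\n\n"
        f"Redline Merchandising QA (automated) · RM: {rm_name} cc'd\n"
    )
    msg = EmailMessage()
    msg["Subject"] = f"Reshoot needed: {vehicle} · Stock #{stock} · {dealer}"
    msg["From"] = sender
    msg["To"] = ms_email
    msg["Cc"] = rm_email
    msg.set_content(body)
    return msg
```

Because each failed criterion carries its own fix text, the email stays specific to the actual problem instead of falling back to a generic "reshoot this".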
Event-driven on AWS. New photo lands → screen it → route it → notify. No standing servers to babysit.
[ Photo source ] (DAM / S3 bucket / pipeline API — TBC w/ Redline)
│ new first-photo event
▼
[ EventBridge ] ──► [ Lambda: ingest ] pull image + VIN/stock + MS/RM mapping
│
▼
[ Lambda/Batch: score ] ──► [ Bedrock vision-LLM + rubric ]
│ returns per-criterion verdict + confidence
▼
┌──────── confidence router ────────┐
│ PASS (high) → log to DynamoDB │
│ FAIL (high) → SES reshoot email │──► MS + RM
│ BORDERLINE → SQS review queue │──► human dashboard → click = new label
└────────────────────────────────────┘
│
▼
[ DynamoDB log ] ──► [ Dashboard: daily ops + accuracy ]
[ labeled-data store → future fine-tune ]
It's event-driven and serverless, so it costs almost nothing when idle and scales to a daily spike of 2,000+ without provisioning. The backfill path (AWS Batch) is a one-time job that scores the historical photo set to build the validation/label set. The steady-state path (Lambda) handles each new photo as it arrives. The dashboard is a static front-end reading DynamoDB through a thin API — nothing exotic to run.
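To make the steady-state path concrete, here is a sketch of the per-photo handler with the AWS pieces injected as plain callables, so it runs without any cloud access. It assumes the photo source turns out to be S3 with events delivered through EventBridge (the source is still to be confirmed), and that the event carries the bucket and key under `detail`; `make_handler`, `score`, and the sink names are hypothetical.

```python
def make_handler(score, sinks):
    """Build a Lambda-style handler.

    score(bucket, key) -> (lane, result), where lane is 'pass', 'fail',
        or 'borderline' and result is the per-criterion verdict.
    sinks: dict mapping each lane to a callable(bucket, key, result).
        In production these would wrap the DynamoDB log, the SES email,
        and the SQS review queue; here they are just injected functions.
    """
    def handler(event, context=None):
        # Assumed event shape: S3 "Object Created" delivered via EventBridge.
        detail = event["detail"]
        bucket = detail["bucket"]["name"]
        key = detail["object"]["key"]
        lane, result = score(bucket, key)
        if lane not in sinks:
            raise ValueError(f"no sink configured for lane {lane!r}")
        sinks[lane](bucket, key, result)
        return {"bucket": bucket, "key": key, "lane": lane}

    return handler
```

Injecting the scorer and the sinks keeps the routing logic testable on a laptop and makes it straightforward to point the same handler at a different photo source if S3 is not where the images live.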
Phased so Redline sees a working screener fast, then we tune accuracy and close the loop. AI-assisted dev makes this genuinely quick — the gating item is photo-source access, not code.
Confirm where photos live and how we read them (DAM export, S3 bucket, or pipeline API), confirm the MS→RM mapping source, and lock the rubric with the current reviewer. Pull a sample of accept/reject history. Nothing downstream starts cleanly until image access is real — this is the dependency that actually matters.
Wire the Bedrock vision-LLM to the rubric, return structured per-criterion verdicts + confidence, and run it over a few hundred labeled samples. Produce a first precision/recall read against the validation set. This is the demo that proves the concept.
Tune the three-lane router for our precision target, stand up the SQS-backed human review queue, and build the reviewer screen (Accept / Reject, one click = one label). Verify the borderline volume is small enough to be sane.
Templated SES emails to MS + RM with photo, failed criteria in plain language, and reshoot instructions. Start in a shadow/approval mode — emails are drafted and reviewed before they actually send — until precision is proven, then flip to auto-send.
Daily ops view (photos screened, auto-passed, flagged, in review queue, emails sent) plus an accuracy panel tracking precision/recall against the rolling validation set and the false-reject rate. This is how Redline trusts it and how we catch drift.
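The daily ops numbers are simple counts over the decision log. A sketch, assuming each log record is a dict with a `lane` field (`pass`, `fail`, or `borderline`) and an optional `email_sent` flag; the field names are illustrative, not an agreed schema.

```python
from collections import Counter


def daily_summary(records):
    """Roll one day's decision-log records up into the ops-view counts."""
    records = list(records)
    lanes = Counter(r["lane"] for r in records)
    return {
        "screened": len(records),
        "auto_passed": lanes["pass"],
        "flagged": lanes["fail"],
        # Simplification: counts every borderline photo, reviewed or not.
        "in_review": lanes["borderline"],
        "emails_sent": sum(1 for r in records if r.get("email_sent")),
    }
```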
Once labels have accumulated, train a dedicated classifier and move high-volume checks off the LLM where it's clearly cheaper and more consistent. Pure optimization — only done when the numbers justify it.
Almost all the schedule risk is here, not in the build. The single most important item is read access to the photos.
| What | Why we need it | Form |
|---|---|---|
| Read access to first photos (the blocker) | To screen each photo. Need to confirm the source: photo pipeline / DAM, an S3 bucket, or a vendor API. Everything depends on this. | S3 bucket + IAM role, or API endpoint + credentials |
| New-photo trigger | To screen in near-real-time as photos land rather than batch-polling. | S3 event / webhook / queue — whatever the source supports |
| Photo → VIN / stock / dealer metadata | So the email says which car, at which dealer. | Field in the photo record or a lookup we can join on |
| Photo → MS mapping | To know who shot it / who reshoots it. | From the photo metadata or a roster (possibly Rippling/Predian) |
| MS → RM mapping | To CC the right Regional Manager. | Org roster / spreadsheet / HR export |
| MS & RM email addresses | To send the reshoot notification. | Directory export |
| Historical accept/reject examples | To build the validation set and, later, fine-tune. | Past flagged-photo emails / logs, or a tagged sample |
| The framing rubric, confirmed | So the model scores exactly Redline's current standard. | 30–60 min with the current reviewer + a few annotated examples |
| SES sending domain | So emails come from a verified Redline-branded address and don't land in spam. | Domain verification / DKIM in SES |
The build is low-risk. The two things that can hurt us are model accuracy and getting clean access to the photos.
Build is cheap and fast. The ongoing story is model-inference cost and a little accuracy tuning — not server maintenance.
The honest version: the cost to own this is not engineering hours after launch — there's very little to maintain. The two things that need ongoing attention are: