Merchandising Specialist Route Optimization

The Problem Today

100+ routes, balanced by hand and gut, in a CRM that doesn't optimize.

Redline runs 100+ Merchandising Specialist (MS) routes. Each MS drives from their home to a set of dealer lots, photographs inventory, and the lots they cover are their territory. Two things constantly change underneath that assignment: demand (how many cars are missing photos at each store, which moves daily) and supply (who's available, how many hours they're cleared to work, where they live). When those drift apart, a route quietly goes wrong — an MS is underutilized and idle, or slammed and falling behind, or burning two hours a day in windshield time getting to a far corner of their patch.

Today this balancing is done by Regional Managers using judgment and a field-service CRM (Mobiwork, ~$60k/yr) that does route planning and check-in status — but does not optimize. Nobody can sit down with 100+ routes, a live traffic model, every rep's home address, every store's missing-photo count, and each rep's billing-vs-capacity number and actually compute the better assignment. It's not a failure of effort; it's a combinatorial problem that humans cannot solve well by hand. So routes are reshuffled reactively, usually only when something is visibly broken.

100+

MS routes to balance continuously

Daily

demand (cars missing photos) changes

$20k

weekly billing goal the utilization target ties to

Reactive

routes fixed only when visibly broken

The cost shows up in three places: idle capacity (a rep cleared for 40 hours billing like 30 because their territory dried up), missed coverage (cars sitting without a lead photo longer than they should, which directly slows the sale), and wasted drive time (paying for windshield hours instead of photographed cars). None of those are visible on a dashboard today, so none of them get managed until they're bad.

Why this is a Big Build, stated plainly This is a capacitated vehicle-routing problem with time windows, layered on top of a territory-assignment/balancing problem — a genuine operations-research problem that companies build whole teams around. The "AI" here is an optimizer plus a demand forecast; a language model is, at most, a thin layer that explains a proposed change in English. Anyone who tells you an LLM "does the routing" is selling you something. We will not trivialize the solver, because the solver is the project.

How It Works

Forecast demand → build a live travel-time matrix → solve a capacitated VRP under utilization constraints → explain the diff → a human approves it.

What we're actually optimizing

Strip away the buzzwords and the goal is concrete: maximize the cars-missing-photos we service per paid drive-hour, while keeping every rep close to their target utilization and minimizing time spent driving from each rep's own home. Those three pulls fight each other, which is exactly why it needs a solver and not a spreadsheet:

Coverage wants to send reps wherever the missing-photo demand is highest.
Utilization wants every rep at ~their billing target against their cleared capacity — not idle, not buried.
Windshield time wants every rep working close to home, because drive time is paid, unbillable, and the thing reps hate most.

The two-layer formulation

We solve this as two coupled problems, because they operate on different clocks. Treating them as one giant model is how these projects die.

Layer 1 — Territory assignment / balancing (slow clock, weekly-ish). Which stores belong to which MS? This is the assignment that affects a rep's commute and income, so it changes rarely and only with human sign-off. Formally it's a capacitated assignment with geographic-compactness and workload-balance objectives: assign each store to a rep so each rep's total expected workload fits their capacity, territories stay compact (close to home, not crisscrossing), and utilization is even across the team.
Layer 2 — Daily routing within a territory (fast clock, daily). Given a rep's stores and today's demand, what's the route — order of stops, respecting each store's open hours and the rep's shift length? This is a classic capacitated VRP with time windows (CVRPTW), one vehicle (the rep) per territory, capacity = the rep's available hours, demand = estimated service time per store driven by cars-missing-photos there.

The model, written out honestly

This is the part we refuse to hand-wave. The optimizer needs a precise objective and precise constraints, or it produces confident nonsense:

DECISION VARIABLES
  x[s,r]            store s assigned to rep r        (territory layer)
  y[i,j,r]          rep r drives i -> j               (routing layer)
  t[s,r]            arrival time of rep r at store s

MAXIMIZE
    Σ  serviced_demand(s)              // cars-missing-photos covered
  − w_drive · Σ travel_minutes         // paid windshield time
  − w_util  · Σ |utilization(r) − target|   // distance from target util
  − w_home  · Σ home_to_first/last leg  // commute fairness

SUBJECT TO
  Σ_s service_time(s)·x[s,r] ≤ capacity(r)    // rep can't exceed cleared hrs
  store open-hours window:  open(s) ≤ t[s,r] ≤ close(s)
  each store served at most once per cycle
  route continuity / no subtours          (the classic VRP constraint)
  utilization(r) within [floor, target]   // the drift trigger lives here
  territory change only if Δ objective > threshold   // stability guard

WHERE
  service_time(s) ≈ base_visit + k · cars_missing_photos(s)   // demand → minutes
  travel_minutes from a LIVE traffic-aware distance matrix
  capacity(r)     from Rippling hours; utilization from Predian billing

The weights w_drive, w_util, w_home are not cosmetic — they encode Redline's actual priorities, and tuning them against real routes is a meaningful chunk of the work. A model that's mathematically optimal but ignores that reps have lives is a model nobody will let run.

The solver — what actually computes this

The engine is Google OR-Tools (its CP-SAT and dedicated routing library), the same toolkit used for real-world fleet routing. VRP with time windows is NP-hard; you do not solve 100+ routes to provable optimality on a deadline. OR-Tools gives us a strong metaheuristic approach — build a feasible solution with a cheap heuristic, then improve it with guided local search / large-neighborhood search under a time budget — which lands very good solutions in seconds-to-minutes. We decompose by region so each solve is a tractable size, then balance across region boundaries in the territory layer. This is standard, proven practice; the skill is in the modeling and tuning, not in inventing an algorithm.

Where demand forecasting comes in

"Cars missing photos" is the demand signal, and routing on today's snapshot alone is twitchy — it would whipsaw territories every time a big trade-in batch lands at one store. So we forecast near-term demand per store (a simple, explainable time-series / moving-rate model to start, not a black box) and route against the expected load, not the noisy instantaneous number. This is the legitimate "AI/ML" in the project, alongside the optimizer.

Where the LLM fits — and where it doesn't

The language model does not do the routing. Its only job is the human-facing layer: turn a proposed change into a plain-English explanation a Regional Manager can read and approve — "Move 6 stores from Dana to Marcus. Dana is at 71% utilization and these stores add 38 min of drive time for her; they're 12 min from Marcus's home and bring him from 84% to target. Net: −2.1 paid drive-hours/week, both reps within target." That's a genuinely useful translation of an optimization diff into a decision a person can own. It is a thin, supervised layer on top of a hard numerical core — not the core itself.

The honest mental model Optimizer = the brain (OR-Tools, does the math). Forecast = the eyes (predicts demand). LLM = the mouth (explains the proposal). Human RM = the hands (approves and owns the change). We are explicit about this division because conflating them is the most common way these projects over-promise.

Architecture & Stack

A nightly/continuous evaluation pipeline that pulls demand + HR + billing, builds a traffic-aware matrix, runs the solver, and surfaces proposals to a dashboard for approval. Most of the engineering is integration and the solver model, not infrastructure.

Google OR-Tools (CP-SAT + Routing) Google Maps / Mapbox Matrix API (traffic) Predian (billing → utilization) Rippling (home addresses + hours/capacity) Inventory / photo system (cars-missing-photos) Geocoding (store addresses → lat/lng) AWS Fargate / Batch (solver runs) AWS Lambda + EventBridge (orchestration) Amazon RDS / Postgres + PostGIS (geo data) Amazon Bedrock (LLM explanation layer) Web dashboard (S3 + CloudFront) CloudWatch (monitoring)

Data flow

[ Predian ]  billing per rep ─────┐
[ Rippling ] home addr + hours ───┤
[ Photo/inventory feed ]──────────┤  scheduled pulls (EventBridge)
   cars-missing-photos per store   │
                                   ▼
                        [ Lambda: ingest + normalize ]
                                   │   stores → geocode (cached, PostGIS)
                                   ▼
                        [ Demand forecast ]  expected load per store
                                   │
                                   ▼
                  [ Build distance/time matrix ]
                  Maps/Mapbox Matrix API, traffic-aware
                  (cached + batched — this is the $$ line)
                                   │
                                   ▼
        [ Fargate/Batch: OR-Tools solver ]
        ├─ Layer 1: territory assignment / balance
        └─ Layer 2: per-territory CVRPTW route
                                   │  current vs proposed diff
                                   ▼
        [ utilization check ]  any rep below floor or above target?
                                   │  if drift → raise a PROPOSAL
                                   ▼
        [ Bedrock: explain the diff in English ]
                                   │
                                   ▼
        [ Dashboard ]  route-health + proposed changes
                       RM reviews → APPROVE / REJECT
                       approved change → write back / export to CRM

Why this shape

The solver runs are bursty and CPU-heavy, so they live on Fargate/Batch (spin up, solve, spin down) rather than a standing server. Everything else — ingest, orchestration, the dashboard — is light and serverless. PostGIS holds geocodes and territory geometry so the geographic-compactness math is cheap and we're not re-geocoding the same addresses daily. The single most important architectural decision is caching the travel-time matrix aggressively: that API is the recurring cost, and naive re-querying every store-pair every run would be both slow and expensive.

Build Plan

Strictly phased, and the phasing is the risk management. We earn trust read-only first, then suggest, then optimize. Reps' incomes and commutes are on the line — we do not flip on an autonomous route-shuffler on day one.

Phase 0 — Discovery, data access & the matrix question

~1–2 weeks · the real unlock

Confirm and wire read access to Predian (billing), Rippling (home addresses + hours), and the cars-missing-photos feed. Geocode the store list. Define exactly how billing maps to a utilization number and how hours map to capacity — this definition is a business decision, not just a data pull. Run a costed test of the Maps/Mapbox Matrix API at our store-pair volume so we know the run cost before we commit. Nothing downstream is real until these feeds and that cost are confirmed.

Phase 1 — Route-health analytics (READ-ONLY)

~2–3 weeks · "we can finally see it"

No optimization yet. Build the utilization dashboard: every rep's billing-vs-capacity, current territory on a map, drive-time estimate, and cars-missing-photos backlog per store. This alone is valuable — for the first time Redline sees which routes are off-target. It also validates the data quality before we let a solver act on it. Trust starts here.

Phase 2 — The solver, run in shadow

~3–5 weeks · "the math works on real data"

Stand up the OR-Tools model — territory layer + per-territory CVRPTW — and the demand forecast. Run it against live data but show proposals only to us, comparing solver output to what RMs actually did. Tune the objective weights (drive vs utilization vs commute) against reality. This is the longest, hardest phase and where the genuine engineering lives. We expect to iterate on the model here, not ship it the first week.

Phase 3 — Suggest changes to humans

~2–3 weeks · "RMs get explainable proposals"

Surface drift-triggered proposals to Regional Managers with the LLM-written explanation and a clear before/after (utilization, drive time, coverage). RM clicks Approve or Reject; nothing changes without approval. We track which proposals get accepted — that acceptance rate is the honest measure of whether the model is actually good.

Phase 4 — Continuous evaluation + write-back

~2–3 weeks · "it runs on a cadence"

Move to continuous evaluation: the system watches utilization, raises a proposal when a route drifts below floor or above target, and — once approved — exports the new route/territory back to the field-service CRM or to the reps. Add drift alerts so RMs are nudged, not surprised. Approval stays mandatory for territory changes indefinitely.

Phase 5 — Tuning & (optional) CRM displacement

ongoing · cost & consolidation win

Ongoing weight-tuning and forecast refinement. If the routing engine plus the route-health dashboard prove out, this becomes a credible path toward retiring part of the ~$60k/yr Mobiwork spend — but only after it has clearly earned that trust in production. We don't promise that on day one.

Honest estimate This is a multi-week-to-multi-month project even with AI-assisted dev, and the timeline is dominated by three things that AI coding does not shortcut: solver modeling and tuning, data integration across four systems, and change management with reps whose incomes and commutes move. Realistically: a useful read-only dashboard (Phase 1) inside the first month; a trustworthy proposal engine (Phase 3) is a multi-month effort. Anyone quoting "a couple weeks" for the whole thing has not modeled the solver.

Data & Access Needed

Four live data sources have to line up, plus a paid traffic/distance API. The schedule risk is concentrated here and in agreeing the utilization definition — not in the code.

What	Why we need it	Source / form
Cars-missing-photos per store (the demand signal)	Drives service-time estimates and where coverage is needed. The core input to the whole model.	Photo/inventory system — API or scheduled export, per-store counts
Billing per rep (Predian)	Numerator of utilization. Tells us which routes are under- or over-target.	Predian API or export, per-rep / per-period
MS home addresses (Rippling)	Routes start and end at home; commute fairness and territory compactness depend on it.	Rippling API / HR export → geocoded
Hours worked + cleared capacity (Rippling)	Denominator of utilization and the hard capacity constraint per rep.	Rippling API / HR export
Store addresses + geocodes	Every node in the routing graph. Cached so we geocode once, not daily.	Store roster → geocoding API → PostGIS
Store open-hours / access windows	Time-window constraints in the VRP — can't photograph a closed lot.	Store roster or sensible defaults to confirm
Traffic-aware distance/time matrix	Travel minutes between every relevant store-pair, with traffic. This is the paid dependency and the main run cost.	Google Maps or Mapbox Matrix API + key + billing account
The utilization definition	Exactly how billing→utilization and hours→capacity are computed, and the target/floor. A business decision we must agree before the solver means anything.	Working session with Redline ops leadership
MS → RM mapping + approval roles	To route each proposal to the manager who owns that rep's territory.	Org roster / HR export
CRM write path (later)	To push approved routes back to the field tool / reps.	Mobiwork API or export, TBC in Phase 4

Risks & Open Questions

The build risk is real but it's the people risk that can sink this. Moving someone's territory moves their income and their commute.

RISK · territory changes affect real incomes and commutes A "better" route on paper can mean a rep loses billable stores or gains a longer drive. If the system shuffles territories without human ownership, you will damage morale and trust faster than any efficiency gain repays. Mitigation: every territory change is RM-approved, never automatic; the objective explicitly penalizes commute increases and we cap how much any one rep's territory can change in a cycle; proposals always show the human impact, not just the math.

RISK · the solver is the project, and it is hard VRP with time windows plus territory balancing is NP-hard and genuinely difficult to model well. A naive solver produces routes that look optimal and are operationally absurd (ignoring lunch, real road networks, rep preferences). Mitigation: use proven OR-Tools metaheuristics under a time budget, decompose by region, validate every output in shadow mode against what RMs actually do before anyone acts on it, and tune weights against reality — not in the abstract.

RISK · garbage in (data quality across four systems) The model is only as good as the demand feed, the billing-to-utilization mapping, and the geocodes. Stale missing-photo counts or a wrong home address silently corrupt the optimization. Mitigation: the read-only Phase 1 dashboard exists partly to surface data quality before the solver acts on it; add validation and freshness checks on every feed.

RISK · distance-matrix API cost & rate limits Querying traffic-aware travel times across 100+ routes' worth of store-pairs can get expensive and rate-limited if done naively. Mitigation: aggressive caching, batch only the pairs that can plausibly be on a route, refresh traffic on a sensible cadence rather than every run, and cost-test in Phase 0.

RISK · over-optimization / churn Re-routing on daily noise would whipsaw territories and exhaust everyone. Mitigation: forecast demand and route against expected load; require a meaningful improvement threshold before proposing a territory change; keep the slow-clock / fast-clock split so daily routing tweaks don't trigger territory upheaval.

OPEN QUESTIONS What exactly is the utilization target and floor that triggers a change? · How fresh is the cars-missing-photos feed, and is it per-store? · How is service time per car estimated — is there historical timing data? · Do reps have fixed days/stores we must respect as hard constraints? · How sensitive is changing a rep's territory — is there a contractual or commission structure that constrains it? · Maps vs Mapbox for the matrix, and what's the acceptable monthly API spend? · Is the goal eventually to displace Mobiwork, or to run alongside it?

Cost to Own

Unlike the quick wins, the durable cost here is real: a paid distance-matrix API, periodic solver tuning, and ongoing change management. We say so plainly.

Run cost — the distance-matrix API is the line item The recurring infrastructure (Fargate solver bursts, Lambda, RDS, dashboard, the occasional Bedrock explanation call) is modest and predictable. The cost that actually matters is the Google Maps / Mapbox Matrix API — traffic-aware travel times across 100+ routes' worth of store-pairs. With aggressive caching and batching this is controllable, but it is a genuine, ongoing, usage-based bill that scales with how often you re-route. This must be budgeted, not waved away.

The honest version: the cost to own this is not "keep the lights on" — it's three things, and all three are ongoing:

Solver maintenance & tuning. As Redline's business changes — new stores, new reps, shifting priorities — the objective weights and constraints need revisiting. This is real, periodic engineering attention (think occasional focused work, not a full-time role, once it's stable), and it's the work that keeps the proposals trustworthy.
Data-integration upkeep. Four upstream systems (Predian, Rippling, the photo feed, the CRM) can each change their API or export format. Each connector is a thing to monitor and occasionally fix. More moving parts than a single-source project, by design.
Organizational change management. This is the cost people forget. Reps and RMs have to trust the proposals and adopt the workflow. That's training, communication, and tuning based on feedback — and it's arguably the largest cost of the whole effort, paid in attention rather than dollars. The phased, approval-gated, read-only-first rollout exists precisely to make this cost survivable.

Bottom line — scope this with eyes open The payoff is large: higher rep utilization against the $20k weekly billing goal, less paid windshield time, faster photo coverage where demand is highest, and a credible long-term path to reducing the ~$60k/yr CRM spend. But this is a BIG BUILD. The hard, expensive part is the solver, the four-way data integration, and the change management — not writing code, which AI accelerates. Treat it as a multi-month investment that earns trust in stages, and it's well worth doing. Treat it as a quick automation, and it will disappoint.