Wednesday, June 17, 2026

Architecting Intelligent Carrier Selection: From Data Foundations to Real-Time Optimization in Logistics Networks

Most logistics networks I’ve worked with were leaving real money on the table at the moment a shipment was handed off to a carrier. Multi-carrier networks look efficient on paper, with negotiated rates, regional coverage, and a planner who knows the map, but the actual handoff decision is usually made by static rules written years ago, on data that no longer reflects what is happening in the network today. As an engineering leader at Wayfair and previously at Amazon, I’ve spent the last several years building systems that turn that handoff into a learned, data-driven decision rather than a guess. In this column, I want to walk through what an AI-driven carrier selection engine actually looks like inside a real platform, from the events you need to capture, to the model you train on top of them, to the guardrails that keep the whole thing safe under load.

When people ask me what carrier selection really is, I tell them it is a constrained optimization problem dressed up as a routing problem. You have a shipment with a promised delivery window, a service level, and a destination. You have a roster of carriers, each with its own price card, capacity ceiling, lane coverage, and historical reliability profile. The job of the system is to pick one, and to do it tens of thousands of times an hour, across many warehouses, without violating contracts, without overflowing any single carrier, and without quietly degrading customer experience.

Why carrier selection projects fail at the data layer, not the model layer

I have never seen a carrier selection project fail because the model was wrong. I have seen plenty fail because the data underneath the model was wrong, late, or inconsistent across systems. Before anyone writes a line of training code, the data layer has to be honest.

At a minimum, you need a clean event stream and a clean tabular store. The event stream captures every status transition a shipment goes through, from tender to delivery, and each event needs a stable identifier, a source timestamp, and a code that maps to a normalized vocabulary across carriers. That last part is harder than it sounds. Two carriers will tell you “delayed” using fifteen different status strings. You need a normalization layer that translates carrier-specific codes into a single internal taxonomy.
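A minimal sketch of such a normalization layer, assuming a simple lookup table. The carrier names, raw status codes, and the five-value taxonomy are all hypothetical; real mappings come from each carrier's own specification and are usually far larger.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    """Internal taxonomy (illustrative; a real one has many more states)."""
    TENDERED = "tendered"
    IN_TRANSIT = "in_transit"
    DELAYED = "delayed"
    DELIVERED = "delivered"
    UNKNOWN = "unknown"


# Hypothetical carrier codes keyed by (carrier, raw status string).
CARRIER_STATUS_MAP = {
    ("carrier_a", "DLY"): Status.DELAYED,
    ("carrier_a", "WX_HOLD"): Status.DELAYED,
    ("carrier_a", "POD"): Status.DELIVERED,
    ("carrier_b", "Delayed - Weather"): Status.DELAYED,
    ("carrier_b", "OUT FOR DELIVERY"): Status.IN_TRANSIT,
}


@dataclass(frozen=True)
class ShipmentEvent:
    event_id: str        # stable identifier, safe to deduplicate on
    shipment_id: str
    carrier: str
    status: Status       # normalized code
    raw_code: str        # original carrier string, kept for audit
    source_ts: datetime  # timestamp from the carrier, not from ingestion


def normalize(event_id, shipment_id, carrier, raw_code, source_ts):
    """Translate a carrier-specific code into the internal taxonomy."""
    status = CARRIER_STATUS_MAP.get((carrier, raw_code.strip()), Status.UNKNOWN)
    return ShipmentEvent(event_id, shipment_id, carrier, status, raw_code, source_ts)


event = normalize("e1", "s1", "carrier_b", "Delayed - Weather",
                  datetime(2026, 6, 17, 9, 30, tzinfo=timezone.utc))
```

Unmapped codes fall through to `UNKNOWN` rather than raising, so a new carrier string shows up as a countable gap in the mapping instead of a dropped event.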

The tabular store holds the slower-moving truth. Lane definitions and origin-destination pairs. Negotiated rate cards by service level. Historical on-time performance by carrier, lane, and time of year. Capacity commitments and burn-down against them. Warehouse cut-off times. Transit-time distributions, not averages. The 90th percentile, not the mean, is what determines whether you can safely make a promise.
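To make the distribution point concrete, here is a small sketch comparing the mean with the 90th percentile. The transit times are invented for illustration, and the nearest-rank percentile is just one common definition.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest observation such that at least
    pct percent of observations are less than or equal to it."""
    if not values:
        raise ValueError("no observations")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative transit times in days for one carrier on one lane.
transit_days = [2, 2, 2, 3, 2, 2, 3, 2, 6, 7]

mean_days = sum(transit_days) / len(transit_days)  # 3.1
p90_days = percentile(transit_days, 90)            # 6
```

On these numbers the mean suggests a four-day promise is comfortable, while the 90th percentile shows that roughly one shipment in ten would miss it.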

For features, the ones that earned their keep were on-time rate by carrier-lane-week, exception rate by carrier-lane, capacity utilization against committed volume, carrier transit-time distribution conditioned on day-of-week, and a freshness flag that tells you how stale the underlying signal is. A model that doesn’t know its own data is two days old will quietly make confident, wrong decisions for two days.
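A sketch of what a feature with a freshness flag might look like for a single carrier-lane-week. The field names and the 24-hour staleness threshold are assumptions for illustration, not values from a production system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LaneFeatures:
    on_time_rate: float
    sample_size: int
    age_hours: float  # how old the newest underlying observation is
    is_stale: bool    # freshness flag the model or policy can act on


def build_lane_features(deliveries, now, max_age=timedelta(hours=24)):
    """deliveries: list of (delivered_at, was_on_time) for one carrier-lane-week.

    Returns None when there is no signal at all, so callers must handle
    the missing case explicitly instead of reading a default of 0.0.
    """
    if not deliveries:
        return None
    on_time = sum(1 for _, ok in deliveries if ok)
    newest = max(ts for ts, _ in deliveries)
    age = now - newest
    return LaneFeatures(
        on_time_rate=on_time / len(deliveries),
        sample_size=len(deliveries),
        age_hours=age.total_seconds() / 3600,
        is_stale=age > max_age,
    )


now = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
deliveries = [
    (now - timedelta(hours=50), True),
    (now - timedelta(hours=60), False),
    (now - timedelta(hours=70), True),
    (now - timedelta(hours=80), True),
]
features = build_lane_features(deliveries, now)
```

Here the on-time rate is 0.75, but the newest observation is 50 hours old, so `is_stale` is set and a downstream consumer can discount the signal or fall back.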

Estimators, optimizers, and policies: the three layers of the carrier selection model

Once the data layer is honest, the model itself is more straightforward than people expect. I think of it as three layers stacked together.

Figure 1. Three-layer carrier selection model. Estimators feed into the optimizer, which is wrapped by the policy layer before producing a final carrier decision.

 

The first layer is a set of estimators: expected transit time and its uncertainty, probability of an exception or missed window, and effective unit cost after surcharges and accessorials. The workhorses here are gradient-boosted tree models (XGBoost, LightGBM, CatBoost), which are widely used for transit-time and ETA prediction in transportation because they handle tabular feature data well, train fast, and can serve in single-digit milliseconds. Train on the historical event stream, retrain weekly, and evaluate against a held-out window that respects time order. Shuffling rows before the split lets the model train on events that happened after the ones it is tested on, which is time leakage and inflates backtest results.
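The time-ordered evaluation can be sketched without any modeling library. The record fields below are hypothetical; the only point is that the split is made on a date cutoff rather than on a random shuffle.

```python
from datetime import date


def time_ordered_split(rows, cutoff, ts_key="tendered_on"):
    """Split rows so that every holdout row happens on or after the cutoff
    and every training row happens strictly before it."""
    train = [r for r in rows if r[ts_key] < cutoff]
    holdout = [r for r in rows if r[ts_key] >= cutoff]
    return train, holdout


# Illustrative shipments; field names are assumptions.
shipments = [
    {"id": "s1", "tendered_on": date(2026, 5, 4), "transit_days": 2},
    {"id": "s2", "tendered_on": date(2026, 5, 18), "transit_days": 3},
    {"id": "s3", "tendered_on": date(2026, 6, 1), "transit_days": 2},
    {"id": "s4", "tendered_on": date(2026, 6, 8), "transit_days": 5},
]

train, holdout = time_ordered_split(shipments, cutoff=date(2026, 6, 1))

# Guard against leakage: nothing in training may postdate the holdout.
assert max(r["tendered_on"] for r in train) < min(r["tendered_on"] for r in holdout)
```

Features built from history need the same discipline: an on-time rate attached to a training row should be computed only from events available before that row's timestamp.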

The second layer is the optimizer. The estimators feed into an objective function that minimizes expected total landed cost, subject to two kinds of constraints. Hard constraints cover capacity ceilings, geographic coverage, and the customer’s promised window. Soft constraints cover volume concentration, contractual minimums, and carrier share targets. The optimizer can be as simple as a linear program for a single warehouse and as complex as a multi-leg assignment problem when you are routing across the first, middle, and last mile.
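As a rough illustration of the objective, here is a per-shipment greedy selection rather than a linear program. Hard constraints filter candidates out; a soft constraint on volume concentration is added to the cost as a penalty. All names, weights, and numbers are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CarrierQuote:
    carrier: str
    unit_cost: float         # effective cost after surcharges and accessorials
    p90_transit_days: float  # from the transit-time estimator
    p_miss: float            # estimated probability of a missed window
    remaining_capacity: int
    covers_lane: bool
    share: float             # carrier's current share of volume, 0..1


def select_carrier(quotes, promised_days, miss_cost=25.0,
                   share_target=0.5, share_penalty=10.0):
    """Pick the feasible carrier with the lowest expected landed cost."""
    # Hard constraints: coverage, capacity, and the promised window.
    feasible = [
        q for q in quotes
        if q.covers_lane
        and q.remaining_capacity > 0
        and q.p90_transit_days <= promised_days
    ]
    if not feasible:
        return None  # caller falls back to a known-good policy

    def expected_cost(q):
        # Soft constraint: penalize share above the target, do not forbid it.
        over_share = max(0.0, q.share - share_target)
        return q.unit_cost + q.p_miss * miss_cost + over_share * share_penalty

    return min(feasible, key=expected_cost)


quotes = [
    CarrierQuote("cheap", 8.0, 5.0, 0.02, 100, True, 0.2),    # too slow
    CarrierQuote("mid", 9.0, 3.0, 0.10, 10, True, 0.7),       # cost 13.5
    CarrierQuote("premium", 11.0, 2.0, 0.02, 50, True, 0.1),  # cost 11.5
]
choice = select_carrier(quotes, promised_days=3)
```

The cheapest carrier is excluded by the promised window, and the mid-priced one loses once its miss probability and share overage are priced in. A greedy choice per shipment ignores interactions across shipments, which is why a batch formulation such as a linear program or assignment problem is the more faithful tool at scale.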

The third layer is the policy layer, and the one most teams underinvest in. The policy layer wraps the optimizer with the things a business actually cares about, including blacklists for carriers that are in dispute, manual overrides during incidents, fairness rules across carriers in a tier, holiday-week behavior, and a kill switch. Skip it, and every business request becomes a model retrain.
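One way to picture the policy layer is as a thin wrapper that runs before and after the optimizer. The sketch below is an assumption about shape, not a description of any specific system; the `Policy` fields and reason strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    kill_switch: bool = False
    blocked: set = field(default_factory=set)       # carriers in dispute
    overrides: dict = field(default_factory=dict)   # lane -> carrier, set during incidents
    fallback: dict = field(default_factory=dict)    # lane -> carrier under known-good rules


def decide(policy, lane, candidates, optimizer):
    """Return (carrier, reason). optimizer is any callable taking a list of
    allowed carrier names and returning one of them, or None."""
    if policy.kill_switch:
        return policy.fallback.get(lane), "kill_switch"
    if lane in policy.overrides:
        return policy.overrides[lane], "manual_override"
    allowed = [c for c in candidates if c not in policy.blocked]
    choice = optimizer(allowed) if allowed else None
    if choice is None:
        return policy.fallback.get(lane), "fallback"
    return choice, "optimizer"


policy = Policy(blocked={"carrier_b"}, fallback={"ORD-DFW": "carrier_c"})
pick_first = lambda carriers: carriers[0] if carriers else None  # stand-in optimizer

decision = decide(policy, "ORD-DFW", ["carrier_b", "carrier_a"], pick_first)
```

Returning the reason alongside the carrier makes it possible to measure how often the business rules, rather than the model, made the call. Changing a blocklist or flipping the kill switch is then a data change, not a retrain.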

Training offline, deciding online: the architecture behind real-time carrier selection

The system architecture I’ve found most workable is a clean split between offline and online, with a deliberate bridge. 

Figure 2. Online and offline architecture, with the bridge. The offline pipeline produces a versioned model bundle weekly. A feature store contract and shadow/replay tooling form the bridge that prevents training-serving skew before the model reaches the request path.

Offline is where you train, backtest, and produce the versioned bundle: a set of model artifacts, a feature store snapshot, and a config, all of which the online system pulls atomically. Online is where the actual selection happens, on the request path, fast and deterministic. We built ours on Java and Kotlin services backed by Postgres, Kafka for the event bus, and Python for the model-serving tier.
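A minimal in-process sketch of what "pulls atomically" can mean: the serving tier holds one reference to an immutable bundle and swaps it in a single step, so no request ever sees new model artifacts paired with an old config. The class and field names are hypothetical.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ModelBundle:
    version: str
    model_artifacts: tuple       # e.g. paths or handles to trained estimators
    feature_snapshot_id: str
    config: MappingProxyType     # read-only view of the config


class BundleHolder:
    """Holds the active bundle; readers always get one consistent bundle."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = initial

    def current(self):
        return self._current

    def swap(self, candidate):
        """Validate first, then replace the reference in one step.
        Returns the previous bundle so it can be kept for rollback."""
        if not candidate.model_artifacts or not candidate.feature_snapshot_id:
            raise ValueError("incomplete bundle, refusing to activate")
        with self._lock:
            previous, self._current = self._current, candidate
        return previous


v1 = ModelBundle("2026-06-08", ("transit.bin",), "snap-1",
                 MappingProxyType({"miss_cost": 25.0}))
v2 = ModelBundle("2026-06-15", ("transit.bin", "exception.bin"), "snap-2",
                 MappingProxyType({"miss_cost": 27.5}))

holder = BundleHolder(v1)
previous = holder.swap(v2)
```

A request should call `current()` once and use that bundle for its whole lifetime, so a swap in the middle of scoring cannot mix versions.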

The bridge is the part most teams underbuild. You need a feature store or an equivalent contract that guarantees the features the model sees in production are computed the same way as the features it was trained on. The failure mode this prevents has a name in the literature, training-serving skew, and it is the single most common cause of a model that looks great in offline backtests and quietly degrades the moment it touches live traffic. Open-source feature stores like Feast and managed platforms like Tecton exist for exactly this reason. You need shadow mode, where the new model scores every decision in parallel with the current production policy without actually changing behavior. And you need a replay tool to diagnose divergence before it reaches the customer. None of it is glamorous, and all of it is the difference between a model that survives its first peak week and one that doesn’t.
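Shadow mode reduces to a simple rule: score with both, act on one, log the difference. The sketch below assumes both the production policy and the candidate model are plain callables and that the divergence log is a list; in a real system the log would more likely be an event stream.

```python
def decide_with_shadow(shipment, production_policy, candidate_model, divergence_log):
    """Return the production decision; record where the candidate disagrees."""
    prod_choice = production_policy(shipment)
    try:
        shadow_choice = candidate_model(shipment)
    except Exception as exc:
        # A shadow failure must never affect live traffic.
        divergence_log.append({
            "shipment_id": shipment["id"],
            "production": prod_choice,
            "shadow": None,
            "error": repr(exc),
        })
        return prod_choice

    if shadow_choice != prod_choice:
        divergence_log.append({
            "shipment_id": shipment["id"],
            "production": prod_choice,
            "shadow": shadow_choice,
            "error": None,
        })
    return prod_choice


log = []
result = decide_with_shadow(
    {"id": "s1"},
    production_policy=lambda s: "carrier_a",
    candidate_model=lambda s: "carrier_b",
    divergence_log=log,
)
```

The logged records are what a replay tool would consume: the same shipment can be re-scored later against stored features to see why the two decisions differed.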

From shadow mode to full production: how to launch without breaking logistics

A carrier selection system is a high-leverage decision system. It is also one of the easiest places to do real damage if you ship a bad model into a live network. The guardrails matter at least as much as the model.

Three guardrails earned their place every time I’ve built one of these. First, hard caps per carrier, enforced in the policy layer regardless of what the optimizer wants. Second, fairness floors, so that strategic partners don’t lose all their share to a marginal short-term cost win. Third, an explicit out-of-distribution check at scoring time, so that inputs unlike anything in the training data are flagged rather than scored with false confidence.
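The third guardrail can take many forms. One of the simplest, sketched here as an assumption rather than as the method used in any particular system, is a range check of each numeric feature against the bounds observed in training, with a tolerance margin. Feature names and ranges are invented.

```python
def out_of_distribution(features, training_ranges, tolerance=0.1):
    """Return the names of features that are missing from the training
    ranges or fall outside them by more than the tolerance margin."""
    flagged = []
    for name, value in features.items():
        if name not in training_ranges:
            flagged.append(name)  # never seen in training
            continue
        low, high = training_ranges[name]
        margin = (high - low) * tolerance
        if value < low - margin or value > high + margin:
            flagged.append(name)
    return flagged


# Illustrative bounds, as if recorded when the model was trained.
training_ranges = {"weight_kg": (0.1, 70.0), "distance_km": (5.0, 4000.0)}

flags = out_of_distribution(
    {"weight_kg": 12.0, "distance_km": 6500.0},
    training_ranges,
)
```

When the list is non-empty, the decision can be routed to the fallback policy instead of trusting the model. Range checks miss unusual combinations of individually normal values, so they are a floor, not a complete answer.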

 

Figure 3. Rollout pattern, shadow to full production. New models advance through shadow mode and progressive canary stages, with automatic rollback on any metric breach.

For rollout, the pattern is shadow mode first, then a small canary by region or warehouse, then a controlled A/B by shipment, then a stepped ramp to full production. At each stage, you measure the same handful of business metrics, namely total landed cost per shipment, on-time rate, first-attempt success rate, exception rate, and carrier mix versus contractual commitments. If any of those move in the wrong direction beyond a defined threshold, the rollback is automatic.
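The automatic rollback rule can be sketched as a comparison of candidate metrics against a baseline, with a direction and a tolerance per metric. The thresholds and metric values below are hypothetical, and carrier mix versus contractual commitments is left out because it needs per-carrier targets rather than a single number.

```python
# metric -> (which direction is good, allowed relative degradation)
THRESHOLDS = {
    "landed_cost_per_shipment": ("lower_is_better", 0.01),
    "on_time_rate": ("higher_is_better", 0.005),
    "first_attempt_success_rate": ("higher_is_better", 0.005),
    "exception_rate": ("lower_is_better", 0.02),
}


def breached_metrics(baseline, candidate, thresholds=THRESHOLDS):
    """Return metrics that moved in the wrong direction beyond tolerance."""
    breaches = []
    for metric, (direction, tolerance) in thresholds.items():
        base, cand = baseline[metric], candidate[metric]
        change = (cand - base) / base
        # Normalize so that a positive value always means "got worse".
        worse_by = change if direction == "lower_is_better" else -change
        if worse_by > tolerance:
            breaches.append(metric)
    return breaches


def should_roll_back(baseline, candidate, thresholds=THRESHOLDS):
    return bool(breached_metrics(baseline, candidate, thresholds))


baseline = {"landed_cost_per_shipment": 10.00, "on_time_rate": 0.95,
            "first_attempt_success_rate": 0.97, "exception_rate": 0.04}
candidate = {"landed_cost_per_shipment": 9.60, "on_time_rate": 0.94,
             "first_attempt_success_rate": 0.97, "exception_rate": 0.04}

breaches = breached_metrics(baseline, candidate)
```

In this example the candidate is four percent cheaper but on-time rate drops by about one percent, which exceeds its tolerance, so the stage rolls back despite the cost win.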

The numbers behind AI-driven carrier selection

When we rolled this out across a continental delivery network spanning nine engineering teams and more than fifty warehouses, the production result was a seven percent annual reduction in logistics cost. The reduction held quarter over quarter, because the system kept learning as the network changed and the policy layer caught edge cases before they reached the customer.

The seven percent number is not magic. It is what happens when a static rules engine is replaced by something that understands lane-level reliability, capacity pressure, and the cost of a missed window, and is allowed to act on that understanding within explicit guardrails. The published industry context suggests that seven percent is on the conservative end of what is possible, not the upper bound.

Three lessons for engineering leaders before building a carrier selection engine

The data layer is the project. If the data layer isn’t honest, the model layer will look impressive in a backtest and quietly destroy value in production. Spend two-thirds of your first quarter on the data, and don’t apologize for it.

The policy layer is engineering, not configuration. Caps, fairness, kill switches, manual overrides, and fallback to a known-good policy are all things you have to design with the same rigor as the model itself.

Never run a new model in full production without shadow mode first. The single most reliable predictor of a successful rollout I’ve seen is a team that ran their candidate in shadow for at least two weeks, fixed three things they didn’t expect, and only then asked for the canary.

Carrier selection is one of the highest-leverage problems in last-mile logistics, and one of the most underbuilt. Clean data, an honest model, and a real policy layer will not make a good conference talk. They will quietly save several million dollars a year.
