AI · Part 1 of 3

How humans and AI actually complete a task together

By Frédéric Husser

If you have run an AI pilot in an operations team in the past two years, you have probably encountered at least one of three patterns. The first is overflow: the AI produces so many proposals that the team reviewing them falls behind within a week, and the notifications quietly start getting ignored. The second is stall: the AI stops at the first decision point and waits for a human to review, and the process gets stuck because the human has other things to do and cannot always be available at the moment the AI asks. The third is miss: a condition was supposed to trigger AI action and nothing happened, because no one was watching the trigger and the AI had no way to fire on its own.

Each of these is usually described as a UX problem. The notifications are too aggressive, the approval flow is too demanding, the trigger is poorly configured. In our experience, these are not UX problems. They are symptoms of a mismatch between how the AI is actually being asked to interact with the team and how the platform underneath supports that interaction. The tool treats every AI engagement as the same thing, and the team ends up bending the work around that uniformity until something fails in a way that looks individual but is structural.


Three modes, distinguished by context

When we look carefully at how one human and one AI combine on one task, three distinct interaction modes show up. They are distinguished not by who has the final decision authority, which most frameworks emphasise, but by how context flows between the AI and the human at that step. Context, not authority, is the coupling mechanism. Getting this distinction right is the difference between an AI deployment that feels natural and one that feels invasive.

In supervision, the AI holds the operational context and the human samples it. The AI observes state continuously, seeing what is running, what is committed, what thresholds are approaching, what deviations are surfacing. The human checks in when attention allows, reviews what the AI has noticed, and decides whether anything warrants action. The asymmetry here is about observation bandwidth. The AI can watch everything without fatigue. The human cannot, but can bring judgment when something genuinely matters.

Supervision is the mode most operations teams want from AI, and it is the mode most tools fail to deliver. The failure mode is overflow: the AI's observations become notifications, the notifications accumulate, and the human's sampling behaviour degrades into mass-dismissal. The cause is that the AI is not actually supervising. It is forwarding. True supervision requires the AI to prioritise, cluster, and surface only what the operational context says matters. That in turn requires context structured enough for the AI to know what "matters" means in this setting.
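
To make the contrast with forwarding concrete, here is a minimal sketch, in TypeScript, of supervision as filtering: raw observations are clustered, scored against a structured definition of what matters, and only the residue is surfaced, most urgent first. Every name here (Observation, OperationalContext, supervise) is hypothetical; the point is that prioritisation comes from structure the AI can read, not from the notification layer.

```typescript
// Hypothetical sketch: supervision as filtering against structured context,
// not forwarding of raw observations.

type Severity = "info" | "warning" | "critical";

interface Observation {
  entityId: string;   // e.g. a barge, a shipment, a commitment
  metric: string;     // e.g. "delay_minutes"
  value: number;
  observedAt: Date;
}

interface OperationalContext {
  // What "matters" in this setting, expressed as structure rather than prompt text.
  thresholds: Record<string, { warning: number; critical: number }>;
  watchedEntities: Set<string>;
}

interface SurfacedItem {
  entityId: string;
  metric: string;
  severity: Severity;
  latestValue: number;
  clusteredCount: number; // how many raw observations this one item summarises
}

function supervise(observations: Observation[], ctx: OperationalContext): SurfacedItem[] {
  // Cluster by entity and metric so the human sees one item, not one
  // notification per data point.
  const clusters = new Map<string, Observation[]>();
  for (const obs of observations) {
    const key = `${obs.entityId}:${obs.metric}`;
    const group = clusters.get(key);
    if (group) group.push(obs);
    else clusters.set(key, [obs]);
  }

  const surfaced: SurfacedItem[] = [];
  for (const group of clusters.values()) {
    const latest = group.reduce((a, b) => (a.observedAt > b.observedAt ? a : b));
    const limits = ctx.thresholds[latest.metric];
    if (!limits) continue; // no structural definition of "matters" here: do not forward

    let severity: Severity = "info";
    if (latest.value >= limits.critical) severity = "critical";
    else if (latest.value >= limits.warning) severity = "warning";

    // Surface only deviations, plus anything on an explicitly watched entity.
    if (severity !== "info" || ctx.watchedEntities.has(latest.entityId)) {
      surfaced.push({
        entityId: latest.entityId,
        metric: latest.metric,
        severity,
        latestValue: latest.value,
        clusteredCount: group.length,
      });
    }
  }

  // Most important first, so the human's sampling starts where judgment is needed.
  const rank: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };
  return surfaced.sort((a, b) => rank[a.severity] - rank[b.severity]);
}
```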

In pair execution, the AI and the human share context turn by turn. The AI proposes an action within a step, the human reviews and accepts or modifies, the context updates, the AI proposes the next thing. At each turn, the AI's working context is shaped by the human's input, and the human's attention is anchored by the AI's proposal. The asymmetry is about judgment density. The AI reasons fast but imperfectly; the human reasons slowly but with context and accountability the AI lacks.

Pair execution is what most AI products actually offer, and it is appropriate for work that is genuinely paired. Writing a proposal, reviewing a plan, preparing a decision. It fails when it is applied to work that does not need per-step human judgment. The failure mode is stall: the AI stops at every boundary, the human is needed to keep the process moving, and the human's day gets in the way of the work. What the team wanted as AI-assisted work becomes AI-constrained work.
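
The turn-by-turn coupling can be sketched as a loop. The types and function names below are invented for illustration; what matters is the shape of the exchange, and the fact that nothing advances until the review comes back, which is exactly where stall originates when the reviewer is busy.

```typescript
// Hypothetical sketch of the pair-execution loop: propose, review, update, repeat.

interface Proposal { summary: string; draft: string; }
interface Review { accepted: boolean; revisedDraft?: string; note?: string; }
interface StepContext { goal: string; turns: { proposal: Proposal; review: Review }[]; }

// Stand-ins for the two sides of the pairing: in a real system, `propose`
// would call a model and `review` would wait on a person.
type ProposeFn = (ctx: StepContext) => Proposal;
type ReviewFn = (proposal: Proposal) => Review;

function runPairedStep(goal: string, propose: ProposeFn, review: ReviewFn, maxTurns = 5): StepContext {
  const ctx: StepContext = { goal, turns: [] };
  for (let turn = 0; turn < maxTurns; turn++) {
    const proposal = propose(ctx);     // AI turn: shaped by everything reviewed so far
    const verdict = review(proposal);  // human turn: accept, modify, or push back
    ctx.turns.push({ proposal, review: verdict });
    if (verdict.accepted && !verdict.revisedDraft) break; // accepted as-is: the step is done
    // Otherwise the next proposal starts from the updated shared context.
  }
  return ctx;
}
```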

In delegation, the AI acts within a bounded context projected by the dispatch layer, on a step whose preconditions, constraints, and successor are all defined by the process specification. Delegation is not autonomy. The human is still accountable; the human is simply not cognitively present at the moment of execution. The asymmetry is about scope, not authority. The AI operates inside a well-defined operational envelope: what it can see, what it can propose, what it can commit, what it must return. That envelope is projected from the operational graph, not constructed by the AI.

When the envelope is well-defined, delegation is safe. When it is vague, delegation is dangerous, and teams are right to be nervous. The risk in delegation is never the AI's reasoning capability; it is the quality of the envelope the AI is operating in.
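
As a rough sketch, assuming a graph-backed process specification, the envelope can be written down as a typed contract. The names below are illustrative, not a reference to any particular system; the detail worth noticing is the fallback, where a result that does not fit the envelope is escalated rather than committed.

```typescript
// Hypothetical sketch of a delegation envelope: the four boundaries named above,
// made explicit as a contract the dispatch layer can project.

interface DelegationEnvelope<Ctx, Action, Result> {
  visibleContext: Ctx;                           // what the AI can see
  allowedActions: Action[];                      // what it can propose
  commitPolicy: { autoCommit: boolean; maxValue?: number }; // what it can commit unattended
  validateResult: (result: Result) => boolean;   // what it must return to the successor step
}

function runDelegated<Ctx, Action, Result>(
  envelope: DelegationEnvelope<Ctx, Action, Result>,
  act: (ctx: Ctx, allowed: Action[]) => Result
): { committed: boolean; result: Result } {
  const result = act(envelope.visibleContext, envelope.allowedActions);
  // A result that does not satisfy the envelope is not committed; it drops back
  // to pair execution for a human to look at, instead of becoming silent risk.
  const committed = envelope.commitPolicy.autoCommit && envelope.validateResult(result);
  return { committed, result };
}
```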


The collapse

Here is where most AI deployments in operations go wrong. Most tools support only pair execution natively. They can embed a chat interface in a workflow step, pass some context to a model, and surface a proposal for a human to review. That is what the underlying architecture is designed to do.

Teams then simulate supervision by adding dashboards and alerts on top of the pair-execution tool. The AI observes through the same mechanism it uses for pair execution, just with broader context. The observations become messages, the messages become notifications, and overflow follows. Teams simulate delegation by writing longer prompts. The prompt tells the AI what the step is, what the constraints are, what the expected output looks like, what to do if something goes wrong. The longer the prompt, the more the AI's behaviour appears to be delegated. But the prompt is a reconstruction of context that should have been projected, and the reconstruction can be wrong, incomplete, or out of date.

In both cases, the mode collapses back to pair execution, just poorly. Supervision becomes noisy; delegation becomes risky. The team concludes that AI is not ready for their operations, when in fact the tool has never delivered the modes they needed. Comparing AI platforms on which has the better chat interface or the larger context window is orthogonal to the problem. The problem is the architecture beneath the interaction.


Context as the coupling mechanism

The real work in designing an AI-integrated task is not choosing which interaction mode to use. It is designing what context the step owns, what it inherits from the surrounding structure, and what it returns when it completes. Get the context projection right and the three modes become naturally available at different steps. Get it wrong and every step falls back to pair execution, with a tired human in the loop.
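
Stated as a structure, under deliberately simplified and hypothetical naming, the design question for each step is what goes in each of these three slots, and where the middle one comes from.

```typescript
// Hypothetical shape of a step's context specification: owned, inherited, returned.
interface StepContextSpec<Owned, Inherited, Returned> {
  owns: Owned;          // context the step itself holds while it runs
  inherits: Inherited;  // context projected from the surrounding structure, not retyped
  returns: Returned;    // context handed to the successor when the step completes
}
```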

Consider a logistics dispatch step where a mission needs to be assigned to a barge. The context the AI needs includes the current fleet state, the mission's origin and destination constraints, the temporal window defined by lock schedules on the route, and the set of barges whose current positions and existing commitments make them candidates. The context the AI does not need includes the full history of every barge's maintenance records, the identities of the drivers, or the financial terms of the mission. The difference between useful context and noise is a structural question about the graph underneath the step, not a question about prompt engineering.

If this context is projected by the system at the moment the step is created, from typed relationships in the operational graph, the AI can run in delegation mode for well-constrained missions and in pair-execution mode for the ambiguous ones. If the context has to be assembled by the AI through retrieval queries or prompt templates, only pair execution is available, and the team has to be present for every assignment. The mode is not a policy choice made at runtime. It is a property that emerges from how well the step is structurally specified.
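
Here is what that projection could look like for the barge example, assuming a small typed graph. The entity names (Mission, Barge, LockWindow), the projection function, and the one-candidate rule for choosing delegation are all simplified illustrations, not a real dispatch policy.

```typescript
// Hypothetical sketch: projecting a bounded dispatch context from typed
// relationships, then letting the interaction mode fall out of the projection.

interface Mission { id: string; origin: string; destination: string; }
interface Barge { id: string; position: string; commitments: { from: Date; to: Date }[]; }
interface LockWindow { lockId: string; opensAt: Date; closesAt: Date; }

interface OperationalGraph {
  missions: Map<string, Mission>;
  barges: Map<string, Barge>;
  // A typed relationship, not free text: which lock windows sit on a given route.
  locksOnRoute: (origin: string, destination: string) => LockWindow[];
}

// The bounded context the step owns. Note what is absent by construction:
// maintenance history, driver identities, financial terms.
interface DispatchContext {
  mission: Mission;
  temporalWindow: { earliest: Date; latest: Date };
  candidateBarges: Barge[];
}

function projectDispatchContext(graph: OperationalGraph, missionId: string, now: Date): DispatchContext {
  const mission = graph.missions.get(missionId);
  if (!mission) throw new Error(`Unknown mission ${missionId}`);

  // Temporal window from the lock schedules on the route, with a fallback
  // of 24 hours when no locks constrain it.
  const locks = graph.locksOnRoute(mission.origin, mission.destination);
  const earliest = now;
  const latest = locks.length > 0
    ? new Date(Math.min(...locks.map((l) => l.closesAt.getTime())))
    : new Date(now.getTime() + 24 * 60 * 60 * 1000);

  // Candidates: barges whose existing commitments do not overlap the window.
  const candidateBarges = [...graph.barges.values()].filter((barge) =>
    barge.commitments.every((c) => c.to <= earliest || c.from >= latest)
  );

  return { mission, temporalWindow: { earliest, latest }, candidateBarges };
}

// Deliberately oversimplified: one clear candidate can be delegated,
// anything ambiguous goes to pair execution.
function modeFor(ctx: DispatchContext): "delegation" | "pair-execution" {
  return ctx.candidateBarges.length === 1 ? "delegation" : "pair-execution";
}
```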


A practical test

For any step in your current process where AI is involved, two questions are worth asking. First, what does the AI need to see to do this step, and where does that context come from? Second, what happens if a human is not available at the moment the step needs to execute?

If the answer to the first question is that the user types the relevant context into a prompt, the step is structurally locked into pair execution, regardless of what the tool promises. If the answer to the second question is that nothing happens and the process stalls, you are running the step in pair execution even if you wanted delegation. These are not failings of the team or of the model. They are design constraints produced by the architecture underneath the tool. When that architecture cannot project context structurally, the three interaction modes collapse into one, and the one that remains is the one that demands the most from human attention.
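
The two answers can even be folded into one blunt check. The field names below are hypothetical shorthand for how a team would answer the two questions.

```typescript
// Hypothetical restatement of the practical test as a classifier.
interface StepDesign {
  contextSource: "projected-from-graph" | "typed-into-prompt" | "assembled-by-retrieval";
  whenHumanAbsent: "proceeds-within-envelope" | "stalls";
}

function effectiveMode(step: StepDesign): "delegation-capable" | "pair-execution-only" {
  const projected = step.contextSource === "projected-from-graph";
  const proceeds = step.whenHumanAbsent === "proceeds-within-envelope";
  return projected && proceeds ? "delegation-capable" : "pair-execution-only";
}
```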


The next altitude

The three-mode framework is a diagnostic, not a taxonomy to argue about. When an AI deployment feels wrong, it usually feels wrong because the mode the team is running does not match the mode the work requires. Sometimes the work is genuinely pair-execution work and the tooling is fine. Often it is not, and the tooling is quietly limiting what is possible.

The architectural implication is that these modes become structurally available only when the process above the step is specified clearly enough for context to be projected rather than reconstructed. Interaction design at the task level depends on process design at the next level up. We take up that question in the next post in this series, where the three planning modes at the process level become the structural foundation that makes the three interaction modes at the task level possible.

For now, the question worth sitting with: in your current AI deployment, which of the three modes is each step actually running in, and which one did you want it to run in?