Let’s dive into a quick story.
Mike is a logistics manager for a big supply chain operation, overseeing shipments, inventory, and distribution schedules. One afternoon, an autonomous AI agent on his team flags an unexpected issue—a manufacturing delay at a supplier two levels down the chain.
Normally, Mike wouldn’t have caught the problem until shipments were already affected. But the AI agent doesn’t just dump data on his desk. Instead, it flags the risk like a sharp-eyed colleague, offering a recommendation.
“This supplier has a pattern of delays during peak season. Based on historical reliability and cost-effectiveness, I suggest switching to an alternate supplier with minimal disruption.”
Mike hesitates. Switching suppliers mid-cycle is a risk. He needs more confidence. Instead of making a knee-jerk decision, he turns back to the AI agent. “How accurate have your past disruption predictions been?”
The AI pulls up its track record. “Over the last 18 months, I’ve correctly identified disruptions 92% of the time.”
That helps, but Mike isn’t convinced yet. “What about supplier B? They’ve been reliable before.”
“Supplier B has a longer lead time and higher costs,” the AI replies. “However, if speed is your top priority, I can adjust my recommendation.”
Mike considers this. “What about supplier C? Any chance they can handle this?”
“Supplier C is an option, but they have a 15% failure rate on last-minute orders. If reducing risk is the goal, supplier A is still your best bet.”
Now, with a clear picture of his choices, Mike approves the switch to supplier A.
This is what a looped-in-human workflow looks like—where leaders like Mike don’t just use AI as a tool, they collaborate with it. They bounce solutions back and forth, refining decisions the way they would with a trusted colleague.
But trust and collaboration don’t happen overnight. They’re built over time, shaped by every interaction.
While Mike’s story is made up, the challenge is real.
As AI shifts from automation to autonomous collaboration, the way we interact with it has to evolve. In Designing Human-Machine Interactions in an Autonomous Agent World, I explored how AI isn’t just a tool but a collaborator, requiring new design principles to create meaningful human-agent interactions.
But designing better interactions is only half the equation—measuring their success is just as critical. At Outshift, we’re focused on exactly this: how do we build The Internet of Agents to work seamlessly with people and prove that agents are reliable partners? That’s why measuring trust and collaboration in AI isn’t just a technical challenge; it’s a fundamental shift in how we define success.
Would Mike rely on the agent next time? Would his colleagues? If supply chain teams were to track AI-assisted interventions, they wouldn’t just look at whether the AI flagged disruptions accurately—they’d also need to understand its impact on decision-making confidence, operational speed, and overall efficiency.
We’re used to tracking how often AI gets the right answer, but what about how it actually affects decision-making? Does it offer new insights that push people to think differently? Does it make experts more confident in their choices? These are the questions we should be asking.
Why Traditional AI Metrics Fall Short
Accuracy. Latency. Efficiency. Hallucination rate. Traditional AI metrics like these revolve around the model and its performance in isolation.
Measuring only model performance is like rating a chef just on how fast they chop onions. It’s one metric, but it hardly tells the whole story.
Did the AI actually improve the way people made decisions? Did it make teamwork smoother or lighten the mental load? AI isn’t just about getting the right answer—it’s about fitting into real workflows, making people more capable, and proving itself as a reliable partner over time.
So how do we measure what actually matters for The Internet of Agents?
Measuring What Matters: New Metrics for AI-Human Collaboration
If we really want AI to work as a teammate, not just a tool, we need a new way to measure its performance.
At Outshift, we’re thinking a lot about how to measure agentic collaboration in the Internet of Agents. Here are some key areas where we believe new kinds of metrics are needed:
Decision-Making Metrics
Decision Latency: How long does it take from problem identification to resolution when AI is involved?
Trust Reinforcement: How often do people validate or refine AI suggestions, and does trust in AI improve over time?
Consensus Speed: How quickly can a team (including AI) align on a decision?
Workflow & Efficiency Metrics
Task Hand-off Effectiveness: Does AI seamlessly transition tasks between humans and agents, or does it create bottlenecks?
Cognitive Load Reduction: Does AI meaningfully reduce the mental effort required to complete a task, or does it add more work?
Hybrid Intelligence Utilization: Is AI being used to elevate human expertise, rather than just automate tasks?
Collaboration & Interaction Metrics
Human Override Rate: How often do users reject or correct AI-generated actions? (A sketch of computing this and related metrics appears after this list.)
Agent Coordination Efficiency: How well do multiple AI agents collaborate on a shared task?
AI Adoption Rate: How frequently do users accept AI suggestions without modification?
Self-Correction Rate: How often does AI revise its own output before human intervention is required?
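To make these concrete, here is a minimal sketch of how a team might compute a few of these metrics from interaction logs. The event schema and field names are assumptions for illustration, not an existing Outshift API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Interaction:
    """Hypothetical record of one AI suggestion and the human response to it."""
    suggested_at: datetime  # when the agent surfaced the suggestion
    resolved_at: datetime   # when a human accepted, modified, or rejected it
    outcome: str            # "accepted", "modified", or "rejected"

def decision_latency(events: list[Interaction]) -> float:
    """Average seconds from suggestion to human resolution (Decision Latency)."""
    return mean((e.resolved_at - e.suggested_at).total_seconds() for e in events)

def adoption_rate(events: list[Interaction]) -> float:
    """Share of suggestions accepted without modification (AI Adoption Rate)."""
    return sum(e.outcome == "accepted" for e in events) / len(events)

def override_rate(events: list[Interaction]) -> float:
    """Share of suggestions rejected or corrected (Human Override Rate)."""
    return sum(e.outcome in ("rejected", "modified") for e in events) / len(events)
```

Tracked week over week, the trend lines in numbers like these say more about trust than any single accuracy score.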
Moving Beyond the Thumbs Up
Today’s AI feedback mechanisms are basic at best. A thumbs-up/thumbs-down rating system isn’t enough to understand whether AI is truly working for us.
We need structured, contextual feedback loops that offer deeper insights (one possible feedback record is sketched after this list):
Contextual Feedback: Tracking how often AI contributions actually improve human decision-making, not just how quickly the system responds.
Collaboration Satisfaction Surveys: Gauging how well AI integrates into workflows from the user’s perspective.
Adaptive Feedback Loops: AI that learns from human revisions, pauses, and escalations to continuously improve its performance.
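As an illustration of what a richer signal could look like, here is a minimal sketch of a contextual feedback record; the schema and every field name are assumptions for this post, not a shipping format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class FeedbackEvent:
    """Hypothetical contextual feedback event, replacing a bare thumbs rating."""
    suggestion_id: str
    accepted: bool                      # the old thumbs-up/down bit, still captured
    human_edit: Optional[str] = None    # what the user changed, if anything
    escalated: bool = False             # whether the user routed it to a colleague
    deliberation_seconds: float = 0.0   # how long the user paused before acting
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

An adaptive loop can then treat edits, pauses, and escalations as learning signal instead of discarding everything except the thumb.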
Designing the Right Framework
It’s not just about picking the right metrics; it’s about designing AI systems that can be observed, evaluated, and improved based on those metrics. At Outshift, we're considering ways to do this, including:
Building an AI-Human Performance Dashboard that visualizes trust levels, task handoff success, and collaboration trends.
Identifying Friction Points in workflows to refine AI behaviors and reduce inefficiencies.
Developing Real-Time Observability Tools to capture AI-human interaction data in a structured way (one possible event format is sketched below).
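For instance, a minimal sketch of structured interaction logging might look like the following; the event fields and logger name are assumptions, not a description of our actual tooling.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent_observability")

def log_interaction(agent_id: str, action: str, human_response: str,
                    confidence: float) -> None:
    """Emit one human-agent interaction as a structured JSON log event.

    Downstream, a dashboard can aggregate these events into trust trends,
    handoff success rates, and friction hotspots.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,                  # e.g. "recommend_supplier_switch"
        "human_response": human_response,  # e.g. "approved", "overridden"
        "confidence": confidence,          # the agent's own stated confidence
    }
    logger.info(json.dumps(event))
```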
The Future of AI Measurement
Right now, AI is mostly graded on how fast and accurate it is—like a student acing multiple-choice tests but never working in a group project. To really achieve The Internet of Agents, we need to shift the focus to how well AI actually collaborates with people.
If we measure that, we’re measuring the future of work itself.
All of this points toward a world where AI agents can discover and authenticate one another, share complex information securely, and adapt to uncertainty while collaborating across different domains. Users will work with agents that pursue complex goals with limited direct supervision, acting autonomously on their behalf.
As a design team, we are actively shaping how we navigate this transformation. And one key question keeps emerging: How do we design AI experiences that empower human-machine teams, rather than just automate them?
The Agentic Teammate: Enhancing Knowledge Work
In this new world, AI agents become our teammates, offering powerful capabilities:
Knowledge Synthesis: Agents aggregate and analyze data from multiple sources, offering fresh perspectives on problems.
Scenario Simulation: Agents can create hypothetical scenarios and test them in a virtual environment, allowing knowledge workers to experiment and assess risks.
Constructive Feedback: Agents critically evaluate human-proposed solutions, identifying flaws and offering constructive feedback.
Collaboration Orchestration: Agents work with other agents to tackle complex problems, acting as orchestrators of a broader agentic ecosystem.
Addressing the Challenges: Gaps in Human-Agent Collaboration
All this autonomous help is great, sure – but it's not without its challenges.
Autonomous agents have fundamental gaps that we need to address to ensure successful collaboration.
The Solution: Five Design Principles for Human-Agent Collaboration
Put Humans in the Driver's Seat
Users should always have the final say, with clear boundaries and intuitive controls to adjust agent behavior. An example of this is Google Photos' Memories feature, which allows users to customize their slideshows or turn the feature off completely.
Make the Invisible Visible
The AI's reasoning and decision-making processes should be transparent and easy to understand, with confidence levels or uncertainty displayed to set realistic expectations. North Face's AI shopping assistant exemplifies this by guiding users through a conversational process and providing clear recommendations.
Prepare for the Unexpected
Anticipate edge cases to provide clear recovery steps, while empowering users to verify and adjust AI outcomes when needed. ServiceNow's Now Assist AI is designed to allow customer support staff to easily verify and adjust AI-generated insights and recommendations.
Collaborate, Don't Just Automate
Prioritize workflows that integrate human and AI capabilities, designing intuitive handoffs to ensure smooth collaboration. Aisera HR Agents demonstrate this by assisting with employee inquiries while escalating complex issues to human HR professionals.
Earn Trust Through Consistency
Build trust gradually with reliable results in low-risk use cases, making reasoning and actions transparent. ServiceNow's Case Summarization tool is an example of using AI in a low-risk scenario to gradually build user trust in the system's capabilities.
Designing Tomorrow's Human-Agent Collaboration At Outshift
These principles are the foundation for building effective partnerships between humans and AI at Outshift.
Empowering Users with Control
Establishing clear boundaries for AI agents to ensure they operate within a well-defined scope.
Building Confidence Through Clarity
Surface AI reasoning by displaying confidence levels, realistic expectations, and the extent of changes, so users can make informed decisions. (A minimal sketch of this appears below.)
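As an illustration (the names and thresholds below are assumptions, not a product spec), a recommendation surfaced to a user might carry its reasoning, confidence, and scope alongside the proposed action:

```python
from dataclasses import dataclass

@dataclass
class AgentRecommendation:
    """Hypothetical container for what an agent shows before a human decides."""
    action: str        # e.g. "Switch order #4521 to supplier A"
    reasoning: str     # why the agent is suggesting it
    confidence: float  # 0.0-1.0, the agent's stated confidence
    scope: str         # extent of changes, e.g. "affects 3 open shipments"

def render(rec: AgentRecommendation) -> str:
    """Format a recommendation so reasoning, confidence, and blast radius
    are visible before the user approves or overrides it."""
    level = ("high" if rec.confidence >= 0.8
             else "moderate" if rec.confidence >= 0.5 else "low")
    return (f"{rec.action}\n"
            f"  Why: {rec.reasoning}\n"
            f"  Confidence: {level} ({rec.confidence:.0%})\n"
            f"  Scope of change: {rec.scope}")
```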
Always Try To Amplify Human Potential
Actively collaborate through simulations and arrive at effective outcomes together.
Let Users Stay In Control When It Matters
Provide easy access to detailed logs and performance metrics for every agent action, enabling users to review decisions and workflows and ensure compliance. Include clear recovery steps for seamless continuity.
Take It One Interaction At A Time
See agent actions in context and observe how agent performance improves across the network.
As we refine our design principles and push the boundaries of innovation, integrating advanced AI capabilities comes with a critical responsibility. For AI to become a trusted collaborator—rather than just a tool—we must design with transparency, clear guardrails, and a focus on building trust. Ensuring AI agents operate with accountability and adaptability will be key to fostering effective human-agent collaboration. By designing with intention, we can shape a future where AI not only enhances workflows and decision-making but also empowers human potential in ways that are ethical, reliable, and transformative.
Because in the end, the success of AI won’t be measured by its autonomy alone—but by how well it works with us to create something greater than either humans or machines could achieve alone.