Learn more about the core concepts of our AI testing platform.
Agent
An AI system that takes actions and makes decisions
Target Agent
The AI agent you want to test
Maihem Agent
An AI agent that simulates interactions with your Target Agent
Workflow
A sequence of workflow steps
Workflow step
An operation with defined input and output formats
Environment
The context in which the codebase is executed (production, dev, local, etc.)
Revision
A unique label for the state of the codebase
Interaction
A particular realized sequence of transactions between a target agent and a user
Conversation
A sequence of messages between a target agent and a user
Message
A body of text sent by the user or the target agent
When a target agent generates a message, Maihem collects:
Trace
A particular sequence of spans
Span
A particular realization of a workflow step
Metric
A quantitative measure used for tracking and comparing performance
Criteria
A statement that can be falsified with certainty, used to flag failures
Evaluator
A connector between workflow steps and metrics that maps required inputs and outputs
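To make the relationship between spans, metrics, and evaluators more concrete, here is a minimal Python sketch. Every name in it (`Span`, `exact_match`, `Evaluator`, and the `answer` and `expected` keys) is a hypothetical illustration and not part of the Maihem SDK. It only assumes that a span records the inputs and outputs of one workflow step and that a metric is a function returning a number.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Span:
    """Hypothetical record of one realized workflow step."""
    step_name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def exact_match(prediction: str, reference: str) -> float:
    """Illustrative metric: 1.0 if the trimmed strings are equal, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


@dataclass
class Evaluator:
    """Hypothetical connector that maps span fields to a metric's arguments."""
    metric: Callable[..., float]
    mapping: Dict[str, Callable[[Span], Any]]

    def score(self, span: Span) -> float:
        # Pull each metric argument out of the span, then call the metric.
        kwargs = {arg: extract(span) for arg, extract in self.mapping.items()}
        return self.metric(**kwargs)


evaluator = Evaluator(
    metric=exact_match,
    mapping={
        "prediction": lambda s: s.outputs["answer"],
        "reference": lambda s: s.inputs["expected"],
    },
)

span = Span(
    step_name="answer_question",
    inputs={"question": "2 + 2?", "expected": "4"},
    outputs={"answer": " 4 "},
)
print(evaluator.score(span))
```

Keeping the mapping separate from the metric means the same metric could be reused on a different workflow step by supplying a different mapping.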
Evaluation
A qualitative judgement on a criterion using a metric. It contains:
Score
A numerical value measured against the metric
Is failed
A boolean value with a pass/fail judgement
Explanation
A string with details behind the judgement
Test
A configuration of a procedure, used to evaluate a workflow or workflow step of a target agent. It maps to a specific evaluator. Can be conducted with:
Dataset
Uploaded data with inputs for the workflow step, and optional expected outputs (ground truth)
Maihem agents
Simulated and dynamic inputs for the workflow step, and optional expected outputs (ground truth)
Test run
A particular execution of a test, used to ensure and compare quality levels among different versions and environments of the target agent. It contains:
Interactions
Evaluations
Detected failures
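The evaluation fields (score, is failed, explanation) and the contents of a test run can be pictured as plain data structures. The sketch below is an illustration only: `EvaluationRecord`, `RunRecord`, and their field names are hypothetical and not taken from the Maihem SDK, and deriving detected failures by filtering evaluations on the failed flag is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvaluationRecord:
    """Hypothetical container for one evaluation."""
    criteria: str       # the falsifiable statement being checked
    score: float        # numerical value from the metric
    is_failed: bool     # pass/fail judgement
    explanation: str    # details behind the judgement


@dataclass
class RunRecord:
    """Hypothetical container for one execution of a test."""
    interactions: List[str] = field(default_factory=list)
    evaluations: List[EvaluationRecord] = field(default_factory=list)

    @property
    def detected_failures(self) -> List[EvaluationRecord]:
        # Assumption: a detected failure is any evaluation flagged as failed.
        return [e for e in self.evaluations if e.is_failed]


run = RunRecord(
    interactions=["conversation-1"],
    evaluations=[
        EvaluationRecord("Answer is polite", 0.9, False, "Tone was courteous."),
        EvaluationRecord("Answer cites a source", 0.2, True, "No source was given."),
    ],
)
print(len(run.detected_failures))
```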