How can expert feedback be used to optimize AI agents?
In many real-world applications, there is no clear ground-truth label for what a “good” agent response is. Often, all we have is user feedback and preferences (“this is wrong”, “missing context”, “too verbose”, etc.). This feedback is a valuable supervision signal, but turning it into effective optimization of agent behavior is not straightforward, for three reasons:
Stochasticity & replay
To learn from feedback, we often need to “replay” the original sample or trace. But agentic systems (with tools, RAG, branching, etc.) are stochastic, so re-running the same input may not reproduce the same trajectory or output.
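One common way to make a run replayable is to record the outputs of the stochastic steps and serve them back from a cache on replay. The snippet below is a minimal sketch of that idea, not the notebook's implementation: `Trace`, `call_tool`, `noisy_retrieve`, and `run_agent` are hypothetical names, and the retriever is a toy stand-in.

```python
import json
import random
from dataclasses import dataclass, field

DOCS = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]


def noisy_retrieve(query: str, k: int = 2) -> list:
    """Toy stand-in for a stochastic retriever: returns a random subset."""
    return random.sample(DOCS, k)


@dataclass
class Trace:
    """Recorded tool results for one agent run (hypothetical structure)."""
    calls: dict = field(default_factory=dict)


def call_tool(trace, name, fn, replay=False, **kwargs):
    # Key each call by tool name and arguments. A real system would also
    # need a step index, since identical calls in one run would collide here.
    key = json.dumps([name, kwargs], sort_keys=True)
    if replay and key in trace.calls:
        return trace.calls[key]  # serve the recorded result
    result = fn(**kwargs)        # live (possibly stochastic) call
    trace.calls[key] = result
    return result


def run_agent(query, trace, replay=False):
    docs = call_tool(trace, "retrieve", noisy_retrieve,
                     replay=replay, query=query, k=2)
    return f"Answer to {query!r} based on {', '.join(docs)}"


trace = Trace()
original = run_agent("What is RAG?", trace)               # records the call
replayed = run_agent("What is RAG?", trace, replay=True)  # reuses the record
```

Caching only pins down the steps that were recorded. Once the agent's config or graph changes, the new run may issue calls that were never seen, and those fall back to live execution.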
Linking feedback to replays
Even if we can approximate the original run, evaluating a new or replayed trace against the old feedback is non-trivial: the feedback is free text, often high-level and tied to the context of the original run, rather than a simple scalar reward.
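To make the gap between textual feedback and a numeric score concrete, here is a small sketch under stated assumptions. An annotation keeps the input, the original output, and the raw feedback together, and a pluggable judge maps a new output to a score. All names (`Annotation`, `toy_judge`, `score_benchmark`) are hypothetical, and the judge is deliberately trivial: it only understands "too verbose" feedback, where a real setup would more likely use an LLM-based or human judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """One piece of user feedback tied to the run it was given on."""
    run_input: str
    original_output: str
    feedback: str  # free text, e.g. "too verbose"


# A judge scores a new output against an annotation, in [0, 1].
Judge = Callable[[Annotation, str], float]


def toy_judge(annotation: Annotation, new_output: str) -> float:
    """Toy judge: only handles 'too verbose' by rewarding shorter outputs."""
    if "verbose" not in annotation.feedback.lower():
        raise ValueError("toy_judge only supports 'too verbose' feedback")
    return 1.0 if len(new_output) < len(annotation.original_output) else 0.0


def score_benchmark(annotations: List[Annotation],
                    agent: Callable[[str], str],
                    judge: Judge) -> float:
    """Average judge score of an agent over all annotations."""
    scores = [judge(a, agent(a.run_input)) for a in annotations]
    return sum(scores) / len(scores)


annotations = [
    Annotation("What is RAG?", "A very long original answer. " * 5, "Too verbose"),
]
concise_score = score_benchmark(annotations, lambda q: "Short answer.", toy_judge)
```

Keeping the original output next to the feedback matters because comments like "too verbose" are relative: they only make sense in comparison with what the user actually saw.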
Optimizing config and structure
Finally, we want to optimize both the agent configuration (prompts, hyperparameters, tools, thresholds) and the agent graph/structure (which nodes, in what order, with what routing). Jointly optimizing these under noisy, text-based feedback is a challenging learning and search problem.
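As a rough illustration of what a joint search looks like, the sketch below enumerates every (config, graph) pair and averages several noisy evaluations of each. This is a plain exhaustive search used for illustration only, not how Maestro works. The candidate configs, the graph variants, and especially `noisy_score` (a made-up scoring function standing in for running the agent on a benchmark) are all assumptions.

```python
import itertools
import random
import statistics

# Hypothetical search space: a few configs and a few graph layouts.
CONFIGS = [{"top_k": k, "temperature": t} for k in (2, 5) for t in (0.0, 0.7)]
GRAPHS = [
    ("retrieve", "answer"),
    ("rewrite_query", "retrieve", "answer"),
    ("retrieve", "rerank", "answer"),
]


def noisy_score(config, graph, rng):
    """Made-up stand-in for evaluating an agent on a feedback benchmark."""
    base = 0.5
    base += 0.10 if "rerank" in graph else 0.0
    base += 0.05 if config["top_k"] == 5 else 0.0
    base -= 0.10 * config["temperature"]
    return base + rng.gauss(0, 0.02)  # evaluation noise


def search(n_repeats=5, seed=0):
    """Return (mean_score, config, graph) for the best candidate found."""
    rng = random.Random(seed)
    best = None
    for config, graph in itertools.product(CONFIGS, GRAPHS):
        # Average repeated evaluations to dampen the noise.
        mean = statistics.mean(
            noisy_score(config, graph, rng) for _ in range(n_repeats)
        )
        if best is None or mean > best[0]:
            best = (mean, config, graph)
    return best


best_score, best_config, best_graph = search()
```

Exhaustive enumeration only works because this toy space has 12 candidates. With realistic prompt, tool, and routing choices the space grows combinatorially and each evaluation is expensive, which is why a dedicated optimizer is needed.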
In this notebook, using an agentic RAG example, we show how to operationalize this:
📝 Convert user feedback on agentic runs into an annotation benchmark on RELAI
🎯 Use the Maestro agent optimizer to consume that benchmark and automatically improve both the config and the graph of the agent
🔁 Close the loop from user preference → benchmark → optimization → better agent in a reproducible, data-driven way
🔗 Notebook: https://lnkd.in/eWXRxHEz
Powered by RELAI (relai.ai)