diff --git a/website/docs/_blogs/2023-11-20-AgentEval/index.mdx b/website/docs/_blogs/2023-11-20-AgentEval/index.mdx
index f388c074530a..f1a16e171ef3 100644
--- a/website/docs/_blogs/2023-11-20-AgentEval/index.mdx
+++ b/website/docs/_blogs/2023-11-20-AgentEval/index.mdx
@@ -14,7 +14,7 @@ tags: [Evaluation]
 **TL;DR:**
 * As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?
 * To shed light on the question above, we introduce `AgentEval` — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.
-* We demonstrate how `AgentEval` work using [math problems dataset](https://docs.ag2.ai/latest/docs/blog/2023/06/28/MathChat) as an example in the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agenteval_cq_math.ipynb). Any feedback would be useful for future development. Please contact us on our [Discord](https://discord.gg/pAbnFJrkgZ).
+* We demonstrate how `AgentEval` works using the [math problems dataset](https://docs.ag2.ai/latest/docs/blog/2023/06/28/MathChat) as an example in the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agenteval_cq_math.ipynb). Any feedback would be useful for future development. Please contact us on our [Discord](https://discord.gg/pAbnFJrkgZ).
@@ -56,7 +56,7 @@ critic = autogen.AssistantAgent(
 )
 ```

-Next, the critic is given successful and failed examples of the task execution; then, it is able to return a list of criteria (Fig. 1). For reference, use the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agenteval_cq_math.ipynb).
+Next, the critic is given successful and failed examples of the task execution; then, it is able to return a list of criteria (Fig. 1). For reference, see the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agenteval_cq_math.ipynb).

 * The goal of `QuantifierAgent` is to quantify each of the suggested criteria (Fig. 1), providing us with an idea of the utility of this system for the given task. Here is an example of how it can be defined:
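For context beyond the patch: the `QuantifierAgent` the post describes is, like the critic, prompted to assess a task execution against each proposed criterion. The sketch below illustrates the idea only — the `criteria` structure, prompt wording, and function name are assumptions for illustration, not AgentEval's actual definitions:

```python
# Hypothetical sketch of how a quantifier-style system message might be
# assembled from critic-proposed criteria. Criteria names, accepted values,
# and prompt wording are illustrative assumptions, not AgentEval's own.
import json

# Example criteria such as a CriticAgent might propose for math problems.
criteria = [
    {
        "name": "accuracy",
        "description": "Is the final answer mathematically correct?",
        "accepted_values": ["incorrect", "partially correct", "correct"],
    },
    {
        "name": "clarity",
        "description": "Is the reasoning easy to follow?",
        "accepted_values": ["poor", "average", "good"],
    },
]


def build_quantifier_system_message(criteria: list) -> str:
    """Compose a system message asking an LLM to score a task execution
    against each criterion, choosing one accepted value per criterion."""
    lines = [
        "You are a helpful assistant. You quantify the output of a task "
        "execution against the criteria below.",
        "For each criterion, respond with exactly one of its accepted values.",
        "Return your assessment as a JSON object mapping criterion name to value.",
        "",
        "Criteria:",
        json.dumps(criteria, indent=2),
    ]
    return "\n".join(lines)


system_message = build_quantifier_system_message(criteria)
print(system_message)
```

In the notebook, such a message would then back an agent created much like the critic above, e.g. `quantifier = autogen.AssistantAgent(name="quantifier", system_message=system_message, llm_config=llm_config)` (the name and config are assumed here).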