Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions website/docs/_blogs/2023-11-20-AgentEval/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ tags: [Evaluation]
**TL;DR:**
* As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?
* To shed light on the question above, we introduce `AgentEval` — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.
* We demonstrate how `AgentEval` work using [math problems dataset](https://docs.ag2.ai/latest/docs/blog/2023/06/28/MathChat) as an example in the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agenteval_cq_math.ipynb). Any feedback would be useful for future development. Please contact us on our [Discord](https://discord.gg/pAbnFJrkgZ).
* We demonstrate how `AgentEval` work using [math problems dataset](https://docs.ag2.ai/latest/docs/blog/2023/06/28/MathChat) as an example in the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agenteval_cq_math.ipynb). Any feedback would be useful for future development. Please contact us on our [Discord](https://discord.gg/pAbnFJrkgZ).



Expand Down Expand Up @@ -56,7 +56,7 @@ critic = autogen.AssistantAgent(
)
```

Next, the critic is given successful and failed examples of the task execution; then, it is able to return a list of criteria (Fig. 1). For reference, use the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agenteval_cq_math.ipynb).
Next, the critic is given successful and failed examples of the task execution; then, it is able to return a list of criteria (Fig. 1). For reference, use the [following notebook](https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agenteval_cq_math.ipynb).

* The goal of `QuantifierAgent` is to quantify each of the suggested criteria (Fig. 1), providing us with an idea of the utility of this system for the given task. Here is an example of how it can be defined:

Expand Down
Loading