Add MTEB Scandinavian results for nicher92/saga-embed_v1 #483
KennethEnevoldsen merged 3 commits into embeddings-benchmark:main
Updated the BornholmBitextMining scores. My code PR refactor fixed a small routing bug where the empty-string prompt wasn't being applied correctly, so the score for this task improved slightly.

Link to PR: embeddings-benchmark/mteb#4371
Your files should be in folder results/nicher92__saga-embed_v1/3be07ac3d7c3e00e4402ae9285b23fcf8fda6735
Thanks for the comment, I have moved it.
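For anyone preparing a similar submission, here is a minimal sketch of how the expected folder could be derived from a model name and revision. It assumes the convention is simply `results/<model name with "/" replaced by "__">/<revision>`, which is inferred only from the path in the review comment above; `results_dir` is a hypothetical helper, not part of mteb.

```python
from pathlib import PurePosixPath


def results_dir(model_name: str, revision: str, root: str = "results") -> PurePosixPath:
    """Build the results folder for a model (hypothetical helper).

    Assumption: the "/" in the Hugging Face model name becomes "__",
    as in the path quoted in the review comment.
    """
    folder = model_name.replace("/", "__")
    return PurePosixPath(root) / folder / revision


path = results_dir(
    "nicher92/saga-embed_v1",
    "3be07ac3d7c3e00e4402ae9285b23fcf8fda6735",
)
print(path)
```

Comparing the printed path with the location of the JSON files before opening a PR may save a review round trip.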
Model Results Comparison

Reference models: google/gemini-embedding-001, intfloat/multilingual-e5-large

Results for nicher92/saga-embed_v1

| task_name | google/gemini-embedding-001 | nicher92/saga-embed_v1 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AngryTweetsClassification | 0.6413 | 0.634 | 0.5487 | 0.6975 | Qwen/Qwen3-Embedding-8B | False |
| BornholmBitextMining | 0.5169 | 0.3603 | 0.4416 | 0.7798 | jinaai/jina-embeddings-v5-text-small | False |
| DKHateClassification | 0.8702 | 0.7371 | nan | 0.8702 | google/gemini-embedding-001 | False |
| DalajClassification | 0.5047 | 0.5035 | 0.5001 | 0.6586 | microsoft/harrier-oss-v1-27b | False |
| DanFeverRetrieval | 0.4170 | 0.3885 | 0.4087 | 0.4170 | google/gemini-embedding-001 | False |
| DanishPoliticalCommentsClassification | 0.5207 | 0.4622 | 0.3826 | 0.5456 | Qwen/Qwen3-Embedding-4B | False |
| LccSentimentClassification | 0.6993 | 0.6387 | 0.594 | 0.7687 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| MassiveIntentClassification | 0.8492 | 0.6985 | 0.6542 | 0.8908 | codefuse-ai/F2LLM-v2-14B | False |
| MassiveScenarioClassification | 0.8955 | 0.74 | 0.7047 | 0.9422 | codefuse-ai/F2LLM-v2-14B | False |
| NoRecClassification | 0.6427 | 0.5573 | 0.5607 | 0.6751 | codefuse-ai/F2LLM-v2-14B | False |
| NorQuadRetrieval | 0.3393 | 0.2576 | 0.2558 | 0.3508 | openai/text-embedding-3-large | False |
| NordicLangClassification | 0.8597 | 0.9111 | 0.8015 | 0.9578 | microsoft/harrier-oss-v1-27b | False |
| NorwegianCourtsBitextMining | 0.9342 | 0.9327 | 0.9404 | 0.9481 | jinaai/jina-embeddings-v5-text-nano | False |
| NorwegianParliamentClassification | 0.5672 | 0.5691 | 0.5614 | 0.7007 | Qwen/Qwen3-Embedding-8B | False |
| SNLHierarchicalClusteringP2P | 0.6141 | 0.5684 | 0.5592 | 0.6512 | Salesforce/SFR-Embedding-2_R | False |
| SNLHierarchicalClusteringS2S | 0.5991 | 0.5711 | 0.5591 | 0.6486 | Qwen/Qwen3-Embedding-8B | False |
| SNLRetrieval | 0.9907 | 0.9759 | 0.9548 | 0.9907 | google/gemini-embedding-001 | False |
| ScalaClassification | 0.5185 | 0.5212 | 0.5157 | 0.9112 | microsoft/harrier-oss-v1-27b | False |
| SweFaqRetrieval | 0.8475 | 0.8506 | 0.8004 | 0.8682 | codefuse-ai/F2LLM-v2-14B | False |
| SweRecClassification | 0.8661 | 0.8329 | 0.7749 | 0.8757 | Qwen/Qwen3-Embedding-8B | False |
| SwedishSentimentClassification | 0.9713 | 0.9646 | 0.9318 | 0.9740 | codefuse-ai/F2LLM-v2-14B | False |
| SwednClusteringP2P | 0.4584 | 0.4046 | 0.3691 | 0.6213 | Qwen/Qwen3-Embedding-4B | False |
| SwednClusteringS2S | 0.3332 | 0.2292 | 0.2 | 0.4309 | Qwen/Qwen3-Embedding-8B | False |
| SwednRetrieval | 0.8580 | 0.8457 | 0.7916 | 0.8580 | google/gemini-embedding-001 | False |
| TV2Nordretrieval | 0.9712 | 0.9589 | 0.9537 | 0.9785 | voyageai/voyage-code-3 | False |
| TwitterHjerneRetrieval | 0.9802 | 0.7816 | 0.3522 | 0.9802 | google/gemini-embedding-001 | False |
| VGHierarchicalClusteringP2P | nan | 0.4588 | 0.44 | 0.5243 | codefuse-ai/F2LLM-v2-14B | False |
| VGHierarchicalClusteringS2S | 0.4602 | 0.4213 | 0.3824 | 0.4683 | Salesforce/SFR-Embedding-2_R | False |
| Average | 0.6936 | 0.6348 | 0.5903 | 0.7494 | nan | - |
Training datasets: NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoNQ-VN, NanoNQRetrieval, RedditClustering, RedditClustering-VN, RedditClustering.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
Hi, if everything looks good here, can we merge it? Thank you.
KennethEnevoldsen left a comment:
Results look good - thanks for the ping!
Description
Adding MTEB Scandinavian results for nicher92/saga-embed_v1.

Checklist
mteb/models/model_implementations/