Add MTEB Scandinavian results for nicher92/saga-embed_v1 #483

Merged
KennethEnevoldsen merged 3 commits into embeddings-benchmark:main from nicher92:main
Apr 30, 2026

@nicher92 (Contributor) commented Apr 14, 2026

Description

Adding MTEB Scandinavian results for nicher92/saga-embed_v1.

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/
    • No, but there is an existing PR: #4371
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@nicher92 (Contributor Author) commented

Updated the BornholmBitextMining scores. The refactor in my code PR fixed a small routing bug where the empty-string prompt wasn't being applied correctly, so the score for this task improved slightly.
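The bug itself is not shown in this thread. One common way an explicit empty-string prompt gets dropped in Python is a truthiness check, sketched below. The names `DEFAULT_PROMPT`, `select_prompt_buggy`, and `select_prompt_fixed` are hypothetical and chosen for illustration only; this is not the code from the PR.

```python
from typing import Optional

DEFAULT_PROMPT = "query: "  # hypothetical fallback prompt, for illustration only


def select_prompt_buggy(task_prompt: Optional[str]) -> str:
    # Truthiness check: "" is falsy, so an explicitly configured empty
    # prompt is silently replaced by the default.
    return task_prompt if task_prompt else DEFAULT_PROMPT


def select_prompt_fixed(task_prompt: Optional[str]) -> str:
    # Fall back only when no prompt was configured at all (None).
    return task_prompt if task_prompt is not None else DEFAULT_PROMPT
```

With this sketch, `select_prompt_buggy("")` returns the default while `select_prompt_fixed("")` returns the empty string, so a task that is meant to run without a prompt is encoded differently by the two versions.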

@KennethEnevoldsen (Contributor) commented

link to PR: embeddings-benchmark/mteb#4371

Member

Your files should be in the folder results/nicher92__saga-embed_v1/3be07ac3d7c3e00e4402ae9285b23fcf8fda6735
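The path in this comment suggests a layout of `results/<org>__<model>/<hash>`, where the `/` in the model name becomes `__`. A minimal sketch of building such a path follows; the helper name `results_dir` is hypothetical, and reading the final segment as the model revision is an assumption based on its commit-hash shape.

```python
from pathlib import PurePosixPath


def results_dir(model_name: str, revision: str, root: str = "results") -> PurePosixPath:
    # Replace the "/" in "org/model" with "__" so the name is a single folder.
    return PurePosixPath(root) / model_name.replace("/", "__") / revision


print(results_dir("nicher92/saga-embed_v1", "3be07ac3d7c3e00e4402ae9285b23fcf8fda6735"))
# results/nicher92__saga-embed_v1/3be07ac3d7c3e00e4402ae9285b23fcf8fda6735
```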

Contributor Author

Thanks for the comment, I have moved it.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: nicher92/saga-embed_v1
Tasks: AngryTweetsClassification, BornholmBitextMining, DKHateClassification, DalajClassification, DanFeverRetrieval, DanishPoliticalCommentsClassification, LccSentimentClassification, MassiveIntentClassification, MassiveScenarioClassification, NoRecClassification, NorQuadRetrieval, NordicLangClassification, NorwegianCourtsBitextMining, NorwegianParliamentClassification, SNLHierarchicalClusteringP2P, SNLHierarchicalClusteringS2S, SNLRetrieval, ScalaClassification, SweFaqRetrieval, SweRecClassification, SwedishSentimentClassification, SwednClusteringP2P, SwednClusteringS2S, SwednRetrieval, TV2Nordretrieval, TwitterHjerneRetrieval, VGHierarchicalClusteringP2P, VGHierarchicalClusteringS2S

Results for nicher92/saga-embed_v1

| task_name | google/gemini-embedding-001 | nicher92/saga-embed_v1 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AngryTweetsClassification | 0.6413 | 0.634 | 0.5487 | 0.6975 | Qwen/Qwen3-Embedding-8B | False |
| BornholmBitextMining | 0.5169 | 0.3603 | 0.4416 | 0.7798 | jinaai/jina-embeddings-v5-text-small | False |
| DKHateClassification | 0.8702 | 0.7371 | nan | 0.8702 | google/gemini-embedding-001 | False |
| DalajClassification | 0.5047 | 0.5035 | 0.5001 | 0.6586 | microsoft/harrier-oss-v1-27b | False |
| DanFeverRetrieval | 0.4170 | 0.3885 | 0.4087 | 0.4170 | google/gemini-embedding-001 | False |
| DanishPoliticalCommentsClassification | 0.5207 | 0.4622 | 0.3826 | 0.5456 | Qwen/Qwen3-Embedding-4B | False |
| LccSentimentClassification | 0.6993 | 0.6387 | 0.594 | 0.7687 | Alibaba-NLP/gte-Qwen2-7B-instruct | False |
| MassiveIntentClassification | 0.8492 | 0.6985 | 0.6542 | 0.8908 | codefuse-ai/F2LLM-v2-14B | False |
| MassiveScenarioClassification | 0.8955 | 0.74 | 0.7047 | 0.9422 | codefuse-ai/F2LLM-v2-14B | False |
| NoRecClassification | 0.6427 | 0.5573 | 0.5607 | 0.6751 | codefuse-ai/F2LLM-v2-14B | False |
| NorQuadRetrieval | 0.3393 | 0.2576 | 0.2558 | 0.3508 | openai/text-embedding-3-large | False |
| NordicLangClassification | 0.8597 | 0.9111 | 0.8015 | 0.9578 | microsoft/harrier-oss-v1-27b | False |
| NorwegianCourtsBitextMining | 0.9342 | 0.9327 | 0.9404 | 0.9481 | jinaai/jina-embeddings-v5-text-nano | False |
| NorwegianParliamentClassification | 0.5672 | 0.5691 | 0.5614 | 0.7007 | Qwen/Qwen3-Embedding-8B | False |
| SNLHierarchicalClusteringP2P | 0.6141 | 0.5684 | 0.5592 | 0.6512 | Salesforce/SFR-Embedding-2_R | False |
| SNLHierarchicalClusteringS2S | 0.5991 | 0.5711 | 0.5591 | 0.6486 | Qwen/Qwen3-Embedding-8B | False |
| SNLRetrieval | 0.9907 | 0.9759 | 0.9548 | 0.9907 | google/gemini-embedding-001 | False |
| ScalaClassification | 0.5185 | 0.5212 | 0.5157 | 0.9112 | microsoft/harrier-oss-v1-27b | False |
| SweFaqRetrieval | 0.8475 | 0.8506 | 0.8004 | 0.8682 | codefuse-ai/F2LLM-v2-14B | False |
| SweRecClassification | 0.8661 | 0.8329 | 0.7749 | 0.8757 | Qwen/Qwen3-Embedding-8B | False |
| SwedishSentimentClassification | 0.9713 | 0.9646 | 0.9318 | 0.9740 | codefuse-ai/F2LLM-v2-14B | False |
| SwednClusteringP2P | 0.4584 | 0.4046 | 0.3691 | 0.6213 | Qwen/Qwen3-Embedding-4B | False |
| SwednClusteringS2S | 0.3332 | 0.2292 | 0.2 | 0.4309 | Qwen/Qwen3-Embedding-8B | False |
| SwednRetrieval | 0.8580 | 0.8457 | 0.7916 | 0.8580 | google/gemini-embedding-001 | False |
| TV2Nordretrieval | 0.9712 | 0.9589 | 0.9537 | 0.9785 | voyageai/voyage-code-3 | False |
| TwitterHjerneRetrieval | 0.9802 | 0.7816 | 0.3522 | 0.9802 | google/gemini-embedding-001 | False |
| VGHierarchicalClusteringP2P | nan | 0.4588 | 0.44 | 0.5243 | codefuse-ai/F2LLM-v2-14B | False |
| VGHierarchicalClusteringS2S | 0.4602 | 0.4213 | 0.3824 | 0.4683 | Salesforce/SFR-Embedding-2_R | False |
| Average | 0.6936 | 0.6348 | 0.5903 | 0.7494 | nan | - |
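The Average row can be sanity-checked against the per-task scores. The sketch below recomputes it for two columns as an unweighted mean that skips `nan` entries. That treatment of `nan` is inferred from the numbers in the table, not documented in this thread, and `mean_skip_nan` is a hypothetical helper name. The inputs are the rounded scores as displayed, so agreement is only expected to the shown precision.

```python
import math

nan = float("nan")

# Scores copied from the table above, in task order.
saga = [
    0.634, 0.3603, 0.7371, 0.5035, 0.3885, 0.4622, 0.6387,
    0.6985, 0.74, 0.5573, 0.2576, 0.9111, 0.9327, 0.5691,
    0.5684, 0.5711, 0.9759, 0.5212, 0.8506, 0.8329, 0.9646,
    0.4046, 0.2292, 0.8457, 0.9589, 0.7816, 0.4588, 0.4213,
]
e5 = [
    0.5487, 0.4416, nan, 0.5001, 0.4087, 0.3826, 0.594,
    0.6542, 0.7047, 0.5607, 0.2558, 0.8015, 0.9404, 0.5614,
    0.5592, 0.5591, 0.9548, 0.5157, 0.8004, 0.7749, 0.9318,
    0.3691, 0.2, 0.7916, 0.9537, 0.3522, 0.44, 0.3824,
]


def mean_skip_nan(scores):
    # Unweighted mean over the tasks that actually have a score.
    present = [s for s in scores if not math.isnan(s)]
    return sum(present) / len(present)


print(f"{mean_skip_nan(saga):.4f}")  # 0.6348
print(f"{mean_skip_nan(e5):.4f}")    # 0.5903
```

Both values match the Average row, which is consistent with the multilingual-e5-large average being taken over the 27 tasks that have a score rather than all 28.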

Training datasets: NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoNQ-VN, NanoNQRetrieval, RedditClustering, RedditClustering-VN, RedditClustering.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2



Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.

@nicher92 (Contributor Author) commented

Hi, if everything looks good here, can we merge it? Thank you.

@KennethEnevoldsen (Contributor) left a comment

Results look good - thanks for the ping!

@KennethEnevoldsen merged commit aecf2ce into embeddings-benchmark:main on Apr 30, 2026
3 checks passed