Add BidirLM/BidirLM-Omni-2.5B-Embedding results #481

Merged
KennethEnevoldsen merged 7 commits into embeddings-benchmark:main from Nicolas-BZRD:add-bidirlm-omni-2.5b-results on Apr 21, 2026

@Nicolas-BZRD
Contributor

BidirLM/BidirLM-Omni-2.5B-Embedding

Omnimodal encoder (text / image / audio), 2.5B parameters.
Paper: https://arxiv.org/abs/2604.02045
Hub: https://huggingface.co/BidirLM/BidirLM-Omni-2.5B-Embedding
Model implementation PR: embeddings-benchmark/mteb#4370

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/
  • The results submitted are obtained using the reference implementation
  • My model is available publicly on HuggingFace
  • I solemnly swear that for all results submitted I have not trained on the
    evaluation dataset including training splits. If I have, I have disclosed it
    clearly.

@KennethEnevoldsen changed the title from "feat: add BidirLM/BidirLM-Omni-2.5B-Embedding results" to "Add BidirLM/BidirLM-Omni-2.5B-Embedding results" on Apr 13, 2026
@KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) on Apr 13, 2026
@KennethEnevoldsen removed the "waiting for review of implementation" label on Apr 16, 2026
@Nicolas-BZRD
Contributor Author

@KennethEnevoldsen should I do something now that the other PR has been accepted?

@Samoed
Member

Samoed commented Apr 19, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: BidirLM/BidirLM-Omni-2.5B-Embedding
Tasks: AILAStatutes, AROCocoOrder, AROFlickrOrder, AROVisualAttribution, AROVisualRelation, AfriSentiClassification, AlloProfClusteringS2S.v2, AlloprofReranking, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, ArmenianParaphrasePC, BLINKIT2IMultiChoice, BUCC.v2, BeijingOpera, BelebeleRetrieval, BibleNLPBitextMining, BigPatentClustering.v2, BiorxivClusteringP2P.v2, BirdCLEF, BornholmBitextMining, BrazilianToxicTweetsClassification, BulgarianStoreReviewSentimentClassfication, CEDRClassification, CIFAR100ZeroShot, CIRRIT2IRetrieval, CLSClusteringP2P.v2, CREMADPairClassification, CREMA_D, CREMA_DClustering, CSFDSKMovieReviewSentimentClassification, CTKFactsNLI, CUB200I2IRetrieval, CVBenchCount, CVBenchDepth, CVBenchDistance, CVBenchRelation, CataloniaTweetClassification, ClothoT2ARetrieval, CommonLanguageAgeDetection, CommonVoiceMini21T2ARetrieval, Core17InstructionRetrieval, Country211, Country211ZeroShot, CovidRetrieval, CyrillicTurkicLangClassification, CzechProductReviewSentimentClassification, DBpediaClassification, DTD, DalajClassification, DiaBlaBitextMining, EstonianValenceClassification, EuroSAT, FER2013ZeroShot, FGVCAircraftZeroShot, FSD2019Kaggle, FaroeseSTS, Fashion200kI2TRetrieval, FilipinoShopeeReviewsClassification, FinParaSTS, FinancialPhrasebankClassification, FleursT2ARetrieval, FloresBitextMining, Food101ZeroShot, GTSRB, GTZANAudioReranking, GTZANGenre, GermanSTSBenchmark, GigaSpeechT2ARetrieval, GreekLegalCodeClassification, GujaratiNewsClassification, HALClusteringS2S.v2, HagridRetrieval, HatefulMemesI2TRetrieval, IEMOCAPGender, IN22GenBitextMining, ImageCoDe, ImageNetDog15Clustering, IndicCrosslingualSTS, IndicGenBenchFloresBitextMining, IndicLangClassification, IndonesianIdClickbaitClassification, InfoSeekIT2TRetrieval, IsiZuluNewsClassification, ItaCaseholdClassification, JSICK, JamAltArtistA2ARetrieval, JamAltLyricA2TRetrieval, KorHateSpeechMLClassification, KorSarcasmClassification, 
KurdishSentimentClassification, LEMBPasskeyRetrieval, LegalBenchCorporateLobbying, MACST2ARetrieval, MIRACLRetrievalHardNegatives, MInDS14, MLQARetrieval, MacedonianTweetSentimentClassification, MalteseNewsClassification, MasakhaNEWSClassification, MasakhaNEWSClusteringS2S, MassiveIntentClassification, MedrxivClusteringP2P.v2, MridinghamTonic, MultiEURLEXMultilabelClassification, MultiHateClassification, NIGHTSI2IRetrieval, NMSQAPairClassification, NTREXBitextMining, NepaliNewsClassification, News21InstructionRetrieval, NollySentiBitextMining, NordicLangClassification, NorwegianCourtsBitextMining, NusaParagraphEmotionClassification, NusaTranslationBitextMining, NusaX-senti, NusaXBitextMining, OVENIT2TRetrieval, OdiaNewsClassification, OpusparcusPC, OxfordPets, OxfordPetsZeroShot, PAC, PatchCamelyon, PawsXPairClassification, PlscClusteringP2P.v2, PoemSentimentClassification, PolEmo2.0-OUT, PpcPC, PunjabiNewsClassification, RESISC45, RP2kI2IRetrieval, RTE3, RavdessZeroshot, Robust04InstructionRetrieval, RomaniBibleClustering, RuBQReranking, SCIDOCS, SIB200ClusteringS2S, SIBFLEURS, SICK-R, STS12, STS13, STS13VisualSTS, STS14, STS15, STS15VisualSTS, STS17, STS17MultilingualVisualSTS, STS22.v2, STSB, STSBenchmark, STSBenchmarkMultilingualVisualSTS, STSES, SUN397, ScalaClassification, SemRel24STS, SentimentAnalysisHindi, SinhalaNewsClassification, SiswatiNewsClassification, SlovakMovieReviewSentimentClassification, SpartQA, SpeechCommandsZeroshotv0.02, SpokenSQuADT2ARetrieval, SprintDuplicateQuestions, StackExchangeClustering.v2, StackOverflowQA, StanfordCarsZeroShot, StatcanDialogueDatasetRetrieval, SwahiliNewsClassification, SwednClusteringP2P, SwissJudgementClassification, T2Reranking, TERRa, TRECCOVID, Tatoeba, TempReasonL1, TinyImageNetClustering, ToxicConversationsClassification, TswanaNewsClassification, TweetTopicSingleClassification, TwitterHjerneRetrieval, TwitterURLCorpus, UrbanSound8KT2ARetrieval, VQA2IT2TRetrieval, VehicleSoundClustering, 
VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreShiftProjectRetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreTabfquadRetrieval, VidoreTatdqaRetrieval, VisualNewsI2TRetrieval, VisualSTS-b-Multilingual, VoxCelebSA, VoxPopuliAccentPairClassification, VoxPopuliGenderClustering, VoxPopuliLanguageID, VoyageMMarcoReranking, WITT2IRetrieval, WebLINXCandidatesReranking, WebQAT2ITRetrieval, WikiCitiesClustering, WikiClusteringP2P.v2, WikipediaRerankingMultilingual, WikipediaRetrievalMultilingual, WinoGrande, Winoground, XM3600T2IRetrieval, XNLI, indonli

Results for BidirLM/BidirLM-Omni-2.5B-Embedding

task_name BidirLM/BidirLM-Omni-2.5B-Embedding google/gemini-embedding-001 intfloat/multilingual-e5-large Max result Model with max result In Training Data
AILAStatutes 0.4051 0.4877 0.2084 0.9451 Octen/Octen-Embedding-8B-INT8 False
AROCocoOrder 0.6297 nan nan 0.5283 jinaai/jina-clip-v1 False
AROFlickrOrder 0.7180 nan nan 0.5660 openai/clip-vit-base-patch32 False
AROVisualAttribution 0.6372 nan nan 0.7691 Salesforce/blip-itm-base-coco False
AROVisualRelation 0.5364 nan nan 0.5920 royokong/e5-v False
AfriSentiClassification 0.4568 0.5356 0.455 0.5688 tencent/KaLM-Embedding-Gemma3-12B-2511 False
AlloProfClusteringS2S.v2 0.5604 0.5636 0.3328 0.6110 microsoft/harrier-oss-v1-27b False
AlloprofReranking 0.8140 0.8177 0.6944 0.8540 Octen/Octen-Embedding-8B False
AmazonCounterfactualClassification 0.7961 0.8820 0.6965 0.9696 GeoGPT-Research-Project/GeoEmbedding False
ArXivHierarchicalClusteringP2P 0.6397 0.6492 0.5569 0.6869 NovaSearch/jasper_en_vision_language_v1 False
ArXivHierarchicalClusteringS2S 0.6396 0.6384 0.5367 0.6548 Qwen/Qwen3-Embedding-8B False
ArguAna 0.5704 0.8644 0.5436 0.8979 voyageai/voyage-3-m-exp False
ArmenianParaphrasePC 0.9482 0.9689 0.9493 0.9703 tencent/KaLM-Embedding-Gemma3-12B-2511 False
BLINKIT2IMultiChoice 0.7736 nan nan 0.7637 google/siglip-so400m-patch14-384 False
BUCC.v2 0.9899 0.9899 0.9878 0.9905 codefuse-ai/F2LLM-v2-8B False
BeijingOpera 0.9025 nan nan 0.9745 Qwen/Qwen2-Audio-7B False
BelebeleRetrieval 0.7515 0.9073 0.7791 0.9380 clips/e5-base-trm-nl False
BibleNLPBitextMining 0.1663 0.2072 0.1665 0.9899 deepvk/USER-bge-m3 False
BigPatentClustering.v2 0.4359 0.3806 0.3147 0.4578 BidirLM/BidirLM-0.6B-Embedding False
BiorxivClusteringP2P.v2 0.4424 0.5386 0.372 0.8417 codefuse-ai/F2LLM-4B False
BirdCLEF 0.3310 nan nan 0.4520 MIT/ast-finetuned-audioset-10-10-0.4593 False
BornholmBitextMining 0.5663 0.5169 0.4416 0.7798 jinaai/jina-embeddings-v5-text-small False
BrazilianToxicTweetsClassification 0.2864 0.2802 0.2123 0.3813 microsoft/harrier-oss-v1-27b False
BulgarianStoreReviewSentimentClassfication 0.7027 0.7813 0.6385 0.8159 microsoft/harrier-oss-v1-27b False
CEDRClassification 0.5487 0.5742 0.4484 0.7301 sergeyzh/BERTA False
CIFAR100ZeroShot 0.7666 nan nan 0.9109 QuanSun/EVA02-CLIP-bigE-14-plus False
CIRRIT2IRetrieval 0.1649 nan nan 0.3500 voyageai/voyage-multimodal-3 False
CLSClusteringP2P.v2 0.4427 0.4268 0.4037 0.7572 Qwen/Qwen3-Embedding-8B False
CREMADPairClassification 0.5320 nan nan 0.6887 Qwen/Qwen2-Audio-7B False
CREMA_D 0.2992 nan nan 0.7399 Qwen/Qwen2-Audio-7B False
CREMA_DClustering 0.0060 nan nan 0.3237 Qwen/Qwen2-Audio-7B False
CSFDSKMovieReviewSentimentClassification 0.4145 0.4938 0.3484 0.6790 microsoft/harrier-oss-v1-27b False
CTKFactsNLI 0.8156 0.8759 0.7984 0.8993 omarelshehy/arabic-english-sts-matryoshka False
CUB200I2IRetrieval 0.0000 nan nan 0.8624 facebook/dinov2-giant False
CVBenchCount 0.3477 nan nan 0.6256 TIGER-Lab/VLM2Vec-LoRA False
CVBenchDepth 0.5150 nan nan 0.6333 Salesforce/blip-image-captioning-large False
CVBenchDistance 0.5317 nan nan 0.5967 Salesforce/blip-image-captioning-large False
CVBenchRelation 0.6015 nan nan 0.7169 TIGER-Lab/VLM2Vec-Full False
CataloniaTweetClassification 0.4790 0.5451 0.504 0.7983 microsoft/harrier-oss-v1-27b False
ClothoT2ARetrieval 0.1205 nan nan 0.5873 microsoft/msclap-2022 False
CommonLanguageAgeDetection 0.1504 nan nan 0.2051 laion/larger_clap_general False
CommonVoiceMini21T2ARetrieval 0.6361 nan nan 0.6515 LCO-Embedding/LCO-Embedding-Omni-3B False
Core17InstructionRetrieval 0.0227 0.0769 -0.0162 0.1461 nvidia/llama-embed-nemotron-8b False
Country211 0.1168 nan nan 0.3247 google/siglip-so400m-patch14-384 False
Country211ZeroShot 0.1040 nan nan 0.3405 QuanSun/EVA02-CLIP-bigE-14-plus False
CovidRetrieval 0.8128 0.7913 0.7561 0.9606 TencentBAC/Conan-embedding-v2 False
CyrillicTurkicLangClassification 0.7040 0.9530 0.4085 0.9944 microsoft/harrier-oss-v1-27b False
CzechProductReviewSentimentClassification 0.6191 0.6816 0.5714 0.7667 Bytedance/Seed1.6-embedding-1215 False
DBpediaClassification 0.9501 0.9476 0.8828 0.9926 Qwen/Qwen3-Embedding-8B False
DTD 0.7751 nan nan 0.8114 QuanSun/EVA02-CLIP-bigE-14-plus False
DalajClassification 0.5047 0.5047 0.5001 0.6586 microsoft/harrier-oss-v1-27b False
DiaBlaBitextMining 0.8815 0.8723 0.8483 0.8882 codefuse-ai/F2LLM-v2-14B False
EstonianValenceClassification 0.4916 0.5352 0.4289 0.6764 microsoft/harrier-oss-v1-27b False
EuroSAT 0.9025 nan nan 0.9386 QuanSun/EVA02-CLIP-bigE-14-plus False
FER2013ZeroShot 0.6275 nan nan 0.5844 royokong/e5-v False
FGVCAircraftZeroShot 0.1968 nan nan 0.6025 google/siglip-so400m-patch14-384 False
FSD2019Kaggle 0.4623 nan nan 0.6413 laion/larger_clap_general False
FaroeseSTS 0.7175 0.8612 0.7239 0.9739 Gameselo/STS-multilingual-mpnet-base-v2 False
Fashion200kI2TRetrieval 0.0549 nan nan 0.2406 google/siglip-large-patch16-384 False
FilipinoShopeeReviewsClassification 0.4064 0.4845 0.3527 0.5279 microsoft/harrier-oss-v1-27b False
FinParaSTS 0.2182 0.2860 0.2492 0.3505 jinaai/jina-embeddings-v5-text-nano False
FinancialPhrasebankClassification 0.9171 0.8864 0.8394 0.9519 microsoft/harrier-oss-v1-0.6b False
FleursT2ARetrieval 0.7351 nan nan 0.7134 LCO-Embedding/LCO-Embedding-Omni-3B False
FloresBitextMining 0.6545 0.8371 0.8108 0.9087 SamilPwC-AXNode-GenAI/PwC-Embedding_expr False
Food101ZeroShot 0.7688 nan nan 0.9546 google/siglip-so400m-patch14-384 False
GTSRB 0.8644 nan nan 0.8899 QuanSun/EVA02-CLIP-bigE-14-plus False
GTZANAudioReranking 0.7476 nan nan 0.8541 OpenMuQ/MuQ-MuLan-large False
GTZANGenre 0.7410 nan nan 0.9310 Qwen/Qwen2-Audio-7B False
GermanSTSBenchmark 0.8539 0.8809 0.8408 0.9541 Gameselo/STS-multilingual-mpnet-base-v2 False
GigaSpeechT2ARetrieval 0.8305 nan nan 0.8329 LCO-Embedding/LCO-Embedding-Omni-7B False
GreekLegalCodeClassification 0.4770 0.4376 0.3713 0.8052 Bytedance/Seed1.6-embedding-1215 False
GujaratiNewsClassification 0.8849 0.9205 0.7674 0.9343 Bytedance/Seed1.6-embedding-1215 False
HALClusteringS2S.v2 0.3299 0.3200 0.2261 0.3255 microsoft/harrier-oss-v1-27b False
HagridRetrieval 0.9883 0.9931 0.9891 0.9931 google/gemini-embedding-001 False
HatefulMemesI2TRetrieval 0.8290 nan nan 0.8416 google/siglip-so400m-patch14-384 False
IEMOCAPGender 0.8538 nan nan 0.9362 laion/clap-htsat-fused False
IN22GenBitextMining 0.7777 0.9375 0.7675 0.9375 google/gemini-embedding-001 False
ImageCoDe 0.1438 nan nan 0.1520 laion/CLIP-ViT-g-14-laion2B-s34B-b88K False
ImageNetDog15Clustering 0.5619 nan nan 0.9263 facebook/dinov2-giant False
IndicCrosslingualSTS 0.4883 0.6287 0.4387 0.8477 Gameselo/STS-multilingual-mpnet-base-v2 False
IndicGenBenchFloresBitextMining 0.8969 0.9677 0.8875 0.9881 Sailesh97/Hinvec False
IndicLangClassification 0.8735 0.8769 0.2025 0.9930 Bytedance/Seed1.6-embedding-1215 False
IndonesianIdClickbaitClassification 0.5920 0.6700 0.6122 0.7560 nvidia/llama-embed-nemotron-8b False
InfoSeekIT2TRetrieval 0.1050 nan nan 0.2265 voyageai/voyage-multimodal-3 False
IsiZuluNewsClassification 0.2592 0.4053 0.3241 0.4257 microsoft/harrier-oss-v1-27b False
ItaCaseholdClassification 0.8552 0.7330 0.6679 0.9439 bigscience/sgpt-bloom-7b1-msmarco False
JSICK 0.8498 0.8499 0.7981 0.8963 Octen/Octen-Embedding-8B False
JamAltArtistA2ARetrieval 0.6692 nan nan 0.9689 laion/larger_clap_music_and_speech False
JamAltLyricA2TRetrieval 0.4475 nan nan 0.7596 LCO-Embedding/LCO-Embedding-Omni-3B False
KorHateSpeechMLClassification 0.1115 0.1769 0.1049 0.7625 Bytedance/Seed1.6-embedding-1215 False
KorSarcasmClassification 0.5851 0.6051 0.5679 0.8190 ICT-TIME-and-Querit/BOOM_4B_v1 False
KurdishSentimentClassification 0.6584 0.8639 0.7708 0.9403 Bytedance/Seed1.6-embedding-1215 False
LEMBPasskeyRetrieval 0.4800 0.3850 0.3825 1.0000 tencent/KaLM-Embedding-Gemma3-12B-2511 False
LegalBenchCorporateLobbying 0.9516 0.9598 0.8972 0.9696 voyageai/voyage-3-large False
MACST2ARetrieval 0.0840 nan nan 0.4173 microsoft/msclap-2022 False
MIRACLRetrievalHardNegatives 0.6377 0.7042 0.5923 0.7305 nvidia/llama-embed-nemotron-8b False
MInDS14 0.9107 nan nan 0.8942 LCO-Embedding/LCO-Embedding-Omni-7B False
MLQARetrieval 0.7480 0.8416 0.7566 0.8416 google/gemini-embedding-001 False
MacedonianTweetSentimentClassification 0.6662 0.7183 0.6192 0.7547 Qwen/Qwen3-Embedding-4B False
MalteseNewsClassification 0.3305 0.3738 0.2395 0.6938 Bytedance/Seed1.6-embedding-1215 False
MasakhaNEWSClassification 0.7645 0.8355 0.7754 0.9009 Bytedance/Seed1.6-embedding-1215 False
MasakhaNEWSClusteringS2S 0.5418 0.5745 0.3804 0.7365 Bytedance/Seed1.6-embedding-1215 False
MassiveIntentClassification 0.6942 0.8192 0.6025 0.9194 voyageai/voyage-3-m-exp False
MedrxivClusteringP2P.v2 0.4358 0.4716 0.3431 0.7199 codefuse-ai/F2LLM-4B False
MridinghamTonic 0.4535 nan nan 0.6117 Qwen/Qwen2-Audio-7B False
MultiEURLEXMultilabelClassification 0.0501 0.0528 0.0516 0.0968 Bytedance/Seed1.6-embedding-1215 False
MultiHateClassification 0.7086 0.7247 0.6357 0.8621 microsoft/harrier-oss-v1-27b False
NIGHTSI2IRetrieval 0.2392 nan nan 0.2646 QuanSun/EVA02-CLIP-bigE-14-plus False
NMSQAPairClassification 0.9599 nan nan 0.9760 LCO-Embedding/LCO-Embedding-Omni-7B False
NTREXBitextMining 0.8200 0.9364 0.914 0.9592 microsoft/harrier-oss-v1-27b False
NepaliNewsClassification 0.9609 0.9814 0.8847 0.9953 jinaai/jina-embeddings-v5-text-small False
News21InstructionRetrieval 0.0026 0.1026 -0.0006 0.1145 google/embeddinggemma-300m False
NollySentiBitextMining 0.4114 0.6871 0.675 0.8376 microsoft/harrier-oss-v1-27b False
NordicLangClassification 0.6854 0.8597 0.8015 0.9578 microsoft/harrier-oss-v1-27b False
NorwegianCourtsBitextMining 0.9320 0.9342 0.9404 0.9481 jinaai/jina-embeddings-v5-text-nano False
NusaParagraphEmotionClassification 0.5376 0.5638 0.4166 0.8374 Bytedance/Seed1.6-embedding-1215 False
NusaTranslationBitextMining 0.7731 0.7752 0.672 0.9222 Qwen/Qwen3-Embedding-8B False
NusaX-senti 0.7181 0.8031 0.7055 0.8482 Bytedance/Seed1.6-embedding-1215 False
NusaXBitextMining 0.7998 0.8252 0.7267 0.9056 Bytedance/Seed1.6-embedding-1215 False
OVENIT2TRetrieval 0.1237 nan nan 0.1640 voyageai/voyage-multimodal-3 False
OdiaNewsClassification 0.8233 0.9184 0.8001 0.9779 microsoft/harrier-oss-v1-27b False
OpusparcusPC 0.9543 0.9662 0.9451 0.9698 microsoft/harrier-oss-v1-27b False
OxfordPets 0.8528 nan nan 0.9509 google/siglip-large-patch16-384 False
OxfordPetsZeroShot 0.5129 nan nan 0.9678 google/siglip-large-patch16-384 False
PAC 0.6502 0.7168 0.7033 0.8811 Bytedance/Seed1.6-embedding-1215 False
PatchCamelyon 0.7265 nan nan 0.7731 google/siglip-large-patch16-384 False
PawsXPairClassification 0.5617 0.5999 0.5507 0.7557 Bytedance/Seed1.6-embedding-1215 False
PlscClusteringP2P.v2 0.7447 0.7431 0.7161 0.7542 tencent/KaLM-Embedding-Gemma3-12B-2511 False
PoemSentimentClassification 0.6588 0.5966 0.5067 0.8642 Bytedance/Seed1.6-embedding-1215 False
PolEmo2.0-OUT 0.7287 0.7753 0.3648 0.8063 microsoft/harrier-oss-v1-27b False
PpcPC 0.9364 0.9550 0.9116 0.9576 microsoft/harrier-oss-v1-27b False
PunjabiNewsClassification 0.7962 0.8261 0.807 0.8879 Bytedance/Seed1.6-embedding-1215 False
RESISC45 0.9030 nan nan 0.9277 QuanSun/EVA02-CLIP-bigE-14-plus False
RP2kI2IRetrieval 0.9998 nan nan 0.7277 nyu-visionx/moco-v3-vit-b False
RTE3 0.8866 0.8955 0.8752 0.9173 Bytedance/Seed1.6-embedding-1215 False
RavdessZeroshot 0.3424 nan nan 0.3167 LCO-Embedding/LCO-Embedding-Omni-7B False
Robust04InstructionRetrieval 0.0038 -0.0241 -0.0748 0.1244 Qwen/Qwen3-Embedding-4B False
RomaniBibleClustering 0.4063 0.4322 0.4092 0.4658 microsoft/harrier-oss-v1-27b False
RuBQReranking 0.7511 0.7384 0.756 0.8051 ai-sage/Giga-Embeddings-instruct False
SCIDOCS 0.2332 0.2515 0.1745 0.5986 IEITYuan/Yuan-embedding-2.0-en False
SIB200ClusteringS2S 0.4260 0.4174 0.3945 0.7929 codefuse-ai/F2LLM-v2-14B False
SIBFLEURS 0.4753 nan nan 0.4739 LCO-Embedding/LCO-Embedding-Omni-7B False
SICK-R 0.8125 0.8275 0.8023 0.9465 Gameselo/STS-multilingual-mpnet-base-v2 False
STS12 0.8270 0.8155 0.8002 0.9546 Gameselo/STS-multilingual-mpnet-base-v2 False
STS13 0.8816 0.8989 0.8155 0.9776 Gameselo/STS-multilingual-mpnet-base-v2 False
STS13VisualSTS 0.6206 nan nan 0.8160 voyageai/voyage-multimodal-3 False
STS14 0.8493 0.8541 0.7772 0.9753 Gameselo/STS-multilingual-mpnet-base-v2 False
STS15 0.9002 0.9044 0.8931 0.9811 Gameselo/STS-multilingual-mpnet-base-v2 False
STS15VisualSTS 0.7368 nan nan 0.8685 voyageai/voyage-multimodal-3 False
STS17 0.8603 0.8858 0.8214 0.9342 infgrad/Jasper-Token-Compression-600M False
STS17MultilingualVisualSTS 0.3886 nan nan 0.6779 voyageai/voyage-multimodal-3 False
STS22.v2 0.7155 0.7169 0.643 0.7718 Kingsoft-LLM/QZhou-Embedding False
STSB 0.8355 0.8550 0.8236 0.9199 Gameselo/STS-multilingual-mpnet-base-v2 False
STSBenchmark 0.8821 0.8908 0.8729 0.9504 Kingsoft-LLM/QZhou-Embedding False
STSBenchmarkMultilingualVisualSTS 0.5583 nan nan 0.7567 voyageai/voyage-multimodal-3 False
STSES 0.7815 0.8175 0.8021 0.8231 google/embeddinggemma-300m False
SUN397 0.7588 nan nan 0.8052 QuanSun/EVA02-CLIP-bigE-14-plus False
ScalaClassification 0.5279 0.5185 0.5109 0.9112 microsoft/harrier-oss-v1-27b False
SemRel24STS 0.6441 0.7314 0.6266 0.8112 VPLabs/SearchMap_Preview False
SentimentAnalysisHindi 0.7405 0.7606 0.642 0.8070 microsoft/harrier-oss-v1-27b False
SinhalaNewsClassification 0.5108 0.8229 0.6682 0.8591 microsoft/harrier-oss-v1-27b False
SiswatiNewsClassification 0.7325 0.6238 0.535 0.7837 Lajavaness/bilingual-embedding-small False
SlovakMovieReviewSentimentClassification 0.8614 0.9035 0.7441 0.9616 microsoft/harrier-oss-v1-27b False
SpartQA 0.0466 0.1030 0.0565 0.8769 microsoft/harrier-oss-v1-27b False
SpeechCommandsZeroshotv0.02 0.9730 nan nan 0.9742 LCO-Embedding/LCO-Embedding-Omni-7B False
SpokenSQuADT2ARetrieval 0.7433 nan nan 0.7400 LCO-Embedding/LCO-Embedding-Omni-7B False
SprintDuplicateQuestions 0.9706 0.9690 0.9314 0.9838 Kingsoft-LLM/QZhou-Embedding False
StackExchangeClustering.v2 0.6909 0.9207 0.4643 0.9207 google/gemini-embedding-001 False
StackOverflowQA 0.9411 0.9671 0.8889 0.9749 codefuse-ai/F2LLM-v2-14B False
StanfordCarsZeroShot 0.6132 nan nan 0.9463 google/siglip-so400m-patch14-384 False
StatcanDialogueDatasetRetrieval 0.3847 0.5111 0.1063 0.5807 jinaai/jina-embeddings-v4 False
SwahiliNewsClassification 0.5938 0.6605 0.5969 0.7066 codefuse-ai/F2LLM-v2-4B False
SwednClusteringP2P 0.4444 0.4584 0.3691 0.6213 Qwen/Qwen3-Embedding-4B False
SwissJudgementClassification 0.5389 0.5786 0.5362 0.7958 microsoft/harrier-oss-v1-27b False
T2Reranking 0.6701 0.6795 0.6632 0.7315 tencent/Youtu-Embedding False
TERRa 0.5834 0.6392 0.5842 0.7957 ai-sage/Giga-Embeddings-instruct False
TRECCOVID 0.8089 0.8631 0.7115 0.9833 IEITYuan/Yuan-embedding-2.0-en False
Tatoeba 0.7279 0.8197 0.7573 0.9659 SamilPwC-AXNode-GenAI/PwC-Embedding_expr False
TempReasonL1 0.0165 0.0296 0.0114 0.4184 microsoft/harrier-oss-v1-27b False
TinyImageNetClustering 0.7617 nan nan 0.8358 QuanSun/EVA02-CLIP-bigE-14 False
ToxicConversationsClassification 0.7001 0.8875 0.6601 0.9759 voyageai/voyage-3-m-exp False
TswanaNewsClassification 0.3273 0.5337 0.47 0.6417 Bytedance/Seed1.6-embedding-1215 False
TweetTopicSingleClassification 0.7815 0.7111 0.6532 0.8631 jinaai/jina-embeddings-v5-text-small False
TwitterHjerneRetrieval 0.7563 0.9802 0.3522 0.9802 google/gemini-embedding-001 False
TwitterURLCorpus 0.8731 0.8705 0.8583 0.9571 TencentBAC/Conan-embedding-v2 False
UrbanSound8KT2ARetrieval 0.0065 nan nan 0.0098 laion/clap-htsat-unfused False
VQA2IT2TRetrieval 0.0439 nan nan 0.1701 TIGER-Lab/VLM2Vec-LoRA False
VehicleSoundClustering 0.0511 nan nan 0.1337 MIT/ast-finetuned-audioset-10-10-0.4593 False
VidoreDocVQARetrieval 0.4503 nan nan 0.6868 webAI-Official/webAI-ColVec1-9b False
VidoreInfoVQARetrieval 0.7946 nan nan 0.9518 webAI-Official/webAI-ColVec1-9b False
VidoreShiftProjectRetrieval 0.5543 nan nan 0.9330 nvidia/nemotron-colembed-vl-8b-v2 False
VidoreSyntheticDocQAAIRetrieval 0.8666 nan nan 1.0000 nvidia/llama-nemotron-colembed-vl-3b-v2 False
VidoreTabfquadRetrieval 0.7423 nan nan 0.9805 nvidia/nemotron-colembed-vl-4b-v2 False
VidoreTatdqaRetrieval 0.4988 nan nan 0.8404 VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 False
VisualNewsI2TRetrieval 0.0702 nan nan 0.4636 QuanSun/EVA02-CLIP-bigE-14-plus False
VisualSTS-b-Multilingual 0.5245 nan nan 0.7305 voyageai/voyage-multimodal-3 False
VoxCelebSA 0.4949 nan nan 0.4897 LCO-Embedding/LCO-Embedding-Omni-3B False
VoxPopuliAccentPairClassification 0.5105 nan nan 0.5540 openai/whisper-medium False
VoxPopuliGenderClustering 0.1285 nan nan 0.5268 laion/clap-htsat-fused False
VoxPopuliLanguageID 0.9620 nan nan 0.9940 speechbrain/m-ctc-t-large False
VoyageMMarcoReranking 0.6740 0.6673 0.6821 0.8366 codefuse-ai/F2LLM-v2-14B False
WITT2IRetrieval 0.5003 nan nan 0.4996 royokong/e5-v False
WebLINXCandidatesReranking 0.1159 0.1097 0.0778 0.2658 Querit/Querit False
WebQAT2ITRetrieval 0.5941 nan nan 0.6564 voyageai/voyage-multimodal-3 False
WikiCitiesClustering 0.7470 0.9163 0.755 0.9500 microsoft/harrier-oss-v1-27b False
WikiClusteringP2P.v2 0.3128 0.2823 0.256 0.3319 microsoft/harrier-oss-v1-27b False
WikipediaRerankingMultilingual 0.8580 0.9224 0.8981 0.9308 jinaai/jina-reranker-v3 False
WikipediaRetrievalMultilingual 0.8750 0.9420 0.9111 0.9420 google/gemini-embedding-001 False
WinoGrande 0.5798 0.6052 0.5498 0.9314 microsoft/harrier-oss-v1-27b False
Winoground 0.0950 nan nan 0.1350 Salesforce/blip-itm-large-flickr False
XM3600T2IRetrieval 0.6391 nan nan 0.6659 royokong/e5-v False
XNLI 0.7685 0.8526 0.7477 0.9291 Bytedance/Seed1.6-embedding-1215 False
indonli 0.5733 0.6069 0.5174 0.6722 Bytedance/Seed1.6-embedding-1215 False
Average 0.5924 0.6837 0.5861 0.7497 nan -
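The rows above are whitespace-separated, and `nan` marks a reference model with no score for a task. As a minimal sketch, assuming that layout, the snippet below parses three rows copied from the table and averages one column while skipping `nan` entries. The helper names `parse_row` and `nan_mean` are hypothetical, this is not the script that generated the comment, and whether the Average row was computed this way is an assumption.

```python
import math

# Three rows copied from the table above. Column order:
# task, new model, gemini-embedding-001, multilingual-e5-large,
# max result, model with max result, in training data.
ROWS = """\
AILAStatutes 0.4051 0.4877 0.2084 0.9451 Octen/Octen-Embedding-8B-INT8 False
AROCocoOrder 0.6297 nan nan 0.5283 jinaai/jina-clip-v1 False
ArguAna 0.5704 0.8644 0.5436 0.8979 voyageai/voyage-3-m-exp False
"""


def parse_row(line):
    """Split one row into named fields (hypothetical helper)."""
    task, new, gemini, e5, best, best_model, in_train = line.split()
    return {
        "task": task,
        "new": float(new),
        "gemini": float(gemini),  # float("nan") parses to NaN
        "e5": float(e5),
        "max": float(best),
        "max_model": best_model,
        "in_training_data": in_train == "True",
    }


def nan_mean(values):
    """Mean over the non-NaN values; NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


rows = [parse_row(line) for line in ROWS.splitlines()]
print(nan_mean([r["new"] for r in rows]))     # averaged over 3 tasks
print(nan_mean([r["gemini"] for r in rows]))  # averaged over 2 tasks
```

Skipping `nan` means each column is averaged over a different set of tasks, so averages computed this way are not directly comparable between a text-only reference model and an omnimodal one.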

The model has high performance on these tasks (each score is above the listed max result): MInDS14, BLINKIT2IMultiChoice, SpokenSQuADT2ARetrieval, RP2kI2IRetrieval, FleursT2ARetrieval, FER2013ZeroShot, AROFlickrOrder, AROCocoOrder, WITT2IRetrieval, VoxCelebSA, SIBFLEURS, HALClusteringS2S.v2, RavdessZeroshot
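Every task in this list has a new-model score above the "Max result" column in the table. A minimal sketch of that check follows, using five rows copied from the table. The criterion is inferred from the table values rather than taken from the tool that produced the comment, and `above_previous_max` and its `margin` parameter are hypothetical.

```python
# (task, new model score, previous max result), copied from the table above.
SCORES = [
    ("MInDS14", 0.9107, 0.8942),
    ("RP2kI2IRetrieval", 0.9998, 0.7277),
    ("WITT2IRetrieval", 0.5003, 0.4996),
    ("GigaSpeechT2ARetrieval", 0.8305, 0.8329),
    ("CUB200I2IRetrieval", 0.0000, 0.8624),
]


def above_previous_max(scores, margin=0.0):
    """Return tasks whose new score exceeds the previous max by more than margin."""
    return [task for task, new, best in scores if new > best + margin]


flagged = above_previous_max(SCORES)
print(flagged)  # ['MInDS14', 'RP2kI2IRetrieval', 'WITT2IRetrieval']

# A larger margin keeps only the big jumps, which are the ones most worth a second look.
print(above_previous_max(SCORES, margin=0.1))  # ['RP2kI2IRetrieval']
```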


@Samoed
Member

Samoed commented Apr 19, 2026

I generated the scores without VisualSTS17Multilingual. I'll check later why the generation fails when it is included.

@KennethEnevoldsen KennethEnevoldsen merged commit 1d0d085 into embeddings-benchmark:main Apr 21, 2026
2 of 3 checks passed