Dmitrii Kuzmin 1kkiRen

Dmitrii Kuzmin — NLP/ML Engineer & Researcher

I research and build language-model systems: tokenizer adaptation, multilingual LLM evaluation, agentic workflows, and production NLP pipelines. My current work focuses on making LLMs more reliable, efficient, and easier to adapt across languages.



Quick highlights

Research Intern at Mohamed bin Zayed University of Artificial Intelligence (Jun 2025 – present), studying alternative tokenization methods and language adaptation for LLMs.
Middle NLP Engineer / Lead NLP Researcher at DeepPavlov.ai (May 2025 – present), working on LLM evaluation, agentic systems, uncertainty estimation, and reasoning reliability.
Previously worked with the Center for Applied AI (Skolkovo), the Higher School of Economics, the Moscow Aviation Institute, and Innopolis University on multimodal fine-tuning, Russian LLM adaptation, BERT models, and NLP services.
Maintainer of open-source tokenizer and embedding tooling published on PyPI.

What I work with

NLP & Deep Learning Stack

DevOps & Tooling

Backend & Communication

Languages & soft skills

English (C1)
Russian (native)
Flexibility · Responsibility · Enthusiasm

Recent experience

Research Intern · MBZUAI — Abu Dhabi, UAE (Jun 2025 – present)

Lead research on tokenizer adaptation and language-specific LLM fine-tuning strategies.
Research alternative tokenization strategies and language adaptation methods for LLMs.
Evaluate the quality and efficiency tradeoffs introduced by tokenizer changes, and prepare papers for publication.

Middle NLP Engineer / Lead NLP Researcher · DeepPavlov.ai — Moscow, Russia (May 2025 – present)

Lead R&D around LLM evaluation, agentic systems, uncertainty estimation, and reasoning reliability.
Build benchmarking workflows and run comparative testing across diverse GPU infrastructures.
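"Uncertainty estimation" covers many methods. A common baseline, shown here only as an illustration and not as a description of the DeepPavlov work, scores a generated sequence from the log-probabilities the model assigned to its own tokens. The function name `sequence_uncertainty` is hypothetical:

```python
import math
from typing import Sequence


def sequence_uncertainty(token_logprobs: Sequence[float]) -> dict:
    """Length-normalized uncertainty scores for one generated sequence.

    Assumes `token_logprobs` holds the natural-log probability the model
    assigned to each token it actually generated (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    # Mean negative log-likelihood: higher means the model was less sure overall.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return {
        "mean_nll": nll,
        "perplexity": math.exp(nll),
        # The least likely single token can flag a shaky answer even if the rest is confident.
        "max_token_nll": -min(token_logprobs),
    }


confident = sequence_uncertainty([math.log(0.9), math.log(0.95), math.log(0.99)])
shaky = sequence_uncertainty([math.log(0.9), math.log(0.2), math.log(0.6)])
print(confident["perplexity"] < shaky["perplexity"])  # True
```

Length normalization keeps long answers from being penalized just for having more tokens. Stronger methods (sampling-based consistency, semantic entropy, trained probes) build on the same per-token signals.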

Middle NLP Engineer · Center for Applied AI, Skolkovo — Moscow, Russia (Feb 2025 – May 2025)

Fine-tuned the Qwen2.5-VL model and built supporting pipelines.
Designed prompting strategies to generate actionable feedback on heterogeneous specifications.

NLP Researcher · Higher School of Economics — Moscow, Russia (Jun 2024 – May 2025)

Fine-tuned Llama3-8B-Instruct for Russian-language tasks.
Developed a Russian BPE tokenizer and tooling to manipulate existing tokenizer vocabularies safely.
Built a grammar benchmark suite to quantify improvements across downstream tasks.
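For context on the BPE work above, the core of byte-pair-encoding training fits in a few lines. This is a toy, character-level sketch, not the Russian tokenizer described above. Production tokenizers (for example Hugging Face `tokenizers`) add normalization, pre-tokenization, and byte-level or fallback handling, and are heavily optimized:

```python
from collections import Counter


def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules from a list of words, starting from single characters."""
    # Each word is a tuple of symbols; identical words are counted together.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the new merge applied left to right.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


print(train_bpe(["кот", "кот", "кошка", "ко"], num_merges=2))  # [('к', 'о'), ('ко', 'т')]
```

The ordered merge list is the tokenizer: encoding new text replays these merges in the same order.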

ML / Backend Engineer · Moscow Aviation Institute — Moscow, Russia (Jul 2023 – Oct 2023)

Delivered a sentence theme classifier and optimized database queries.
Integrated Telegram-based interfaces for model delivery.

NLP Engineer · Innopolis University — Innopolis, Russia (Jun 2023 – Jul 2023)

Developed a deep-learning sentiment model for YouTube comments.
Fine-tuned BERT for domain-specific tone classification.

Publications & research

Mitigating the Impact of Glitch Tokens via Targeted Retokenization — EMNLP 2026 (under review)

Researcher & writer, 2025. Studies how glitch-token handling and tokenizer behavior affect LLM generation quality.
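One simple way to picture "retokenization" is to re-spell a problematic token with other tokens from the same vocabulary so the model never sees it. The sketch below does this with greedy longest-match segmentation that skips a banned set. It only illustrates the general idea and is not the method from the paper. The helper `greedy_encode` and the toy vocabulary are hypothetical:

```python
def greedy_encode(text: str, vocab: set[str], banned: frozenset[str] = frozenset()) -> list[str]:
    """Greedy longest-match segmentation that never emits a token in `banned`.

    Assumes every single character of `text` is in `vocab` and not banned,
    so a fallback segmentation always exists.
    """
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest allowed piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab and piece not in banned:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no allowed token covers {text[i]!r}")
    return tokens


# Toy vocabulary: single characters plus a few longer pieces.
vocab = set("SolidGoldMagikarp") | {"Solid", "Gold", "Magi", "karp", "SolidGoldMagikarp"}

print(greedy_encode("SolidGoldMagikarp", vocab))
# ['SolidGoldMagikarp']
print(greedy_encode("SolidGoldMagikarp", vocab, banned=frozenset({"SolidGoldMagikarp"})))
# ['Solid', 'Gold', 'Magi', 'karp']
```

The decoded text is identical in both cases. Only the token sequence the model receives changes.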

TokenSubstitution: Cost-Efficient Method of Language Adaptation Based on Token "Trained-ness" — EMNLP 2026 (in progress)

Proposes a cost-efficient method for adapting an LLM to a target language to improve its generation quality in that language.

A Multi-Aspect Evaluation of Tokenizer Adaptation Methods for Large Language Models on Russian — AI Journey 2025 (accepted)

Shows that tokenizer adaptation is a cost-effective technique by analyzing text quality and token efficiency across diverse benchmarks.
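Token efficiency is commonly summarized with fertility (average tokens per word) and characters per token. A minimal sketch follows. It assumes whitespace word splitting and a tokenizer exposed as any `str -> list[str]` callable. The exact metrics used in the paper may differ:

```python
from typing import Callable


def token_efficiency(texts: list[str], tokenize: Callable[[str], list[str]]) -> dict:
    """Corpus-level token-efficiency statistics for one tokenizer."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)  # assumption: words = whitespace-separated chunks
    n_chars = sum(len(t) for t in texts)
    return {
        "fertility": n_tokens / n_words,        # tokens per word; lower is better
        "chars_per_token": n_chars / n_tokens,  # higher means more text fits in the context window
    }


texts = ["мама мыла раму"]
print(token_efficiency(texts, list))       # character-level: fertility ~4.67, 1.0 chars/token
print(token_efficiency(texts, str.split))  # word-level: fertility 1.0, ~4.67 chars/token
```

Comparing an original and an adapted tokenizer on the same corpus with these two numbers shows how many fewer tokens (and so less compute) the adapted one needs for the same text.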

Open-source projects

TokenizerChanger — modify tokenizers

Python library for modifying Hugging Face tokenizers.
PyPI · GitHub
pip install TokenizerChanger
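Removing a token from a BPE tokenizer takes more than deleting a vocabulary entry. Merge rules that produce or consume it must also go, and any token that could only be built through those merges becomes unreachable. The sketch below shows that bookkeeping on plain Python structures resembling the `vocab` and `merges` fields of a Hugging Face `tokenizer.json`. `remove_token` is a hypothetical helper, not TokenizerChanger's API:

```python
def remove_token(vocab: dict[str, int], merges: list[tuple[str, str]], token: str):
    """Remove `token` plus every merge and token that depends on it.

    Returns (new_vocab, new_merges). Ids are left unchanged; re-indexing them
    (and the model's embedding rows) would be a separate step.
    """
    removed = {token}
    while True:
        # Keep only merges whose inputs and output are all still present.
        kept = [(a, b) for a, b in merges
                if a not in removed and b not in removed and a + b not in removed]
        produced_by_kept = {a + b for a, b in kept}
        produced_by_any = {a + b for a, b in merges}
        # Tokens that only dropped merges could build are now unreachable.
        orphans = (produced_by_any - produced_by_kept) - removed
        if not orphans:
            break
        removed |= orphans
    new_vocab = {t: i for t, i in vocab.items() if t not in removed}
    return new_vocab, kept


vocab = {"к": 0, "о": 1, "т": 2, "а": 3, "ко": 4, "кот": 5, "ка": 6}
merges = [("к", "о"), ("ко", "т"), ("к", "а")]
print(remove_token(vocab, merges, "ко"))
# ({'к': 0, 'о': 1, 'т': 2, 'а': 3, 'ка': 6}, [('к', 'а')])
```

Removing "ко" also removes "кот", because the only merge that built "кот" started from "ко".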

EmbeddingsDivision — adapt LLM embeddings

Python library for separating and adapting LLM embedding layers.
PyPI · GitHub
pip install embdiv
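When a tokenizer gains new tokens, their embedding rows need a starting point. One widely used heuristic is to average the embeddings of the old sub-tokens the new token used to be split into. This is an assumption about a typical workflow, not necessarily what EmbeddingsDivision does. The sketch uses plain lists, and `init_new_embeddings` is hypothetical:

```python
def init_new_embeddings(
    embeddings: dict[str, list[float]],
    new_tokens: dict[str, list[str]],
) -> dict[str, list[float]]:
    """Initialize each new token as the mean of its old sub-token embeddings.

    `new_tokens` maps each new token to the old pieces the previous tokenizer
    split it into (hypothetical input format).
    """
    out = {}
    for token, old_pieces in new_tokens.items():
        vectors = [embeddings[p] for p in old_pieces]
        dim = len(vectors[0])
        # Component-wise mean over the old pieces.
        out[token] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return out


embeddings = {"ко": [1.0, 0.0], "т": [0.0, 2.0]}
print(init_new_embeddings(embeddings, {"кот": ["ко", "т"]}))  # {'кот': [0.5, 1.0]}
```

Starting from the mean keeps new rows in the same region of embedding space as the tokens the model already knows, which usually makes subsequent fine-tuning cheaper than random initialization.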

CRUD Calendar LLM Chatbot — Telegram/FastAPI assistant

Features: calendar CRUD, summarize the latest news, voice reminders.
Stack: Telegram Bot API, FastAPI, RAG pipeline with Qwen2.5-VL.
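The retrieval step of a RAG pipeline ranks stored passages by similarity to the query before they are passed to the generator. A toy sketch follows. It assumes embeddings have already been computed by some encoder. The bot's actual embedding model and storage are not described here, and the helper names are hypothetical:

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([text for text, _ in retrieve([1.0, 0.0], docs)])  # ['a', 'c']
```

The retrieved passages are then inserted into the prompt so the generator answers from them rather than from memory alone.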

Education

Innopolis University — B.S. in Data Analysis & Artificial Intelligence (2022 – 2026)
Key coursework: Software Systems Analysis and Design, Human-AI Interaction, Mathematical Analysis.

Beyond work

Tutor for first-year students at Innopolis University (Sep 2023 – Jan 2024), helping newcomers acclimate and organizing community events.
Always exploring ways to make LLM tooling more accessible and efficient.

Let’s connect

Portfolio: 1kkiren.ru
Email: 1kkiren@mail.ru
Profiles: GitHub
