I research and build language-model systems: tokenizer adaptation, multilingual LLM evaluation, agentic workflows, and production NLP pipelines. My current work focuses on making LLMs more reliable, efficient, and easier to adapt across languages.
- Research Intern at Mohamed bin Zayed University of Artificial Intelligence (Jun 2025 – present), studying alternative tokenization methods and language adaptation for LLMs.
- Middle NLP Engineer / Lead NLP Researcher at DeepPavlov.ai (May 2025 – present), working on LLM evaluation, agentic systems, uncertainty estimation, and reasoning reliability.
- Previously worked with the Center for Applied AI (Skolkovo), the Higher School of Economics, Moscow Aviation Institute, and Innopolis University on multimodal fine-tuning, Russian LLM adaptation, BERT models, and NLP services.
- Maintainer of open-source tokenizer and embedding tooling published on PyPI.
Languages & soft skills
- English (C1)
- Russian (native)
- Flexibility · Responsibility · Enthusiasm
Research Intern · MBZUAI — Abu Dhabi, UAE (Jun 2025 – present)
- Lead research on tokenizer adaptation, alternative tokenization strategies, and language-specific LLM fine-tuning.
- Evaluate how tokenizer choices trade off output quality against efficiency, and write up the results for publication.
Middle NLP Engineer / Lead NLP Researcher · DeepPavlov.ai — Moscow, Russia (May 2025 – present)
- Lead R&D around LLM evaluation, agentic systems, uncertainty estimation, and reasoning reliability.
- Build benchmarking workflows and run comparative testing across diverse GPU infrastructures.
Middle NLP Engineer · Center for Applied AI, Skolkovo — Moscow, Russia (Feb 2025 – May 2025)
- Fine-tuned Qwen2.5-VL and built the supporting pipelines.
- Designed prompting strategies to generate actionable feedback on heterogeneous specifications.
NLP Researcher · Higher School of Economics — Moscow, Russia (Jun 2024 – May 2025)
- Fine-tuned Llama3-8B-Instruct for Russian-language tasks.
- Developed a Russian BPE tokenizer and tooling to manipulate existing tokenizer vocabularies safely.
- Built a grammar benchmark suite to quantify improvements across downstream tasks.
ML / Backend Engineer · Moscow Aviation Institute — Moscow, Russia (Jul 2023 – Oct 2023)
- Delivered a sentence theme classifier and optimized database queries.
- Integrated Telegram-based interfaces for model delivery.
NLP Engineer · Innopolis University — Innopolis, Russia (Jun 2023 – Jul 2023)
- Developed a deep-learning sentiment model for YouTube comments.
- Fine-tuned BERT for domain-specific tone classification.
Mitigating the Impact of Glitch Tokens via Targeted Retokenization — EMNLP 2026 (under review)
Researcher & writer, 2025. Studies how glitch-token handling and tokenizer behavior affect LLM generation quality.
TokenSubstitution: Cost-Efficient Method of Language Adaptation Based on Token "Trained-ness" — EMNLP 2026 (in progress)
Proposes a cost-efficient method for improving LLM generation quality in a target language.
A Multi-Aspect Evaluation of Tokenizer Adaptation Methods for Large Language Models on Russian — AI Journey 2025 (accepted)
Demonstrates tokenizer adaptation as a cost-effective technique by analyzing text quality and token efficiency across diverse benchmarks.
TokenizerChanger — modify tokenizers
EmbeddingsDivision — adapt LLM embeddings
CRUD Calendar LLM Chatbot — Telegram/FastAPI assistant
- Features: calendar CRUD, latest-news summaries, voice reminders.
- Stack: Telegram Bot API, FastAPI, RAG pipeline with Qwen2.5-VL.
Innopolis University — B.S. in Data Analysis & Artificial Intelligence (2022 – 2026)
Key coursework: Software Systems Analysis and Design, Human-AI Interaction, Mathematical Analysis.
- Tutor for first-year students at Innopolis University (Sep 2023 – Jan 2024), helping newcomers acclimate and organizing community events.
- Always exploring ways to make LLM tooling more accessible and efficient.
- Portfolio: 1kkiren.ru
- Email: 1kkiren@mail.ru
- Profiles: GitHub



