grpotrainer

Here are 9 public repositories matching this topic...

GAD-cell / vlm-grpo

An implementation of GRPO for training Unsloth's VLMs

reinforcement-learning vlm huggingface trl unsloth grpo grpotrainer

Updated Aug 7, 2025
Python

The-Swarm-Corporation / PARL

PARL (Parallel-Agent Reinforcement Learning) is a training paradigm that teaches models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.

reinforcement-learning ai ml multi-agent rl swarms agents kimi rlhf agentic deepseek grpo grpotrainer subagents moonshotai

Updated Mar 24, 2026
Python

uw-nsl / TinyV

Your efficient and accurate answer verification system for RL training.

rl academic-project llm grpo grpotrainer

Updated Jun 23, 2025
Python

yflyzhang / simpleR1

simpleR1: A Simple Framework for Training R1-like Models

reinforcement-learning ppo trl deepseek-r1 grpo r1-zero grpotrainer reinforcement-learning-with-verifiable-rewards

Updated Aug 12, 2025
Python

sdiehl / tiny-r1

Recreating the minimal training methods of DeepSeek-R1 for small language models.

reasoning r1 grpo grpotrainer

Updated Feb 10, 2025
Python

teilomillet / jiki

interface mcp model-context-protocol mcp-client grpo grpotrainer

Updated May 8, 2025
Python

SparkSupernova / NovaLiveSystem-Showcase-PUBLIC

Public showcase of NovaLiveSystem: a biomimetic cognitive architecture with interoception and distributed intelligence.

computer-science ai computer-vision ml developer-tools adaptive-learning autonomous-agents cognitive-systems human-centered-ai sota-technique ai-research-and-development grpotrainer biomimetic-ai

Updated Mar 5, 2026

phrugsa-limbunlom / vlm-grpo

Post-training VLMs with GRPO from the TRL library

grpo grpotrainer

Updated Oct 28, 2025
Python

RohanThawait / qwen2.5-7b-math-reasoning-grpo

This project implements the reasoning training pipeline introduced in the DeepSeek-R1 paper, applying Group Relative Policy Optimization (GRPO) to teach a 7B language model to reason step-by-step through mathematical problems.

llm supervised-finetuning finetuning-llms qwen2-5 grpotrainer

Updated May 13, 2026
Python
