Screencoder Train: MLLM SFT + RL Training Stack

This repo combines supervised fine-tuning (SFT) via LLaMA-Factory and reinforcement learning (GRPO) via VLM-R1, plus a minimal vLLM serving setup.

Contents

  • LLaMA-Factory/: general SFT for LLMs/VLMs, plus evaluation
  • VLM-R1/: RL training (GRPO) for VLM tasks like REC/OVD/Math
  • vllm/: minimal scripts for OpenAI-style API with vLLM
  • conda_envs/: example conda env YAMLs and optional env archives
  • scripts/: convenience wrappers

Quickstart

  1. Optional bootstrap
bash scripts/bootstrap_envs.sh         # installs rl, vllm, data, sft
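To confirm the expected environments were created, a quick check (env names taken from the comment above):
conda env list                         # should include rl, vllm, data, sft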
  2. Environments
# Option A: create envs from the YAMLs (recommended; adjust as needed)
conda env create -f conda_envs/sft_env.yml
conda env create -f conda_envs/rl_env.yml
# Optional: a separate env for vLLM
conda create -n vllm python=3.10 -y
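If you go the manual route for vLLM, the package itself still needs installing; a minimal sketch (the repo's serving scripts may expect a specific version, so check before pinning):
conda activate vllm
pip install vllm                       # add a version pin if the serving scripts require one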
  3. Data
  • SFT: see LLaMA-Factory/data/get_data.sh and LLaMA-Factory/data/dataset_info.json.
  • RL: follow dataset links in VLM-R1/README.md (COCO, RefCOCO/+/g, LISA-Grounding, etc.).
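For example, the SFT data can be fetched with the provided script (what it actually downloads is defined inside the script; running it from the data directory is assumed here):
cd LLaMA-Factory/data && bash get_data.sh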
  4. Train
# SFT (LoRA example)
conda activate sft
bash scripts/run_sft.sh LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml

# RL (REC GRPO example)
conda activate rl
bash scripts/run_rl.sh VLM-R1/run_scripts/run_grpo_rec.sh
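The scripts/ entries are convenience wrappers; if you prefer to call LLaMA-Factory directly for SFT, the upstream CLI takes the same YAML (a sketch, assuming LLaMA-Factory is installed in the sft env):
conda activate sft
llamafactory-cli train LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml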
  5. Inference (OpenAI-style API via vLLM)
conda activate vllm
bash scripts/run_vllm.sh LLaMA-Factory/examples/inference/llama3_vllm.yaml 8000
  • Test clients live in LLaMA-Factory/scripts/api_example/; a minimal curl check is sketched below.
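As a quick smoke test without those clients, hit the OpenAI-compatible endpoint with curl (a sketch; the model name must match whatever the served YAML configures, and is only a placeholder here):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'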

Notes

  • Upstream project docs remain authoritative:
    • LLaMA-Factory: see its README.md and examples/ for many model/task recipes.
    • VLM-R1: see its README.md and run_scripts/ for GRPO variants, multi-node training, and LoRA.
  • Large assets (data, checkpoints, env tar parts) are ignored via .gitignore by default.

License

  • Each subproject keeps its own license. Follow model/checkpoint licenses accordingly.

Acknowledgements