diff --git a/post-training/.gitattributes b/post-training/.gitattributes
new file mode 100644
index 0000000..fdeae08
--- /dev/null
+++ b/post-training/.gitattributes
@@ -0,0 +1,3 @@
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+rl.tar.gz* filter=lfs diff=lfs merge=lfs -text
+vllm.tar.gz* filter=lfs diff=lfs merge=lfs -text
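The attributes above route the model weights and the split `rl.tar.gz*` / `vllm.tar.gz*` archives through Git LFS. As a rough sketch of how such entries are typically produced and verified — the `.part.*` suffix mirrors the `*.tar.gz.part.*` pattern used elsewhere in this tree and is otherwise an assumption:

```bash
# Enable Git LFS for this clone.
git lfs install

# Tracking a pattern appends the matching filter line to .gitattributes.
git lfs track "*.safetensors" "rl.tar.gz*" "vllm.tar.gz*"

# Check that a given file is picked up by the LFS filter (part name is hypothetical).
git check-attr filter -- rl.tar.gz.part.aa

# Reassemble and unpack a split archive after checkout.
cat rl.tar.gz.part.* > rl.tar.gz
tar -xzf rl.tar.gz
```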
diff --git a/post-training/.gitignore b/post-training/.gitignore
new file mode 100644
index 0000000..27f2bb0
--- /dev/null
+++ b/post-training/.gitignore
@@ -0,0 +1,56 @@
+# OS / IDE
+.DS_Store
+.idea/
+.vscode/
+*.swp
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.ipynb_checkpoints/
+.venv/
+venv/
+.env
+.env.*
+
+# Logs & outputs
+logs/
+output/
+outputs/
+runs/
+wandb/
+*.log
+
+# Data & checkpoints (large files)
+data/
+checkpoints/
+experiments/
+*.ckpt
+*.safetensors
+*.pt
+*.bin
+
+# HuggingFace / caches
+hf_cache/
+ms_cache/
+om_cache/
+**/.cache/
+
+# Conda env archives
+conda_envs/*.tar.gz
+conda_envs/*.tar.gz.part.*
+
+# LLaMA-Factory artifacts
+LLaMA-Factory/output/
+LLaMA-Factory/saves/
+LLaMA-Factory/.cache/
+
+# VLM-R1 artifacts
+VLM-R1/output/
+VLM-R1/.cache/
+
+# vLLM
+vllm/*.json
+vllm/*.log
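Because broad directory rules (`data/`, `checkpoints/`) sit alongside extension rules (`*.safetensors`, `*.pt`), it is not always obvious which pattern excludes a given file. A small sketch using placeholder paths — `git check-ignore -v` reports the file, line, and pattern that matched:

```bash
# Show which ignore rule (file:line:pattern) matches each path.
git check-ignore -v data/example.bin checkpoints/step_100.safetensors wandb/run.log

# List files present in the working tree but currently ignored.
git status --ignored --short
```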
diff --git a/post-training/LLaMA-Factory/.dockerignore b/post-training/LLaMA-Factory/.dockerignore
new file mode 100644
index 0000000..bc56ab8
--- /dev/null
+++ b/post-training/LLaMA-Factory/.dockerignore
@@ -0,0 +1,15 @@
+.vscode
+.git
+.github
+.venv
+cache
+data
+docker
+saves
+hf_cache
+ms_cache
+om_cache
+output
+.dockerignore
+.gitattributes
+.gitignore
diff --git a/post-training/LLaMA-Factory/.gitattributes b/post-training/LLaMA-Factory/.gitattributes
new file mode 100644
index 0000000..dfe0770
--- /dev/null
+++ b/post-training/LLaMA-Factory/.gitattributes
@@ -0,0 +1,2 @@
+# Auto detect text files and perform LF normalization
+* text=auto
diff --git a/post-training/LLaMA-Factory/.github/CODE_OF_CONDUCT.md b/post-training/LLaMA-Factory/.github/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..c2035ce
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/CODE_OF_CONDUCT.md
@@ -0,0 +1,128 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+ and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+ overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+ advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+ address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+ professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+`hoshihiyouga AT gmail DOT com`.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.
diff --git a/post-training/LLaMA-Factory/.github/CONTRIBUTING.md b/post-training/LLaMA-Factory/.github/CONTRIBUTING.md
new file mode 100644
index 0000000..507d666
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/CONTRIBUTING.md
@@ -0,0 +1,67 @@
+# Contributing to LLaMA Factory
+
+Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.
+
+It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.
+
+However you choose to contribute, please be mindful and respect our [code of conduct](CODE_OF_CONDUCT.md).
+
+**This guide was heavily inspired by [transformers guide to contributing](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md).**
+
+## Ways to contribute
+
+There are several ways you can contribute to LLaMA Factory:
+
+* Fix outstanding issues with the existing code.
+* Submit issues related to bugs or desired new features.
+* Contribute to the examples or to the documentation.
+
+### Style guide
+
+LLaMA Factory follows the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html), check it for details.
+
+### Create a Pull Request
+
+1. Fork the [repository](https://github.com/hiyouga/LLaMA-Factory) by clicking on the [Fork](https://github.com/hiyouga/LLaMA-Factory/fork) button on the repository's page. This creates a copy of the code under your GitHub user account.
+
+2. Clone your fork to your local disk, and add the base repository as a remote:
+
+```bash
+git clone git@github.com:[username]/LLaMA-Factory.git
+cd LLaMA-Factory
+git remote add upstream https://github.com/hiyouga/LLaMA-Factory.git
+```
+
+3. Create a new branch to hold your development changes:
+
+```bash
+git checkout -b dev_your_branch
+```
+
+4. Set up a development environment by running the following command in a virtual environment:
+
+```bash
+pip install -e ".[dev]"
+```
+
+If LLaMA Factory was already installed in the virtual environment, remove it with `pip uninstall llamafactory` before reinstalling it in editable mode with the `-e` flag (see the sketch after the last step below).
+
+5. Check code before commit:
+
+```bash
+make commit
+make style && make quality
+make test
+```
+
+6. Submit changes:
+
+```bash
+git add .
+git commit -m "commit message"
+git fetch upstream
+git rebase upstream/main
+git push -u origin dev_your_branch
+```
+
+7. Create a pull request from your branch `dev_your_branch` to the `main` branch of the [upstream repository](https://github.com/hiyouga/LLaMA-Factory).
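The reinstall note in step 4 boils down to the following sequence; a minimal sketch, assuming the activated virtual environment and the repository root as the working directory:

```bash
# Remove any previously installed copy so the editable install takes precedence.
pip uninstall -y llamafactory

# Reinstall in editable mode with the development extras used by the checks in step 5.
pip install -e ".[dev]"

# Confirm the package now resolves to the working tree.
python -c "import llamafactory; print(llamafactory.__file__)"
```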
diff --git a/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/1-bug-report.yml b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/1-bug-report.yml
new file mode 100644
index 0000000..a08596f
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -0,0 +1,61 @@
+name: "\U0001F41B Bug / help"
+description: Create a report to help us improve LLaMA Factory
+labels: ["bug", "pending"]
+body:
+ - type: markdown
+ attributes:
+ value: |
+ Issues included in **[FAQs](https://github.com/hiyouga/LLaMA-Factory/issues/4614)** or those with **insufficient** information may be closed without a response.
+ 已经包含在 **[常见问题](https://github.com/hiyouga/LLaMA-Factory/issues/4614)** 内或提供信息**不完整**的 issues 可能不会被回复。
+
+ - type: markdown
+ attributes:
+ value: |
+        Please do not create issues that are not related to framework bugs under this category; use **[Discussions](https://github.com/hiyouga/LLaMA-Factory/discussions/categories/q-a)** instead.
+ 请勿在此分类下创建和框架 bug 无关的 issues,训练问题求助请使用 **[讨论区](https://github.com/hiyouga/LLaMA-Factory/discussions/categories/q-a)**。
+
+ - type: checkboxes
+ id: reminder
+ attributes:
+ label: Reminder
+ description: |
+ Please ensure you have read the above rules carefully and searched the existing issues (including FAQs).
+ 请确保您已经认真阅读了上述规则并且搜索过现有的 issues(包括常见问题)。
+
+ options:
+ - label: I have read the above rules and searched the existing issues.
+ required: true
+
+ - type: textarea
+ id: system-info
+ validations:
+ required: true
+ attributes:
+ label: System Info
+ description: |
+ Please share your system info with us. You can run the command **llamafactory-cli env** and copy-paste its output below.
+ 请提供您的系统信息。您可以在命令行运行 **llamafactory-cli env** 并将其输出复制到该文本框中。
+
+ placeholder: llamafactory version, platform, python version, ...
+
+ - type: textarea
+ id: reproduction
+ validations:
+ required: true
+ attributes:
+ label: Reproduction
+ description: |
+        Please provide the entry arguments, error messages, and stack traces that reproduce the problem.
+ 请提供入口参数,错误日志以及异常堆栈以便于我们复现问题。
+
+ value: |
+ ```text
+ Put your message here.
+ ```
+
+ - type: textarea
+ id: others
+ validations:
+ required: false
+ attributes:
+ label: Others
diff --git a/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/2-feature-request.yml b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/2-feature-request.yml
new file mode 100644
index 0000000..5d72271
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -0,0 +1,41 @@
+name: "\U0001F680 Feature request"
+description: Submit a request for a new feature
+labels: ["enhancement", "pending"]
+body:
+ - type: markdown
+ attributes:
+ value: |
+ Please do not create issues that are not related to new features under this category.
+ 请勿在此分类下创建和新特性无关的 issues。
+
+ - type: checkboxes
+ id: reminder
+ attributes:
+ label: Reminder
+ description: |
+ Please ensure you have read the above rules carefully and searched the existing issues.
+ 请确保您已经认真阅读了上述规则并且搜索过现有的 issues。
+
+ options:
+ - label: I have read the above rules and searched the existing issues.
+ required: true
+
+ - type: textarea
+ id: description
+ validations:
+ required: true
+ attributes:
+ label: Description
+ description: |
+ A clear and concise description of the feature proposal.
+ 请详细描述您希望加入的新功能特性。
+
+ - type: textarea
+ id: contribution
+ validations:
+ required: false
+ attributes:
+ label: Pull Request
+ description: |
+ Have you already created the relevant PR and submitted the code?
+ 您是否已经创建了相关 PR 并提交了代码?
diff --git a/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/config.yml b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000..3ba13e0
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: false
diff --git a/post-training/LLaMA-Factory/.github/PULL_REQUEST_TEMPLATE.md b/post-training/LLaMA-Factory/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..d23d6be
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,8 @@
+# What does this PR do?
+
+Fixes # (issue)
+
+## Before submitting
+
+- [ ] Did you read the [contributor guideline](https://github.com/hiyouga/LLaMA-Factory/blob/main/.github/CONTRIBUTING.md)?
+- [ ] Did you write any new necessary tests?
diff --git a/post-training/LLaMA-Factory/.github/SECURITY.md b/post-training/LLaMA-Factory/.github/SECURITY.md
new file mode 100644
index 0000000..d34728e
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/SECURITY.md
@@ -0,0 +1,7 @@
+# Reporting Security Issues
+
+To report a security issue, please use the GitHub Security Advisory ["Report a Vulnerability"](https://github.com/hiyouga/LLaMA-Factory/security/advisories/new) tab.
+
+We will send a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
+
+Report security bugs in third-party modules to the person or team maintaining the module.
diff --git a/post-training/LLaMA-Factory/.github/workflows/label_issue.yml b/post-training/LLaMA-Factory/.github/workflows/label_issue.yml
new file mode 100644
index 0000000..b7469f6
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/workflows/label_issue.yml
@@ -0,0 +1,32 @@
+name: label_issue
+
+on:
+ issues:
+ types:
+ - opened
+
+jobs:
+ label_issue:
+ runs-on: ubuntu-latest
+
+ permissions:
+ issues: write
+
+ steps:
+ - env:
+ GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ ISSUE_URL: ${{ github.event.issue.html_url }}
+ ISSUE_TITLE: ${{ github.event.issue.title }}
+ run: |
+ LABEL=""
+ NPU_KEYWORDS=(npu huawei ascend 华为 昇腾)
+        ISSUE_TITLE_LOWER=$(echo "$ISSUE_TITLE" | tr '[:upper:]' '[:lower:]')
+ for KEYWORD in ${NPU_KEYWORDS[@]}; do
+ if [[ $ISSUE_TITLE_LOWER == *$KEYWORD* ]] && [[ $ISSUE_TITLE_LOWER != *input* ]]; then
+ LABEL="npu"
+ break
+ fi
+ done
+ if [ -n "$LABEL" ]; then
+ gh issue edit $ISSUE_URL --add-label $LABEL
+ fi
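The labeling step is plain bash, so the keyword matching can be exercised locally before relying on the workflow. A dry-run sketch that mirrors the script above; the issue title is made up for illustration:

```bash
#!/usr/bin/env bash
# Dry-run of the keyword matching used by the label_issue workflow.
ISSUE_TITLE="Training hangs on Ascend 910B"   # hypothetical title

LABEL=""
NPU_KEYWORDS=(npu huawei ascend 华为 昇腾)
ISSUE_TITLE_LOWER=$(echo "$ISSUE_TITLE" | tr '[:upper:]' '[:lower:]')
for KEYWORD in "${NPU_KEYWORDS[@]}"; do
  if [[ $ISSUE_TITLE_LOWER == *$KEYWORD* ]] && [[ $ISSUE_TITLE_LOWER != *input* ]]; then
    LABEL="npu"
    break
  fi
done
echo "label to apply: ${LABEL:-<none>}"
```

Only the final `gh issue edit` call in the workflow needs the `GH_TOKEN` and a real issue URL; everything before it is side-effect free.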
diff --git a/post-training/LLaMA-Factory/.github/workflows/publish.yml b/post-training/LLaMA-Factory/.github/workflows/publish.yml
new file mode 100644
index 0000000..c3f729a
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/workflows/publish.yml
@@ -0,0 +1,36 @@
+name: publish
+
+on:
+ workflow_dispatch:
+ release:
+ types:
+ - published
+
+jobs:
+ publish:
+ name: Upload release to PyPI
+
+ runs-on: ubuntu-latest
+
+ environment:
+ name: release
+ url: https://pypi.org/p/llamafactory
+
+ permissions:
+ id-token: write
+
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Set up Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.9"
+
+ - name: Build package
+ run: |
+ make build
+
+ - name: Publish package
+ uses: pypa/gh-action-pypi-publish@release/v1
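The release job publishes via PyPI trusted publishing (`id-token: write`), so only the build step is reproducible locally. A sketch of a pre-release check; `twine` is not part of this repository's own tooling and is an added assumption:

```bash
# Same build entry point the workflow invokes via `make build`.
pip3 install build && python3 -m build

# Optional: validate the resulting sdist/wheel metadata before tagging a release.
pip3 install twine && twine check dist/*
```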
diff --git a/post-training/LLaMA-Factory/.github/workflows/tests.yml b/post-training/LLaMA-Factory/.github/workflows/tests.yml
new file mode 100644
index 0000000..84920f4
--- /dev/null
+++ b/post-training/LLaMA-Factory/.github/workflows/tests.yml
@@ -0,0 +1,99 @@
+name: tests
+
+on:
+ workflow_dispatch:
+ push:
+ branches:
+ - "main"
+ paths:
+ - "**.py"
+ - "requirements.txt"
+ - ".github/workflows/*.yml"
+ pull_request:
+ branches:
+ - "main"
+ paths:
+ - "**.py"
+ - "requirements.txt"
+ - ".github/workflows/*.yml"
+
+jobs:
+ tests:
+ strategy:
+ fail-fast: false
+ matrix:
+ python:
+ - "3.9"
+ - "3.10"
+ - "3.11"
+ - "3.12"
+ os:
+ - "ubuntu-latest"
+ - "windows-latest"
+ - "macos-13"
+ transformers:
+ - null
+ include: # test backward compatibility
+ - python: "3.9"
+ os: "ubuntu-latest"
+ transformers: "4.45.0"
+ - python: "3.9"
+ os: "ubuntu-latest"
+ transformers: "4.49.0"
+
+ runs-on: ${{ matrix.os }}
+
+ concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python }}-${{ matrix.transformers }}
+ cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+ env:
+ HF_TOKEN: ${{ secrets.HF_TOKEN }}
+ OS_NAME: ${{ matrix.os }}
+
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Set up Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python }}
+ cache: "pip"
+ cache-dependency-path: "**/requirements*.txt"
+
+ - name: Install dependencies
+ run: |
+ python -m pip install --upgrade pip
+ python -m pip install ".[torch,dev]"
+
+ - name: Install transformers
+ if: ${{ matrix.transformers }}
+ run: |
+ python -m pip install "transformers==${{ matrix.transformers }}"
+
+ - name: Cache files
+ id: hf-hub-cache
+ uses: actions/cache@v4
+ with:
+ path: ${{ runner.temp }}/huggingface
+ key: huggingface-${{ matrix.os }}-${{ matrix.python }}-${{ matrix.transformers }}-${{ hashFiles('tests/version.txt') }}
+
+ - name: Check quality
+ run: |
+ make style && make quality
+
+ - name: Check license
+ run: |
+ make license
+
+ - name: Check build
+ run: |
+ make build
+
+ - name: Test with pytest
+ run: |
+ make test
+ env:
+ HF_HOME: ${{ runner.temp }}/huggingface
+ HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"
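The matrix run reduces to a handful of `make` targets plus a Hugging Face cache rooted in the runner's temp directory. A local approximation, assuming the repository root; the cache path is a placeholder:

```bash
# Install the package with the same extras as the CI matrix.
python -m pip install --upgrade pip
python -m pip install ".[torch,dev]"

# Mirror the workflow's checks: formatting/lint, license headers, packaging, tests.
make style && make quality
make license
make build

# The workflow points HF_HOME at runner.temp; any disposable directory works locally.
HF_HOME=/tmp/huggingface make test
```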
diff --git a/post-training/LLaMA-Factory/.gitignore b/post-training/LLaMA-Factory/.gitignore
new file mode 100644
index 0000000..8397a78
--- /dev/null
+++ b/post-training/LLaMA-Factory/.gitignore
@@ -0,0 +1,177 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+
+# vscode
+.vscode/
+
+# uv
+uv.lock
+
+# custom .gitignore
+ms_cache/
+hf_cache/
+om_cache/
+cache/
+config/
+output/
+wandb/
+swanlog/
+generated_predictions.jsonl
diff --git a/post-training/LLaMA-Factory/.pre-commit-config.yaml b/post-training/LLaMA-Factory/.pre-commit-config.yaml
new file mode 100644
index 0000000..cbe361e
--- /dev/null
+++ b/post-training/LLaMA-Factory/.pre-commit-config.yaml
@@ -0,0 +1,28 @@
+repos:
+- repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v5.0.0
+ hooks:
+ - id: check-ast
+ - id: check-added-large-files
+ args: ['--maxkb=25000']
+ - id: check-merge-conflict
+ - id: check-yaml
+ - id: debug-statements
+ - id: end-of-file-fixer
+ - id: trailing-whitespace
+ args: [--markdown-linebreak-ext=md]
+ - id: no-commit-to-branch
+ args: ['--branch', 'main']
+
+- repo: https://github.com/asottile/pyupgrade
+ rev: v3.17.0
+ hooks:
+ - id: pyupgrade
+ args: [--py38-plus]
+
+- repo: https://github.com/astral-sh/ruff-pre-commit
+ rev: v0.6.9
+ hooks:
+ - id: ruff
+ args: [--fix]
+ - id: ruff-format
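These hooks are wired into the `commit` target of the Makefile added later in this change; running them directly is a two-command usage sketch with nothing repository-specific:

```bash
# Install the hooks for this clone, then run every configured hook across the tree.
pre-commit install
pre-commit run --all-files
```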
diff --git a/post-training/LLaMA-Factory/CITATION.cff b/post-training/LLaMA-Factory/CITATION.cff
new file mode 100644
index 0000000..01b4c9f
--- /dev/null
+++ b/post-training/LLaMA-Factory/CITATION.cff
@@ -0,0 +1,44 @@
+cff-version: 1.2.0
+date-released: 2024-03
+message: "If you use this software, please cite it as below."
+authors:
+- family-names: "Zheng"
+ given-names: "Yaowei"
+- family-names: "Zhang"
+ given-names: "Richong"
+- family-names: "Zhang"
+ given-names: "Junhao"
+- family-names: "Ye"
+ given-names: "Yanhan"
+- family-names: "Luo"
+ given-names: "Zheyan"
+- family-names: "Feng"
+ given-names: "Zhangchi"
+- family-names: "Ma"
+ given-names: "Yongqiang"
+title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
+url: "https://arxiv.org/abs/2403.13372"
+preferred-citation:
+ type: conference-paper
+ conference:
+ name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
+ authors:
+ - family-names: "Zheng"
+ given-names: "Yaowei"
+ - family-names: "Zhang"
+ given-names: "Richong"
+ - family-names: "Zhang"
+ given-names: "Junhao"
+ - family-names: "Ye"
+ given-names: "Yanhan"
+ - family-names: "Luo"
+ given-names: "Zheyan"
+ - family-names: "Feng"
+ given-names: "Zhangchi"
+ - family-names: "Ma"
+ given-names: "Yongqiang"
+ title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
+ url: "https://arxiv.org/abs/2403.13372"
+ year: 2024
+ publisher: "Association for Computational Linguistics"
+ address: "Bangkok, Thailand"
diff --git a/post-training/LLaMA-Factory/LICENSE b/post-training/LLaMA-Factory/LICENSE
new file mode 100644
index 0000000..b09cd78
--- /dev/null
+++ b/post-training/LLaMA-Factory/LICENSE
@@ -0,0 +1,201 @@
+Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/post-training/LLaMA-Factory/MANIFEST.in b/post-training/LLaMA-Factory/MANIFEST.in
new file mode 100644
index 0000000..82c51f6
--- /dev/null
+++ b/post-training/LLaMA-Factory/MANIFEST.in
@@ -0,0 +1 @@
+include LICENSE requirements.txt
diff --git a/post-training/LLaMA-Factory/Makefile b/post-training/LLaMA-Factory/Makefile
new file mode 100644
index 0000000..2dcb7ca
--- /dev/null
+++ b/post-training/LLaMA-Factory/Makefile
@@ -0,0 +1,24 @@
+.PHONY: build commit license quality style test
+
+check_dirs := scripts src tests setup.py
+
+build:
+ pip3 install build && python3 -m build
+
+commit:
+ pre-commit install
+ pre-commit run --all-files
+
+license:
+ python3 tests/check_license.py $(check_dirs)
+
+quality:
+ ruff check $(check_dirs)
+ ruff format --check $(check_dirs)
+
+style:
+ ruff check $(check_dirs) --fix
+ ruff format $(check_dirs)
+
+test:
+ CUDA_VISIBLE_DEVICES= WANDB_DISABLED=true pytest -vv tests/
diff --git a/post-training/LLaMA-Factory/README.md b/post-training/LLaMA-Factory/README.md
new file mode 100644
index 0000000..05fdd97
--- /dev/null
+++ b/post-training/LLaMA-Factory/README.md
@@ -0,0 +1,915 @@
+
+
+[](https://github.com/hiyouga/LLaMA-Factory/stargazers)
+[](https://github.com/hiyouga/LLaMA-Factory/commits/main)
+[](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
+[](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
+[](https://pypi.org/project/llamafactory/)
+[](https://scholar.google.com/scholar?cites=12620864006390196564)
+[](https://github.com/hiyouga/LLaMA-Factory/pulls)
+
+[](https://twitter.com/llamafactory_ai)
+[](https://discord.gg/rKfvV9r9FK)
+[](https://gitcode.com/zhengyaowei/LLaMA-Factory)
+
+[](https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing)
+[](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory)
+[](https://huggingface.co/spaces/hiyouga/LLaMA-Board)
+[](https://modelscope.cn/studios/hiyouga/LLaMA-Board)
+[](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)
+
+
+ Easily fine-tune 100+ large language models with zero-code CLI and Web UI
+
+
+
+
+
+
+
+👋 Join our [WeChat](assets/wechat.jpg) or [NPU user group](assets/wechat_npu.jpg).
+
+\[ English | [中文](README_zh.md) \]
+
+**Fine-tuning a large language model can be as easy as...**
+
+https://github.com/user-attachments/assets/3991a3a8-4276-4d30-9cab-4cb0c4b9b99e
+
+Choose your path:
+
+- **Documentation**: https://llamafactory.readthedocs.io/en/latest/
+- **Colab (free)**: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
+- **Local machine**: Please refer to [usage](#getting-started)
+- **PAI-DSW (free trial)**: [Llama3 Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill Example](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
+- **Amazon SageMaker**: [Blog](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)
+- **Easy Dataset**: [Fine-tune on Synthetic Data](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g)
+
+> [!NOTE]
+> All websites other than the links above are unauthorized third-party websites. Please use them with caution.
+
+## Table of Contents
+
+- [Features](#features)
+- [Benchmark](#benchmark)
+- [Changelog](#changelog)
+- [Supported Models](#supported-models)
+- [Supported Training Approaches](#supported-training-approaches)
+- [Provided Datasets](#provided-datasets)
+- [Requirement](#requirement)
+- [Getting Started](#getting-started)
+ - [Installation](#installation)
+ - [Data Preparation](#data-preparation)
+ - [Quickstart](#quickstart)
+ - [Fine-Tuning with LLaMA Board GUI](#fine-tuning-with-llama-board-gui-powered-by-gradio)
+ - [Build Docker](#build-docker)
+ - [Deploy with OpenAI-style API and vLLM](#deploy-with-openai-style-api-and-vllm)
+ - [Download from ModelScope Hub](#download-from-modelscope-hub)
+ - [Download from Modelers Hub](#download-from-modelers-hub)
+ - [Use W&B Logger](#use-wb-logger)
+ - [Use SwanLab Logger](#use-swanlab-logger)
+- [Projects using LLaMA Factory](#projects-using-llama-factory)
+- [License](#license)
+- [Citation](#citation)
+- [Acknowledgement](#acknowledgement)
+
+## Features
+
+- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
+- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
+- **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
+- **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
+- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), RoPE scaling, NEFTune and rsLoRA.
+- **Wide tasks**: Multi-turn dialogue, tool using, image understanding, visual grounding, video recognition, audio understanding, etc.
+- **Experiment monitors**: LlamaBoard, TensorBoard, Wandb, MLflow, [SwanLab](https://github.com/SwanHubX/SwanLab), etc.
+- **Faster inference**: OpenAI-style API, Gradio UI and CLI with [vLLM worker](https://github.com/vllm-project/vllm) or [SGLang worker](https://github.com/sgl-project/sglang).
+
+### Day-N Support for Fine-Tuning Cutting-Edge Models
+
+| Support Date | Model Name |
+| ------------ | ------------------------------------------------------------ |
+| Day 0 | Qwen2.5 / Qwen2.5-VL / Gemma 3 / InternLM 3 / MiniCPM-o-2.6 |
+| Day 1 | Llama 3 / GLM-4 / Mistral Small / PaliGemma2 / Llama 4 |
+
+## Benchmark
+
+Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning), LLaMA Factory's LoRA tuning offers up to **3.7 times faster** training speed and a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization, LLaMA Factory's QLoRA further improves GPU memory efficiency.
+
+
+
+Definitions
+
+- **Training Speed**: the number of training samples processed per second during the training. (bs=4, cutoff_len=1024)
+- **Rouge Score**: Rouge-2 score on the development set of the [advertising text generation](https://aclanthology.org/D19-1321.pdf) task. (bs=4, cutoff_len=1024)
+- **GPU Memory**: Peak GPU memory usage in 4-bit quantized training. (bs=1, cutoff_len=1024)
+- We adopt `pre_seq_len=128` for ChatGLM's P-Tuning and `lora_rank=32` for LLaMA Factory's LoRA tuning.
+
+
+
+## Changelog
+
+[25/04/16] We supported fine-tuning the **[InternVL3](https://huggingface.co/OpenGVLab/InternVL3-8B)** model. See [PR #7258](https://github.com/hiyouga/LLaMA-Factory/pull/7258) to get started.
+
+[25/04/14] We supported fine-tuning the **[GLM-Z1](https://huggingface.co/THUDM/GLM-Z1-9B-0414)** and **[Kimi-VL](https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct)** models.
+
+[25/04/06] We supported fine-tuning the **[Llama 4](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)** model. See [PR #7611](https://github.com/hiyouga/LLaMA-Factory/pull/7611) to get started.
+
+[25/03/31] We supported fine-tuning the **[Qwen2.5 Omni](https://qwenlm.github.io/blog/qwen2.5-omni/)** model. See [PR #7537](https://github.com/hiyouga/LLaMA-Factory/pull/7537) to get started.
+
+Full Changelog
+
+[25/03/15] We supported **[SGLang](https://github.com/sgl-project/sglang)** as inference backend. Try `infer_backend: sglang` to accelerate inference.
+
+[25/03/12] We supported fine-tuning the **[Gemma 3](https://huggingface.co/blog/gemma3)** model.
+
+[25/02/24] Announcing **[EasyR1](https://github.com/hiyouga/EasyR1)**, a scalable, multi-modality RL training framework for efficient GRPO training.
+
+[25/02/11] We supported saving the **[Ollama](https://github.com/ollama/ollama)** modelfile when exporting the model checkpoints. See [examples](examples/README.md) for usage.
+
+[25/02/05] We supported fine-tuning the **[Qwen2-Audio](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)** and **[MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6)** models on audio understanding tasks.
+
+[25/01/31] We supported fine-tuning the **[DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)** and **[Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)** models.
+
+[25/01/15] We supported **[APOLLO](https://arxiv.org/abs/2412.05270)** optimizer. See [examples](examples/README.md) for usage.
+
+[25/01/14] We supported fine-tuning the **[MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6)** and **[MiniCPM-V-2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)** models. Thank [@BUAADreamer](https://github.com/BUAADreamer)'s PR.
+
+[25/01/14] We supported fine-tuning the **[InternLM 3](https://huggingface.co/collections/internlm/)** models. Thank [@hhaAndroid](https://github.com/hhaAndroid)'s PR.
+
+[25/01/10] We supported fine-tuning the **[Phi-4](https://huggingface.co/microsoft/phi-4)** model.
+
+[24/12/21] We supported using **[SwanLab](https://github.com/SwanHubX/SwanLab)** for experiment tracking and visualization. See [this section](#use-swanlab-logger) for details.
+
+[24/11/27] We supported fine-tuning the **[Skywork-o1](https://huggingface.co/Skywork/Skywork-o1-Open-Llama-3.1-8B)** model and the **[OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)** dataset.
+
+[24/10/09] We supported downloading pre-trained models and datasets from the **[Modelers Hub](https://modelers.cn/models)**. See [this tutorial](#download-from-modelers-hub) for usage.
+
+[24/09/19] We supported fine-tuning the **[Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/)** models.
+
+[24/08/30] We supported fine-tuning the **[Qwen2-VL](https://qwenlm.github.io/blog/qwen2-vl/)** models. Thank [@simonJJJ](https://github.com/simonJJJ)'s PR.
+
+[24/08/27] We supported **[Liger Kernel](https://github.com/linkedin/Liger-Kernel)**. Try `enable_liger_kernel: true` for efficient training.
+
+[24/08/09] We supported **[Adam-mini](https://github.com/zyushun/Adam-mini)** optimizer. See [examples](examples/README.md) for usage. Thank [@relic-yuexi](https://github.com/relic-yuexi)'s PR.
+
+[24/07/04] We supported [contamination-free packed training](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing). Use `neat_packing: true` to activate it. Thank [@chuan298](https://github.com/chuan298)'s PR.
+
+[24/06/16] We supported **[PiSSA](https://arxiv.org/abs/2404.02948)** algorithm. See [examples](examples/README.md) for usage.
+
+[24/06/07] We supported fine-tuning the **[Qwen2](https://qwenlm.github.io/blog/qwen2/)** and **[GLM-4](https://github.com/THUDM/GLM-4)** models.
+
+[24/05/26] We supported **[SimPO](https://arxiv.org/abs/2405.14734)** algorithm for preference learning. See [examples](examples/README.md) for usage.
+
+[24/05/20] We supported fine-tuning the **PaliGemma** series models. Note that the PaliGemma models are pre-trained models; you need to fine-tune them with the `paligemma` template for chat completion.
+
+[24/05/18] We supported **[KTO](https://arxiv.org/abs/2402.01306)** algorithm for preference learning. See [examples](examples/README.md) for usage.
+
+[24/05/14] We supported training and inference on the Ascend NPU devices. Check [installation](#installation) section for details.
+
+[24/04/26] We supported fine-tuning the **LLaVA-1.5** multimodal LLMs. See [examples](examples/README.md) for usage.
+
+[24/04/22] We provided a **[Colab notebook](https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing)** for fine-tuning the Llama-3 model on a free T4 GPU. Two Llama-3-derived models fine-tuned using LLaMA Factory are available on Hugging Face; check [Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat) and [Llama3-Chinese](https://huggingface.co/zhichen/Llama3-Chinese) for details.
+
+[24/04/21] We supported **[Mixture-of-Depths](https://arxiv.org/abs/2404.02258)** according to [AstraMindAI's implementation](https://github.com/astramind-ai/Mixture-of-depths). See [examples](examples/README.md) for usage.
+
+[24/04/16] We supported **[BAdam](https://arxiv.org/abs/2404.02827)** optimizer. See [examples](examples/README.md) for usage.
+
+[24/04/16] We supported **[unsloth](https://github.com/unslothai/unsloth)**'s long-sequence training (Llama-2-7B-56k within 24GB). It achieves **117%** of the speed with **50%** of the memory compared with FlashAttention-2; more benchmarks can be found on [this page](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison).
+
+[24/03/31] We supported **[ORPO](https://arxiv.org/abs/2403.07691)**. See [examples](examples/README.md) for usage.
+
+[24/03/21] Our paper "[LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models](https://arxiv.org/abs/2403.13372)" is available at arXiv!
+
+[24/03/20] We supported **FSDP+QLoRA** that fine-tunes a 70B model on 2x24GB GPUs. See [examples](examples/README.md) for usage.
+
+[24/03/13] We supported **[LoRA+](https://arxiv.org/abs/2402.12354)**. See [examples](examples/README.md) for usage.
+
+[24/03/07] We supported **[GaLore](https://arxiv.org/abs/2403.03507)** optimizer. See [examples](examples/README.md) for usage.
+
+[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `infer_backend: vllm` to enjoy **270%** inference speed.
+
+[24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `use_dora: true` to activate DoRA training.
+
+[24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See [examples](examples/README.md) for usage.
+
+[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this [blog post](https://qwenlm.github.io/blog/qwen1.5/) for details.
+
+[24/01/18] We supported **agent tuning** for most models, equipping models with tool-using abilities by fine-tuning with `dataset: glaive_toolcall_en`.
+
+[23/12/23] We supported **[unsloth](https://github.com/unslothai/unsloth)**'s implementation to boost LoRA tuning for the LLaMA, Mistral and Yi models. Try the `use_unsloth: true` argument to activate the unsloth patch. It achieves **170%** speed in our benchmark; check [this page](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison) for details.
+
+[23/12/12] We supported fine-tuning the latest MoE model **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)** in our framework. See hardware requirement [here](#hardware-requirement).
+
+[23/12/01] We supported downloading pre-trained models and datasets from the **[ModelScope Hub](https://modelscope.cn/models)**. See [this tutorial](#download-from-modelscope-hub) for usage.
+
+[23/10/21] We supported **[NEFTune](https://arxiv.org/abs/2310.05914)** trick for fine-tuning. Try `neftune_noise_alpha: 5` argument to activate NEFTune.
+
+[23/09/27] We supported **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA) for the LLaMA models. Try `shift_attn: true` argument to enable shift short attention.
+
+[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See [examples](examples/README.md) for usage.
+
+[23/09/10] We supported **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)**. Try `flash_attn: fa2` argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
+
+[23/08/12] We supported **RoPE scaling** to extend the context length of the LLaMA models. Try `rope_scaling: linear` argument in training and `rope_scaling: dynamic` argument at inference to extrapolate the position embeddings.
+
+[23/08/11] We supported **[DPO training](https://arxiv.org/abs/2305.18290)** for instruction-tuned models. See [examples](examples/README.md) for usage.
+
+[23/07/31] We supported **dataset streaming**. Try `streaming: true` and `max_steps: 10000` arguments to load your dataset in streaming mode.
+
+[23/07/29] We released two instruction-tuned 13B models at Hugging Face. See these Hugging Face Repos ([LLaMA-2](https://huggingface.co/hiyouga/Llama-2-Chinese-13b-chat) / [Baichuan](https://huggingface.co/hiyouga/Baichuan-13B-sft)) for details.
+
+[23/07/18] We developed an **all-in-one Web UI** for training, evaluation and inference. Try `train_web.py` to fine-tune models in your Web browser. Thank [@KanadeSiina](https://github.com/KanadeSiina) and [@codemayq](https://github.com/codemayq) for their efforts in the development.
+
+[23/07/09] We released **[FastEdit](https://github.com/hiyouga/FastEdit)** ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow [FastEdit](https://github.com/hiyouga/FastEdit) if you are interested.
+
+[23/06/29] We provided a **reproducible example** of training a chat model using instruction-following datasets, see [Baichuan-7B-sft](https://huggingface.co/hiyouga/Baichuan-7B-sft) for details.
+
+[23/06/22] We aligned the [demo API](src/api_demo.py) with [OpenAI's](https://platform.openai.com/docs/api-reference/chat) format so that you can plug the fine-tuned model into **arbitrary ChatGPT-based applications**.
+
+[23/06/03] We supported quantized training and inference (aka **[QLoRA](https://github.com/artidoro/qlora)**). See [examples](examples/README.md) for usage.
+
+
+
+## Supported Models
+
+| Model | Model size | Template |
+| ----------------------------------------------------------------- | -------------------------------- | ------------------- |
+| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
+| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
+| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
+| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
+| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseek3 |
+| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
+| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma |
+| [Gemma 3](https://huggingface.co/google) | 1B/4B/12B/27B | gemma3/gemma (1B) |
+| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/THUDM) | 9B/32B | glm4 |
+| [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
+| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
+| [Hunyuan](https://huggingface.co/tencent/) | 7B | hunyuan |
+| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
+| [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
+| [InternVL 2.5-3](https://huggingface.co/OpenGVLab)\*\* | 1B/2B/4B/8B/9B/14B/26B/38B/78B | intern_vl |
+| [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
+| [Llama](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - |
+| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
+| [Llama 3-3.3](https://huggingface.co/meta-llama) | 1B/3B/8B/70B | llama3 |
+| [Llama 4](https://huggingface.co/meta-llama) | 109B/402B | llama4 |
+| [Llama 3.2 Vision](https://huggingface.co/meta-llama) | 11B/90B | mllama |
+| [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
+| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
+| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
+| [MiniCPM](https://huggingface.co/openbmb) | 1B/2B/4B | cpm/cpm3 |
+| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
+| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
+| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
+| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
+| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
+| [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
+| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
+| [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
+| [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
+| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
+| [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
+| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
+| [Qwen2.5-Omni](https://huggingface.co/Qwen)\*\* | 7B | qwen2_omni |
+| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
+| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
+| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
+| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
+| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
+| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
+| [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
+
+> [!NOTE]
+> For the "base" models, the `template` argument can be chosen from `default`, `alpaca`, `vicuna`, etc., but make sure to use the **corresponding template** for the "instruct/chat" models.
+>
+> Remember to use the **SAME** template in training and inference.
+>
+> \*: You should install `transformers` from the main branch and use `DISABLE_VERSION_CHECK=1` to skip the version check.
+>
+> \*\*: You need to install a specific version of `transformers` to use the corresponding model.
+
+Please refer to [constants.py](src/llamafactory/extras/constants.py) for a full list of models we support.
+
+You can also add a custom chat template to [template.py](src/llamafactory/data/template.py).
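+
+For example, here is a minimal sketch of keeping the template consistent between fine-tuning and chatting. The `template=llama3` overrides are purely illustrative (the shipped example YAML files already set this value) and assume the CLI accepts `key=value` overrides after the YAML path; otherwise, set `template: llama3` in both YAML files instead.
+
+```bash
+# Illustrative only: the SAME template must be used for training and inference.
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml template=llama3
+llamafactory-cli chat examples/inference/llama3_lora_sft.yaml template=llama3
+```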
+
+## Supported Training Approaches
+
+| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
+| ---------------------- | ------------------ | ------------------ | ------------------ | ------------------ |
+| Pre-Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| Reward Modeling | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| PPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| DPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| KTO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+
+> [!TIP]
+> The implementation details of PPO can be found in [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
+
+## Provided Datasets
+
+Pre-training datasets
+
+- [Wiki Demo (en)](data/wiki_demo.txt)
+- [RefinedWeb (en)](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
+- [RedPajama V2 (en)](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2)
+- [Wikipedia (en)](https://huggingface.co/datasets/olm/olm-wikipedia-20221220)
+- [Wikipedia (zh)](https://huggingface.co/datasets/pleisto/wikipedia-cn-20230720-filtered)
+- [Pile (en)](https://huggingface.co/datasets/EleutherAI/pile)
+- [SkyPile (zh)](https://huggingface.co/datasets/Skywork/SkyPile-150B)
+- [FineWeb (en)](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
+- [FineWeb-Edu (en)](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
+- [The Stack (en)](https://huggingface.co/datasets/bigcode/the-stack)
+- [StarCoder (en)](https://huggingface.co/datasets/bigcode/starcoderdata)
+
+
+
+Supervised fine-tuning datasets
+
+- [Identity (en&zh)](data/identity.json)
+- [Stanford Alpaca (en)](https://github.com/tatsu-lab/stanford_alpaca)
+- [Stanford Alpaca (zh)](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3)
+- [Alpaca GPT4 (en&zh)](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
+- [Glaive Function Calling V2 (en&zh)](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)
+- [LIMA (en)](https://huggingface.co/datasets/GAIR/lima)
+- [Guanaco Dataset (multilingual)](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset)
+- [BELLE 2M (zh)](https://huggingface.co/datasets/BelleGroup/train_2M_CN)
+- [BELLE 1M (zh)](https://huggingface.co/datasets/BelleGroup/train_1M_CN)
+- [BELLE 0.5M (zh)](https://huggingface.co/datasets/BelleGroup/train_0.5M_CN)
+- [BELLE Dialogue 0.4M (zh)](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)
+- [BELLE School Math 0.25M (zh)](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)
+- [BELLE Multiturn Chat 0.8M (zh)](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M)
+- [UltraChat (en)](https://github.com/thunlp/UltraChat)
+- [OpenPlatypus (en)](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)
+- [CodeAlpaca 20k (en)](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
+- [Alpaca CoT (multilingual)](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT)
+- [OpenOrca (en)](https://huggingface.co/datasets/Open-Orca/OpenOrca)
+- [SlimOrca (en)](https://huggingface.co/datasets/Open-Orca/SlimOrca)
+- [MathInstruct (en)](https://huggingface.co/datasets/TIGER-Lab/MathInstruct)
+- [Firefly 1.1M (zh)](https://huggingface.co/datasets/YeungNLP/firefly-train-1.1M)
+- [Wiki QA (en)](https://huggingface.co/datasets/wiki_qa)
+- [Web QA (zh)](https://huggingface.co/datasets/suolyer/webqa)
+- [WebNovel (zh)](https://huggingface.co/datasets/zxbsmk/webnovel_cn)
+- [Nectar (en)](https://huggingface.co/datasets/berkeley-nest/Nectar)
+- [deepctrl (en&zh)](https://www.modelscope.cn/datasets/deepctrl/deepctrl-sft-data)
+- [Advertise Generating (zh)](https://huggingface.co/datasets/HasturOfficial/adgen)
+- [ShareGPT Hyperfiltered (en)](https://huggingface.co/datasets/totally-not-an-llm/sharegpt-hyperfiltered-3k)
+- [ShareGPT4 (en&zh)](https://huggingface.co/datasets/shibing624/sharegpt_gpt4)
+- [UltraChat 200k (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+- [AgentInstruct (en)](https://huggingface.co/datasets/THUDM/AgentInstruct)
+- [LMSYS Chat 1M (en)](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
+- [Evol Instruct V2 (en)](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)
+- [Cosmopedia (en)](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
+- [STEM (zh)](https://huggingface.co/datasets/hfl/stem_zh_instruction)
+- [Ruozhiba (zh)](https://huggingface.co/datasets/hfl/ruozhiba_gpt4_turbo)
+- [Neo-sft (zh)](https://huggingface.co/datasets/m-a-p/neo_sft_phase2)
+- [Magpie-Pro-300K-Filtered (en)](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
+- [Magpie-ultra-v0.1 (en)](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1)
+- [WebInstructSub (en)](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub)
+- [OpenO1-SFT (en&zh)](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
+- [Open-Thoughts (en)](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)
+- [Open-R1-Math (en)](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
+- [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
+- [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
+- [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
+- [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
+- [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
+- [OpenSchnabeltier (de)](https://huggingface.co/datasets/mayflowergmbh/openschnabeltier_de)
+- [Evol Instruct (de)](https://huggingface.co/datasets/mayflowergmbh/evol-instruct_de)
+- [Dolphin (de)](https://huggingface.co/datasets/mayflowergmbh/dolphin_de)
+- [Booksum (de)](https://huggingface.co/datasets/mayflowergmbh/booksum_de)
+- [Airoboros (de)](https://huggingface.co/datasets/mayflowergmbh/airoboros-3.0_de)
+- [Ultrachat (de)](https://huggingface.co/datasets/mayflowergmbh/ultra-chat_de)
+
+
+
+Preference datasets
+
+- [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k)
+- [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
+- [COIG-P (en&zh)](https://huggingface.co/datasets/m-a-p/COIG-P)
+- [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset)
+- [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback)
+- [Orca DPO Pairs (en)](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
+- [HH-RLHF (en)](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [Nectar (en)](https://huggingface.co/datasets/berkeley-nest/Nectar)
+- [Orca DPO (de)](https://huggingface.co/datasets/mayflowergmbh/intel_orca_dpo_pairs_de)
+- [KTO mixed (en)](https://huggingface.co/datasets/argilla/kto-mix-15k)
+
+
+
+Some datasets require approval before use, so we recommend logging in to your Hugging Face account using the commands below.
+
+```bash
+pip install --upgrade huggingface_hub
+huggingface-cli login
+```
+
+## Requirement
+
+| Mandatory    | Minimum | Recommended |
+| ------------ | ------- | --------- |
+| python | 3.9 | 3.10 |
+| torch | 2.0.0 | 2.6.0 |
+| transformers | 4.45.0 | 4.50.0 |
+| datasets | 2.16.0 | 3.2.0 |
+| accelerate | 0.34.0 | 1.2.1 |
+| peft | 0.14.0 | 0.15.1 |
+| trl | 0.8.6 | 0.9.6 |
+
+| Optional     | Minimum | Recommended |
+| ------------ | ------- | --------- |
+| CUDA | 11.6 | 12.2 |
+| deepspeed | 0.10.0 | 0.16.4 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.3 | 0.8.2 |
+| flash-attn | 2.5.6 | 2.7.2 |
+
+### Hardware Requirement
+
+\* *estimated*
+
+| Method | Bits | 7B | 14B | 30B | 70B | `x`B |
+| ------------------------------- | ---- | ----- | ----- | ----- | ------ | ------- |
+| Full (`bf16` or `fp16`) | 32 | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
+| Full (`pure_bf16`) | 16 | 60GB | 120GB | 300GB | 600GB | `8x`GB |
+| Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | `2x`GB |
+| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | `x`GB |
+| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | `x/2`GB |
+| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | `x/4`GB |
+
+## Getting Started
+
+### Installation
+
+> [!IMPORTANT]
+> Installation is mandatory.
+
+```bash
+git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
+cd LLaMA-Factory
+pip install -e ".[torch,metrics]"
+```
+
+Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, modelscope, openmind, swanlab, quality
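+
+For example, to pull in several optional backends at once (illustrative only; install just the extras your hardware and workflow actually need):
+
+```bash
+# A sketch: base package plus a few of the optional extras listed above
+pip install -e ".[torch,metrics,deepspeed,bitsandbytes,vllm]"
+```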
+
+> [!TIP]
+> Use `pip install --no-deps -e .` to resolve package conflicts.
+
+Setting up a virtual environment with uv
+
+Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):
+
+```bash
+uv sync --extra torch --extra metrics --prerelease=allow
+```
+
+Run LLaMA-Factory in the isolated environment:
+
+```bash
+uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+```
+
+
+
+For Windows users
+
+#### Install BitsAndBytes
+
+If you want to enable quantized LoRA (QLoRA) on the Windows platform, you need to install a pre-built version of the `bitsandbytes` library that supports CUDA 11.1 to 12.2. Please select the appropriate [release version](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels) based on your CUDA version.
+
+```bash
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
+```
+
+#### Install Flash Attention-2
+
+To enable FlashAttention-2 on the Windows platform, please use the script from [flash-attention-windows-wheel](https://huggingface.co/lldacing/flash-attention-windows-wheel) to compile and install it yourself.
+
+
+
+For Ascend NPU users
+
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+
+```bash
+# replace the url according to your CANN version and devices
+# install CANN Toolkit
+wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C20SPC702/Ascend-cann-toolkit_8.0.0.alpha002_linux-"$(uname -i)".run
+bash Ascend-cann-toolkit_8.0.0.alpha002_linux-"$(uname -i)".run --install
+
+# install CANN Kernels
+wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C20SPC702/Ascend-cann-kernels-910b_8.0.0.alpha002_linux-"$(uname -i)".run
+bash Ascend-cann-kernels-910b_8.0.0.alpha002_linux-"$(uname -i)".run --install
+
+# set env variables
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+```
+
+| Requirement  | Minimum | Recommended    |
+| ------------ | ------- | -------------- |
+| CANN | 8.0.RC1 | 8.0.0.alpha002 |
+| torch | 2.1.0 | 2.4.0 |
+| torch-npu | 2.1.0 | 2.4.0.post2 |
+| deepspeed | 0.13.2 | 0.13.2 |
+| vllm-ascend | - | 0.7.3 |
+
+Remember to use `ASCEND_RT_VISIBLE_DEVICES` instead of `CUDA_VISIBLE_DEVICES` to specify the device to use.
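+
+For example (a minimal sketch reusing the quickstart LoRA config; adjust the device index and config path as needed):
+
+```bash
+# Select NPU 0 for a LoRA SFT run
+ASCEND_RT_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```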
+
+If you cannot run inference on NPU devices, try setting `do_sample: false` in the configuration.
+
+Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
+
+#### Install BitsAndBytes
+
+To use QLoRA based on bitsandbytes on Ascend NPU, please follow these 3 steps:
+
+1. Manually compile bitsandbytes: Refer to [the installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to complete the compilation and installation. The compilation requires a cmake version of at least 3.22.1 and a g++ version of at least 12.x.
+
+```bash
+# Install bitsandbytes from source
+# Clone the bitsandbytes repo; the Ascend NPU backend currently lives on the multi-backend-refactor branch
+git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
+cd bitsandbytes/
+
+# Install dependencies
+pip install -r requirements-dev.txt
+
+# Install the compilation toolchain. The exact commands vary by operating system; the following are for reference
+apt-get install -y build-essential cmake
+
+# Compile & install
+cmake -DCOMPUTE_BACKEND=npu -S .
+make
+pip install .
+```
+
+2. Install transformers from the main branch.
+
+```bash
+git clone -b main https://github.com/huggingface/transformers.git
+cd transformers
+pip install .
+```
+
+3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
+
+
+
+### Data Preparation
+
+Please refer to [data/README.md](data/README.md) for details about the format of dataset files. You can use datasets from the Hugging Face / ModelScope / Modelers hubs, load datasets from local disk, or specify a path to S3/GCS cloud storage.
+
+> [!NOTE]
+> Please update `data/dataset_info.json` to use your custom dataset.
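+
+As a rough sketch (assuming an alpaca-format file placed at `data/my_dataset.json`; the dataset name and file name here are hypothetical, and [data/README.md](data/README.md) documents the authoritative schema), the entry to merge into `data/dataset_info.json` could be as small as:
+
+```bash
+# Illustrative only: print a minimal dataset_info.json entry for a local alpaca-format file
+cat <<'EOF'
+"my_dataset": {
+  "file_name": "my_dataset.json"
+}
+EOF
+```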
+
+You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)** to create synthetic data for fine-tuning.
+
+### Quickstart
+
+Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Llama3-8B-Instruct model, respectively.
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+See [examples/README.md](examples/README.md) for advanced usage (including distributed training).
+
+> [!TIP]
+> Use `llamafactory-cli help` to show help information.
+>
+> Read [FAQs](https://github.com/hiyouga/LLaMA-Factory/issues/4614) first if you encounter any problems.
+
+### Fine-Tuning with LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
+
+```bash
+llamafactory-cli webui
+```
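+
+Common launch variants, as a hedged sketch (these are standard Gradio environment variables; adjust to your setup):
+
+```bash
+# Illustrative launches of the LLaMA Board GUI
+GRADIO_SERVER_PORT=7860 llamafactory-cli webui   # choose the listening port
+GRADIO_SHARE=1 llamafactory-cli webui            # create a temporary public share link
+```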
+
+### Build Docker
+
+For CUDA users:
+
+```bash
+cd docker/docker-cuda/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+For Ascend NPU users:
+
+```bash
+cd docker/docker-npu/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+For AMD ROCm users:
+
+```bash
+cd docker/docker-rocm/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+Build without Docker Compose
+
+For CUDA users:
+
+```bash
+docker build -f ./docker/docker-cuda/Dockerfile \
+ --build-arg INSTALL_BNB=false \
+ --build-arg INSTALL_VLLM=false \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg INSTALL_FLASHATTN=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+docker run -dit --gpus=all \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+For Ascend NPU users:
+
+```bash
+# Choose the docker image according to your environment
+docker build -f ./docker/docker-npu/Dockerfile \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+# Change `device` according to your resources
+docker run -dit \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --device /dev/davinci0 \
+ --device /dev/davinci_manager \
+ --device /dev/devmm_svm \
+ --device /dev/hisi_hdc \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+For AMD ROCm users:
+
+```bash
+docker build -f ./docker/docker-rocm/Dockerfile \
+ --build-arg INSTALL_BNB=false \
+ --build-arg INSTALL_VLLM=false \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg INSTALL_FLASHATTN=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+docker run -dit \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -v ./saves:/app/saves \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --device /dev/kfd \
+ --device /dev/dri \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+
+
+Details about volume
+
+- `hf_cache`: Reuse the Hugging Face cache on the host machine. It can be remapped if a cache already exists in a different directory.
+- `ms_cache`: Similar to the Hugging Face cache but for ModelScope users.
+- `om_cache`: Similar to the Hugging Face cache but for Modelers users.
+- `data`: Place datasets in this directory on the host machine so that they can be selected in the LLaMA Board GUI.
+- `output`: Set the export directory to this location so that the merged result can be accessed directly on the host machine.
+
+
+
+### Deploy with OpenAI-style API and vLLM
+
+```bash
+API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
+```
+
+> [!TIP]
+> Visit [this page](https://platform.openai.com/docs/api-reference/chat/create) for the API documentation.
+>
+> Examples: [Image understanding](scripts/api_example/test_image.py) | [Function calling](scripts/api_example/test_toolcall.py)
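+
+Since the endpoint follows the OpenAI chat-completions format, a quick smoke test can look like the sketch below (the model name, port, and payload are illustrative and depend on your configuration):
+
+```bash
+# Illustrative request against the OpenAI-compatible endpoint started above
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
+```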
+
+### Download from ModelScope Hub
+
+If you have trouble downloading models and datasets from Hugging Face, you can use ModelScope instead.
+
+```bash
+export USE_MODELSCOPE_HUB=1 # `set USE_MODELSCOPE_HUB=1` for Windows
+```
+
+Train the model by specifying a model ID of the ModelScope Hub as the `model_name_or_path`. You can find a full list of model IDs at [ModelScope Hub](https://modelscope.cn/models), e.g., `LLM-Research/Meta-Llama-3-8B-Instruct`.
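+
+Putting it together, a minimal sketch (assuming the CLI accepts `key=value` overrides after the YAML path; otherwise set `model_name_or_path` in the YAML file):
+
+```bash
+# Illustrative: fetch the model from ModelScope and reuse the quickstart LoRA config
+export USE_MODELSCOPE_HUB=1
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
+  model_name_or_path=LLM-Research/Meta-Llama-3-8B-Instruct
+```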
+
+### Download from Modelers Hub
+
+You can also use Modelers Hub to download models and datasets.
+
+```bash
+export USE_OPENMIND_HUB=1 # `set USE_OPENMIND_HUB=1` for Windows
+```
+
+Train the model by specifying a model ID of the Modelers Hub as the `model_name_or_path`. You can find a full list of model IDs at [Modelers Hub](https://modelers.cn/models), e.g., `TeleAI/TeleChat-7B-pt`.
+
+### Use W&B Logger
+
+To use [Weights & Biases](https://wandb.ai) for logging experimental results, you need to add the following arguments to your YAML files.
+
+```yaml
+report_to: wandb
+run_name: test_run # optional
+```
+
+Set `WANDB_API_KEY` to [your key](https://wandb.ai/authorize) when launching training tasks to log in with your W&B account.
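+
+For example (a sketch; replace the placeholder with your actual key):
+
+```bash
+# Illustrative: provide the W&B key via the environment when launching a run
+WANDB_API_KEY=<your_api_key> llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```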
+
+### Use SwanLab Logger
+
+To use [SwanLab](https://github.com/SwanHubX/SwanLab) for logging experimental results, you need to add the following arguments to your YAML files.
+
+```yaml
+use_swanlab: true
+swanlab_run_name: test_run # optional
+```
+
+When launching training tasks, you can log in to SwanLab in three ways:
+
+1. Add the `swanlab_api_key` argument to the YAML file and set it to your [API key](https://swanlab.cn/settings).
+2. Set the environment variable `SWANLAB_API_KEY` to your [API key](https://swanlab.cn/settings), as sketched below.
+3. Use the `swanlab login` command to complete the login.
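+
+For instance, the second option can be a one-liner (a sketch; replace the placeholder with your actual key):
+
+```bash
+# Illustrative: log in to SwanLab via the environment variable when launching training
+SWANLAB_API_KEY=<your_api_key> llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```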
+
+## Projects using LLaMA Factory
+
+If you have a project that should be incorporated, please contact us via email or create a pull request.
+
+
+1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
+1. Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [[arxiv]](https://arxiv.org/abs/2308.10092)
+1. Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [[arxiv]](https://arxiv.org/abs/2308.10526)
+1. Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [[arxiv]](https://arxiv.org/abs/2311.07816)
+1. Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [[arxiv]](https://arxiv.org/abs/2312.15710)
+1. Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. KDD 2024. [[arxiv]](https://arxiv.org/abs/2401.04319)
+1. Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. ACL 2024. [[arxiv]](https://arxiv.org/abs/2401.07286)
+1. Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [[arxiv]](https://arxiv.org/abs/2402.05904)
+1. Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [[arxiv]](https://arxiv.org/abs/2402.07625)
+1. Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11176)
+1. Yang et al. LaCo: Large Language Model Pruning via Layer Collapse. 2024. [[arxiv]](https://arxiv.org/abs/2402.11187)
+1. Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [[arxiv]](https://arxiv.org/abs/2402.11746)
+1. Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11801)
+1. Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL 2024 Findings. [[arxiv]](https://arxiv.org/abs/2402.11809)
+1. Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11819)
+1. Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [[arxiv]](https://arxiv.org/abs/2402.12204)
+1. Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.14714)
+1. Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [[arxiv]](https://arxiv.org/abs/2402.15043)
+1. Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [[arxiv]](https://arxiv.org/abs/2403.02333)
+1. Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [[arxiv]](https://arxiv.org/abs/2403.03419)
+1. Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [[arxiv]](https://arxiv.org/abs/2403.08228)
+1. Wu et al. Large Language Models are Parallel Multilingual Learners. 2024. [[arxiv]](https://arxiv.org/abs/2403.09073)
+1. Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [[arxiv]](https://arxiv.org/abs/2403.14541)
+1. Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [[arxiv]](https://arxiv.org/abs/2403.15246)
+1. Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. COLING 2024. [[arxiv]](https://arxiv.org/abs/2403.16008)
+1. Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [[arxiv]](https://arxiv.org/abs/2403.16443)
+1. Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [[arxiv]](https://arxiv.org/abs/2404.00604)
+1. Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.02827)
+1. Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2404.04167)
+1. Ma et al. Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation. ICML 2024. [[arxiv]](https://arxiv.org/abs/2404.04316)
+1. Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.07084)
+1. Shang et al. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.09836)
+1. Huang et al. LLMTune: Accelerate Database Knob Tuning with Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.11581)
+1. Deng et al. Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction. 2024. [[arxiv]](https://arxiv.org/abs/2404.14215)
+1. Acikgoz et al. Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. 2024. [[arxiv]](https://arxiv.org/abs/2404.16621)
+1. Zhang et al. Small Language Models Need Strong Verifiers to Self-Correct Reasoning. ACL 2024 Findings. [[arxiv]](https://arxiv.org/abs/2404.17140)
+1. Zhou et al. FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering. NAACL 2024. [[arxiv]](https://arxiv.org/abs/2404.18585)
+1. Xu et al. Large Language Models for Cyber Security: A Systematic Literature Review. 2024. [[arxiv]](https://arxiv.org/abs/2405.04760)
+1. Dammu et al. "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations. 2024. [[arxiv]](https://arxiv.org/abs/2405.05378)
+1. Yi et al. A safety realignment framework via subspace-oriented model fusion for large language models. 2024. [[arxiv]](https://arxiv.org/abs/2405.09055)
+1. Lou et al. SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling. 2024. [[arxiv]](https://arxiv.org/abs/2405.12739)
+1. Zhang et al. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners. 2024. [[arxiv]](https://arxiv.org/abs/2405.13816)
+1. Zhang et al. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2405.20215)
+1. Zihong Chen. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. 2024. [[paper]](https://aclanthology.org/2024.lt4hala-1.30)
+1. Gao et al. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2406.00380)
+1. Wang and Song. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. 2024. [[arxiv]](https://arxiv.org/abs/2406.02106)
+1. Hu et al. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models. 2024. [[arxiv]](https://arxiv.org/abs/2406.03136)
+1. Ge et al. Time Sensitive Knowledge Editing through Efficient Finetuning. ACL 2024. [[arxiv]](https://arxiv.org/abs/2406.04496)
+1. Tan et al. Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions. 2024. [[arxiv]](https://arxiv.org/abs/2406.05688)
+1. Song et al. Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters. 2024. [[arxiv]](https://arxiv.org/abs/2406.05955)
+1. Gu et al. RWKV-CLIP: A Robust Vision-Language Representation Learner. 2024. [[arxiv]](https://arxiv.org/abs/2406.06973)
+1. Chen et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. 2024. [[arxiv]](https://arxiv.org/abs/2406.07115)
+1. Zhu et al. Are Large Language Models Good Statisticians?. 2024. [[arxiv]](https://arxiv.org/abs/2406.07815)
+1. Li et al. Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning. 2024. [[arxiv]](https://arxiv.org/abs/2406.10099)
+1. Ding et al. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce. 2024. [[arxiv]](https://arxiv.org/abs/2406.10173)
+1. He et al. COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities. 2024. [[arxiv]](https://arxiv.org/abs/2406.12074)
+1. Lin et al. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving. 2024. [[arxiv]](https://arxiv.org/abs/2406.14408)
+1. Treutlein et al. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. 2024. [[arxiv]](https://arxiv.org/abs/2406.14546)
+1. Feng et al. SS-Bench: A Benchmark for Social Story Generation and Evaluation. 2024. [[arxiv]](https://arxiv.org/abs/2406.15695)
+1. Feng et al. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. 2024. [[arxiv]](https://arxiv.org/abs/2406.17233)
+1. Liu et al. Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals. 2024. [[arxiv]](https://arxiv.org/abs/2406.18069)
+1. Iyer et al. Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. AmericasNLP 2024. [[paper]](https://aclanthology.org/2024.americasnlp-1.25)
+1. Li et al. Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring. 2024. [[arxiv]](https://arxiv.org/abs/2406.19949)
+1. Yang et al. Financial Knowledge Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2407.00365)
+1. Lin et al. DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. 2024. [[arxiv]](https://arxiv.org/abs/2407.01470)
+1. Bako et al. Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization. 2024. [[arxiv]](https://arxiv.org/abs/2407.06129)
+1. Huang et al. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. 2024. [[arxiv]](https://arxiv.org/abs/2407.08044)
+1. Jiang et al. LLM-Collaboration on Automatic Science Journalism for the General Audience. 2024. [[arxiv]](https://arxiv.org/abs/2407.09756)
+1. Inouye et al. Applied Auto-tuning on LoRA Hyperparameters. 2024. [[paper]](https://scholarcommons.scu.edu/cseng_senior/272/)
+1. Qi et al. Research on Tibetan Tourism Viewpoints information generation system based on LLM. 2024. [[arxiv]](https://arxiv.org/abs/2407.13561)
+1. Xu et al. Course-Correction: Safety Alignment Using Synthetic Preferences. 2024. [[arxiv]](https://arxiv.org/abs/2407.16637)
+1. Sun et al. LAMBDA: A Large Model Based Data Agent. 2024. [[arxiv]](https://arxiv.org/abs/2407.17535)
+1. Zhu et al. CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare. 2024. [[arxiv]](https://arxiv.org/abs/2407.19705)
+1. Yu et al. Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. 2024. [[arxiv]](https://arxiv.org/abs/2408.00137)
+1. Xie et al. The Power of Personalized Datasets: Advancing Chinese Composition Writing for Elementary School through Targeted Model Fine-Tuning. IALP 2024. [[paper]](https://www.asianlp.sg/conferences/ialp2024/proceedings/papers/IALP2024_P055.pdf)
+1. Liu et al. Instruct-Code-Llama: Improving Capabilities of Language Model in Competition Level Code Generation by Online Judge Feedback. ICIC 2024. [[paper]](https://link.springer.com/chapter/10.1007/978-981-97-5669-8_11)
+1. Wang et al. Cybernetic Sentinels: Unveiling the Impact of Safety Data Selection on Model Security in Supervised Fine-Tuning. ICIC 2024. [[paper]](https://link.springer.com/chapter/10.1007/978-981-97-5669-8_23)
+1. Xia et al. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. 2024. [[arxiv]](https://arxiv.org/abs/2408.04693)
+1. Zeng et al. Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions. 2024. [[arxiv]](https://arxiv.org/abs/2408.04168)
+1. Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [[paper]](https://aclanthology.org/2024.finnlp-2.1/)
+1. Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [[arxiv]](https://arxiv.org/abs/2408.08072)
+1. Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [[paper]](https://dl.acm.org/doi/10.1145/3627673.3679611)
+1. **[StarWhisper](https://github.com/Yu-Yang-Li/StarWhisper)**: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
+1. **[DISC-LawLLM](https://github.com/FudanDISC/DISC-LawLLM)**: A large language model specialized in the Chinese legal domain, based on Baichuan-13B, capable of retrieving and reasoning over legal knowledge.
+1. **[Sunsimiao](https://github.com/X-D-Lab/Sunsimiao)**: A large language model specialized in the Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
+1. **[CareGPT](https://github.com/WangRongsheng/CareGPT)**: A series of large language models for the Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
+1. **[MachineMindset](https://github.com/PKU-YuanGroup/Machine-Mindset/)**: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.
+1. **[Luminia-13B-v3](https://huggingface.co/Nekochu/Luminia-13B-v3)**: A large language model specialized in generating metadata for Stable Diffusion. [[demo]](https://huggingface.co/spaces/Nekochu/Luminia-13B_SD_Prompt)
+1. **[Chinese-LLaVA-Med](https://github.com/BUAADreamer/Chinese-LLaVA-Med)**: A multimodal large language model specialized in the Chinese medical domain, based on LLaVA-1.5-7B.
+1. **[AutoRE](https://github.com/THUDM/AutoRE)**: A document-level relation extraction system based on large language models.
+1. **[NVIDIA RTX AI Toolkit](https://github.com/NVIDIA/RTX-AI-Toolkit)**: SDKs for fine-tuning LLMs on Windows PC for NVIDIA RTX.
+1. **[LazyLLM](https://github.com/LazyAGI/LazyLLM)**: An easy and lazy way to build multi-agent LLM applications, with support for model fine-tuning via LLaMA Factory.
+1. **[RAG-Retrieval](https://github.com/NLPJCL/RAG-Retrieval)**: A full pipeline for RAG retrieval model fine-tuning, inference, and distillation. [[blog]](https://zhuanlan.zhihu.com/p/987727357)
+1. **[360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory)**: A modified library that supports long sequence SFT & DPO using ring attention.
+1. **[Sky-T1](https://novasky-ai.github.io/posts/sky-t1/)**: An o1-like model fine-tuned by NovaSky AI at very low cost.
+
+
+
+## License
+
+This repository is licensed under the [Apache-2.0 License](LICENSE).
+
+Please follow the model licenses to use the corresponding model weights: [Baichuan 2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/blob/main/Community%20License%20for%20Baichuan%202%20Model.pdf) / [BLOOM](https://huggingface.co/spaces/bigscience/license) / [ChatGLM3](https://github.com/THUDM/ChatGLM3/blob/main/MODEL_LICENSE) / [Command R](https://cohere.com/c4ai-cc-by-nc-license) / [DeepSeek](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL) / [Falcon](https://huggingface.co/tiiuae/falcon-180B/blob/main/LICENSE.txt) / [Gemma](https://ai.google.dev/gemma/terms) / [GLM-4](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE) / [GPT-2](https://github.com/openai/gpt-2/blob/master/LICENSE) / [Granite](LICENSE) / [Index](https://huggingface.co/IndexTeam/Index-1.9B/blob/main/LICENSE) / [InternLM](https://github.com/InternLM/InternLM#license) / [Llama](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) / [Llama 2](https://ai.meta.com/llama/license/) / [Llama 3](https://llama.meta.com/llama3/license/) / [Llama 4](https://github.com/meta-llama/llama-models/blob/main/models/llama4/LICENSE) / [MiniCPM](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md) / [Mistral/Mixtral/Pixtral](LICENSE) / [OLMo](LICENSE) / [Phi-1.5/Phi-2](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx) / [Phi-3/Phi-4](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE) / [Qwen](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT) / [Skywork](https://huggingface.co/Skywork/Skywork-13B-base/blob/main/Skywork%20Community%20License.pdf) / [StarCoder 2](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement) / [TeleChat2](https://huggingface.co/Tele-AI/telechat-7B/blob/main/TeleChat%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf) / [Yi](https://huggingface.co/01-ai/Yi-6B/blob/main/LICENSE) / [Yi-1.5](LICENSE) / [Yuan 2](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan)
+
+## Citation
+
+If this work is helpful, please kindly cite as:
+
+```bibtex
+@inproceedings{zheng2024llamafactory,
+ title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
+ author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
+ booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
+ address={Bangkok, Thailand},
+ publisher={Association for Computational Linguistics},
+ year={2024},
+ url={http://arxiv.org/abs/2403.13372}
+}
+```
+
+## Acknowledgement
+
+This repo benefits from [PEFT](https://github.com/huggingface/peft), [TRL](https://github.com/huggingface/trl), [QLoRA](https://github.com/artidoro/qlora) and [FastChat](https://github.com/lm-sys/FastChat). Thanks for their wonderful works.
+
+## Star History
+
+
diff --git a/post-training/LLaMA-Factory/README_zh.md b/post-training/LLaMA-Factory/README_zh.md
new file mode 100644
index 0000000..e563c96
--- /dev/null
+++ b/post-training/LLaMA-Factory/README_zh.md
@@ -0,0 +1,919 @@
+
+
+[](https://github.com/hiyouga/LLaMA-Factory/stargazers)
+[](https://github.com/hiyouga/LLaMA-Factory/commits/main)
+[](https://github.com/hiyouga/LLaMA-Factory/graphs/contributors)
+[](https://github.com/hiyouga/LLaMA-Factory/actions/workflows/tests.yml)
+[](https://pypi.org/project/llamafactory/)
+[](https://scholar.google.com/scholar?cites=12620864006390196564)
+[](https://github.com/hiyouga/LLaMA-Factory/pulls)
+
+[](https://twitter.com/llamafactory_ai)
+[](https://discord.gg/rKfvV9r9FK)
+[](https://gitcode.com/zhengyaowei/LLaMA-Factory)
+
+[](https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing)
+[](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory)
+[](https://huggingface.co/spaces/hiyouga/LLaMA-Board)
+[](https://modelscope.cn/studios/hiyouga/LLaMA-Board)
+[](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)
+
+
+ 使用零代码命令行 与 Web UI 轻松微调百余种大模型
+
+
+
+
+
+
+
+
+👋 加入我们的[微信群](assets/wechat.jpg)或 [NPU 用户群](assets/wechat_npu.jpg)。
+
+\[ [English](README.md) | 中文 \]
+
+**微调大模型可以像这样轻松…**
+
+https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
+
+选择你的打开方式:
+
+- **入门教程**:https://zhuanlan.zhihu.com/p/695287607
+- **框架文档**:https://llamafactory.readthedocs.io/zh-cn/latest/
+- **框架文档(昇腾 NPU)**:https://ascend.github.io/docs/sources/llamafactory/
+- **Colab(免费)**:https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing
+- **本地机器**:请见[如何使用](#如何使用)
+- **PAI-DSW(免费试用)**:[Llama3 案例](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory) | [Qwen2-VL 案例](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_qwen2vl) | [DeepSeek-R1-Distill 案例](https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory_deepseek_r1_distill_7b)
+- **Amazon SageMaker**:[博客](https://aws.amazon.com/cn/blogs/china/a-one-stop-code-free-model-fine-tuning-deployment-platform-based-on-sagemaker-and-llama-factory/)
+- **Easy Dataset**:[数据蒸馏微调](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9)
+
+> [!NOTE]
+> 除上述链接以外的其他网站均为未经许可的第三方网站,请小心甄别。
+
+## 目录
+
+- [项目特色](#项目特色)
+- [性能指标](#性能指标)
+- [更新日志](#更新日志)
+- [模型](#模型)
+- [训练方法](#训练方法)
+- [数据集](#数据集)
+- [软硬件依赖](#软硬件依赖)
+- [如何使用](#如何使用)
+ - [安装 LLaMA Factory](#安装-llama-factory)
+ - [数据准备](#数据准备)
+ - [快速开始](#快速开始)
+ - [LLaMA Board 可视化微调](#llama-board-可视化微调由-gradio-驱动)
+ - [构建 Docker](#构建-docker)
+ - [利用 vLLM 部署 OpenAI API](#利用-vllm-部署-openai-api)
+ - [从魔搭社区下载](#从魔搭社区下载)
+ - [从魔乐社区下载](#从魔乐社区下载)
+ - [使用 W&B 面板](#使用-wb-面板)
+ - [使用 SwanLab 面板](#使用-swanlab-面板)
+- [使用了 LLaMA Factory 的项目](#使用了-llama-factory-的项目)
+- [协议](#协议)
+- [引用](#引用)
+- [致谢](#致谢)
+
+## 项目特色
+
+- **多种模型**:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Qwen2-VL、DeepSeek、Yi、Gemma、ChatGLM、Phi 等等。
+- **集成方法**:(增量)预训练、(多模态)指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
+- **多种精度**:16 比特全参数微调、冻结微调、LoRA 微调和基于 AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ 的 2/3/4/5/6/8 比特 QLoRA 微调。
+- **先进算法**:[GaLore](https://github.com/jiaweizzhao/GaLore)、[BAdam](https://github.com/Ledzy/BAdam)、[APOLLO](https://github.com/zhuhanqing/APOLLO)、[Adam-mini](https://github.com/zyushun/Adam-mini)、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ 和 PiSSA。
+- **实用技巧**:[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)、[Unsloth](https://github.com/unslothai/unsloth)、[Liger Kernel](https://github.com/linkedin/Liger-Kernel)、RoPE scaling、NEFTune 和 rsLoRA。
+- **广泛任务**:多轮对话、工具调用、图像理解、视觉定位、视频识别和语音理解等等。
+- **实验监控**:LlamaBoard、TensorBoard、Wandb、MLflow、[SwanLab](https://github.com/SwanHubX/SwanLab) 等等。
+- **极速推理**:基于 [vLLM](https://github.com/vllm-project/vllm) 或 [SGLang](https://github.com/sgl-project/sglang) 的 OpenAI 风格 API、浏览器界面和命令行接口。
+
+### 最新模型的 Day-N 微调适配
+
+| 适配时间 | 模型名称 |
+| ------------ | ------------------------------------------------------------ |
+| Day 0 | Qwen2.5 / Qwen2.5-VL / Gemma 3 / InternLM 3 / MiniCPM-o-2.6 |
+| Day 1 | Llama 3 / GLM-4 / Mistral Small / PaliGemma2 / Llama 4 |
+
+## 性能指标
+
+与 ChatGLM 官方的 [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning) 微调相比,LLaMA Factory 的 LoRA 微调提供了 **3.7 倍**的加速比,同时在广告文案生成任务上取得了更高的 Rouge 分数。结合 4 比特量化技术,LLaMA Factory 的 QLoRA 微调进一步降低了 GPU 显存消耗。
+
+
+
+变量定义
+
+- **Training Speed**: 训练阶段每秒处理的样本数量。(批处理大小=4,截断长度=1024)
+- **Rouge Score**: [广告文案生成](https://aclanthology.org/D19-1321.pdf)任务验证集上的 Rouge-2 分数。(批处理大小=4,截断长度=1024)
+- **GPU Memory**: 4 比特量化训练的 GPU 显存峰值。(批处理大小=1,截断长度=1024)
+- 我们在 ChatGLM 的 P-Tuning 中采用 `pre_seq_len=128`,在 LLaMA Factory 的 LoRA 微调中采用 `lora_rank=32`。
+
+
+
+## 更新日志
+
+[25/04/16] 我们支持了 **[InternVL3](https://huggingface.co/OpenGVLab/InternVL3-8B)** 模型的微调。查看 [PR #7258](https://github.com/hiyouga/LLaMA-Factory/pull/7258) 以使用。
+
+[25/04/14] 我们支持了 **[GLM-Z1](https://huggingface.co/THUDM/GLM-Z1-9B-0414)** 和 **[Kimi-VL](https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct)** 模型的微调。
+
+[25/04/06] 我们支持了 **[Llama 4](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)** 模型的微调。查看 [PR #7611](https://github.com/hiyouga/LLaMA-Factory/pull/7611) 以使用。
+
+[25/03/31] 我们支持了 **[Qwen2.5 Omni](https://qwenlm.github.io/blog/qwen2.5-omni/)** 模型的微调。查看 [PR #7537](https://github.com/hiyouga/LLaMA-Factory/pull/7537) 以使用。
+
+展开日志
+
+[25/03/15] 我们支持了 **[SGLang](https://github.com/sgl-project/sglang)** 推理后端,请使用 `infer_backend: sglang` 启用。
+
+[25/03/12] 我们支持了 **[Gemma 3](https://huggingface.co/blog/gemma3)** 模型的微调。
+
+[25/02/24] 我们宣布开源 **[EasyR1](https://github.com/hiyouga/EasyR1)**,一个高效可扩展的多模态强化学习框架,支持高效的 GRPO 训练。
+
+[25/02/11] 我们支持了在导出模型时保存 **[Ollama](https://github.com/ollama/ollama)** 配置文件。详细用法请参照 [examples](examples/README_zh.md)。
+
+[25/02/05] 我们支持了在语音理解任务上微调 **[Qwen2-Audio](Qwen/Qwen2-Audio-7B-Instruct)** 和 **[MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6)** 模型。
+
+[25/01/31] 我们支持了 **[DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)** 和 **[Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)** 模型的微调。
+
+[25/01/15] 我们支持了 **[APOLLO](https://arxiv.org/abs/2412.05270)** 优化器。详细用法请参照 [examples](examples/README_zh.md)。
+
+[25/01/14] 我们支持了 **[MiniCPM-o-2.6](https://huggingface.co/openbmb/MiniCPM-o-2_6)** 和 **[MiniCPM-V-2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)** 模型的微调。 感谢 [@BUAADreamer](https://github.com/BUAADreamer) 的 PR.
+
+[25/01/14] 我们支持了 **[InternLM 3](https://huggingface.co/collections/internlm/)** 模型的微调。感谢 [@hhaAndroid](https://github.com/hhaAndroid) 的 PR。
+
+[25/01/10] 我们支持了 **[Phi-4](https://huggingface.co/microsoft/phi-4)** 模型的微调。
+
+[24/12/21] 我们支持了使用 **[SwanLab](https://github.com/SwanHubX/SwanLab)** 跟踪与可视化实验。详细用法请参考 [此部分](#使用-swanlab-面板)。
+
+[24/11/27] 我们支持了 **[Skywork-o1](https://huggingface.co/Skywork/Skywork-o1-Open-Llama-3.1-8B)** 模型的微调和 **[OpenO1](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)** 数据集。
+
+[24/10/09] 我们支持了从 **[魔乐社区](https://modelers.cn/models)** 下载预训练模型和数据集。详细用法请参照 [此教程](#从魔乐社区下载)。
+
+[24/09/19] 我们支持了 **[Qwen2.5](https://qwenlm.github.io/blog/qwen2.5/)** 模型的微调。
+
+[24/08/30] 我们支持了 **[Qwen2-VL](https://qwenlm.github.io/blog/qwen2-vl/)** 模型的微调。感谢 [@simonJJJ](https://github.com/simonJJJ) 的 PR。
+
+[24/08/27] 我们支持了 **[Liger Kernel](https://github.com/linkedin/Liger-Kernel)**。请使用 `enable_liger_kernel: true` 来加速训练。
+
+[24/08/09] 我们支持了 **[Adam-mini](https://github.com/zyushun/Adam-mini)** 优化器。详细用法请参照 [examples](examples/README_zh.md)。感谢 [@relic-yuexi](https://github.com/relic-yuexi) 的 PR。
+
+[24/07/04] 我们支持了[无污染打包训练](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing)。请使用 `neat_packing: true` 参数。感谢 [@chuan298](https://github.com/chuan298) 的 PR。
+
+[24/06/16] 我们支持了 **[PiSSA](https://arxiv.org/abs/2404.02948)** 算法。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/06/07] 我们支持了 **[Qwen2](https://qwenlm.github.io/blog/qwen2/)** 和 **[GLM-4](https://github.com/THUDM/GLM-4)** 模型的微调。
+
+[24/05/26] 我们支持了 **[SimPO](https://arxiv.org/abs/2405.14734)** 偏好对齐算法。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/05/20] 我们支持了 **PaliGemma** 系列模型的微调。注意 PaliGemma 是预训练模型,你需要使用 `paligemma` 模板进行微调使其获得对话能力。
+
+[24/05/18] 我们支持了 **[KTO](https://arxiv.org/abs/2402.01306)** 偏好对齐算法。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/05/14] 我们支持了昇腾 NPU 设备的训练和推理。详情请查阅[安装](#安装-llama-factory)部分。
+
+[24/04/26] 我们支持了多模态模型 **LLaVA-1.5** 的微调。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/04/22] 我们提供了在免费 T4 GPU 上微调 Llama-3 模型的 **[Colab 笔记本](https://colab.research.google.com/drive/1d5KQtbemerlSDSxZIfAaWXhKr30QypiK?usp=sharing)**。Hugging Face 社区公开了两个利用 LLaMA Factory 微调的 Llama-3 模型,详情请见 [Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat) 和 [Llama3-Chinese](https://huggingface.co/zhichen/Llama3-Chinese)。
+
+[24/04/21] 我们基于 [AstraMindAI 的仓库](https://github.com/astramind-ai/Mixture-of-depths)支持了 **[混合深度训练](https://arxiv.org/abs/2404.02258)**。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/04/16] 我们支持了 **[BAdam](https://arxiv.org/abs/2404.02827)** 优化器。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/04/16] 我们支持了 **[unsloth](https://github.com/unslothai/unsloth)** 的长序列训练(24GB 可训练 Llama-2-7B-56k)。该方法相比 FlashAttention-2 提供了 **117%** 的训练速度和 **50%** 的显存节约。更多数据请见[此页面](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison)。
+
+[24/03/31] 我们支持了 **[ORPO](https://arxiv.org/abs/2403.07691)**。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/03/21] 我们的论文 "[LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models](https://arxiv.org/abs/2403.13372)" 可在 arXiv 上查看!
+
+[24/03/20] 我们支持了能在 2x24GB GPU 上微调 70B 模型的 **FSDP+QLoRA**。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/03/13] 我们支持了 **[LoRA+](https://arxiv.org/abs/2402.12354)**。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/03/07] 我们支持了 **[GaLore](https://arxiv.org/abs/2403.03507)** 优化器。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `infer_backend: vllm` 来获得 **270%** 的推理速度。
+
+[24/02/28] 我们支持了 **[DoRA](https://arxiv.org/abs/2402.09353)** 微调。请使用 `use_dora: true` 参数进行 DoRA 微调。
+
+[24/02/15] 我们支持了 [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro) 提出的**块扩展**方法。详细用法请参照 [examples](examples/README_zh.md)。
+
+[24/02/05] Qwen1.5(Qwen2 测试版)系列模型已在 LLaMA-Factory 中实现微调支持。详情请查阅该[博客页面](https://qwenlm.github.io/zh/blog/qwen1.5/)。
+
+[24/01/18] 我们针对绝大多数模型实现了 **Agent 微调**,微调时指定 `dataset: glaive_toolcall_zh` 即可使模型获得工具调用能力。
+
+[23/12/23] 我们针对 LLaMA, Mistral 和 Yi 模型支持了 **[unsloth](https://github.com/unslothai/unsloth)** 的 LoRA 训练加速。请使用 `use_unsloth: true` 参数启用 unsloth 优化。该方法可提供 **170%** 的训练速度,详情请查阅[此页面](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison)。
+
+[23/12/12] 我们支持了微调最新的混合专家模型 **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)**。硬件需求请查阅[此处](#硬件依赖)。
+
+[23/12/01] 我们支持了从 **[魔搭社区](https://modelscope.cn/models)** 下载预训练模型和数据集。详细用法请参照 [此教程](#从魔搭社区下载)。
+
+[23/10/21] 我们支持了 **[NEFTune](https://arxiv.org/abs/2310.05914)** 训练技巧。请使用 `neftune_noise_alpha: 5` 参数启用 NEFTune。
+
+[23/09/27] 我们针对 LLaMA 模型支持了 [LongLoRA](https://github.com/dvlab-research/LongLoRA) 提出的 **$S^2$-Attn**。请使用 `shift_attn: true` 参数以启用该功能。
+
+[23/09/23] 我们在项目中集成了 MMLU、C-Eval 和 CMMLU 评估集。详细用法请参照 [examples](examples/README_zh.md)。
+
+[23/09/10] 我们支持了 **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)**。如果您使用的是 RTX4090、A100 或 H100 GPU,请使用 `flash_attn: fa2` 参数以启用 FlashAttention-2。
+
+[23/08/12] 我们支持了 **RoPE 插值**来扩展 LLaMA 模型的上下文长度。请使用 `rope_scaling: linear` 参数训练模型或使用 `rope_scaling: dynamic` 参数评估模型。
+
+[23/08/11] 我们支持了指令模型的 **[DPO 训练](https://arxiv.org/abs/2305.18290)**。详细用法请参照 [examples](examples/README_zh.md)。
+
+[23/07/31] 我们支持了**数据流式加载**。请使用 `streaming: true` 和 `max_steps: 10000` 参数来流式加载数据集。
+
+[23/07/29] 我们在 Hugging Face 发布了两个 13B 指令微调模型。详细内容请查阅我们的 Hugging Face 项目([LLaMA-2](https://huggingface.co/hiyouga/Llama-2-Chinese-13b-chat) / [Baichuan](https://huggingface.co/hiyouga/Baichuan-13B-sft))。
+
+[23/07/18] 我们开发了支持训练和测试的**浏览器一体化界面**。请使用 `train_web.py` 在您的浏览器中微调模型。感谢 [@KanadeSiina](https://github.com/KanadeSiina) 和 [@codemayq](https://github.com/codemayq) 在该功能开发中付出的努力。
+
+[23/07/09] 我们开源了 **[FastEdit](https://github.com/hiyouga/FastEdit)** ⚡🩹,一个简单易用的、能迅速编辑大模型事实记忆的工具包。如果您感兴趣请关注我们的 [FastEdit](https://github.com/hiyouga/FastEdit) 项目。
+
+[23/06/29] 我们提供了一个**可复现的**指令模型微调示例,详细内容请查阅 [Baichuan-7B-sft](https://huggingface.co/hiyouga/Baichuan-7B-sft)。
+
+[23/06/22] 我们对齐了[示例 API](src/api_demo.py) 与 [OpenAI API](https://platform.openai.com/docs/api-reference/chat) 的格式,您可以将微调模型接入**任意基于 ChatGPT 的应用**中。
+
+[23/06/03] 我们实现了 4 比特的 LoRA 训练(也称 **[QLoRA](https://github.com/artidoro/qlora)**)。详细用法请参照 [examples](examples/README_zh.md)。
+
+
+
+## 模型
+
+| 模型名 | 参数量 | Template |
+| ----------------------------------------------------------------- | -------------------------------- | ------------------- |
+| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
+| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
+| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
+| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
+| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseek3 |
+| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
+| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma |
+| [Gemma 3](https://huggingface.co/google) | 1B/4B/12B/27B | gemma3/gemma (1B) |
+| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/THUDM) | 9B/32B | glm4 |
+| [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
+| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
+| [Hunyuan](https://huggingface.co/tencent/) | 7B | hunyuan |
+| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
+| [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
+| [InternVL 2.5-3](https://huggingface.co/OpenGVLab)\*\* | 1B/2B/4B/8B/9B/14B/26B/38B/78B | intern_vl |
+| [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
+| [Llama](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - |
+| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
+| [Llama 3-3.3](https://huggingface.co/meta-llama) | 1B/3B/8B/70B | llama3 |
+| [Llama 4](https://huggingface.co/meta-llama) | 109B/402B | llama4 |
+| [Llama 3.2 Vision](https://huggingface.co/meta-llama) | 11B/90B | mllama |
+| [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
+| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
+| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
+| [MiniCPM](https://huggingface.co/openbmb) | 1B/2B/4B | cpm/cpm3 |
+| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
+| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
+| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
+| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
+| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
+| [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
+| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
+| [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
+| [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
+| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
+| [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
+| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
+| [Qwen2.5-Omni](https://huggingface.co/Qwen)\*\* | 7B | qwen2_omni |
+| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
+| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
+| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
+| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
+| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
+| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
+| [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
+
+> [!NOTE]
+> For all "base" models, the `template` argument can be any value such as `default`, `alpaca`, or `vicuna`. For "instruct/chat" models, however, be sure to use the **corresponding template**.
+>
+> Make sure to use the **exact same** template for training and inference.
+>
+> \*: You need to install `transformers` from the main branch and use `DISABLE_VERSION_CHECK=1` to skip the version check.
+>
+> \*\*: You need to install a specific version of `transformers` to use the corresponding model.
+
+Please refer to [constants.py](src/llamafactory/extras/constants.py) for the full list of models supported by this project.
+
+You can also add your own chat template in [template.py](src/llamafactory/data/template.py).
+
+## Training Approaches
+
+| Approach               | Full-tuning        | Freeze-tuning      | LoRA               | QLoRA              |
+| ---------------------- | ------------------ | ------------------ | ------------------ | ------------------ |
+| Pre-Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| Reward Modeling        | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| PPO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| DPO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| KTO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| ORPO Training          | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+| SimPO Training         | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
+
+> [!TIP]
+> For the implementation details of PPO, please refer to [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
+
+## Datasets
+
+Pre-training datasets
+
+- [Wiki Demo (en)](data/wiki_demo.txt)
+- [RefinedWeb (en)](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
+- [RedPajama V2 (en)](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2)
+- [Wikipedia (en)](https://huggingface.co/datasets/olm/olm-wikipedia-20221220)
+- [Wikipedia (zh)](https://huggingface.co/datasets/pleisto/wikipedia-cn-20230720-filtered)
+- [Pile (en)](https://huggingface.co/datasets/EleutherAI/pile)
+- [SkyPile (zh)](https://huggingface.co/datasets/Skywork/SkyPile-150B)
+- [FineWeb (en)](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
+- [FineWeb-Edu (en)](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
+- [The Stack (en)](https://huggingface.co/datasets/bigcode/the-stack)
+- [StarCoder (en)](https://huggingface.co/datasets/bigcode/starcoderdata)
+
+
+
+Supervised fine-tuning datasets
+
+- [Identity (en&zh)](data/identity.json)
+- [Stanford Alpaca (en)](https://github.com/tatsu-lab/stanford_alpaca)
+- [Stanford Alpaca (zh)](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3)
+- [Alpaca GPT4 (en&zh)](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
+- [Glaive Function Calling V2 (en&zh)](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)
+- [LIMA (en)](https://huggingface.co/datasets/GAIR/lima)
+- [Guanaco Dataset (multilingual)](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset)
+- [BELLE 2M (zh)](https://huggingface.co/datasets/BelleGroup/train_2M_CN)
+- [BELLE 1M (zh)](https://huggingface.co/datasets/BelleGroup/train_1M_CN)
+- [BELLE 0.5M (zh)](https://huggingface.co/datasets/BelleGroup/train_0.5M_CN)
+- [BELLE Dialogue 0.4M (zh)](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)
+- [BELLE School Math 0.25M (zh)](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)
+- [BELLE Multiturn Chat 0.8M (zh)](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M)
+- [UltraChat (en)](https://github.com/thunlp/UltraChat)
+- [OpenPlatypus (en)](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)
+- [CodeAlpaca 20k (en)](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
+- [Alpaca CoT (multilingual)](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT)
+- [OpenOrca (en)](https://huggingface.co/datasets/Open-Orca/OpenOrca)
+- [SlimOrca (en)](https://huggingface.co/datasets/Open-Orca/SlimOrca)
+- [MathInstruct (en)](https://huggingface.co/datasets/TIGER-Lab/MathInstruct)
+- [Firefly 1.1M (zh)](https://huggingface.co/datasets/YeungNLP/firefly-train-1.1M)
+- [Wiki QA (en)](https://huggingface.co/datasets/wiki_qa)
+- [Web QA (zh)](https://huggingface.co/datasets/suolyer/webqa)
+- [WebNovel (zh)](https://huggingface.co/datasets/zxbsmk/webnovel_cn)
+- [Nectar (en)](https://huggingface.co/datasets/berkeley-nest/Nectar)
+- [deepctrl (en&zh)](https://www.modelscope.cn/datasets/deepctrl/deepctrl-sft-data)
+- [Advertise Generating (zh)](https://huggingface.co/datasets/HasturOfficial/adgen)
+- [ShareGPT Hyperfiltered (en)](https://huggingface.co/datasets/totally-not-an-llm/sharegpt-hyperfiltered-3k)
+- [ShareGPT4 (en&zh)](https://huggingface.co/datasets/shibing624/sharegpt_gpt4)
+- [UltraChat 200k (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+- [AgentInstruct (en)](https://huggingface.co/datasets/THUDM/AgentInstruct)
+- [LMSYS Chat 1M (en)](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
+- [Evol Instruct V2 (en)](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)
+- [Cosmopedia (en)](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
+- [STEM (zh)](https://huggingface.co/datasets/hfl/stem_zh_instruction)
+- [Ruozhiba (zh)](https://huggingface.co/datasets/hfl/ruozhiba_gpt4_turbo)
+- [Neo-sft (zh)](https://huggingface.co/datasets/m-a-p/neo_sft_phase2)
+- [Magpie-Pro-300K-Filtered (en)](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
+- [Magpie-ultra-v0.1 (en)](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1)
+- [WebInstructSub (en)](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub)
+- [OpenO1-SFT (en&zh)](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)
+- [Open-Thoughts (en)](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)
+- [Open-R1-Math (en)](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
+- [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
+- [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
+- [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
+- [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
+- [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
+- [OpenSchnabeltier (de)](https://huggingface.co/datasets/mayflowergmbh/openschnabeltier_de)
+- [Evol Instruct (de)](https://huggingface.co/datasets/mayflowergmbh/evol-instruct_de)
+- [Dolphin (de)](https://huggingface.co/datasets/mayflowergmbh/dolphin_de)
+- [Booksum (de)](https://huggingface.co/datasets/mayflowergmbh/booksum_de)
+- [Airoboros (de)](https://huggingface.co/datasets/mayflowergmbh/airoboros-3.0_de)
+- [Ultrachat (de)](https://huggingface.co/datasets/mayflowergmbh/ultra-chat_de)
+
+
+
+Preference datasets
+
+- [DPO mixed (en&zh)](https://huggingface.co/datasets/hiyouga/DPO-En-Zh-20k)
+- [UltraFeedback (en)](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
+- [COIG-P (en&zh)](https://huggingface.co/datasets/m-a-p/COIG-P)
+- [RLHF-V (en)](https://huggingface.co/datasets/openbmb/RLHF-V-Dataset)
+- [VLFeedback (en)](https://huggingface.co/datasets/Zhihui/VLFeedback)
+- [Orca DPO Pairs (en)](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
+- [HH-RLHF (en)](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [Nectar (en)](https://huggingface.co/datasets/berkeley-nest/Nectar)
+- [Orca DPO (de)](https://huggingface.co/datasets/mayflowergmbh/intel_orca_dpo_pairs_de)
+- [KTO mixed (en)](https://huggingface.co/datasets/argilla/kto-mix-15k)
+
+
+
+Some datasets require confirmation before use, so we recommend logging in to your Hugging Face account with the following commands.
+
+```bash
+pip install --upgrade huggingface_hub
+huggingface-cli login
+```
+
+## Requirements
+
+| Mandatory    | Minimum | Recommend |
+| ------------ | ------- | --------- |
+| python | 3.9 | 3.10 |
+| torch | 2.0.0 | 2.6.0 |
+| transformers | 4.45.0 | 4.50.0 |
+| datasets | 2.16.0 | 3.2.0 |
+| accelerate | 0.34.0 | 1.2.1 |
+| peft | 0.14.0 | 0.15.1 |
+| trl | 0.8.6 | 0.9.6 |
+
+| Optional     | Minimum | Recommend |
+| ------------ | ------- | --------- |
+| CUDA | 11.6 | 12.2 |
+| deepspeed | 0.10.0 | 0.16.4 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.3 | 0.8.2 |
+| flash-attn | 2.5.6 | 2.7.2 |
+
+### Hardware Requirements
+
+\* *estimated values*
+
+| Method                          | Bits | 7B    | 14B   | 30B   | 70B    | `x`B    |
+| ------------------------------- | ---- | ----- | ----- | ----- | ------ | ------- |
+| Full (`bf16` or `fp16`) | 32 | 120GB | 240GB | 600GB | 1200GB | `18x`GB |
+| Full (`pure_bf16`) | 16 | 60GB | 120GB | 300GB | 600GB | `8x`GB |
+| Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | `2x`GB |
+| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | `x`GB |
+| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | `x/2`GB |
+| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | `x/4`GB |
+
+## Getting Started
+
+### Install LLaMA Factory
+
+> [!IMPORTANT]
+> Installation is mandatory.
+
+```bash
+git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
+cd LLaMA-Factory
+pip install -e ".[torch,metrics]"
+```
+
+Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, modelscope, openmind, swanlab, quality
+
+> [!TIP]
+> Use `pip install --no-deps -e .` to resolve package conflicts.
+
+Setting up a virtual environment with uv
+
+Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):
+
+```bash
+uv sync --extra torch --extra metrics --prerelease=allow
+```
+
+Run LLaMA-Factory in the isolated environment:
+
+```bash
+uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+```
+
+
+
+
+For Windows users
+
+#### Install BitsAndBytes
+
+To enable quantized LoRA (QLoRA) on Windows, you need to install a pre-built version of the `bitsandbytes` library, which supports CUDA 11.1 to 12.2. Please choose the appropriate [release version](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels) based on your CUDA version.
+
+```bash
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
+```
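+
+After installing the wheel, a quick import check (a sanity check of our own, not part of the upstream instructions) confirms the library loads:
+
+```bash
+python -c "import bitsandbytes; print(bitsandbytes.__version__)"
+```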
+
+#### Install Flash Attention-2
+
+To enable FlashAttention-2 on Windows, please compile and install it yourself using the script from [flash-attention-windows-wheel](https://huggingface.co/lldacing/flash-attention-windows-wheel).
+
+
+
+For Ascend NPU users
+
+To install LLaMA Factory on Ascend NPU devices, upgrade Python to 3.10 or higher and specify the extra dependencies: `pip install -e ".[torch-npu,metrics]"`. In addition, install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)** by following the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or using the following commands:
+
+```bash
+# Replace the URL with the one matching your CANN version and device model
+# Install CANN Toolkit
+wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
+bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install
+
+# Install CANN Kernels
+wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
+bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install
+
+# Set environment variables
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+```
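+
+After sourcing the environment script, you can optionally confirm that the NPU devices are visible; `npu-smi` ships with the Ascend driver (shown here only as a sanity check):
+
+```bash
+npu-smi info
+```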
+
+| Requirement  | Minimum | Recommend      |
+| ------------ | ------- | -------------- |
+| CANN | 8.0.RC1 | 8.0.0.alpha002 |
+| torch | 2.1.0 | 2.4.0 |
+| torch-npu | 2.1.0 | 2.4.0.post2 |
+| deepspeed | 0.13.2 | 0.13.2 |
+| vllm-ascend | - | 0.7.3 |
+
+Use `ASCEND_RT_VISIBLE_DEVICES` instead of `CUDA_VISIBLE_DEVICES` to specify the compute devices.
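+
+For example, to launch a training run on the first two NPUs (the config path is one of the repository's bundled examples; adjust it to your own file):
+
+```bash
+ASCEND_RT_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```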
+
+If inference does not work properly, try setting `do_sample: false`.
+
+Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
+
+#### Install BitsAndBytes
+
+To run bitsandbytes-based QLoRA fine-tuning on Ascend NPU, follow these steps:
+
+1. Manually compile bitsandbytes: follow the [installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) to build and install the NPU version of bitsandbytes. Compilation requires cmake 3.22.1 or later and g++ 12.x or later.
+
+```bash
+# Install bitsandbytes from source
+# Clone the bitsandbytes repository; Ascend NPU support currently lives in the multi-backend-refactor branch
+git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
+cd bitsandbytes/
+
+# Install dependencies
+pip install -r requirements-dev.txt
+
+# Install the build-tool dependencies; the exact command differs across systems, shown here for reference
+apt-get install -y build-essential cmake
+
+# Build & install
+cmake -DCOMPUTE_BACKEND=npu -S .
+make
+pip install .
+```
+
+2. Install transformers from the main branch.
+
+```bash
+git clone -b main https://github.com/huggingface/transformers.git
+cd transformers
+pip install .
+```
+
+3. Set `double_quantization: false` in the training arguments; see the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml) and the sketch below.
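+
+A minimal sketch of the relevant keys (only the quantization-related fields are shown; the remaining fields follow the linked example):
+
+```yaml
+quantization_bit: 4
+double_quantization: false
+```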
+
+
+
+### Data Preparation
+
+Please refer to [data/README_zh.md](data/README_zh.md) for the dataset file format. You can use datasets on the HuggingFace / ModelScope / Modelers hub or load a dataset from local disk.
+
+> [!NOTE]
+> Please update `data/dataset_info.json` when using a custom dataset, as in the sketch below.
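+
+For example, a new alpaca-format file placed under `data/` could be registered with an entry like the following (dataset name, file name, and column mapping are placeholders; see [data/README_zh.md](data/README_zh.md) for all supported fields):
+
+```json
+"my_dataset": {
+  "file_name": "my_dataset.json",
+  "columns": {
+    "prompt": "instruction",
+    "query": "input",
+    "response": "output"
+  }
+}
+```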
+
+You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)** to build synthetic data for fine-tuning.
+
+### Quickstart
+
+The following three commands run LoRA **fine-tuning**, **inference**, and **merging** on the Llama3-8B-Instruct model, respectively.
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+See [examples/README_zh.md](examples/README_zh.md) for advanced usage (including multi-GPU fine-tuning).
+
+> [!TIP]
+> Use `llamafactory-cli help` to show help information.
+>
+> If you run into errors, please check the [FAQs](https://github.com/hiyouga/LLaMA-Factory/issues/4614) first.
+
+### Fine-Tuning with the LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
+
+```bash
+llamafactory-cli webui
+```
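+
+By default the Web UI listens on port 7860. The listening address and port can be changed through the standard Gradio environment variables, for example:
+
+```bash
+GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 llamafactory-cli webui
+```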
+
+### Build Docker
+
+For CUDA users:
+
+```bash
+cd docker/docker-cuda/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+For Ascend NPU users:
+
+```bash
+cd docker/docker-npu/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+For AMD ROCm users:
+
+```bash
+cd docker/docker-rocm/
+docker compose up -d
+docker compose exec llamafactory bash
+```
+
+Build without Docker Compose
+
+For CUDA users:
+
+```bash
+docker build -f ./docker/docker-cuda/Dockerfile \
+ --build-arg INSTALL_BNB=false \
+ --build-arg INSTALL_VLLM=false \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg INSTALL_FLASHATTN=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+docker run -dit --gpus=all \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+For Ascend NPU users:
+
+```bash
+# Choose the Docker image that matches your environment
+docker build -f ./docker/docker-npu/Dockerfile \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+# Change `device` according to your resources
+docker run -dit \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --device /dev/davinci0 \
+ --device /dev/davinci_manager \
+ --device /dev/devmm_svm \
+ --device /dev/hisi_hdc \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+For AMD ROCm users:
+
+```bash
+docker build -f ./docker/docker-rocm/Dockerfile \
+ --build-arg INSTALL_BNB=false \
+ --build-arg INSTALL_VLLM=false \
+ --build-arg INSTALL_DEEPSPEED=false \
+ --build-arg INSTALL_FLASHATTN=false \
+ --build-arg PIP_INDEX=https://pypi.org/simple \
+ -t llamafactory:latest .
+
+docker run -dit \
+ -v ./hf_cache:/root/.cache/huggingface \
+ -v ./ms_cache:/root/.cache/modelscope \
+ -v ./om_cache:/root/.cache/openmind \
+ -v ./data:/app/data \
+ -v ./output:/app/output \
+ -v ./saves:/app/saves \
+ -p 7860:7860 \
+ -p 8000:8000 \
+ --device /dev/kfd \
+ --device /dev/dri \
+ --shm-size 16G \
+ --name llamafactory \
+ llamafactory:latest
+
+docker exec -it llamafactory bash
+```
+
+
+
+Details about volumes
+
+- `hf_cache`: Reuse the Hugging Face cache on the host machine; it can be re-pointed to a different directory.
+- `ms_cache`: Similar to the Hugging Face cache, but for ModelScope users.
+- `om_cache`: Similar to the Hugging Face cache, but for Modelers users.
+- `data`: The directory on the host machine where datasets are stored.
+- `output`: Set the export directory to this path so that the exported model can be accessed directly on the host machine.
+
+
+
+### Deploy an OpenAI-style API with vLLM
+
+```bash
+API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
+```
+
+> [!TIP]
+> Visit [this page](https://platform.openai.com/docs/api-reference/chat/create) for the API documentation.
+>
+> Examples: [Image understanding](scripts/api_example/test_image.py) | [Function calling](scripts/api_example/test_toolcall.py)
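+
+Once the server is up, any OpenAI-compatible client can call it. A minimal sketch with `curl` (the port follows the command above; the `model` field is a placeholder, since the server serves the model specified in the yaml file):
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
+```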
+
+### Download from ModelScope Hub
+
+If you have trouble downloading models and datasets from Hugging Face, you can use ModelScope as follows.
+
+```bash
+export USE_MODELSCOPE_HUB=1 # use `set USE_MODELSCOPE_HUB=1` on Windows
+```
+
+Set `model_name_or_path` to a model ID to load the corresponding model. Browse all available models on the [ModelScope Hub](https://modelscope.cn/models), e.g., `LLM-Research/Meta-Llama-3-8B-Instruct`.
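+
+For example, in the training or inference yaml file (model ID taken from the example above):
+
+```yaml
+model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct
+```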
+
+### Download from Modelers Hub
+
+You can also download datasets and models from the Modelers Hub as follows.
+
+```bash
+export USE_OPENMIND_HUB=1 # use `set USE_OPENMIND_HUB=1` on Windows
+```
+
+Set `model_name_or_path` to a model ID to load the corresponding model. Browse all available models on the [Modelers Hub](https://modelers.cn/models), e.g., `TeleAI/TeleChat-7B-pt`.
+
+### Use the W&B Logger
+
+To log experimental results with [Weights & Biases](https://wandb.ai), add the following arguments to the yaml file.
+
+```yaml
+report_to: wandb
+run_name: test_run # optional
+```
+
+Set `WANDB_API_KEY` to your [key](https://wandb.ai/authorize) when launching the training task to log in to your W&B account.
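+
+For example (the key value is a placeholder):
+
+```bash
+WANDB_API_KEY=your_api_key llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```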
+
+### Use the SwanLab Logger
+
+To log experimental results with [SwanLab](https://github.com/SwanHubX/SwanLab), add the following arguments to the yaml file.
+
+```yaml
+use_swanlab: true
+swanlab_run_name: test_run # optional
+```
+
+When launching a training task, you can log in to your SwanLab account in one of three ways (a sketch of the second method follows this list):
+
+Method 1: Add `swanlab_api_key=<your_api_key>` to the yaml file and set it to your [API key](https://swanlab.cn/settings).
+Method 2: Set the environment variable `SWANLAB_API_KEY` to your [API key](https://swanlab.cn/settings).
+Method 3: Run the `swanlab login` command before launching the training task.
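+
+A sketch of the second method, using the environment variable (the key value is a placeholder):
+
+```bash
+export SWANLAB_API_KEY=your_api_key
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```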
+
+## Projects Using LLaMA Factory
+
+If you have a project that should be added to the list below, please reach out via email or create a pull request.
+
+Click to show
+
+1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
+1. Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [[arxiv]](https://arxiv.org/abs/2308.10092)
+1. Wang et al. UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural Language. 2023. [[arxiv]](https://arxiv.org/abs/2308.10526)
+1. Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [[arxiv]](https://arxiv.org/abs/2311.07816)
+1. Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [[arxiv]](https://arxiv.org/abs/2312.15710)
+1. Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. KDD 2024. [[arxiv]](https://arxiv.org/abs/2401.04319)
+1. Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. ACL 2024. [[arxiv]](https://arxiv.org/abs/2401.07286)
+1. Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [[arxiv]](https://arxiv.org/abs/2402.05904)
+1. Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [[arxiv]](https://arxiv.org/abs/2402.07625)
+1. Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11176)
+1. Yang et al. LaCo: Large Language Model Pruning via Layer Collaps. 2024. [[arxiv]](https://arxiv.org/abs/2402.11187)
+1. Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [[arxiv]](https://arxiv.org/abs/2402.11746)
+1. Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11801)
+1. Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. ACL 2024 Findings. [[arxiv]](https://arxiv.org/abs/2402.11809)
+1. Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.11819)
+1. Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [[arxiv]](https://arxiv.org/abs/2402.12204)
+1. Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2402.14714)
+1. Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [[arxiv]](https://arxiv.org/abs/2402.15043)
+1. Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning. 2024. [[arxiv]](https://arxiv.org/abs/2403.02333)
+1. Duan et al. Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization. 2024. [[arxiv]](https://arxiv.org/abs/2403.03419)
+1. Xie and Schwertfeger. Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs. 2024. [[arxiv]](https://arxiv.org/abs/2403.08228)
+1. Wu et al. Large Language Models are Parallel Multilingual Learners. 2024. [[arxiv]](https://arxiv.org/abs/2403.09073)
+1. Zhang et al. EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling. 2024. [[arxiv]](https://arxiv.org/abs/2403.14541)
+1. Weller et al. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2024. [[arxiv]](https://arxiv.org/abs/2403.15246)
+1. Hongbin Na. CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering. COLING 2024. [[arxiv]](https://arxiv.org/abs/2403.16008)
+1. Zan et al. CodeS: Natural Language to Code Repository via Multi-Layer Sketch. 2024. [[arxiv]](https://arxiv.org/abs/2403.16443)
+1. Liu et al. Extensive Self-Contrast Enables Feedback-Free Language Model Alignment. 2024. [[arxiv]](https://arxiv.org/abs/2404.00604)
+1. Luo et al. BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.02827)
+1. Du et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2404.04167)
+1. Ma et al. Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation. ICML 2024. [[arxiv]](https://arxiv.org/abs/2404.04316)
+1. Liu et al. Dynamic Generation of Personalities with Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.07084)
+1. Shang et al. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.09836)
+1. Huang et al. LLMTune: Accelerate Database Knob Tuning with Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2404.11581)
+1. Deng et al. Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction. 2024. [[arxiv]](https://arxiv.org/abs/2404.14215)
+1. Acikgoz et al. Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. 2024. [[arxiv]](https://arxiv.org/abs/2404.16621)
+1. Zhang et al. Small Language Models Need Strong Verifiers to Self-Correct Reasoning. ACL 2024 Findings. [[arxiv]](https://arxiv.org/abs/2404.17140)
+1. Zhou et al. FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering. NAACL 2024. [[arxiv]](https://arxiv.org/abs/2404.18585)
+1. Xu et al. Large Language Models for Cyber Security: A Systematic Literature Review. 2024. [[arxiv]](https://arxiv.org/abs/2405.04760)
+1. Dammu et al. "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations. 2024. [[arxiv]](https://arxiv.org/abs/2405.05378)
+1. Yi et al. A safety realignment framework via subspace-oriented model fusion for large language models. 2024. [[arxiv]](https://arxiv.org/abs/2405.09055)
+1. Lou et al. SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling. 2024. [[arxiv]](https://arxiv.org/abs/2405.12739)
+1. Zhang et al. Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners. 2024. [[arxiv]](https://arxiv.org/abs/2405.13816)
+1. Zhang et al. TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models. 2024. [[arxiv]](https://arxiv.org/abs/2405.20215)
+1. Zihong Chen. Sentence Segmentation and Sentence Punctuation Based on XunziALLM. 2024. [[paper]](https://aclanthology.org/2024.lt4hala-1.30)
+1. Gao et al. The Best of Both Worlds: Toward an Honest and Helpful Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2406.00380)
+1. Wang and Song. MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset. 2024. [[arxiv]](https://arxiv.org/abs/2406.02106)
+1. Hu et al. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models. 2024. [[arxiv]](https://arxiv.org/abs/2406.03136)
+1. Ge et al. Time Sensitive Knowledge Editing through Efficient Finetuning. ACL 2024. [[arxiv]](https://arxiv.org/abs/2406.04496)
+1. Tan et al. Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions. 2024. [[arxiv]](https://arxiv.org/abs/2406.05688)
+1. Song et al. Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters. 2024. [[arxiv]](https://arxiv.org/abs/2406.05955)
+1. Gu et al. RWKV-CLIP: A Robust Vision-Language Representation Learner. 2024. [[arxiv]](https://arxiv.org/abs/2406.06973)
+1. Chen et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. 2024. [[arxiv]](https://arxiv.org/abs/2406.07115)
+1. Zhu et al. Are Large Language Models Good Statisticians?. 2024. [[arxiv]](https://arxiv.org/abs/2406.07815)
+1. Li et al. Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning. 2024. [[arxiv]](https://arxiv.org/abs/2406.10099)
+1. Ding et al. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce. 2024. [[arxiv]](https://arxiv.org/abs/2406.10173)
+1. He et al. COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities. 2024. [[arxiv]](https://arxiv.org/abs/2406.12074)
+1. Lin et al. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving. 2024. [[arxiv]](https://arxiv.org/abs/2406.14408)
+1. Treutlein et al. Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data. 2024. [[arxiv]](https://arxiv.org/abs/2406.14546)
+1. Feng et al. SS-Bench: A Benchmark for Social Story Generation and Evaluation. 2024. [[arxiv]](https://arxiv.org/abs/2406.15695)
+1. Feng et al. Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement. 2024. [[arxiv]](https://arxiv.org/abs/2406.17233)
+1. Liu et al. Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals. 2024. [[arxiv]](https://arxiv.org/abs/2406.18069)
+1. Iyer et al. Exploring Very Low-Resource Translation with LLMs: The University of Edinburgh's Submission to AmericasNLP 2024 Translation Task. AmericasNLP 2024. [[paper]](https://aclanthology.org/2024.americasnlp-1.25)
+1. Li et al. Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring. 2024. [[arxiv]](https://arxiv.org/abs/2406.19949)
+1. Yang et al. Financial Knowledge Large Language Model. 2024. [[arxiv]](https://arxiv.org/abs/2407.00365)
+1. Lin et al. DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging. 2024. [[arxiv]](https://arxiv.org/abs/2407.01470)
+1. Bako et al. Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization. 2024. [[arxiv]](https://arxiv.org/abs/2407.06129)
+1. Huang et al. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. 2024. [[arxiv]](https://arxiv.org/abs/2407.08044)
+1. Jiang et al. LLM-Collaboration on Automatic Science Journalism for the General Audience. 2024. [[arxiv]](https://arxiv.org/abs/2407.09756)
+1. Inouye et al. Applied Auto-tuning on LoRA Hyperparameters. 2024. [[paper]](https://scholarcommons.scu.edu/cseng_senior/272/)
+1. Qi et al. Research on Tibetan Tourism Viewpoints information generation system based on LLM. 2024. [[arxiv]](https://arxiv.org/abs/2407.13561)
+1. Xu et al. Course-Correction: Safety Alignment Using Synthetic Preferences. 2024. [[arxiv]](https://arxiv.org/abs/2407.16637)
+1. Sun et al. LAMBDA: A Large Model Based Data Agent. 2024. [[arxiv]](https://arxiv.org/abs/2407.17535)
+1. Zhu et al. CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare. 2024. [[arxiv]](https://arxiv.org/abs/2407.19705)
+1. Yu et al. Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. 2024. [[arxiv]](https://arxiv.org/abs/2408.00137)
+1. Xie et al. The Power of Personalized Datasets: Advancing Chinese Composition Writing for Elementary School through Targeted Model Fine-Tuning. IALP 2024. [[paper]](https://www.asianlp.sg/conferences/ialp2024/proceedings/papers/IALP2024_P055.pdf)
+1. Liu et al. Instruct-Code-Llama: Improving Capabilities of Language Model in Competition Level Code Generation by Online Judge Feedback. ICIC 2024. [[paper]](https://link.springer.com/chapter/10.1007/978-981-97-5669-8_11)
+1. Wang et al. Cybernetic Sentinels: Unveiling the Impact of Safety Data Selection on Model Security in Supervised Fine-Tuning. ICIC 2024. [[paper]](https://link.springer.com/chapter/10.1007/978-981-97-5669-8_23)
+1. Xia et al. Understanding the Performance and Estimating the Cost of LLM Fine-Tuning. 2024. [[arxiv]](https://arxiv.org/abs/2408.04693)
+1. Zeng et al. Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions. 2024. [[arxiv]](https://arxiv.org/abs/2408.04168)
+1. Xia et al. Using Pre-trained Language Model for Accurate ESG Prediction. FinNLP 2024. [[paper]](https://aclanthology.org/2024.finnlp-2.1/)
+1. Liang et al. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm. 2024. [[arxiv]](https://arxiv.org/abs/2408.08072)
+1. Bai et al. Aligning Large Language Model with Direct Multi-Preference Optimization for Recommendation. CIKM 2024. [[paper]](https://dl.acm.org/doi/10.1145/3627673.3679611)
+1. **[StarWhisper](https://github.com/Yu-Yang-Li/StarWhisper)**: A large language model for astronomy, fine-tuned from ChatGLM2-6B and Qwen-14B on astronomical data.
+1. **[DISC-LawLLM](https://github.com/FudanDISC/DISC-LawLLM)**: A large language model for the Chinese legal domain, fine-tuned from Baichuan-13B, with legal reasoning and knowledge retrieval capabilities.
+1. **[Sunsimiao](https://github.com/X-D-Lab/Sunsimiao)**: Sunsimiao, a Chinese medical large language model fine-tuned from Baichuan-7B and ChatGLM-6B on Chinese medical data.
+1. **[CareGPT](https://github.com/WangRongsheng/CareGPT)**: A medical large language model project, fine-tuned from LLaMA2-7B and Baichuan-13B on Chinese medical data.
+1. **[MachineMindset](https://github.com/PKU-YuanGroup/Machine-Mindset/)**: A series of MBTI-personality large language models, able to give any LLM one of the 16 personality types through tailored datasets and training methods.
+1. **[Luminia-13B-v3](https://huggingface.co/Nekochu/Luminia-13B-v3)**: A large language model for generating Stable Diffusion prompts. [[demo]](https://huggingface.co/spaces/Nekochu/Luminia-13B_SD_Prompt)
+1. **[Chinese-LLaVA-Med](https://github.com/BUAADreamer/Chinese-LLaVA-Med)**: A Chinese multimodal medical large language model, fine-tuned from LLaVA-1.5-7B on Chinese multimodal medical data.
+1. **[AutoRE](https://github.com/THUDM/AutoRE)**: A document-level relation extraction system based on large language models.
+1. **[NVIDIA RTX AI Toolkit](https://github.com/NVIDIA/RTX-AI-Toolkit)**: A toolkit for fine-tuning large language models on Windows machines with NVIDIA RTX devices.
+1. **[LazyLLM](https://github.com/LazyAGI/LazyLLM)**: A low-code development tool for building multi-agent LLM applications, supporting model fine-tuning based on LLaMA Factory.
+1. **[RAG-Retrieval](https://github.com/NLPJCL/RAG-Retrieval)**: A full-pipeline codebase for RAG retrieval model fine-tuning, inference, and distillation. [[blog]](https://zhuanlan.zhihu.com/p/987727357)
+1. **[360-LLaMA-Factory](https://github.com/Qihoo360/360-LLaMA-Factory)**: A modified fork that supports long-sequence SFT and DPO training via Ring Attention.
+1. **[Sky-T1](https://novasky-ai.github.io/posts/sky-t1/)**: A low-cost o1-like long-reasoning model fine-tuned by NovaSky AI.
+
+
+
+## License
+
+This repository is licensed under the [Apache-2.0 License](LICENSE).
+
+Please follow the model licenses to use the corresponding model weights: [Baichuan 2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/blob/main/Community%20License%20for%20Baichuan%202%20Model.pdf) / [BLOOM](https://huggingface.co/spaces/bigscience/license) / [ChatGLM3](https://github.com/THUDM/ChatGLM3/blob/main/MODEL_LICENSE) / [Command R](https://cohere.com/c4ai-cc-by-nc-license) / [DeepSeek](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL) / [Falcon](https://huggingface.co/tiiuae/falcon-180B/blob/main/LICENSE.txt) / [Gemma](https://ai.google.dev/gemma/terms) / [GLM-4](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE) / [GPT-2](https://github.com/openai/gpt-2/blob/master/LICENSE) / [Granite](LICENSE) / [Index](https://huggingface.co/IndexTeam/Index-1.9B/blob/main/LICENSE) / [InternLM](https://github.com/InternLM/InternLM#license) / [Llama](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) / [Llama 2](https://ai.meta.com/llama/license/) / [Llama 3](https://llama.meta.com/llama3/license/) / [Llama 4](https://github.com/meta-llama/llama-models/blob/main/models/llama4/LICENSE) / [MiniCPM](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md) / [Mistral/Mixtral/Pixtral](LICENSE) / [OLMo](LICENSE) / [Phi-1.5/Phi-2](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx) / [Phi-3/Phi-4](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/LICENSE) / [Qwen](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT) / [Skywork](https://huggingface.co/Skywork/Skywork-13B-base/blob/main/Skywork%20Community%20License.pdf) / [StarCoder 2](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement) / [TeleChat2](https://huggingface.co/Tele-AI/telechat-7B/blob/main/TeleChat%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf) / [Yi](https://huggingface.co/01-ai/Yi-6B/blob/main/LICENSE) / [Yi-1.5](LICENSE) / [Yuan 2](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/LICENSE-Yuan)
+
+## Citation
+
+If this work is helpful, please kindly cite as:
+
+```bibtex
+@inproceedings{zheng2024llamafactory,
+ title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
+ author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
+ booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
+ address={Bangkok, Thailand},
+ publisher={Association for Computational Linguistics},
+ year={2024},
+ url={http://arxiv.org/abs/2403.13372}
+}
+```
+
+## Acknowledgements
+
+This repo benefits from [PEFT](https://github.com/huggingface/peft), [TRL](https://github.com/huggingface/trl), [QLoRA](https://github.com/artidoro/qlora) and [FastChat](https://github.com/lm-sys/FastChat). Thanks for their wonderful works.
+
+## Star History
+
+
diff --git a/post-training/LLaMA-Factory/assets/benchmark.svg b/post-training/LLaMA-Factory/assets/benchmark.svg
new file mode 100644
index 0000000..e2b1db4
--- /dev/null
+++ b/post-training/LLaMA-Factory/assets/benchmark.svg
@@ -0,0 +1,1216 @@
+<!-- SVG body omitted: Matplotlib v3.7.1 benchmark figure (image/svg+xml, created 2023-11-18). -->
diff --git a/post-training/LLaMA-Factory/assets/logo.png b/post-training/LLaMA-Factory/assets/logo.png
new file mode 100644
index 0000000..5fb3dd5
Binary files /dev/null and b/post-training/LLaMA-Factory/assets/logo.png differ
diff --git a/post-training/LLaMA-Factory/assets/wechat.jpg b/post-training/LLaMA-Factory/assets/wechat.jpg
new file mode 100644
index 0000000..089674e
Binary files /dev/null and b/post-training/LLaMA-Factory/assets/wechat.jpg differ
diff --git a/post-training/LLaMA-Factory/assets/wechat_npu.jpg b/post-training/LLaMA-Factory/assets/wechat_npu.jpg
new file mode 100644
index 0000000..a5dece5
Binary files /dev/null and b/post-training/LLaMA-Factory/assets/wechat_npu.jpg differ
diff --git a/post-training/LLaMA-Factory/docker/docker-cuda/Dockerfile b/post-training/LLaMA-Factory/docker/docker-cuda/Dockerfile
new file mode 100644
index 0000000..aac9635
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-cuda/Dockerfile
@@ -0,0 +1,101 @@
+# Default use the NVIDIA official image with PyTorch 2.6.0
+# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.12-py3
+FROM ${BASE_IMAGE}
+
+# Define environments
+ENV MAX_JOBS=4
+ENV FLASH_ATTENTION_FORCE_BUILD=TRUE
+ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+
+# Define installation arguments
+ARG INSTALL_BNB=false
+ARG INSTALL_VLLM=false
+ARG INSTALL_DEEPSPEED=false
+ARG INSTALL_FLASHATTN=false
+ARG INSTALL_LIGER_KERNEL=false
+ARG INSTALL_HQQ=false
+ARG INSTALL_EETQ=false
+ARG PIP_INDEX=https://pypi.org/simple
+ARG HTTP_PROXY=
+
+# Set the working directory
+WORKDIR /app
+
+# Set http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ echo "Configuring proxy..."; \
+ export http_proxy=$HTTP_PROXY; \
+ export https_proxy=$HTTP_PROXY; \
+ fi
+
+# Install the requirements
+COPY requirements.txt /app
+RUN pip config set global.index-url "$PIP_INDEX" && \
+ pip config set global.extra-index-url "$PIP_INDEX" && \
+ python -m pip install --upgrade pip && \
+ if [ -n "$HTTP_PROXY" ]; then \
+ python -m pip install --proxy=$HTTP_PROXY -r requirements.txt; \
+ else \
+ python -m pip install -r requirements.txt; \
+ fi
+
+# Copy the rest of the application into the image
+COPY . /app
+
+# Install the LLaMA Factory
+RUN EXTRA_PACKAGES="metrics"; \
+ if [ "$INSTALL_BNB" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},bitsandbytes"; \
+ fi; \
+ if [ "$INSTALL_VLLM" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},vllm"; \
+ fi; \
+ if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
+ fi; \
+ if [ "$INSTALL_LIGER_KERNEL" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},liger-kernel"; \
+ fi; \
+ if [ "$INSTALL_HQQ" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},hqq"; \
+ fi; \
+ if [ "$INSTALL_EETQ" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},eetq"; \
+ fi; \
+ if [ -n "$HTTP_PROXY" ]; then \
+ pip install --proxy=$HTTP_PROXY -e ".[$EXTRA_PACKAGES]"; \
+ else \
+ pip install -e ".[$EXTRA_PACKAGES]"; \
+ fi
+
+# Rebuild flash attention
+RUN pip uninstall -y transformer-engine flash-attn && \
+ if [ "$INSTALL_FLASHATTN" == "true" ]; then \
+ pip uninstall -y ninja && \
+ if [ -n "$HTTP_PROXY" ]; then \
+ pip install --proxy=$HTTP_PROXY ninja && \
+ pip install --proxy=$HTTP_PROXY --no-cache-dir flash-attn --no-build-isolation; \
+ else \
+ pip install ninja && \
+ pip install --no-cache-dir flash-attn --no-build-isolation; \
+ fi; \
+ fi
+
+
+# Unset http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ unset http_proxy; \
+ unset https_proxy; \
+ fi
+
+# Set up volumes
+VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]
+
+# Expose port 7860 for the LLaMA Board
+ENV GRADIO_SERVER_PORT 7860
+EXPOSE 7860
+
+# Expose port 8000 for the API service
+ENV API_PORT 8000
+EXPOSE 8000
diff --git a/post-training/LLaMA-Factory/docker/docker-cuda/docker-compose.yml b/post-training/LLaMA-Factory/docker/docker-cuda/docker-compose.yml
new file mode 100644
index 0000000..fa386cc
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-cuda/docker-compose.yml
@@ -0,0 +1,37 @@
+services:
+ llamafactory:
+ build:
+ dockerfile: ./docker/docker-cuda/Dockerfile
+ context: ../..
+ args:
+ INSTALL_BNB: "false"
+ INSTALL_VLLM: "false"
+ INSTALL_DEEPSPEED: "false"
+ INSTALL_FLASHATTN: "false"
+ INSTALL_LIGER_KERNEL: "false"
+ INSTALL_HQQ: "false"
+ INSTALL_EETQ: "false"
+ PIP_INDEX: https://pypi.org/simple
+ container_name: llamafactory
+ volumes:
+ - ../../hf_cache:/root/.cache/huggingface
+ - ../../ms_cache:/root/.cache/modelscope
+ - ../../om_cache:/root/.cache/openmind
+ - ../../data:/app/data
+ - ../../output:/app/output
+ ports:
+ - "7860:7860"
+ - "8000:8000"
+ ipc: host
+ tty: true
+ shm_size: "16gb"
+ stdin_open: true
+ command: bash
+ deploy:
+ resources:
+ reservations:
+ devices:
+ - driver: nvidia
+ count: "all"
+ capabilities: [gpu]
+ restart: unless-stopped
diff --git a/post-training/LLaMA-Factory/docker/docker-npu/Dockerfile b/post-training/LLaMA-Factory/docker/docker-npu/Dockerfile
new file mode 100644
index 0000000..3431176
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-npu/Dockerfile
@@ -0,0 +1,67 @@
+# Use the Ubuntu 22.04 image with CANN 8.0.rc1
+# More versions can be found at https://hub.docker.com/r/ascendai/cann/tags
+# FROM ascendai/cann:8.0.rc1-910-ubuntu22.04-py3.8
+FROM ascendai/cann:8.0.0-910b-ubuntu22.04-py3.10
+# FROM ascendai/cann:8.0.rc1-910-openeuler22.03-py3.8
+# FROM ascendai/cann:8.0.rc1-910b-openeuler22.03-py3.8
+
+# Define environments
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Define installation arguments
+ARG INSTALL_DEEPSPEED=false
+ARG PIP_INDEX=https://pypi.org/simple
+ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu
+ARG HTTP_PROXY=
+
+# Set the working directory
+WORKDIR /app
+
+# Set http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ echo "Configuring proxy..."; \
+ export http_proxy=$HTTP_PROXY; \
+ export https_proxy=$HTTP_PROXY; \
+ fi
+
+# Install the requirements
+COPY requirements.txt /app
+RUN pip config set global.index-url "$PIP_INDEX" && \
+ pip config set global.extra-index-url "$TORCH_INDEX" && \
+ python -m pip install --upgrade pip && \
+ if [ -n "$HTTP_PROXY" ]; then \
+ python -m pip install --proxy=$HTTP_PROXY -r requirements.txt; \
+ else \
+ python -m pip install -r requirements.txt; \
+ fi
+
+# Copy the rest of the application into the image
+COPY . /app
+
+# Install the LLaMA Factory
+RUN EXTRA_PACKAGES="torch-npu,metrics"; \
+ if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
+ fi; \
+ if [ -n "$HTTP_PROXY" ]; then \
+ pip install --proxy=$HTTP_PROXY -e ".[$EXTRA_PACKAGES]"; \
+ else \
+ pip install -e ".[$EXTRA_PACKAGES]"; \
+ fi
+
+# Unset http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ unset http_proxy; \
+ unset https_proxy; \
+ fi
+
+# Set up volumes
+VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]
+
+# Expose port 7860 for the LLaMA Board
+ENV GRADIO_SERVER_PORT 7860
+EXPOSE 7860
+
+# Expose port 8000 for the API service
+ENV API_PORT 8000
+EXPOSE 8000
diff --git a/post-training/LLaMA-Factory/docker/docker-npu/docker-compose.yml b/post-training/LLaMA-Factory/docker/docker-npu/docker-compose.yml
new file mode 100644
index 0000000..dd9a8e1
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-npu/docker-compose.yml
@@ -0,0 +1,33 @@
+services:
+ llamafactory:
+ build:
+ dockerfile: ./docker/docker-npu/Dockerfile
+ context: ../..
+ args:
+ INSTALL_DEEPSPEED: "false"
+ PIP_INDEX: https://pypi.org/simple
+ container_name: llamafactory
+ volumes:
+ - ../../hf_cache:/root/.cache/huggingface
+ - ../../ms_cache:/root/.cache/modelscope
+ - ../../om_cache:/root/.cache/openmind
+ - ../../data:/app/data
+ - ../../output:/app/output
+ - /usr/local/dcmi:/usr/local/dcmi
+ - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
+ - /usr/local/Ascend/driver:/usr/local/Ascend/driver
+ - /etc/ascend_install.info:/etc/ascend_install.info
+ ports:
+ - "7860:7860"
+ - "8000:8000"
+ ipc: host
+ tty: true
+ shm_size: "16gb"
+ stdin_open: true
+ command: bash
+ devices:
+ - /dev/davinci0
+ - /dev/davinci_manager
+ - /dev/devmm_svm
+ - /dev/hisi_hdc
+ restart: unless-stopped
diff --git a/post-training/LLaMA-Factory/docker/docker-rocm/Dockerfile b/post-training/LLaMA-Factory/docker/docker-rocm/Dockerfile
new file mode 100644
index 0000000..9595baf
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-rocm/Dockerfile
@@ -0,0 +1,105 @@
+FROM hardandheavy/transformers-rocm:2.2.0
+
+# Define environments
+ENV MAX_JOBS=4
+ENV FLASH_ATTENTION_FORCE_BUILD=TRUE
+ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
+
+# Define installation arguments
+ARG INSTALL_BNB=false
+ARG INSTALL_VLLM=false
+ARG INSTALL_DEEPSPEED=false
+ARG INSTALL_FLASHATTN=false
+ARG INSTALL_LIGER_KERNEL=false
+ARG INSTALL_HQQ=false
+ARG INSTALL_PYTORCH=true
+ARG PIP_INDEX=https://pypi.org/simple
+ARG HTTP_PROXY=
+ARG PYTORCH_INDEX=https://download.pytorch.org/whl/nightly/rocm6.3
+
+# Use Bash instead of default /bin/sh
+SHELL ["/bin/bash", "-c"]
+
+# Set the working directory
+WORKDIR /app
+
+# Set http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ echo "Configuring proxy..."; \
+ export http_proxy=$HTTP_PROXY; \
+ export https_proxy=$HTTP_PROXY; \
+ fi
+
+# Install the requirements
+COPY requirements.txt /app
+RUN pip config set global.index-url "$PIP_INDEX" && \
+ pip config set global.extra-index-url "$PIP_INDEX" && \
+ python -m pip install --upgrade pip && \
+ if [ -n "$HTTP_PROXY" ]; then \
+ python -m pip install --proxy=$HTTP_PROXY -r requirements.txt; \
+ else \
+ python -m pip install -r requirements.txt; \
+ fi
+
+# Copy the rest of the application into the image
+COPY . /app
+
+# Install the LLaMA Factory
+RUN EXTRA_PACKAGES="metrics"; \
+ if [ "$INSTALL_BNB" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},bitsandbytes"; \
+ fi; \
+ if [ "$INSTALL_VLLM" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},vllm"; \
+ fi; \
+ if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
+ fi; \
+ if [ "$INSTALL_LIGER_KERNEL" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},liger-kernel"; \
+ fi; \
+ if [ "$INSTALL_HQQ" == "true" ]; then \
+ EXTRA_PACKAGES="${EXTRA_PACKAGES},hqq"; \
+ fi; \
+ if [ -n "$HTTP_PROXY" ]; then \
+ pip install --proxy=$HTTP_PROXY -e ".[$EXTRA_PACKAGES]"; \
+ else \
+ pip install -e ".[$EXTRA_PACKAGES]"; \
+ fi
+
+# Reinstall pytorch
+# This is necessary to ensure that the correct version of PyTorch is installed
+RUN if [ "$INSTALL_PYTORCH" == "true" ]; then \
+ pip uninstall -y torch torchvision torchaudio && \
+ pip install --pre torch torchvision torchaudio --index-url "$PYTORCH_INDEX"; \
+ fi
+
+# Rebuild flash attention
+RUN pip uninstall -y transformer-engine flash-attn && \
+ if [ "$INSTALL_FLASHATTN" == "true" ]; then \
+ pip uninstall -y ninja && \
+ if [ -n "$HTTP_PROXY" ]; then \
+ pip install --proxy=$HTTP_PROXY ninja && \
+ pip install --proxy=$HTTP_PROXY --no-cache-dir flash-attn --no-build-isolation; \
+ else \
+ pip install ninja && \
+ pip install --no-cache-dir flash-attn --no-build-isolation; \
+ fi; \
+ fi
+
+# Unset http proxy
+RUN if [ -n "$HTTP_PROXY" ]; then \
+ unset http_proxy; \
+ unset https_proxy; \
+ fi
+
+# Set up volumes
+VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]
+
+# Expose port 7860 for the LLaMA Board
+ENV GRADIO_SERVER_PORT 7860
+EXPOSE 7860
+
+# Expose port 8000 for the API service
+ENV API_PORT 8000
+EXPOSE 8000
diff --git a/post-training/LLaMA-Factory/docker/docker-rocm/docker-compose.yml b/post-training/LLaMA-Factory/docker/docker-rocm/docker-compose.yml
new file mode 100644
index 0000000..caaf4e1
--- /dev/null
+++ b/post-training/LLaMA-Factory/docker/docker-rocm/docker-compose.yml
@@ -0,0 +1,35 @@
+services:
+ llamafactory:
+ build:
+ dockerfile: ./docker/docker-rocm/Dockerfile
+ context: ../..
+ args:
+ INSTALL_BNB: "false"
+ INSTALL_VLLM: "false"
+ INSTALL_DEEPSPEED: "false"
+ INSTALL_FLASHATTN: "false"
+ INSTALL_LIGER_KERNEL: "false"
+ INSTALL_PYTORCH: "true"
+ INSTALL_HQQ: "false"
+ PIP_INDEX: https://pypi.org/simple
+ PYTORCH_INDEX: https://download.pytorch.org/whl/nightly/rocm6.3
+ container_name: llamafactory
+ volumes:
+ - ../../hf_cache:/root/.cache/huggingface
+ - ../../ms_cache:/root/.cache/modelscope
+ - ../../om_cache:/root/.cache/openmind
+ - ../../data:/app/data
+ - ../../output:/app/output
+ - ../../saves:/app/saves
+ ports:
+ - "7860:7860"
+ - "8000:8000"
+ ipc: host
+ tty: true
+ shm_size: "16gb"
+ stdin_open: true
+ command: bash
+ devices:
+ - /dev/kfd:/dev/kfd
+ - /dev/dri:/dev/dri
+ restart: unless-stopped
diff --git a/post-training/LLaMA-Factory/evaluation/ceval/ceval.py b/post-training/LLaMA-Factory/evaluation/ceval/ceval.py
new file mode 100644
index 0000000..72693eb
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/ceval/ceval.py
@@ -0,0 +1,163 @@
+# Copyright 2025 the LlamaFactory team.
+# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import datasets
+import pandas as pd
+
+
+_CITATION = """\
+@article{huang2023ceval,
+ title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
+ author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and others},
+ journal={arXiv preprint arXiv:2305.08322},
+ year={2023}
+}
+"""
+
+_DESCRIPTION = """\
+C-Eval is a comprehensive Chinese evaluation suite for foundation models.
+It consists of 13948 multi-choice questions spanning 52 diverse disciplines and four difficulty levels.
+"""
+
+_HOMEPAGE = "https://cevalbenchmark.com"
+
+_LICENSE = "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License"
+
+_URL = "ceval.zip"
+
+task_list = [
+ "computer_network",
+ "operating_system",
+ "computer_architecture",
+ "college_programming",
+ "college_physics",
+ "college_chemistry",
+ "advanced_mathematics",
+ "probability_and_statistics",
+ "discrete_mathematics",
+ "electrical_engineer",
+ "metrology_engineer",
+ "high_school_mathematics",
+ "high_school_physics",
+ "high_school_chemistry",
+ "high_school_biology",
+ "middle_school_mathematics",
+ "middle_school_biology",
+ "middle_school_physics",
+ "middle_school_chemistry",
+ "veterinary_medicine",
+ "college_economics",
+ "business_administration",
+ "marxism",
+ "mao_zedong_thought",
+ "education_science",
+ "teacher_qualification",
+ "high_school_politics",
+ "high_school_geography",
+ "middle_school_politics",
+ "middle_school_geography",
+ "modern_chinese_history",
+ "ideological_and_moral_cultivation",
+ "logic",
+ "law",
+ "chinese_language_and_literature",
+ "art_studies",
+ "professional_tour_guide",
+ "legal_professional",
+ "high_school_chinese",
+ "high_school_history",
+ "middle_school_history",
+ "civil_servant",
+ "sports_science",
+ "plant_protection",
+ "basic_medicine",
+ "clinical_medicine",
+ "urban_and_rural_planner",
+ "accountant",
+ "fire_engineer",
+ "environmental_impact_assessment_engineer",
+ "tax_accountant",
+ "physician",
+]
+
+
+class CevalConfig(datasets.BuilderConfig):
+ def __init__(self, **kwargs):
+ super().__init__(version=datasets.Version("1.0.0"), **kwargs)
+
+
+class Ceval(datasets.GeneratorBasedBuilder):
+ BUILDER_CONFIGS = [
+ CevalConfig(
+ name=task_name,
+ )
+ for task_name in task_list
+ ]
+
+ def _info(self):
+ features = datasets.Features(
+ {
+ "id": datasets.Value("int32"),
+ "question": datasets.Value("string"),
+ "A": datasets.Value("string"),
+ "B": datasets.Value("string"),
+ "C": datasets.Value("string"),
+ "D": datasets.Value("string"),
+ "answer": datasets.Value("string"),
+ "explanation": datasets.Value("string"),
+ }
+ )
+ return datasets.DatasetInfo(
+ description=_DESCRIPTION,
+ features=features,
+ homepage=_HOMEPAGE,
+ license=_LICENSE,
+ citation=_CITATION,
+ )
+
+ def _split_generators(self, dl_manager):
+ data_dir = dl_manager.download_and_extract(_URL)
+ task_name = self.config.name
+ return [
+ datasets.SplitGenerator(
+ name=datasets.Split.TEST,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "test", f"{task_name}_test.csv"),
+ },
+ ),
+ datasets.SplitGenerator(
+ name=datasets.Split.VALIDATION,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "val", f"{task_name}_val.csv"),
+ },
+ ),
+ datasets.SplitGenerator(
+ name=datasets.Split.TRAIN,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "dev", f"{task_name}_dev.csv"),
+ },
+ ),
+ ]
+
+ def _generate_examples(self, filepath):
+ df = pd.read_csv(filepath, encoding="utf-8")
+ for i, instance in enumerate(df.to_dict(orient="records")):
+ if "answer" not in instance.keys():
+ instance["answer"] = ""
+ if "explanation" not in instance.keys():
+ instance["explanation"] = ""
+ yield i, instance
diff --git a/post-training/LLaMA-Factory/evaluation/ceval/ceval.zip b/post-training/LLaMA-Factory/evaluation/ceval/ceval.zip
new file mode 100644
index 0000000..d39274a
Binary files /dev/null and b/post-training/LLaMA-Factory/evaluation/ceval/ceval.zip differ
diff --git a/post-training/LLaMA-Factory/evaluation/ceval/mapping.json b/post-training/LLaMA-Factory/evaluation/ceval/mapping.json
new file mode 100644
index 0000000..42584aa
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/ceval/mapping.json
@@ -0,0 +1,210 @@
+{
+ "accountant": {
+ "name": "注册会计师",
+ "category": "Other"
+ },
+ "advanced_mathematics": {
+ "name": "高等数学",
+ "category": "STEM"
+ },
+ "art_studies": {
+ "name": "艺术学",
+ "category": "Humanities"
+ },
+ "basic_medicine": {
+ "name": "基础医学",
+ "category": "Other"
+ },
+ "business_administration": {
+ "name": "工商管理",
+ "category": "Social Sciences"
+ },
+ "chinese_language_and_literature": {
+ "name": "中国语言文学",
+ "category": "Humanities"
+ },
+ "civil_servant": {
+ "name": "公务员",
+ "category": "Other"
+ },
+ "clinical_medicine": {
+ "name": "临床医学",
+ "category": "Other"
+ },
+ "college_chemistry": {
+ "name": "大学化学",
+ "category": "STEM"
+ },
+ "college_economics": {
+ "name": "大学经济学",
+ "category": "Social Sciences"
+ },
+ "college_physics": {
+ "name": "大学物理",
+ "category": "STEM"
+ },
+ "college_programming": {
+ "name": "大学编程",
+ "category": "STEM"
+ },
+ "computer_architecture": {
+ "name": "计算机组成",
+ "category": "STEM"
+ },
+ "computer_network": {
+ "name": "计算机网络",
+ "category": "STEM"
+ },
+ "discrete_mathematics": {
+ "name": "离散数学",
+ "category": "STEM"
+ },
+ "education_science": {
+ "name": "教育学",
+ "category": "Social Sciences"
+ },
+ "electrical_engineer": {
+ "name": "注册电气工程师",
+ "category": "STEM"
+ },
+ "environmental_impact_assessment_engineer": {
+ "name": "环境影响评价工程师",
+ "category": "Other"
+ },
+ "fire_engineer": {
+ "name": "注册消防工程师",
+ "category": "Other"
+ },
+ "high_school_biology": {
+ "name": "高中生物",
+ "category": "STEM"
+ },
+ "high_school_chemistry": {
+ "name": "高中化学",
+ "category": "STEM"
+ },
+ "high_school_chinese": {
+ "name": "高中语文",
+ "category": "Humanities"
+ },
+ "high_school_geography": {
+ "name": "高中地理",
+ "category": "Social Sciences"
+ },
+ "high_school_history": {
+ "name": "高中历史",
+ "category": "Humanities"
+ },
+ "high_school_mathematics": {
+ "name": "高中数学",
+ "category": "STEM"
+ },
+ "high_school_physics": {
+ "name": "高中物理",
+ "category": "STEM"
+ },
+ "high_school_politics": {
+ "name": "高中政治",
+ "category": "Social Sciences"
+ },
+ "ideological_and_moral_cultivation": {
+ "name": "思想道德修养与法律基础",
+ "category": "Humanities"
+ },
+ "law": {
+ "name": "法学",
+ "category": "Humanities"
+ },
+ "legal_professional": {
+ "name": "法律职业资格",
+ "category": "Humanities"
+ },
+ "logic": {
+ "name": "逻辑学",
+ "category": "Humanities"
+ },
+ "mao_zedong_thought": {
+ "name": "毛泽东思想和中国特色社会主义理论体系概论",
+ "category": "Social Sciences"
+ },
+ "marxism": {
+ "name": "马克思主义基本原理",
+ "category": "Social Sciences"
+ },
+ "metrology_engineer": {
+ "name": "注册计量师",
+ "category": "STEM"
+ },
+ "middle_school_biology": {
+ "name": "初中生物",
+ "category": "STEM"
+ },
+ "middle_school_chemistry": {
+ "name": "初中化学",
+ "category": "STEM"
+ },
+ "middle_school_geography": {
+ "name": "初中地理",
+ "category": "Social Sciences"
+ },
+ "middle_school_history": {
+ "name": "初中历史",
+ "category": "Humanities"
+ },
+ "middle_school_mathematics": {
+ "name": "初中数学",
+ "category": "STEM"
+ },
+ "middle_school_physics": {
+ "name": "初中物理",
+ "category": "STEM"
+ },
+ "middle_school_politics": {
+ "name": "初中政治",
+ "category": "Social Sciences"
+ },
+ "modern_chinese_history": {
+ "name": "近代史纲要",
+ "category": "Humanities"
+ },
+ "operating_system": {
+ "name": "操作系统",
+ "category": "STEM"
+ },
+ "physician": {
+ "name": "医师资格",
+ "category": "Other"
+ },
+ "plant_protection": {
+ "name": "植物保护",
+ "category": "Other"
+ },
+ "probability_and_statistics": {
+ "name": "概率统计",
+ "category": "STEM"
+ },
+ "professional_tour_guide": {
+ "name": "导游资格",
+ "category": "Humanities"
+ },
+ "sports_science": {
+ "name": "体育学",
+ "category": "Other"
+ },
+ "tax_accountant": {
+ "name": "税务师",
+ "category": "Other"
+ },
+ "teacher_qualification": {
+ "name": "教师资格",
+ "category": "Social Sciences"
+ },
+ "urban_and_rural_planner": {
+ "name": "注册城乡规划师",
+ "category": "Other"
+ },
+ "veterinary_medicine": {
+ "name": "兽医学",
+ "category": "STEM"
+ }
+}
diff --git a/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.py b/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.py
new file mode 100644
index 0000000..44c52f1
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.py
@@ -0,0 +1,170 @@
+# Copyright 2025 the LlamaFactory team.
+# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import datasets
+import pandas as pd
+
+
+_CITATION = """\
+@article{li2023cmmlu,
+ title={CMMLU: Measuring massive multitask language understanding in Chinese},
+  author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and others},
+ journal={arXiv preprint arXiv:2306.09212},
+ year={2023}
+}
+"""
+
+_DESCRIPTION = """\
+CMMLU is a comprehensive Chinese assessment suite specifically designed to evaluate the advanced knowledge
+and reasoning abilities of LLMs within the Chinese language and cultural context.
+"""
+
+_HOMEPAGE = "https://github.com/haonan-li/CMMLU"
+
+_LICENSE = "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License"
+
+_URL = "cmmlu.zip"
+
+task_list = [
+ "agronomy",
+ "anatomy",
+ "ancient_chinese",
+ "arts",
+ "astronomy",
+ "business_ethics",
+ "chinese_civil_service_exam",
+ "chinese_driving_rule",
+ "chinese_food_culture",
+ "chinese_foreign_policy",
+ "chinese_history",
+ "chinese_literature",
+ "chinese_teacher_qualification",
+ "clinical_knowledge",
+ "college_actuarial_science",
+ "college_education",
+ "college_engineering_hydrology",
+ "college_law",
+ "college_mathematics",
+ "college_medical_statistics",
+ "college_medicine",
+ "computer_science",
+ "computer_security",
+ "conceptual_physics",
+ "construction_project_management",
+ "economics",
+ "education",
+ "electrical_engineering",
+ "elementary_chinese",
+ "elementary_commonsense",
+ "elementary_information_and_technology",
+ "elementary_mathematics",
+ "ethnology",
+ "food_science",
+ "genetics",
+ "global_facts",
+ "high_school_biology",
+ "high_school_chemistry",
+ "high_school_geography",
+ "high_school_mathematics",
+ "high_school_physics",
+ "high_school_politics",
+ "human_sexuality",
+ "international_law",
+ "journalism",
+ "jurisprudence",
+ "legal_and_moral_basis",
+ "logical",
+ "machine_learning",
+ "management",
+ "marketing",
+ "marxist_theory",
+ "modern_chinese",
+ "nutrition",
+ "philosophy",
+ "professional_accounting",
+ "professional_law",
+ "professional_medicine",
+ "professional_psychology",
+ "public_relations",
+ "security_study",
+ "sociology",
+ "sports_science",
+ "traditional_chinese_medicine",
+ "virology",
+ "world_history",
+ "world_religions",
+]
+
+
+class CMMLUConfig(datasets.BuilderConfig):
+ def __init__(self, **kwargs):
+ super().__init__(version=datasets.Version("1.0.1"), **kwargs)
+
+
+class CMMLU(datasets.GeneratorBasedBuilder):
+ BUILDER_CONFIGS = [
+ CMMLUConfig(
+ name=task_name,
+ )
+ for task_name in task_list
+ ]
+
+ def _info(self):
+ features = datasets.Features(
+ {
+ "question": datasets.Value("string"),
+ "A": datasets.Value("string"),
+ "B": datasets.Value("string"),
+ "C": datasets.Value("string"),
+ "D": datasets.Value("string"),
+ "answer": datasets.Value("string"),
+ }
+ )
+ return datasets.DatasetInfo(
+ description=_DESCRIPTION,
+ features=features,
+ homepage=_HOMEPAGE,
+ license=_LICENSE,
+ citation=_CITATION,
+ )
+
+ def _split_generators(self, dl_manager):
+ data_dir = dl_manager.download_and_extract(_URL)
+ task_name = self.config.name
+ return [
+ datasets.SplitGenerator(
+ name=datasets.Split.TEST,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, f"test/{task_name}.csv"),
+ },
+ ),
+ datasets.SplitGenerator(
+ name=datasets.Split.TRAIN,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, f"dev/{task_name}.csv"),
+ },
+ ),
+ ]
+
+ def _generate_examples(self, filepath):
+ df = pd.read_csv(filepath, header=0, index_col=0, encoding="utf-8")
+ for i, instance in enumerate(df.to_dict(orient="records")):
+ question = instance.pop("Question", "")
+ answer = instance.pop("Answer", "")
+ instance["question"] = question
+ instance["answer"] = answer
+ yield i, instance
diff --git a/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.zip b/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.zip
new file mode 100644
index 0000000..c6bede1
Binary files /dev/null and b/post-training/LLaMA-Factory/evaluation/cmmlu/cmmlu.zip differ
diff --git a/post-training/LLaMA-Factory/evaluation/cmmlu/mapping.json b/post-training/LLaMA-Factory/evaluation/cmmlu/mapping.json
new file mode 100644
index 0000000..312d038
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/cmmlu/mapping.json
@@ -0,0 +1,270 @@
+{
+ "agronomy": {
+ "name": "农学",
+ "category": "Other"
+ },
+ "anatomy": {
+ "name": "解剖学",
+ "category": "STEM"
+ },
+ "ancient_chinese": {
+ "name": "古汉语",
+ "category": "Social Sciences"
+ },
+ "arts": {
+ "name": "艺术学",
+ "category": "Humanities"
+ },
+ "astronomy": {
+ "name": "天文学",
+ "category": "STEM"
+ },
+ "business_ethics": {
+ "name": "商业伦理",
+ "category": "Social Sciences"
+ },
+ "chinese_civil_service_exam": {
+ "name": "中国公务员考试",
+ "category": "Social Sciences"
+ },
+ "chinese_driving_rule": {
+ "name": "中国驾驶规则",
+ "category": "Other"
+ },
+ "chinese_food_culture": {
+ "name": "中国饮食文化",
+ "category": "Social Sciences"
+ },
+ "chinese_foreign_policy": {
+ "name": "中国外交政策",
+ "category": "Social Sciences"
+ },
+ "chinese_history": {
+ "name": "中国历史",
+ "category": "Humanities"
+ },
+ "chinese_literature": {
+ "name": "中国文学",
+ "category": "Humanities"
+ },
+ "chinese_teacher_qualification": {
+ "name": "中国教师资格",
+ "category": "Social Sciences"
+ },
+ "college_actuarial_science": {
+ "name": "大学精算学",
+ "category": "STEM"
+ },
+ "college_education": {
+ "name": "大学教育学",
+ "category": "Social Sciences"
+ },
+ "college_engineering_hydrology": {
+ "name": "大学工程水文学",
+ "category": "STEM"
+ },
+ "college_law": {
+ "name": "大学法律",
+ "category": "Humanities"
+ },
+ "college_mathematics": {
+ "name": "大学数学",
+ "category": "STEM"
+ },
+ "college_medical_statistics": {
+ "name": "大学医学统计",
+ "category": "STEM"
+ },
+ "clinical_knowledge": {
+ "name": "临床知识",
+ "category": "Other"
+ },
+ "college_medicine": {
+ "name": "大学医学",
+ "category": "Other"
+ },
+ "computer_science": {
+ "name": "计算机科学",
+ "category": "STEM"
+ },
+ "computer_security": {
+ "name": "计算机安全",
+ "category": "Other"
+ },
+ "conceptual_physics": {
+ "name": "概念物理学",
+ "category": "STEM"
+ },
+ "construction_project_management": {
+ "name": "建设工程管理",
+ "category": "Other"
+ },
+ "economics": {
+ "name": "经济学",
+ "category": "Social Sciences"
+ },
+ "education": {
+ "name": "教育学",
+ "category": "Social Sciences"
+ },
+ "elementary_chinese": {
+ "name": "小学语文",
+ "category": "Social Sciences"
+ },
+ "elementary_commonsense": {
+ "name": "小学常识",
+ "category": "Other"
+ },
+ "elementary_information_and_technology": {
+ "name": "小学信息技术",
+ "category": "Other"
+ },
+ "electrical_engineering": {
+ "name": "电气工程",
+ "category": "STEM"
+ },
+ "elementary_mathematics": {
+ "name": "初等数学",
+ "category": "STEM"
+ },
+ "ethnology": {
+ "name": "民族学",
+ "category": "Social Sciences"
+ },
+ "food_science": {
+ "name": "食品科学",
+ "category": "Other"
+ },
+ "genetics": {
+ "name": "遗传学",
+ "category": "STEM"
+ },
+ "global_facts": {
+ "name": "全球事实",
+ "category": "Humanities"
+ },
+ "high_school_biology": {
+ "name": "高中生物",
+ "category": "STEM"
+ },
+ "high_school_chemistry": {
+ "name": "高中化学",
+ "category": "STEM"
+ },
+ "high_school_geography": {
+ "name": "高中地理",
+ "category": "Social Sciences"
+ },
+ "high_school_mathematics": {
+ "name": "高中数学",
+ "category": "STEM"
+ },
+ "high_school_physics": {
+ "name": "高中物理学",
+ "category": "STEM"
+ },
+ "high_school_politics": {
+ "name": "高中政治",
+ "category": "Social Sciences"
+ },
+ "human_sexuality": {
+ "name": "人类性行为",
+ "category": "Other"
+ },
+ "international_law": {
+ "name": "国际法学",
+ "category": "Humanities"
+ },
+ "journalism": {
+ "name": "新闻学",
+ "category": "Social Sciences"
+ },
+ "jurisprudence": {
+ "name": "法理学",
+ "category": "Humanities"
+ },
+ "legal_and_moral_basis": {
+ "name": "法律与道德基础",
+ "category": "Other"
+ },
+ "logical": {
+ "name": "逻辑学",
+ "category": "Humanities"
+ },
+ "machine_learning": {
+ "name": "机器学习",
+ "category": "STEM"
+ },
+ "management": {
+ "name": "管理学",
+ "category": "Social Sciences"
+ },
+ "marketing": {
+ "name": "市场营销",
+ "category": "Social Sciences"
+ },
+ "marxist_theory": {
+ "name": "马克思主义理论",
+ "category": "Humanities"
+ },
+ "modern_chinese": {
+ "name": "现代汉语",
+ "category": "Social Sciences"
+ },
+ "nutrition": {
+ "name": "营养学",
+ "category": "Other"
+ },
+ "philosophy": {
+ "name": "哲学",
+ "category": "Humanities"
+ },
+ "professional_accounting": {
+ "name": "专业会计",
+ "category": "Social Sciences"
+ },
+ "professional_law": {
+ "name": "专业法学",
+ "category": "Humanities"
+ },
+ "professional_medicine": {
+ "name": "专业医学",
+ "category": "Other"
+ },
+ "professional_psychology": {
+ "name": "专业心理学",
+ "category": "Social Sciences"
+ },
+ "public_relations": {
+ "name": "公共关系",
+ "category": "Social Sciences"
+ },
+ "security_study": {
+ "name": "安全研究",
+ "category": "Social Sciences"
+ },
+ "sociology": {
+ "name": "社会学",
+ "category": "Social Sciences"
+ },
+ "sports_science": {
+ "name": "体育学",
+ "category": "Other"
+ },
+ "traditional_chinese_medicine": {
+ "name": "中医中药",
+ "category": "Other"
+ },
+ "virology": {
+ "name": "病毒学",
+ "category": "STEM"
+ },
+ "world_history": {
+ "name": "世界历史",
+ "category": "Humanities"
+ },
+ "world_religions": {
+ "name": "世界宗教",
+ "category": "Humanities"
+ }
+}
diff --git a/post-training/LLaMA-Factory/evaluation/mmlu/mapping.json b/post-training/LLaMA-Factory/evaluation/mmlu/mapping.json
new file mode 100644
index 0000000..27bd7c2
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/mmlu/mapping.json
@@ -0,0 +1,230 @@
+{
+ "abstract_algebra": {
+ "name": "abstract algebra",
+ "category": "STEM"
+ },
+ "anatomy": {
+ "name": "anatomy",
+ "category": "Other"
+ },
+ "astronomy": {
+ "name": "astronomy",
+ "category": "STEM"
+ },
+ "business_ethics": {
+ "name": "business ethics",
+ "category": "Other"
+ },
+ "clinical_knowledge": {
+ "name": "clinical knowledge",
+ "category": "Other"
+ },
+ "college_biology": {
+ "name": "college biology",
+ "category": "STEM"
+ },
+ "college_chemistry": {
+ "name": "college chemistry",
+ "category": "STEM"
+ },
+ "college_computer_science": {
+ "name": "college computer science",
+ "category": "STEM"
+ },
+ "college_mathematics": {
+ "name": "college mathematics",
+ "category": "STEM"
+ },
+ "college_medicine": {
+ "name": "college medicine",
+ "category": "Other"
+ },
+ "college_physics": {
+ "name": "college physics",
+ "category": "STEM"
+ },
+ "computer_security": {
+ "name": "computer security",
+ "category": "STEM"
+ },
+ "conceptual_physics": {
+ "name": "conceptual physics",
+ "category": "STEM"
+ },
+ "econometrics": {
+ "name": "econometrics",
+ "category": "Social Sciences"
+ },
+ "electrical_engineering": {
+ "name": "electrical engineering",
+ "category": "STEM"
+ },
+ "elementary_mathematics": {
+ "name": "elementary mathematics",
+ "category": "STEM"
+ },
+ "formal_logic": {
+ "name": "formal logic",
+ "category": "Humanities"
+ },
+ "global_facts": {
+ "name": "global facts",
+ "category": "Other"
+ },
+ "high_school_biology": {
+ "name": "high school biology",
+ "category": "STEM"
+ },
+ "high_school_chemistry": {
+ "name": "high school chemistry",
+ "category": "STEM"
+ },
+ "high_school_computer_science": {
+ "name": "high school computer science",
+ "category": "STEM"
+ },
+ "high_school_european_history": {
+ "name": "high school european history",
+ "category": "Humanities"
+ },
+ "high_school_geography": {
+ "name": "high school geography",
+ "category": "Social Sciences"
+ },
+ "high_school_government_and_politics": {
+ "name": "high school government and politics",
+ "category": "Social Sciences"
+ },
+ "high_school_macroeconomics": {
+ "name": "high school macroeconomics",
+ "category": "Social Sciences"
+ },
+ "high_school_mathematics": {
+ "name": "high school mathematics",
+ "category": "STEM"
+ },
+ "high_school_microeconomics": {
+ "name": "high school microeconomics",
+ "category": "Social Sciences"
+ },
+ "high_school_physics": {
+ "name": "high school physics",
+ "category": "STEM"
+ },
+ "high_school_psychology": {
+ "name": "high school psychology",
+ "category": "Social Sciences"
+ },
+ "high_school_statistics": {
+ "name": "high school statistics",
+ "category": "STEM"
+ },
+ "high_school_us_history": {
+ "name": "high school us history",
+ "category": "Humanities"
+ },
+ "high_school_world_history": {
+ "name": "high school world history",
+ "category": "Humanities"
+ },
+ "human_aging": {
+ "name": "human aging",
+ "category": "Other"
+ },
+ "human_sexuality": {
+ "name": "human sexuality",
+ "category": "Social Sciences"
+ },
+ "international_law": {
+ "name": "international law",
+ "category": "Humanities"
+ },
+ "jurisprudence": {
+ "name": "jurisprudence",
+ "category": "Humanities"
+ },
+ "logical_fallacies": {
+ "name": "logical fallacies",
+ "category": "Humanities"
+ },
+ "machine_learning": {
+ "name": "machine learning",
+ "category": "STEM"
+ },
+ "management": {
+ "name": "management",
+ "category": "Other"
+ },
+ "marketing": {
+ "name": "marketing",
+ "category": "Other"
+ },
+ "medical_genetics": {
+ "name": "medical genetics",
+ "category": "Other"
+ },
+ "miscellaneous": {
+ "name": "miscellaneous",
+ "category": "Other"
+ },
+ "moral_disputes": {
+ "name": "moral disputes",
+ "category": "Humanities"
+ },
+ "moral_scenarios": {
+ "name": "moral scenarios",
+ "category": "Humanities"
+ },
+ "nutrition": {
+ "name": "nutrition",
+ "category": "Other"
+ },
+ "philosophy": {
+ "name": "philosophy",
+ "category": "Humanities"
+ },
+ "prehistory": {
+ "name": "prehistory",
+ "category": "Humanities"
+ },
+ "professional_accounting": {
+ "name": "professional accounting",
+ "category": "Other"
+ },
+ "professional_law": {
+ "name": "professional law",
+ "category": "Humanities"
+ },
+ "professional_medicine": {
+ "name": "professional medicine",
+ "category": "Other"
+ },
+ "professional_psychology": {
+ "name": "professional psychology",
+ "category": "Social Sciences"
+ },
+ "public_relations": {
+ "name": "public relations",
+ "category": "Social Sciences"
+ },
+ "security_studies": {
+ "name": "security studies",
+ "category": "Social Sciences"
+ },
+ "sociology": {
+ "name": "sociology",
+ "category": "Social Sciences"
+ },
+ "us_foreign_policy": {
+ "name": "us foreign policy",
+ "category": "Social Sciences"
+ },
+ "virology": {
+ "name": "virology",
+ "category": "Other"
+ },
+ "world_religions": {
+ "name": "world religions",
+ "category": "Humanities"
+ }
+}
diff --git a/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.py b/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.py
new file mode 100644
index 0000000..6312742
--- /dev/null
+++ b/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.py
@@ -0,0 +1,163 @@
+# Copyright 2025 the LlamaFactory team.
+# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import datasets
+import pandas as pd
+
+
+_CITATION = """\
+@article{hendryckstest2021,
+ title={Measuring Massive Multitask Language Understanding},
+ author={Dan Hendrycks and Collin Burns and others},
+ journal={Proceedings of the International Conference on Learning Representations (ICLR)},
+ year={2021}
+}
+"""
+
+_DESCRIPTION = """\
+Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart,
+Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021).
+"""
+
+_HOMEPAGE = "https://github.com/hendrycks/test"
+
+_LICENSE = "MIT"
+
+_URL = "mmlu.zip"
+
+task_list = [
+ "high_school_european_history",
+ "business_ethics",
+ "clinical_knowledge",
+ "medical_genetics",
+ "high_school_us_history",
+ "high_school_physics",
+ "high_school_world_history",
+ "virology",
+ "high_school_microeconomics",
+ "econometrics",
+ "college_computer_science",
+ "high_school_biology",
+ "abstract_algebra",
+ "professional_accounting",
+ "philosophy",
+ "professional_medicine",
+ "nutrition",
+ "global_facts",
+ "machine_learning",
+ "security_studies",
+ "public_relations",
+ "professional_psychology",
+ "prehistory",
+ "anatomy",
+ "human_sexuality",
+ "college_medicine",
+ "high_school_government_and_politics",
+ "college_chemistry",
+ "logical_fallacies",
+ "high_school_geography",
+ "elementary_mathematics",
+ "human_aging",
+ "college_mathematics",
+ "high_school_psychology",
+ "formal_logic",
+ "high_school_statistics",
+ "international_law",
+ "high_school_mathematics",
+ "high_school_computer_science",
+ "conceptual_physics",
+ "miscellaneous",
+ "high_school_chemistry",
+ "marketing",
+ "professional_law",
+ "management",
+ "college_physics",
+ "jurisprudence",
+ "world_religions",
+ "sociology",
+ "us_foreign_policy",
+ "high_school_macroeconomics",
+ "computer_security",
+ "moral_scenarios",
+ "moral_disputes",
+ "electrical_engineering",
+ "astronomy",
+ "college_biology",
+]
+
+
+class MMLUConfig(datasets.BuilderConfig):
+ def __init__(self, **kwargs):
+ super().__init__(version=datasets.Version("1.0.0"), **kwargs)
+
+
+class MMLU(datasets.GeneratorBasedBuilder):
+ BUILDER_CONFIGS = [
+ MMLUConfig(
+ name=task_name,
+ )
+ for task_name in task_list
+ ]
+
+ def _info(self):
+ features = datasets.Features(
+ {
+ "question": datasets.Value("string"),
+ "A": datasets.Value("string"),
+ "B": datasets.Value("string"),
+ "C": datasets.Value("string"),
+ "D": datasets.Value("string"),
+ "answer": datasets.Value("string"),
+ }
+ )
+ return datasets.DatasetInfo(
+ description=_DESCRIPTION,
+ features=features,
+ homepage=_HOMEPAGE,
+ license=_LICENSE,
+ citation=_CITATION,
+ )
+
+ def _split_generators(self, dl_manager):
+ data_dir = dl_manager.download_and_extract(_URL)
+ task_name = self.config.name
+ return [
+ datasets.SplitGenerator(
+ name=datasets.Split.TEST,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "data", "test", f"{task_name}_test.csv"),
+ },
+ ),
+ datasets.SplitGenerator(
+ name=datasets.Split.VALIDATION,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "data", "val", f"{task_name}_val.csv"),
+ },
+ ),
+ datasets.SplitGenerator(
+ name=datasets.Split.TRAIN,
+ gen_kwargs={
+ "filepath": os.path.join(data_dir, "data", "dev", f"{task_name}_dev.csv"),
+ },
+ ),
+ ]
+
+ def _generate_examples(self, filepath):
+ df = pd.read_csv(filepath, header=None)
+ df.columns = ["question", "A", "B", "C", "D", "answer"]
+
+ yield from enumerate(df.to_dict(orient="records"))
diff --git a/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.zip b/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.zip
new file mode 100644
index 0000000..1aaee65
Binary files /dev/null and b/post-training/LLaMA-Factory/evaluation/mmlu/mmlu.zip differ
diff --git a/post-training/LLaMA-Factory/examples/README.md b/post-training/LLaMA-Factory/examples/README.md
new file mode 100644
index 0000000..457ec87
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/README.md
@@ -0,0 +1,266 @@
+We provide diverse examples of fine-tuning LLMs.
+
+Make sure to execute these commands in the `LLaMA-Factory` directory.
+
+## Table of Contents
+
+- [LoRA Fine-Tuning](#lora-fine-tuning)
+- [QLoRA Fine-Tuning](#qlora-fine-tuning)
+- [Full-Parameter Fine-Tuning](#full-parameter-fine-tuning)
+- [Merging LoRA Adapters and Quantization](#merging-lora-adapters-and-quantization)
+- [Inferring LoRA Fine-Tuned Models](#inferring-lora-fine-tuned-models)
+- [Extras](#extras)
+
+Use `CUDA_VISIBLE_DEVICES` (GPU) or `ASCEND_RT_VISIBLE_DEVICES` (NPU) to choose computing devices.
+
+By default, LLaMA-Factory uses all visible computing devices.
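+
+For example, a run pinned to the first two GPUs (using one of the LoRA example configs below) might look like:
+
+```bash
+CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```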
+
+## Examples
+
+### LoRA Fine-Tuning
+
+#### (Continuous) Pre-Training
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+```
+
+#### Supervised Fine-Tuning
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```
+
+#### Multimodal Supervised Fine-Tuning
+
+```bash
+llamafactory-cli train examples/train_lora/llava1_5_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml
+```
+
+#### DPO/ORPO/SimPO Training
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+```
+
+#### Multimodal DPO/ORPO/SimPO Training
+
+```bash
+llamafactory-cli train examples/train_lora/qwen2vl_lora_dpo.yaml
+```
+
+#### Reward Modeling
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+```
+
+#### PPO Training
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
+```
+
+#### KTO Training
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+```
+
+#### Preprocess Dataset
+
+This is useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+```
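+
+As a rough sketch, the relevant keys in such a config could look like this (the `tokenized_path` value is illustrative; on the first run the tokenized data is assumed to be saved there and reused on later runs):
+
+```yaml
+dataset: identity,alpaca_en_demo
+template: llama3
+tokenized_path: saves/llama3-8b/tokenized/alpaca_en_demo
+```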
+
+#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
+
+```bash
+llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
+```
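+
+The eval config points the CLI at one of the benchmarks shipped under `evaluation/` (MMLU, CMMLU, C-Eval). A hypothetical sketch — key names are assumptions modeled on the other examples here, not copied from the shipped file:
+
+```yaml
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+trust_remote_code: true
+finetuning_type: lora
+task: mmlu_test  # assumed; e.g. ceval_validation or cmmlu_test for the Chinese benchmarks
+template: fewshot
+lang: en
+n_shot: 5
+batch_size: 4
+```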
+
+#### Supervised Fine-Tuning on Multiple Nodes
+
+```bash
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```
+
+#### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+```
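+
+The `_ds3` config is assumed to enable ZeRO-3 simply by referencing one of the DeepSpeed JSON files added below, e.g.:
+
+```yaml
+deepspeed: examples/deepspeed/ds_z3_config.json
+```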
+
+#### Supervised Fine-Tuning with Ray on 4 GPUs
+
+```bash
+USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+```
+
+### QLoRA Fine-Tuning
+
+#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+```
+
+#### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+```
+
+#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_gptq.yaml
+```
+
+#### Supervised Fine-Tuning with 4-bit AWQ Quantization
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_awq.yaml
+```
+
+#### Supervised Fine-Tuning with 2-bit AQLM Quantization
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
+```
+
+### Full-Parameter Fine-Tuning
+
+#### Supervised Fine-Tuning on Single Node
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+```
+
+#### Supervised Fine-Tuning on Multiple Nodes
+
+```bash
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+```
+
+#### Multimodal Supervised Fine-Tuning
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2vl_full_sft.yaml
+```
+
+### Merging LoRA Adapters and Quantization
+
+#### Merge LoRA Adapters
+
+Note: DO NOT use a quantized model or `quantization_bit` when merging LoRA adapters.
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+#### Quantizing Model using AutoGPTQ
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+```
+
+#### Save Ollama modelfile
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+```
+
+### Inferring LoRA Fine-Tuned Models
+
+#### Batch Generation using vLLM Tensor Parallel
+
+```bash
+python scripts/vllm_infer.py --model_name_or_path path_to_merged_model --dataset alpaca_en_demo
+```
+
+#### Use CLI ChatBox
+
+```bash
+llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+```
+
+#### Use Web UI ChatBox
+
+```bash
+llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+```
+
+#### Launch OpenAI-style API
+
+```bash
+llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+```
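+
+The server exposes an OpenAI-compatible chat-completions endpoint, so a quick smoke test might look like this (port and model name are assumptions; adjust to your deployment):
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
+```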
+
+### Extras
+
+#### Full-Parameter Fine-Tuning using GaLore
+
+```bash
+llamafactory-cli train examples/extras/galore/llama3_full_sft.yaml
+```
+
+#### Full-Parameter Fine-Tuning using APOLLO
+
+```bash
+llamafactory-cli train examples/extras/apollo/llama3_full_sft.yaml
+```
+
+#### Full-Parameter Fine-Tuning using BAdam
+
+```bash
+llamafactory-cli train examples/extras/badam/llama3_full_sft.yaml
+```
+
+#### Full-Parameter Fine-Tuning using Adam-mini
+
+```bash
+llamafactory-cli train examples/extras/adam_mini/qwen2_full_sft.yaml
+```
+
+#### LoRA+ Fine-Tuning
+
+```bash
+llamafactory-cli train examples/extras/loraplus/llama3_lora_sft.yaml
+```
+
+#### PiSSA Fine-Tuning
+
+```bash
+llamafactory-cli train examples/extras/pissa/llama3_lora_sft.yaml
+```
+
+#### Mixture-of-Depths Fine-Tuning
+
+```bash
+llamafactory-cli train examples/extras/mod/llama3_full_sft.yaml
+```
+
+#### LLaMA-Pro Fine-Tuning
+
+```bash
+bash examples/extras/llama_pro/expand.sh
+llamafactory-cli train examples/extras/llama_pro/llama3_freeze_sft.yaml
+```
+
+#### FSDP+QLoRA Fine-Tuning
+
+```bash
+bash examples/extras/fsdp_qlora/train.sh
+```
+
+#### Computing BLEU and ROUGE Scores
+
+```bash
+llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml
+```
diff --git a/post-training/LLaMA-Factory/examples/README_zh.md b/post-training/LLaMA-Factory/examples/README_zh.md
new file mode 100644
index 0000000..4899e27
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/README_zh.md
@@ -0,0 +1,266 @@
+我们提供了多样化的大模型微调示例脚本。
+
+请确保在 `LLaMA-Factory` 目录下执行下述命令。
+
+## 目录
+
+- [LoRA 微调](#lora-微调)
+- [QLoRA 微调](#qlora-微调)
+- [全参数微调](#全参数微调)
+- [合并 LoRA 适配器与模型量化](#合并-lora-适配器与模型量化)
+- [推理 LoRA 模型](#推理-lora-模型)
+- [杂项](#杂项)
+
+使用 `CUDA_VISIBLE_DEVICES`(GPU)或 `ASCEND_RT_VISIBLE_DEVICES`(NPU)选择计算设备。
+
+LLaMA-Factory 默认使用所有可见的计算设备。
+
+## 示例
+
+### LoRA 微调
+
+#### (增量)预训练
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+```
+
+#### 指令监督微调
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```
+
+#### 多模态指令监督微调
+
+```bash
+llamafactory-cli train examples/train_lora/llava1_5_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml
+```
+
+#### DPO/ORPO/SimPO 训练
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+```
+
+#### 多模态 DPO/ORPO/SimPO 训练
+
+```bash
+llamafactory-cli train examples/train_lora/qwen2vl_lora_dpo.yaml
+```
+
+#### 奖励模型训练
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+```
+
+#### PPO 训练
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
+```
+
+#### KTO 训练
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+```
+
+#### 预处理数据集
+
+对于大数据集有帮助,在配置中使用 `tokenized_path` 以加载预处理后的数据集。
+
+```bash
+llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+```
+
+#### 在 MMLU/CMMLU/C-Eval 上评估
+
+```bash
+llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
+```
+
+#### 多机指令监督微调
+
+```bash
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+```
+
+#### 使用 DeepSpeed ZeRO-3 平均分配显存
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+```
+
+#### 使用 Ray 在 4 张 GPU 上微调
+
+```bash
+USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+```
+
+### QLoRA 微调
+
+#### 基于 4/8 比特 Bitsandbytes/HQQ/EETQ 量化进行指令监督微调(推荐)
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+```
+
+#### 在 NPU 上基于 4 比特 Bitsandbytes 量化进行指令监督微调
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+```
+
+#### 基于 4/8 比特 GPTQ 量化进行指令监督微调
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_gptq.yaml
+```
+
+#### 基于 4 比特 AWQ 量化进行指令监督微调
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_awq.yaml
+```
+
+#### 基于 2 比特 AQLM 量化进行指令监督微调
+
+```bash
+llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
+```
+
+### 全参数微调
+
+#### 在单机上进行指令监督微调
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+```
+
+#### 在多机上进行指令监督微调
+
+```bash
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+```
+
+#### 多模态指令监督微调
+
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2vl_full_sft.yaml
+```
+
+### 合并 LoRA 适配器与模型量化
+
+#### 合并 LoRA 适配器
+
+注:请勿使用量化后的模型或 `quantization_bit` 参数来合并 LoRA 适配器。
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+#### 使用 AutoGPTQ 量化模型
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+```
+
+#### 保存 Ollama 配置文件
+
+```bash
+llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+```
+
+### 推理 LoRA 模型
+
+#### 使用 vLLM+TP 批量推理
+
+```bash
+python scripts/vllm_infer.py --model_name_or_path path_to_merged_model --dataset alpaca_en_demo
+```
+
+#### 使用命令行对话框
+
+```bash
+llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+```
+
+#### 使用浏览器对话框
+
+```bash
+llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+```
+
+#### 启动 OpenAI 风格 API
+
+```bash
+llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+```
+
+### 杂项
+
+#### 使用 GaLore 进行全参数训练
+
+```bash
+llamafactory-cli train examples/extras/galore/llama3_full_sft.yaml
+```
+
+#### 使用 APOLLO 进行全参数训练
+
+```bash
+llamafactory-cli train examples/extras/apollo/llama3_full_sft.yaml
+```
+
+#### 使用 BAdam 进行全参数训练
+
+```bash
+llamafactory-cli train examples/extras/badam/llama3_full_sft.yaml
+```
+
+#### 使用 Adam-mini 进行全参数训练
+
+```bash
+llamafactory-cli train examples/extras/adam_mini/qwen2_full_sft.yaml
+```
+
+#### LoRA+ 微调
+
+```bash
+llamafactory-cli train examples/extras/loraplus/llama3_lora_sft.yaml
+```
+
+#### PiSSA 微调
+
+```bash
+llamafactory-cli train examples/extras/pissa/llama3_lora_sft.yaml
+```
+
+#### 深度混合微调
+
+```bash
+llamafactory-cli train examples/extras/mod/llama3_full_sft.yaml
+```
+
+#### LLaMA-Pro 微调
+
+```bash
+bash examples/extras/llama_pro/expand.sh
+llamafactory-cli train examples/extras/llama_pro/llama3_freeze_sft.yaml
+```
+
+#### FSDP+QLoRA 微调
+
+```bash
+bash examples/extras/fsdp_qlora/train.sh
+```
+
+#### 计算 BLEU 和 ROUGE 分数
+
+```bash
+llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml
+```
diff --git a/post-training/LLaMA-Factory/examples/accelerate/fsdp_config.yaml b/post-training/LLaMA-Factory/examples/accelerate/fsdp_config.yaml
new file mode 100644
index 0000000..09d2f5d
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/accelerate/fsdp_config.yaml
@@ -0,0 +1,25 @@
+compute_environment: LOCAL_MACHINE
+debug: false
+distributed_type: FSDP
+downcast_bf16: 'no'
+fsdp_config:
+ fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ fsdp_backward_prefetch: BACKWARD_PRE
+ fsdp_forward_prefetch: false
+ fsdp_cpu_ram_efficient_loading: true
+ fsdp_offload_params: false
+ fsdp_sharding_strategy: FULL_SHARD
+ fsdp_state_dict_type: FULL_STATE_DICT
+ fsdp_sync_module_states: true
+ fsdp_use_orig_params: true
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16 # or fp16
+num_machines: 1 # the number of nodes
+num_processes: 2 # the number of GPUs in all nodes
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false
diff --git a/post-training/LLaMA-Factory/examples/accelerate/fsdp_config_offload.yaml b/post-training/LLaMA-Factory/examples/accelerate/fsdp_config_offload.yaml
new file mode 100644
index 0000000..a55e652
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/accelerate/fsdp_config_offload.yaml
@@ -0,0 +1,25 @@
+compute_environment: LOCAL_MACHINE
+debug: false
+distributed_type: FSDP
+downcast_bf16: 'no'
+fsdp_config:
+ fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ fsdp_backward_prefetch: BACKWARD_PRE
+ fsdp_forward_prefetch: false
+ fsdp_cpu_ram_efficient_loading: true
+ fsdp_offload_params: true # offload may affect training speed
+ fsdp_sharding_strategy: FULL_SHARD
+ fsdp_state_dict_type: FULL_STATE_DICT
+ fsdp_sync_module_states: true
+ fsdp_use_orig_params: true
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16 # or fp16
+num_machines: 1 # the number of nodes
+num_processes: 2 # the number of GPUs in all nodes
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false
diff --git a/post-training/LLaMA-Factory/examples/deepspeed/ds_z0_config.json b/post-training/LLaMA-Factory/examples/deepspeed/ds_z0_config.json
new file mode 100644
index 0000000..8ac9918
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/deepspeed/ds_z0_config.json
@@ -0,0 +1,28 @@
+{
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": true,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1
+ },
+ "bf16": {
+ "enabled": "auto"
+ },
+ "zero_optimization": {
+ "stage": 0,
+ "allgather_partitions": true,
+ "allgather_bucket_size": 5e8,
+ "overlap_comm": false,
+ "reduce_scatter": true,
+ "reduce_bucket_size": 5e8,
+ "contiguous_gradients": true,
+ "round_robin_gradients": true
+ }
+}
diff --git a/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_config.json b/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_config.json
new file mode 100644
index 0000000..c4177e5
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_config.json
@@ -0,0 +1,28 @@
+{
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": true,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1
+ },
+ "bf16": {
+ "enabled": "auto"
+ },
+ "zero_optimization": {
+ "stage": 2,
+ "allgather_partitions": true,
+ "allgather_bucket_size": 5e8,
+ "overlap_comm": false,
+ "reduce_scatter": true,
+ "reduce_bucket_size": 5e8,
+ "contiguous_gradients": true,
+ "round_robin_gradients": true
+ }
+}
diff --git a/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_offload_config.json b/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_offload_config.json
new file mode 100644
index 0000000..7550472
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/deepspeed/ds_z2_offload_config.json
@@ -0,0 +1,32 @@
+{
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": true,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1
+ },
+ "bf16": {
+ "enabled": "auto"
+ },
+ "zero_optimization": {
+ "stage": 2,
+ "offload_optimizer": {
+ "device": "cpu",
+ "pin_memory": true
+ },
+ "allgather_partitions": true,
+ "allgather_bucket_size": 5e8,
+ "overlap_comm": false,
+ "reduce_scatter": true,
+ "reduce_bucket_size": 5e8,
+ "contiguous_gradients": true,
+ "round_robin_gradients": true
+ }
+}
diff --git a/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_config.json b/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_config.json
new file mode 100644
index 0000000..46584a7
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_config.json
@@ -0,0 +1,30 @@
+{
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": true,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1
+ },
+ "bf16": {
+ "enabled": "auto"
+ },
+ "zero_optimization": {
+ "stage": 3,
+ "overlap_comm": false,
+ "contiguous_gradients": true,
+ "sub_group_size": 1e9,
+ "reduce_bucket_size": "auto",
+ "stage3_prefetch_bucket_size": "auto",
+ "stage3_param_persistence_threshold": "auto",
+ "stage3_max_live_parameters": 1e9,
+ "stage3_max_reuse_distance": 1e9,
+ "stage3_gather_16bit_weights_on_model_save": true
+ }
+}
diff --git a/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json b/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json
new file mode 100644
index 0000000..0fabebb
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/deepspeed/ds_z3_offload_config.json
@@ -0,0 +1,38 @@
+{
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": true,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1
+ },
+ "bf16": {
+ "enabled": "auto"
+ },
+ "zero_optimization": {
+ "stage": 3,
+ "offload_optimizer": {
+ "device": "cpu",
+ "pin_memory": true
+ },
+ "offload_param": {
+ "device": "cpu",
+ "pin_memory": true
+ },
+ "overlap_comm": false,
+ "contiguous_gradients": true,
+ "sub_group_size": 1e9,
+ "reduce_bucket_size": "auto",
+ "stage3_prefetch_bucket_size": "auto",
+ "stage3_param_persistence_threshold": "auto",
+ "stage3_max_live_parameters": 1e9,
+ "stage3_max_reuse_distance": 1e9,
+ "stage3_gather_16bit_weights_on_model_save": true
+ }
+}
diff --git a/post-training/LLaMA-Factory/examples/extras/adam_mini/qwen2_full_sft.yaml b/post-training/LLaMA-Factory/examples/extras/adam_mini/qwen2_full_sft.yaml
new file mode 100644
index 0000000..79df9a7
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/adam_mini/qwen2_full_sft.yaml
@@ -0,0 +1,43 @@
+### model
+model_name_or_path: Qwen/Qwen2-1.5B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_adam_mini: true
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: qwen
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/qwen2-1_5b/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/apollo/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/extras/apollo/llama3_full_sft.yaml
new file mode 100644
index 0000000..d9fb6c2
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/apollo/llama3_full_sft.yaml
@@ -0,0 +1,48 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_apollo: true
+apollo_layerwise: true # choices: [true, false], use false for DDP training
+apollo_target: all
+apollo_rank: 128
+apollo_scale: 32.0
+apollo_scale_type: channel
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 1 # use 1 for layerwise apollo
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+pure_bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/badam/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/extras/badam/llama3_full_sft.yaml
new file mode 100644
index 0000000..7ce3323
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/badam/llama3_full_sft.yaml
@@ -0,0 +1,46 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_badam: true
+badam_mode: layer
+badam_switch_mode: ascending
+badam_switch_interval: 50
+badam_verbose: 2
+# deepspeed: examples/deepspeed/ds_z3_config.json
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/llama3_lora_sft.yaml
new file mode 100644
index 0000000..1a8d974
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/llama3_lora_sft.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+quantization_bit: 4
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/train.sh b/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/train.sh
new file mode 100644
index 0000000..fac8cde
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/fsdp_qlora/train.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+# DO NOT use GPTQ/AWQ model in FSDP+QLoRA
+
+CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
+ --config_file examples/accelerate/fsdp_config.yaml \
+ src/train.py examples/extras/fsdp_qlora/llama3_lora_sft.yaml
diff --git a/post-training/LLaMA-Factory/examples/extras/galore/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/extras/galore/llama3_full_sft.yaml
new file mode 100644
index 0000000..9973093
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/galore/llama3_full_sft.yaml
@@ -0,0 +1,47 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+use_galore: true
+galore_layerwise: true # choices: [true, false], use false for DDP training
+galore_target: all
+galore_rank: 128
+galore_scale: 2.0
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 1 # use 1 for layerwise galore
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+pure_bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/llama_pro/expand.sh b/post-training/LLaMA-Factory/examples/extras/llama_pro/expand.sh
new file mode 100644
index 0000000..9f3c013
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/llama_pro/expand.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+python scripts/llama_pro.py \
+ --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
+ --output_dir models/llama3-8b-pro \
+ --num_expand 8
diff --git a/post-training/LLaMA-Factory/examples/extras/llama_pro/llama3_freeze_sft.yaml b/post-training/LLaMA-Factory/examples/extras/llama_pro/llama3_freeze_sft.yaml
new file mode 100644
index 0000000..6c5efb8
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/llama_pro/llama3_freeze_sft.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: models/llama3-8b-pro
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: freeze
+freeze_trainable_layers: 8
+freeze_trainable_modules: all
+use_llama_pro: true
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b-pro/freeze/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/loraplus/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/extras/loraplus/llama3_lora_sft.yaml
new file mode 100644
index 0000000..574b487
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/loraplus/llama3_lora_sft.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+loraplus_lr_ratio: 16.0
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/mod/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/extras/mod/llama3_full_sft.yaml
new file mode 100644
index 0000000..ed784e7
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/mod/llama3_full_sft.yaml
@@ -0,0 +1,44 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+mixture_of_depths: convert
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b-mod/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+optim: paged_adamw_8bit
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+pure_bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/extras/nlg_eval/llama3_lora_predict.yaml b/post-training/LLaMA-Factory/examples/extras/nlg_eval/llama3_lora_predict.yaml
new file mode 100644
index 0000000..be51c2e
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/nlg_eval/llama3_lora_predict.yaml
@@ -0,0 +1,31 @@
+# The batch generation can be SLOW using this config.
+# For faster inference, we recommend to use `scripts/vllm_infer.py`.
+
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+trust_remote_code: true
+
+### method
+stage: sft
+do_predict: true
+finetuning_type: lora
+
+### dataset
+eval_dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 50
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/predict
+overwrite_output_dir: true
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### eval
+per_device_eval_batch_size: 1
+predict_with_generate: true
+ddp_timeout: 180000000
diff --git a/post-training/LLaMA-Factory/examples/extras/pissa/init.sh b/post-training/LLaMA-Factory/examples/extras/pissa/init.sh
new file mode 100644
index 0000000..11e1e35
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/pissa/init.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+python scripts/pissa_init.py \
+ --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
+ --output_dir models/llama3-8b-pissa
diff --git a/post-training/LLaMA-Factory/examples/extras/pissa/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/extras/pissa/llama3_lora_sft.yaml
new file mode 100644
index 0000000..1668343
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/extras/pissa/llama3_lora_sft.yaml
@@ -0,0 +1,47 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+pissa_init: true
+pissa_iter: 16
+pissa_convert: true
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/inference/llama3.yaml b/post-training/LLaMA-Factory/examples/inference/llama3.yaml
new file mode 100644
index 0000000..2851e9a
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llama3.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+infer_backend: huggingface # choices: [huggingface, vllm]
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/inference/llama3_full_sft.yaml
new file mode 100644
index 0000000..d4555ca
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llama3_full_sft.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: saves/llama3-8b/full/sft
+template: llama3
+infer_backend: huggingface # choices: [huggingface, vllm]
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml
new file mode 100644
index 0000000..7796c52
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml
@@ -0,0 +1,5 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+infer_backend: huggingface # choices: [huggingface, vllm]
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/llama3_sglang.yaml b/post-training/LLaMA-Factory/examples/inference/llama3_sglang.yaml
new file mode 100644
index 0000000..8241898
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llama3_sglang.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+infer_backend: sglang
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/llama3_vllm.yaml b/post-training/LLaMA-Factory/examples/inference/llama3_vllm.yaml
new file mode 100644
index 0000000..4379956
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llama3_vllm.yaml
@@ -0,0 +1,5 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+infer_backend: vllm
+vllm_enforce_eager: true
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/llava1_5.yaml b/post-training/LLaMA-Factory/examples/inference/llava1_5.yaml
new file mode 100644
index 0000000..2e934dd
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/llava1_5.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: llava-hf/llava-1.5-7b-hf
+template: llava
+infer_backend: huggingface # choices: [huggingface, vllm]
+trust_remote_code: true
diff --git a/post-training/LLaMA-Factory/examples/inference/qwen2_vl.yaml b/post-training/LLaMA-Factory/examples/inference/qwen2_vl.yaml
new file mode 100644
index 0000000..b5eabc6
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/inference/qwen2_vl.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
+template: qwen2_vl
+infer_backend: huggingface # choices: [huggingface, vllm]
+trust_remote_code: true
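The inference YAMLs above are consumed by the chat entry point rather than the trainer; a hedged sketch (the chat subcommand is an assumption based on the upstream LLaMA-Factory CLI, which this diff does not show):

    # interactive chat against the base Llama-3 config; swap the YAML for the vLLM/SGLang or VLM variants
    llamafactory-cli chat examples/inference/llama3.yaml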
diff --git a/post-training/LLaMA-Factory/examples/merge_lora/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/merge_lora/llama3_full_sft.yaml
new file mode 100644
index 0000000..4e329fa
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/merge_lora/llama3_full_sft.yaml
@@ -0,0 +1,10 @@
+### model
+model_name_or_path: saves/llama3-8b/full/sft
+template: llama3
+trust_remote_code: true
+
+### export
+export_dir: output/llama3_full_sft
+export_size: 5
+export_device: cpu
+export_legacy_format: false
diff --git a/post-training/LLaMA-Factory/examples/merge_lora/llama3_gptq.yaml b/post-training/LLaMA-Factory/examples/merge_lora/llama3_gptq.yaml
new file mode 100644
index 0000000..3a2d909
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/merge_lora/llama3_gptq.yaml
@@ -0,0 +1,12 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+trust_remote_code: true
+
+### export
+export_dir: output/llama3_gptq
+export_quantization_bit: 4
+export_quantization_dataset: data/c4_demo.json
+export_size: 5
+export_device: cpu
+export_legacy_format: false
diff --git a/post-training/LLaMA-Factory/examples/merge_lora/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/merge_lora/llama3_lora_sft.yaml
new file mode 100644
index 0000000..97bb457
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/merge_lora/llama3_lora_sft.yaml
@@ -0,0 +1,13 @@
+### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
+
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+trust_remote_code: true
+
+### export
+export_dir: output/llama3_lora_sft
+export_size: 5
+export_device: cpu
+export_legacy_format: false
diff --git a/post-training/LLaMA-Factory/examples/merge_lora/qwen2vl_lora_sft.yaml b/post-training/LLaMA-Factory/examples/merge_lora/qwen2vl_lora_sft.yaml
new file mode 100644
index 0000000..103dbcd
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/merge_lora/qwen2vl_lora_sft.yaml
@@ -0,0 +1,13 @@
+### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
+
+### model
+model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
+adapter_name_or_path: saves/qwen2_vl-7b/lora/sft
+template: qwen2_vl
+trust_remote_code: true
+
+### export
+export_dir: output/qwen2_vl_lora_sft
+export_size: 5
+export_device: cpu
+export_legacy_format: false
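The merge_lora configs above are meant for the exporter, not the trainer; a sketch assuming the export subcommand of the same CLI:

    # merge the SFT LoRA adapter into the base weights and write output/llama3_lora_sft
    llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml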
diff --git a/post-training/LLaMA-Factory/examples/train_full/llama3_full_sft.yaml b/post-training/LLaMA-Factory/examples/train_full/llama3_full_sft.yaml
new file mode 100644
index 0000000..fb7066a
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_full/llama3_full_sft.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 2
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
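Full-parameter SFT with ZeRO-3 needs a distributed launcher; a sketch following the FORCE_TORCHRUN pattern used by run_exp.sh further down (the GPU list is illustrative, not prescribed by the config):

    # 8-GPU full SFT with DeepSpeed ZeRO-3; adjust CUDA_VISIBLE_DEVICES to the machine at hand
    FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train examples/train_full/llama3_full_sft.yaml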
diff --git a/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_3b.yaml b/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_3b.yaml
new file mode 100644
index 0000000..f9773f3
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_3b.yaml
@@ -0,0 +1,49 @@
+### model
+model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
+image_max_pixels: 1843200 # 1280*720*2
+video_max_pixels: 16384
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+# freeze_trainable_layers: 1
+# freeze_trainable_modules: all
+freeze_vision_tower: true # choices: [true, false]
+freeze_multi_modal_projector: false # choices: [true, false]
+freeze_language_model: false
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+# deepspeed: examples/deepspeed/ds_z3_offload_config.json
+
+### dataset
+dataset: websight_toy
+template: qwen2_vl
+cutoff_len: 8192
+max_samples: 9000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/websight/toy/3b
+# output_dir: saves/mrweb/fulll
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 2
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_7b.yaml b/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_7b.yaml
new file mode 100644
index 0000000..9f46674
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_full/qwen2vl_full_sft_7b.yaml
@@ -0,0 +1,49 @@
+### model
+model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+image_max_pixels: 1843200 # 1280*720*2
+video_max_pixels: 16384
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+# freeze_trainable_layers: 1
+# freeze_trainable_modules: all
+freeze_vision_tower: true # choices: [true, false]
+freeze_multi_modal_projector: false # choices: [true, false]
+train_mm_proj_only: false # choices: [true, false]
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+# deepspeed: examples/deepspeed/ds_z3_offload_config.json
+
+### dataset
+dataset: websight
+template: qwen2_vl
+cutoff_len: 8192
+max_samples: 9000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/websight/full/7b
+# output_dir: saves/mrweb/fulll
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 2
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_dpo.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_dpo.yaml
new file mode 100644
index 0000000..fd8c042
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_dpo.yaml
@@ -0,0 +1,48 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: dpo
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+pref_beta: 0.1
+pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
+
+### dataset
+dataset: dpo_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/dpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 5.0e-6
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: dpo_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_eval.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_eval.yaml
new file mode 100644
index 0000000..60d7c2f
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_eval.yaml
@@ -0,0 +1,19 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+trust_remote_code: true
+
+### method
+finetuning_type: lora
+
+### dataset
+task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
+template: fewshot
+lang: en
+n_shot: 5
+
+### output
+save_dir: saves/llama3-8b/lora/eval
+
+### eval
+batch_size: 4
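This config drives the MMLU-style evaluator rather than a training run; a sketch assuming the eval subcommand:

    # few-shot evaluation of the SFT adapter; results land in saves/llama3-8b/lora/eval
    llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml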
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_kto.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_kto.yaml
new file mode 100644
index 0000000..113b912
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_kto.yaml
@@ -0,0 +1,44 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: kto
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+pref_beta: 0.1
+
+### dataset
+dataset: kto_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/kto
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 5.0e-6
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_ppo.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_ppo.yaml
new file mode 100644
index 0000000..8794481
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_ppo.yaml
@@ -0,0 +1,43 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+reward_model: saves/llama3-8b/lora/reward
+trust_remote_code: true
+
+### method
+stage: ppo
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/ppo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### generate
+max_new_tokens: 512
+top_k: 0
+top_p: 0.9
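PPO reads the reward model path declared at the top of this config (saves/llama3-8b/lora/reward), so the reward-model run must exist first; a sketch of the intended order using the configs in this directory:

    # 1) train the reward model, 2) run PPO against it
    llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
    llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml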
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_pretrain.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_pretrain.yaml
new file mode 100644
index 0000000..3c851d7
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_pretrain.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: pt
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: c4_demo
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/pretrain
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: c4_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_reward.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_reward.yaml
new file mode 100644
index 0000000..48230b5
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_reward.yaml
@@ -0,0 +1,46 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: rm
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: dpo_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/reward
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: dpo_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml
new file mode 100644
index 0000000..157d661
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft.yaml
@@ -0,0 +1,46 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ds3.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ds3.yaml
new file mode 100644
index 0000000..e20b351
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ds3.yaml
@@ -0,0 +1,47 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 2
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ray.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ray.yaml
new file mode 100644
index 0000000..8c03bf9
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_lora_sft_ray.yaml
@@ -0,0 +1,61 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # or use local absolute path
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: tmp_dir
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### ray
+ray_run_name: llama3_8b_sft_lora
+ray_storage_path: ./saves
+ray_num_workers: 4 # Number of GPUs to use.
+placement_strategy: PACK
+resources_per_worker:
+ GPU: 1
+# ray_init_kwargs:
+# runtime_env:
+# env_vars:
+# : ""
+# pip:
+# - emoji
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
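The ray_* keys above only take effect when training is dispatched through Ray; the exact switch is not shown in this diff, so the environment flag below is an assumption to verify against the upstream Ray example:

    # assumption: Ray-backed launch is toggled by USE_RAY around the same CLI
    USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml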
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama3_preprocess.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama3_preprocess.yaml
new file mode 100644
index 0000000..fbaf01f
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama3_preprocess.yaml
@@ -0,0 +1,23 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+tokenized_path: saves/llama3-8b/dataset/sft
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+overwrite_output_dir: true
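Setting tokenized_path turns this into a two-pass workflow: the first run tokenizes and saves the dataset to that path (upstream LLaMA-Factory stops after saving), and any later run that points tokenized_path at the same directory loads the cache instead of re-preprocessing. A sketch:

    # pass 1: tokenize and cache to saves/llama3-8b/dataset/sft
    llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
    # pass 2: reuse the cache by keeping the same tokenized_path in the actual training config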
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llama4_lora_sft_ds3.yaml b/post-training/LLaMA-Factory/examples/train_lora/llama4_lora_sft_ds3.yaml
new file mode 100644
index 0000000..6c5bb7b
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llama4_lora_sft_ds3.yaml
@@ -0,0 +1,49 @@
+# pip install git+https://github.com/hiyouga/transformers.git@llama4_train
+
+### model
+model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
+
+### dataset
+dataset: mllm_demo,identity,alpaca_en_demo
+template: llama4
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama4-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 2
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# eval_dataset: alpaca_en_demo
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/llava1_5_lora_sft.yaml b/post-training/LLaMA-Factory/examples/train_lora/llava1_5_lora_sft.yaml
new file mode 100644
index 0000000..63cdcae
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/llava1_5_lora_sft.yaml
@@ -0,0 +1,45 @@
+### model
+model_name_or_path: llava-hf/llava-1.5-7b-hf
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: mllm_demo
+template: llava
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llava1_5-7b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_dpo.yaml b/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_dpo.yaml
new file mode 100644
index 0000000..3c990b4
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_dpo.yaml
@@ -0,0 +1,49 @@
+### model
+model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
+image_max_pixels: 262144
+video_max_pixels: 16384
+trust_remote_code: true
+
+### method
+stage: dpo
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+pref_beta: 0.1
+pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
+
+### dataset
+dataset: rlhf_v
+template: qwen2_vl
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/qwen2_vl-7b/lora/dpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 5.0e-6
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_sft.yaml b/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_sft.yaml
new file mode 100644
index 0000000..54ff984
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_lora/qwen2vl_lora_sft.yaml
@@ -0,0 +1,47 @@
+### model
+model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
+image_max_pixels: 262144
+video_max_pixels: 16384
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: mllm_demo,identity,alpaca_en_demo # video: mllm_video_demo
+template: qwen2_vl
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/qwen2_vl-7b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+resume_from_checkpoint: null
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_aqlm.yaml b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_aqlm.yaml
new file mode 100644
index 0000000..a7d44c7
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_aqlm.yaml
@@ -0,0 +1,44 @@
+### model
+model_name_or_path: ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_awq.yaml b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_awq.yaml
new file mode 100644
index 0000000..861edfd
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_awq.yaml
@@ -0,0 +1,44 @@
+### model
+model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-AWQ
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_bnb_npu.yaml b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
new file mode 100644
index 0000000..d68ce66
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
@@ -0,0 +1,47 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+quantization_bit: 4
+quantization_method: bnb
+double_quantization: false
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_gptq.yaml b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_gptq.yaml
new file mode 100644
index 0000000..729d862
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_gptq.yaml
@@ -0,0 +1,44 @@
+### model
+model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_otfq.yaml b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_otfq.yaml
new file mode 100644
index 0000000..1a157af
--- /dev/null
+++ b/post-training/LLaMA-Factory/examples/train_qlora/llama3_lora_sft_otfq.yaml
@@ -0,0 +1,46 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+quantization_bit: 4 # choices: [8 (bnb/hqq/eetq), 4 (bnb/hqq), 3 (hqq), 2 (hqq)]
+quantization_method: bnb # choices: [bnb, hqq, eetq]
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_rank: 8
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+dataloader_num_workers: 4
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+save_only_model: false
+report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+# val_size: 0.1
+# per_device_eval_batch_size: 1
+# eval_strategy: steps
+# eval_steps: 500
diff --git a/post-training/LLaMA-Factory/pyproject.toml b/post-training/LLaMA-Factory/pyproject.toml
new file mode 100644
index 0000000..2585539
--- /dev/null
+++ b/post-training/LLaMA-Factory/pyproject.toml
@@ -0,0 +1,95 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "llamafactory"
+dynamic = [
+ "version",
+ "dependencies",
+ "optional-dependencies",
+ "requires-python",
+ "scripts",
+ "authors",
+ "description",
+ "readme",
+ "license",
+ "keywords",
+ "classifiers"
+]
+
+[tool.ruff]
+target-version = "py39"
+line-length = 119
+indent-width = 4
+
+[tool.ruff.lint]
+ignore = [
+ "C408", # collection
+ "C901", # complex
+ "E501", # line too long
+ "E731", # lambda function
+ "E741", # ambiguous var name
+ "D100", # no doc public module
+ "D101", # no doc public class
+ "D102", # no doc public method
+ "D103", # no doc public function
+ "D104", # no doc public package
+ "D105", # no doc magic method
+ "D107", # no doc __init__
+]
+extend-select = [
+ "C", # complexity
+ "E", # error
+ "F", # pyflakes
+ "I", # isort
+ "W", # warning
+ "UP", # pyupgrade
+ "D", # pydocstyle
+ "PT009", # pytest assert
+ "RUF022", # sort __all__
+]
+
+[tool.ruff.lint.isort]
+lines-after-imports = 2
+known-first-party = ["llamafactory"]
+known-third-party = [
+ "accelerate",
+ "datasets",
+ "gradio",
+ "numpy",
+ "peft",
+ "torch",
+ "transformers",
+ "trl",
+]
+
+[tool.ruff.lint.pydocstyle]
+convention = "google"
+
+[tool.ruff.format]
+quote-style = "double"
+indent-style = "space"
+docstring-code-format = true
+skip-magic-trailing-comma = false
+line-ending = "auto"
+
+[tool.uv]
+conflicts = [
+ [
+ { extra = "torch-npu" },
+ { extra = "aqlm" },
+ ],
+ [
+ { extra = "torch-npu" },
+ { extra = "liger-kernel" },
+ ],
+ [
+ { extra = "torch-npu" },
+ { extra = "vllm" },
+ ],
+ [
+ { extra = "sglang" },
+ { extra = "minicpm_v" },
+ ],
+]
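The [tool.ruff] tables above drive linting and formatting; a sketch of the usual invocations against this configuration:

    # lint with the rules from [tool.ruff.lint], then apply the style from [tool.ruff.format]
    cd post-training/LLaMA-Factory
    ruff check .
    ruff format .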
diff --git a/post-training/LLaMA-Factory/requirements.txt b/post-training/LLaMA-Factory/requirements.txt
new file mode 100644
index 0000000..c818bb2
--- /dev/null
+++ b/post-training/LLaMA-Factory/requirements.txt
@@ -0,0 +1,25 @@
+transformers>=4.45.0,<=4.51.3,!=4.46.*,!=4.47.*,!=4.48.0
+datasets>=2.16.0,<=3.5.0
+accelerate>=0.34.0,<=1.6.0
+peft>=0.14.0,<=0.15.1
+trl>=0.8.6,<=0.9.6
+tokenizers>=0.19.0,<=0.21.1
+gradio>=4.38.0,<=5.25.0
+scipy
+einops
+sentencepiece
+tiktoken
+protobuf
+uvicorn
+fastapi
+sse-starlette
+matplotlib>=3.7.0
+fire
+packaging
+pyyaml
+numpy<2.0.0
+pydantic<=2.10.6
+pandas>=2.0.0
+av
+librosa
+tyro<0.9.0
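A sketch of installing against these pins; the editable install and the resulting llamafactory-cli entry point are assumptions based on the pyproject above and the upstream packaging:

    cd post-training/LLaMA-Factory
    pip install -r requirements.txt
    pip install -e .   # assumed to register the llamafactory-cli entry point used by run_exp.sh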
diff --git a/post-training/LLaMA-Factory/run_exp.sh b/post-training/LLaMA-Factory/run_exp.sh
new file mode 100644
index 0000000..776eb9c
--- /dev/null
+++ b/post-training/LLaMA-Factory/run_exp.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+set -x
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2vl_full_sft_3b.yaml
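run_exp.sh pins the 3B full-SFT recipe; swapping the YAML path is the only change needed for other runs, and CUDA_VISIBLE_DEVICES narrows the GPUs torchrun sees:

    # sketch: same launcher, 7B recipe, explicit GPU selection (GPU ids are illustrative)
    CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2vl_full_sft_7b.yaml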
diff --git a/post-training/LLaMA-Factory/scripts/api_example/test_image.py b/post-training/LLaMA-Factory/scripts/api_example/test_image.py
new file mode 100644
index 0000000..afd2b69
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/api_example/test_image.py
@@ -0,0 +1,65 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from openai import OpenAI
+from transformers.utils.versions import require_version
+
+
+require_version("openai>=1.5.0", "To fix: pip install openai>=1.5.0")
+
+
+def main():
+ client = OpenAI(
+ api_key="{}".format(os.getenv("API_KEY", "0")),
+ base_url="http://localhost:{}/v1".format(os.getenv("API_PORT", 8000)),
+ )
+ messages = []
+ messages.append(
+ {
+ "role": "user",
+ "content": [
+ {"type": "text", "text": "Output the color and number of each box."},
+ {
+ "type": "image_url",
+ "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-VL/boxes.png"},
+ },
+ ],
+ }
+ )
+ result = client.chat.completions.create(messages=messages, model="test")
+ messages.append(result.choices[0].message)
+ print("Round 1:", result.choices[0].message.content)
+ # The image shows a pyramid of colored blocks with numbers on them. Here are the colors and numbers of ...
+ messages.append(
+ {
+ "role": "user",
+ "content": [
+ {"type": "text", "text": "What kind of flower is this?"},
+ {
+ "type": "image_url",
+ "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-VL/flowers.jpg"},
+ },
+ ],
+ }
+ )
+ result = client.chat.completions.create(messages=messages, model="test")
+ messages.append(result.choices[0].message)
+ print("Round 2:", result.choices[0].message.content)
+ # The image shows a cluster of forget-me-not flowers. Forget-me-nots are small ...
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/scripts/api_example/test_toolcall.py b/post-training/LLaMA-Factory/scripts/api_example/test_toolcall.py
new file mode 100644
index 0000000..e291ba6
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/api_example/test_toolcall.py
@@ -0,0 +1,77 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+
+from openai import OpenAI
+from transformers.utils.versions import require_version
+
+
+require_version("openai>=1.5.0", "To fix: pip install openai>=1.5.0")
+
+
+def calculate_gpa(grades: list[str], hours: list[int]) -> float:
+ grade_to_score = {"A": 4, "B": 3, "C": 2}
+ total_score, total_hour = 0, 0
+ for grade, hour in zip(grades, hours):
+ total_score += grade_to_score[grade] * hour
+ total_hour += hour
+ return round(total_score / total_hour, 2)
+
+
+def main():
+ client = OpenAI(
+ api_key="{}".format(os.getenv("API_KEY", "0")),
+ base_url="http://localhost:{}/v1".format(os.getenv("API_PORT", 8000)),
+ )
+ tools = [
+ {
+ "type": "function",
+ "function": {
+ "name": "calculate_gpa",
+ "description": "Calculate the Grade Point Average (GPA) based on grades and credit hours",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "grades": {"type": "array", "items": {"type": "string"}, "description": "The grades"},
+ "hours": {"type": "array", "items": {"type": "integer"}, "description": "The credit hours"},
+ },
+ "required": ["grades", "hours"],
+ },
+ },
+ }
+ ]
+ tool_map = {"calculate_gpa": calculate_gpa}
+
+ messages = []
+ messages.append({"role": "user", "content": "My grades are A, A, B, and C. The credit hours are 3, 4, 3, and 2."})
+ result = client.chat.completions.create(messages=messages, model="test", tools=tools)
+ if result.choices[0].message.tool_calls is None:
+ raise ValueError("Cannot retrieve function call from the response.")
+
+ messages.append(result.choices[0].message)
+ tool_call = result.choices[0].message.tool_calls[0].function
+ print(tool_call)
+ # Function(arguments='{"grades": ["A", "A", "B", "C"], "hours": [3, 4, 3, 2]}', name='calculate_gpa')
+ name, arguments = tool_call.name, json.loads(tool_call.arguments)
+ tool_result = tool_map[name](**arguments)
+ messages.append({"role": "tool", "content": json.dumps({"gpa": tool_result}, ensure_ascii=False)})
+ result = client.chat.completions.create(messages=messages, model="test", tools=tools)
+ print(result.choices[0].message.content)
+ # Based on the grades and credit hours you provided, your Grade Point Average (GPA) is 3.42.
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_baichuan2.py b/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_baichuan2.py
new file mode 100644
index 0000000..3dbeff4
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_baichuan2.py
@@ -0,0 +1,112 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections import OrderedDict
+from typing import Any
+
+import fire
+import torch
+from huggingface_hub import split_torch_state_dict_into_shards
+from safetensors.torch import save_file
+from tqdm import tqdm
+from transformers.modeling_utils import SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME, WEIGHTS_INDEX_NAME, WEIGHTS_NAME
+
+
+CONFIG_NAME = "config.json"
+
+
+def save_weight(input_dir: str, output_dir: str, shard_size: str, save_safetensors: bool):
+ baichuan2_state_dict: dict[str, torch.Tensor] = OrderedDict()
+ for filepath in tqdm(os.listdir(input_dir), desc="Load weights"):
+ if os.path.isfile(os.path.join(input_dir, filepath)) and filepath.endswith(".bin"):
+ shard_weight = torch.load(os.path.join(input_dir, filepath), map_location="cpu")
+ baichuan2_state_dict.update(shard_weight)
+
+ llama_state_dict: dict[str, torch.Tensor] = OrderedDict()
+ for key, value in tqdm(baichuan2_state_dict.items(), desc="Convert format"):
+ if "W_pack" in key:
+ proj_size = value.size(0) // 3
+ llama_state_dict[key.replace("W_pack", "q_proj")] = value[:proj_size, :]
+ llama_state_dict[key.replace("W_pack", "k_proj")] = value[proj_size : 2 * proj_size, :]
+ llama_state_dict[key.replace("W_pack", "v_proj")] = value[2 * proj_size :, :]
+ elif "lm_head" in key:
+ llama_state_dict[key] = torch.nn.functional.normalize(value)
+ else:
+ llama_state_dict[key] = value
+
+ weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
+ filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
+ state_dict_split = split_torch_state_dict_into_shards(
+ llama_state_dict, filename_pattern=filename_pattern, max_shard_size=shard_size
+ )
+ for shard_file, tensors in tqdm(state_dict_split.filename_to_tensors.items(), desc="Save weights"):
+ shard = {tensor: llama_state_dict[tensor].contiguous() for tensor in tensors}
+ if save_safetensors:
+ save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
+ else:
+ torch.save(shard, os.path.join(output_dir, shard_file))
+
+ if not state_dict_split.is_sharded:
+ print(f"Model weights saved in {os.path.join(output_dir, weights_name)}.")
+ else:
+ index = {
+ "metadata": state_dict_split.metadata,
+ "weight_map": state_dict_split.tensor_to_filename,
+ }
+ index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
+ with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
+ json.dump(index, f, indent=2, sort_keys=True)
+
+ print(f"Model weights saved in {output_dir}.")
+
+
+def save_config(input_dir: str, output_dir: str):
+ with open(os.path.join(input_dir, CONFIG_NAME), encoding="utf-8") as f:
+ llama2_config_dict: dict[str, Any] = json.load(f)
+
+ llama2_config_dict["architectures"] = ["LlamaForCausalLM"]
+ llama2_config_dict.pop("auto_map", None)
+ llama2_config_dict.pop("tokenizer_class", None)
+ llama2_config_dict["model_type"] = "llama"
+
+ with open(os.path.join(output_dir, CONFIG_NAME), "w", encoding="utf-8") as f:
+ json.dump(llama2_config_dict, f, indent=2)
+
+ print(f"Model config saved in {os.path.join(output_dir, CONFIG_NAME)}")
+
+
+def llamafy_baichuan2(
+ input_dir: str,
+ output_dir: str,
+ shard_size: str = "2GB",
+ save_safetensors: bool = True,
+):
+ r"""Convert the Baichuan2-7B model in the same format as LLaMA2-7B.
+
+ Usage: python llamafy_baichuan2.py --input_dir input --output_dir output
+ Converted model: https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied
+ """
+    try:
+        os.makedirs(output_dir, exist_ok=False)
+    except FileExistsError as e:
+        raise RuntimeError(f"Output dir {output_dir} already exists.") from e
+
+ save_weight(input_dir, output_dir, shard_size, save_safetensors)
+ save_config(input_dir, output_dir)
+
+
+if __name__ == "__main__":
+ fire.Fire(llamafy_baichuan2)
diff --git a/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_qwen.py b/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_qwen.py
new file mode 100644
index 0000000..599b0f1
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/convert_ckpt/llamafy_qwen.py
@@ -0,0 +1,165 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections import OrderedDict
+from typing import Any
+
+import fire
+import torch
+from huggingface_hub import split_torch_state_dict_into_shards
+from safetensors import safe_open
+from safetensors.torch import save_file
+from tqdm import tqdm
+from transformers.modeling_utils import SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME, WEIGHTS_INDEX_NAME, WEIGHTS_NAME
+from transformers.utils import check_min_version
+
+
+try:
+ check_min_version("4.34.0")
+except Exception:
+ raise ValueError("Please upgrade `transformers` to 4.34.0")
+
+
+CONFIG_NAME = "config.json"
+
+
+def save_weight(input_dir: str, output_dir: str, shard_size: str, save_safetensors: bool) -> str:
+ qwen_state_dict: dict[str, torch.Tensor] = OrderedDict()
+ for filepath in tqdm(os.listdir(input_dir), desc="Load weights"):
+ if os.path.isfile(os.path.join(input_dir, filepath)) and filepath.endswith(".safetensors"):
+ with safe_open(os.path.join(input_dir, filepath), framework="pt", device="cpu") as f:
+ for key in f.keys():
+ qwen_state_dict[key] = f.get_tensor(key)
+
+ llama_state_dict: dict[str, torch.Tensor] = OrderedDict()
+ torch_dtype = None
+ for key, value in tqdm(qwen_state_dict.items(), desc="Convert format"):
+ if torch_dtype is None:
+ torch_dtype = value.dtype
+ if "wte" in key:
+ llama_state_dict["model.embed_tokens.weight"] = value
+ elif "ln_f" in key:
+ llama_state_dict["model.norm.weight"] = value
+ else:
+ key = key.replace("transformer.h", "model.layers")
+ if "attn.c_attn" in key:
+ proj_size = value.size(0) // 3
+ llama_state_dict[key.replace("attn.c_attn", "self_attn.q_proj")] = value[:proj_size, ...]
+ llama_state_dict[key.replace("attn.c_attn", "self_attn.k_proj")] = value[
+ proj_size : 2 * proj_size, ...
+ ]
+ llama_state_dict[key.replace("attn.c_attn", "self_attn.v_proj")] = value[2 * proj_size :, ...]
+ elif "attn.c_proj" in key:
+ llama_state_dict[key.replace("attn.c_proj", "self_attn.o_proj")] = value
+ llama_state_dict[key.replace("attn.c_proj.weight", "self_attn.o_proj.bias")] = torch.zeros_like(
+ value[:, 0]
+ ).squeeze()
+ elif "ln_1" in key:
+ llama_state_dict[key.replace("ln_1", "input_layernorm")] = value
+ elif "ln_2" in key:
+ llama_state_dict[key.replace("ln_2", "post_attention_layernorm")] = value
+ elif "mlp.w1" in key:
+ llama_state_dict[key.replace("mlp.w1", "mlp.up_proj")] = value
+ elif "mlp.w2" in key:
+ llama_state_dict[key.replace("mlp.w2", "mlp.gate_proj")] = value
+ elif "mlp.c_proj" in key:
+ llama_state_dict[key.replace("mlp.c_proj", "mlp.down_proj")] = value
+ elif "lm_head" in key:
+ llama_state_dict[key] = value
+ else:
+ raise KeyError(f"Unable to process key {key}")
+
+ weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
+ filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
+ state_dict_split = split_torch_state_dict_into_shards(
+ llama_state_dict, filename_pattern=filename_pattern, max_shard_size=shard_size
+ )
+ for shard_file, tensors in tqdm(state_dict_split.filename_to_tensors.items(), desc="Save weights"):
+ shard = {tensor: llama_state_dict[tensor].contiguous() for tensor in tensors}
+ if save_safetensors:
+ save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
+ else:
+ torch.save(shard, os.path.join(output_dir, shard_file))
+
+ if not state_dict_split.is_sharded:
+ print(f"Model weights saved in {os.path.join(output_dir, weights_name)}.")
+ else:
+ index = {
+ "metadata": state_dict_split.metadata,
+ "weight_map": state_dict_split.tensor_to_filename,
+ }
+ index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
+ with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
+ json.dump(index, f, indent=2, sort_keys=True)
+
+ print(f"Model weights saved in {output_dir}.")
+
+ return str(torch_dtype).replace("torch.", "")
+
+
+def save_config(input_dir: str, output_dir: str, torch_dtype: str):
+ with open(os.path.join(input_dir, CONFIG_NAME), encoding="utf-8") as f:
+ qwen_config_dict: dict[str, Any] = json.load(f)
+
+ llama2_config_dict: dict[str, Any] = OrderedDict()
+ llama2_config_dict["architectures"] = ["LlamaForCausalLM"]
+ llama2_config_dict["hidden_act"] = "silu"
+ llama2_config_dict["hidden_size"] = qwen_config_dict["hidden_size"]
+ llama2_config_dict["initializer_range"] = qwen_config_dict["initializer_range"]
+ llama2_config_dict["intermediate_size"] = qwen_config_dict["intermediate_size"] // 2
+ llama2_config_dict["max_position_embeddings"] = qwen_config_dict["max_position_embeddings"]
+ llama2_config_dict["model_type"] = "llama"
+ llama2_config_dict["num_attention_heads"] = qwen_config_dict["num_attention_heads"]
+ llama2_config_dict["num_hidden_layers"] = qwen_config_dict["num_hidden_layers"]
+ llama2_config_dict["num_key_value_heads"] = qwen_config_dict["hidden_size"] // qwen_config_dict["kv_channels"]
+ llama2_config_dict["pretraining_tp"] = 1
+ llama2_config_dict["rms_norm_eps"] = qwen_config_dict["layer_norm_epsilon"]
+ llama2_config_dict["rope_scaling"] = None
+ llama2_config_dict["tie_word_embeddings"] = qwen_config_dict["tie_word_embeddings"]
+ llama2_config_dict["torch_dtype"] = torch_dtype
+ llama2_config_dict["transformers_version"] = "4.34.0"
+ llama2_config_dict["use_cache"] = True
+ llama2_config_dict["vocab_size"] = qwen_config_dict["vocab_size"]
+ llama2_config_dict["attention_bias"] = True
+
+ with open(os.path.join(output_dir, CONFIG_NAME), "w", encoding="utf-8") as f:
+ json.dump(llama2_config_dict, f, indent=2)
+
+ print(f"Model config saved in {os.path.join(output_dir, CONFIG_NAME)}")
+
+
+def llamafy_qwen(
+ input_dir: str,
+ output_dir: str,
+ shard_size: str = "2GB",
+ save_safetensors: bool = False,
+):
+ r"""Convert the Qwen models in the same format as LLaMA2.
+
+ Usage: python llamafy_qwen.py --input_dir input --output_dir output
+ Converted model: https://huggingface.co/hiyouga/Qwen-14B-Chat-LLaMAfied
+ """
+    try:
+        os.makedirs(output_dir, exist_ok=False)
+    except FileExistsError as e:
+        raise RuntimeError(f"Output dir {output_dir} already exists.") from e
+
+ torch_dtype = save_weight(input_dir, output_dir, shard_size, save_safetensors)
+ save_config(input_dir, output_dir, torch_dtype)
+
+
+if __name__ == "__main__":
+ fire.Fire(llamafy_qwen)
diff --git a/post-training/LLaMA-Factory/scripts/convert_ckpt/tiny_llama4.py b/post-training/LLaMA-Factory/scripts/convert_ckpt/tiny_llama4.py
new file mode 100644
index 0000000..2a96cfa
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/convert_ckpt/tiny_llama4.py
@@ -0,0 +1,39 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from transformers import Llama4Config, Llama4ForConditionalGeneration, Llama4TextConfig, Llama4VisionConfig
+
+
+if __name__ == "__main__":
+ vision_config = Llama4VisionConfig(
+ hidden_size=1408,
+ image_size=336,
+ intermediate_size=5632,
+ num_attention_heads=16,
+ num_hidden_layers=4,
+ vision_output_dim=4096,
+ )
+ text_config = Llama4TextConfig(
+ hidden_size=512,
+ intermediate_size=1024,
+ intermediate_size_mlp=1024,
+ num_hidden_layers=4,
+ num_attention_heads=8,
+ num_key_value_heads=2,
+ head_dim=512 // 8,
+ num_local_experts=2,
+ )
+ config = Llama4Config(vision_config=vision_config, text_config=text_config)
+ model = Llama4ForConditionalGeneration._from_config(config)
+ model.save_pretrained("tiny-llama4")
diff --git a/post-training/LLaMA-Factory/scripts/eval_bleu_rouge.py b/post-training/LLaMA-Factory/scripts/eval_bleu_rouge.py
new file mode 100644
index 0000000..22e370b
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/eval_bleu_rouge.py
@@ -0,0 +1,79 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import logging
+import time
+
+import fire
+from datasets import load_dataset
+
+
+try:
+ import jieba # type: ignore
+ from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu # type: ignore
+ from rouge_chinese import Rouge # type: ignore
+
+ jieba.setLogLevel(logging.CRITICAL)
+ jieba.initialize()
+except ImportError:
+ print("Please install llamafactory with `pip install -e .[metrics]`.")
+ raise
+
+
+def compute_metrics(sample):
+ hypothesis = list(jieba.cut(sample["predict"]))
+ reference = list(jieba.cut(sample["label"]))
+
+ bleu_score = sentence_bleu(
+ [list(sample["label"])],
+ list(sample["predict"]),
+ smoothing_function=SmoothingFunction().method3,
+ )
+
+ if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
+ result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
+ else:
+ rouge = Rouge()
+ scores = rouge.get_scores(" ".join(hypothesis), " ".join(reference))
+ result = scores[0]
+
+ metric_result = {}
+ for k, v in result.items():
+ metric_result[k] = round(v["f"] * 100, 4)
+
+ metric_result["bleu-4"] = round(bleu_score * 100, 4)
+
+ return metric_result
+
+
+def main(filename: str):
+ start_time = time.time()
+ dataset = load_dataset("json", data_files=filename, split="train")
+ dataset = dataset.map(compute_metrics, num_proc=8, remove_columns=dataset.column_names)
+ score_dict = dataset.to_dict()
+
+ average_score = {}
+ for task, scores in sorted(score_dict.items(), key=lambda x: x[0]):
+ print(f"{task}: {sum(scores) / len(scores):.4f}")
+ average_score[task] = sum(scores) / len(scores)
+
+ with open("predictions_score.json", "w", encoding="utf-8") as f:
+ json.dump(average_score, f, indent=4)
+
+ print(f"\nDone in {time.time() - start_time:.3f}s.\nScore file saved to predictions_score.json")
+
+
+if __name__ == "__main__":
+ fire.Fire(main)
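The script expects a JSON file whose records carry predict and label fields (the names used by compute_metrics above); a usage sketch via fire, with an illustrative filename:

    # e.g. the prediction dump from a predict run; any JSON file with predict/label fields works
    python scripts/eval_bleu_rouge.py --filename generated_predictions.jsonl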
diff --git a/post-training/LLaMA-Factory/scripts/llama_pro.py b/post-training/LLaMA-Factory/scripts/llama_pro.py
new file mode 100644
index 0000000..7e4b944
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/llama_pro.py
@@ -0,0 +1,129 @@
+# Copyright 2025 Tencent Inc. and the LlamaFactory team.
+#
+# This code is inspired by the Tencent's LLaMA-Pro library.
+# https://github.com/TencentARC/LLaMA-Pro/blob/main/scripts/block_expansion.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections import OrderedDict
+from typing import TYPE_CHECKING
+
+import fire
+import torch
+from huggingface_hub import split_torch_state_dict_into_shards
+from safetensors.torch import save_file
+from tqdm import tqdm
+from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, PreTrainedModel
+from transformers.modeling_utils import SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME, WEIGHTS_INDEX_NAME, WEIGHTS_NAME
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+
+def change_name(name: str, old_index: int, new_index: int) -> str:
+ return name.replace(f".{old_index:d}.", f".{new_index:d}.")
+
+
+def block_expansion(
+ model_name_or_path: str,
+ output_dir: str,
+ num_expand: int,
+ shard_size: str = "5GB",
+ save_safetensors: bool = True,
+):
+ r"""Perform block expansion for LLaMA, Mistral, Qwen2 or Yi models.
+
+ Usage: python llama_pro.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir llama2_pro --num_expand 8
+ """
+ config: PretrainedConfig = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
+ num_layers = getattr(config, "num_hidden_layers")
+ if num_layers % num_expand != 0:
+ raise ValueError(f"`num_layers` {num_layers} should be divisible by `num_expand` {num_expand}.")
+
+ setattr(config, "num_hidden_layers", num_layers + num_expand)
+ config.save_pretrained(output_dir)
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
+ tokenizer.save_pretrained(output_dir)
+
+ print(f"Expanding model of {num_layers} layers to {num_layers + num_expand} layers.")
+ model = AutoModelForCausalLM.from_pretrained(
+ model_name_or_path, torch_dtype="auto", device_map="cpu", trust_remote_code=True, low_cpu_mem_usage=True
+ )
+ assert isinstance(model, PreTrainedModel) # type hint
+ if save_safetensors and getattr(model.config, "tie_word_embeddings", False):
+ del model.lm_head # safetensors does not allow shared weights
+
+ split = num_layers // num_expand
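+    # e.g. num_layers=8 and num_expand=2 gives split=4: one new block is appended after every 4 original
+    # layers; zeroing its down_proj/o_proj makes the new block an identity mapping at initialization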
+ layer_cnt = 0
+ state_dict = model.state_dict()
+ output_state_dict: dict[str, torch.Tensor] = OrderedDict()
+ for i in range(num_layers):
+ for key, value in state_dict.items():
+ if f".{i:d}." in key:
+ output_state_dict[change_name(key, i, layer_cnt)] = value
+
+ print(f"Add layer {layer_cnt} copied from layer {i}.")
+ layer_cnt += 1
+ if (i + 1) % split == 0:
+ for key, value in state_dict.items():
+ if f".{i:d}." in key:
+ if "down_proj" in key or "o_proj" in key:
+ output_state_dict[change_name(key, i, layer_cnt)] = torch.zeros_like(value)
+ else:
+ output_state_dict[change_name(key, i, layer_cnt)] = torch.clone(value)
+
+ print(f"Add layer {layer_cnt} expanded from layer {i}.")
+ layer_cnt += 1
+
+ for key, value in state_dict.items():
+ if key not in output_state_dict:
+ output_state_dict[key] = value
+
+ weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
+ filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
+ state_dict_split = split_torch_state_dict_into_shards(
+ output_state_dict, filename_pattern=filename_pattern, max_shard_size=shard_size
+ )
+ for shard_file, tensors in tqdm(state_dict_split.filename_to_tensors.items(), desc="Save weights"):
+ shard = {tensor: output_state_dict[tensor].contiguous() for tensor in tensors}
+ if save_safetensors:
+ save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
+ else:
+ torch.save(shard, os.path.join(output_dir, shard_file))
+
+ if not state_dict_split.is_sharded:
+ print(f"Model weights saved in {os.path.join(output_dir, weights_name)}.")
+ else:
+ index = {
+ "metadata": state_dict_split.metadata,
+ "weight_map": state_dict_split.tensor_to_filename,
+ }
+ index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
+ with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
+ json.dump(index, f, indent=2, sort_keys=True)
+
+ print(f"Model weights saved in {output_dir}.")
+
+ print("- Fine-tune this model with:")
+ print(f"model_name_or_path: {output_dir}")
+ print("finetuning_type: freeze")
+ print(f"freeze_trainable_layers: {num_expand}")
+ print("use_llama_pro: true")
+
+
+if __name__ == "__main__":
+ fire.Fire(block_expansion)
diff --git a/post-training/LLaMA-Factory/scripts/loftq_init.py b/post-training/LLaMA-Factory/scripts/loftq_init.py
new file mode 100644
index 0000000..3a79338
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/loftq_init.py
@@ -0,0 +1,88 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is based on the HuggingFace's PEFT library.
+# https://github.com/huggingface/peft/blob/v0.10.0/examples/loftq_finetuning/quantize_save_load.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import TYPE_CHECKING
+
+import fire
+from peft import LoftQConfig, LoraConfig, TaskType, get_peft_model
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel
+
+
+def quantize_loftq(
+ model_name_or_path: str,
+ output_dir: str,
+ loftq_bits: int = 4,
+ loftq_iter: int = 4,
+ lora_alpha: int = None,
+ lora_rank: int = 16,
+ lora_dropout: float = 0,
+ lora_target: tuple = ("q_proj", "v_proj"),
+ save_safetensors: bool = True,
+):
+ r"""Initialize LoRA weights with LoRA-fine-tuning-aware Quantization (LoftQ).
+
+ Usage: python loftq_init.py --model_name_or_path path_to_model --output_dir output_dir
+ """
+ if isinstance(lora_target, str):
+ lora_target = [name.strip() for name in lora_target.split(",")]
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")
+
+ loftq_config = LoftQConfig(loftq_bits=loftq_bits, loftq_iter=loftq_iter)
+ lora_config = LoraConfig(
+ task_type=TaskType.CAUSAL_LM,
+ inference_mode=True,
+ r=lora_rank,
+ lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
+ lora_dropout=lora_dropout,
+ target_modules=lora_target,
+ init_lora_weights="loftq",
+ loftq_config=loftq_config,
+ )
+
+ # Init LoftQ model
+    print("Initializing LoftQ weights, this may take several minutes. Please wait patiently.")
+ peft_model = get_peft_model(model, lora_config)
+ loftq_dir = os.path.join(output_dir, "loftq_init")
+
+ # Save LoftQ model
+ setattr(peft_model.peft_config["default"], "base_model_name_or_path", os.path.abspath(output_dir))
+ setattr(peft_model.peft_config["default"], "init_lora_weights", True) # don't apply loftq again
+ peft_model.save_pretrained(loftq_dir, safe_serialization=save_safetensors)
+ print(f"Adapter weights saved in {loftq_dir}")
+
+ # Save base model
+ base_model: PreTrainedModel = peft_model.unload()
+ base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
+ tokenizer.save_pretrained(output_dir)
+ print(f"Model weights saved in {output_dir}")
+
+ print("- Fine-tune this model with:")
+ print(f"model_name_or_path: {output_dir}")
+ print(f"adapter_name_or_path: {loftq_dir}")
+ print("finetuning_type: lora")
+ print(f"quantization_bit: {loftq_bits}")
+
+
+if __name__ == "__main__":
+ fire.Fire(quantize_loftq)
diff --git a/post-training/LLaMA-Factory/scripts/pissa_init.py b/post-training/LLaMA-Factory/scripts/pissa_init.py
new file mode 100644
index 0000000..405a147
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/pissa_init.py
@@ -0,0 +1,86 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is based on the HuggingFace's PEFT library.
+# https://github.com/huggingface/peft/blob/v0.11.0/examples/pissa_finetuning/preprocess.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import TYPE_CHECKING
+
+import fire
+from peft import LoraConfig, TaskType, get_peft_model
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel
+
+
+def quantize_pissa(
+ model_name_or_path: str,
+ output_dir: str,
+ pissa_iter: int = 16,
+ lora_alpha: int = None,
+ lora_rank: int = 16,
+ lora_dropout: float = 0,
+ lora_target: tuple = ("q_proj", "v_proj"),
+ save_safetensors: bool = True,
+):
+ r"""Initialize LoRA weights with Principal Singular values and Singular vectors Adaptation (PiSSA).
+
+ Usage: python pissa_init.py --model_name_or_path path_to_model --output_dir output_dir
+ """
+ if isinstance(lora_target, str):
+ lora_target = [name.strip() for name in lora_target.split(",")]
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")
+
+ lora_config = LoraConfig(
+ task_type=TaskType.CAUSAL_LM,
+ r=lora_rank,
+ lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
+ lora_dropout=lora_dropout,
+ target_modules=lora_target,
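+        # "pissa" runs a full SVD init; "pissa_niter_{n}" uses PEFT's faster randomized SVD with n iterations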
+ init_lora_weights="pissa" if pissa_iter == -1 else f"pissa_niter_{pissa_iter}",
+ )
+
+ # Init PiSSA model
+ peft_model = get_peft_model(model, lora_config)
+ pissa_dir = os.path.join(output_dir, "pissa_init")
+
+ # Save PiSSA model
+ setattr(peft_model.peft_config["default"], "base_model_name_or_path", os.path.abspath(output_dir))
+ setattr(peft_model.peft_config["default"], "init_lora_weights", True) # don't apply pissa again
+ peft_model.save_pretrained(pissa_dir, safe_serialization=save_safetensors)
+ print(f"Adapter weights saved in {pissa_dir}")
+
+ # Save base model
+ base_model: PreTrainedModel = peft_model.unload()
+ base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
+ tokenizer.save_pretrained(output_dir)
+ print(f"Model weights saved in {output_dir}")
+
+ print("- Fine-tune this model with:")
+ print(f"model_name_or_path: {output_dir}")
+ print(f"adapter_name_or_path: {pissa_dir}")
+ print("finetuning_type: lora")
+ print("pissa_init: false")
+ print("pissa_convert: true")
+ print("- and optionally with:")
+ print("quantization_bit: 4")
+
+
+if __name__ == "__main__":
+ fire.Fire(quantize_pissa)
diff --git a/post-training/LLaMA-Factory/scripts/qwen_omni_merge.py b/post-training/LLaMA-Factory/scripts/qwen_omni_merge.py
new file mode 100644
index 0000000..449b17b
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/qwen_omni_merge.py
@@ -0,0 +1,118 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import shutil
+
+import fire
+from peft import PeftModel
+from transformers import AutoModel, AutoProcessor, Qwen2_5OmniThinkerForConditionalGeneration # type: ignore
+
+
+def merge_lora(
+ base_model_path: str,
+ lora_checkpoint_path: str,
+ extra_file: str = "spk_dict.pt",
+ submodule_name: str = "thinker",
+ save_path: str = "./merged_model_checkpoint",
+):
+    """Merge the LoRA weights of a specified submodule into the original model.
+
+    Load the original model, tokenizer, and processor configuration, merge the LoRA weights
+    into the given submodule, and save the final merged model along with its configurations.
+
+ Args:
+ base_model_path (str): Path to the original model directory.
+ lora_checkpoint_path (str): Path to the directory containing LoRA weights.
+ extra_file (str): Name of the extra file to be copied (default: "spk_dict.pt").
+ submodule_name (str): Name of the submodule to merge (default: "thinker").
+ save_path (str): Directory where the merged model and configurations will be saved.
+ """
+ # 1. Load the original model, tokenizer, and processor
+ model = AutoModel.from_pretrained(base_model_path, torch_dtype="auto", device_map="cpu")
+ processor = AutoProcessor.from_pretrained(base_model_path)
+ print("Successfully loaded the original model and tokenizer.")
+
+ # 2. Extract the submodule to be merged (e.g., model.thinker)
+ if not hasattr(model, submodule_name):
+ raise AttributeError(f"The model does not have a submodule named '{submodule_name}'.")
+
+ base_submodule = getattr(model, submodule_name)
+ print(f"Successfully extracted submodule: {submodule_name}.")
+
+ # 3. Load the LoRA weights onto the extracted submodule
+ lora_model = PeftModel.from_pretrained(base_submodule, lora_checkpoint_path)
+ print("LoRA weights loaded successfully.")
+
+ # 4. Merge the LoRA weights into the submodule and unload the LoRA modules
+ merged_submodule = lora_model.merge_and_unload()
+ print("LoRA weights merged successfully.")
+
+ # 5. Replace the original submodule with the merged submodule in the model
+ setattr(model, submodule_name, merged_submodule)
+
+ # 6. Save the final merged model along with the tokenizer and processor configuration
+ model.save_pretrained(save_path)
+ processor.save_pretrained(save_path)
+ print(f"Merged model and tokenizer saved to {save_path}.")
+
+ source_file = os.path.join(base_model_path, extra_file)
+ target_file = os.path.join(save_path, extra_file)
+ if os.path.exists(source_file):
+ shutil.copy(source_file, target_file)
+ print(f"File '{extra_file}' copied from {base_model_path} to {save_path}.")
+ else:
+ print(f"File '{extra_file}' not found in {base_model_path}, skipping copy.")
+
+
+def save_full_model(
+ saved_thinker_path: str,
+ base_model_path: str,
+ save_path: str = "./merged_model_checkpoint",
+ extra_file: str = "spk_dict.pt",
+):
+    """Replace the thinker module of the original model with a saved one.
+
+    Load the saved thinker module and the original model, replace the thinker in the original model,
+    then save the complete model along with its tokenizer and processor configuration.
+
+ Args:
+ saved_thinker_path (str): Path to the saved thinker weights.
+ base_model_path (str): Directory path of the original model.
+ save_path (str): Directory where the merged model and configurations will be saved.
+ extra_file (str): Name of the extra file to be copied (default: "spk_dict.pt").
+ """
+ # 1. Load the saved thinker module and the original model
+ thinker = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
+ saved_thinker_path, torch_dtype="auto", device_map="cpu"
+ )
+ base_model = AutoModel.from_pretrained(base_model_path, torch_dtype="auto", device_map="cpu")
+ base_model.thinker = thinker
+
+ # 2. Save the complete model along with its tokenizer and processor configuration
+ processor = AutoProcessor.from_pretrained(base_model_path)
+ base_model.save_pretrained(save_path)
+ processor.save_pretrained(save_path)
+ print(f"Merged model and tokenizer saved to {save_path}.")
+
+ # 3. Copy the extra file from the base model directory to the save_path
+ source_file = os.path.join(base_model_path, extra_file)
+ target_file = os.path.join(save_path, extra_file)
+ if os.path.exists(source_file):
+ shutil.copy(source_file, target_file)
+ print(f"File '{extra_file}' copied from {base_model_path} to {save_path}.")
+ else:
+ print(f"File '{extra_file}' not found in {base_model_path}, skipping copy.")
+
+
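+# Example usage (hypothetical paths):
+#   python qwen_omni_merge.py merge_lora --base_model_path Qwen/Qwen2.5-Omni-7B \
+#       --lora_checkpoint_path saves/qwen_omni_lora --save_path ./merged_model_checkpoint
+#   python qwen_omni_merge.py save_full --saved_thinker_path saves/thinker \
+#       --base_model_path Qwen/Qwen2.5-Omni-7B --save_path ./merged_model_checkpoint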
+if __name__ == "__main__":
+ fire.Fire({"save_full": save_full_model, "merge_lora": merge_lora})
diff --git a/post-training/LLaMA-Factory/scripts/stat_utils/cal_flops.py b/post-training/LLaMA-Factory/scripts/stat_utils/cal_flops.py
new file mode 100644
index 0000000..3dc0499
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/stat_utils/cal_flops.py
@@ -0,0 +1,49 @@
+# Copyright 2025 Microsoft Corporation and the LlamaFactory team.
+#
+# This code is inspired by the Microsoft's DeepSpeed library.
+# https://www.deepspeed.ai/tutorials/flops-profiler/
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import fire
+import torch
+from deepspeed.accelerator import get_accelerator # type: ignore
+from deepspeed.profiling.flops_profiler import get_model_profile # type: ignore
+
+from llamafactory.chat import ChatModel
+
+
+def calculate_flops(
+ model_name_or_path: str,
+ batch_size: int = 1,
+ seq_length: int = 512,
+ flash_attn: str = "auto",
+):
+    r"""Calculate the FLOPs of pre-trained models.
+
+ Usage: python cal_flops.py --model_name_or_path path_to_model --batch_size 1 --seq_length 512
+ """
+ with get_accelerator().device(0):
+ chat_model = ChatModel(dict(model_name_or_path=model_name_or_path, template="empty", flash_attn=flash_attn))
+ fake_input = torch.ones((batch_size, seq_length), dtype=torch.long, device=chat_model.engine.model.device)
+ input_dict = {"input_ids": fake_input, "labels": fake_input.clone()}
+ flops, macs, params = get_model_profile(
+ chat_model.engine.model, kwargs=input_dict, print_profile=True, detailed=True
+ )
+ print("FLOPs:", flops)
+ print("MACs:", macs)
+ print("Params:", params)
+
+
+if __name__ == "__main__":
+ fire.Fire(calculate_flops)
diff --git a/post-training/LLaMA-Factory/scripts/stat_utils/cal_lr.py b/post-training/LLaMA-Factory/scripts/stat_utils/cal_lr.py
new file mode 100644
index 0000000..eb35c47
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/stat_utils/cal_lr.py
@@ -0,0 +1,98 @@
+# Copyright 2025 imoneoi and the LlamaFactory team.
+#
+# This code is inspired by the imoneoi's OpenChat library.
+# https://github.com/imoneoi/openchat/blob/3.6.0/ochat/training_deepspeed/train.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import Literal
+
+import fire
+import torch
+from torch.utils.data import DataLoader
+from tqdm import tqdm
+from transformers import DataCollatorForLanguageModeling
+
+from llamafactory.data import MultiModalDataCollatorForSeq2Seq, get_dataset, get_template_and_fix_tokenizer
+from llamafactory.extras.constants import IGNORE_INDEX
+from llamafactory.hparams import get_train_args
+from llamafactory.model import load_tokenizer
+
+
+BASE_LR = 3e-4 # 1.5e-4 for 30B-70B models
+BASE_BS = 4_000_000 # from llama paper
+
+
+def calculate_lr(
+ model_name_or_path: str,
+ batch_size: int, # total batch size, namely (batch size * gradient accumulation * world size)
+ stage: Literal["pt", "sft"] = "sft",
+ dataset: str = "alpaca_en_demo",
+ dataset_dir: str = "data",
+ template: str = "default",
+ cutoff_len: int = 2048, # i.e. maximum input length during training
+ is_mistral_or_gemma: bool = False, # mistral and gemma models opt for a smaller learning rate,
+ packing: bool = False,
+):
+ r"""Calculate the optimal learning rate for 7B/13B models using LLaMA's hyper-parameters.
+
+ Usage:
+ python cal_lr.py --model_name_or_path path_to_model --dataset alpaca_en_demo --cutoff_len 1024 --batch_size 16
+ """
+ model_args, data_args, training_args, _, _ = get_train_args(
+ dict(
+ stage=stage,
+ model_name_or_path=model_name_or_path,
+ dataset=dataset,
+ dataset_dir=dataset_dir,
+ template=template,
+ cutoff_len=cutoff_len,
+ packing=packing,
+ preprocessing_num_workers=16,
+ output_dir="dummy_dir",
+ overwrite_cache=True,
+ do_train=True,
+ )
+ )
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ trainset = get_dataset(template, model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
+ if stage == "pt":
+ data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
+ elif stage == "sft":
+ data_collator = MultiModalDataCollatorForSeq2Seq(
+ template=template, tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX
+ )
+ else:
+        raise NotImplementedError(f"Stage is not supported: {stage}.")
+
+ dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
+ valid_tokens, total_tokens = 0, 0
+ for batch in tqdm(dataloader, desc="Collecting valid tokens"):
+ valid_tokens += torch.sum(batch["labels"] != IGNORE_INDEX).item()
+ total_tokens += torch.numel(batch["labels"])
+
+ valid_ratio = valid_tokens / total_tokens
+ token_batch_size = cutoff_len * batch_size * valid_ratio
+ lr = BASE_LR * math.sqrt(token_batch_size / BASE_BS) # lr ~ sqrt(batch_size)
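+    # e.g. (hypothetical numbers) cutoff_len=2048, batch_size=16, valid_ratio=0.6 gives
+    # token_batch_size ~ 19661 and lr ~ 3e-4 * sqrt(19661 / 4e6) ~ 2.1e-5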
+ lr = lr / 6.0 if is_mistral_or_gemma else lr
+ print(
+        f"Optimal learning rate is {lr:.2e} for a valid token ratio of {valid_ratio * 100:.2f}% "
+        f"and an effective token batch size of {token_batch_size:.2f}."
+ )
+
+
+if __name__ == "__main__":
+ fire.Fire(calculate_lr)
diff --git a/post-training/LLaMA-Factory/scripts/stat_utils/cal_mfu.py b/post-training/LLaMA-Factory/scripts/stat_utils/cal_mfu.py
new file mode 100644
index 0000000..f1d4446
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/stat_utils/cal_mfu.py
@@ -0,0 +1,161 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+
+import fire
+import torch
+import torch.distributed as dist
+from transformers import AutoConfig
+
+from llamafactory.train.tuner import run_exp
+
+
+BASE = 2 # gemm (add + mul)
+
+
+def compute_model_flops(
+ model_name_or_path: str,
+ total_batch_size: int,
+ seq_length: int,
+ include_backward: bool = True,
+ include_recompute: bool = False,
+ include_flashattn: bool = False,
+) -> int:
+    r"""Calculate the FLOPs of the model per forward/backward pass."""
+ config = AutoConfig.from_pretrained(model_name_or_path)
+ hidden_size = getattr(config, "hidden_size", None)
+ vocab_size = getattr(config, "vocab_size", None)
+ intermediate_size = getattr(config, "intermediate_size", None)
+ num_attention_heads = getattr(config, "num_attention_heads", None)
+ num_key_value_heads = getattr(config, "num_key_value_heads", None)
+ num_hidden_layers = getattr(config, "num_hidden_layers", None)
+ tie_word_embeddings = getattr(config, "tie_word_embeddings", False)
+
+ # mlp module
+ mlp_flops_per_token = 3 * BASE * hidden_size * intermediate_size # up, gate, down
+ mlp_flops = total_batch_size * seq_length * num_hidden_layers * mlp_flops_per_token
+
+ # attn projector module
+ q_flops_per_token = BASE * hidden_size * hidden_size
+ o_flops_per_token = BASE * hidden_size * hidden_size
+ k_flops_per_token = BASE * hidden_size * hidden_size * num_key_value_heads // num_attention_heads
+ v_flops_per_token = BASE * hidden_size * hidden_size * num_key_value_heads // num_attention_heads
+ attn_proj_flops_per_token = q_flops_per_token + o_flops_per_token + k_flops_per_token + v_flops_per_token
+ attn_proj_flops = total_batch_size * seq_length * num_hidden_layers * attn_proj_flops_per_token
+
+ # attn sdpa module
+ sdpa_flops_per_layer = 2 * BASE * hidden_size * seq_length * seq_length # (q * k^T) * v
+ sdpa_flops = total_batch_size * num_hidden_layers * sdpa_flops_per_layer
+
+ # embedding module
+ embedding_flops_per_token = hidden_size * vocab_size
+ embedding_flops = total_batch_size * seq_length * embedding_flops_per_token
+ if tie_word_embeddings is False:
+ embedding_flops *= 2
+
+ non_embedding_flops = mlp_flops + attn_proj_flops + sdpa_flops
+ non_embedding_coeff, embedding_coeff = 1, 1
+ if include_backward:
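+        # the backward pass costs roughly twice the forward pass, so forward + backward ~ 3x the forward FLOPs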
+ non_embedding_coeff += 2
+ embedding_coeff += 2
+
+ if include_recompute:
+ non_embedding_coeff += 1
+
+ total_flops = non_embedding_coeff * non_embedding_flops + embedding_coeff * embedding_flops
+
+ if include_flashattn:
+ total_flops += sdpa_flops
+
+ return total_flops
+
+
+def compute_device_flops(world_size: int) -> float:
+    r"""Calculate the aggregate peak FLOPs per second of the available devices."""
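+    # the values below are approximate peak dense half-precision (BF16/FP16) throughput per device, in FLOP/s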
+ device_name = torch.cuda.get_device_name()
+ if "H100" in device_name or "H800" in device_name:
+ return 989 * 1e12 * world_size
+ elif "A100" in device_name or "A800" in device_name:
+ return 312 * 1e12 * world_size
+ elif "V100" in device_name:
+ return 125 * 1e12 * world_size
+ elif "4090" in device_name:
+ return 98 * 1e12 * world_size
+ else:
+ raise NotImplementedError(f"Device not supported: {device_name}.")
+
+
+def calculate_mfu(
+ model_name_or_path: str,
+ batch_size: int = 1,
+ seq_length: int = 1024,
+ num_steps: int = 100,
+ finetuning_type: str = "lora",
+ flash_attn: str = "auto",
+ deepspeed_stage: int = 0,
+ disable_gc: bool = False,
+ liger_kernel: bool = False,
+ unsloth_gc: bool = False,
+) -> float:
+ r"""Calculate MFU for given model and hyper-params.
+
+ Usage: python cal_mfu.py --model_name_or_path path_to_model --batch_size 1 --seq_length 1024
+ """
+ args = {
+ "model_name_or_path": model_name_or_path,
+ "flash_attn": flash_attn,
+ "disable_gradient_checkpointing": disable_gc,
+ "enable_liger_kernel": liger_kernel,
+ "use_unsloth_gc": unsloth_gc,
+ "stage": "pt",
+ "do_train": True,
+ "finetuning_type": finetuning_type,
+ "dataset": "c4_demo",
+ "cutoff_len": seq_length,
+ "output_dir": os.path.join("saves", "test_mfu"),
+ "logging_strategy": "no",
+ "save_strategy": "no",
+ "save_only_model": True,
+ "overwrite_output_dir": True,
+ "per_device_train_batch_size": batch_size,
+ "max_steps": num_steps,
+ "bf16": True,
+ }
+ if deepspeed_stage in [2, 3]:
+ args["deepspeed"] = f"examples/deepspeed/ds_z{deepspeed_stage}_config.json"
+
+ run_exp(args)
+ if dist.is_initialized():
+ dist.barrier()
+ world_size = dist.get_world_size()
+ else:
+ world_size = 1
+
+ if int(os.getenv("LOCAL_RANK", "0")) == 0:
+ with open(os.path.join("saves", "test_mfu", "all_results.json"), encoding="utf-8") as f:
+ result = json.load(f)
+
+ total_batch_size = batch_size * world_size
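+        # MFU = achieved FLOPs per second (steps/s * FLOPs per step) / aggregate peak device FLOPs per second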
+ mfu_value = (
+ result["train_steps_per_second"]
+ * compute_model_flops(model_name_or_path, total_batch_size, seq_length)
+ / compute_device_flops(world_size)
+ )
+ print(f"MFU: {mfu_value * 100:.2f}%")
+
+
+if __name__ == "__main__":
+ fire.Fire(calculate_mfu)
diff --git a/post-training/LLaMA-Factory/scripts/stat_utils/cal_ppl.py b/post-training/LLaMA-Factory/scripts/stat_utils/cal_ppl.py
new file mode 100644
index 0000000..8d47ffd
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/stat_utils/cal_ppl.py
@@ -0,0 +1,134 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from dataclasses import dataclass
+from typing import Any, Literal, Optional
+
+import fire
+import torch
+from torch.utils.data import DataLoader
+from tqdm import tqdm
+from transformers import DataCollatorForLanguageModeling
+
+from llamafactory.data import MultiModalDataCollatorForSeq2Seq, get_dataset, get_template_and_fix_tokenizer
+from llamafactory.extras.constants import IGNORE_INDEX
+from llamafactory.hparams import get_train_args
+from llamafactory.model import load_model, load_tokenizer
+
+
+@dataclass
+class PairwiseDataCollatorWithPadding(MultiModalDataCollatorForSeq2Seq):
+ r"""Data collator for pairwise data."""
+
+ train_on_prompt: bool = False
+
+ def __call__(self, features: list[dict[str, Any]]) -> dict[str, torch.Tensor]:
+ r"""Pad batched data to the longest sequence in the batch."""
+ chosen_features = []
+ for feature in features:
+ chosen_features.append(
+ {
+ "input_ids": feature["chosen_input_ids"],
+ "attention_mask": feature["chosen_attention_mask"],
+ "labels": feature["chosen_input_ids"] if self.train_on_prompt else feature["chosen_labels"],
+ "images": feature["images"],
+ "videos": feature["videos"],
+ "audios": feature["audios"],
+ }
+ )
+
+ return super().__call__(chosen_features)
+
+
+def calculate_ppl(
+ model_name_or_path: str,
+ save_name: str = "ppl.json",
+ batch_size: int = 4,
+ stage: Literal["pt", "sft", "rm"] = "sft",
+ dataset: str = "alpaca_en_demo",
+ dataset_dir: str = "data",
+ template: str = "default",
+ cutoff_len: int = 2048,
+ max_samples: Optional[int] = None,
+ train_on_prompt: bool = False,
+):
+    r"""Calculate the perplexity of a pre-trained model on the given dataset.
+
+ Usage: export CUDA_VISIBLE_DEVICES=0
+ python cal_ppl.py --model_name_or_path path_to_model --dataset alpaca_en_demo --save_name ppl.json
+ """
+ model_args, data_args, training_args, finetuning_args, _ = get_train_args(
+ dict(
+ stage=stage,
+ model_name_or_path=model_name_or_path,
+ dataset=dataset,
+ dataset_dir=dataset_dir,
+ template=template,
+ cutoff_len=cutoff_len,
+ max_samples=max_samples,
+ train_on_prompt=train_on_prompt,
+ preprocessing_num_workers=16,
+ output_dir="dummy_dir",
+ overwrite_cache=True,
+ do_train=True,
+ )
+ )
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ trainset = get_dataset(template, model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
+ model = load_model(tokenizer, model_args, finetuning_args, is_trainable=False)
+ if stage == "pt":
+ data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
+ elif stage == "sft":
+ data_collator = MultiModalDataCollatorForSeq2Seq(
+ template=template, tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX
+ )
+ elif stage == "rm":
+ data_collator = PairwiseDataCollatorWithPadding(
+ template=template, tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX, train_on_prompt=train_on_prompt
+ )
+ else:
+        raise NotImplementedError(f"Stage is not supported: {stage}.")
+
+ dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
+ criterion = torch.nn.CrossEntropyLoss(reduction="none")
+ total_ppl = 0
+ perplexities = []
+ batch: dict[str, torch.Tensor]
+ with torch.no_grad():
+ for batch in tqdm(dataloader, desc="Computing perplexities"):
+ batch = batch.to(model.device)
+ outputs = model(**batch)
+ shift_logits: torch.Tensor = outputs["logits"][..., :-1, :]
+ shift_labels: torch.Tensor = batch["labels"][..., 1:]
+ loss_mask = shift_labels != IGNORE_INDEX
+ flatten_logits = shift_logits.contiguous().view(shift_labels.size(0) * shift_labels.size(1), -1)
+ flatten_labels = shift_labels.contiguous().view(-1)
+ token_logps: torch.Tensor = criterion(flatten_logits, flatten_labels)
+ token_logps = token_logps.contiguous().view(shift_logits.size(0), -1)
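+            # mean token-level NLL over non-ignored positions; exponentiating gives the per-sequence perplexity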
+ sentence_logps = (token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
+ total_ppl += sentence_logps.exp().sum().item()
+ perplexities.extend(sentence_logps.exp().tolist())
+
+ with open(save_name, "w", encoding="utf-8") as f:
+ json.dump(perplexities, f, indent=2)
+
+ print(f"Average perplexity is {total_ppl / len(perplexities):.2f}")
+ print(f"Perplexities have been saved at {save_name}.")
+
+
+if __name__ == "__main__":
+ fire.Fire(calculate_ppl)
diff --git a/post-training/LLaMA-Factory/scripts/stat_utils/length_cdf.py b/post-training/LLaMA-Factory/scripts/stat_utils/length_cdf.py
new file mode 100644
index 0000000..c459c8f
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/stat_utils/length_cdf.py
@@ -0,0 +1,69 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from collections import defaultdict
+
+import fire
+from tqdm import tqdm
+
+from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
+from llamafactory.hparams import get_train_args
+from llamafactory.model import load_tokenizer
+
+
+def length_cdf(
+ model_name_or_path: str,
+ dataset: str = "alpaca_en_demo",
+ dataset_dir: str = "data",
+ template: str = "default",
+ interval: int = 1000,
+):
+ r"""Calculate the distribution of the input lengths in the dataset.
+
+ Usage: export CUDA_VISIBLE_DEVICES=0
+ python length_cdf.py --model_name_or_path path_to_model --dataset alpaca_en_demo --template default
+ """
+ model_args, data_args, training_args, _, _ = get_train_args(
+ dict(
+ stage="sft",
+ model_name_or_path=model_name_or_path,
+ dataset=dataset,
+ dataset_dir=dataset_dir,
+ template=template,
+ cutoff_len=1_000_000,
+ preprocessing_num_workers=16,
+ output_dir="dummy_dir",
+ overwrite_cache=True,
+ do_train=True,
+ )
+ )
+ tokenizer_module = load_tokenizer(model_args)
+ template = get_template_and_fix_tokenizer(tokenizer_module["tokenizer"], data_args)
+ trainset = get_dataset(template, model_args, data_args, training_args, "sft", **tokenizer_module)["train_dataset"]
+ total_num = len(trainset)
+ length_dict = defaultdict(int)
+ for sample in tqdm(trainset["input_ids"], desc="Collecting lengths"):
+ length_dict[len(sample) // interval * interval] += 1
+
+ length_tuples = list(length_dict.items())
+ length_tuples.sort()
+ count_accu, prob_accu = 0, 0
+ for length, count in length_tuples:
+ count_accu += count
+ prob_accu += count / total_num * 100
+ print(f"{count_accu:d} ({prob_accu:.2f}%) samples have length < {length + interval}.")
+
+
+if __name__ == "__main__":
+ fire.Fire(length_cdf)
diff --git a/post-training/LLaMA-Factory/scripts/vllm_infer.py b/post-training/LLaMA-Factory/scripts/vllm_infer.py
new file mode 100644
index 0000000..53391ee
--- /dev/null
+++ b/post-training/LLaMA-Factory/scripts/vllm_infer.py
@@ -0,0 +1,162 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from typing import Optional
+
+import fire
+from transformers import Seq2SeqTrainingArguments
+
+from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
+from llamafactory.extras.constants import IGNORE_INDEX
+from llamafactory.extras.misc import get_device_count
+from llamafactory.extras.packages import is_vllm_available
+from llamafactory.hparams import get_infer_args
+from llamafactory.model import load_tokenizer
+
+
+if is_vllm_available():
+ from vllm import LLM, SamplingParams
+ from vllm.lora.request import LoRARequest
+
+
+def vllm_infer(
+ model_name_or_path: str,
+ adapter_name_or_path: str = None,
+ dataset: str = "alpaca_en_demo",
+ dataset_dir: str = "data",
+ template: str = "default",
+ cutoff_len: int = 2048,
+ max_samples: Optional[int] = None,
+ vllm_config: str = "{}",
+ save_name: str = "generated_predictions.jsonl",
+ temperature: float = 0.95,
+ top_p: float = 0.7,
+ top_k: int = 50,
+ max_new_tokens: int = 1024,
+ repetition_penalty: float = 1.0,
+ skip_special_tokens: bool = True,
+ seed: Optional[int] = None,
+ pipeline_parallel_size: int = 1,
+ image_max_pixels: int = 768 * 768,
+ image_min_pixels: int = 32 * 32,
+):
+ r"""Perform batch generation using vLLM engine, which supports tensor parallelism.
+
+ Usage: python vllm_infer.py --model_name_or_path meta-llama/Llama-2-7b-hf --template llama --dataset alpaca_en_demo
+ """
+ if pipeline_parallel_size > get_device_count():
+        raise ValueError("Pipeline parallel size should not be larger than the number of GPUs.")
+
+ model_args, data_args, _, generating_args = get_infer_args(
+ dict(
+ model_name_or_path=model_name_or_path,
+ adapter_name_or_path=adapter_name_or_path,
+ dataset=dataset,
+ dataset_dir=dataset_dir,
+ template=template,
+ cutoff_len=cutoff_len,
+ max_samples=max_samples,
+ preprocessing_num_workers=16,
+ vllm_config=vllm_config,
+ temperature=temperature,
+ top_p=top_p,
+ top_k=top_k,
+ max_new_tokens=max_new_tokens,
+ repetition_penalty=repetition_penalty,
+ )
+ )
+
+ training_args = Seq2SeqTrainingArguments(output_dir="dummy_dir")
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template_obj = get_template_and_fix_tokenizer(tokenizer, data_args)
+ template_obj.mm_plugin.expand_mm_tokens = False # for vllm generate
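+    # assumption: vLLM expands multimodal placeholder tokens itself, so the plugin must not pre-expand them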
+ dataset_module = get_dataset(template_obj, model_args, data_args, training_args, "ppo", **tokenizer_module)
+
+ inputs, prompts, labels = [], [], []
+ for sample in dataset_module["train_dataset"]:
+ if sample["images"]:
+ multi_modal_data = {
+ "image": template_obj.mm_plugin._regularize_images(
+ sample["images"], image_max_pixels=image_max_pixels, image_min_pixels=image_min_pixels
+ )["images"]
+ }
+ elif sample["videos"]:
+ multi_modal_data = {
+ "video": template_obj.mm_plugin._regularize_videos(
+ sample["videos"], image_max_pixels=image_max_pixels, image_min_pixels=image_min_pixels
+ )["videos"]
+ }
+ elif sample["audios"]:
+ audio_data = template_obj.mm_plugin._regularize_audios(
+ sample["audios"],
+ sampling_rate=16000,
+ )
+ multi_modal_data = {"audio": zip(audio_data["audios"], audio_data["sampling_rates"])}
+ else:
+ multi_modal_data = None
+
+ inputs.append({"prompt_token_ids": sample["input_ids"], "multi_modal_data": multi_modal_data})
+ prompts.append(tokenizer.decode(sample["input_ids"], skip_special_tokens=skip_special_tokens))
+ labels.append(
+ tokenizer.decode(
+ list(filter(lambda x: x != IGNORE_INDEX, sample["labels"])), skip_special_tokens=skip_special_tokens
+ )
+ )
+
+ sampling_params = SamplingParams(
+        repetition_penalty=generating_args.repetition_penalty or 1.0,  # repetition_penalty must be > 0
+        temperature=generating_args.temperature,
+        top_p=generating_args.top_p or 1.0,  # top_p must be > 0
+        top_k=generating_args.top_k or -1,  # top_k must be > 0, use -1 to disable it
+ stop_token_ids=template_obj.get_stop_token_ids(tokenizer),
+ max_tokens=generating_args.max_new_tokens,
+ skip_special_tokens=skip_special_tokens,
+ seed=seed,
+ )
+ if model_args.adapter_name_or_path is not None:
+ lora_request = LoRARequest("default", 1, model_args.adapter_name_or_path[0])
+ else:
+ lora_request = None
+
+ engine_args = {
+ "model": model_args.model_name_or_path,
+ "trust_remote_code": True,
+ "dtype": model_args.infer_dtype,
+ "max_model_len": cutoff_len + max_new_tokens,
+ "tensor_parallel_size": (get_device_count() // pipeline_parallel_size) or 1,
+ "pipeline_parallel_size": pipeline_parallel_size,
+ "disable_log_stats": True,
+ "enable_lora": model_args.adapter_name_or_path is not None,
+ }
+ if template_obj.mm_plugin.__class__.__name__ != "BasePlugin":
+ engine_args["limit_mm_per_prompt"] = {"image": 4, "video": 2, "audio": 2}
+
+ if isinstance(model_args.vllm_config, dict):
+ engine_args.update(model_args.vllm_config)
+
+ results = LLM(**engine_args).generate(inputs, sampling_params, lora_request=lora_request)
+ preds = [result.outputs[0].text for result in results]
+ with open(save_name, "w", encoding="utf-8") as f:
+ for text, pred, label in zip(prompts, preds, labels):
+ f.write(json.dumps({"prompt": text, "predict": pred, "label": label}, ensure_ascii=False) + "\n")
+
+ print("*" * 70)
+ print(f"{len(prompts)} generated results have been saved at {save_name}.")
+ print("*" * 70)
+
+
+if __name__ == "__main__":
+ fire.Fire(vllm_infer)
diff --git a/post-training/LLaMA-Factory/setup.py b/post-training/LLaMA-Factory/setup.py
new file mode 100644
index 0000000..e00edb3
--- /dev/null
+++ b/post-training/LLaMA-Factory/setup.py
@@ -0,0 +1,117 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import re
+
+from setuptools import find_packages, setup
+
+
+def get_version() -> str:
+ with open(os.path.join("src", "llamafactory", "extras", "env.py"), encoding="utf-8") as f:
+ file_content = f.read()
+ pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
+ (version,) = re.findall(pattern, file_content)
+ return version
+
+
+def get_requires() -> list[str]:
+ with open("requirements.txt", encoding="utf-8") as f:
+ file_content = f.read()
+ lines = [line.strip() for line in file_content.strip().split("\n") if not line.startswith("#")]
+ return lines
+
+
+def get_console_scripts() -> list[str]:
+ console_scripts = ["llamafactory-cli = llamafactory.cli:main"]
+ if os.getenv("ENABLE_SHORT_CONSOLE", "1").lower() in ["true", "y", "1"]:
+ console_scripts.append("lmf = llamafactory.cli:main")
+
+ return console_scripts
+
+
+extra_require = {
+ "torch": ["torch>=1.13.1"],
+ "torch-npu": ["torch==2.4.0", "torch-npu==2.4.0.post2", "decorator"],
+ "metrics": ["nltk", "jieba", "rouge-chinese"],
+ "deepspeed": ["deepspeed>=0.10.0,<=0.16.5"],
+ "liger-kernel": ["liger-kernel>=0.5.5"],
+ "bitsandbytes": ["bitsandbytes>=0.39.0"],
+ "hqq": ["hqq"],
+ "eetq": ["eetq"],
+ "gptq": ["optimum>=1.17.0", "auto-gptq>=0.5.0"],
+ "awq": ["autoawq"],
+ "aqlm": ["aqlm[gpu]>=1.1.0"],
+ "vllm": ["vllm>=0.4.3,<=0.8.4"],
+ "sglang": ["sglang[srt]>=0.4.5", "transformers==4.51.1"],
+ "galore": ["galore-torch"],
+ "apollo": ["apollo-torch"],
+ "badam": ["badam>=1.2.1"],
+ "adam-mini": ["adam-mini"],
+ "qwen": ["transformers_stream_generator"],
+ "minicpm_v": [
+ "soundfile",
+ "torchvision",
+ "torchaudio",
+ "vector_quantize_pytorch",
+ "vocos",
+ "msgpack",
+ "referencing",
+ "jsonschema_specifications",
+ "transformers==4.48.3",
+ ],
+ "modelscope": ["modelscope"],
+ "openmind": ["openmind"],
+ "swanlab": ["swanlab"],
+ "dev": ["pre-commit", "ruff", "pytest", "build"],
+}
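+# the extras above can be combined at install time, e.g.: pip install -e ".[torch,metrics]"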
+
+
+def main():
+ setup(
+ name="llamafactory",
+ version=get_version(),
+ author="hiyouga",
+ author_email="hiyouga@buaa.edu.cn",
+ description="Unified Efficient Fine-Tuning of 100+ LLMs",
+ long_description=open("README.md", encoding="utf-8").read(),
+ long_description_content_type="text/markdown",
+ keywords=["AI", "LLM", "GPT", "ChatGPT", "Llama", "Transformer", "DeepSeek", "Pytorch"],
+ license="Apache 2.0 License",
+ url="https://github.com/hiyouga/LLaMA-Factory",
+ package_dir={"": "src"},
+ packages=find_packages("src"),
+ python_requires=">=3.9.0",
+ install_requires=get_requires(),
+ extras_require=extra_require,
+ entry_points={"console_scripts": get_console_scripts()},
+ classifiers=[
+ "Development Status :: 4 - Beta",
+ "Intended Audience :: Developers",
+ "Intended Audience :: Education",
+ "Intended Audience :: Science/Research",
+ "License :: OSI Approved :: Apache Software License",
+ "Operating System :: OS Independent",
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.9",
+ "Programming Language :: Python :: 3.10",
+ "Programming Language :: Python :: 3.11",
+ "Programming Language :: Python :: 3.12",
+ "Topic :: Scientific/Engineering :: Artificial Intelligence",
+ ],
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/src/api.py b/post-training/LLaMA-Factory/src/api.py
new file mode 100644
index 0000000..6121545
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/api.py
@@ -0,0 +1,33 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import uvicorn
+
+from llamafactory.api.app import create_app
+from llamafactory.chat import ChatModel
+
+
+def main():
+ chat_model = ChatModel()
+ app = create_app(chat_model)
+ api_host = os.getenv("API_HOST", "0.0.0.0")
+ api_port = int(os.getenv("API_PORT", "8000"))
+    print(f"Visit http://localhost:{api_port}/docs for the API documentation.")
+ uvicorn.run(app, host=api_host, port=api_port)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/__init__.py
new file mode 100644
index 0000000..b1567ef
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/__init__.py
@@ -0,0 +1,31 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""Efficient fine-tuning of large language models.
+
+Level:
+ api, webui > chat, eval, train > data, model > hparams > extras
+
+Disable version checking: DISABLE_VERSION_CHECK=1
+Enable VRAM recording: RECORD_VRAM=1
+Force using torchrun: FORCE_TORCHRUN=1
+Set logging verbosity: LLAMAFACTORY_VERBOSITY=WARN
+Use modelscope: USE_MODELSCOPE_HUB=1
+Use openmind: USE_OPENMIND_HUB=1
+"""
+
+from .extras.env import VERSION
+
+
+__version__ = VERSION
diff --git a/post-training/LLaMA-Factory/src/llamafactory/api/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/api/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/api/app.py b/post-training/LLaMA-Factory/src/llamafactory/api/app.py
new file mode 100644
index 0000000..e0621d8
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/api/app.py
@@ -0,0 +1,133 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import asyncio
+import os
+from contextlib import asynccontextmanager
+from functools import partial
+from typing import Annotated, Optional
+
+from ..chat import ChatModel
+from ..extras.constants import EngineName
+from ..extras.misc import torch_gc
+from ..extras.packages import is_fastapi_available, is_starlette_available, is_uvicorn_available
+from .chat import (
+ create_chat_completion_response,
+ create_score_evaluation_response,
+ create_stream_chat_completion_response,
+)
+from .protocol import (
+ ChatCompletionRequest,
+ ChatCompletionResponse,
+ ModelCard,
+ ModelList,
+ ScoreEvaluationRequest,
+ ScoreEvaluationResponse,
+)
+
+
+if is_fastapi_available():
+ from fastapi import Depends, FastAPI, HTTPException, status
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.security.http import HTTPAuthorizationCredentials, HTTPBearer
+
+
+if is_starlette_available():
+ from sse_starlette import EventSourceResponse
+
+
+if is_uvicorn_available():
+ import uvicorn
+
+
+async def sweeper() -> None:
+ while True:
+ torch_gc()
+ await asyncio.sleep(300)
+
+
+@asynccontextmanager
+async def lifespan(app: "FastAPI", chat_model: "ChatModel"): # collects GPU memory
+ if chat_model.engine.name == EngineName.HF:
+ asyncio.create_task(sweeper())
+
+ yield
+ torch_gc()
+
+
+def create_app(chat_model: "ChatModel") -> "FastAPI":
+ root_path = os.getenv("FASTAPI_ROOT_PATH", "")
+ app = FastAPI(lifespan=partial(lifespan, chat_model=chat_model), root_path=root_path)
+ app.add_middleware(
+ CORSMiddleware,
+ allow_origins=["*"],
+ allow_credentials=True,
+ allow_methods=["*"],
+ allow_headers=["*"],
+ )
+ api_key = os.getenv("API_KEY")
+ security = HTTPBearer(auto_error=False)
+
+ async def verify_api_key(auth: Annotated[Optional[HTTPAuthorizationCredentials], Depends(security)]):
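+        # clients authenticate with a standard bearer header, i.e. "Authorization: Bearer <API_KEY>",
+        # whenever the API_KEY environment variable is set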
+ if api_key and (auth is None or auth.credentials != api_key):
+ raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")
+
+ @app.get(
+ "/v1/models",
+ response_model=ModelList,
+ status_code=status.HTTP_200_OK,
+ dependencies=[Depends(verify_api_key)],
+ )
+ async def list_models():
+ model_card = ModelCard(id=os.getenv("API_MODEL_NAME", "gpt-3.5-turbo"))
+ return ModelList(data=[model_card])
+
+ @app.post(
+ "/v1/chat/completions",
+ response_model=ChatCompletionResponse,
+ status_code=status.HTTP_200_OK,
+ dependencies=[Depends(verify_api_key)],
+ )
+ async def create_chat_completion(request: ChatCompletionRequest):
+ if not chat_model.engine.can_generate:
+ raise HTTPException(status_code=status.HTTP_405_METHOD_NOT_ALLOWED, detail="Not allowed")
+
+ if request.stream:
+ generate = create_stream_chat_completion_response(request, chat_model)
+ return EventSourceResponse(generate, media_type="text/event-stream", sep="\n")
+ else:
+ return await create_chat_completion_response(request, chat_model)
+
+ @app.post(
+ "/v1/score/evaluation",
+ response_model=ScoreEvaluationResponse,
+ status_code=status.HTTP_200_OK,
+ dependencies=[Depends(verify_api_key)],
+ )
+ async def create_score_evaluation(request: ScoreEvaluationRequest):
+ if chat_model.engine.can_generate:
+ raise HTTPException(status_code=status.HTTP_405_METHOD_NOT_ALLOWED, detail="Not allowed")
+
+ return await create_score_evaluation_response(request, chat_model)
+
+ return app
+
+
+def run_api() -> None:
+ chat_model = ChatModel()
+ app = create_app(chat_model)
+ api_host = os.getenv("API_HOST", "0.0.0.0")
+ api_port = int(os.getenv("API_PORT", "8000"))
+    print(f"Visit http://localhost:{api_port}/docs for the API documentation.")
+ uvicorn.run(app, host=api_host, port=api_port)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/api/chat.py b/post-training/LLaMA-Factory/src/llamafactory/api/chat.py
new file mode 100644
index 0000000..c97197d
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/api/chat.py
@@ -0,0 +1,283 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import base64
+import io
+import json
+import os
+import re
+import uuid
+from collections.abc import AsyncGenerator
+from typing import TYPE_CHECKING, Optional
+
+from ..data import Role as DataRole
+from ..extras import logging
+from ..extras.constants import AUDIO_PLACEHOLDER, IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER
+from ..extras.misc import is_env_enabled
+from ..extras.packages import is_fastapi_available, is_pillow_available, is_requests_available
+from .common import dictify, jsonify
+from .protocol import (
+ ChatCompletionMessage,
+ ChatCompletionResponse,
+ ChatCompletionResponseChoice,
+ ChatCompletionResponseUsage,
+ ChatCompletionStreamResponse,
+ ChatCompletionStreamResponseChoice,
+ Finish,
+ Function,
+ FunctionCall,
+ Role,
+ ScoreEvaluationResponse,
+)
+
+
+if is_fastapi_available():
+ from fastapi import HTTPException, status
+
+
+if is_pillow_available():
+ from PIL import Image
+
+
+if is_requests_available():
+ import requests
+
+
+if TYPE_CHECKING:
+ from ..chat import ChatModel
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+ from .protocol import ChatCompletionRequest, ScoreEvaluationRequest
+
+
+logger = logging.get_logger(__name__)
+ROLE_MAPPING = {
+ Role.USER: DataRole.USER.value,
+ Role.ASSISTANT: DataRole.ASSISTANT.value,
+ Role.SYSTEM: DataRole.SYSTEM.value,
+ Role.FUNCTION: DataRole.FUNCTION.value,
+ Role.TOOL: DataRole.OBSERVATION.value,
+}
+
+
+def _process_request(
+ request: "ChatCompletionRequest",
+) -> tuple[
+ list[dict[str, str]],
+ Optional[str],
+ Optional[str],
+ Optional[list["ImageInput"]],
+ Optional[list["VideoInput"]],
+ Optional[list["AudioInput"]],
+]:
+ if is_env_enabled("API_VERBOSE", "1"):
+ logger.info_rank0(f"==== request ====\n{json.dumps(dictify(request), indent=2, ensure_ascii=False)}")
+
+ if len(request.messages) == 0:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid length")
+
+ if request.messages[0].role == Role.SYSTEM:
+ content = request.messages.pop(0).content
+ system = content[0].text if isinstance(content, list) else content
+ else:
+ system = None
+
+ if len(request.messages) % 2 == 0:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only supports u/a/u/a/u...")
+
+ input_messages = []
+ images, videos, audios = [], [], []
+ for i, message in enumerate(request.messages):
+ if i % 2 == 0 and message.role not in [Role.USER, Role.TOOL]:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
+ elif i % 2 == 1 and message.role not in [Role.ASSISTANT, Role.FUNCTION]:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
+
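+        # Assistant turns that carry tool_calls are re-serialized to JSON and mapped to the
+        # internal FUNCTION role, so the chat template can render them as a function-call turn.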
+ if message.role == Role.ASSISTANT and isinstance(message.tool_calls, list) and len(message.tool_calls):
+ tool_calls = [
+ {"name": tool_call.function.name, "arguments": tool_call.function.arguments}
+ for tool_call in message.tool_calls
+ ]
+ content = json.dumps(tool_calls, ensure_ascii=False)
+ input_messages.append({"role": ROLE_MAPPING[Role.FUNCTION], "content": content})
+ elif isinstance(message.content, list):
+ text_content = ""
+ for input_item in message.content:
+ if input_item.type == "text":
+ text_content += input_item.text
+ elif input_item.type == "image_url":
+ text_content += IMAGE_PLACEHOLDER
+ image_url = input_item.image_url.url
+ if re.match(r"^data:image\/(png|jpg|jpeg|gif|bmp);base64,(.+)$", image_url): # base64 image
+ image_stream = io.BytesIO(base64.b64decode(image_url.split(",", maxsplit=1)[1]))
+ elif os.path.isfile(image_url): # local file
+ image_stream = open(image_url, "rb")
+ else: # web uri
+ image_stream = requests.get(image_url, stream=True).raw
+
+ images.append(Image.open(image_stream).convert("RGB"))
+ elif input_item.type == "video_url":
+ text_content += VIDEO_PLACEHOLDER
+ video_url = input_item.video_url.url
+ if re.match(r"^data:video\/(mp4|mkv|avi|mov);base64,(.+)$", video_url): # base64 video
+ video_stream = io.BytesIO(base64.b64decode(video_url.split(",", maxsplit=1)[1]))
+ elif os.path.isfile(video_url): # local file
+ video_stream = open(video_url, "rb")
+ else: # web uri
+ video_stream = requests.get(video_url, stream=True).raw
+
+ videos.append(video_stream)
+ elif input_item.type == "audio_url":
+ text_content += AUDIO_PLACEHOLDER
+ audio_url = input_item.audio_url.url
+ if re.match(r"^data:audio\/(mpeg|mp3|wav|ogg);base64,(.+)$", audio_url): # base64 audio
+ audio_stream = io.BytesIO(base64.b64decode(audio_url.split(",", maxsplit=1)[1]))
+ elif os.path.isfile(audio_url): # local file
+ audio_stream = open(audio_url, "rb")
+ else: # web uri
+ audio_stream = requests.get(audio_url, stream=True).raw
+
+ audios.append(audio_stream)
+ else:
+ raise HTTPException(
+ status_code=status.HTTP_400_BAD_REQUEST, detail=f"Invalid input type {input_item.type}."
+ )
+
+ input_messages.append({"role": ROLE_MAPPING[message.role], "content": text_content})
+ else:
+ input_messages.append({"role": ROLE_MAPPING[message.role], "content": message.content})
+
+ tool_list = request.tools
+ if isinstance(tool_list, list) and len(tool_list):
+ try:
+ tools = json.dumps([dictify(tool.function) for tool in tool_list], ensure_ascii=False)
+        except (TypeError, ValueError):  # json.dumps raises TypeError/ValueError, not JSONDecodeError
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid tools")
+ else:
+ tools = None
+
+ return input_messages, system, tools, images or None, videos or None, audios or None
+
+
+def _create_stream_chat_completion_chunk(
+ completion_id: str,
+ model: str,
+ delta: "ChatCompletionMessage",
+ index: Optional[int] = 0,
+ finish_reason: Optional["Finish"] = None,
+) -> str:
+ choice_data = ChatCompletionStreamResponseChoice(index=index, delta=delta, finish_reason=finish_reason)
+ chunk = ChatCompletionStreamResponse(id=completion_id, model=model, choices=[choice_data])
+ return jsonify(chunk)
+
+
+async def create_chat_completion_response(
+ request: "ChatCompletionRequest", chat_model: "ChatModel"
+) -> "ChatCompletionResponse":
+ completion_id = f"chatcmpl-{uuid.uuid4().hex}"
+ input_messages, system, tools, images, videos, audios = _process_request(request)
+ responses = await chat_model.achat(
+ input_messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ do_sample=request.do_sample,
+ temperature=request.temperature,
+ top_p=request.top_p,
+ max_new_tokens=request.max_tokens,
+ num_return_sequences=request.n,
+ stop=request.stop,
+ )
+
+ prompt_length, response_length = 0, 0
+ choices = []
+ for i, response in enumerate(responses):
+ if tools:
+ result = chat_model.engine.template.extract_tool(response.response_text)
+ else:
+ result = response.response_text
+
+ if isinstance(result, list):
+ tool_calls = []
+ for tool in result:
+ function = Function(name=tool.name, arguments=tool.arguments)
+ tool_calls.append(FunctionCall(id=f"call_{uuid.uuid4().hex}", function=function))
+
+ response_message = ChatCompletionMessage(role=Role.ASSISTANT, tool_calls=tool_calls)
+ finish_reason = Finish.TOOL
+ else:
+ response_message = ChatCompletionMessage(role=Role.ASSISTANT, content=result)
+ finish_reason = Finish.STOP if response.finish_reason == "stop" else Finish.LENGTH
+
+ choices.append(ChatCompletionResponseChoice(index=i, message=response_message, finish_reason=finish_reason))
+ prompt_length = response.prompt_length
+ response_length += response.response_length
+
+ usage = ChatCompletionResponseUsage(
+ prompt_tokens=prompt_length,
+ completion_tokens=response_length,
+ total_tokens=prompt_length + response_length,
+ )
+
+ return ChatCompletionResponse(id=completion_id, model=request.model, choices=choices, usage=usage)
+
+
+async def create_stream_chat_completion_response(
+ request: "ChatCompletionRequest", chat_model: "ChatModel"
+) -> AsyncGenerator[str, None]:
+ completion_id = f"chatcmpl-{uuid.uuid4().hex}"
+ input_messages, system, tools, images, videos, audios = _process_request(request)
+ if tools:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Cannot stream function calls.")
+
+ if request.n > 1:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Cannot stream multiple responses.")
+
+ yield _create_stream_chat_completion_chunk(
+ completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(role=Role.ASSISTANT, content="")
+ )
+ async for new_token in chat_model.astream_chat(
+ input_messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ do_sample=request.do_sample,
+ temperature=request.temperature,
+ top_p=request.top_p,
+ max_new_tokens=request.max_tokens,
+ stop=request.stop,
+ ):
+ if len(new_token) != 0:
+ yield _create_stream_chat_completion_chunk(
+ completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(content=new_token)
+ )
+
+ yield _create_stream_chat_completion_chunk(
+ completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(), finish_reason=Finish.STOP
+ )
+ yield "[DONE]"
+
+
+async def create_score_evaluation_response(
+ request: "ScoreEvaluationRequest", chat_model: "ChatModel"
+) -> "ScoreEvaluationResponse":
+ score_id = f"scoreval-{uuid.uuid4().hex}"
+ if len(request.messages) == 0:
+ raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid request")
+
+ scores = await chat_model.aget_scores(request.messages, max_length=request.max_length)
+ return ScoreEvaluationResponse(id=score_id, model=request.model, scores=scores)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/api/common.py b/post-training/LLaMA-Factory/src/llamafactory/api/common.py
new file mode 100644
index 0000000..f4d0c2f
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/api/common.py
@@ -0,0 +1,34 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from typing import TYPE_CHECKING, Any
+
+
+if TYPE_CHECKING:
+ from pydantic import BaseModel
+
+
+def dictify(data: "BaseModel") -> dict[str, Any]:
+ try: # pydantic v2
+ return data.model_dump(exclude_unset=True)
+ except AttributeError: # pydantic v1
+ return data.dict(exclude_unset=True)
+
+
+def jsonify(data: "BaseModel") -> str:
+ try: # pydantic v2
+ return json.dumps(data.model_dump(exclude_unset=True), ensure_ascii=False)
+ except AttributeError: # pydantic v1
+ return data.json(exclude_unset=True, ensure_ascii=False)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/api/protocol.py b/post-training/LLaMA-Factory/src/llamafactory/api/protocol.py
new file mode 100644
index 0000000..ac9746e
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/api/protocol.py
@@ -0,0 +1,156 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import time
+from enum import Enum, unique
+from typing import Any, Optional, Union
+
+from pydantic import BaseModel, Field
+from typing_extensions import Literal
+
+
+@unique
+class Role(str, Enum):
+ USER = "user"
+ ASSISTANT = "assistant"
+ SYSTEM = "system"
+ FUNCTION = "function"
+ TOOL = "tool"
+
+
+@unique
+class Finish(str, Enum):
+ STOP = "stop"
+ LENGTH = "length"
+ TOOL = "tool_calls"
+
+
+class ModelCard(BaseModel):
+ id: str
+ object: Literal["model"] = "model"
+ created: int = Field(default_factory=lambda: int(time.time()))
+ owned_by: Literal["owner"] = "owner"
+
+
+class ModelList(BaseModel):
+ object: Literal["list"] = "list"
+ data: list[ModelCard] = []
+
+
+class Function(BaseModel):
+ name: str
+ arguments: str
+
+
+class FunctionDefinition(BaseModel):
+ name: str
+ description: str
+ parameters: dict[str, Any]
+
+
+class FunctionAvailable(BaseModel):
+ type: Literal["function", "code_interpreter"] = "function"
+ function: Optional[FunctionDefinition] = None
+
+
+class FunctionCall(BaseModel):
+ id: str
+ type: Literal["function"] = "function"
+ function: Function
+
+
+class URL(BaseModel):
+ url: str
+ detail: Literal["auto", "low", "high"] = "auto"
+
+
+class MultimodalInputItem(BaseModel):
+ type: Literal["text", "image_url", "video_url", "audio_url"]
+ text: Optional[str] = None
+ image_url: Optional[URL] = None
+ video_url: Optional[URL] = None
+ audio_url: Optional[URL] = None
+
+
+class ChatMessage(BaseModel):
+ role: Role
+ content: Optional[Union[str, list[MultimodalInputItem]]] = None
+ tool_calls: Optional[list[FunctionCall]] = None
+
+
+class ChatCompletionMessage(BaseModel):
+ role: Optional[Role] = None
+ content: Optional[str] = None
+ tool_calls: Optional[list[FunctionCall]] = None
+
+
+class ChatCompletionRequest(BaseModel):
+ model: str
+ messages: list[ChatMessage]
+ tools: Optional[list[FunctionAvailable]] = None
+ do_sample: Optional[bool] = None
+ temperature: Optional[float] = None
+ top_p: Optional[float] = None
+ n: int = 1
+ max_tokens: Optional[int] = None
+ stop: Optional[Union[str, list[str]]] = None
+ stream: bool = False
+
+
+class ChatCompletionResponseChoice(BaseModel):
+ index: int
+ message: ChatCompletionMessage
+ finish_reason: Finish
+
+
+class ChatCompletionStreamResponseChoice(BaseModel):
+ index: int
+ delta: ChatCompletionMessage
+ finish_reason: Optional[Finish] = None
+
+
+class ChatCompletionResponseUsage(BaseModel):
+ prompt_tokens: int
+ completion_tokens: int
+ total_tokens: int
+
+
+class ChatCompletionResponse(BaseModel):
+ id: str
+ object: Literal["chat.completion"] = "chat.completion"
+ created: int = Field(default_factory=lambda: int(time.time()))
+ model: str
+ choices: list[ChatCompletionResponseChoice]
+ usage: ChatCompletionResponseUsage
+
+
+class ChatCompletionStreamResponse(BaseModel):
+ id: str
+ object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
+ created: int = Field(default_factory=lambda: int(time.time()))
+ model: str
+ choices: list[ChatCompletionStreamResponseChoice]
+
+
+class ScoreEvaluationRequest(BaseModel):
+ model: str
+ messages: list[str]
+ max_length: Optional[int] = None
+
+
+class ScoreEvaluationResponse(BaseModel):
+ id: str
+ object: Literal["score.evaluation"] = "score.evaluation"
+ model: str
+ scores: list[float]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/chat/__init__.py
new file mode 100644
index 0000000..15d8b9b
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/__init__.py
@@ -0,0 +1,19 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .base_engine import BaseEngine
+from .chat_model import ChatModel
+
+
+__all__ = ["BaseEngine", "ChatModel"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/base_engine.py b/post-training/LLaMA-Factory/src/llamafactory/chat/base_engine.py
new file mode 100644
index 0000000..6d497c1
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/base_engine.py
@@ -0,0 +1,98 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from abc import ABC, abstractmethod
+from collections.abc import AsyncGenerator
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Any, Literal, Optional, Union
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, PreTrainedTokenizer
+ from vllm import AsyncLLMEngine
+
+ from ..data import Template
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+ from ..extras.constants import EngineName
+ from ..hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+@dataclass
+class Response:
+ response_text: str
+ response_length: int
+ prompt_length: int
+ finish_reason: Literal["stop", "length"]
+
+
+class BaseEngine(ABC):
+    r"""Base class for chat model inference engines.
+
+    Subclasses must implement the async methods chat(), stream_chat() and get_scores().
+ """
+
+ name: "EngineName"
+ model: Union["PreTrainedModel", "AsyncLLMEngine"]
+ tokenizer: "PreTrainedTokenizer"
+ can_generate: bool
+ template: "Template"
+ generating_args: dict[str, Any]
+
+ @abstractmethod
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ ) -> None:
+ r"""Initialize an inference engine."""
+ ...
+
+ @abstractmethod
+ async def chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
+ r"""Get a list of responses of the chat model."""
+ ...
+
+ @abstractmethod
+ async def stream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncGenerator[str, None]:
+ r"""Get the response token-by-token of the chat model."""
+ ...
+
+ @abstractmethod
+ async def get_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ r"""Get a list of scores of the reward model."""
+ ...
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/chat_model.py b/post-training/LLaMA-Factory/src/llamafactory/chat/chat_model.py
new file mode 100644
index 0000000..0022eed
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/chat_model.py
@@ -0,0 +1,184 @@
+# Copyright 2025 THUDM and the LlamaFactory team.
+#
+# This code is inspired by THUDM's ChatGLM implementation.
+# https://github.com/THUDM/ChatGLM-6B/blob/main/cli_demo.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import asyncio
+import os
+from collections.abc import AsyncGenerator, Generator
+from threading import Thread
+from typing import TYPE_CHECKING, Any, Optional
+
+from ..extras.constants import EngineName
+from ..extras.misc import torch_gc
+from ..hparams import get_infer_args
+from .hf_engine import HuggingfaceEngine
+from .sglang_engine import SGLangEngine
+from .vllm_engine import VllmEngine
+
+
+if TYPE_CHECKING:
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+ from .base_engine import BaseEngine, Response
+
+
+def _start_background_loop(loop: "asyncio.AbstractEventLoop") -> None:
+ asyncio.set_event_loop(loop)
+ loop.run_forever()
+
+
+class ChatModel:
+    r"""General class for chat models. Backed by the HuggingFace, vLLM, or SGLang engine.
+
+ Supports both sync and async methods.
+ Sync methods: chat(), stream_chat() and get_scores().
+ Async methods: achat(), astream_chat() and aget_scores().
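+
+    Example (a minimal sketch; `model_name_or_path` and `template` are the usual inference
+    arguments parsed by get_infer_args, and your setup may require additional ones):
+        >>> chat_model = ChatModel({"model_name_or_path": "path/to/model", "template": "default"})
+        >>> chat_model.chat([{"role": "user", "content": "Hello"}])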
+ """
+
+ def __init__(self, args: Optional[dict[str, Any]] = None) -> None:
+ model_args, data_args, finetuning_args, generating_args = get_infer_args(args)
+ if model_args.infer_backend == EngineName.HF:
+ self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
+ elif model_args.infer_backend == EngineName.VLLM:
+ self.engine: BaseEngine = VllmEngine(model_args, data_args, finetuning_args, generating_args)
+ elif model_args.infer_backend == EngineName.SGLANG:
+ self.engine: BaseEngine = SGLangEngine(model_args, data_args, finetuning_args, generating_args)
+ else:
+ raise NotImplementedError(f"Unknown backend: {model_args.infer_backend}")
+
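+        # Run a private asyncio event loop in a daemon thread so that the synchronous
+        # wrappers below can submit coroutines to it via asyncio.run_coroutine_threadsafe().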
+ self._loop = asyncio.new_event_loop()
+ self._thread = Thread(target=_start_background_loop, args=(self._loop,), daemon=True)
+ self._thread.start()
+
+ def chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
+ r"""Get a list of responses of the chat model."""
+ task = asyncio.run_coroutine_threadsafe(
+ self.achat(messages, system, tools, images, videos, audios, **input_kwargs), self._loop
+ )
+ return task.result()
+
+ async def achat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
+ r"""Asynchronously get a list of responses of the chat model."""
+ return await self.engine.chat(messages, system, tools, images, videos, audios, **input_kwargs)
+
+ def stream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> Generator[str, None, None]:
+ r"""Get the response token-by-token of the chat model."""
+ generator = self.astream_chat(messages, system, tools, images, videos, audios, **input_kwargs)
+ while True:
+ try:
+ task = asyncio.run_coroutine_threadsafe(generator.__anext__(), self._loop)
+ yield task.result()
+ except StopAsyncIteration:
+ break
+
+ async def astream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncGenerator[str, None]:
+ r"""Asynchronously get the response token-by-token of the chat model."""
+ async for new_token in self.engine.stream_chat(
+ messages, system, tools, images, videos, audios, **input_kwargs
+ ):
+ yield new_token
+
+ def get_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ r"""Get a list of scores of the reward model."""
+ task = asyncio.run_coroutine_threadsafe(self.aget_scores(batch_input, **input_kwargs), self._loop)
+ return task.result()
+
+ async def aget_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ r"""Asynchronously get a list of scores of the reward model."""
+ return await self.engine.get_scores(batch_input, **input_kwargs)
+
+
+def run_chat() -> None:
+ if os.name != "nt":
+ try:
+ import readline # noqa: F401
+ except ImportError:
+ print("Install `readline` for a better experience.")
+
+ chat_model = ChatModel()
+ messages = []
+    print("Welcome to the CLI application. Use `clear` to remove the history and `exit` to quit.")
+
+ while True:
+ try:
+ query = input("\nUser: ")
+ except UnicodeDecodeError:
+            print("Detected a decoding error in the input; please set the terminal encoding to utf-8.")
+ continue
+ except Exception:
+ raise
+
+ if query.strip() == "exit":
+ break
+
+ if query.strip() == "clear":
+ messages = []
+ torch_gc()
+ print("History has been removed.")
+ continue
+
+ messages.append({"role": "user", "content": query})
+ print("Assistant: ", end="", flush=True)
+
+ response = ""
+ for new_text in chat_model.stream_chat(messages):
+ print(new_text, end="", flush=True)
+ response += new_text
+ print()
+ messages.append({"role": "assistant", "content": response})
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/hf_engine.py b/post-training/LLaMA-Factory/src/llamafactory/chat/hf_engine.py
new file mode 100644
index 0000000..20a3c19
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/hf_engine.py
@@ -0,0 +1,413 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import asyncio
+import os
+from collections.abc import AsyncGenerator
+from threading import Thread
+from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+
+import torch
+from transformers import GenerationConfig, TextIteratorStreamer
+from typing_extensions import override
+
+from ..data import get_template_and_fix_tokenizer
+from ..extras import logging
+from ..extras.constants import AUDIO_PLACEHOLDER, IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER, EngineName
+from ..model import load_model, load_tokenizer
+from .base_engine import BaseEngine, Response
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, PreTrainedTokenizer, ProcessorMixin
+ from trl import PreTrainedModelWrapper
+
+ from ..data import Template
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+ from ..hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class HuggingfaceEngine(BaseEngine):
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ ) -> None:
+ self.name = EngineName.HF
+ self.can_generate = finetuning_args.stage == "sft"
+ tokenizer_module = load_tokenizer(model_args)
+ self.tokenizer = tokenizer_module["tokenizer"]
+ self.processor = tokenizer_module["processor"]
+ self.tokenizer.padding_side = "left" if self.can_generate else "right"
+ self.template = get_template_and_fix_tokenizer(self.tokenizer, data_args)
+ self.model = load_model(
+ self.tokenizer, model_args, finetuning_args, is_trainable=False, add_valuehead=(not self.can_generate)
+        )  # must be loaded after the tokenizer is fixed so that the vocabulary is resized correctly
+ self.generating_args = generating_args.to_dict()
+ try:
+ asyncio.get_event_loop()
+ except RuntimeError:
+ logger.warning_rank0_once("There is no current event loop, creating a new one.")
+ loop = asyncio.new_event_loop()
+ asyncio.set_event_loop(loop)
+
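+        # Limit how many generate calls may run on the shared model at once;
+        # MAX_CONCURRENT defaults to 1, i.e. requests are served one at a time.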
+ self.semaphore = asyncio.Semaphore(int(os.getenv("MAX_CONCURRENT", "1")))
+
+ @staticmethod
+ def _process_args(
+ model: "PreTrainedModel",
+ tokenizer: "PreTrainedTokenizer",
+ processor: Optional["ProcessorMixin"],
+ template: "Template",
+ generating_args: dict[str, Any],
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ input_kwargs: Optional[dict[str, Any]] = {},
+ ) -> tuple[dict[str, Any], int]:
+ mm_input_dict = {"images": [], "videos": [], "audios": [], "imglens": [0], "vidlens": [0], "audlens": [0]}
+ if images is not None:
+ mm_input_dict.update({"images": images, "imglens": [len(images)]})
+ if not any(IMAGE_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = IMAGE_PLACEHOLDER * len(images) + messages[0]["content"]
+
+ if videos is not None:
+ mm_input_dict.update({"videos": videos, "vidlens": [len(videos)]})
+ if not any(VIDEO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = VIDEO_PLACEHOLDER * len(videos) + messages[0]["content"]
+
+ if audios is not None:
+ mm_input_dict.update({"audios": audios, "audlens": [len(audios)]})
+ if not any(AUDIO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = AUDIO_PLACEHOLDER * len(audios) + messages[0]["content"]
+
+ messages = template.mm_plugin.process_messages(
+ messages, mm_input_dict["images"], mm_input_dict["videos"], mm_input_dict["audios"], processor
+ )
+ paired_messages = messages + [{"role": "assistant", "content": ""}]
+ system = system or generating_args["default_system"]
+ prompt_ids, _ = template.encode_oneturn(tokenizer, paired_messages, system, tools)
+ prompt_ids, _ = template.mm_plugin.process_token_ids(
+ prompt_ids,
+ None,
+ mm_input_dict["images"],
+ mm_input_dict["videos"],
+ mm_input_dict["audios"],
+ tokenizer,
+ processor,
+ )
+ prompt_length = len(prompt_ids)
+ inputs = torch.tensor([prompt_ids], device=model.device)
+ attention_mask = torch.ones_like(inputs, dtype=torch.bool)
+
+ do_sample: Optional[bool] = input_kwargs.pop("do_sample", None)
+ temperature: Optional[float] = input_kwargs.pop("temperature", None)
+ top_p: Optional[float] = input_kwargs.pop("top_p", None)
+ top_k: Optional[float] = input_kwargs.pop("top_k", None)
+ num_return_sequences: int = input_kwargs.pop("num_return_sequences", 1)
+ repetition_penalty: Optional[float] = input_kwargs.pop("repetition_penalty", None)
+ length_penalty: Optional[float] = input_kwargs.pop("length_penalty", None)
+ skip_special_tokens: Optional[bool] = input_kwargs.pop("skip_special_tokens", None)
+ max_length: Optional[int] = input_kwargs.pop("max_length", None)
+ max_new_tokens: Optional[int] = input_kwargs.pop("max_new_tokens", None)
+ stop: Optional[Union[str, list[str]]] = input_kwargs.pop("stop", None)
+
+ if stop is not None:
+ logger.warning_rank0("Stop parameter is not supported by the huggingface engine yet.")
+
+ generating_args = generating_args.copy()
+ generating_args.update(
+ dict(
+ do_sample=do_sample if do_sample is not None else generating_args["do_sample"],
+ temperature=temperature if temperature is not None else generating_args["temperature"],
+ top_p=top_p if top_p is not None else generating_args["top_p"],
+ top_k=top_k if top_k is not None else generating_args["top_k"],
+ num_return_sequences=num_return_sequences,
+ repetition_penalty=repetition_penalty
+ if repetition_penalty is not None
+ else generating_args["repetition_penalty"],
+ length_penalty=length_penalty if length_penalty is not None else generating_args["length_penalty"],
+ skip_special_tokens=skip_special_tokens
+ if skip_special_tokens is not None
+ else generating_args["skip_special_tokens"],
+ eos_token_id=template.get_stop_token_ids(tokenizer),
+ pad_token_id=tokenizer.pad_token_id,
+ )
+ )
+
+ if isinstance(num_return_sequences, int) and num_return_sequences > 1: # do_sample needs temperature > 0
+ generating_args["do_sample"] = True
+ generating_args["temperature"] = generating_args["temperature"] or 1.0
+
+ if not generating_args["temperature"]:
+ generating_args["do_sample"] = False
+
+ if not generating_args["do_sample"]:
+ generating_args.pop("temperature", None)
+ generating_args.pop("top_p", None)
+
+ if max_length:
+ generating_args.pop("max_new_tokens", None)
+ generating_args["max_length"] = max_length
+
+ if max_new_tokens:
+ generating_args.pop("max_length", None)
+ generating_args["max_new_tokens"] = max_new_tokens
+
+ gen_kwargs = dict(
+ inputs=inputs,
+ attention_mask=attention_mask,
+ generation_config=GenerationConfig(**generating_args),
+ )
+
+ mm_inputs = template.mm_plugin.get_mm_inputs(**mm_input_dict, batch_ids=[prompt_ids], processor=processor)
+ for key, value in mm_inputs.items():
+ if isinstance(value, list) and isinstance(value[0], torch.Tensor): # for pixtral inputs
+ value = torch.stack(value) # assume they have same sizes
+ elif (
+ isinstance(value, list) and isinstance(value[0], list) and isinstance(value[0][0], torch.Tensor)
+ ): # for minicpmv inputs
+ value = torch.stack([torch.stack(v) for v in value])
+ elif not isinstance(value, torch.Tensor):
+ value = torch.tensor(value)
+
+ if torch.is_floating_point(value): # cast data dtype for paligemma
+ value = value.to(model.dtype)
+
+ if key == "second_per_grid_ts": # qwen2.5vl special case
+ gen_kwargs[key] = value.tolist()
+ else:
+ gen_kwargs[key] = value.to(model.device)
+
+ if getattr(model.config, "model_type", None) in ["minicpmv", "minicpmo"]:
+ gen_kwargs["input_ids"] = inputs
+ gen_kwargs["tokenizer"] = tokenizer
+ if "audio_feature_lens" in mm_inputs:
+ gen_kwargs["audio_feature_lens"] = mm_inputs["audio_feature_lens"]
+
+ gen_kwargs.pop("image_sizes", None)
+
+ return gen_kwargs, prompt_length
+
+ @staticmethod
+ @torch.inference_mode()
+ def _chat(
+ model: "PreTrainedModel",
+ tokenizer: "PreTrainedTokenizer",
+ processor: Optional["ProcessorMixin"],
+ template: "Template",
+ generating_args: dict[str, Any],
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ input_kwargs: Optional[dict[str, Any]] = {},
+ ) -> list["Response"]:
+ gen_kwargs, prompt_length = HuggingfaceEngine._process_args(
+ model,
+ tokenizer,
+ processor,
+ template,
+ generating_args,
+ messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ input_kwargs,
+ )
+ generate_output = model.generate(**gen_kwargs)
+ if isinstance(generate_output, tuple):
+ generate_output = generate_output[1][0] # post-process the minicpm_o output
+
+ response_ids = generate_output[:, prompt_length:]
+ response = tokenizer.batch_decode(
+ response_ids,
+ skip_special_tokens=getattr(gen_kwargs["generation_config"], "skip_special_tokens", True),
+ clean_up_tokenization_spaces=True,
+ )
+ results = []
+ for i in range(len(response)):
+ eos_index = (response_ids[i] == tokenizer.eos_token_id).nonzero()
+ response_length = (eos_index[0].item() + 1) if len(eos_index) else len(response_ids[i])
+ results.append(
+ Response(
+ response_text=response[i],
+ response_length=response_length,
+ prompt_length=prompt_length,
+ finish_reason="stop" if len(eos_index) else "length",
+ )
+ )
+
+ return results
+
+ @staticmethod
+ @torch.inference_mode()
+ def _stream_chat(
+ model: "PreTrainedModel",
+ tokenizer: "PreTrainedTokenizer",
+ processor: Optional["ProcessorMixin"],
+ template: "Template",
+ generating_args: dict[str, Any],
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ input_kwargs: Optional[dict[str, Any]] = {},
+ ) -> Callable[[], str]:
+ gen_kwargs, _ = HuggingfaceEngine._process_args(
+ model,
+ tokenizer,
+ processor,
+ template,
+ generating_args,
+ messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ input_kwargs,
+ )
+ streamer = TextIteratorStreamer(
+ tokenizer,
+ skip_prompt=True,
+ skip_special_tokens=getattr(gen_kwargs["generation_config"], "skip_special_tokens", True),
+ )
+ gen_kwargs["streamer"] = streamer
+ thread = Thread(target=model.generate, kwargs=gen_kwargs, daemon=True)
+ thread.start()
+
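+        # The streamer is a blocking iterator, so expose a small callable that fetches one
+        # chunk per call; StopIteration is translated to StopAsyncIteration so the async
+        # stream_chat() loop, which polls this callable via asyncio.to_thread(), can stop cleanly.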
+ def stream():
+ try:
+ return streamer.__next__()
+ except StopIteration:
+ raise StopAsyncIteration()
+
+ return stream
+
+ @staticmethod
+ @torch.inference_mode()
+ def _get_scores(
+ model: "PreTrainedModelWrapper",
+ tokenizer: "PreTrainedTokenizer",
+ batch_input: list[str],
+ input_kwargs: Optional[dict[str, Any]] = {},
+ ) -> list[float]:
+ max_length: Optional[int] = input_kwargs.pop("max_length", None)
+ device = getattr(model.pretrained_model, "device", "cuda")
+ inputs: dict[str, torch.Tensor] = tokenizer(
+ batch_input,
+ padding=True,
+ truncation=True,
+ max_length=max_length or getattr(model.config, "max_position_embeddings", 1024),
+ return_tensors="pt",
+ add_special_tokens=False,
+ ).to(device)
+ values: torch.Tensor = model(**inputs, return_dict=True, use_cache=False)[-1]
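+        # The value head emits one score per token; take the score at the last
+        # non-padding position of each sequence (index = attention_mask.sum(-1) - 1).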
+ scores = values.gather(dim=-1, index=(inputs["attention_mask"].sum(dim=-1, keepdim=True) - 1))
+ return scores
+
+ @override
+ async def chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
+ if not self.can_generate:
+ raise ValueError("The current model does not support `chat`.")
+
+ input_args = (
+ self.model,
+ self.tokenizer,
+ self.processor,
+ self.template,
+ self.generating_args,
+ messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ input_kwargs,
+ )
+ async with self.semaphore:
+ return await asyncio.to_thread(self._chat, *input_args)
+
+ @override
+ async def stream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncGenerator[str, None]:
+ if not self.can_generate:
+ raise ValueError("The current model does not support `stream_chat`.")
+
+ input_args = (
+ self.model,
+ self.tokenizer,
+ self.processor,
+ self.template,
+ self.generating_args,
+ messages,
+ system,
+ tools,
+ images,
+ videos,
+ audios,
+ input_kwargs,
+ )
+ async with self.semaphore:
+ stream = self._stream_chat(*input_args)
+ while True:
+ try:
+ yield await asyncio.to_thread(stream)
+ except StopAsyncIteration:
+ break
+
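+        # extract_tool returns a list of parsed tool calls if the response contains any,
+        # otherwise the plain text; the branch below maps both cases to the API schema.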
+ @override
+ async def get_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ if self.can_generate:
+ raise ValueError("Cannot get scores using an auto-regressive model.")
+
+ input_args = (self.model, self.tokenizer, batch_input, input_kwargs)
+ async with self.semaphore:
+ return await asyncio.to_thread(self._get_scores, *input_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/sglang_engine.py b/post-training/LLaMA-Factory/src/llamafactory/chat/sglang_engine.py
new file mode 100644
index 0000000..3fc3aeb
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/sglang_engine.py
@@ -0,0 +1,275 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import asyncio
+import atexit
+import json
+from collections.abc import AsyncGenerator, AsyncIterator, Sequence
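+    # Follow the OpenAI-style streaming convention: an initial empty delta carrying the
+    # assistant role, one chunk per generated piece of text, a final empty delta with the
+    # finish reason, and the literal "[DONE]" sentinel to close the stream.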
+from typing import TYPE_CHECKING, Any, Optional, Union
+
+import requests
+from typing_extensions import override
+
+from ..data import get_template_and_fix_tokenizer
+from ..extras import logging
+from ..extras.constants import AUDIO_PLACEHOLDER, IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER, EngineName
+from ..extras.misc import get_device_count, torch_gc
+from ..extras.packages import is_sglang_available
+from ..hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+from ..model import load_config, load_tokenizer
+from ..model.model_utils.quantization import QuantizationMethod
+from .base_engine import BaseEngine, Response
+
+
+if is_sglang_available():
+ from sglang.utils import launch_server_cmd, terminate_process, wait_for_server # type: ignore
+
+
+if TYPE_CHECKING:
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+
+
+logger = logging.get_logger(__name__)
+
+
+class SGLangEngine(BaseEngine):
+ """Inference engine for SGLang models.
+
+ This class wraps the SGLang engine to provide a consistent interface for text generation
+ that matches LLaMA Factory's requirements. It uses the SGLang HTTP server approach for
+ better interaction and performance. The engine launches a server process and communicates
+ with it via HTTP requests.
+
+ For more details on the SGLang HTTP server approach, see:
+ https://docs.sglang.ai/backend/send_request.html
+ """
+
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ ) -> None:
+ self.name = EngineName.SGLANG
+ self.model_args = model_args
+ config = load_config(model_args) # may download model from ms hub
+ if getattr(config, "quantization_config", None): # gptq models should use float16
+ quantization_config: dict[str, Any] = getattr(config, "quantization_config", None)
+ quant_method = quantization_config.get("quant_method", "")
+ if quant_method == QuantizationMethod.GPTQ and model_args.infer_dtype == "auto":
+ model_args.infer_dtype = "float16"
+
+ self.can_generate = finetuning_args.stage == "sft"
+ tokenizer_module = load_tokenizer(model_args)
+ self.tokenizer = tokenizer_module["tokenizer"]
+ self.processor = tokenizer_module["processor"]
+ self.tokenizer.padding_side = "left"
+ self.template = get_template_and_fix_tokenizer(self.tokenizer, data_args)
+ self.template.mm_plugin.expand_mm_tokens = False # for sglang generate
+ self.generating_args = generating_args.to_dict()
+
+ launch_cmd = [
+ "python3 -m sglang.launch_server",
+ f"--model-path {model_args.model_name_or_path}",
+ f"--dtype {model_args.infer_dtype}",
+ f"--context-length {model_args.sglang_maxlen}",
+ f"--mem-fraction-static {model_args.sglang_mem_fraction}",
+ f"--tp-size {model_args.sglang_tp_size if model_args.sglang_tp_size != -1 else get_device_count() or 1}",
+ f"--download-dir {model_args.cache_dir}",
+ "--log-level error",
+ ]
+ launch_cmd = " ".join(launch_cmd)
+ logger.info_rank0(f"Starting SGLang server with command: {launch_cmd}")
+ try:
+ torch_gc()
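+            # launch_server_cmd starts the SGLang server as a subprocess and returns
+            # the process handle together with the port it bound to.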
+ self.server_process, port = launch_server_cmd(launch_cmd)
+ self.base_url = f"http://localhost:{port}"
+ atexit.register(self._cleanup_server)
+
+ logger.info_rank0(f"Waiting for SGLang server to be ready at {self.base_url}")
+ wait_for_server(self.base_url, timeout=300)
+ logger.info_rank0(f"SGLang server initialized successfully at {self.base_url}")
+ try:
+ response = requests.get(f"{self.base_url}/get_model_info", timeout=5)
+ if response.status_code == 200:
+ model_info = response.json()
+ logger.info(f"SGLang server model info: {model_info}")
+ except Exception as e:
+ logger.debug(f"Note: could not get model info: {str(e)}")
+
+ except Exception as e:
+ logger.error(f"Failed to start SGLang server: {str(e)}")
+ self._cleanup_server() # make sure to clean up any started process
+ raise RuntimeError(f"SGLang server initialization failed: {str(e)}.")
+
+ def _cleanup_server(self):
+ r"""Clean up the server process when the engine is destroyed."""
+ if hasattr(self, "server_process") and self.server_process:
+ try:
+ logger.info("Terminating SGLang server process")
+ terminate_process(self.server_process)
+ logger.info("SGLang server process terminated")
+ except Exception as e:
+ logger.warning(f"Error terminating SGLang server: {str(e)}")
+
+ async def _generate(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncIterator[dict[str, Any]]:
+ if images is not None and not any(IMAGE_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = IMAGE_PLACEHOLDER * len(images) + messages[0]["content"]
+
+ if videos is not None and not any(VIDEO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = VIDEO_PLACEHOLDER * len(videos) + messages[0]["content"]
+
+ if audios is not None and not any(AUDIO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = AUDIO_PLACEHOLDER * len(audios) + messages[0]["content"]
+
+ messages = self.template.mm_plugin.process_messages(
+ messages, images or [], videos or [], audios or [], self.processor
+ )
+ paired_messages = messages + [{"role": "assistant", "content": ""}]
+ system = system or self.generating_args["default_system"]
+ prompt_ids, _ = self.template.encode_oneturn(self.tokenizer, paired_messages, system, tools)
+ prompt_length = len(prompt_ids)
+
+ temperature: Optional[float] = input_kwargs.pop("temperature", None)
+ top_p: Optional[float] = input_kwargs.pop("top_p", None)
+ top_k: Optional[float] = input_kwargs.pop("top_k", None)
+ num_return_sequences: int = input_kwargs.pop("num_return_sequences", 1)
+ repetition_penalty: Optional[float] = input_kwargs.pop("repetition_penalty", None)
+ skip_special_tokens: Optional[bool] = input_kwargs.pop("skip_special_tokens", None)
+ max_length: Optional[int] = input_kwargs.pop("max_length", None)
+ max_new_tokens: Optional[int] = input_kwargs.pop("max_new_tokens", None)
+ stop: Optional[Union[str, list[str]]] = input_kwargs.pop("stop", None)
+
+ if num_return_sequences != 1:
+ raise NotImplementedError("SGLang only supports n=1.")
+
+ if "max_new_tokens" in self.generating_args:
+ max_tokens = self.generating_args["max_new_tokens"]
+ elif "max_length" in self.generating_args:
+ if self.generating_args["max_length"] > prompt_length:
+ max_tokens = self.generating_args["max_length"] - prompt_length
+ else:
+ max_tokens = 1
+
+ if max_length:
+ max_tokens = max_length - prompt_length if max_length > prompt_length else 1
+
+ if max_new_tokens:
+ max_tokens = max_new_tokens
+
+ sampling_params = {
+ "temperature": temperature if temperature is not None else self.generating_args["temperature"],
+            "top_p": (top_p if top_p is not None else self.generating_args["top_p"]) or 1.0,  # top_p must be > 0
+            "top_k": (top_k if top_k is not None else self.generating_args["top_k"]) or -1,  # top_k must be > 0, -1 disables it
+ "stop": stop,
+ "stop_token_ids": self.template.get_stop_token_ids(self.tokenizer),
+ "max_new_tokens": max_tokens,
+ "repetition_penalty": (
+ repetition_penalty if repetition_penalty is not None else self.generating_args["repetition_penalty"]
+ )
+            or 1.0,  # repetition_penalty must be > 0
+ "skip_special_tokens": skip_special_tokens
+ if skip_special_tokens is not None
+ else self.generating_args["skip_special_tokens"],
+ }
+
+ def stream_request():
+ json_data = {
+ "input_ids": prompt_ids,
+ "sampling_params": sampling_params,
+ "stream": True,
+ }
+ response = requests.post(f"{self.base_url}/generate", json=json_data, stream=True)
+ if response.status_code != 200:
+ raise RuntimeError(f"SGLang server error: {response.status_code}, {response.text}")
+
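+            # The server streams Server-Sent Events: each payload line looks like
+            # "data: {...}" and the stream is terminated by "data: [DONE]".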
+ for chunk in response.iter_lines(decode_unicode=False):
+ chunk = str(chunk.decode("utf-8"))
+ if chunk == "data: [DONE]":
+ break
+
+ if chunk and chunk.startswith("data:"):
+ yield json.loads(chunk[5:].strip("\n"))
+
+ return await asyncio.to_thread(stream_request)
+
+ @override
+ async def chat(
+ self,
+ messages: Sequence[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[Sequence["ImageInput"]] = None,
+ videos: Optional[Sequence["VideoInput"]] = None,
+ audios: Optional[Sequence["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
+ final_output = None
+ generator = await self._generate(messages, system, tools, images, videos, audios, **input_kwargs)
+ for request_output in generator:
+ final_output = request_output
+
+ results = [
+ Response(
+ response_text=final_output["text"],
+ response_length=final_output["meta_info"]["completion_tokens"],
+ prompt_length=final_output["meta_info"]["prompt_tokens"],
+ finish_reason="stop" if final_output["meta_info"]["finish_reason"] == "stop" else "length",
+ )
+ ]
+ return results
+
+ @override
+ async def stream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncGenerator[str, None]:
+ generated_text = ""
+ generator = await self._generate(messages, system, tools, images, videos, audios, **input_kwargs)
+ for result in generator:
+ delta_text = result["text"][len(generated_text) :]
+ generated_text = result["text"]
+ yield delta_text
+
+ @override
+ async def get_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ raise NotImplementedError("SGLang engine does not support `get_scores`.")
+
+ def __del__(self):
+ r"""Ensure server is cleaned up when object is deleted."""
+ self._cleanup_server()
+ try:
+ atexit.unregister(self._cleanup_server)
+ except Exception:
+ pass
diff --git a/post-training/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py b/post-training/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py
new file mode 100644
index 0000000..1100fc8
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py
@@ -0,0 +1,264 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import uuid
+from collections.abc import AsyncGenerator, AsyncIterator
+from typing import TYPE_CHECKING, Any, Optional, Union
+
+from typing_extensions import override
+
+from ..data import get_template_and_fix_tokenizer
+from ..extras import logging
+from ..extras.constants import AUDIO_PLACEHOLDER, IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER, EngineName
+from ..extras.misc import get_device_count
+from ..extras.packages import is_vllm_available
+from ..model import load_config, load_tokenizer
+from ..model.model_utils.quantization import QuantizationMethod
+from ..model.model_utils.visual import LlavaMultiModalProjectorForYiVLForVLLM
+from .base_engine import BaseEngine, Response
+
+
+if is_vllm_available():
+ from vllm import AsyncEngineArgs, AsyncLLMEngine, RequestOutput, SamplingParams
+ from vllm.lora.request import LoRARequest
+
+
+if TYPE_CHECKING:
+ from ..data.mm_plugin import AudioInput, ImageInput, VideoInput
+ from ..hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class VllmEngine(BaseEngine):
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ ) -> None:
+ self.name = EngineName.VLLM
+ self.model_args = model_args
+ config = load_config(model_args) # may download model from ms hub
+ if getattr(config, "quantization_config", None): # gptq models should use float16
+ quantization_config: dict[str, Any] = getattr(config, "quantization_config", None)
+ quant_method = quantization_config.get("quant_method", "")
+ if quant_method == QuantizationMethod.GPTQ and model_args.infer_dtype == "auto":
+ model_args.infer_dtype = "float16"
+
+ self.can_generate = finetuning_args.stage == "sft"
+ tokenizer_module = load_tokenizer(model_args)
+ self.tokenizer = tokenizer_module["tokenizer"]
+ self.processor = tokenizer_module["processor"]
+ self.tokenizer.padding_side = "left"
+ self.template = get_template_and_fix_tokenizer(self.tokenizer, data_args)
+ self.template.mm_plugin.expand_mm_tokens = False # for vllm generate
+ self.generating_args = generating_args.to_dict()
+
+ engine_args = {
+ "model": model_args.model_name_or_path,
+ "trust_remote_code": model_args.trust_remote_code,
+ "download_dir": model_args.cache_dir,
+ "dtype": model_args.infer_dtype,
+ "max_model_len": model_args.vllm_maxlen,
+ "tensor_parallel_size": get_device_count() or 1,
+ "gpu_memory_utilization": model_args.vllm_gpu_util,
+ "disable_log_stats": True,
+ "disable_log_requests": True,
+ "enforce_eager": model_args.vllm_enforce_eager,
+ "enable_lora": model_args.adapter_name_or_path is not None,
+ "max_lora_rank": model_args.vllm_max_lora_rank,
+ }
+ if self.template.mm_plugin.__class__.__name__ != "BasePlugin":
+ engine_args["limit_mm_per_prompt"] = {"image": 4, "video": 2, "audio": 2}
+
+ if isinstance(model_args.vllm_config, dict):
+ engine_args.update(model_args.vllm_config)
+
+ if getattr(config, "is_yi_vl_derived_model", None):
+ import vllm.model_executor.models.llava
+
+ logger.info_rank0("Detected Yi-VL model, applying projector patch.")
+ vllm.model_executor.models.llava.LlavaMultiModalProjector = LlavaMultiModalProjectorForYiVLForVLLM
+
+ self.model = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**engine_args))
+ if model_args.adapter_name_or_path is not None:
+ self.lora_request = LoRARequest("default", 1, model_args.adapter_name_or_path[0])
+ else:
+ self.lora_request = None
+
+ async def _generate(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncIterator["RequestOutput"]:
+ request_id = f"chatcmpl-{uuid.uuid4().hex}"
+ if images is not None and not any(IMAGE_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = IMAGE_PLACEHOLDER * len(images) + messages[0]["content"]
+
+ if videos is not None and not any(VIDEO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = VIDEO_PLACEHOLDER * len(videos) + messages[0]["content"]
+
+ if audios is not None and not any(AUDIO_PLACEHOLDER in message["content"] for message in messages):
+ messages[0]["content"] = AUDIO_PLACEHOLDER * len(audios) + messages[0]["content"]
+
+ messages = self.template.mm_plugin.process_messages(
+ messages, images or [], videos or [], audios or [], self.processor
+ )
+ paired_messages = messages + [{"role": "assistant", "content": ""}]
+ system = system or self.generating_args["default_system"]
+ prompt_ids, _ = self.template.encode_oneturn(self.tokenizer, paired_messages, system, tools)
+ prompt_length = len(prompt_ids)
+
+ temperature: Optional[float] = input_kwargs.pop("temperature", None)
+ top_p: Optional[float] = input_kwargs.pop("top_p", None)
+ top_k: Optional[float] = input_kwargs.pop("top_k", None)
+ num_return_sequences: int = input_kwargs.pop("num_return_sequences", 1)
+ repetition_penalty: Optional[float] = input_kwargs.pop("repetition_penalty", None)
+ length_penalty: Optional[float] = input_kwargs.pop("length_penalty", None)
+ skip_special_tokens: Optional[bool] = input_kwargs.pop("skip_special_tokens", None)
+ max_length: Optional[int] = input_kwargs.pop("max_length", None)
+ max_new_tokens: Optional[int] = input_kwargs.pop("max_new_tokens", None)
+ stop: Optional[Union[str, list[str]]] = input_kwargs.pop("stop", None)
+
+ if length_penalty is not None:
+ logger.warning_rank0("Length penalty is not supported by the vllm engine yet.")
+
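+        # Resolve the response token budget: start from the defaults in generating_args,
+        # then let a per-request max_length (minus the prompt length, floored at 1) or
+        # max_new_tokens override it.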
+ if "max_new_tokens" in self.generating_args:
+ max_tokens = self.generating_args["max_new_tokens"]
+ elif "max_length" in self.generating_args:
+ if self.generating_args["max_length"] > prompt_length:
+ max_tokens = self.generating_args["max_length"] - prompt_length
+ else:
+ max_tokens = 1
+
+ if max_length:
+ max_tokens = max_length - prompt_length if max_length > prompt_length else 1
+
+ if max_new_tokens:
+ max_tokens = max_new_tokens
+
+ sampling_params = SamplingParams(
+ n=num_return_sequences,
+ repetition_penalty=(
+ repetition_penalty if repetition_penalty is not None else self.generating_args["repetition_penalty"]
+ )
+            or 1.0,  # repetition_penalty must be > 0
+ temperature=temperature if temperature is not None else self.generating_args["temperature"],
+            top_p=(top_p if top_p is not None else self.generating_args["top_p"]) or 1.0,  # top_p must be > 0
+            top_k=(top_k if top_k is not None else self.generating_args["top_k"]) or -1,  # top_k must be > 0, -1 disables it
+ stop=stop,
+ stop_token_ids=self.template.get_stop_token_ids(self.tokenizer),
+ max_tokens=max_tokens,
+ skip_special_tokens=skip_special_tokens
+ if skip_special_tokens is not None
+ else self.generating_args["skip_special_tokens"],
+ )
+
+ if images is not None: # add image features
+ multi_modal_data = {
+ "image": self.template.mm_plugin._regularize_images(
+ images,
+ image_max_pixels=self.model_args.image_max_pixels,
+ image_min_pixels=self.model_args.image_min_pixels,
+ )["images"]
+ }
+ elif videos is not None:
+ multi_modal_data = {
+ "video": self.template.mm_plugin._regularize_videos(
+ videos,
+ image_max_pixels=self.model_args.video_max_pixels,
+ image_min_pixels=self.model_args.video_min_pixels,
+ video_fps=self.model_args.video_fps,
+ video_maxlen=self.model_args.video_maxlen,
+ )["videos"]
+ }
+ elif audios is not None:
+ audio_data = self.template.mm_plugin._regularize_audios(
+ audios,
+ sampling_rate=self.model_args.audio_sampling_rate,
+ )
+ multi_modal_data = {"audio": zip(audio_data["audios"], audio_data["sampling_rates"])}
+ else:
+ multi_modal_data = None
+
+ result_generator = self.model.generate(
+ {"prompt_token_ids": prompt_ids, "multi_modal_data": multi_modal_data},
+ sampling_params=sampling_params,
+ request_id=request_id,
+ lora_request=self.lora_request,
+ )
+ return result_generator
+
+ @override
+ async def chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> list["Response"]:
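+        # Drain the async request generator, then wrap the final vLLM outputs as Response objects.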
+ final_output = None
+ generator = await self._generate(messages, system, tools, images, videos, audios, **input_kwargs)
+ async for request_output in generator:
+ final_output = request_output
+
+ results = []
+ for output in final_output.outputs:
+ results.append(
+ Response(
+ response_text=output.text,
+ response_length=len(output.token_ids),
+ prompt_length=len(final_output.prompt_token_ids),
+ finish_reason=output.finish_reason,
+ )
+ )
+
+ return results
+
+ @override
+ async def stream_chat(
+ self,
+ messages: list[dict[str, str]],
+ system: Optional[str] = None,
+ tools: Optional[str] = None,
+ images: Optional[list["ImageInput"]] = None,
+ videos: Optional[list["VideoInput"]] = None,
+ audios: Optional[list["AudioInput"]] = None,
+ **input_kwargs,
+ ) -> AsyncGenerator[str, None]:
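+        # Yield only the newly generated suffix of each partial result so callers receive text deltas.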
+ generated_text = ""
+ generator = await self._generate(messages, system, tools, images, videos, audios, **input_kwargs)
+ async for result in generator:
+ delta_text = result.outputs[0].text[len(generated_text) :]
+ generated_text = result.outputs[0].text
+ yield delta_text
+
+ @override
+ async def get_scores(
+ self,
+ batch_input: list[str],
+ **input_kwargs,
+ ) -> list[float]:
+ raise NotImplementedError("vLLM engine does not support `get_scores`.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/cli.py b/post-training/LLaMA-Factory/src/llamafactory/cli.py
new file mode 100644
index 0000000..f9c32d4
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/cli.py
@@ -0,0 +1,124 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import subprocess
+import sys
+from copy import deepcopy
+from functools import partial
+
+
+USAGE = (
+ "-" * 70
+ + "\n"
+ + "| Usage: |\n"
+ + "| llamafactory-cli api -h: launch an OpenAI-style API server |\n"
+ + "| llamafactory-cli chat -h: launch a chat interface in CLI |\n"
+ + "| llamafactory-cli eval -h: evaluate models |\n"
+ + "| llamafactory-cli export -h: merge LoRA adapters and export model |\n"
+ + "| llamafactory-cli train -h: train models |\n"
+ + "| llamafactory-cli webchat -h: launch a chat interface in Web UI |\n"
+ + "| llamafactory-cli webui: launch LlamaBoard |\n"
+ + "| llamafactory-cli version: show version info |\n"
+ + "-" * 70
+)
+
+
+def main():
+ from . import launcher
+ from .api.app import run_api
+ from .chat.chat_model import run_chat
+ from .eval.evaluator import run_eval
+ from .extras import logging
+ from .extras.env import VERSION, print_env
+ from .extras.misc import find_available_port, get_device_count, is_env_enabled, use_ray
+ from .train.tuner import export_model, run_exp
+ from .webui.interface import run_web_demo, run_web_ui
+
+ logger = logging.get_logger(__name__)
+
+ WELCOME = (
+ "-" * 58
+ + "\n"
+ + f"| Welcome to LLaMA Factory, version {VERSION}"
+ + " " * (21 - len(VERSION))
+ + "|\n|"
+ + " " * 56
+ + "|\n"
+ + "| Project page: https://github.com/hiyouga/LLaMA-Factory |\n"
+ + "-" * 58
+ )
+
+ COMMAND_MAP = {
+ "api": run_api,
+ "chat": run_chat,
+ "env": print_env,
+ "eval": run_eval,
+ "export": export_model,
+ "train": run_exp,
+ "webchat": run_web_demo,
+ "webui": run_web_ui,
+ "version": partial(print, WELCOME),
+ "help": partial(print, USAGE),
+ }
+
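+    # Dispatch on the first CLI argument; `train` is re-launched under torchrun when FORCE_TORCHRUN
+    # is set, or when multiple devices are visible and Ray is not used.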
+    command = sys.argv.pop(1) if len(sys.argv) > 1 else "help"
+ if command == "train" and (is_env_enabled("FORCE_TORCHRUN") or (get_device_count() > 1 and not use_ray())):
+ # launch distributed training
+ nnodes = os.getenv("NNODES", "1")
+ node_rank = os.getenv("NODE_RANK", "0")
+ nproc_per_node = os.getenv("NPROC_PER_NODE", str(get_device_count()))
+ master_addr = os.getenv("MASTER_ADDR", "127.0.0.1")
+ master_port = os.getenv("MASTER_PORT", str(find_available_port()))
+ logger.info_rank0(f"Initializing {nproc_per_node} distributed tasks at: {master_addr}:{master_port}")
+ if int(nnodes) > 1:
+ print(f"Multi-node training enabled: num nodes: {nnodes}, node rank: {node_rank}")
+
+ env = deepcopy(os.environ)
+ if is_env_enabled("OPTIM_TORCH", "1"):
+ # optimize DDP, see https://zhuanlan.zhihu.com/p/671834539
+ env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
+ env["TORCH_NCCL_AVOID_RECORD_STREAMS"] = "1"
+
+ # NOTE: DO NOT USE shell=True to avoid security risk
+ process = subprocess.run(
+ (
+ "torchrun --nnodes {nnodes} --node_rank {node_rank} --nproc_per_node {nproc_per_node} "
+ "--master_addr {master_addr} --master_port {master_port} {file_name} {args}"
+ )
+ .format(
+ nnodes=nnodes,
+ node_rank=node_rank,
+ nproc_per_node=nproc_per_node,
+ master_addr=master_addr,
+ master_port=master_port,
+ file_name=launcher.__file__,
+ args=" ".join(sys.argv[1:]),
+ )
+ .split(),
+ env=env,
+ check=True,
+ )
+ sys.exit(process.returncode)
+ elif command in COMMAND_MAP:
+ COMMAND_MAP[command]()
+ else:
+ print(f"Unknown command: {command}.\n{USAGE}")
+
+
+if __name__ == "__main__":
+ from multiprocessing import freeze_support
+
+ freeze_support()
+ main()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/eval/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/eval/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/eval/evaluator.py b/post-training/LLaMA-Factory/src/llamafactory/eval/evaluator.py
new file mode 100644
index 0000000..7729c59
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/eval/evaluator.py
@@ -0,0 +1,158 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# This code is inspired by Dan Hendrycks' test library.
+# https://github.com/hendrycks/test/blob/master/evaluate_flan.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# MIT License
+#
+# Copyright (c) 2020 Dan Hendrycks
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import json
+import os
+from typing import TYPE_CHECKING, Any, Optional
+
+import numpy as np
+import torch
+from datasets import load_dataset
+from tqdm import tqdm, trange
+from transformers.utils import cached_file
+
+from ..data import get_template_and_fix_tokenizer
+from ..extras.constants import CHOICES, SUBJECTS
+from ..hparams import get_eval_args
+from ..model import load_model, load_tokenizer
+from .template import get_eval_template
+
+
+if TYPE_CHECKING:
+ from numpy.typing import NDArray
+
+
+class Evaluator:
+ def __init__(self, args: Optional[dict[str, Any]] = None) -> None:
+ self.model_args, self.data_args, self.eval_args, finetuning_args = get_eval_args(args)
+ self.tokenizer = load_tokenizer(self.model_args)["tokenizer"]
+ self.tokenizer.padding_side = "right" # avoid overflow issue in batched inference for llama2
+ self.template = get_template_and_fix_tokenizer(self.tokenizer, self.data_args)
+ self.model = load_model(self.tokenizer, self.model_args, finetuning_args)
+ self.eval_template = get_eval_template(self.eval_args.lang)
+ self.choice_inputs = [self.tokenizer.encode(ch, add_special_tokens=False)[-1] for ch in CHOICES]
+
+ @torch.inference_mode()
+ def batch_inference(self, batch_input: dict[str, "torch.Tensor"]) -> list[str]:
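+        # Read the logits at each sequence's last non-padding position and pick the most likely
+        # answer letter among the A/B/C/D candidate token ids.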
+ logits = self.model(**batch_input).logits
+ lengths = torch.sum(batch_input["attention_mask"], dim=-1)
+ word_probs = torch.stack([logits[i, lengths[i] - 1] for i in range(len(lengths))], dim=0)
+ choice_probs = torch.nn.functional.softmax(word_probs[:, self.choice_inputs], dim=-1).detach()
+ return [chr(ord("A") + offset.item()) for offset in torch.argmax(choice_probs, dim=-1)]
+
+ def eval(self) -> None:
+ eval_task = self.eval_args.task.split("_")[0]
+ eval_split = self.eval_args.task.split("_")[1]
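+        # Task names follow the `<dataset>_<split>` convention, e.g. `mmlu_test`.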
+
+ mapping = cached_file(
+ path_or_repo_id=os.path.join(self.eval_args.task_dir, eval_task),
+ filename="mapping.json",
+ cache_dir=self.model_args.cache_dir,
+ token=self.model_args.hf_hub_token,
+ )
+
+ with open(mapping, encoding="utf-8") as f:
+ categorys: dict[str, dict[str, str]] = json.load(f)
+
+ category_corrects = {subj: np.array([], dtype="bool") for subj in SUBJECTS}
+ pbar = tqdm(categorys.keys(), desc="Processing subjects", position=0)
+ results = {}
+ for subject in pbar:
+ dataset = load_dataset(
+ path=os.path.join(self.eval_args.task_dir, eval_task),
+ name=subject,
+ cache_dir=self.model_args.cache_dir,
+ download_mode=self.eval_args.download_mode,
+ token=self.model_args.hf_hub_token,
+ trust_remote_code=self.model_args.trust_remote_code,
+ )
+ pbar.set_postfix_str(categorys[subject]["name"])
+ inputs, outputs, labels = [], [], []
+ for i in trange(len(dataset[eval_split]), desc="Formatting batches", position=1, leave=False):
+ support_set = (
+ dataset["train"].shuffle().select(range(min(self.eval_args.n_shot, len(dataset["train"]))))
+ )
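+                # Few-shot prompting: the sampled training examples are prepended as prior chat turns
+                # before the target question.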
+ messages = self.eval_template.format_example(
+ target_data=dataset[eval_split][i],
+ support_set=support_set,
+ subject_name=categorys[subject]["name"],
+ )
+
+ input_ids, _ = self.template.encode_oneturn(tokenizer=self.tokenizer, messages=messages)
+ inputs.append({"input_ids": input_ids, "attention_mask": [1] * len(input_ids)})
+ labels.append(messages[-1]["content"])
+
+ for i in trange(
+ 0, len(inputs), self.eval_args.batch_size, desc="Predicting batches", position=1, leave=False
+ ):
+ batch_input = self.tokenizer.pad(
+ inputs[i : i + self.eval_args.batch_size], return_attention_mask=True, return_tensors="pt"
+ ).to(self.model.device)
+ preds = self.batch_inference(batch_input)
+ outputs += preds
+
+ corrects = np.array(outputs) == np.array(labels)
+ category_name = categorys[subject]["category"]
+ category_corrects[category_name] = np.concatenate([category_corrects[category_name], corrects], axis=0)
+ category_corrects["Average"] = np.concatenate([category_corrects["Average"], corrects], axis=0)
+ results[subject] = {str(i): outputs[i] for i in range(len(outputs))}
+
+ pbar.close()
+ self._save_results(category_corrects, results)
+
+ def _save_results(self, category_corrects: dict[str, "NDArray"], results: dict[str, dict[int, str]]) -> None:
+ score_info = "\n".join(
+ [
+ f"{category_name:>15}: {100 * np.mean(category_correct):.2f}"
+ for category_name, category_correct in category_corrects.items()
+ if len(category_correct)
+ ]
+ )
+ print(score_info)
+ if self.eval_args.save_dir is not None:
+ os.makedirs(self.eval_args.save_dir, exist_ok=False)
+ with open(os.path.join(self.eval_args.save_dir, "results.json"), "w", encoding="utf-8", newline="\n") as f:
+ json.dump(results, f, indent=2)
+
+ with open(os.path.join(self.eval_args.save_dir, "results.log"), "w", encoding="utf-8", newline="\n") as f:
+ f.write(score_info)
+
+
+def run_eval() -> None:
+ Evaluator().eval()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/eval/template.py b/post-training/LLaMA-Factory/src/llamafactory/eval/template.py
new file mode 100644
index 0000000..5742469
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/eval/template.py
@@ -0,0 +1,79 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import dataclass
+
+from ..data import Role
+from ..extras.constants import CHOICES
+
+
+@dataclass
+class EvalTemplate:
+ system: str
+ choice: str
+ answer: str
+
+ def _parse_example(self, example: dict[str, str]) -> tuple[str, str]:
+ r"""Parse eval example.
+
+ input: a dict with keys {"question", "A", "B", "C", "D", "answer"}
+ output: a tuple of (prompt, response).
+ """
+ candidates = [self.choice.format(choice=ch, content=example[ch]) for ch in CHOICES if ch in example]
+ return "".join([example["question"]] + candidates + [self.answer]), example["answer"]
+
+ def format_example(
+ self, target_data: dict[str, str], support_set: list[dict[str, str]], subject_name: str
+ ) -> list[dict[str, str]]:
+ r"""Convert dataset examples to messages."""
+ messages = []
+ for k in range(len(support_set)):
+ prompt, response = self._parse_example(support_set[k])
+ messages.append({"role": Role.USER.value, "content": prompt})
+ messages.append({"role": Role.ASSISTANT.value, "content": response})
+
+ prompt, response = self._parse_example(target_data)
+ messages.append({"role": Role.USER.value, "content": prompt})
+ messages.append({"role": Role.ASSISTANT.value, "content": response})
+ messages[0]["content"] = self.system.format(subject=subject_name) + messages[0]["content"]
+ return messages
+
+
+eval_templates: dict[str, "EvalTemplate"] = {}
+
+
+def _register_eval_template(name: str, system: str, choice: str, answer: str) -> None:
+ eval_templates[name] = EvalTemplate(system=system, choice=choice, answer=answer)
+
+
+def get_eval_template(name: str) -> "EvalTemplate":
+ eval_template = eval_templates.get(name, None)
+ assert eval_template is not None, f"Template {name} does not exist."
+ return eval_template
+
+
+_register_eval_template(
+ name="en",
+ system="The following are multiple choice questions (with answers) about {subject}.\n\n",
+ choice="\n{choice}. {content}",
+ answer="\nAnswer:",
+)
+
+
+_register_eval_template(
+ name="zh",
+ system="以下是中国关于{subject}考试的单项选择题,请选出其中的正确答案。\n\n",
+ choice="\n{choice}. {content}",
+ answer="\n答案:",
+)
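+
+# For reference, a formatted "en" example (zero-shot) looks like:
+#   user:      "The following are multiple choice questions (with answers) about <subject>.\n\n"
+#              "<question>\nA. <content>\nB. <content>\nC. <content>\nD. <content>\nAnswer:"
+#   assistant: "<answer letter>"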
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/extras/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/constants.py b/post-training/LLaMA-Factory/src/llamafactory/extras/constants.py
new file mode 100644
index 0000000..025a8bb
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/constants.py
@@ -0,0 +1,2918 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from collections import OrderedDict, defaultdict
+from enum import Enum, unique
+from typing import Optional
+
+from peft.utils import SAFETENSORS_WEIGHTS_NAME as SAFE_ADAPTER_WEIGHTS_NAME
+from peft.utils import WEIGHTS_NAME as ADAPTER_WEIGHTS_NAME
+from transformers.utils import SAFE_WEIGHTS_INDEX_NAME, SAFE_WEIGHTS_NAME, WEIGHTS_INDEX_NAME, WEIGHTS_NAME
+
+
+AUDIO_PLACEHOLDER = os.getenv("AUDIO_PLACEHOLDER", "<audio>")
+
+CHECKPOINT_NAMES = {
+ SAFE_ADAPTER_WEIGHTS_NAME,
+ ADAPTER_WEIGHTS_NAME,
+ SAFE_WEIGHTS_INDEX_NAME,
+ SAFE_WEIGHTS_NAME,
+ WEIGHTS_INDEX_NAME,
+ WEIGHTS_NAME,
+}
+
+CHOICES = ["A", "B", "C", "D"]
+
+DATA_CONFIG = "dataset_info.json"
+
+DEFAULT_TEMPLATE = defaultdict(str)
+
+FILEEXT2TYPE = {
+ "arrow": "arrow",
+ "csv": "csv",
+ "json": "json",
+ "jsonl": "json",
+ "parquet": "parquet",
+ "txt": "text",
+}
+
+IGNORE_INDEX = -100
+
+IMAGE_PLACEHOLDER = os.getenv("IMAGE_PLACEHOLDER", "<image>")
+
+LAYERNORM_NAMES = {"norm", "ln"}
+
+LLAMABOARD_CONFIG = "llamaboard_config.yaml"
+
+METHODS = ["full", "freeze", "lora"]
+
+MOD_SUPPORTED_MODELS = {"bloom", "falcon", "gemma", "llama", "mistral", "mixtral", "phi", "starcoder2"}
+
+MULTIMODAL_SUPPORTED_MODELS = set()
+
+PEFT_METHODS = {"lora"}
+
+RUNNING_LOG = "running_log.txt"
+
+SUBJECTS = ["Average", "STEM", "Social Sciences", "Humanities", "Other"]
+
+SUPPORTED_MODELS = OrderedDict()
+
+TRAINER_LOG = "trainer_log.jsonl"
+
+TRAINING_ARGS = "training_args.yaml"
+
+TRAINING_STAGES = {
+ "Supervised Fine-Tuning": "sft",
+ "Reward Modeling": "rm",
+ "PPO": "ppo",
+ "DPO": "dpo",
+ "KTO": "kto",
+ "Pre-Training": "pt",
+}
+
+STAGES_USE_PAIR_DATA = {"rm", "dpo"}
+
+SUPPORTED_CLASS_FOR_S2ATTN = {"llama"}
+
+SWANLAB_CONFIG = "swanlab_public_config.json"
+
+VIDEO_PLACEHOLDER = os.getenv("VIDEO_PLACEHOLDER", "<video>")
+
+V_HEAD_WEIGHTS_NAME = "value_head.bin"
+
+V_HEAD_SAFE_WEIGHTS_NAME = "value_head.safetensors"
+
+
+class AttentionFunction(str, Enum):
+ AUTO = "auto"
+ DISABLED = "disabled"
+ SDPA = "sdpa"
+ FA2 = "fa2"
+
+
+class EngineName(str, Enum):
+ HF = "huggingface"
+ VLLM = "vllm"
+ SGLANG = "sglang"
+
+
+class DownloadSource(str, Enum):
+ DEFAULT = "hf"
+ MODELSCOPE = "ms"
+ OPENMIND = "om"
+
+
+@unique
+class QuantizationMethod(str, Enum):
+ r"""Borrowed from `transformers.utils.quantization_config.QuantizationMethod`."""
+
+ BNB = "bnb"
+ GPTQ = "gptq"
+ AWQ = "awq"
+ AQLM = "aqlm"
+ QUANTO = "quanto"
+ EETQ = "eetq"
+ HQQ = "hqq"
+
+
+class RopeScaling(str, Enum):
+ LINEAR = "linear"
+ DYNAMIC = "dynamic"
+ YARN = "yarn"
+ LLAMA3 = "llama3"
+
+
+def register_model_group(
+ models: dict[str, dict[DownloadSource, str]],
+ template: Optional[str] = None,
+ multimodal: bool = False,
+) -> None:
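+    # Map each display name to its hub paths; chat-style names (-Chat/-Distill/-Instruct) and all
+    # multimodal models also receive the given default chat template.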
+ for name, path in models.items():
+ SUPPORTED_MODELS[name] = path
+ if template is not None and (
+ any(suffix in name for suffix in ("-Chat", "-Distill", "-Instruct")) or multimodal
+ ):
+ DEFAULT_TEMPLATE[name] = template
+
+ if multimodal:
+ MULTIMODAL_SUPPORTED_MODELS.add(name)
+
+
+register_model_group(
+ models={
+ "Aya-23-8B-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/aya-23-8B",
+ },
+ "Aya-23-35B-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/aya-23-35B",
+ },
+ },
+ template="cohere",
+)
+
+
+register_model_group(
+ models={
+ "Baichuan-7B-Base": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan-7B",
+ DownloadSource.MODELSCOPE: "baichuan-inc/baichuan-7B",
+ },
+ "Baichuan-13B-Base": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan-13B-Base",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan-13B-Base",
+ },
+ "Baichuan-13B-Chat": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan-13B-Chat",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan-13B-Chat",
+ },
+ },
+ template="baichuan",
+)
+
+
+register_model_group(
+ models={
+ "Baichuan2-7B-Base": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-7B-Base",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-7B-Base",
+ },
+ "Baichuan2-13B-Base": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-13B-Base",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-13B-Base",
+ DownloadSource.OPENMIND: "Baichuan/Baichuan2_13b_base_pt",
+ },
+ "Baichuan2-7B-Chat": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-7B-Chat",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-7B-Chat",
+ DownloadSource.OPENMIND: "Baichuan/Baichuan2_7b_chat_pt",
+ },
+ "Baichuan2-13B-Chat": {
+ DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-13B-Chat",
+ DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-13B-Chat",
+ DownloadSource.OPENMIND: "Baichuan/Baichuan2_13b_chat_pt",
+ },
+ },
+ template="baichuan2",
+)
+
+
+register_model_group(
+ models={
+ "BLOOM-560M": {
+ DownloadSource.DEFAULT: "bigscience/bloom-560m",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloom-560m",
+ },
+ "BLOOM-3B": {
+ DownloadSource.DEFAULT: "bigscience/bloom-3b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloom-3b",
+ },
+ "BLOOM-7B1": {
+ DownloadSource.DEFAULT: "bigscience/bloom-7b1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloom-7b1",
+ },
+ },
+)
+
+
+register_model_group(
+ models={
+ "BLOOMZ-560M": {
+ DownloadSource.DEFAULT: "bigscience/bloomz-560m",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloomz-560m",
+ },
+ "BLOOMZ-3B": {
+ DownloadSource.DEFAULT: "bigscience/bloomz-3b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloomz-3b",
+ },
+ "BLOOMZ-7B1-mt": {
+ DownloadSource.DEFAULT: "bigscience/bloomz-7b1-mt",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/bloomz-7b1-mt",
+ },
+ },
+)
+
+
+register_model_group(
+ models={
+ "BlueLM-7B-Base": {
+ DownloadSource.DEFAULT: "vivo-ai/BlueLM-7B-Base",
+ DownloadSource.MODELSCOPE: "vivo-ai/BlueLM-7B-Base",
+ },
+ "BlueLM-7B-Chat": {
+ DownloadSource.DEFAULT: "vivo-ai/BlueLM-7B-Chat",
+ DownloadSource.MODELSCOPE: "vivo-ai/BlueLM-7B-Chat",
+ },
+ },
+ template="bluelm",
+)
+
+
+register_model_group(
+ models={
+ "Breeze-7B": {
+ DownloadSource.DEFAULT: "MediaTek-Research/Breeze-7B-Base-v1_0",
+ },
+ "Breeze-7B-Instruct": {
+ DownloadSource.DEFAULT: "MediaTek-Research/Breeze-7B-Instruct-v1_0",
+ },
+ },
+ template="breeze",
+)
+
+
+register_model_group(
+ models={
+ "ChatGLM2-6B-Chat": {
+ DownloadSource.DEFAULT: "THUDM/chatglm2-6b",
+ DownloadSource.MODELSCOPE: "ZhipuAI/chatglm2-6b",
+ }
+ },
+ template="chatglm2",
+)
+
+
+register_model_group(
+ models={
+ "ChatGLM3-6B-Base": {
+ DownloadSource.DEFAULT: "THUDM/chatglm3-6b-base",
+ DownloadSource.MODELSCOPE: "ZhipuAI/chatglm3-6b-base",
+ },
+ "ChatGLM3-6B-Chat": {
+ DownloadSource.DEFAULT: "THUDM/chatglm3-6b",
+ DownloadSource.MODELSCOPE: "ZhipuAI/chatglm3-6b",
+ },
+ },
+ template="chatglm3",
+)
+
+
+register_model_group(
+ models={
+ "Chinese-Llama-2-1.3B": {
+ DownloadSource.DEFAULT: "hfl/chinese-llama-2-1.3b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-llama-2-1.3b",
+ },
+ "Chinese-Llama-2-7B": {
+ DownloadSource.DEFAULT: "hfl/chinese-llama-2-7b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-llama-2-7b",
+ },
+ "Chinese-Llama-2-13B": {
+ DownloadSource.DEFAULT: "hfl/chinese-llama-2-13b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-llama-2-13b",
+ },
+ "Chinese-Alpaca-2-1.3B-Chat": {
+ DownloadSource.DEFAULT: "hfl/chinese-alpaca-2-1.3b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-alpaca-2-1.3b",
+ },
+ "Chinese-Alpaca-2-7B-Chat": {
+ DownloadSource.DEFAULT: "hfl/chinese-alpaca-2-7b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-alpaca-2-7b",
+ },
+ "Chinese-Alpaca-2-13B-Chat": {
+ DownloadSource.DEFAULT: "hfl/chinese-alpaca-2-13b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/chinese-alpaca-2-13b",
+ },
+ },
+ template="llama2_zh",
+)
+
+
+register_model_group(
+ models={
+ "CodeGeeX4-9B-Chat": {
+ DownloadSource.DEFAULT: "THUDM/codegeex4-all-9b",
+ DownloadSource.MODELSCOPE: "ZhipuAI/codegeex4-all-9b",
+ },
+ },
+ template="codegeex4",
+)
+
+
+register_model_group(
+ models={
+ "CodeGemma-7B": {
+ DownloadSource.DEFAULT: "google/codegemma-7b",
+ },
+ "CodeGemma-7B-Instruct": {
+ DownloadSource.DEFAULT: "google/codegemma-7b-it",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/codegemma-7b-it",
+ },
+ "CodeGemma-1.1-2B": {
+ DownloadSource.DEFAULT: "google/codegemma-1.1-2b",
+ },
+ "CodeGemma-1.1-7B-Instruct": {
+ DownloadSource.DEFAULT: "google/codegemma-1.1-7b-it",
+ },
+ },
+ template="gemma",
+)
+
+
+register_model_group(
+ models={
+ "Codestral-22B-v0.1-Chat": {
+ DownloadSource.DEFAULT: "mistralai/Codestral-22B-v0.1",
+ DownloadSource.MODELSCOPE: "swift/Codestral-22B-v0.1",
+ },
+ },
+ template="mistral",
+)
+
+
+register_model_group(
+ models={
+ "CommandR-35B-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/c4ai-command-r-v01",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/c4ai-command-r-v01",
+ },
+ "CommandR-Plus-104B-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/c4ai-command-r-plus",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/c4ai-command-r-plus",
+ },
+ "CommandR-35B-4bit-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/c4ai-command-r-v01-4bit",
+ DownloadSource.MODELSCOPE: "mirror013/c4ai-command-r-v01-4bit",
+ },
+ "CommandR-Plus-104B-4bit-Chat": {
+ DownloadSource.DEFAULT: "CohereForAI/c4ai-command-r-plus-4bit",
+ },
+ },
+ template="cohere",
+)
+
+
+register_model_group(
+ models={
+ "DBRX-132B-Base": {
+ DownloadSource.DEFAULT: "databricks/dbrx-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/dbrx-base",
+ },
+ "DBRX-132B-Instruct": {
+ DownloadSource.DEFAULT: "databricks/dbrx-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/dbrx-instruct",
+ },
+ },
+ template="dbrx",
+)
+
+
+register_model_group(
+ models={
+ "DeepSeek-LLM-7B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-llm-7b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-llm-7b-base",
+ },
+ "DeepSeek-LLM-67B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-llm-67b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-llm-67b-base",
+ },
+ "DeepSeek-LLM-7B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-llm-7b-chat",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-llm-7b-chat",
+ },
+ "DeepSeek-LLM-67B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-llm-67b-chat",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-llm-67b-chat",
+ },
+ "DeepSeek-Math-7B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-math-7b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-math-7b-base",
+ },
+ "DeepSeek-Math-7B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-math-7b-instruct",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-math-7b-instruct",
+ },
+ "DeepSeek-MoE-16B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-moe-16b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-moe-16b-base",
+ },
+ "DeepSeek-MoE-16B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-moe-16b-chat",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-moe-16b-chat",
+ },
+ "DeepSeek-V2-16B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Lite",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Lite",
+ },
+ "DeepSeek-V2-236B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2",
+ },
+ "DeepSeek-V2-16B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Lite-Chat",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Lite-Chat",
+ },
+ "DeepSeek-V2-236B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Chat",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Chat",
+ },
+ "DeepSeek-Coder-V2-16B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
+ },
+ "DeepSeek-Coder-V2-236B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-Coder-V2-Base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-Coder-V2-Base",
+ },
+ "DeepSeek-Coder-V2-16B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
+ },
+ "DeepSeek-Coder-V2-236B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-Coder-V2-Instruct",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-Coder-V2-Instruct",
+ },
+ },
+ template="deepseek",
+)
+
+
+register_model_group(
+ models={
+ "DeepSeek-Coder-6.7B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-6.7b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-6.7b-base",
+ },
+ "DeepSeek-Coder-7B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-7b-base-v1.5",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-7b-base-v1.5",
+ },
+ "DeepSeek-Coder-33B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-33b-base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-33b-base",
+ },
+ "DeepSeek-Coder-6.7B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-6.7b-instruct",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-6.7b-instruct",
+ },
+ "DeepSeek-Coder-7B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-7b-instruct-v1.5",
+ },
+ "DeepSeek-Coder-33B-Instruct": {
+ DownloadSource.DEFAULT: "deepseek-ai/deepseek-coder-33b-instruct",
+ DownloadSource.MODELSCOPE: "deepseek-ai/deepseek-coder-33b-instruct",
+ },
+ },
+ template="deepseekcoder",
+)
+
+
+register_model_group(
+ models={
+ "DeepSeek-V2-236B-Chat-0628": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2-Chat-0628",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2-Chat-0628",
+ },
+ "DeepSeek-V2.5-236B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5",
+ },
+ "DeepSeek-V2.5-236B-Chat-1210": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V2.5-1210",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V2.5-1210",
+ },
+ "DeepSeek-V3-671B-Base": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3-Base",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3-Base",
+ },
+ "DeepSeek-V3-671B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-V3",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-V3",
+ },
+ "DeepSeek-R1-1.5B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+ },
+ "DeepSeek-R1-7B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
+ },
+ "DeepSeek-R1-8B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
+ },
+ "DeepSeek-R1-14B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
+ },
+ "DeepSeek-R1-32B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
+ },
+ "DeepSeek-R1-70B-Distill": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
+ },
+ "DeepSeek-R1-671B-Chat-Zero": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1-Zero",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1-Zero",
+ },
+ "DeepSeek-R1-671B-Chat": {
+ DownloadSource.DEFAULT: "deepseek-ai/DeepSeek-R1",
+ DownloadSource.MODELSCOPE: "deepseek-ai/DeepSeek-R1",
+ },
+ },
+ template="deepseek3",
+)
+
+
+register_model_group(
+ models={
+ "EXAONE-3.0-7.8B-Instruct": {
+ DownloadSource.DEFAULT: "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
+ },
+ },
+ template="exaone",
+)
+
+
+register_model_group(
+ models={
+ "Falcon-7B": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-7b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/falcon-7b",
+ },
+ "Falcon-11B": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-11B",
+ DownloadSource.MODELSCOPE: "tiiuae/falcon-11B",
+ },
+ "Falcon-40B": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-40b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/falcon-40b",
+ },
+ "Falcon-180B": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-180b",
+ DownloadSource.MODELSCOPE: "modelscope/falcon-180B",
+ },
+ "Falcon-7B-Instruct": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-7b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/falcon-7b-instruct",
+ },
+ "Falcon-40B-Instruct": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-40b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/falcon-40b-instruct",
+ },
+ "Falcon-180B-Chat": {
+ DownloadSource.DEFAULT: "tiiuae/falcon-180b-chat",
+ DownloadSource.MODELSCOPE: "modelscope/falcon-180B-chat",
+ },
+ },
+ template="falcon",
+)
+
+
+register_model_group(
+ models={
+ "Gemma-2B": {
+ DownloadSource.DEFAULT: "google/gemma-2b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/gemma-2b",
+ },
+ "Gemma-7B": {
+ DownloadSource.DEFAULT: "google/gemma-7b",
+            DownloadSource.MODELSCOPE: "AI-ModelScope/gemma-7b",
+ },
+ "Gemma-2B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-2b-it",
+            DownloadSource.MODELSCOPE: "AI-ModelScope/gemma-2b-it",
+ },
+ "Gemma-7B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-7b-it",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/gemma-7b-it",
+ },
+ "Gemma-1.1-2B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-1.1-2b-it",
+ },
+ "Gemma-1.1-7B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-1.1-7b-it",
+ },
+ "Gemma-2-2B": {
+ DownloadSource.DEFAULT: "google/gemma-2-2b",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-2b",
+ },
+ "Gemma-2-9B": {
+ DownloadSource.DEFAULT: "google/gemma-2-9b",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-9b",
+ },
+ "Gemma-2-27B": {
+ DownloadSource.DEFAULT: "google/gemma-2-27b",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-27b",
+ },
+ "Gemma-2-2B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-2-2b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-2b-it",
+ DownloadSource.OPENMIND: "LlamaFactory/gemma-2-2b-it",
+ },
+ "Gemma-2-9B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-2-9b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-9b-it",
+ DownloadSource.OPENMIND: "LlamaFactory/gemma-2-9b-it",
+ },
+ "Gemma-2-27B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-2-27b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-2-27b-it",
+ },
+ "Gemma-3-1B": {
+ DownloadSource.DEFAULT: "google/gemma-3-1b-pt",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-1b-pt",
+ },
+ "Gemma-3-1B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-3-1b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-1b-it",
+ },
+ },
+ template="gemma",
+)
+
+
+register_model_group(
+ models={
+ "Gemma-3-4B": {
+ DownloadSource.DEFAULT: "google/gemma-3-4b-pt",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-4b-pt",
+ },
+ "Gemma-3-12B": {
+ DownloadSource.DEFAULT: "google/gemma-3-12b-pt",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-12b-pt",
+ },
+ "Gemma-3-27B": {
+ DownloadSource.DEFAULT: "google/gemma-3-27b-pt",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-27b-pt",
+ },
+ "Gemma-3-4B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-3-4b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-4b-it",
+ },
+ "Gemma-3-12B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-3-12b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-12b-it",
+ },
+ "Gemma-3-27B-Instruct": {
+ DownloadSource.DEFAULT: "google/gemma-3-27b-it",
+ DownloadSource.MODELSCOPE: "LLM-Research/gemma-3-27b-it",
+ },
+ },
+ template="gemma3",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "GLM-4-9B": {
+ DownloadSource.DEFAULT: "THUDM/glm-4-9b",
+ DownloadSource.MODELSCOPE: "ZhipuAI/glm-4-9b",
+ },
+ "GLM-4-9B-Chat": {
+ DownloadSource.DEFAULT: "THUDM/glm-4-9b-chat",
+ DownloadSource.MODELSCOPE: "ZhipuAI/glm-4-9b-chat",
+ DownloadSource.OPENMIND: "LlamaFactory/glm-4-9b-chat",
+ },
+ "GLM-4-9B-1M-Chat": {
+ DownloadSource.DEFAULT: "THUDM/glm-4-9b-chat-1m",
+ DownloadSource.MODELSCOPE: "ZhipuAI/glm-4-9b-chat-1m",
+ },
+ "GLM-4-9B-0414-Chat": {
+ DownloadSource.DEFAULT: "THUDM/GLM-4-9B-0414",
+ DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-9B-0414",
+ },
+ "GLM-4-32B-0414": {
+ DownloadSource.DEFAULT: "THUDM/GLM-4-32B-Base-0414",
+ DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-Base-0414",
+ },
+ "GLM-4-32B-0414-Chat": {
+ DownloadSource.DEFAULT: "THUDM/GLM-4-32B-0414",
+ DownloadSource.MODELSCOPE: "ZhipuAI/GLM-4-32B-0414",
+ },
+ "GLM-Z1-9B-0414-Chat": {
+ DownloadSource.DEFAULT: "THUDM/GLM-Z1-9B-0414",
+ DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-9B-0414",
+ },
+ "GLM-Z1-32B-0414-Chat": {
+ DownloadSource.DEFAULT: "THUDM/GLM-Z1-32B-0414",
+ DownloadSource.MODELSCOPE: "ZhipuAI/GLM-Z1-32B-0414",
+ },
+ },
+ template="glm4",
+)
+
+
+register_model_group(
+ models={
+ "GPT-2-Small": {
+ DownloadSource.DEFAULT: "openai-community/gpt2",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/gpt2",
+ },
+ "GPT-2-Medium": {
+ DownloadSource.DEFAULT: "openai-community/gpt2-medium",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/gpt2-medium",
+ },
+ "GPT-2-Large": {
+ DownloadSource.DEFAULT: "openai-community/gpt2-large",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/gpt2-large",
+ },
+ "GPT-2-XL": {
+ DownloadSource.DEFAULT: "openai-community/gpt2-xl",
+ DownloadSource.MODELSCOPE: "goodbai95/GPT2-xl",
+ },
+ },
+)
+
+
+register_model_group(
+ models={
+ "Granite-3.0-1B-A400M-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-1b-a400m-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-1b-a400m-base",
+ },
+ "Granite-3.0-3B-A800M-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-3b-a800m-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-3b-a800m-base",
+ },
+ "Granite-3.0-2B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-2b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-2b-base",
+ },
+ "Granite-3.0-8B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-8b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-8b-base",
+ },
+ "Granite-3.0-1B-A400M-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-1b-a400m-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-1b-a400m-instruct",
+ },
+ "Granite-3.0-3B-A800M-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-3b-a800m-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-3b-a800m-instruct",
+ },
+ "Granite-3.0-2B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-2b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-2b-instruct",
+ },
+ "Granite-3.0-8B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.0-8b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.0-8b-instruct",
+ },
+ "Granite-3.1-1B-A400M-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-1b-a400m-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-1b-a400m-base",
+ },
+ "Granite-3.1-3B-A800M-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-3b-a800m-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-3b-a800m-base",
+ },
+ "Granite-3.1-2B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-2b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-2b-base",
+ },
+ "Granite-3.1-8B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-8b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-8b-base",
+ },
+ "Granite-3.1-1B-A400M-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-1b-a400m-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-1b-a400m-instruct",
+ },
+ "Granite-3.1-3B-A800M-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-3b-a800m-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-3b-a800m-instruct",
+ },
+ "Granite-3.1-2B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-2b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-2b-instruct",
+ },
+ "Granite-3.1-8B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.1-8b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.1-8b-instruct",
+ },
+ "Granite-3.2-2B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.2-2b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.2-2b-instruct",
+ },
+ "Granite-3.2-8B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.2-8b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.2-8b-instruct",
+ },
+ "Granite-3.3-2B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.3-2b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.3-2b-base",
+ },
+ "Granite-3.3-8B-Base": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.3-8b-base",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.3-8b-base",
+ },
+ "Granite-3.3-2B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.3-2b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.3-2b-instruct",
+ },
+ "Granite-3.3-8B-Instruct": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-3.3-8b-instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-3.3-8b-instruct",
+ },
+ },
+ template="granite3",
+)
+
+
+register_model_group(
+ models={
+        "Granite-Vision-3.2-2B": {
+ DownloadSource.DEFAULT: "ibm-granite/granite-vision-3.2-2b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/granite-vision-3.2-2b",
+ },
+ },
+    template="granite3_vision",
+    multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Hunyuan-7B-Instruct": {
+ DownloadSource.DEFAULT: "tencent/Hunyuan-7B-Instruct",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Hunyuan-7B-Instruct",
+ },
+ },
+ template="hunyuan",
+)
+
+
+register_model_group(
+ models={
+ "Index-1.9B-Base": {
+ DownloadSource.DEFAULT: "IndexTeam/Index-1.9B",
+ DownloadSource.MODELSCOPE: "IndexTeam/Index-1.9B",
+ },
+ "Index-1.9B-Base-Pure": {
+ DownloadSource.DEFAULT: "IndexTeam/Index-1.9B-Pure",
+ DownloadSource.MODELSCOPE: "IndexTeam/Index-1.9B-Pure",
+ },
+ "Index-1.9B-Chat": {
+ DownloadSource.DEFAULT: "IndexTeam/Index-1.9B-Chat",
+ DownloadSource.MODELSCOPE: "IndexTeam/Index-1.9B-Chat",
+ },
+ "Index-1.9B-Character-Chat": {
+ DownloadSource.DEFAULT: "IndexTeam/Index-1.9B-Character",
+ DownloadSource.MODELSCOPE: "IndexTeam/Index-1.9B-Character",
+ },
+ "Index-1.9B-Chat-32K": {
+ DownloadSource.DEFAULT: "IndexTeam/Index-1.9B-32K",
+ DownloadSource.MODELSCOPE: "IndexTeam/Index-1.9B-32K",
+ },
+ },
+ template="index",
+)
+
+
+register_model_group(
+ models={
+ "InternLM-7B": {
+ DownloadSource.DEFAULT: "internlm/internlm-7b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-7b",
+ },
+ "InternLM-20B": {
+ DownloadSource.DEFAULT: "internlm/internlm-20b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-20b",
+ },
+ "InternLM-7B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm-chat-7b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-chat-7b",
+ },
+ "InternLM-20B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm-chat-20b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-chat-20b",
+ },
+ },
+ template="intern",
+)
+
+
+register_model_group(
+ models={
+ "InternLM2-7B": {
+ DownloadSource.DEFAULT: "internlm/internlm2-7b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2-7b",
+ },
+ "InternLM2-20B": {
+ DownloadSource.DEFAULT: "internlm/internlm2-20b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2-20b",
+ },
+ "InternLM2-7B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2-chat-7b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2-chat-7b",
+ },
+ "InternLM2-20B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2-chat-20b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2-chat-20b",
+ },
+ "InternLM2.5-1.8B": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-1_8b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-1_8b",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-1_8b",
+ },
+ "InternLM2.5-7B": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-7b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-7b",
+ },
+ "InternLM2.5-20B": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-20b",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-20b",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-20b",
+ },
+ "InternLM2.5-1.8B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-1_8b-chat",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-1_8b-chat",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-1_8b-chat",
+ },
+ "InternLM2.5-7B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-7b-chat",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-7b-chat",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-7b-chat",
+ },
+ "InternLM2.5-7B-1M-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-7b-chat-1m",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-7b-chat-1m",
+ },
+ "InternLM2.5-20B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm2_5-20b-chat",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm2_5-20b-chat",
+ DownloadSource.OPENMIND: "Intern/internlm2_5-20b-chat",
+ },
+ "InternLM3-8B-Chat": {
+ DownloadSource.DEFAULT: "internlm/internlm3-8b-instruct",
+ DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm3-8b-instruct",
+ },
+ },
+ template="intern2",
+)
+
+
+register_model_group(
+ models={
+ "InternVL2.5-1B-MPO": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL2_5-1B-MPO-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-1B-MPO-hf",
+ },
+ "InternVL2.5-2B-MPO": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL2_5-2B-MPO-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-2B-MPO-hf",
+ },
+ "InternVL2.5-4B-MPO": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL2_5-4B-MPO-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-4B-MPO-hf",
+ },
+ "InternVL2.5-8B-MPO": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL2_5-8B-MPO-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL2_5-8B-MPO-hf",
+ },
+ "InternVL3-1B-hf": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL3-1B-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL3-1B-hf",
+ },
+ "InternVL3-2B-hf": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL3-2B-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL3-2B-hf",
+ },
+ "InternVL3-8B-hf": {
+ DownloadSource.DEFAULT: "kingsley01/InternVL3-8B-hf",
+ DownloadSource.MODELSCOPE: "llamafactory/InternVL3-8B-hf",
+ },
+ },
+ template="intern_vl",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Jamba-v0.1": {
+ DownloadSource.DEFAULT: "ai21labs/Jamba-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Jamba-v0.1",
+ }
+ },
+)
+
+
+register_model_group(
+ models={
+ "Kimi-VL-A3B-Instruct": {
+ DownloadSource.DEFAULT: "moonshotai/Kimi-VL-A3B-Instruct",
+ DownloadSource.MODELSCOPE: "moonshotai/Kimi-VL-A3B-Instruct",
+ },
+ "Kimi-VL-A3B-Thinking": {
+ DownloadSource.DEFAULT: "moonshotai/Kimi-VL-A3B-Thinking",
+ DownloadSource.MODELSCOPE: "moonshotai/Kimi-VL-A3B-Thinking",
+ },
+ },
+ template="kimi_vl",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LingoWhale-8B": {
+ DownloadSource.DEFAULT: "deeplang-ai/LingoWhale-8B",
+ DownloadSource.MODELSCOPE: "DeepLang/LingoWhale-8B",
+ }
+ },
+)
+
+
+register_model_group(
+ models={
+ "Llama-7B": {
+ DownloadSource.DEFAULT: "huggyllama/llama-7b",
+ DownloadSource.MODELSCOPE: "skyline2006/llama-7b",
+ },
+ "Llama-13B": {
+ DownloadSource.DEFAULT: "huggyllama/llama-13b",
+ DownloadSource.MODELSCOPE: "skyline2006/llama-13b",
+ },
+ "Llama-30B": {
+ DownloadSource.DEFAULT: "huggyllama/llama-30b",
+ DownloadSource.MODELSCOPE: "skyline2006/llama-30b",
+ },
+ "Llama-65B": {
+ DownloadSource.DEFAULT: "huggyllama/llama-65b",
+ DownloadSource.MODELSCOPE: "skyline2006/llama-65b",
+ },
+ }
+)
+
+
+register_model_group(
+ models={
+ "Llama-2-7B": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-7b-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-7b-ms",
+ },
+ "Llama-2-13B": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-13b-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-13b-ms",
+ },
+ "Llama-2-70B": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-70b-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-70b-ms",
+ },
+ "Llama-2-7B-Chat": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-7b-chat-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-7b-chat-ms",
+ },
+ "Llama-2-13B-Chat": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-13b-chat-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-13b-chat-ms",
+ },
+ "Llama-2-70B-Chat": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-2-70b-chat-hf",
+ DownloadSource.MODELSCOPE: "modelscope/Llama-2-70b-chat-ms",
+ },
+ },
+ template="llama2",
+)
+
+
+register_model_group(
+ models={
+ "Llama-3-8B": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3-8B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3-8B",
+ },
+ "Llama-3-70B": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3-70B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3-70B",
+ },
+ "Llama-3-8B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3-8B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3-8B-Instruct",
+ },
+ "Llama-3-70B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3-70B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3-70B-Instruct",
+ },
+ "Llama-3-8B-Chinese-Chat": {
+ DownloadSource.DEFAULT: "shenzhi-wang/Llama3-8B-Chinese-Chat",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama3-8B-Chinese-Chat",
+ DownloadSource.OPENMIND: "LlamaFactory/Llama3-Chinese-8B-Instruct",
+ },
+ "Llama-3-70B-Chinese-Chat": {
+ DownloadSource.DEFAULT: "shenzhi-wang/Llama3-70B-Chinese-Chat",
+ },
+ "Llama-3.1-8B": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-8B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-8B",
+ },
+ "Llama-3.1-70B": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-70B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-70B",
+ },
+ "Llama-3.1-405B": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-405B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-405B",
+ },
+ "Llama-3.1-8B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-8B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-8B-Instruct",
+ },
+ "Llama-3.1-70B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-70B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-70B-Instruct",
+ },
+ "Llama-3.1-405B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Meta-Llama-3.1-405B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Meta-Llama-3.1-405B-Instruct",
+ },
+ "Llama-3.1-8B-Chinese-Chat": {
+ DownloadSource.DEFAULT: "shenzhi-wang/Llama3.1-8B-Chinese-Chat",
+ DownloadSource.MODELSCOPE: "XD_AI/Llama3.1-8B-Chinese-Chat",
+ },
+ "Llama-3.1-70B-Chinese-Chat": {
+ DownloadSource.DEFAULT: "shenzhi-wang/Llama3.1-70B-Chinese-Chat",
+ DownloadSource.MODELSCOPE: "XD_AI/Llama3.1-70B-Chinese-Chat",
+ },
+ "Llama-3.2-1B": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-1B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-1B",
+ },
+ "Llama-3.2-3B": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-3B",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-3B",
+ },
+ "Llama-3.2-1B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-1B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-1B-Instruct",
+ },
+ "Llama-3.2-3B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-3B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-3B-Instruct",
+ },
+ "Llama-3.3-70B-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.3-70B-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.3-70B-Instruct",
+ },
+ },
+ template="llama3",
+)
+
+
+register_model_group(
+ models={
+ "Llama-3.2-11B-Vision": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-11B-Vision",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-11B-Vision",
+ },
+ "Llama-3.2-11B-Vision-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-11B-Vision-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-11B-Vision-Instruct",
+ },
+ "Llama-3.2-90B-Vision": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-90B-Vision",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-90B-Vision",
+ },
+ "Llama-3.2-90B-Vision-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-3.2-90B-Vision-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-3.2-90B-Vision-Instruct",
+ },
+ },
+ template="mllama",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Llama-4-Scout-17B-16E": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-4-Scout-17B-16E",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-4-Scout-17B-16E",
+ },
+ "Llama-4-Scout-17B-16E-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-4-Scout-17B-16E-Instruct",
+ },
+ "Llama-4-Maverick-17B-128E": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-4-Maverick-17B-128E",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-4-Maverick-17B-128E",
+ },
+ "Llama-4-Maverick-17B-128E-Instruct": {
+ DownloadSource.DEFAULT: "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Llama-4-Maverick-17B-128E-Instruct",
+ },
+ },
+ template="llama4",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-1.5-7B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-1.5-7b-hf",
+ DownloadSource.MODELSCOPE: "swift/llava-1.5-7b-hf",
+ },
+ "LLaVA-1.5-13B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-1.5-13b-hf",
+ DownloadSource.MODELSCOPE: "swift/llava-1.5-13b-hf",
+ },
+ },
+ template="llava",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-7B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-v1.6-vicuna-7b-hf",
+ DownloadSource.MODELSCOPE: "swift/llava-v1.6-vicuna-7b-hf",
+ },
+ "LLaVA-NeXT-13B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-v1.6-vicuna-13b-hf",
+ DownloadSource.MODELSCOPE: "swift/llava-v1.6-vicuna-13b-hf",
+ },
+ },
+ template="llava_next",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-Mistral-7B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-v1.6-mistral-7b-hf",
+ DownloadSource.MODELSCOPE: "swift/llava-v1.6-mistral-7b-hf",
+ },
+ },
+ template="llava_next_mistral",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-Llama3-8B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llama3-llava-next-8b-hf",
+ DownloadSource.MODELSCOPE: "swift/llama3-llava-next-8b-hf",
+ },
+ },
+ template="llava_next_llama3",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-34B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-v1.6-34b-hf",
+ DownloadSource.MODELSCOPE: "LLM-Research/llava-v1.6-34b-hf",
+ },
+ },
+ template="llava_next_yi",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-72B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-next-72b-hf",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/llava-next-72b-hf",
+ },
+ "LLaVA-NeXT-110B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/llava-next-110b-hf",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/llava-next-110b-hf",
+ },
+ },
+ template="llava_next_qwen",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-Video-7B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/LLaVA-NeXT-Video-7B-hf",
+ DownloadSource.MODELSCOPE: "swift/LLaVA-NeXT-Video-7B-hf",
+ },
+ "LLaVA-NeXT-Video-7B-DPO-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/LLaVA-NeXT-Video-7B-DPO-hf",
+ DownloadSource.MODELSCOPE: "swift/LLaVA-NeXT-Video-7B-DPO-hf",
+ },
+ },
+ template="llava_next_video",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-Video-7B-32k-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/LLaVA-NeXT-Video-7B-32K-hf",
+ DownloadSource.MODELSCOPE: "swift/LLaVA-NeXT-Video-7B-32K-hf",
+ },
+ },
+ template="llava_next_video_mistral",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "LLaVA-NeXT-Video-34B-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/LLaVA-NeXT-Video-34B-hf",
+ DownloadSource.MODELSCOPE: "swift/LLaVA-NeXT-Video-34B-hf",
+ },
+ "LLaVA-NeXT-Video-34B-DPO-Chat": {
+ DownloadSource.DEFAULT: "llava-hf/LLaVA-NeXT-Video-34B-DPO-hf",
+ },
+ },
+ template="llava_next_video_yi",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Marco-o1-Chat": {
+ DownloadSource.DEFAULT: "AIDC-AI/Marco-o1",
+ DownloadSource.MODELSCOPE: "AIDC-AI/Marco-o1",
+ },
+ },
+ template="marco",
+)
+
+
+register_model_group(
+ models={
+ "MiniCPM-2B-SFT-Chat": {
+ DownloadSource.DEFAULT: "openbmb/MiniCPM-2B-sft-bf16",
+ DownloadSource.MODELSCOPE: "OpenBMB/miniCPM-bf16",
+ },
+ "MiniCPM-2B-DPO-Chat": {
+ DownloadSource.DEFAULT: "openbmb/MiniCPM-2B-dpo-bf16",
+ DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-2B-dpo-bf16",
+ },
+ },
+ template="cpm",
+)
+
+
+register_model_group(
+ models={
+ "MiniCPM3-4B-Chat": {
+ DownloadSource.DEFAULT: "openbmb/MiniCPM3-4B",
+ DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM3-4B",
+ DownloadSource.OPENMIND: "LlamaFactory/MiniCPM3-4B",
+ },
+ },
+ template="cpm3",
+)
+
+
+register_model_group(
+ models={
+ "MiniCPM-o-2_6": {
+ DownloadSource.DEFAULT: "openbmb/MiniCPM-o-2_6",
+ DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-o-2_6",
+ },
+ },
+ template="minicpm_o",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "MiniCPM-V-2_6": {
+ DownloadSource.DEFAULT: "openbmb/MiniCPM-V-2_6",
+ DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-V-2_6",
+ },
+ },
+ template="minicpm_v",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Ministral-8B-Instruct-2410": {
+ DownloadSource.DEFAULT: "mistralai/Ministral-8B-Instruct-2410",
+ DownloadSource.MODELSCOPE: "mistralai/Ministral-8B-Instruct-2410",
+ },
+ "Mistral-Nemo-Base-2407": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-Nemo-Base-2407",
+ DownloadSource.MODELSCOPE: "LLM-Research/Mistral-Nemo-Base-2407",
+ },
+ "Mistral-Nemo-Instruct-2407": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-Nemo-Instruct-2407",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mistral-Nemo-Instruct-2407",
+ },
+ },
+ template="ministral",
+)
+
+
+register_model_group(
+ models={
+ "Mistral-7B-v0.1": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-7B-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mistral-7B-v0.1",
+ },
+ "Mistral-7B-v0.2": {
+ DownloadSource.DEFAULT: "alpindale/Mistral-7B-v0.2-hf",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mistral-7B-v0.2-hf",
+ },
+ "Mistral-7B-v0.3": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-7B-v0.3",
+ DownloadSource.MODELSCOPE: "LLM-Research/mistral-7b-v0.3",
+ },
+ "Mistral-7B-Instruct-v0.1": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-7B-Instruct-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mistral-7B-Instruct-v0.1",
+ },
+ "Mistral-7B-Instruct-v0.2": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-7B-Instruct-v0.2",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mistral-7B-Instruct-v0.2",
+ },
+ "Mistral-7B-Instruct-v0.3": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-7B-Instruct-v0.3",
+ DownloadSource.MODELSCOPE: "LLM-Research/Mistral-7B-Instruct-v0.3",
+ },
+ },
+ template="mistral",
+)
+
+
+register_model_group(
+ models={
+ "Mistral-Small-24B-Base-2501": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-Small-24B-Base-2501",
+ DownloadSource.MODELSCOPE: "mistralai/Mistral-Small-24B-Base-2501",
+ },
+ "Mistral-Small-24B-Instruct-2501": {
+ DownloadSource.DEFAULT: "mistralai/Mistral-Small-24B-Instruct-2501",
+ DownloadSource.MODELSCOPE: "mistralai/Mistral-Small-24B-Instruct-2501",
+ },
+ },
+ template="mistral_small",
+)
+
+
+register_model_group(
+ models={
+ "Mixtral-8x7B-v0.1": {
+ DownloadSource.DEFAULT: "mistralai/Mixtral-8x7B-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mixtral-8x7B-v0.1",
+ },
+ "Mixtral-8x22B-v0.1": {
+ DownloadSource.DEFAULT: "mistralai/Mixtral-8x22B-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mixtral-8x22B-v0.1",
+ },
+ "Mixtral-8x7B-v0.1-Instruct": {
+ DownloadSource.DEFAULT: "mistralai/Mixtral-8x7B-Instruct-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mixtral-8x7B-Instruct-v0.1",
+ },
+ "Mixtral-8x22B-v0.1-Instruct": {
+ DownloadSource.DEFAULT: "mistralai/Mixtral-8x22B-Instruct-v0.1",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Mixtral-8x22B-Instruct-v0.1",
+ },
+ },
+ template="mistral",
+)
+
+
+register_model_group(
+ models={
+ "Moonlight-16B-A3B": {
+ DownloadSource.DEFAULT: "moonshotai/Moonlight-16B-A3B",
+ DownloadSource.MODELSCOPE: "moonshotai/Moonlight-16B-A3B",
+ },
+ "Moonlight-16B-A3B-Instruct": {
+ DownloadSource.DEFAULT: "moonshotai/Moonlight-16B-A3B-Instruct",
+ DownloadSource.MODELSCOPE: "moonshotai/Moonlight-16B-A3B-Instruct",
+ },
+ },
+ template="moonlight",
+)
+
+
+register_model_group(
+ models={
+ "OLMo-1B": {
+ DownloadSource.DEFAULT: "allenai/OLMo-1B-hf",
+ },
+ "OLMo-7B": {
+ DownloadSource.DEFAULT: "allenai/OLMo-7B-hf",
+ },
+ "OLMo-7B-Chat": {
+ DownloadSource.DEFAULT: "ssec-uw/OLMo-7B-Instruct-hf",
+ },
+ "OLMo-1.7-7B": {
+ DownloadSource.DEFAULT: "allenai/OLMo-1.7-7B-hf",
+ },
+ },
+)
+
+
+register_model_group(
+ models={
+ "OpenChat3.5-7B-Chat": {
+ DownloadSource.DEFAULT: "openchat/openchat-3.5-0106",
+ DownloadSource.MODELSCOPE: "xcwzxcwz/openchat-3.5-0106",
+ }
+ },
+ template="openchat",
+)
+
+
+register_model_group(
+ models={
+ "OpenChat3.6-8B-Chat": {
+ DownloadSource.DEFAULT: "openchat/openchat-3.6-8b-20240522",
+ }
+ },
+ template="openchat-3.6",
+)
+
+
+register_model_group(
+ models={
+ "OpenCoder-1.5B-Base": {
+ DownloadSource.DEFAULT: "infly/OpenCoder-1.5B-Base",
+ DownloadSource.MODELSCOPE: "infly/OpenCoder-1.5B-Base",
+ },
+ "OpenCoder-8B-Base": {
+ DownloadSource.DEFAULT: "infly/OpenCoder-8B-Base",
+ DownloadSource.MODELSCOPE: "infly/OpenCoder-8B-Base",
+ },
+ "OpenCoder-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "infly/OpenCoder-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "infly/OpenCoder-1.5B-Instruct",
+ },
+ "OpenCoder-8B-Instruct": {
+ DownloadSource.DEFAULT: "infly/OpenCoder-8B-Instruct",
+ DownloadSource.MODELSCOPE: "infly/OpenCoder-8B-Instruct",
+ },
+ },
+ template="opencoder",
+)
+
+
+register_model_group(
+ models={
+ "Orion-14B-Base": {
+ DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Base",
+ DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Base",
+ },
+ "Orion-14B-Chat": {
+ DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat",
+ DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat",
+ },
+ "Orion-14B-Long-Chat": {
+ DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-LongChat",
+ DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-LongChat",
+ },
+ "Orion-14B-RAG-Chat": {
+ DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat-RAG",
+ DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat-RAG",
+ },
+ "Orion-14B-Plugin-Chat": {
+ DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat-Plugin",
+ DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat-Plugin",
+ },
+ },
+ template="orion",
+)
+
+
+register_model_group(
+ models={
+ "PaliGemma-3B-pt-224": {
+ DownloadSource.DEFAULT: "google/paligemma-3b-pt-224",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma-3b-pt-224",
+ },
+ "PaliGemma-3B-pt-448": {
+ DownloadSource.DEFAULT: "google/paligemma-3b-pt-448",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma-3b-pt-448",
+ },
+ "PaliGemma-3B-pt-896": {
+ DownloadSource.DEFAULT: "google/paligemma-3b-pt-896",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma-3b-pt-896",
+ },
+ "PaliGemma-3B-mix-224": {
+ DownloadSource.DEFAULT: "google/paligemma-3b-mix-224",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma-3b-mix-224",
+ },
+ "PaliGemma-3B-mix-448": {
+ DownloadSource.DEFAULT: "google/paligemma-3b-mix-448",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma-3b-mix-448",
+ },
+ },
+ template="paligemma",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "PaliGemma2-3B-pt-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-3b-pt-224",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-3b-pt-224",
+ },
+ "PaliGemma2-3B-pt-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-3b-pt-448",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-3b-pt-448",
+ },
+ "PaliGemma2-3B-pt-896": {
+ DownloadSource.DEFAULT: "google/paligemma2-3b-pt-896",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-3b-pt-896",
+ },
+ "PaliGemma2-10B-pt-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-10b-pt-224",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-10b-pt-224",
+ },
+ "PaliGemma2-10B-pt-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-10b-pt-448",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-10b-pt-448",
+ },
+ "PaliGemma2-10B-pt-896": {
+ DownloadSource.DEFAULT: "google/paligemma2-10b-pt-896",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-10b-pt-896",
+ },
+ "PaliGemma2-28B-pt-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-28b-pt-224",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-28b-pt-224",
+ },
+ "PaliGemma2-28B-pt-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-28b-pt-448",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-28b-pt-448",
+ },
+ "PaliGemma2-28B-pt-896": {
+ DownloadSource.DEFAULT: "google/paligemma2-28b-pt-896",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/paligemma2-28b-pt-896",
+ },
+ "PaliGemma2-3B-mix-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-3b-mix-224",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-3b-mix-224-bf16",
+ },
+ "PaliGemma2-3B-mix-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-3b-mix-448",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-3b-mix-448-bf16",
+ },
+ "PaliGemma2-10B-mix-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-10b-mix-224",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-10b-mix-224-bf16",
+ },
+ "PaliGemma2-10B-mix-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-10b-mix-448",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-10b-mix-448-bf16",
+ },
+ "PaliGemma2-28B-mix-224": {
+ DownloadSource.DEFAULT: "google/paligemma2-28b-mix-224",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-28b-mix-224-bf16",
+ },
+ "PaliGemma2-28B-mix-448": {
+ DownloadSource.DEFAULT: "google/paligemma2-28b-mix-448",
+ DownloadSource.MODELSCOPE: "mlx-community/paligemma2-28b-mix-448-bf16",
+ },
+ },
+ template="paligemma",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Phi-1.5-1.3B": {
+ DownloadSource.DEFAULT: "microsoft/phi-1_5",
+ DownloadSource.MODELSCOPE: "allspace/PHI_1-5",
+ },
+ "Phi-2-2.7B": {
+ DownloadSource.DEFAULT: "microsoft/phi-2",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/phi-2",
+ },
+ }
+)
+
+
+register_model_group(
+ models={
+ "Phi-3-4B-4k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-mini-4k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-mini-4k-instruct",
+ },
+ "Phi-3-4B-128k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-mini-128k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-mini-128k-instruct",
+ },
+ "Phi-3-14B-8k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-medium-4k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-medium-4k-instruct",
+ },
+ "Phi-3-14B-128k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-medium-128k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-medium-128k-instruct",
+ },
+ "Phi-3.5-4B-instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3.5-mini-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3.5-mini-instruct",
+ },
+ "Phi-3.5-MoE-42B-A6.6B-instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3.5-MoE-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3.5-MoE-instruct",
+ },
+ },
+ template="phi",
+)
+
+
+register_model_group(
+ models={
+ "Phi-3-7B-8k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-small-8k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-small-8k-instruct",
+ },
+ "Phi-3-7B-128k-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/Phi-3-small-128k-instruct",
+ DownloadSource.MODELSCOPE: "LLM-Research/Phi-3-small-128k-instruct",
+ },
+ },
+ template="phi_small",
+)
+
+
+register_model_group(
+ models={
+ "Phi-4-14B-Instruct": {
+ DownloadSource.DEFAULT: "microsoft/phi-4",
+ DownloadSource.MODELSCOPE: "LLM-Research/phi-4",
+ },
+ },
+ template="phi4",
+)
+
+
+register_model_group(
+ models={
+ "Pixtral-12B": {
+ DownloadSource.DEFAULT: "mistral-community/pixtral-12b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/pixtral-12b",
+ }
+ },
+ template="pixtral",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Qwen-1.8B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-1_8B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B",
+ },
+ "Qwen-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-7B",
+ },
+ "Qwen-14B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-14B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-14B",
+ },
+ "Qwen-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-72B",
+ },
+ "Qwen-1.8B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat",
+ },
+ "Qwen-7B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat",
+ },
+ "Qwen-14B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat",
+ },
+ "Qwen-72B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat",
+ },
+ "Qwen-1.8B-Chat-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat-Int8",
+ },
+ "Qwen-1.8B-Chat-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat-Int4",
+ },
+ "Qwen-7B-Chat-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat-Int8",
+ },
+ "Qwen-7B-Chat-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat-Int4",
+ },
+ "Qwen-14B-Chat-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat-Int8",
+ },
+ "Qwen-14B-Chat-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat-Int4",
+ },
+ "Qwen-72B-Chat-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat-Int8",
+ },
+ "Qwen-72B-Chat-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat-Int4",
+ },
+ },
+ template="qwen",
+)
+
+
+register_model_group(
+ models={
+ "Qwen1.5-0.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B",
+ },
+ "Qwen1.5-1.8B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B",
+ },
+ "Qwen1.5-4B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B",
+ },
+ "Qwen1.5-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B",
+ },
+ "Qwen1.5-14B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B",
+ },
+ "Qwen1.5-32B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B",
+ },
+ "Qwen1.5-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B",
+ },
+ "Qwen1.5-110B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B",
+ },
+ "Qwen1.5-MoE-A2.7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B",
+ },
+ "Qwen1.5-0.5B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat",
+ },
+ "Qwen1.5-1.8B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat",
+ },
+ "Qwen1.5-4B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat",
+ },
+ "Qwen1.5-7B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat",
+ },
+ "Qwen1.5-14B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat",
+ },
+ "Qwen1.5-32B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B-Chat",
+ },
+ "Qwen1.5-72B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat",
+ },
+ "Qwen1.5-110B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B-Chat",
+ },
+ "Qwen1.5-MoE-A2.7B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B-Chat",
+ },
+ "Qwen1.5-0.5B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-0.5B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat-AWQ",
+ },
+ "Qwen1.5-1.8B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-1.8B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat-AWQ",
+ },
+ "Qwen1.5-4B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-4B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat-AWQ",
+ },
+ "Qwen1.5-7B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-7B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat-AWQ",
+ },
+ "Qwen1.5-14B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-14B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat-AWQ",
+ },
+ "Qwen1.5-32B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B-Chat-AWQ",
+ },
+ "Qwen1.5-72B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat-GPTQ-Int8",
+ },
+ "Qwen1.5-72B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat-AWQ",
+ },
+ "Qwen1.5-110B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B-Chat-AWQ",
+ },
+ "Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
+ },
+ "CodeQwen1.5-7B": {
+ DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B",
+ DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B",
+ },
+ "CodeQwen1.5-7B-Chat": {
+ DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B-Chat",
+ DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B-Chat",
+ },
+ "CodeQwen1.5-7B-Chat-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B-Chat-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B-Chat-AWQ",
+ },
+ },
+ template="qwen",
+)
+
+
+register_model_group(
+ models={
+ "Qwen2-0.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-0.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-0.5B",
+ },
+ "Qwen2-1.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-1.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-1.5B",
+ },
+ "Qwen2-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-7B",
+ },
+ "Qwen2-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-72B",
+ },
+ "Qwen2-MoE-57B-A14B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-57B-A14B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-57B-A14B",
+ },
+ "Qwen2-0.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-0.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-0.5B-Instruct",
+ DownloadSource.OPENMIND: "LlamaFactory/Qwen2-0.5B-Instruct",
+ },
+ "Qwen2-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-1.5B-Instruct",
+ DownloadSource.OPENMIND: "LlamaFactory/Qwen2-1.5B-Instruct",
+ },
+ "Qwen2-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-7B-Instruct",
+ DownloadSource.OPENMIND: "LlamaFactory/Qwen2-7B-Instruct",
+ },
+ "Qwen2-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-72B-Instruct",
+ },
+ "Qwen2-MoE-57B-A14B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-57B-A14B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-57B-A14B-Instruct",
+ },
+ "Qwen2-0.5B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-0.5B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-0.5B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-0.5B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-0.5B-Instruct-AWQ",
+ },
+ "Qwen2-1.5B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-1.5B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-1.5B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-1.5B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-1.5B-Instruct-AWQ",
+ },
+ "Qwen2-7B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-7B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-7B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-7B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-7B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-7B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-7B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-7B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-7B-Instruct-AWQ",
+ },
+ "Qwen2-72B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-72B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-72B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-72B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-72B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-72B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-72B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-72B-Instruct-AWQ",
+ },
+ "Qwen2-57B-A14B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-Math-1.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-1.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-1.5B",
+ },
+ "Qwen2-Math-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-7B",
+ },
+ "Qwen2-Math-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-72B",
+ },
+ "Qwen2-Math-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-1.5B-Instruct",
+ },
+ "Qwen2-Math-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-7B-Instruct",
+ },
+ "Qwen2-Math-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Math-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Math-72B-Instruct",
+ },
+ },
+ template="qwen",
+)
+
+
+register_model_group(
+ models={
+ "Qwen2.5-0.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-0.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-0.5B",
+ },
+ "Qwen2.5-1.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-1.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-1.5B",
+ },
+ "Qwen2.5-3B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-3B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-3B",
+ },
+ "Qwen2.5-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B",
+ },
+ "Qwen2.5-14B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B",
+ },
+ "Qwen2.5-32B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-32B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-32B",
+ },
+ "Qwen2.5-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-72B",
+ },
+ "Qwen2.5-0.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-0.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-0.5B-Instruct",
+ },
+ "Qwen2.5-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-1.5B-Instruct",
+ },
+ "Qwen2.5-3B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-3B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-3B-Instruct",
+ },
+ "Qwen2.5-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B-Instruct",
+ },
+ "Qwen2.5-14B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B-Instruct",
+ },
+ "Qwen2.5-32B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-32B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-32B-Instruct",
+ },
+ "Qwen2.5-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-72B-Instruct",
+ },
+ "Qwen2.5-7B-Instruct-1M": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B-Instruct-1M",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B-Instruct-1M",
+ },
+ "Qwen2.5-14B-Instruct-1M": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B-Instruct-1M",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B-Instruct-1M",
+ },
+ "Qwen2.5-0.5B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-0.5B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-0.5B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-0.5B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-0.5B-Instruct-AWQ",
+ },
+ "Qwen2.5-1.5B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-1.5B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-1.5B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-1.5B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-1.5B-Instruct-AWQ",
+ },
+ "Qwen2.5-3B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-3B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-3B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-3B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-3B-Instruct-AWQ",
+ },
+ "Qwen2.5-7B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-7B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-7B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-7B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-7B-Instruct-AWQ",
+ },
+ "Qwen2.5-14B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-14B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-14B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-14B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-14B-Instruct-AWQ",
+ },
+ "Qwen2.5-32B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-32B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-32B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-32B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-32B-Instruct-AWQ",
+ },
+ "Qwen2.5-72B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2.5-72B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2.5-72B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-72B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-72B-Instruct-AWQ",
+ },
+ "Qwen2.5-Coder-0.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-0.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-0.5B",
+ },
+ "Qwen2.5-Coder-1.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-1.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-1.5B",
+ },
+ "Qwen2.5-Coder-3B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-3B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-3B",
+ },
+ "Qwen2.5-Coder-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-7B",
+ },
+ "Qwen2.5-Coder-14B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-14B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-14B",
+ },
+ "Qwen2.5-Coder-32B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-32B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-32B",
+ },
+ "Qwen2.5-Coder-0.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-0.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-0.5B-Instruct",
+ },
+ "Qwen2.5-Coder-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-1.5B-Instruct",
+ },
+ "Qwen2.5-Coder-3B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-3B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-3B-Instruct",
+ },
+ "Qwen2.5-Coder-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-7B-Instruct",
+ },
+ "Qwen2.5-Coder-14B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-14B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-14B-Instruct",
+ },
+ "Qwen2.5-Coder-32B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Coder-32B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-32B-Instruct",
+ },
+ "Qwen2.5-Math-1.5B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-1.5B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Math-1.5B",
+ },
+ "Qwen2.5-Math-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Math-7B",
+ },
+ "Qwen2.5-Math-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Math-72B",
+ },
+ "Qwen2.5-Math-1.5B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-1.5B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-1.5B-Instruct",
+ },
+ "Qwen2.5-Math-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-7B-Instruct",
+ },
+ "Qwen2.5-Math-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Math-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Coder-72B-Instruct",
+ },
+ "QwQ-32B-Preview-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/QwQ-32B-Preview",
+ DownloadSource.MODELSCOPE: "Qwen/QwQ-32B-Preview",
+ },
+ "QwQ-32B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/QwQ-32B",
+ DownloadSource.MODELSCOPE: "Qwen/QwQ-32B",
+ },
+ },
+ template="qwen",
+)
+
+
+register_model_group(
+ models={
+ "Qwen2-Audio-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Audio-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Audio-7B",
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-Audio-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-Audio-7B-Instruct",
+ },
+ },
+ template="qwen2_audio",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Qwen2.5-Omni-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-Omni-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-Omni-7B",
+ }
+ },
+ template="qwen2_omni",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Qwen2-VL-2B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-2B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-2B",
+ },
+ "Qwen2-VL-7B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-7B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-7B",
+ },
+ "Qwen2-VL-72B": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-72B",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-72B",
+ },
+ "Qwen2-VL-2B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-2B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-2B-Instruct",
+ DownloadSource.OPENMIND: "LlamaFactory/Qwen2-VL-2B-Instruct",
+ },
+ "Qwen2-VL-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-7B-Instruct",
+ DownloadSource.OPENMIND: "LlamaFactory/Qwen2-VL-7B-Instruct",
+ },
+ "Qwen2-VL-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-72B-Instruct",
+ },
+ "Qwen2-VL-2B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-VL-2B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-VL-2B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-2B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-2B-Instruct-AWQ",
+ },
+ "Qwen2-VL-7B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-VL-7B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-VL-7B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-7B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-7B-Instruct-AWQ",
+ },
+ "Qwen2-VL-72B-Instruct-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8",
+ },
+ "Qwen2-VL-72B-Instruct-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4",
+ },
+ "Qwen2-VL-72B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2-VL-72B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2-VL-72B-Instruct-AWQ",
+ },
+ "QVQ-72B-Preview": {
+ DownloadSource.DEFAULT: "Qwen/QVQ-72B-Preview",
+ DownloadSource.MODELSCOPE: "Qwen/QVQ-72B-Preview",
+ },
+ "Qwen2.5-VL-3B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-3B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-3B-Instruct",
+ },
+ "Qwen2.5-VL-7B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-7B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-7B-Instruct",
+ },
+ "Qwen2.5-VL-32B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-32B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-32B-Instruct",
+ },
+ "Qwen2.5-VL-72B-Instruct": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-72B-Instruct",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-72B-Instruct",
+ },
+ "Qwen2.5-VL-3B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
+ },
+ "Qwen2.5-VL-7B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
+ },
+ "Qwen2.5-VL-72B-Instruct-AWQ": {
+ DownloadSource.DEFAULT: "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
+ DownloadSource.MODELSCOPE: "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
+ },
+ },
+ template="qwen2_vl",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "SOLAR-10.7B-v1.0": {
+ DownloadSource.DEFAULT: "upstage/SOLAR-10.7B-v1.0",
+ },
+ "SOLAR-10.7B-Instruct-v1.0": {
+ DownloadSource.DEFAULT: "upstage/SOLAR-10.7B-Instruct-v1.0",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/SOLAR-10.7B-Instruct-v1.0",
+ },
+ },
+ template="solar",
+)
+
+
+register_model_group(
+ models={
+ "Skywork-13B-Base": {
+ DownloadSource.DEFAULT: "Skywork/Skywork-13B-base",
+ DownloadSource.MODELSCOPE: "skywork/Skywork-13B-base",
+ }
+ }
+)
+
+
+register_model_group(
+ models={
+ "Skywork-o1-Open-Llama-3.1-8B": {
+ DownloadSource.DEFAULT: "Skywork/Skywork-o1-Open-Llama-3.1-8B",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/Skywork-o1-Open-Llama-3.1-8B",
+ }
+ },
+ template="skywork_o1",
+)
+
+
+register_model_group(
+ models={
+ "StarCoder2-3B": {
+ DownloadSource.DEFAULT: "bigcode/starcoder2-3b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/starcoder2-3b",
+ },
+ "StarCoder2-7B": {
+ DownloadSource.DEFAULT: "bigcode/starcoder2-7b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/starcoder2-7b",
+ },
+ "StarCoder2-15B": {
+ DownloadSource.DEFAULT: "bigcode/starcoder2-15b",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/starcoder2-15b",
+ },
+ }
+)
+
+
+register_model_group(
+ models={
+ "TeleChat-1B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat-1B",
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat-1B",
+ },
+ "TeleChat-7B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/telechat-7B",
+ DownloadSource.MODELSCOPE: "TeleAI/telechat-7B",
+ DownloadSource.OPENMIND: "TeleAI/TeleChat-7B-pt",
+ },
+ "TeleChat-12B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat-12B-v2",
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat-12B-v2",
+ DownloadSource.OPENMIND: "TeleAI/TeleChat-12B-pt",
+ },
+ "TeleChat-52B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat-52B",
+ },
+ },
+ template="telechat",
+)
+
+
+register_model_group(
+ models={
+ "TeleChat2-3B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat2-3B",
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat2-3B",
+ },
+ "TeleChat2-7B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat2-7B",
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat2-7B",
+ },
+ "TeleChat2-35B-Chat": {
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat2-35B-Nov",
+ },
+ "TeleChat2-115B-Chat": {
+ DownloadSource.DEFAULT: "Tele-AI/TeleChat2-115B",
+ DownloadSource.MODELSCOPE: "TeleAI/TeleChat2-115B",
+ },
+ },
+ template="telechat2",
+)
+
+
+register_model_group(
+ models={
+ "Vicuna-v1.5-7B-Chat": {
+ DownloadSource.DEFAULT: "lmsys/vicuna-7b-v1.5",
+ DownloadSource.MODELSCOPE: "Xorbits/vicuna-7b-v1.5",
+ },
+ "Vicuna-v1.5-13B-Chat": {
+ DownloadSource.DEFAULT: "lmsys/vicuna-13b-v1.5",
+ DownloadSource.MODELSCOPE: "Xorbits/vicuna-13b-v1.5",
+ },
+ },
+ template="vicuna",
+)
+
+
+register_model_group(
+ models={
+ "Video-LLaVA-7B-Chat": {
+ DownloadSource.DEFAULT: "LanguageBind/Video-LLaVA-7B-hf",
+ },
+ },
+ template="video_llava",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "XuanYuan-6B": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-6B",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-6B",
+ },
+ "XuanYuan-70B": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-70B",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-70B",
+ },
+ "XuanYuan2-70B": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan2-70B",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan2-70B",
+ },
+ "XuanYuan-6B-Chat": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-6B-Chat",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-6B-Chat",
+ },
+ "XuanYuan-70B-Chat": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-70B-Chat",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-70B-Chat",
+ },
+ "XuanYuan2-70B-Chat": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan2-70B-Chat",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan2-70B-Chat",
+ },
+ "XuanYuan-6B-Chat-8bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-6B-Chat-8bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-6B-Chat-8bit",
+ },
+ "XuanYuan-6B-Chat-4bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-6B-Chat-4bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-6B-Chat-4bit",
+ },
+ "XuanYuan-70B-Chat-8bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-70B-Chat-8bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-70B-Chat-8bit",
+ },
+ "XuanYuan-70B-Chat-4bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan-70B-Chat-4bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan-70B-Chat-4bit",
+ },
+ "XuanYuan2-70B-Chat-8bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan2-70B-Chat-8bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan2-70B-Chat-8bit",
+ },
+ "XuanYuan2-70B-Chat-4bit": {
+ DownloadSource.DEFAULT: "Duxiaoman-DI/XuanYuan2-70B-Chat-4bit",
+ DownloadSource.MODELSCOPE: "Duxiaoman-DI/XuanYuan2-70B-Chat-4bit",
+ },
+ },
+ template="xuanyuan",
+)
+
+
+register_model_group(
+ models={
+ "XVERSE-7B": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-7B",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-7B",
+ },
+ "XVERSE-13B": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-13B",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-13B",
+ },
+ "XVERSE-65B": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-65B",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-65B",
+ },
+ "XVERSE-65B-2": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-65B-2",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-2",
+ },
+ "XVERSE-7B-Chat": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat",
+ },
+ "XVERSE-13B-Chat": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat",
+ },
+ "XVERSE-65B-Chat": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-65B-Chat",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-Chat",
+ },
+ "XVERSE-MoE-A4.2B": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-MoE-A4.2B",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-MoE-A4.2B",
+ },
+ "XVERSE-7B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat-GPTQ-Int8",
+ },
+ "XVERSE-7B-Chat-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat-GPTQ-Int4",
+ },
+ "XVERSE-13B-Chat-GPTQ-Int8": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat-GPTQ-Int8",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat-GPTQ-Int8",
+ },
+ "XVERSE-13B-Chat-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat-GPTQ-Int4",
+ },
+ "XVERSE-65B-Chat-GPTQ-Int4": {
+ DownloadSource.DEFAULT: "xverse/XVERSE-65B-Chat-GPTQ-Int4",
+ DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-Chat-GPTQ-Int4",
+ },
+ },
+ template="xverse",
+)
+
+
+register_model_group(
+ models={
+ "Yayi-7B": {
+ DownloadSource.DEFAULT: "wenge-research/yayi-7b-llama2",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/yayi-7b-llama2",
+ },
+ "Yayi-13B": {
+ DownloadSource.DEFAULT: "wenge-research/yayi-13b-llama2",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/yayi-13b-llama2",
+ },
+ },
+ template="yayi",
+)
+
+
+register_model_group(
+ models={
+ "Yi-6B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-6B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-6B",
+ },
+ "Yi-9B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-9B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-9B",
+ },
+ "Yi-34B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-34B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-34B",
+ },
+ "Yi-6B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-6B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-6B-Chat",
+ },
+ "Yi-34B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-34B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-34B-Chat",
+ },
+ "Yi-6B-Chat-8bits": {
+ DownloadSource.DEFAULT: "01-ai/Yi-6B-Chat-8bits",
+ DownloadSource.MODELSCOPE: "01ai/Yi-6B-Chat-8bits",
+ },
+ "Yi-6B-Chat-4bits": {
+ DownloadSource.DEFAULT: "01-ai/Yi-6B-Chat-4bits",
+ DownloadSource.MODELSCOPE: "01ai/Yi-6B-Chat-4bits",
+ },
+ "Yi-34B-Chat-8bits": {
+ DownloadSource.DEFAULT: "01-ai/Yi-34B-Chat-8bits",
+ DownloadSource.MODELSCOPE: "01ai/Yi-34B-Chat-8bits",
+ },
+ "Yi-34B-Chat-4bits": {
+ DownloadSource.DEFAULT: "01-ai/Yi-34B-Chat-4bits",
+ DownloadSource.MODELSCOPE: "01ai/Yi-34B-Chat-4bits",
+ },
+ "Yi-1.5-6B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-6B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-6B",
+ },
+ "Yi-1.5-9B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-9B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-9B",
+ },
+ "Yi-1.5-34B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-34B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-34B",
+ },
+ "Yi-1.5-6B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-6B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-6B-Chat",
+ DownloadSource.OPENMIND: "LlamaFactory/Yi-1.5-6B-Chat",
+ },
+ "Yi-1.5-9B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-9B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-9B-Chat",
+ },
+ "Yi-1.5-34B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-1.5-34B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-1.5-34B-Chat",
+ },
+ "Yi-Coder-1.5B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-Coder-1.5B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-Coder-1.5B",
+ },
+ "Yi-Coder-9B": {
+ DownloadSource.DEFAULT: "01-ai/Yi-Coder-9B",
+ DownloadSource.MODELSCOPE: "01ai/Yi-Coder-9B",
+ },
+ "Yi-Coder-1.5B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-Coder-1.5B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-Coder-1.5B-Chat",
+ },
+ "Yi-Coder-9B-Chat": {
+ DownloadSource.DEFAULT: "01-ai/Yi-Coder-9B-Chat",
+ DownloadSource.MODELSCOPE: "01ai/Yi-Coder-9B-Chat",
+ },
+ },
+ template="yi",
+)
+
+
+register_model_group(
+ models={
+ "Yi-VL-6B-Chat": {
+ DownloadSource.DEFAULT: "BUAADreamer/Yi-VL-6B-hf",
+ },
+ "Yi-VL-34B-Chat": {
+ DownloadSource.DEFAULT: "BUAADreamer/Yi-VL-34B-hf",
+ },
+ },
+ template="yi_vl",
+ multimodal=True,
+)
+
+
+register_model_group(
+ models={
+ "Yuan2-2B-Chat": {
+ DownloadSource.DEFAULT: "IEITYuan/Yuan2-2B-hf",
+ DownloadSource.MODELSCOPE: "YuanLLM/Yuan2.0-2B-hf",
+ },
+ "Yuan2-51B-Chat": {
+ DownloadSource.DEFAULT: "IEITYuan/Yuan2-51B-hf",
+ DownloadSource.MODELSCOPE: "YuanLLM/Yuan2.0-51B-hf",
+ },
+ "Yuan2-102B-Chat": {
+ DownloadSource.DEFAULT: "IEITYuan/Yuan2-102B-hf",
+ DownloadSource.MODELSCOPE: "YuanLLM/Yuan2.0-102B-hf",
+ },
+ },
+ template="yuan",
+)
+
+
+register_model_group(
+ models={
+ "Zephyr-7B-Alpha-Chat": {
+ DownloadSource.DEFAULT: "HuggingFaceH4/zephyr-7b-alpha",
+ DownloadSource.MODELSCOPE: "AI-ModelScope/zephyr-7b-alpha",
+ },
+ "Zephyr-7B-Beta-Chat": {
+ DownloadSource.DEFAULT: "HuggingFaceH4/zephyr-7b-beta",
+ DownloadSource.MODELSCOPE: "modelscope/zephyr-7b-beta",
+ },
+ "Zephyr-141B-ORPO-Chat": {
+ DownloadSource.DEFAULT: "HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
+ },
+ },
+ template="zephyr",
+)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/env.py b/post-training/LLaMA-Factory/src/llamafactory/extras/env.py
new file mode 100644
index 0000000..ab0dfb8
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/env.py
@@ -0,0 +1,86 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/commands/env.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import platform
+
+import accelerate
+import datasets
+import peft
+import torch
+import transformers
+import trl
+from transformers.utils import is_torch_cuda_available, is_torch_npu_available
+
+
+VERSION = "0.9.3.dev0"
+
+
+def print_env() -> None:
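+    r"""Collect and print environment information (platform, Python, core library versions, accelerator details)."""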
+ info = {
+ "`llamafactory` version": VERSION,
+ "Platform": platform.platform(),
+ "Python version": platform.python_version(),
+ "PyTorch version": torch.__version__,
+ "Transformers version": transformers.__version__,
+ "Datasets version": datasets.__version__,
+ "Accelerate version": accelerate.__version__,
+ "PEFT version": peft.__version__,
+ "TRL version": trl.__version__,
+ }
+
+ if is_torch_cuda_available():
+ info["PyTorch version"] += " (GPU)"
+ info["GPU type"] = torch.cuda.get_device_name()
+ info["GPU number"] = torch.cuda.device_count()
+ info["GPU memory"] = f"{torch.cuda.mem_get_info()[1] / (1024**3):.2f}GB"
+
+ if is_torch_npu_available():
+ info["PyTorch version"] += " (NPU)"
+ info["NPU type"] = torch.npu.get_device_name()
+ info["CANN version"] = torch.version.cann
+
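+    # optional dependencies below are reported only when they can be imported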
+ try:
+ import deepspeed # type: ignore
+
+ info["DeepSpeed version"] = deepspeed.__version__
+ except Exception:
+ pass
+
+ try:
+ import bitsandbytes # type: ignore
+
+ info["Bitsandbytes version"] = bitsandbytes.__version__
+ except Exception:
+ pass
+
+ try:
+ import vllm
+
+ info["vLLM version"] = vllm.__version__
+ except Exception:
+ pass
+
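+    # record the current git commit when running from a source checkout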
+ try:
+ import subprocess
+
+ commit_info = subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True)
+ commit_hash = commit_info.stdout.strip()
+ info["Git commit"] = commit_hash
+ except Exception:
+ pass
+
+ print("\n" + "\n".join([f"- {key}: {value}" for key, value in info.items()]) + "\n")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/logging.py b/post-training/LLaMA-Factory/src/llamafactory/extras/logging.py
new file mode 100644
index 0000000..b078a96
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/logging.py
@@ -0,0 +1,159 @@
+# Copyright 2025 Optuna, HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/utils/logging.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import os
+import sys
+import threading
+from concurrent.futures import ThreadPoolExecutor
+from functools import lru_cache
+from typing import Optional
+
+from .constants import RUNNING_LOG
+
+
+_thread_lock = threading.RLock()
+_default_handler: Optional["logging.Handler"] = None
+_default_log_level: "logging._Level" = logging.INFO
+
+
+class LoggerHandler(logging.Handler):
+ r"""Redirect the logging output to the logging file for LLaMA Board."""
+
+ def __init__(self, output_dir: str) -> None:
+ super().__init__()
+ self._formatter = logging.Formatter(
+ fmt="[%(levelname)s|%(asctime)s] %(filename)s:%(lineno)s >> %(message)s",
+ datefmt="%Y-%m-%d %H:%M:%S",
+ )
+ self.setLevel(logging.INFO)
+ os.makedirs(output_dir, exist_ok=True)
+ self.running_log = os.path.join(output_dir, RUNNING_LOG)
+ if os.path.exists(self.running_log):
+ os.remove(self.running_log)
+
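+        # a single-worker pool keeps log entries in order while writing them off the caller's thread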
+ self.thread_pool = ThreadPoolExecutor(max_workers=1)
+
+ def _write_log(self, log_entry: str) -> None:
+ with open(self.running_log, "a", encoding="utf-8") as f:
+ f.write(log_entry + "\n\n")
+
+ def emit(self, record) -> None:
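+        # drop httpx request logs so they do not flood the running log file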
+ if record.name == "httpx":
+ return
+
+ log_entry = self._formatter.format(record)
+ self.thread_pool.submit(self._write_log, log_entry)
+
+ def close(self) -> None:
+ self.thread_pool.shutdown(wait=True)
+ return super().close()
+
+
+class _Logger(logging.Logger):
+ r"""A logger that supports rank0 logging."""
+
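+    # the methods below are plain pass-throughs that mainly serve as type hints;
+    # the rank-0 aware behavior comes from the functions patched onto `logging.Logger` at the end of this module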
+ def info_rank0(self, *args, **kwargs) -> None:
+ self.info(*args, **kwargs)
+
+ def warning_rank0(self, *args, **kwargs) -> None:
+ self.warning(*args, **kwargs)
+
+ def warning_rank0_once(self, *args, **kwargs) -> None:
+ self.warning(*args, **kwargs)
+
+
+def _get_default_logging_level() -> "logging._Level":
+ r"""Return the default logging level."""
+ env_level_str = os.getenv("LLAMAFACTORY_VERBOSITY", None)
+ if env_level_str:
+ if env_level_str.upper() in logging._nameToLevel:
+ return logging._nameToLevel[env_level_str.upper()]
+ else:
+ raise ValueError(f"Unknown logging level: {env_level_str}.")
+
+ return _default_log_level
+
+
+def _get_library_name() -> str:
+ return __name__.split(".")[0]
+
+
+def _get_library_root_logger() -> "_Logger":
+ return logging.getLogger(_get_library_name())
+
+
+def _configure_library_root_logger() -> None:
+ r"""Configure root logger using a stdout stream handler with an explicit format."""
+ global _default_handler
+
+ with _thread_lock:
+ if _default_handler: # already configured
+ return
+
+ formatter = logging.Formatter(
+ fmt="[%(levelname)s|%(asctime)s] %(name)s:%(lineno)s >> %(message)s",
+ datefmt="%Y-%m-%d %H:%M:%S",
+ )
+ _default_handler = logging.StreamHandler(sys.stdout)
+ _default_handler.setFormatter(formatter)
+ library_root_logger = _get_library_root_logger()
+ library_root_logger.addHandler(_default_handler)
+ library_root_logger.setLevel(_get_default_logging_level())
+ library_root_logger.propagate = False
+
+
+def get_logger(name: Optional[str] = None) -> "_Logger":
+ r"""Return a logger with the specified name. It it not supposed to be accessed externally."""
+ if name is None:
+ name = _get_library_name()
+
+ _configure_library_root_logger()
+ return logging.getLogger(name)
+
+
+def add_handler(handler: "logging.Handler") -> None:
+ r"""Add a handler to the root logger."""
+ _configure_library_root_logger()
+ _get_library_root_logger().addHandler(handler)
+
+
+def remove_handler(handler: logging.Handler) -> None:
+ r"""Remove a handler to the root logger."""
+ _configure_library_root_logger()
+ _get_library_root_logger().removeHandler(handler)
+
+
+def info_rank0(self: "logging.Logger", *args, **kwargs) -> None:
+ if int(os.getenv("LOCAL_RANK", "0")) == 0:
+ self.info(*args, **kwargs)
+
+
+def warning_rank0(self: "logging.Logger", *args, **kwargs) -> None:
+ if int(os.getenv("LOCAL_RANK", "0")) == 0:
+ self.warning(*args, **kwargs)
+
+
+@lru_cache(None)
+def warning_rank0_once(self: "logging.Logger", *args, **kwargs) -> None:
+ if int(os.getenv("LOCAL_RANK", "0")) == 0:
+ self.warning(*args, **kwargs)
+
+
+logging.Logger.info_rank0 = info_rank0
+logging.Logger.warning_rank0 = warning_rank0
+logging.Logger.warning_rank0_once = warning_rank0_once
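+
+# Illustrative usage sketch (not part of the module): after the monkey patching
+# above, any logger obtained through `get_logger` exposes the rank-0 helpers.
+#
+#   logger = get_logger(__name__)
+#   logger.info_rank0("printed only when LOCAL_RANK is 0 or unset")
+#   logger.warning_rank0_once("emitted at most once per unique message")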
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/misc.py b/post-training/LLaMA-Factory/src/llamafactory/extras/misc.py
new file mode 100644
index 0000000..e5b9125
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/misc.py
@@ -0,0 +1,302 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's PEFT library.
+# https://github.com/huggingface/peft/blob/v0.10.0/src/peft/peft_model.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import gc
+import os
+import socket
+from typing import TYPE_CHECKING, Any, Literal, Union
+
+import torch
+import torch.distributed as dist
+import transformers.dynamic_module_utils
+from transformers import InfNanRemoveLogitsProcessor, LogitsProcessorList
+from transformers.dynamic_module_utils import get_relative_imports
+from transformers.utils import (
+ is_torch_bf16_gpu_available,
+ is_torch_cuda_available,
+ is_torch_mps_available,
+ is_torch_npu_available,
+ is_torch_xpu_available,
+)
+from transformers.utils.versions import require_version
+
+from . import logging
+from .packages import is_transformers_version_greater_than
+
+
+_is_fp16_available = is_torch_npu_available() or is_torch_cuda_available()
+try:
+ _is_bf16_available = is_torch_bf16_gpu_available() or (is_torch_npu_available() and torch.npu.is_bf16_supported())
+except Exception:
+ _is_bf16_available = False
+
+
+if TYPE_CHECKING:
+ from numpy.typing import NDArray
+
+ from ..hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class AverageMeter:
+ r"""Compute and store the average and current value."""
+
+ def __init__(self):
+ self.reset()
+
+ def reset(self):
+ self.val = 0
+ self.avg = 0
+ self.sum = 0
+ self.count = 0
+
+ def update(self, val, n=1):
+ self.val = val
+ self.sum += val * n
+ self.count += n
+ self.avg = self.sum / self.count
+
+
+def check_version(requirement: str, mandatory: bool = False) -> None:
+ r"""Optionally check the package version."""
+ if is_env_enabled("DISABLE_VERSION_CHECK") and not mandatory:
+ logger.warning_rank0_once("Version checking has been disabled, may lead to unexpected behaviors.")
+ return
+
+ if mandatory:
+ hint = f"To fix: run `pip install {requirement}`."
+ else:
+ hint = f"To fix: run `pip install {requirement}` or set `DISABLE_VERSION_CHECK=1` to skip this check."
+
+ require_version(requirement, hint)
+
+
+def check_dependencies() -> None:
+ r"""Check the version of the required packages."""
+ check_version("transformers>=4.45.0,<=4.51.3,!=4.46.0,!=4.46.1,!=4.46.2,!=4.46.3,!=4.47.0,!=4.47.1,!=4.48.0")
+ check_version("datasets>=2.16.0,<=3.5.0")
+ check_version("accelerate>=0.34.0,<=1.6.0")
+ check_version("peft>=0.14.0,<=0.15.1")
+ check_version("trl>=0.8.6,<=0.9.6")
+ if is_transformers_version_greater_than("4.46.0") and not is_transformers_version_greater_than("4.48.1"):
+ logger.warning_rank0_once("There are known bugs in transformers v4.46.0-v4.48.0, please use other versions.")
+
+
+def calculate_tps(dataset: list[dict[str, Any]], metrics: dict[str, float], stage: Literal["sft", "rm"]) -> float:
+ r"""Calculate effective tokens per second."""
+ effective_token_num = 0
+ for data in dataset:
+ if stage == "sft":
+ effective_token_num += len(data["input_ids"])
+ elif stage == "rm":
+ effective_token_num += len(data["chosen_input_ids"]) + len(data["rejected_input_ids"])
+
+ result = effective_token_num * metrics["epoch"] / metrics["train_runtime"]
+ return result / dist.get_world_size() if dist.is_initialized() else result
+
+
+def count_parameters(model: "torch.nn.Module") -> tuple[int, int]:
+ r"""Return the number of trainable parameters and number of all parameters in the model."""
+ trainable_params, all_param = 0, 0
+ for param in model.parameters():
+ num_params = param.numel()
+ # if using DS Zero 3 and the weights are initialized empty
+ if num_params == 0 and hasattr(param, "ds_numel"):
+ num_params = param.ds_numel
+
+ # Due to the design of 4-bit linear layers from bitsandbytes, multiply the number of parameters by 2 * itemsize
+ if param.__class__.__name__ == "Params4bit":
+ if hasattr(param, "quant_storage") and hasattr(param.quant_storage, "itemsize"):
+ num_bytes = param.quant_storage.itemsize
+ elif hasattr(param, "element_size"): # for older pytorch version
+ num_bytes = param.element_size()
+ else:
+ num_bytes = 1
+
+ num_params = num_params * 2 * num_bytes
+
+ all_param += num_params
+ if param.requires_grad:
+ trainable_params += num_params
+
+ return trainable_params, all_param
+
+
+def get_current_device() -> "torch.device":
+ r"""Get the current available device."""
+ if is_torch_xpu_available():
+ device = "xpu:{}".format(os.getenv("LOCAL_RANK", "0"))
+ elif is_torch_npu_available():
+ device = "npu:{}".format(os.getenv("LOCAL_RANK", "0"))
+ elif is_torch_mps_available():
+ device = "mps:{}".format(os.getenv("LOCAL_RANK", "0"))
+ elif is_torch_cuda_available():
+ device = "cuda:{}".format(os.getenv("LOCAL_RANK", "0"))
+ else:
+ device = "cpu"
+
+ return torch.device(device)
+
+
+def get_device_count() -> int:
+ r"""Get the number of available devices."""
+ if is_torch_xpu_available():
+ return torch.xpu.device_count()
+ elif is_torch_npu_available():
+ return torch.npu.device_count()
+ elif is_torch_mps_available():
+ return torch.mps.device_count()
+ elif is_torch_cuda_available():
+ return torch.cuda.device_count()
+ else:
+ return 0
+
+
+def get_logits_processor() -> "LogitsProcessorList":
+ r"""Get logits processor that removes NaN and Inf logits."""
+ logits_processor = LogitsProcessorList()
+ logits_processor.append(InfNanRemoveLogitsProcessor())
+ return logits_processor
+
+
+def get_peak_memory() -> tuple[int, int]:
+ r"""Get the peak memory usage for the current device (in Bytes)."""
+ if is_torch_xpu_available():
+ return torch.xpu.max_memory_allocated(), torch.xpu.max_memory_reserved()
+ elif is_torch_npu_available():
+ return torch.npu.max_memory_allocated(), torch.npu.max_memory_reserved()
+ elif is_torch_mps_available():
+ return torch.mps.current_allocated_memory(), -1
+ elif is_torch_cuda_available():
+ return torch.cuda.max_memory_allocated(), torch.cuda.max_memory_reserved()
+ else:
+ return 0, 0
+
+
+def has_tokenized_data(path: "os.PathLike") -> bool:
+ r"""Check if the path has a tokenized dataset."""
+ return os.path.isdir(path) and len(os.listdir(path)) > 0
+
+
+def infer_optim_dtype(model_dtype: "torch.dtype") -> "torch.dtype":
+ r"""Infer the optimal dtype according to the model_dtype and device compatibility."""
+ if _is_bf16_available and model_dtype == torch.bfloat16:
+ return torch.bfloat16
+ elif _is_fp16_available:
+ return torch.float16
+ else:
+ return torch.float32
+
+
+def is_accelerator_available() -> bool:
+ r"""Check if the accelerator is available."""
+ return (
+ is_torch_xpu_available() or is_torch_npu_available() or is_torch_mps_available() or is_torch_cuda_available()
+ )
+
+
+def is_env_enabled(env_var: str, default: str = "0") -> bool:
+ r"""Check if the environment variable is enabled."""
+ return os.getenv(env_var, default).lower() in ["true", "y", "1"]
+
+
+def numpify(inputs: Union["NDArray", "torch.Tensor"]) -> "NDArray":
+ r"""Cast a torch tensor or a numpy array to a numpy array."""
+ if isinstance(inputs, torch.Tensor):
+ inputs = inputs.cpu()
+ if inputs.dtype == torch.bfloat16: # numpy does not support bfloat16 until 1.21.4
+ inputs = inputs.to(torch.float32)
+
+ inputs = inputs.numpy()
+
+ return inputs
+
+
+def skip_check_imports() -> None:
+ r"""Avoid flash attention import error in custom model files."""
+ if not is_env_enabled("FORCE_CHECK_IMPORTS"):
+ transformers.dynamic_module_utils.check_imports = get_relative_imports
+
+
+def torch_gc() -> None:
+ r"""Collect the device memory."""
+ gc.collect()
+ if is_torch_xpu_available():
+ torch.xpu.empty_cache()
+ elif is_torch_npu_available():
+ torch.npu.empty_cache()
+ elif is_torch_mps_available():
+ torch.mps.empty_cache()
+ elif is_torch_cuda_available():
+ torch.cuda.empty_cache()
+
+
+def try_download_model_from_other_hub(model_args: "ModelArguments") -> str:
+ if (not use_modelscope() and not use_openmind()) or os.path.exists(model_args.model_name_or_path):
+ return model_args.model_name_or_path
+
+ if use_modelscope():
+ check_version("modelscope>=1.11.0", mandatory=True)
+ from modelscope import snapshot_download # type: ignore
+
+ revision = "master" if model_args.model_revision == "main" else model_args.model_revision
+ return snapshot_download(
+ model_args.model_name_or_path,
+ revision=revision,
+ cache_dir=model_args.cache_dir,
+ )
+
+ if use_openmind():
+ check_version("openmind>=0.8.0", mandatory=True)
+ from openmind.utils.hub import snapshot_download # type: ignore
+
+ return snapshot_download(
+ model_args.model_name_or_path,
+ revision=model_args.model_revision,
+ cache_dir=model_args.cache_dir,
+ )
+
+
+def use_modelscope() -> bool:
+ return is_env_enabled("USE_MODELSCOPE_HUB")
+
+
+def use_openmind() -> bool:
+ return is_env_enabled("USE_OPENMIND_HUB")
+
+
+def use_ray() -> bool:
+ return is_env_enabled("USE_RAY")
+
+
+def find_available_port() -> int:
+ r"""Find an available port on the local machine."""
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ sock.bind(("", 0))
+ port = sock.getsockname()[1]
+ sock.close()
+ return port
+
+
+def fix_proxy(ipv6_enabled: bool = False) -> None:
+ r"""Fix proxy settings for gradio ui."""
+ os.environ["no_proxy"] = "localhost,127.0.0.1,0.0.0.0"
+ if ipv6_enabled:
+ for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
+ os.environ.pop(name, None)
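+
+# Illustrative usage sketch (not part of the module), assuming a training loop
+# that tracks a running loss with the helpers above:
+#
+#   meter = AverageMeter()
+#   meter.update(loss.item(), n=batch_size)  # meter.avg holds the running mean
+#   if is_env_enabled("DISABLE_VERSION_CHECK"):
+#       ...  # check_version() becomes a no-op for non-mandatory requirements
+#   torch_gc()  # release cached accelerator memory between stages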
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/packages.py b/post-training/LLaMA-Factory/src/llamafactory/extras/packages.py
new file mode 100644
index 0000000..6b70f4a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/packages.py
@@ -0,0 +1,103 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/utils/import_utils.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import importlib.metadata
+import importlib.util
+from functools import lru_cache
+from typing import TYPE_CHECKING
+
+from packaging import version
+
+
+if TYPE_CHECKING:
+ from packaging.version import Version
+
+
+def _is_package_available(name: str) -> bool:
+ return importlib.util.find_spec(name) is not None
+
+
+def _get_package_version(name: str) -> "Version":
+ try:
+ return version.parse(importlib.metadata.version(name))
+ except Exception:
+ return version.parse("0.0.0")
+
+
+def is_pyav_available():
+ return _is_package_available("av")
+
+
+def is_librosa_available():
+ return _is_package_available("librosa")
+
+
+def is_fastapi_available():
+ return _is_package_available("fastapi")
+
+
+def is_galore_available():
+ return _is_package_available("galore_torch")
+
+
+def is_apollo_available():
+ return _is_package_available("apollo_torch")
+
+
+def is_gradio_available():
+ return _is_package_available("gradio")
+
+
+def is_matplotlib_available():
+ return _is_package_available("matplotlib")
+
+
+def is_pillow_available():
+ return _is_package_available("PIL")
+
+
+def is_ray_available():
+ return _is_package_available("ray")
+
+
+def is_requests_available():
+ return _is_package_available("requests")
+
+
+def is_rouge_available():
+ return _is_package_available("rouge_chinese")
+
+
+def is_starlette_available():
+ return _is_package_available("sse_starlette")
+
+
+@lru_cache
+def is_transformers_version_greater_than(content: str):
+ return _get_package_version("transformers") >= version.parse(content)
+
+
+def is_uvicorn_available():
+ return _is_package_available("uvicorn")
+
+
+def is_vllm_available():
+ return _is_package_available("vllm")
+
+
+def is_sglang_available():
+ return _is_package_available("sglang")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/extras/ploting.py b/post-training/LLaMA-Factory/src/llamafactory/extras/ploting.py
new file mode 100644
index 0000000..be89bcc
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/extras/ploting.py
@@ -0,0 +1,95 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import math
+import os
+from typing import Any
+
+from transformers.trainer import TRAINER_STATE_NAME
+
+from . import logging
+from .packages import is_matplotlib_available
+
+
+if is_matplotlib_available():
+ import matplotlib.figure
+ import matplotlib.pyplot as plt
+
+
+logger = logging.get_logger(__name__)
+
+
+def smooth(scalars: list[float]) -> list[float]:
+ r"""EMA implementation according to TensorBoard."""
+ if len(scalars) == 0:
+ return []
+
+ last = scalars[0]
+ smoothed = []
+ weight = 1.8 * (1 / (1 + math.exp(-0.05 * len(scalars))) - 0.5) # a sigmoid function
+ for next_val in scalars:
+ smoothed_val = last * weight + (1 - weight) * next_val
+ smoothed.append(smoothed_val)
+ last = smoothed_val
+ return smoothed
+
+
+def gen_loss_plot(trainer_log: list[dict[str, Any]]) -> "matplotlib.figure.Figure":
+ r"""Plot loss curves in LlamaBoard."""
+ plt.close("all")
+ plt.switch_backend("agg")
+ fig = plt.figure()
+ ax = fig.add_subplot(111)
+ steps, losses = [], []
+ for log in trainer_log:
+ if log.get("loss", None):
+ steps.append(log["current_steps"])
+ losses.append(log["loss"])
+
+ ax.plot(steps, losses, color="#1f77b4", alpha=0.4, label="original")
+ ax.plot(steps, smooth(losses), color="#1f77b4", label="smoothed")
+ ax.legend()
+ ax.set_xlabel("step")
+ ax.set_ylabel("loss")
+ return fig
+
+
+def plot_loss(save_dictionary: str, keys: list[str] = ["loss"]) -> None:
+ r"""Plot loss curves and saves the image."""
+ plt.switch_backend("agg")
+ with open(os.path.join(save_dictionary, TRAINER_STATE_NAME), encoding="utf-8") as f:
+ data = json.load(f)
+
+ for key in keys:
+ steps, metrics = [], []
+ for i in range(len(data["log_history"])):
+ if key in data["log_history"][i]:
+ steps.append(data["log_history"][i]["step"])
+ metrics.append(data["log_history"][i][key])
+
+ if len(metrics) == 0:
+ logger.warning_rank0(f"No metric {key} to plot.")
+ continue
+
+ plt.figure()
+ plt.plot(steps, metrics, color="#1f77b4", alpha=0.4, label="original")
+ plt.plot(steps, smooth(metrics), color="#1f77b4", label="smoothed")
+ plt.title(f"training {key} of {save_dictionary}")
+ plt.xlabel("step")
+ plt.ylabel(key)
+ plt.legend()
+ figure_path = os.path.join(save_dictionary, "training_{}.png".format(key.replace("/", "_")))
+ plt.savefig(figure_path, format="png", dpi=100)
+ print("Figure saved at:", figure_path)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/__init__.py
new file mode 100644
index 0000000..9bcc429
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/__init__.py
@@ -0,0 +1,37 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .data_args import DataArguments
+from .evaluation_args import EvaluationArguments
+from .finetuning_args import FinetuningArguments
+from .generating_args import GeneratingArguments
+from .model_args import ModelArguments
+from .parser import get_eval_args, get_infer_args, get_ray_args, get_train_args, read_args
+from .training_args import RayArguments, TrainingArguments
+
+
+__all__ = [
+ "DataArguments",
+ "EvaluationArguments",
+ "FinetuningArguments",
+ "GeneratingArguments",
+ "ModelArguments",
+ "RayArguments",
+ "TrainingArguments",
+ "get_eval_args",
+ "get_infer_args",
+ "get_ray_args",
+ "get_train_args",
+ "read_args",
+]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/data_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/data_args.py
new file mode 100644
index 0000000..96dfb39
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/data_args.py
@@ -0,0 +1,170 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/language-modeling/run_clm.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import asdict, dataclass, field
+from typing import Any, Literal, Optional
+
+
+@dataclass
+class DataArguments:
+ r"""Arguments pertaining to what data we are going to input our model for training and evaluation."""
+
+ template: Optional[str] = field(
+ default=None,
+ metadata={"help": "Which template to use for constructing prompts in training and inference."},
+ )
+ dataset: Optional[str] = field(
+ default=None,
+ metadata={"help": "The name of dataset(s) to use for training. Use commas to separate multiple datasets."},
+ )
+ eval_dataset: Optional[str] = field(
+ default=None,
+ metadata={"help": "The name of dataset(s) to use for evaluation. Use commas to separate multiple datasets."},
+ )
+ dataset_dir: str = field(
+ default="data",
+ metadata={"help": "Path to the folder containing the datasets."},
+ )
+ media_dir: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the folder containing the images, videos or audios. Defaults to `dataset_dir`."},
+ )
+ cutoff_len: int = field(
+ default=2048,
+ metadata={"help": "The cutoff length of the tokenized inputs in the dataset."},
+ )
+ train_on_prompt: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to disable the mask on the prompt."},
+ )
+ mask_history: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to mask the history and train on the last turn only."},
+ )
+ streaming: bool = field(
+ default=False,
+ metadata={"help": "Enable dataset streaming."},
+ )
+ buffer_size: int = field(
+ default=16384,
+ metadata={"help": "Size of the buffer to randomly sample examples from in dataset streaming."},
+ )
+ mix_strategy: Literal["concat", "interleave_under", "interleave_over"] = field(
+ default="concat",
+ metadata={"help": "Strategy to use in dataset mixing (concat/interleave) (undersampling/oversampling)."},
+ )
+ interleave_probs: Optional[str] = field(
+ default=None,
+ metadata={"help": "Probabilities to sample data from datasets. Use commas to separate multiple datasets."},
+ )
+ overwrite_cache: bool = field(
+ default=False,
+ metadata={"help": "Overwrite the cached training and evaluation sets."},
+ )
+ preprocessing_batch_size: int = field(
+ default=1000,
+ metadata={"help": "The number of examples in one group in pre-processing."},
+ )
+ preprocessing_num_workers: Optional[int] = field(
+ default=None,
+ metadata={"help": "The number of processes to use for the pre-processing."},
+ )
+ max_samples: Optional[int] = field(
+ default=None,
+ metadata={"help": "For debugging purposes, truncate the number of examples for each dataset."},
+ )
+ eval_num_beams: Optional[int] = field(
+ default=None,
+ metadata={"help": "Number of beams to use for evaluation. This argument will be passed to `model.generate`"},
+ )
+ ignore_pad_token_for_loss: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to ignore the tokens corresponding to the pad label in loss computation."},
+ )
+ val_size: float = field(
+ default=0.0,
+ metadata={"help": "Size of the validation set, should be an integer or a float in range `[0,1)`."},
+ )
+ packing: Optional[bool] = field(
+ default=None,
+ metadata={"help": "Enable sequences packing in training. Will automatically enable in pre-training."},
+ )
+ neat_packing: bool = field(
+ default=False,
+ metadata={"help": "Enable sequence packing without cross-attention."},
+ )
+ tool_format: Optional[str] = field(
+ default=None,
+ metadata={"help": "Tool format to use for constructing function calling examples."},
+ )
+ tokenized_path: Optional[str] = field(
+ default=None,
+ metadata={
+ "help": (
+ "Path to save or load the tokenized datasets. "
+ "If tokenized_path not exists, it will save the tokenized datasets. "
+ "If tokenized_path exists, it will load the tokenized datasets."
+ )
+ },
+ )
+
+ def __post_init__(self):
+ def split_arg(arg):
+ if isinstance(arg, str):
+ return [item.strip() for item in arg.split(",")]
+ return arg
+
+ self.dataset = split_arg(self.dataset)
+ self.eval_dataset = split_arg(self.eval_dataset)
+
+ if self.media_dir is None:
+ self.media_dir = self.dataset_dir
+
+ if self.dataset is None and self.val_size > 1e-6:
+ raise ValueError("Cannot specify `val_size` if `dataset` is None.")
+
+ if self.eval_dataset is not None and self.val_size > 1e-6:
+ raise ValueError("Cannot specify `val_size` if `eval_dataset` is not None.")
+
+ if self.interleave_probs is not None:
+ if self.mix_strategy == "concat":
+ raise ValueError("`interleave_probs` is only valid for interleaved mixing.")
+
+ self.interleave_probs = list(map(float, split_arg(self.interleave_probs)))
+ if self.dataset is not None and len(self.dataset) != len(self.interleave_probs):
+ raise ValueError("The length of dataset and interleave probs should be identical.")
+
+ if self.eval_dataset is not None and len(self.eval_dataset) != len(self.interleave_probs):
+ raise ValueError("The length of eval dataset and interleave probs should be identical.")
+
+ if self.streaming and self.val_size > 1e-6 and self.val_size < 1:
+ raise ValueError("Streaming mode should have an integer val size.")
+
+ if self.streaming and self.max_samples is not None:
+ raise ValueError("`max_samples` is incompatible with `streaming`.")
+
+ if self.mask_history and self.train_on_prompt:
+ raise ValueError("`mask_history` is incompatible with `train_on_prompt`.")
+
+ if self.neat_packing:
+ self.packing = True
+
+ if self.packing:
+ self.cutoff_len -= 1 # avoid pad_to_multiple_of, needs improvement
+
+ def to_dict(self) -> dict[str, Any]:
+ return asdict(self)
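+
+# Illustrative usage sketch (not part of the module): comma-separated dataset
+# names are split into lists in `__post_init__` (example dataset names):
+#
+#   args = DataArguments(dataset="alpaca_en_demo,identity", cutoff_len=1024)
+#   assert args.dataset == ["alpaca_en_demo", "identity"]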
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/evaluation_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/evaluation_args.py
new file mode 100644
index 0000000..d92e8b1
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/evaluation_args.py
@@ -0,0 +1,60 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from dataclasses import dataclass, field
+from typing import Literal, Optional
+
+from datasets import DownloadMode
+
+
+@dataclass
+class EvaluationArguments:
+ r"""Arguments pertaining to specify the evaluation parameters."""
+
+ task: str = field(
+ metadata={"help": "Name of the evaluation task."},
+ )
+ task_dir: str = field(
+ default="evaluation",
+ metadata={"help": "Path to the folder containing the evaluation datasets."},
+ )
+ batch_size: int = field(
+ default=4,
+ metadata={"help": "The batch size per GPU for evaluation."},
+ )
+ seed: int = field(
+ default=42,
+ metadata={"help": "Random seed to be used with data loaders."},
+ )
+ lang: Literal["en", "zh"] = field(
+ default="en",
+ metadata={"help": "Language used at evaluation."},
+ )
+ n_shot: int = field(
+ default=5,
+ metadata={"help": "Number of examplars for few-shot learning."},
+ )
+ save_dir: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to save the evaluation results."},
+ )
+ download_mode: DownloadMode = field(
+ default=DownloadMode.REUSE_DATASET_IF_EXISTS,
+ metadata={"help": "Download mode used for the evaluation datasets."},
+ )
+
+ def __post_init__(self):
+ if self.save_dir is not None and os.path.exists(self.save_dir):
+ raise ValueError("`save_dir` already exists, use another one.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/finetuning_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/finetuning_args.py
new file mode 100644
index 0000000..483e9f4
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/finetuning_args.py
@@ -0,0 +1,499 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import asdict, dataclass, field
+from typing import Any, Literal, Optional
+
+
+@dataclass
+class FreezeArguments:
+ r"""Arguments pertaining to the freeze (partial-parameter) training."""
+
+ freeze_trainable_layers: int = field(
+ default=2,
+ metadata={
+ "help": (
+ "The number of trainable layers for freeze (partial-parameter) fine-tuning. "
+ "Positive numbers mean the last n layers are set as trainable, "
+ "negative numbers mean the first n layers are set as trainable."
+ )
+ },
+ )
+ freeze_trainable_modules: str = field(
+ default="all",
+ metadata={
+ "help": (
+ "Name(s) of trainable modules for freeze (partial-parameter) fine-tuning. "
+ "Use commas to separate multiple modules. "
+ "Use `all` to specify all the available modules."
+ )
+ },
+ )
+ freeze_extra_modules: Optional[str] = field(
+ default=None,
+ metadata={
+ "help": (
+ "Name(s) of modules apart from hidden layers to be set as trainable "
+ "for freeze (partial-parameter) fine-tuning. "
+ "Use commas to separate multiple modules."
+ )
+ },
+ )
+
+
+@dataclass
+class LoraArguments:
+ r"""Arguments pertaining to the LoRA training."""
+
+ additional_target: Optional[str] = field(
+ default=None,
+ metadata={
+ "help": (
+ "Name(s) of modules apart from LoRA layers to be set as trainable "
+ "and saved in the final checkpoint. "
+ "Use commas to separate multiple modules."
+ )
+ },
+ )
+ lora_alpha: Optional[int] = field(
+ default=None,
+ metadata={"help": "The scale factor for LoRA fine-tuning (default: lora_rank * 2)."},
+ )
+ lora_dropout: float = field(
+ default=0.0,
+ metadata={"help": "Dropout rate for the LoRA fine-tuning."},
+ )
+ lora_rank: int = field(
+ default=8,
+ metadata={"help": "The intrinsic dimension for LoRA fine-tuning."},
+ )
+ lora_target: str = field(
+ default="all",
+ metadata={
+ "help": (
+ "Name(s) of target modules to apply LoRA. "
+ "Use commas to separate multiple modules. "
+ "Use `all` to specify all the linear modules."
+ )
+ },
+ )
+ loraplus_lr_ratio: Optional[float] = field(
+ default=None,
+ metadata={"help": "LoRA plus learning rate ratio (lr_B / lr_A)."},
+ )
+ loraplus_lr_embedding: float = field(
+ default=1e-6,
+ metadata={"help": "LoRA plus learning rate for lora embedding layers."},
+ )
+ use_rslora: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the rank stabilization scaling factor for LoRA layer."},
+ )
+ use_dora: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the weight-decomposed lora method (DoRA)."},
+ )
+ pissa_init: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to initialize a PiSSA adapter."},
+ )
+ pissa_iter: int = field(
+ default=16,
+ metadata={"help": "The number of iteration steps performed by FSVD in PiSSA. Use -1 to disable it."},
+ )
+ pissa_convert: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to convert the PiSSA adapter to a normal LoRA adapter."},
+ )
+ create_new_adapter: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to create a new adapter with randomly initialized weight."},
+ )
+
+
+@dataclass
+class RLHFArguments:
+ r"""Arguments pertaining to the PPO, DPO and KTO training."""
+
+ pref_beta: float = field(
+ default=0.1,
+ metadata={"help": "The beta parameter in the preference loss."},
+ )
+ pref_ftx: float = field(
+ default=0.0,
+ metadata={"help": "The supervised fine-tuning loss coefficient in DPO training."},
+ )
+ pref_loss: Literal["sigmoid", "hinge", "ipo", "kto_pair", "orpo", "simpo"] = field(
+ default="sigmoid",
+ metadata={"help": "The type of DPO loss to use."},
+ )
+ dpo_label_smoothing: float = field(
+ default=0.0,
+ metadata={"help": "The robust DPO label smoothing parameter in cDPO that should be between 0 and 0.5."},
+ )
+ kto_chosen_weight: float = field(
+ default=1.0,
+ metadata={"help": "The weight factor of the desirable losses in KTO training."},
+ )
+ kto_rejected_weight: float = field(
+ default=1.0,
+ metadata={"help": "The weight factor of the undesirable losses in KTO training."},
+ )
+ simpo_gamma: float = field(
+ default=0.5,
+ metadata={"help": "The target reward margin term in SimPO loss."},
+ )
+ ppo_buffer_size: int = field(
+ default=1,
+ metadata={"help": "The number of mini-batches to make experience buffer in a PPO optimization step."},
+ )
+ ppo_epochs: int = field(
+ default=4,
+ metadata={"help": "The number of epochs to perform in a PPO optimization step."},
+ )
+ ppo_score_norm: bool = field(
+ default=False,
+ metadata={"help": "Use score normalization in PPO training."},
+ )
+ ppo_target: float = field(
+ default=6.0,
+ metadata={"help": "Target KL value for adaptive KL control in PPO training."},
+ )
+ ppo_whiten_rewards: bool = field(
+ default=False,
+ metadata={"help": "Whiten the rewards before compute advantages in PPO training."},
+ )
+ ref_model: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the reference model used for the PPO or DPO training."},
+ )
+ ref_model_adapters: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the adapters of the reference model."},
+ )
+ ref_model_quantization_bit: Optional[int] = field(
+ default=None,
+ metadata={"help": "The number of bits to quantize the reference model."},
+ )
+ reward_model: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the reward model used for the PPO training."},
+ )
+ reward_model_adapters: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the adapters of the reward model."},
+ )
+ reward_model_quantization_bit: Optional[int] = field(
+ default=None,
+ metadata={"help": "The number of bits to quantize the reward model."},
+ )
+ reward_model_type: Literal["lora", "full", "api"] = field(
+ default="lora",
+ metadata={"help": "The type of the reward model in PPO training. Lora model only supports lora training."},
+ )
+
+
+@dataclass
+class GaloreArguments:
+ r"""Arguments pertaining to the GaLore algorithm."""
+
+ use_galore: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the gradient low-Rank projection (GaLore)."},
+ )
+ galore_target: str = field(
+ default="all",
+ metadata={
+ "help": (
+ "Name(s) of modules to apply GaLore. Use commas to separate multiple modules. "
+ "Use `all` to specify all the linear modules."
+ )
+ },
+ )
+ galore_rank: int = field(
+ default=16,
+ metadata={"help": "The rank of GaLore gradients."},
+ )
+ galore_update_interval: int = field(
+ default=200,
+ metadata={"help": "Number of steps to update the GaLore projection."},
+ )
+ galore_scale: float = field(
+ default=2.0,
+ metadata={"help": "GaLore scaling coefficient."},
+ )
+ galore_proj_type: Literal["std", "reverse_std", "right", "left", "full"] = field(
+ default="std",
+ metadata={"help": "Type of GaLore projection."},
+ )
+ galore_layerwise: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to enable layer-wise update to further save memory."},
+ )
+
+
+@dataclass
+class ApolloArguments:
+ r"""Arguments pertaining to the APOLLO algorithm."""
+
+ use_apollo: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the APOLLO optimizer."},
+ )
+ apollo_target: str = field(
+ default="all",
+ metadata={
+ "help": (
+ "Name(s) of modules to apply APOLLO. Use commas to separate multiple modules. "
+ "Use `all` to specify all the linear modules."
+ )
+ },
+ )
+ apollo_rank: int = field(
+ default=16,
+ metadata={"help": "The rank of APOLLO gradients."},
+ )
+ apollo_update_interval: int = field(
+ default=200,
+ metadata={"help": "Number of steps to update the APOLLO projection."},
+ )
+ apollo_scale: float = field(
+ default=32.0,
+ metadata={"help": "APOLLO scaling coefficient."},
+ )
+ apollo_proj: Literal["svd", "random"] = field(
+ default="random",
+ metadata={"help": "Type of APOLLO low-rank projection algorithm (svd or random)."},
+ )
+ apollo_proj_type: Literal["std", "right", "left"] = field(
+ default="std",
+ metadata={"help": "Type of APOLLO projection."},
+ )
+ apollo_scale_type: Literal["channel", "tensor"] = field(
+ default="channel",
+ metadata={"help": "Type of APOLLO scaling (channel or tensor)."},
+ )
+ apollo_layerwise: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to enable layer-wise update to further save memory."},
+ )
+ apollo_scale_front: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the norm-growth limiter in front of gradient scaling."},
+ )
+
+
+@dataclass
+class BAdamArgument:
+ r"""Arguments pertaining to the BAdam optimizer."""
+
+ use_badam: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the BAdam optimizer."},
+ )
+ badam_mode: Literal["layer", "ratio"] = field(
+ default="layer",
+ metadata={"help": "Whether to use layer-wise or ratio-wise BAdam optimizer."},
+ )
+ badam_start_block: Optional[int] = field(
+ default=None,
+ metadata={"help": "The starting block index for layer-wise BAdam."},
+ )
+ badam_switch_mode: Optional[Literal["ascending", "descending", "random", "fixed"]] = field(
+ default="ascending",
+ metadata={"help": "the strategy of picking block to update for layer-wise BAdam."},
+ )
+ badam_switch_interval: Optional[int] = field(
+ default=50,
+ metadata={
+ "help": "Number of steps to update the block for layer-wise BAdam. Use -1 to disable the block update."
+ },
+ )
+ badam_update_ratio: float = field(
+ default=0.05,
+ metadata={"help": "The ratio of the update for ratio-wise BAdam."},
+ )
+ badam_mask_mode: Literal["adjacent", "scatter"] = field(
+ default="adjacent",
+ metadata={
+ "help": (
+ "The mode of the mask for BAdam optimizer. "
+ "`adjacent` means that the trainable parameters are adjacent to each other, "
+ "`scatter` means that trainable parameters are randomly choosed from the weight."
+ )
+ },
+ )
+ badam_verbose: int = field(
+ default=0,
+ metadata={
+ "help": (
+ "The verbosity level of BAdam optimizer. "
+ "0 for no print, 1 for print the block prefix, 2 for print trainable parameters."
+ )
+ },
+ )
+
+
+@dataclass
+class SwanLabArguments:
+ use_swanlab: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the SwanLab (an experiment tracking and visualization tool)."},
+ )
+ swanlab_project: Optional[str] = field(
+ default="llamafactory",
+ metadata={"help": "The project name in SwanLab."},
+ )
+ swanlab_workspace: Optional[str] = field(
+ default=None,
+ metadata={"help": "The workspace name in SwanLab."},
+ )
+ swanlab_run_name: Optional[str] = field(
+ default=None,
+ metadata={"help": "The experiment name in SwanLab."},
+ )
+ swanlab_mode: Literal["cloud", "local"] = field(
+ default="cloud",
+ metadata={"help": "The mode of SwanLab."},
+ )
+ swanlab_api_key: Optional[str] = field(
+ default=None,
+ metadata={"help": "The API key for SwanLab."},
+ )
+ swanlab_logdir: Optional[str] = field(
+ default=None,
+ metadata={"help": "The log directory for SwanLab."},
+ )
+ swanlab_lark_webhook_url: Optional[str] = field(
+ default=None,
+ metadata={"help": "The Lark(飞书) webhook URL for SwanLab."},
+ )
+ swanlab_lark_secret: Optional[str] = field(
+ default=None,
+ metadata={"help": "The Lark(飞书) secret for SwanLab."},
+ )
+
+
+@dataclass
+class FinetuningArguments(
+ SwanLabArguments, BAdamArgument, ApolloArguments, GaloreArguments, RLHFArguments, LoraArguments, FreezeArguments
+):
+ r"""Arguments pertaining to which techniques we are going to fine-tuning with."""
+
+ pure_bf16: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to train model in purely bf16 precision (without AMP)."},
+ )
+ stage: Literal["pt", "sft", "rm", "ppo", "dpo", "kto"] = field(
+ default="sft",
+ metadata={"help": "Which stage will be performed in training."},
+ )
+ finetuning_type: Literal["lora", "freeze", "full"] = field(
+ default="lora",
+ metadata={"help": "Which fine-tuning method to use."},
+ )
+ use_llama_pro: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to make only the parameters in the expanded blocks trainable."},
+ )
+ use_adam_mini: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use the Adam-mini optimizer."},
+ )
+ freeze_vision_tower: bool = field(
+ default=True,
+ metadata={"help": "Whether ot not to freeze the vision tower in MLLM training."},
+ )
+ freeze_multi_modal_projector: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to freeze the multi modal projector in MLLM training."},
+ )
+ freeze_language_model: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to freeze the language model in MLLM training."},
+ )
+ compute_accuracy: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to compute the token-level accuracy at evaluation."},
+ )
+ disable_shuffling: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to disable the shuffling of the training set."},
+ )
+ plot_loss: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to save the training loss curves."},
+ )
+ include_effective_tokens_per_second: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to compute effective tokens per second."},
+ )
+
+ def __post_init__(self):
+ def split_arg(arg):
+ if isinstance(arg, str):
+ return [item.strip() for item in arg.split(",")]
+ return arg
+
+ self.freeze_trainable_modules: list[str] = split_arg(self.freeze_trainable_modules)
+ self.freeze_extra_modules: Optional[list[str]] = split_arg(self.freeze_extra_modules)
+ self.lora_alpha: int = self.lora_alpha or self.lora_rank * 2
+ self.lora_target: list[str] = split_arg(self.lora_target)
+ self.additional_target: Optional[list[str]] = split_arg(self.additional_target)
+ self.galore_target: list[str] = split_arg(self.galore_target)
+ self.apollo_target: list[str] = split_arg(self.apollo_target)
+ self.use_ref_model = self.stage == "dpo" and self.pref_loss not in ["orpo", "simpo"]
+
+ assert self.finetuning_type in ["lora", "freeze", "full"], "Invalid fine-tuning method."
+ assert self.ref_model_quantization_bit in [None, 8, 4], "We only accept 4-bit or 8-bit quantization."
+ assert self.reward_model_quantization_bit in [None, 8, 4], "We only accept 4-bit or 8-bit quantization."
+
+ if self.stage == "ppo" and self.reward_model is None:
+ raise ValueError("`reward_model` is necessary for PPO training.")
+
+ if self.stage == "ppo" and self.reward_model_type == "lora" and self.finetuning_type != "lora":
+ raise ValueError("`reward_model_type` cannot be lora for Freeze/Full PPO training.")
+
+ if self.stage == "dpo" and self.pref_loss != "sigmoid" and self.dpo_label_smoothing > 1e-6:
+ raise ValueError("`dpo_label_smoothing` is only valid for sigmoid loss function.")
+
+ if self.use_llama_pro and self.finetuning_type == "full":
+ raise ValueError("`use_llama_pro` is only valid for Freeze or LoRA training.")
+
+ if self.finetuning_type == "lora" and (self.use_galore or self.use_apollo or self.use_badam):
+ raise ValueError("Cannot use LoRA with GaLore, APOLLO or BAdam together.")
+
+ if int(self.use_galore) + int(self.use_apollo) + int(self.use_badam) > 1:
+ raise ValueError("Cannot use GaLore, APOLLO or BAdam together.")
+
+ if self.pissa_init and (self.stage in ["ppo", "kto"] or self.use_ref_model):
+ raise ValueError("Cannot use PiSSA for current training stage.")
+
+ if self.finetuning_type != "lora":
+ if self.loraplus_lr_ratio is not None:
+ raise ValueError("`loraplus_lr_ratio` is only valid for LoRA training.")
+
+ if self.use_rslora:
+ raise ValueError("`use_rslora` is only valid for LoRA training.")
+
+ if self.use_dora:
+ raise ValueError("`use_dora` is only valid for LoRA training.")
+
+ if self.pissa_init:
+ raise ValueError("`pissa_init` is only valid for LoRA training.")
+
+ def to_dict(self) -> dict[str, Any]:
+ args = asdict(self)
+ args = {k: f"<{k.upper()}>" if k.endswith("api_key") else v for k, v in args.items()}
+ return args
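+
+# Illustrative usage sketch (not part of the module): `lora_alpha` defaults to
+# twice the LoRA rank and comma-separated targets are split into lists:
+#
+#   args = FinetuningArguments(finetuning_type="lora", lora_rank=16)
+#   assert args.lora_alpha == 32 and args.lora_target == ["all"]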
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/generating_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/generating_args.py
new file mode 100644
index 0000000..251822b
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/generating_args.py
@@ -0,0 +1,87 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import asdict, dataclass, field
+from typing import Any, Optional
+
+from transformers import GenerationConfig
+
+
+@dataclass
+class GeneratingArguments:
+ r"""Arguments pertaining to specify the decoding parameters."""
+
+ do_sample: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use sampling, use greedy decoding otherwise."},
+ )
+ temperature: float = field(
+ default=0.95,
+ metadata={"help": "The value used to modulate the next token probabilities."},
+ )
+ top_p: float = field(
+ default=0.7,
+ metadata={
+ "help": (
+ "The smallest set of most probable tokens with probabilities that add up to top_p or higher are kept."
+ )
+ },
+ )
+ top_k: int = field(
+ default=50,
+ metadata={"help": "The number of highest probability vocabulary tokens to keep for top-k filtering."},
+ )
+ num_beams: int = field(
+ default=1,
+ metadata={"help": "Number of beams for beam search. 1 means no beam search."},
+ )
+ max_length: int = field(
+ default=1024,
+ metadata={"help": "The maximum length the generated tokens can have. It can be overridden by max_new_tokens."},
+ )
+ max_new_tokens: int = field(
+ default=1024,
+ metadata={"help": "The maximum numbers of tokens to generate, ignoring the number of tokens in the prompt."},
+ )
+ repetition_penalty: float = field(
+ default=1.0,
+ metadata={"help": "The parameter for repetition penalty. 1.0 means no penalty."},
+ )
+ length_penalty: float = field(
+ default=1.0,
+ metadata={"help": "Exponential penalty to the length that is used with beam-based generation."},
+ )
+ default_system: Optional[str] = field(
+ default=None,
+ metadata={"help": "Default system message to use in chat completion."},
+ )
+ skip_special_tokens: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to remove special tokens in the decoding."},
+ )
+
+ def to_dict(self, obey_generation_config: bool = False) -> dict[str, Any]:
+ args = asdict(self)
+ if args.get("max_new_tokens", -1) > 0:
+ args.pop("max_length", None)
+ else:
+ args.pop("max_new_tokens", None)
+
+ if obey_generation_config:
+ generation_config = GenerationConfig()
+ for key in list(args.keys()):
+ if not hasattr(generation_config, key):
+ args.pop(key)
+
+ return args
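+
+# Illustrative usage sketch (not part of the module): `to_dict` keeps only one
+# of `max_length` / `max_new_tokens`:
+#
+#   gen_kwargs = GeneratingArguments(max_new_tokens=512).to_dict()
+#   assert "max_length" not in gen_kwargs and gen_kwargs["max_new_tokens"] == 512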
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/model_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/model_args.py
new file mode 100644
index 0000000..f86291b
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/model_args.py
@@ -0,0 +1,416 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/language-modeling/run_clm.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from dataclasses import asdict, dataclass, field, fields
+from typing import Any, Literal, Optional, Union
+
+import torch
+from transformers.training_args import _convert_str_dict
+from typing_extensions import Self
+
+from ..extras.constants import AttentionFunction, EngineName, QuantizationMethod, RopeScaling
+
+
+@dataclass
+class BaseModelArguments:
+ r"""Arguments pertaining to the model."""
+
+ model_name_or_path: Optional[str] = field(
+ default=None,
+ metadata={
+ "help": "Path to the model weight or identifier from huggingface.co/models or modelscope.cn/models."
+ },
+ )
+ adapter_name_or_path: Optional[str] = field(
+ default=None,
+ metadata={
+ "help": (
+ "Path to the adapter weight or identifier from huggingface.co/models. "
+ "Use commas to separate multiple adapters."
+ )
+ },
+ )
+ adapter_folder: Optional[str] = field(
+ default=None,
+ metadata={"help": "The folder containing the adapter weights to load."},
+ )
+ cache_dir: Optional[str] = field(
+ default=None,
+ metadata={"help": "Where to store the pre-trained models downloaded from huggingface.co or modelscope.cn."},
+ )
+ use_fast_tokenizer: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use one of the fast tokenizer (backed by the tokenizers library)."},
+ )
+ resize_vocab: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to resize the tokenizer vocab and the embedding layers."},
+ )
+ split_special_tokens: bool = field(
+ default=False,
+ metadata={"help": "Whether or not the special tokens should be split during the tokenization process."},
+ )
+ new_special_tokens: Optional[str] = field(
+ default=None,
+ metadata={"help": "Special tokens to be added into the tokenizer. Use commas to separate multiple tokens."},
+ )
+ model_revision: str = field(
+ default="main",
+ metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
+ )
+ low_cpu_mem_usage: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use memory-efficient model loading."},
+ )
+ rope_scaling: Optional[RopeScaling] = field(
+ default=None,
+ metadata={"help": "Which scaling strategy should be adopted for the RoPE embeddings."},
+ )
+ flash_attn: AttentionFunction = field(
+ default=AttentionFunction.AUTO,
+ metadata={"help": "Enable FlashAttention for faster training and inference."},
+ )
+ shift_attn: bool = field(
+ default=False,
+ metadata={"help": "Enable shift short attention (S^2-Attn) proposed by LongLoRA."},
+ )
+ mixture_of_depths: Optional[Literal["convert", "load"]] = field(
+ default=None,
+ metadata={"help": "Convert the model to mixture-of-depths (MoD) or load the MoD model."},
+ )
+ use_unsloth: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use unsloth's optimization for the LoRA training."},
+ )
+ use_unsloth_gc: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use unsloth's gradient checkpointing (no need to install unsloth)."},
+ )
+ enable_liger_kernel: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to enable liger kernel for faster training."},
+ )
+ moe_aux_loss_coef: Optional[float] = field(
+ default=None,
+ metadata={"help": "Coefficient of the auxiliary router loss in mixture-of-experts model."},
+ )
+ disable_gradient_checkpointing: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to disable gradient checkpointing."},
+ )
+ use_reentrant_gc: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use reentrant gradient checkpointing."},
+ )
+ upcast_layernorm: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to upcast the layernorm weights in fp32."},
+ )
+ upcast_lmhead_output: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to upcast the output of lm_head in fp32."},
+ )
+ train_from_scratch: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to randomly initialize the model weights."},
+ )
+ infer_backend: EngineName = field(
+ default=EngineName.HF,
+ metadata={"help": "Backend engine used at inference."},
+ )
+ offload_folder: str = field(
+ default="offload",
+ metadata={"help": "Path to offload model weights."},
+ )
+ use_cache: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use KV cache in generation."},
+ )
+ infer_dtype: Literal["auto", "float16", "bfloat16", "float32"] = field(
+ default="auto",
+ metadata={"help": "Data type for model weights and activations at inference."},
+ )
+ hf_hub_token: Optional[str] = field(
+ default=None,
+ metadata={"help": "Auth token to log in with Hugging Face Hub."},
+ )
+ ms_hub_token: Optional[str] = field(
+ default=None,
+ metadata={"help": "Auth token to log in with ModelScope Hub."},
+ )
+ om_hub_token: Optional[str] = field(
+ default=None,
+ metadata={"help": "Auth token to log in with Modelers Hub."},
+ )
+ print_param_status: bool = field(
+ default=False,
+ metadata={"help": "For debugging purposes, print the status of the parameters in the model."},
+ )
+ trust_remote_code: bool = field(
+ default=False,
+ metadata={"help": "Whether to trust the execution of code from datasets/models defined on the Hub or not."},
+ )
+
+ def __post_init__(self):
+ if self.model_name_or_path is None:
+ raise ValueError("Please provide `model_name_or_path`.")
+
+ if self.split_special_tokens and self.use_fast_tokenizer:
+ raise ValueError("`split_special_tokens` is only supported for slow tokenizers.")
+
+ if self.adapter_name_or_path is not None: # support merging multiple lora weights
+ self.adapter_name_or_path = [path.strip() for path in self.adapter_name_or_path.split(",")]
+
+ if self.new_special_tokens is not None: # support multiple special tokens
+ self.new_special_tokens = [token.strip() for token in self.new_special_tokens.split(",")]
+
+
+@dataclass
+class QuantizationArguments:
+ r"""Arguments pertaining to the quantization method."""
+
+ quantization_method: QuantizationMethod = field(
+ default=QuantizationMethod.BNB,
+ metadata={"help": "Quantization method to use for on-the-fly quantization."},
+ )
+ quantization_bit: Optional[int] = field(
+ default=None,
+ metadata={"help": "The number of bits to quantize the model using on-the-fly quantization."},
+ )
+ quantization_type: Literal["fp4", "nf4"] = field(
+ default="nf4",
+ metadata={"help": "Quantization data type to use in bitsandbytes int4 training."},
+ )
+ double_quantization: bool = field(
+ default=True,
+ metadata={"help": "Whether or not to use double quantization in bitsandbytes int4 training."},
+ )
+ quantization_device_map: Optional[Literal["auto"]] = field(
+ default=None,
+ metadata={"help": "Device map used to infer the 4-bit quantized model, needs bitsandbytes>=0.43.0."},
+ )
+
+
+@dataclass
+class ProcessorArguments:
+ r"""Arguments pertaining to the image processor."""
+
+ image_max_pixels: int = field(
+ default=768 * 768,
+ metadata={"help": "The maximum number of pixels of image inputs."},
+ )
+ image_min_pixels: int = field(
+ default=32 * 32,
+ metadata={"help": "The minimum number of pixels of image inputs."},
+ )
+ image_do_pan_and_scan: bool = field(
+ default=False,
+ metadata={"help": "Use pan and scan to process image for gemma3."},
+ )
+ use_audio_in_video: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to use audio in video inputs."},
+ )
+ video_max_pixels: int = field(
+ default=256 * 256,
+ metadata={"help": "The maximum number of pixels of video inputs."},
+ )
+ video_min_pixels: int = field(
+ default=16 * 16,
+ metadata={"help": "The minimum number of pixels of video inputs."},
+ )
+ video_fps: float = field(
+ default=2.0,
+ metadata={"help": "The frames to sample per second for video inputs."},
+ )
+ video_maxlen: int = field(
+ default=128,
+ metadata={"help": "The maximum number of sampled frames for video inputs."},
+ )
+ audio_sampling_rate: int = field(
+ default=16000,
+ metadata={"help": "The sampling rate of audio inputs."},
+ )
+
+ def __post_init__(self):
+ if self.image_max_pixels < self.image_min_pixels:
+ raise ValueError("`image_max_pixels` cannot be smaller than `image_min_pixels`.")
+
+ if self.video_max_pixels < self.video_min_pixels:
+ raise ValueError("`video_max_pixels` cannot be smaller than `video_min_pixels`.")
+
+
+@dataclass
+class ExportArguments:
+ r"""Arguments pertaining to the model export."""
+
+ export_dir: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the directory to save the exported model."},
+ )
+ export_size: int = field(
+ default=5,
+ metadata={"help": "The file shard size (in GB) of the exported model."},
+ )
+ export_device: Literal["cpu", "auto"] = field(
+ default="cpu",
+ metadata={"help": "The device used in model export, use `auto` to accelerate exporting."},
+ )
+ export_quantization_bit: Optional[int] = field(
+ default=None,
+ metadata={"help": "The number of bits to quantize the exported model."},
+ )
+ export_quantization_dataset: Optional[str] = field(
+ default=None,
+ metadata={"help": "Path to the dataset or dataset name to use in quantizing the exported model."},
+ )
+ export_quantization_nsamples: int = field(
+ default=128,
+ metadata={"help": "The number of samples used for quantization."},
+ )
+ export_quantization_maxlen: int = field(
+ default=1024,
+ metadata={"help": "The maximum length of the model inputs used for quantization."},
+ )
+ export_legacy_format: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to save the `.bin` files instead of `.safetensors`."},
+ )
+ export_hub_model_id: Optional[str] = field(
+ default=None,
+ metadata={"help": "The name of the repository if push the model to the Hugging Face hub."},
+ )
+
+ def __post_init__(self):
+ if self.export_quantization_bit is not None and self.export_quantization_dataset is None:
+ raise ValueError("Quantization dataset is necessary for exporting.")
+
+
+@dataclass
+class VllmArguments:
+ r"""Arguments pertaining to the vLLM worker."""
+
+ vllm_maxlen: int = field(
+ default=4096,
+ metadata={"help": "Maximum sequence (prompt + response) length of the vLLM engine."},
+ )
+ vllm_gpu_util: float = field(
+ default=0.7,
+ metadata={"help": "The fraction of GPU memory in (0,1) to be used for the vLLM engine."},
+ )
+ vllm_enforce_eager: bool = field(
+ default=False,
+ metadata={"help": "Whether or not to disable CUDA graph in the vLLM engine."},
+ )
+ vllm_max_lora_rank: int = field(
+ default=32,
+ metadata={"help": "Maximum rank of all LoRAs in the vLLM engine."},
+ )
+ vllm_config: Optional[Union[dict, str]] = field(
+ default=None,
+ metadata={"help": "Config to initialize the vllm engine. Please use JSON strings."},
+ )
+
+ def __post_init__(self):
+ if isinstance(self.vllm_config, str) and self.vllm_config.startswith("{"):
+ self.vllm_config = _convert_str_dict(json.loads(self.vllm_config))
+
+
+@dataclass
+class SGLangArguments:
+ r"""Arguments pertaining to the SGLang worker."""
+
+ sglang_maxlen: int = field(
+ default=4096,
+ metadata={"help": "Maximum sequence (prompt + response) length of the SGLang engine."},
+ )
+ sglang_mem_fraction: float = field(
+ default=0.7,
+ metadata={"help": "The memory fraction (0-1) to be used for the SGLang engine."},
+ )
+ sglang_tp_size: int = field(
+ default=-1,
+ metadata={"help": "Tensor parallel size for the SGLang engine."},
+ )
+ sglang_config: Optional[Union[dict, str]] = field(
+ default=None,
+ metadata={"help": "Config to initialize the SGLang engine. Please use JSON strings."},
+ )
+
+ def __post_init__(self):
+ if isinstance(self.sglang_config, str) and self.sglang_config.startswith("{"):
+ self.sglang_config = _convert_str_dict(json.loads(self.sglang_config))
+
+
+@dataclass
+class ModelArguments(
+ SGLangArguments, VllmArguments, ExportArguments, ProcessorArguments, QuantizationArguments, BaseModelArguments
+):
+ r"""Arguments pertaining to which model/config/tokenizer we are going to fine-tune or infer.
+
+    The rightmost class will be displayed first.
+ """
+
+ compute_dtype: Optional[torch.dtype] = field(
+ default=None,
+ init=False,
+ metadata={"help": "Torch data type for computing model outputs, derived from `fp/bf16`. Do not specify it."},
+ )
+ device_map: Optional[Union[str, dict[str, Any]]] = field(
+ default=None,
+ init=False,
+ metadata={"help": "Device map for model placement, derived from training stage. Do not specify it."},
+ )
+ model_max_length: Optional[int] = field(
+ default=None,
+ init=False,
+ metadata={"help": "The maximum input length for model, derived from `cutoff_len`. Do not specify it."},
+ )
+ block_diag_attn: bool = field(
+ default=False,
+ init=False,
+ metadata={"help": "Whether use block diag attention or not, derived from `neat_packing`. Do not specify it."},
+ )
+
+ def __post_init__(self):
+ BaseModelArguments.__post_init__(self)
+ ProcessorArguments.__post_init__(self)
+ ExportArguments.__post_init__(self)
+ VllmArguments.__post_init__(self)
+ SGLangArguments.__post_init__(self)
+
+ @classmethod
+ def copyfrom(cls, source: "Self", **kwargs) -> "Self":
+ init_args, lazy_args = {}, {}
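+        # fields declared with init=False are derived at runtime, so they are copied after construction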
+ for attr in fields(source):
+ if attr.init:
+ init_args[attr.name] = getattr(source, attr.name)
+ else:
+ lazy_args[attr.name] = getattr(source, attr.name)
+
+ init_args.update(kwargs)
+ result = cls(**init_args)
+ for name, value in lazy_args.items():
+ setattr(result, name, value)
+
+ return result
+
+ def to_dict(self) -> dict[str, Any]:
+ args = asdict(self)
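+        # mask token-valued fields (e.g. hub access tokens) so they are not leaked in logs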
+ args = {k: f"<{k.upper()}>" if k.endswith("token") else v for k, v in args.items()}
+ return args
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/parser.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/parser.py
new file mode 100644
index 0000000..bc200f6
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/parser.py
@@ -0,0 +1,452 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/language-modeling/run_clm.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Optional, Union
+
+import torch
+import transformers
+import yaml
+from transformers import HfArgumentParser
+from transformers.integrations import is_deepspeed_zero3_enabled
+from transformers.trainer_utils import get_last_checkpoint
+from transformers.training_args import ParallelMode
+from transformers.utils import is_torch_bf16_gpu_available, is_torch_npu_available
+
+from ..extras import logging
+from ..extras.constants import CHECKPOINT_NAMES, EngineName
+from ..extras.misc import check_dependencies, check_version, get_current_device, is_env_enabled
+from .data_args import DataArguments
+from .evaluation_args import EvaluationArguments
+from .finetuning_args import FinetuningArguments
+from .generating_args import GeneratingArguments
+from .model_args import ModelArguments
+from .training_args import RayArguments, TrainingArguments
+
+
+logger = logging.get_logger(__name__)
+
+check_dependencies()
+
+
+_TRAIN_ARGS = [ModelArguments, DataArguments, TrainingArguments, FinetuningArguments, GeneratingArguments]
+_TRAIN_CLS = tuple[ModelArguments, DataArguments, TrainingArguments, FinetuningArguments, GeneratingArguments]
+_INFER_ARGS = [ModelArguments, DataArguments, FinetuningArguments, GeneratingArguments]
+_INFER_CLS = tuple[ModelArguments, DataArguments, FinetuningArguments, GeneratingArguments]
+_EVAL_ARGS = [ModelArguments, DataArguments, EvaluationArguments, FinetuningArguments]
+_EVAL_CLS = tuple[ModelArguments, DataArguments, EvaluationArguments, FinetuningArguments]
+
+
+def read_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> Union[dict[str, Any], list[str]]:
+ r"""Get arguments from the command line or a config file."""
+ if args is not None:
+ return args
+
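+    # a single .yaml/.yml/.json path on the command line is treated as a config file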
+ if len(sys.argv) == 2 and (sys.argv[1].endswith(".yaml") or sys.argv[1].endswith(".yml")):
+ return yaml.safe_load(Path(sys.argv[1]).absolute().read_text())
+ elif len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
+ return json.loads(Path(sys.argv[1]).absolute().read_text())
+ else:
+ return sys.argv[1:]
+
+
+def _parse_args(
+ parser: "HfArgumentParser", args: Optional[Union[dict[str, Any], list[str]]] = None, allow_extra_keys: bool = False
+) -> tuple[Any]:
+ args = read_args(args)
+ if isinstance(args, dict):
+ return parser.parse_dict(args, allow_extra_keys=allow_extra_keys)
+
+ (*parsed_args, unknown_args) = parser.parse_args_into_dataclasses(args=args, return_remaining_strings=True)
+
+ if unknown_args and not allow_extra_keys:
+ print(parser.format_help())
+ print(f"Got unknown args, potentially deprecated arguments: {unknown_args}")
+ raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {unknown_args}")
+
+ return tuple(parsed_args)
+
+
+def _set_transformers_logging() -> None:
+ if os.getenv("LLAMAFACTORY_VERBOSITY", "INFO") in ["DEBUG", "INFO"]:
+ transformers.utils.logging.set_verbosity_info()
+ transformers.utils.logging.enable_default_handler()
+ transformers.utils.logging.enable_explicit_format()
+
+
+def _set_env_vars() -> None:
+ if is_torch_npu_available():
+ # avoid JIT compile on NPU devices, see https://zhuanlan.zhihu.com/p/660875458
+ torch.npu.set_compile_mode(jit_compile=is_env_enabled("NPU_JIT_COMPILE"))
+ # avoid use fork method on NPU devices, see https://github.com/hiyouga/LLaMA-Factory/issues/7447
+ os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
+
+
+def _verify_model_args(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+) -> None:
+ if model_args.adapter_name_or_path is not None and finetuning_args.finetuning_type != "lora":
+ raise ValueError("Adapter is only valid for the LoRA method.")
+
+ if model_args.quantization_bit is not None:
+ if finetuning_args.finetuning_type != "lora":
+ raise ValueError("Quantization is only compatible with the LoRA method.")
+
+ if finetuning_args.pissa_init:
+ raise ValueError("Please use scripts/pissa_init.py to initialize PiSSA for a quantized model.")
+
+ if model_args.resize_vocab:
+ raise ValueError("Cannot resize embedding layers of a quantized model.")
+
+ if model_args.adapter_name_or_path is not None and finetuning_args.create_new_adapter:
+ raise ValueError("Cannot create new adapter upon a quantized model.")
+
+ if model_args.adapter_name_or_path is not None and len(model_args.adapter_name_or_path) != 1:
+ raise ValueError("Quantized model only accepts a single adapter. Merge them first.")
+
+ if data_args.template == "yi" and model_args.use_fast_tokenizer:
+ logger.warning_rank0("We should use slow tokenizer for the Yi models. Change `use_fast_tokenizer` to False.")
+ model_args.use_fast_tokenizer = False
+
+
+def _check_extra_dependencies(
+ model_args: "ModelArguments",
+ finetuning_args: "FinetuningArguments",
+ training_args: Optional["TrainingArguments"] = None,
+) -> None:
+ if model_args.use_unsloth:
+ check_version("unsloth", mandatory=True)
+
+ if model_args.enable_liger_kernel:
+ check_version("liger-kernel", mandatory=True)
+
+ if model_args.mixture_of_depths is not None:
+ check_version("mixture-of-depth>=1.1.6", mandatory=True)
+
+ if model_args.infer_backend == EngineName.VLLM:
+ check_version("vllm>=0.4.3,<=0.8.4")
+ check_version("vllm", mandatory=True)
+ elif model_args.infer_backend == EngineName.SGLANG:
+ check_version("sglang>=0.4.4")
+ check_version("sglang", mandatory=True)
+
+ if finetuning_args.use_galore:
+ check_version("galore_torch", mandatory=True)
+
+ if finetuning_args.use_apollo:
+ check_version("apollo_torch", mandatory=True)
+
+ if finetuning_args.use_badam:
+ check_version("badam>=1.2.1", mandatory=True)
+
+ if finetuning_args.use_adam_mini:
+ check_version("adam-mini", mandatory=True)
+
+ if finetuning_args.plot_loss:
+ check_version("matplotlib", mandatory=True)
+
+ if training_args is not None and training_args.predict_with_generate:
+ check_version("jieba", mandatory=True)
+ check_version("nltk", mandatory=True)
+ check_version("rouge_chinese", mandatory=True)
+
+
+def _parse_train_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _TRAIN_CLS:
+ parser = HfArgumentParser(_TRAIN_ARGS)
+ allow_extra_keys = is_env_enabled("ALLOW_EXTRA_ARGS")
+ return _parse_args(parser, args, allow_extra_keys=allow_extra_keys)
+
+
+def _parse_infer_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _INFER_CLS:
+ parser = HfArgumentParser(_INFER_ARGS)
+ allow_extra_keys = is_env_enabled("ALLOW_EXTRA_ARGS")
+ return _parse_args(parser, args, allow_extra_keys=allow_extra_keys)
+
+
+def _parse_eval_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _EVAL_CLS:
+ parser = HfArgumentParser(_EVAL_ARGS)
+ allow_extra_keys = is_env_enabled("ALLOW_EXTRA_ARGS")
+ return _parse_args(parser, args, allow_extra_keys=allow_extra_keys)
+
+
+def get_ray_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> RayArguments:
+ parser = HfArgumentParser(RayArguments)
+ (ray_args,) = _parse_args(parser, args, allow_extra_keys=True)
+ return ray_args
+
+
+def get_train_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _TRAIN_CLS:
+ model_args, data_args, training_args, finetuning_args, generating_args = _parse_train_args(args)
+
+ # Setup logging
+ if training_args.should_log:
+ _set_transformers_logging()
+
+ # Check arguments
+ if finetuning_args.stage != "sft":
+ if training_args.predict_with_generate:
+ raise ValueError("`predict_with_generate` cannot be set as True except SFT.")
+
+ if data_args.neat_packing:
+ raise ValueError("`neat_packing` cannot be set as True except SFT.")
+
+ if data_args.train_on_prompt or data_args.mask_history:
+ raise ValueError("`train_on_prompt` or `mask_history` cannot be set as True except SFT.")
+
+ if finetuning_args.stage == "sft" and training_args.do_predict and not training_args.predict_with_generate:
+ raise ValueError("Please enable `predict_with_generate` to save model predictions.")
+
+ if finetuning_args.stage in ["rm", "ppo"] and training_args.load_best_model_at_end:
+ raise ValueError("RM and PPO stages do not support `load_best_model_at_end`.")
+
+ if finetuning_args.stage == "ppo":
+ if not training_args.do_train:
+ raise ValueError("PPO training does not support evaluation, use the SFT stage to evaluate models.")
+
+ if model_args.shift_attn:
+ raise ValueError("PPO training is incompatible with S^2-Attn.")
+
+ if finetuning_args.reward_model_type == "lora" and model_args.use_unsloth:
+ raise ValueError("Unsloth does not support lora reward model.")
+
+ if training_args.report_to and training_args.report_to[0] not in ["wandb", "tensorboard"]:
+ raise ValueError("PPO only accepts wandb or tensorboard logger.")
+
+ if training_args.parallel_mode == ParallelMode.NOT_DISTRIBUTED:
+ raise ValueError("Please launch distributed training with `llamafactory-cli` or `torchrun`.")
+
+ if training_args.deepspeed and training_args.parallel_mode != ParallelMode.DISTRIBUTED:
+ raise ValueError("Please use `FORCE_TORCHRUN=1` to launch DeepSpeed training.")
+
+ if training_args.max_steps == -1 and data_args.streaming:
+ raise ValueError("Please specify `max_steps` in streaming mode.")
+
+ if training_args.do_train and data_args.dataset is None:
+ raise ValueError("Please specify dataset for training.")
+
+ if (training_args.do_eval or training_args.do_predict) and (
+ data_args.eval_dataset is None and data_args.val_size < 1e-6
+ ):
+ raise ValueError("Please specify dataset for evaluation.")
+
+ if training_args.predict_with_generate:
+ if is_deepspeed_zero3_enabled():
+ raise ValueError("`predict_with_generate` is incompatible with DeepSpeed ZeRO-3.")
+
+ if data_args.eval_dataset is None:
+ raise ValueError("Cannot use `predict_with_generate` if `eval_dataset` is None.")
+
+ if finetuning_args.compute_accuracy:
+ raise ValueError("Cannot use `predict_with_generate` and `compute_accuracy` together.")
+
+ if training_args.do_train and model_args.quantization_device_map == "auto":
+ raise ValueError("Cannot use device map for quantized models in training.")
+
+ if finetuning_args.pissa_init and is_deepspeed_zero3_enabled():
+ raise ValueError("Please use scripts/pissa_init.py to initialize PiSSA in DeepSpeed ZeRO-3.")
+
+ if finetuning_args.pure_bf16:
+ if not (is_torch_bf16_gpu_available() or (is_torch_npu_available() and torch.npu.is_bf16_supported())):
+ raise ValueError("This device does not support `pure_bf16`.")
+
+ if is_deepspeed_zero3_enabled():
+ raise ValueError("`pure_bf16` is incompatible with DeepSpeed ZeRO-3.")
+
+ if training_args.parallel_mode == ParallelMode.DISTRIBUTED:
+ if finetuning_args.use_galore and finetuning_args.galore_layerwise:
+ raise ValueError("Distributed training does not support layer-wise GaLore.")
+
+ if finetuning_args.use_apollo and finetuning_args.apollo_layerwise:
+ raise ValueError("Distributed training does not support layer-wise APOLLO.")
+
+ if finetuning_args.use_badam:
+ if finetuning_args.badam_mode == "ratio":
+ raise ValueError("Radio-based BAdam does not yet support distributed training, use layer-wise BAdam.")
+ elif not is_deepspeed_zero3_enabled():
+ raise ValueError("Layer-wise BAdam only supports DeepSpeed ZeRO-3 training.")
+
+ if training_args.deepspeed is not None and (finetuning_args.use_galore or finetuning_args.use_apollo):
+ raise ValueError("GaLore and APOLLO are incompatible with DeepSpeed yet.")
+
+ if model_args.infer_backend != EngineName.HF:
+ raise ValueError("vLLM/SGLang backend is only available for API, CLI and Web.")
+
+ if model_args.use_unsloth and is_deepspeed_zero3_enabled():
+ raise ValueError("Unsloth is incompatible with DeepSpeed ZeRO-3.")
+
+ _set_env_vars()
+ _verify_model_args(model_args, data_args, finetuning_args)
+ _check_extra_dependencies(model_args, finetuning_args, training_args)
+
+ if (
+ training_args.do_train
+ and finetuning_args.finetuning_type == "lora"
+ and model_args.quantization_bit is None
+ and model_args.resize_vocab
+ and finetuning_args.additional_target is None
+ ):
+ logger.warning_rank0(
+ "Remember to add embedding layers to `additional_target` to make the added tokens trainable."
+ )
+
+ if training_args.do_train and model_args.quantization_bit is not None and (not model_args.upcast_layernorm):
+ logger.warning_rank0("We recommend enable `upcast_layernorm` in quantized training.")
+
+ if training_args.do_train and (not training_args.fp16) and (not training_args.bf16):
+ logger.warning_rank0("We recommend enable mixed precision training.")
+
+ if (
+ training_args.do_train
+ and (finetuning_args.use_galore or finetuning_args.use_apollo)
+ and not finetuning_args.pure_bf16
+ ):
+ logger.warning_rank0(
+ "Using GaLore or APOLLO with mixed precision training may significantly increases GPU memory usage."
+ )
+
+ if (not training_args.do_train) and model_args.quantization_bit is not None:
+ logger.warning_rank0("Evaluating model in 4/8-bit mode may cause lower scores.")
+
+ if (not training_args.do_train) and finetuning_args.stage == "dpo" and finetuning_args.ref_model is None:
+ logger.warning_rank0("Specify `ref_model` for computing rewards at evaluation.")
+
+ # Post-process training arguments
+ if (
+ training_args.parallel_mode == ParallelMode.DISTRIBUTED
+ and training_args.ddp_find_unused_parameters is None
+ and finetuning_args.finetuning_type == "lora"
+ ):
+ logger.warning_rank0("`ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.")
+ training_args.ddp_find_unused_parameters = False
+
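+    # resuming from checkpoint is not supported for full/freeze training at the RM and PPO stages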
+ if finetuning_args.stage in ["rm", "ppo"] and finetuning_args.finetuning_type in ["full", "freeze"]:
+ can_resume_from_checkpoint = False
+ if training_args.resume_from_checkpoint is not None:
+ logger.warning_rank0("Cannot resume from checkpoint in current stage.")
+ training_args.resume_from_checkpoint = None
+ else:
+ can_resume_from_checkpoint = True
+
+ if (
+ training_args.resume_from_checkpoint is None
+ and training_args.do_train
+ and os.path.isdir(training_args.output_dir)
+ and not training_args.overwrite_output_dir
+ and can_resume_from_checkpoint
+ ):
+ last_checkpoint = get_last_checkpoint(training_args.output_dir)
+ if last_checkpoint is None and any(
+ os.path.isfile(os.path.join(training_args.output_dir, name)) for name in CHECKPOINT_NAMES
+ ):
+ raise ValueError("Output directory already exists and is not empty. Please set `overwrite_output_dir`.")
+
+ if last_checkpoint is not None:
+ training_args.resume_from_checkpoint = last_checkpoint
+ logger.info_rank0(f"Resuming training from {training_args.resume_from_checkpoint}.")
+ logger.info_rank0("Change `output_dir` or use `overwrite_output_dir` to avoid.")
+
+ if (
+ finetuning_args.stage in ["rm", "ppo"]
+ and finetuning_args.finetuning_type == "lora"
+ and training_args.resume_from_checkpoint is not None
+ ):
+ logger.warning_rank0(
+ f"Add {training_args.resume_from_checkpoint} to `adapter_name_or_path` to resume training from checkpoint."
+ )
+
+ # Post-process model arguments
+ if training_args.bf16 or finetuning_args.pure_bf16:
+ model_args.compute_dtype = torch.bfloat16
+ elif training_args.fp16:
+ model_args.compute_dtype = torch.float16
+
+ model_args.device_map = {"": get_current_device()}
+ model_args.model_max_length = data_args.cutoff_len
+ model_args.block_diag_attn = data_args.neat_packing
+ data_args.packing = data_args.packing if data_args.packing is not None else finetuning_args.stage == "pt"
+
+ # Log on each process the small summary
+ logger.info(
+ f"Process rank: {training_args.process_index}, "
+ f"world size: {training_args.world_size}, device: {training_args.device}, "
+ f"distributed training: {training_args.parallel_mode == ParallelMode.DISTRIBUTED}, "
+ f"compute dtype: {str(model_args.compute_dtype)}"
+ )
+ transformers.set_seed(training_args.seed)
+
+ return model_args, data_args, training_args, finetuning_args, generating_args
+
+
+def get_infer_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _INFER_CLS:
+ model_args, data_args, finetuning_args, generating_args = _parse_infer_args(args)
+
+ # Setup logging
+ _set_transformers_logging()
+
+ # Check arguments
+ if model_args.infer_backend == "vllm":
+ if finetuning_args.stage != "sft":
+ raise ValueError("vLLM engine only supports auto-regressive models.")
+
+ if model_args.quantization_bit is not None:
+ raise ValueError("vLLM engine does not support bnb quantization (GPTQ and AWQ are supported).")
+
+ if model_args.rope_scaling is not None:
+ raise ValueError("vLLM engine does not support RoPE scaling.")
+
+ if model_args.adapter_name_or_path is not None and len(model_args.adapter_name_or_path) != 1:
+ raise ValueError("vLLM only accepts a single adapter. Merge them first.")
+
+ _set_env_vars()
+ _verify_model_args(model_args, data_args, finetuning_args)
+ _check_extra_dependencies(model_args, finetuning_args)
+
+ # Post-process model arguments
+ if model_args.export_dir is not None and model_args.export_device == "cpu":
+ model_args.device_map = {"": torch.device("cpu")}
+ if data_args.cutoff_len != DataArguments().cutoff_len: # override cutoff_len if it is not default
+ model_args.model_max_length = data_args.cutoff_len
+ else:
+ model_args.device_map = "auto"
+
+ return model_args, data_args, finetuning_args, generating_args
+
+
+def get_eval_args(args: Optional[Union[dict[str, Any], list[str]]] = None) -> _EVAL_CLS:
+ model_args, data_args, eval_args, finetuning_args = _parse_eval_args(args)
+
+ # Setup logging
+ _set_transformers_logging()
+
+ # Check arguments
+ if model_args.infer_backend != EngineName.HF:
+ raise ValueError("vLLM/SGLang backend is only available for API, CLI and Web.")
+
+ _set_env_vars()
+ _verify_model_args(model_args, data_args, finetuning_args)
+ _check_extra_dependencies(model_args, finetuning_args)
+
+ model_args.device_map = "auto"
+
+ transformers.set_seed(eval_args.seed)
+
+ return model_args, data_args, eval_args, finetuning_args
diff --git a/post-training/LLaMA-Factory/src/llamafactory/hparams/training_args.py b/post-training/LLaMA-Factory/src/llamafactory/hparams/training_args.py
new file mode 100644
index 0000000..ee9c4e9
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/hparams/training_args.py
@@ -0,0 +1,66 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from dataclasses import dataclass, field
+from typing import Literal, Optional, Union
+
+from transformers import Seq2SeqTrainingArguments
+from transformers.training_args import _convert_str_dict
+
+from ..extras.misc import use_ray
+
+
+@dataclass
+class RayArguments:
+ r"""Arguments pertaining to the Ray training."""
+
+ ray_run_name: Optional[str] = field(
+ default=None,
+ metadata={"help": "The training results will be saved at `/ray_run_name`."},
+ )
+ ray_storage_path: str = field(
+ default="./saves",
+ metadata={"help": "The storage path to save training results to"},
+ )
+ ray_num_workers: int = field(
+ default=1,
+ metadata={"help": "The number of workers for Ray training. Default is 1 worker."},
+ )
+ resources_per_worker: Union[dict, str] = field(
+ default_factory=lambda: {"GPU": 1},
+ metadata={"help": "The resources per worker for Ray training. Default is to use 1 GPU per worker."},
+ )
+ placement_strategy: Literal["SPREAD", "PACK", "STRICT_SPREAD", "STRICT_PACK"] = field(
+ default="PACK",
+ metadata={"help": "The placement strategy for Ray training. Default is PACK."},
+ )
+ ray_init_kwargs: Optional[dict] = field(
+ default=None,
+ metadata={"help": "The arguments to pass to ray.init for Ray training. Default is None."},
+ )
+
+ def __post_init__(self):
+ self.use_ray = use_ray()
+ if isinstance(self.resources_per_worker, str) and self.resources_per_worker.startswith("{"):
+ self.resources_per_worker = _convert_str_dict(json.loads(self.resources_per_worker))
+
+
+@dataclass
+class TrainingArguments(RayArguments, Seq2SeqTrainingArguments):
+ r"""Arguments pertaining to the trainer."""
+
+ def __post_init__(self):
+ Seq2SeqTrainingArguments.__post_init__(self)
+ RayArguments.__post_init__(self)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/launcher.py b/post-training/LLaMA-Factory/src/llamafactory/launcher.py
new file mode 100644
index 0000000..169b042
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/launcher.py
@@ -0,0 +1,23 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from llamafactory.train.tuner import run_exp # use absolute import
+
+
+def launch():
+ run_exp()
+
+
+if __name__ == "__main__":
+ launch()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/model/__init__.py
new file mode 100644
index 0000000..71d4f47
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/__init__.py
@@ -0,0 +1,28 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .loader import load_config, load_model, load_tokenizer
+from .model_utils.misc import find_all_linear_modules
+from .model_utils.quantization import QuantizationMethod
+from .model_utils.valuehead import load_valuehead_params
+
+
+__all__ = [
+ "QuantizationMethod",
+ "find_all_linear_modules",
+ "load_config",
+ "load_model",
+ "load_tokenizer",
+ "load_valuehead_params",
+]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/adapter.py b/post-training/LLaMA-Factory/src/llamafactory/model/adapter.py
new file mode 100644
index 0000000..bbc0056
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/adapter.py
@@ -0,0 +1,302 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import re
+from typing import TYPE_CHECKING
+
+import torch
+from peft import LoraConfig, LoraModel, PeftModel, TaskType, get_peft_model
+from transformers.integrations import is_deepspeed_zero3_enabled
+
+from ..extras import logging
+from .model_utils.misc import find_all_linear_modules, find_expanded_modules
+from .model_utils.quantization import QuantizationMethod
+from .model_utils.unsloth import get_unsloth_peft_model, load_unsloth_peft_model
+from .model_utils.visual import get_forbidden_modules, patch_target_modules
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel
+
+ from ..hparams import FinetuningArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def _setup_full_tuning(
+ model: "PreTrainedModel",
+ finetuning_args: "FinetuningArguments",
+ is_trainable: bool,
+ cast_trainable_params_to_fp32: bool,
+) -> None:
+ if not is_trainable:
+ return
+
+ logger.info_rank0("Fine-tuning method: Full")
+ forbidden_modules = get_forbidden_modules(model.config, finetuning_args)
+ for name, param in model.named_parameters():
+ if not any(forbidden_module in name for forbidden_module in forbidden_modules):
+ if cast_trainable_params_to_fp32:
+ param.data = param.data.to(torch.float32)
+ else:
+ param.requires_grad_(False)
+
+
+def _setup_freeze_tuning(
+ model: "PreTrainedModel",
+ finetuning_args: "FinetuningArguments",
+ is_trainable: bool,
+ cast_trainable_params_to_fp32: bool,
+) -> None:
+ if not is_trainable:
+ return
+
+ logger.info_rank0("Fine-tuning method: Freeze")
+ if hasattr(model.config, "text_config"): # composite models
+ config = getattr(model.config, "text_config")
+ else:
+ config = model.config
+
+ num_layers = (
+ getattr(config, "num_hidden_layers", None)
+ or getattr(config, "num_layers", None)
+ or getattr(config, "n_layer", None)
+ )
+ if not num_layers:
+ raise ValueError("Current model does not support freeze tuning.")
+
+ if finetuning_args.use_llama_pro:
+ if num_layers % finetuning_args.freeze_trainable_layers != 0:
+ raise ValueError(
+ f"`num_layers` {num_layers} should be "
+ f"divisible by `num_layer_trainable` {finetuning_args.freeze_trainable_layers}."
+ )
+
+ stride = num_layers // finetuning_args.freeze_trainable_layers
+ trainable_layer_ids = range(stride - 1, num_layers + stride - 1, stride)
+    elif finetuning_args.freeze_trainable_layers > 0:  # fine-tuning the last n layers if freeze_trainable_layers > 0
+ trainable_layer_ids = range(max(0, num_layers - finetuning_args.freeze_trainable_layers), num_layers)
+    else:  # fine-tuning the first n layers if freeze_trainable_layers < 0
+ trainable_layer_ids = range(min(-finetuning_args.freeze_trainable_layers, num_layers))
+
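+    # infer candidate module names from parameter names: layer-level (hidden) vs. top-level (non-hidden) modules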
+ hidden_modules = set()
+ non_hidden_modules = set()
+ for name, _ in model.named_parameters():
+ if ".0." in name:
+ hidden_modules.add(name.split(".0.")[-1].split(".")[0])
+ elif ".1." in name: # MoD starts from layer 1
+ hidden_modules.add(name.split(".1.")[-1].split(".")[0])
+
+ if re.search(r"\.\d+\.", name) is None:
+ non_hidden_modules.add(name.split(".")[-2])
+
+ trainable_layers = []
+ for module_name in finetuning_args.freeze_trainable_modules:
+ if module_name != "all" and module_name not in hidden_modules:
+ raise ValueError(
+ "Module {} is not found, please choose from {}".format(module_name, ", ".join(hidden_modules))
+ )
+
+ for idx in trainable_layer_ids:
+ trainable_layers.append(".{:d}.{}".format(idx, module_name if module_name != "all" else ""))
+
+ if finetuning_args.freeze_extra_modules:
+ for module_name in finetuning_args.freeze_extra_modules:
+ if module_name not in non_hidden_modules:
+ raise ValueError(
+ "Module {} is not found, please choose from {}".format(module_name, ", ".join(non_hidden_modules))
+ )
+
+ trainable_layers.append(module_name)
+
+ forbidden_modules = get_forbidden_modules(model.config, finetuning_args)
+ for name, param in model.named_parameters():
+ if any(trainable_layer in name for trainable_layer in trainable_layers) and not any(
+ forbidden_module in name for forbidden_module in forbidden_modules
+ ):
+ if cast_trainable_params_to_fp32:
+ param.data = param.data.to(torch.float32)
+ else:
+ param.requires_grad_(False)
+
+ logger.info_rank0("Set trainable layers: {}".format(",".join(trainable_layers)))
+
+
+def _setup_lora_tuning(
+ config: "PretrainedConfig",
+ model: "PreTrainedModel",
+ model_args: "ModelArguments",
+ finetuning_args: "FinetuningArguments",
+ is_trainable: bool,
+ cast_trainable_params_to_fp32: bool,
+) -> "PeftModel":
+ if is_trainable:
+ logger.info_rank0("Fine-tuning method: {}".format("DoRA" if finetuning_args.use_dora else "LoRA"))
+
+ adapter_to_resume = None
+
+ if model_args.adapter_name_or_path is not None:
+ is_mergeable = True
+ if getattr(model, "quantization_method", None): # merge lora in quantized model is unstable
+ assert len(model_args.adapter_name_or_path) == 1, "Quantized model only accepts a single adapter."
+ is_mergeable = False
+
+ if is_deepspeed_zero3_enabled():
+ assert len(model_args.adapter_name_or_path) == 1, "Cannot use multiple adapters in DeepSpeed ZeRO-3."
+ is_mergeable = False
+
+ if model_args.use_unsloth:
+ assert len(model_args.adapter_name_or_path) == 1, "Unsloth model only accepts a single adapter."
+ is_mergeable = False
+
+ if (is_trainable and not finetuning_args.create_new_adapter) or (not is_mergeable):
+ adapter_to_merge = model_args.adapter_name_or_path[:-1]
+ adapter_to_resume = model_args.adapter_name_or_path[-1]
+ else:
+ adapter_to_merge = model_args.adapter_name_or_path
+
+ init_kwargs = {
+ "subfolder": model_args.adapter_folder,
+ "offload_folder": model_args.offload_folder,
+ "cache_dir": model_args.cache_dir,
+ "revision": model_args.model_revision,
+ "token": model_args.hf_hub_token,
+ }
+
+ for adapter in adapter_to_merge:
+ model: LoraModel = PeftModel.from_pretrained(model, adapter, **init_kwargs)
+ model = model.merge_and_unload()
+
+ if len(adapter_to_merge) > 0:
+ logger.info_rank0(f"Merged {len(adapter_to_merge)} adapter(s).")
+
+ if adapter_to_resume is not None: # resume lora training
+ if model_args.use_unsloth:
+ model = load_unsloth_peft_model(config, model_args, is_trainable=is_trainable)
+ else:
+ model = PeftModel.from_pretrained(model, adapter_to_resume, is_trainable=is_trainable, **init_kwargs)
+
+ logger.info_rank0("Loaded adapter(s): {}".format(",".join(model_args.adapter_name_or_path)))
+
+ if is_trainable and adapter_to_resume is None: # create new lora weights while training
+ if len(finetuning_args.lora_target) == 1 and finetuning_args.lora_target[0] == "all":
+ target_modules = find_all_linear_modules(model, finetuning_args.freeze_vision_tower)
+ else:
+ target_modules = finetuning_args.lora_target
+
+ if finetuning_args.use_llama_pro:
+ target_modules = find_expanded_modules(model, target_modules, finetuning_args.freeze_trainable_layers)
+
+ target_modules = patch_target_modules(model, finetuning_args, target_modules)
+
+ if (
+ finetuning_args.use_dora
+ and getattr(model, "quantization_method", None) is not None
+ and getattr(model, "quantization_method", None) != QuantizationMethod.BITS_AND_BYTES
+ ):
+ raise ValueError("DoRA is not compatible with PTQ-quantized models.")
+
+ if model_args.resize_vocab and finetuning_args.additional_target is None:
+ input_embeddings = model.get_input_embeddings()
+ output_embeddings = model.get_output_embeddings()
+ module_names = set()
+ for name, module in model.named_modules():
+ if module in [input_embeddings, output_embeddings]:
+ module_names.add(name.split(".")[-1])
+
+ finetuning_args.additional_target = module_names
+ logger.warning_rank0("Vocab has been resized, add {} to trainable params.".format(",".join(module_names)))
+
+ peft_kwargs = {
+ "r": finetuning_args.lora_rank,
+ "target_modules": target_modules,
+ "lora_alpha": finetuning_args.lora_alpha,
+ "lora_dropout": finetuning_args.lora_dropout,
+ "use_rslora": finetuning_args.use_rslora,
+ "use_dora": finetuning_args.use_dora,
+ "modules_to_save": finetuning_args.additional_target,
+ }
+
+ if model_args.use_unsloth:
+ model = get_unsloth_peft_model(model, model_args, peft_kwargs)
+ else:
+ if finetuning_args.pissa_init:
+ if finetuning_args.pissa_iter == -1:
+ logger.info_rank0("Using PiSSA initialization.")
+ peft_kwargs["init_lora_weights"] = "pissa"
+ else:
+ logger.info_rank0(f"Using PiSSA initialization with FSVD steps {finetuning_args.pissa_iter}.")
+ peft_kwargs["init_lora_weights"] = f"pissa_niter_{finetuning_args.pissa_iter}"
+
+ lora_config = LoraConfig(
+ task_type=TaskType.CAUSAL_LM,
+ inference_mode=False,
+ **peft_kwargs,
+ )
+ model = get_peft_model(model, lora_config)
+
+ if is_trainable and cast_trainable_params_to_fp32:
+ for param in filter(lambda p: p.requires_grad, model.parameters()):
+ param.data = param.data.to(torch.float32)
+
+ return model
+
+
+def init_adapter(
+ config: "PretrainedConfig",
+ model: "PreTrainedModel",
+ model_args: "ModelArguments",
+ finetuning_args: "FinetuningArguments",
+ is_trainable: bool,
+) -> "PreTrainedModel":
+ r"""Initialize the adapters.
+
+ Support full-parameter, freeze and LoRA training.
+
+ Note that the trainable parameters must be cast to float32.
+ """
+ if is_trainable and getattr(model, "quantization_method", None) is not None:
+ if finetuning_args.finetuning_type != "lora":
+ raise ValueError("Quantized models can only be used for the LoRA tuning.")
+
+ if finetuning_args.pissa_init:
+ raise ValueError("Cannot initialize PiSSA adapter on quantized models.")
+
+ # cast trainable parameters to float32 if:
+ # 1. is_trainable and not pure_bf16 and not badam and quantization_bit is not None (qlora)
+ # 2. is_trainable and not pure_bf16 and not badam and not zero3 (zero3 already in fp32)
+ cast_trainable_params_to_fp32 = False
+ if not is_trainable:
+ pass
+ elif finetuning_args.pure_bf16 or finetuning_args.use_badam:
+ logger.info_rank0("Pure bf16 / BAdam detected, remaining trainable params in half precision.")
+ elif model_args.quantization_bit is None and is_deepspeed_zero3_enabled():
+ logger.info_rank0("DeepSpeed ZeRO3 detected, remaining trainable params in float32.")
+ else:
+ logger.info_rank0("Upcasting trainable params to float32.")
+ cast_trainable_params_to_fp32 = True
+
+ if finetuning_args.finetuning_type == "full":
+ _setup_full_tuning(model, finetuning_args, is_trainable, cast_trainable_params_to_fp32)
+ elif finetuning_args.finetuning_type == "freeze":
+ _setup_freeze_tuning(model, finetuning_args, is_trainable, cast_trainable_params_to_fp32)
+ elif finetuning_args.finetuning_type == "lora":
+ model = _setup_lora_tuning(
+ config, model, model_args, finetuning_args, is_trainable, cast_trainable_params_to_fp32
+ )
+ else:
+ raise NotImplementedError(f"Unknown finetuning type: {finetuning_args.finetuning_type}.")
+
+ return model
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/loader.py b/post-training/LLaMA-Factory/src/llamafactory/model/loader.py
new file mode 100644
index 0000000..c87e796
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/loader.py
@@ -0,0 +1,219 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from typing import TYPE_CHECKING, Any, Optional, TypedDict
+
+import torch
+from transformers import (
+ AutoConfig,
+ AutoModelForCausalLM,
+ AutoModelForSeq2SeqLM,
+ AutoModelForTextToWaveform,
+ AutoModelForVision2Seq,
+ AutoProcessor,
+ AutoTokenizer,
+)
+from trl import AutoModelForCausalLMWithValueHead
+
+from ..extras import logging
+from ..extras.misc import count_parameters, skip_check_imports, try_download_model_from_other_hub
+from ..extras.packages import is_transformers_version_greater_than
+from .adapter import init_adapter
+from .model_utils.liger_kernel import apply_liger_kernel
+from .model_utils.misc import register_autoclass
+from .model_utils.mod import convert_pretrained_model_to_mod, load_mod_pretrained_model
+from .model_utils.unsloth import load_unsloth_pretrained_model
+from .model_utils.valuehead import load_valuehead_params
+from .patcher import patch_config, patch_model, patch_processor, patch_tokenizer, patch_valuehead_model
+
+
+if is_transformers_version_greater_than("4.46.0"):
+ from transformers import AutoModelForImageTextToText
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer, ProcessorMixin
+
+ from ..hparams import FinetuningArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class TokenizerModule(TypedDict):
+ tokenizer: "PreTrainedTokenizer"
+ processor: Optional["ProcessorMixin"]
+
+
+def _get_init_kwargs(model_args: "ModelArguments") -> dict[str, Any]:
+ r"""Get arguments to load config/tokenizer/model.
+
+    Note: this function modifies model_args in place.
+ """
+ skip_check_imports()
+ model_args.model_name_or_path = try_download_model_from_other_hub(model_args)
+ return {
+ "trust_remote_code": model_args.trust_remote_code,
+ "cache_dir": model_args.cache_dir,
+ "revision": model_args.model_revision,
+ "token": model_args.hf_hub_token,
+ }
+
+
+def load_tokenizer(model_args: "ModelArguments") -> "TokenizerModule":
+ r"""Load pretrained tokenizer and optionally loads processor.
+
+    Note: this function modifies model_args in place.
+ """
+ init_kwargs = _get_init_kwargs(model_args)
+ try:
+ tokenizer = AutoTokenizer.from_pretrained(
+ model_args.model_name_or_path,
+ use_fast=model_args.use_fast_tokenizer,
+ split_special_tokens=model_args.split_special_tokens,
+ padding_side="right",
+ **init_kwargs,
+ )
+ except ValueError: # try the fast one
+ tokenizer = AutoTokenizer.from_pretrained(
+ model_args.model_name_or_path,
+ use_fast=True,
+ padding_side="right",
+ **init_kwargs,
+ )
+ except Exception as e:
+ raise OSError("Failed to load tokenizer.") from e
+
+ patch_tokenizer(tokenizer, model_args)
+ try:
+ processor = AutoProcessor.from_pretrained(model_args.model_name_or_path, **init_kwargs)
+ patch_processor(processor, tokenizer, model_args)
+ except Exception as e:
+ logger.debug(f"Failed to load processor: {e}.")
+ processor = None
+
+    # Avoid loading a tokenizer in place of the processor, see:
+ # https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/auto/processing_auto.py#L324
+ if processor is not None and "Processor" not in processor.__class__.__name__:
+ logger.debug("The loaded processor is not an instance of Processor. Dropping it.")
+ processor = None
+
+ return {"tokenizer": tokenizer, "processor": processor}
+
+
+def load_config(model_args: "ModelArguments") -> "PretrainedConfig":
+ r"""Load model config."""
+ init_kwargs = _get_init_kwargs(model_args)
+ return AutoConfig.from_pretrained(model_args.model_name_or_path, **init_kwargs)
+
+
+def load_model(
+ tokenizer: "PreTrainedTokenizer",
+ model_args: "ModelArguments",
+ finetuning_args: "FinetuningArguments",
+ is_trainable: bool = False,
+ add_valuehead: bool = False,
+) -> "PreTrainedModel":
+ r"""Load pretrained model."""
+ init_kwargs = _get_init_kwargs(model_args)
+ config = load_config(model_args)
+ patch_config(config, tokenizer, model_args, init_kwargs, is_trainable)
+ apply_liger_kernel(config, model_args, is_trainable, require_logits=(finetuning_args.stage not in ["pt", "sft"]))
+
+ model = None
+ lazy_load = False
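+    # lazy_load defers model loading to the adapter initializer when resuming Unsloth adapters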
+ if model_args.use_unsloth:
+ if model_args.adapter_name_or_path is not None:
+ lazy_load = True
+ elif is_trainable:
+ model = load_unsloth_pretrained_model(config, model_args)
+
+ if model is None and not lazy_load:
+ init_kwargs["config"] = config
+ init_kwargs["pretrained_model_name_or_path"] = model_args.model_name_or_path
+
+ if model_args.mixture_of_depths == "load":
+ model = load_mod_pretrained_model(**init_kwargs)
+ else:
+ if type(config) in AutoModelForVision2Seq._model_mapping.keys(): # image-text
+ load_class = AutoModelForVision2Seq
+ elif (
+ is_transformers_version_greater_than("4.46.0")
+ and type(config) in AutoModelForImageTextToText._model_mapping.keys()
+ ): # image-text
+ load_class = AutoModelForImageTextToText
+ elif type(config) in AutoModelForSeq2SeqLM._model_mapping.keys(): # audio-text
+ load_class = AutoModelForSeq2SeqLM
+ elif type(config) in AutoModelForTextToWaveform._model_mapping.keys(): # audio hack for qwen2_5_omni
+ load_class = AutoModelForTextToWaveform
+ else:
+ load_class = AutoModelForCausalLM
+
+ if model_args.train_from_scratch:
+ model = load_class.from_config(config, trust_remote_code=model_args.trust_remote_code)
+ else:
+ model = load_class.from_pretrained(**init_kwargs)
+ if getattr(model.config, "model_type", None) == "qwen2_5_omni":
+ model = model.thinker # use part of Omni model
+
+ if model_args.mixture_of_depths == "convert":
+ model = convert_pretrained_model_to_mod(model, config, model_args)
+
+ if not lazy_load:
+ patch_model(model, tokenizer, model_args, is_trainable, add_valuehead)
+ register_autoclass(config, model, tokenizer)
+
+ model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
+
+ if add_valuehead:
+ model = AutoModelForCausalLMWithValueHead.from_pretrained(model)
+ patch_valuehead_model(model)
+
+ if model_args.adapter_name_or_path is not None:
+ vhead_path = model_args.adapter_name_or_path[-1]
+ else:
+ vhead_path = model_args.model_name_or_path
+
+ vhead_params = load_valuehead_params(vhead_path, model_args)
+ if vhead_params is not None:
+ model.load_state_dict(vhead_params, strict=False)
+ logger.info_rank0(f"Loaded valuehead from checkpoint: {vhead_path}")
+
+ if not is_trainable:
+ model.requires_grad_(False)
+ for param in model.parameters():
+ if param.data.dtype == torch.float32 and model_args.compute_dtype != torch.float32:
+ param.data = param.data.to(model_args.compute_dtype)
+
+ model.eval()
+ else:
+ model.train()
+
+ trainable_params, all_param = count_parameters(model)
+ if is_trainable:
+ param_stats = (
+ f"trainable params: {trainable_params:,} || "
+ f"all params: {all_param:,} || trainable%: {100 * trainable_params / all_param:.4f}"
+ )
+ else:
+ param_stats = f"all params: {all_param:,}"
+
+ logger.info_rank0(param_stats)
+
+ if model_args.print_param_status and int(os.getenv("LOCAL_RANK", "0")) == 0:
+ for name, param in model.named_parameters():
+ print(f"name: {name}, dtype: {param.dtype}, device: {param.device}, trainable: {param.requires_grad}")
+
+ return model
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/attention.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/attention.py
new file mode 100644
index 0000000..0f95d3f
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/attention.py
@@ -0,0 +1,87 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from transformers.utils import is_flash_attn_2_available, is_torch_sdpa_available
+
+from ...extras import logging
+from ...extras.constants import AttentionFunction
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def configure_attn_implementation(
+ config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool
+) -> None:
+ if getattr(config, "model_type", None) == "gemma2" and is_trainable:
+ if model_args.flash_attn == AttentionFunction.AUTO or model_args.flash_attn == AttentionFunction.FA2:
+ if is_flash_attn_2_available():
+ if model_args.flash_attn != AttentionFunction.FA2:
+ logger.warning_rank0("Gemma 2 should use flash attention 2, change `flash_attn` to fa2.")
+ model_args.flash_attn = AttentionFunction.FA2
+ else:
+ logger.warning_rank0("FlashAttention-2 is not installed, use eager attention.")
+ model_args.flash_attn = AttentionFunction.DISABLED
+ elif model_args.flash_attn == AttentionFunction.SDPA:
+ logger.warning_rank0(
+ "Gemma-2 should use soft-capping attention, while the SDPA attention does not support it."
+ )
+
+ if model_args.flash_attn == AttentionFunction.AUTO:
+ return
+
+ elif model_args.flash_attn == AttentionFunction.DISABLED:
+ requested_attn_implementation = "eager"
+
+ elif model_args.flash_attn == AttentionFunction.SDPA:
+ if not is_torch_sdpa_available():
+ logger.warning_rank0("torch>=2.1.1 is required for SDPA attention.")
+ return
+
+ requested_attn_implementation = "sdpa"
+ elif model_args.flash_attn == AttentionFunction.FA2:
+ if not is_flash_attn_2_available():
+ logger.warning_rank0("FlashAttention-2 is not installed.")
+ return
+
+ requested_attn_implementation = "flash_attention_2"
+ else:
+ raise NotImplementedError(f"Unknown attention type: {model_args.flash_attn}")
+
+ if getattr(config, "model_type", None) == "internlm2": # special case for custom models
+ setattr(config, "attn_implementation", requested_attn_implementation)
+ else:
+ setattr(config, "_attn_implementation", requested_attn_implementation)
+
+
+def print_attn_implementation(config: "PretrainedConfig") -> None:
+ if getattr(config, "model_type", None) == "internlm2": # special case for custom models
+ attn_implementation = getattr(config, "attn_implementation", None)
+ else:
+ attn_implementation = getattr(config, "_attn_implementation", None)
+
+ if attn_implementation == "flash_attention_2":
+ logger.info_rank0("Using FlashAttention-2 for faster training and inference.")
+ elif attn_implementation == "sdpa":
+ logger.info_rank0("Using torch SDPA for faster training and inference.")
+ else:
+ logger.info_rank0("Using vanilla attention implementation.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py
new file mode 100644
index 0000000..28e2a79
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py
@@ -0,0 +1,174 @@
+# Copyright 2025 HuggingFace Inc., Daniel Han-Chen & the Unsloth team and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's Transformers and PEFT library,
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/modeling_utils.py
+# https://github.com/huggingface/peft/blob/v0.10.0/src/peft/utils/other.py
+# and the Unsloth library.
+# https://github.com/unslothai/unsloth/blob/July-2024/unsloth/models/_utils.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import inspect
+from functools import WRAPPER_ASSIGNMENTS, partial, wraps
+from types import MethodType
+from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+
+import torch
+
+from ...extras import logging
+from ...extras.constants import LAYERNORM_NAMES
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def get_unsloth_gradient_checkpointing_func() -> Callable:
+ class UnslothGradientCheckpointing(torch.autograd.Function):
+ r"""Saves VRAM by smartly offloading to RAM."""
+
+ @staticmethod
+ @torch.cuda.amp.custom_fwd
+ def forward(
+ ctx: "torch.autograd.Function",
+ forward_function: "torch.Module",
+ hidden_states: "torch.Tensor",
+ *args: Union["torch.Tensor", Any],
+ ) -> "torch.Tensor":
+ saved_hidden_states = hidden_states.to("cpu", non_blocking=True)
+ with torch.no_grad():
+ output = forward_function(hidden_states, *args)
+
+ ctx.save_for_backward(saved_hidden_states)
+ ctx.forward_function = forward_function
+ ctx.args = args
+ return output
+
+ @staticmethod
+ @torch.cuda.amp.custom_bwd
+ def backward(ctx: "torch.autograd.Function", grad_output: "torch.Tensor") -> "torch.Tensor":
+ (hidden_states,) = ctx.saved_tensors
+ hidden_states = hidden_states.to("cuda", non_blocking=True).detach()
+ hidden_states.requires_grad_(True)
+ with torch.enable_grad():
+ (output,) = ctx.forward_function(hidden_states, *ctx.args)
+
+ torch.autograd.backward(output, grad_output)
+ return (None, hidden_states.grad) + (None,) * len(ctx.args)
+
+ return UnslothGradientCheckpointing.apply
+
+
+def get_custom_gradient_checkpointing_func(gradient_checkpointing_func: Callable) -> Callable:
+ r"""Only applies gradient checkpointing to trainable layers."""
+
+ @wraps(gradient_checkpointing_func, assigned=WRAPPER_ASSIGNMENTS + ("__self__",))
+ def custom_gradient_checkpointing_func(func: Callable, *args: Union["torch.Tensor", Any], **kwargs):
+ if isinstance(func, partial):
+ module: torch.nn.Module = func.func.__self__
+ else:
+ module: torch.nn.Module = func.__self__
+
+ has_grad = False
+ if any(param.requires_grad for param in module.parameters()):
+ has_grad = True
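+            # reentrant checkpointing needs at least one input that requires grad to trigger recomputation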
+ for arg in args:
+ if torch.is_tensor(arg) and torch.is_floating_point(arg):
+ arg.requires_grad_(True)
+ break # assume the first tensor is always the hidden states
+
+ if has_grad:
+ return gradient_checkpointing_func(func, *args, **kwargs)
+ else:
+ return func(*args, **kwargs)
+
+ return custom_gradient_checkpointing_func
+
+
+def _gradient_checkpointing_enable(
+ self: "PreTrainedModel",
+ gradient_checkpointing_kwargs: Optional[dict[str, Any]] = None,
+ use_unsloth_gc: bool = False,
+) -> None:
+ r"""Activates gradient checkpointing for the current model.
+
+ Modification of the original method to enable gradient checkpointing for block-wise optimizer.
+ """
+ from torch.utils.checkpoint import checkpoint
+
+ if not self.supports_gradient_checkpointing:
+ raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")
+
+ if gradient_checkpointing_kwargs is None:
+ gradient_checkpointing_kwargs = {"use_reentrant": True}
+
+ if use_unsloth_gc:
+ gradient_checkpointing_func = get_unsloth_gradient_checkpointing_func()
+ else:
+ gradient_checkpointing_func = partial(checkpoint, **gradient_checkpointing_kwargs)
+
+ gradient_checkpointing_func = get_custom_gradient_checkpointing_func(gradient_checkpointing_func)
+ if "value" in inspect.signature(self._set_gradient_checkpointing).parameters: # old GC format
+ self.apply(partial(self._set_gradient_checkpointing, value=True))
+ self.enable_input_require_grads()
+ logger.warning_rank0_once("You are using the old GC format, some features (e.g. BAdam) will be invalid.")
+ else: # have already enabled input require gradients
+ self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
+
+
+def _fp32_forward_post_hook(
+ module: "torch.nn.Module", args: tuple["torch.Tensor"], output: "torch.Tensor"
+) -> "torch.Tensor":
+ return output.to(torch.float32)
+
+
+def prepare_model_for_training(model: "PreTrainedModel", model_args: "ModelArguments") -> None:
+ r"""Prepare the model before training.
+
+ Includes:
+ (1) casting the layernorm weights to fp32,
+ (2) enabling gradient checkpointing,
+ (3) upcasting the lm_head outputs to fp32.
+ """
+ if model_args.upcast_layernorm:
+ logger.info_rank0("Upcasting layernorm weights in float32.")
+ for name, param in model.named_parameters():
+ if param.ndim == 1 and any(ln_name in name for ln_name in LAYERNORM_NAMES):
+ param.data = param.data.to(torch.float32)
+
+ if not model_args.disable_gradient_checkpointing:
+ if not getattr(model, "supports_gradient_checkpointing", False):
+ logger.warning_rank0("Current model does not support gradient checkpointing.")
+ else:
+ # use_reentrant=False might increase VRAM usage (this has not been verified empirically yet)
+ # According to: https://github.com/huggingface/transformers/issues/28339
+ gradient_checkpointing_enable = partial(
+ _gradient_checkpointing_enable, use_unsloth_gc=model_args.use_unsloth_gc
+ )
+ model.gradient_checkpointing_enable = MethodType(gradient_checkpointing_enable, model)
+ model.gradient_checkpointing_enable(
+ gradient_checkpointing_kwargs={"use_reentrant": model_args.use_reentrant_gc}
+ )
+ setattr(model.config, "use_cache", False) # turn off when gradient checkpointing is enabled
+ logger.info_rank0("Gradient checkpointing enabled.")
+
+ if model_args.upcast_lmhead_output:
+ output_layer = model.get_output_embeddings()
+ if isinstance(output_layer, torch.nn.Linear) and output_layer.weight.dtype != torch.float32:
+ logger.info_rank0("Upcasting lm_head outputs in float32.")
+ output_layer.register_forward_hook(_fp32_forward_post_hook)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/embedding.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/embedding.py
new file mode 100644
index 0000000..c10e34f
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/embedding.py
@@ -0,0 +1,70 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from contextlib import nullcontext
+from typing import TYPE_CHECKING
+
+import torch
+from transformers.integrations import is_deepspeed_zero3_enabled
+
+from ...extras import logging
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, PreTrainedTokenizer
+
+
+logger = logging.get_logger(__name__)
+
+
+def _noisy_mean_initialization(embed_weight: "torch.Tensor", num_new_tokens: int) -> None:
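+ # Initialize each new token embedding as the mean of the pre-existing embeddings
+ # plus Gaussian noise with std 1/sqrt(embedding_dim).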
+ embedding_dim = embed_weight.size(1)
+ avg_weight = embed_weight[:-num_new_tokens].mean(dim=0, keepdim=True)
+ noise_weight = torch.empty_like(embed_weight[-num_new_tokens:])
+ noise_weight.normal_(mean=0, std=(1.0 / math.sqrt(embedding_dim)))
+ embed_weight[-num_new_tokens:] = avg_weight + noise_weight
+
+
+def resize_embedding_layer(model: "PreTrainedModel", tokenizer: "PreTrainedTokenizer") -> None:
+ r"""Resize token embeddings."""
+ if is_deepspeed_zero3_enabled():
+ import deepspeed # type: ignore
+
+ params = [model.get_input_embeddings().weight]
+ if model.get_output_embeddings() is not None and not model.config.tie_word_embeddings:
+ params.append(model.get_output_embeddings().weight)
+
+ context_maybe_zero3 = deepspeed.zero.GatheredParameters(params, modifier_rank=0)
+ else:
+ context_maybe_zero3 = nullcontext()
+
+ with context_maybe_zero3:
+ current_embedding_size = model.get_input_embeddings().weight.size(0)
+
+ if len(tokenizer) > current_embedding_size:
+ if getattr(model, "quantization_method", None):
+ raise ValueError("Cannot resize embedding layers of a quantized model.")
+
+ if not isinstance(model.get_output_embeddings(), torch.nn.Linear):
+ raise ValueError("Current model does not support resizing embedding layers.")
+
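+ # Grow the embedding matrix to the new vocabulary size, padded up to a multiple of 64,
+ # which is generally friendlier to GPU kernels.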
+ model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
+ with context_maybe_zero3:
+ new_embedding_size = model.get_input_embeddings().weight.size(0)
+ num_new_tokens = new_embedding_size - current_embedding_size
+ _noisy_mean_initialization(model.get_input_embeddings().weight.data, num_new_tokens)
+ _noisy_mean_initialization(model.get_output_embeddings().weight.data, num_new_tokens)
+
+ logger.info_rank0(f"Resized token embeddings from {current_embedding_size} to {new_embedding_size}.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/kv_cache.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/kv_cache.py
new file mode 100644
index 0000000..cd2c119
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/kv_cache.py
@@ -0,0 +1,44 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...extras import logging
+
+
+logger = logging.get_logger(__name__)
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+ from ...hparams import ModelArguments
+
+
+def configure_kv_cache(config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool) -> None:
+ if not is_trainable:
+ setattr(config, "use_cache", model_args.use_cache)
+ if hasattr(config, "text_config"):
+ setattr(config.text_config, "use_cache", model_args.use_cache)
+
+ if model_args.use_cache:
+ logger.info_rank0("KV cache is enabled for faster generation.")
+ else:
+ logger.info_rank0("KV cache is disabled.")
+ else:
+ setattr(config, "use_cache", False)
+ if hasattr(config, "text_config"):
+ setattr(config.text_config, "use_cache", False)
+
+ logger.info_rank0("KV cache is disabled during training.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/liger_kernel.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/liger_kernel.py
new file mode 100644
index 0000000..486c54e
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/liger_kernel.py
@@ -0,0 +1,77 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import inspect
+from typing import TYPE_CHECKING
+
+from ...extras import logging
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def apply_liger_kernel(
+ config: "PretrainedConfig",
+ model_args: "ModelArguments",
+ is_trainable: bool,
+ require_logits: bool,
+) -> None:
+ if not is_trainable or not model_args.enable_liger_kernel:
+ return
+
+ model_type = getattr(config, "model_type", None)
+ if model_type == "gemma":
+ from liger_kernel.transformers import apply_liger_kernel_to_gemma as apply_liger_kernel
+ elif model_type == "gemma2":
+ from liger_kernel.transformers import apply_liger_kernel_to_gemma2 as apply_liger_kernel
+ elif model_type == "gemma3":
+ from liger_kernel.transformers import apply_liger_kernel_to_gemma3 as apply_liger_kernel
+ elif model_type == "gemma3_text":
+ from liger_kernel.transformers import apply_liger_kernel_to_gemma3_text as apply_liger_kernel
+ elif model_type == "paligemma":
+ from liger_kernel.transformers import apply_liger_kernel_to_paligemma as apply_liger_kernel
+ elif model_type == "llama":
+ from liger_kernel.transformers import apply_liger_kernel_to_llama as apply_liger_kernel
+ elif model_type == "mistral":
+ from liger_kernel.transformers import apply_liger_kernel_to_mistral as apply_liger_kernel
+ elif model_type == "mixtral":
+ from liger_kernel.transformers import apply_liger_kernel_to_mixtral as apply_liger_kernel
+ elif model_type == "mllama":
+ from liger_kernel.transformers import apply_liger_kernel_to_mllama as apply_liger_kernel
+ elif model_type == "phi3":
+ from liger_kernel.transformers import apply_liger_kernel_to_phi3 as apply_liger_kernel
+ elif model_type == "qwen2":
+ from liger_kernel.transformers import apply_liger_kernel_to_qwen2 as apply_liger_kernel
+ elif model_type == "qwen2_vl":
+ from liger_kernel.transformers import apply_liger_kernel_to_qwen2_vl as apply_liger_kernel
+ elif model_type == "qwen2_5_vl":
+ from liger_kernel.transformers import apply_liger_kernel_to_qwen2_5_vl as apply_liger_kernel
+ else:
+ logger.warning_rank0("Current model does not support liger kernel.")
+ return
+
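+ # Stages that need the full logits cannot use the fused linear cross-entropy kernel
+ # (which avoids materializing logits), so fall back to the plain cross-entropy kernel.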
+ if require_logits and "fused_linear_cross_entropy" in inspect.signature(apply_liger_kernel).parameters:
+ logger.info_rank0("Current training stage does not support chunked cross entropy.")
+ kwargs = {"fused_linear_cross_entropy": False, "cross_entropy": True}
+ else:
+ kwargs = {}
+
+ apply_liger_kernel(**kwargs)
+ logger.info_rank0("Liger kernel has been applied to the model.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/longlora.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/longlora.py
new file mode 100644
index 0000000..f7c36ee
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/longlora.py
@@ -0,0 +1,370 @@
+# Copyright 2025 EleutherAI, HuggingFace Inc., Yukang Chen, and the LlamaFactory team.
+#
+# This code is based on the EleutherAI's GPT-NeoX and the HuggingFace's Transformers libraries.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/llama/modeling_llama.py
+# This code is also inspired by the original LongLoRA implementation.
+# https://github.com/dvlab-research/LongLoRA/blob/main/llama_attn_replace.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import TYPE_CHECKING, Optional
+
+import torch
+import torch.nn as nn
+import transformers
+
+from ...extras import logging
+from ...extras.constants import SUPPORTED_CLASS_FOR_S2ATTN
+from ...extras.misc import check_version
+from ...extras.packages import is_transformers_version_greater_than
+
+
+if not is_transformers_version_greater_than("4.48.0"):
+ from transformers.modeling_flash_attention_utils import _flash_attention_forward
+ from transformers.models.llama.modeling_llama import (
+ Cache,
+ LlamaAttention,
+ LlamaFlashAttention2,
+ LlamaSdpaAttention,
+ apply_rotary_pos_emb,
+ repeat_kv,
+ )
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+ from ...hparams import ModelArguments
+
+
+transformers_logger = transformers.utils.logging.get_logger(__name__)
+
+
+# Modified from:
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/llama/modeling_llama.py
+def llama_attention_forward(
+ self: "LlamaAttention",
+ hidden_states: "torch.Tensor",
+ attention_mask: Optional["torch.Tensor"] = None,
+ position_ids: Optional["torch.LongTensor"] = None,
+ past_key_value: Optional["Cache"] = None,
+ output_attentions: bool = False,
+ cache_position: Optional["torch.LongTensor"] = None,
+ position_embeddings: Optional[tuple["torch.Tensor", "torch.Tensor"]] = None,
+ **kwargs,
+) -> tuple["torch.Tensor", Optional["torch.Tensor"], Optional[tuple["torch.Tensor"]]]:
+ bsz, q_len, _ = hidden_states.size()
+
+ query_states: torch.Tensor = self.q_proj(hidden_states)
+ key_states: torch.Tensor = self.k_proj(hidden_states)
+ value_states: torch.Tensor = self.v_proj(hidden_states)
+
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+
+ if position_embeddings is None:
+ cos, sin = self.rotary_emb(value_states, position_ids)
+ else:
+ cos, sin = position_embeddings
+
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
+
+ if past_key_value is not None:
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
+
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift
+ groupsz = int(q_len * getattr(self.config, "group_size_ratio"))
+ assert q_len % groupsz == 0, f"q_len {q_len} should be divisible by group size {groupsz}."
+ num_groups = q_len // groupsz
+
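+ # Shifted sparse attention (S^2-Attn): keep the first half of the heads attending within
+ # local groups and roll the second half by half a group size so their windows straddle
+ # group boundaries, then fold the groups into the batch dimension.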
+ def shift(state: "torch.Tensor") -> "torch.Tensor":
+ state = state.transpose(1, 2) # output: (bsz, seq_len, n_heads, head_dim)
+ state = torch.cat(
+ (state[:, :, : self.num_heads // 2], state[:, :, self.num_heads // 2 :].roll(-groupsz // 2, dims=1)),
+ dim=2,
+ )
+ return state.reshape(bsz * num_groups, groupsz, self.num_heads, self.head_dim).transpose(1, 2)
+
+ query_states, key_states, value_states = shift(query_states), shift(key_states), shift(value_states)
+ if attention_mask is not None:
+ attention_mask = attention_mask[:, :, :groupsz, :groupsz].repeat(num_groups, 1, 1, 1)
+
+ attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
+
+ if attention_mask is not None: # no matter the length, we just slice it
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+ attn_weights = attn_weights + causal_mask
+
+ # upcast attention to fp32
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
+ attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+ attn_output = torch.matmul(attn_weights, value_states) # (bsz, :, seq_len, :) or (bsz * n_group, :, groupsz, :)
+ attn_output = attn_output.transpose(1, 2).contiguous()
+
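+ # Undo the shift: merge the groups back into the full sequence, then roll the shifted half
+ # of the heads forward again.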
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift back
+ attn_output = attn_output.reshape(bsz, q_len, self.num_heads, self.head_dim)
+ attn_output = torch.cat(
+ (
+ attn_output[:, :, : self.num_heads // 2],
+ attn_output[:, :, self.num_heads // 2 :].roll(groupsz // 2, dims=1),
+ ),
+ dim=2,
+ )
+
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+ attn_output = self.o_proj(attn_output)
+
+ if not output_attentions:
+ attn_weights = None
+
+ return attn_output, attn_weights, past_key_value
+
+
+# Modified from:
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/llama/modeling_llama.py
+def llama_flash_attention_2_forward(
+ self: "LlamaFlashAttention2",
+ hidden_states: "torch.Tensor",
+ attention_mask: Optional["torch.Tensor"] = None,
+ position_ids: Optional["torch.LongTensor"] = None,
+ past_key_value: Optional["Cache"] = None,
+ output_attentions: bool = False,
+ cache_position: Optional["torch.LongTensor"] = None,
+ position_embeddings: Optional[tuple["torch.Tensor", "torch.Tensor"]] = None,
+ **kwargs,
+) -> tuple["torch.Tensor", Optional["torch.Tensor"], Optional[tuple["torch.Tensor"]]]:
+ # LlamaFlashAttention2 attention does not support output_attentions
+ output_attentions = False
+
+ bsz, q_len, _ = hidden_states.size()
+
+ query_states: torch.Tensor = self.q_proj(hidden_states)
+ key_states: torch.Tensor = self.k_proj(hidden_states)
+ value_states: torch.Tensor = self.v_proj(hidden_states)
+
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+
+ if position_embeddings is None:
+ cos, sin = self.rotary_emb(value_states, position_ids)
+ else:
+ cos, sin = position_embeddings
+
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
+
+ if past_key_value is not None:
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
+
+ # FlashAttention requires the input to have the shape (bsz, seq_len, n_heads, head_dim)
+ query_states = query_states.transpose(1, 2)
+ key_states = key_states.transpose(1, 2)
+ value_states = value_states.transpose(1, 2)
+
+ dropout_rate = self.attention_dropout if self.training else 0.0
+
+ input_dtype = query_states.dtype
+ if input_dtype == torch.float32:
+ if torch.is_autocast_enabled():
+ target_dtype = torch.get_autocast_gpu_dtype()
+ elif hasattr(self.config, "_pre_quantization_dtype"):
+ target_dtype = self.config._pre_quantization_dtype
+ else:
+ target_dtype = self.q_proj.weight.dtype
+
+ transformers_logger.warning_once("The input hidden states seems to be silently casted in float32.")
+ query_states = query_states.to(target_dtype)
+ key_states = key_states.to(target_dtype)
+ value_states = value_states.to(target_dtype)
+
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift
+ groupsz = int(q_len * getattr(self.config, "group_size_ratio"))
+ assert q_len % groupsz == 0, f"q_len {q_len} should be divisible by group size {groupsz}."
+ num_groups = q_len // groupsz
+
+ def shift(state: "torch.Tensor") -> "torch.Tensor":
+ state = torch.cat(
+ (state[:, :, : self.num_heads // 2], state[:, :, self.num_heads // 2 :].roll(-groupsz // 2, dims=1)),
+ dim=2,
+ )
+ return state.reshape(bsz * num_groups, groupsz, self.num_heads, self.head_dim)
+
+ query_states, key_states, value_states = shift(query_states), shift(key_states), shift(value_states)
+ if attention_mask is not None:
+ attention_mask = attention_mask[:, :groupsz].repeat(num_groups, 1)
+
+ attn_output: torch.Tensor = _flash_attention_forward(
+ query_states,
+ key_states,
+ value_states,
+ attention_mask,
+ query_states.size(1),
+ dropout=dropout_rate,
+ sliding_window=getattr(self, "sliding_window", None),
+ use_top_left_mask=self._flash_attn_uses_top_left_mask,
+ is_causal=self.is_causal,
+ )
+
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift back
+ attn_output = attn_output.reshape(bsz, q_len, self.num_heads, self.head_dim)
+ attn_output = torch.cat(
+ (
+ attn_output[:, :, : self.num_heads // 2],
+ attn_output[:, :, self.num_heads // 2 :].roll(groupsz // 2, dims=1),
+ ),
+ dim=2,
+ )
+
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size).contiguous()
+ attn_output = self.o_proj(attn_output)
+
+ if not output_attentions:
+ attn_weights = None
+
+ return attn_output, attn_weights, past_key_value
+
+
+# Modified from:
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/llama/modeling_llama.py
+def llama_sdpa_attention_forward(
+ self: "LlamaSdpaAttention",
+ hidden_states: "torch.Tensor",
+ attention_mask: Optional["torch.Tensor"] = None,
+ position_ids: Optional["torch.LongTensor"] = None,
+ past_key_value: Optional["Cache"] = None,
+ output_attentions: bool = False,
+ cache_position: Optional["torch.LongTensor"] = None,
+ position_embeddings: Optional[tuple["torch.Tensor", "torch.Tensor"]] = None,
+ **kwargs,
+) -> tuple["torch.Tensor", Optional["torch.Tensor"], Optional[tuple["torch.Tensor"]]]:
+ if output_attentions:
+ transformers_logger.warning_once(
+ "SDPA does not support `output_attentions=True`. Falling back to the vanilla attention"
+ )
+ return llama_attention_forward(
+ self,
+ hidden_states=hidden_states,
+ attention_mask=attention_mask,
+ position_ids=position_ids,
+ past_key_value=past_key_value,
+ output_attentions=output_attentions,
+ cache_position=cache_position,
+ **kwargs,
+ )
+
+ bsz, q_len, _ = hidden_states.size()
+
+ query_states: torch.Tensor = self.q_proj(hidden_states)
+ key_states: torch.Tensor = self.k_proj(hidden_states)
+ value_states: torch.Tensor = self.v_proj(hidden_states)
+
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
+
+ if position_embeddings is None:
+ cos, sin = self.rotary_emb(value_states, position_ids)
+ else:
+ cos, sin = position_embeddings
+
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
+
+ if past_key_value is not None:
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
+
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift
+ groupsz = int(q_len * getattr(self.config, "group_size_ratio"))
+ assert q_len % groupsz == 0, f"q_len {q_len} should be divisible by group size {groupsz}."
+ num_groups = q_len // groupsz
+
+ def shift(state: "torch.Tensor") -> "torch.Tensor":
+ state = state.transpose(1, 2) # output: (bsz, seq_len, n_heads, head_dim)
+ state = torch.cat(
+ (state[:, :, : self.num_heads // 2], state[:, :, self.num_heads // 2 :].roll(-groupsz // 2, dims=1)),
+ dim=2,
+ )
+ return state.reshape(bsz * num_groups, groupsz, self.num_heads, self.head_dim).transpose(1, 2)
+
+ query_states, key_states, value_states = shift(query_states), shift(key_states), shift(value_states)
+ if attention_mask is not None:
+ attention_mask = attention_mask[:, :, :groupsz, :groupsz].repeat(num_groups, 1, 1, 1)
+
+ causal_mask = attention_mask
+ if attention_mask is not None:
+ causal_mask = causal_mask[:, :, :, : key_states.shape[-2]]
+
+ if query_states.device.type == "cuda" and causal_mask is not None: # avoid pytorch bug
+ query_states = query_states.contiguous()
+ key_states = key_states.contiguous()
+ value_states = value_states.contiguous()
+
+ is_causal = True if causal_mask is None and q_len > 1 else False
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
+ query_states,
+ key_states,
+ value_states,
+ attn_mask=causal_mask,
+ dropout_p=self.attention_dropout if self.training else 0.0,
+ is_causal=is_causal,
+ )
+ attn_output = attn_output.transpose(1, 2).contiguous()
+
+ if getattr(self.config, "group_size_ratio", None) and self.training: # shift back
+ attn_output = attn_output.reshape(bsz, q_len, self.num_heads, self.head_dim)
+ attn_output = torch.cat(
+ (
+ attn_output[:, :, : self.num_heads // 2],
+ attn_output[:, :, self.num_heads // 2 :].roll(groupsz // 2, dims=1),
+ ),
+ dim=2,
+ )
+
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+ attn_output = self.o_proj(attn_output)
+
+ return attn_output, None, past_key_value
+
+
+def _apply_llama_patch() -> None:
+ check_version("transformers>=4.45.0,<4.48.0", mandatory=True)
+ LlamaAttention.forward = llama_attention_forward
+ LlamaFlashAttention2.forward = llama_flash_attention_2_forward
+ LlamaSdpaAttention.forward = llama_sdpa_attention_forward
+
+
+def configure_longlora(config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool) -> None:
+ if not is_trainable or not model_args.shift_attn:
+ return
+
+ logger = logging.get_logger(__name__)
+
+ if getattr(config, "model_type", None) in SUPPORTED_CLASS_FOR_S2ATTN:
+ setattr(config, "group_size_ratio", 0.25)
+ _apply_llama_patch()
+ logger.info_rank0("Using shift short attention with group_size_ratio=1/4.")
+ else:
+ logger.warning_rank0("Current model does not support shift short attention.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/misc.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/misc.py
new file mode 100644
index 0000000..b0249b4
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/misc.py
@@ -0,0 +1,86 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...extras import logging
+from .visual import COMPOSITE_MODELS
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer
+
+
+logger = logging.get_logger(__name__)
+
+
+def find_all_linear_modules(model: "PreTrainedModel", freeze_vision_tower: bool) -> list[str]:
+ r"""Find all available modules to apply LoRA, GaLore or APOLLO."""
+ model_type = getattr(model.config, "model_type", None)
+ forbidden_modules = {"lm_head"}
+ if model_type == "chatglm":
+ forbidden_modules.add("output_layer")
+ elif model_type == "internlm2":
+ forbidden_modules.add("output")
+
+ if model_type in COMPOSITE_MODELS:
+ forbidden_modules.add(COMPOSITE_MODELS[model_type].projector_key)
+
+ if freeze_vision_tower and model_type in COMPOSITE_MODELS:
+ forbidden_modules.update(COMPOSITE_MODELS[model_type].vision_model_keys)
+
+ module_names = set()
+ for name, module in model.named_modules():
+ if any(forbidden_module in name for forbidden_module in forbidden_modules):
+ continue
+
+ if "Linear" in module.__class__.__name__ and "Embedding" not in module.__class__.__name__:
+ module_names.add(name.split(".")[-1])
+
+ logger.info_rank0("Found linear modules: {}".format(",".join(module_names)))
+ return list(module_names)
+
+
+def find_expanded_modules(model: "PreTrainedModel", target_modules: list[str], num_layer_trainable: int) -> list[str]:
+ r"""Find the modules in the expanded blocks to apply lora."""
+ num_layers = getattr(model.config, "num_hidden_layers", None)
+ if not num_layers:
+ raise ValueError("Model was not supported.")
+
+ if num_layers % num_layer_trainable != 0:
+ raise ValueError(
+ f"`num_layers` {num_layers} should be divisible by `num_layer_trainable` {num_layer_trainable}."
+ )
+
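+ # Spread the trainable blocks evenly across the model: with stride = num_layers // num_layer_trainable,
+ # pick the last layer of every stride-sized block.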
+ stride = num_layers // num_layer_trainable
+ trainable_layer_ids = range(stride - 1, num_layers + stride - 1, stride)
+ trainable_layers = [f".{idx:d}." for idx in trainable_layer_ids]
+ module_names = []
+ for name, _ in model.named_modules():
+ if any(target_module in name for target_module in target_modules) and any(
+ trainable_layer in name for trainable_layer in trainable_layers
+ ):
+ module_names.append(name)
+
+ logger.info_rank0("Apply lora to layers: {}.".format(",".join(map(str, trainable_layer_ids))))
+ return module_names
+
+
+def register_autoclass(config: "PretrainedConfig", model: "PreTrainedModel", tokenizer: "PreTrainedTokenizer"):
+ if "AutoConfig" in getattr(config, "auto_map", {}):
+ config.__class__.register_for_auto_class()
+ if "AutoModelForCausalLM" in getattr(config, "auto_map", {}):
+ model.__class__.register_for_auto_class()
+ if "AutoTokenizer" in tokenizer.init_kwargs.get("auto_map", {}):
+ tokenizer.__class__.register_for_auto_class()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/mod.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/mod.py
new file mode 100644
index 0000000..5f67cd5
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/mod.py
@@ -0,0 +1,42 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...extras.constants import MOD_SUPPORTED_MODELS
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel
+
+ from ...hparams import ModelArguments
+
+
+def load_mod_pretrained_model(**init_kwargs) -> "PreTrainedModel":
+ from MoD import AutoMoDModelForCausalLM
+
+ return AutoMoDModelForCausalLM.from_pretrained(**init_kwargs)
+
+
+def convert_pretrained_model_to_mod(
+ model: "PreTrainedModel", config: "PretrainedConfig", model_args: "ModelArguments"
+) -> "PreTrainedModel":
+ from MoD import apply_mod_to_hf
+
+ if getattr(config, "model_type", None) not in MOD_SUPPORTED_MODELS:
+ raise ValueError("Current model is not supported by mixture-of-depth.")
+
+ model = apply_mod_to_hf(model)
+ model = model.to(model_args.compute_dtype)
+ return model
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py
new file mode 100644
index 0000000..b3fca4f
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/moe.py
@@ -0,0 +1,87 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+import torch
+from transformers.integrations import is_deepspeed_zero3_enabled
+
+from ...extras.misc import check_version
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel
+
+ from ...hparams import ModelArguments
+
+
+def _set_z3_leaf_modules(model: "PreTrainedModel", leaf_modules: list["torch.nn.Module"]) -> None:
+ check_version("deepspeed>=0.13.0")
+ from deepspeed.utils import set_z3_leaf_modules # type: ignore
+
+ set_z3_leaf_modules(model, leaf_modules)
+
+
+def add_z3_leaf_module(model: "PreTrainedModel") -> None:
+ r"""Set module as a leaf module to skip partitioning in deepspeed zero3."""
+ if not is_deepspeed_zero3_enabled():
+ return
+
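+ # MoE blocks activate a data-dependent subset of experts per token, which interferes with
+ # ZeRO-3 parameter partitioning; marking them as leaf modules keeps each block's parameters
+ # gathered as a whole.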
+ model_type = getattr(model.config, "model_type", None)
+ if model_type == "dbrx":
+ from transformers.models.dbrx.modeling_dbrx import DbrxFFN
+
+ _set_z3_leaf_modules(model, [DbrxFFN])
+
+ if model_type == "jamba":
+ from transformers.models.jamba.modeling_jamba import JambaSparseMoeBlock
+
+ _set_z3_leaf_modules(model, [JambaSparseMoeBlock])
+
+ if model_type == "jetmoe":
+ from transformers.models.jetmoe.modeling_jetmoe import JetMoeMoA, JetMoeMoE
+
+ _set_z3_leaf_modules(model, [JetMoeMoA, JetMoeMoE])
+
+ if model_type in ["kimi_vl", "deepseek_v3"]:
+ check_version("transformers>=4.51.1")
+ from transformers.models.deepseek_v3.modeling_deepseek_v3 import DeepseekV3MoE
+
+ _set_z3_leaf_modules(model, [DeepseekV3MoE])
+
+ if model_type == "mixtral":
+ from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock
+
+ _set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
+
+ if model_type == "qwen2_moe":
+ from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeSparseMoeBlock
+
+ _set_z3_leaf_modules(model, [Qwen2MoeSparseMoeBlock])
+
+
+def configure_moe(config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool) -> None:
+ model_type = getattr(config, "model_type", None)
+ if model_args.moe_aux_loss_coef is not None:
+ if model_type in ["jamba", "mixtral", "qwen2_moe"]:
+ setattr(config, "router_aux_loss_coef", model_args.moe_aux_loss_coef)
+
+ elif model_type == "deepseek":
+ setattr(config, "aux_loss_alpha", model_args.moe_aux_loss_coef)
+
+ elif model_type == "jetmoe":
+ setattr(config, "aux_loss_coef", model_args.moe_aux_loss_coef)
+
+ if model_type in ["dbrx", "jamba", "jetmoe", "mixtral", "qwen2_moe"]:
+ setattr(config, "output_router_logits", is_trainable)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/packing.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/packing.py
new file mode 100644
index 0000000..ed0d6c4
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/packing.py
@@ -0,0 +1,117 @@
+# Copyright 2025 Musab Gultekin and the LlamaFactory team.
+#
+# This code is based on the Musab Gultekin's functionary library.
+# https://github.com/MeetKai/functionary/blob/main/functionary/train/packing/monkey_patch_packing.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# MIT License
+#
+# Copyright (c) 2023 Musab Gultekin
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from typing import TYPE_CHECKING
+
+import torch
+import torch.nn.functional as F
+
+from ...extras import logging
+
+
+if TYPE_CHECKING:
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def get_seqlens_in_batch(attention_mask: "torch.Tensor") -> "torch.Tensor":
+ r"""Get the sequnce lengths in the current batch.
+
+ e.g.
+ ```python
+ # input
+ [
+ [1, 1, 2, 2, 2, 0],
+ [1, 2, 2, 3, 3, 3],
+ ]
+ # output
+ [2, 3, 1, 2, 3]
+ ```
+ """
+ bsz = attention_mask.size(0)
+ dtype, device = attention_mask.dtype, attention_mask.device
+ max_num = torch.max(attention_mask).item()
+ counts: torch.Tensor = torch.zeros((bsz, max_num), dtype=dtype, device=device)
+ for i in range(max_num):
+ counts[:, i] = torch.sum(attention_mask == (i + 1), dim=-1)
+
+ counts = counts.flatten()
+ seqlens = counts[counts.nonzero().squeeze(dim=-1)]
+ return seqlens
+
+
+def get_unpad_data(attention_mask: "torch.Tensor") -> tuple["torch.Tensor", "torch.Tensor", int]:
+ r"""Prepare the indices and seqlens for flash attn varlen function.
+
+ Returns:
+ indices: indices of non-masked tokens from the flattened sequence.
+ cu_seqlens: the cumulative sequence lengths in the current batch, always starts from 0.
+ max_seqlen_in_batch: the largest seqlen in the current batch.
+
+ e.g.
+ ```python
+ # input
+ [
+ [1, 1, 2, 2, 2, 0],
+ [1, 2, 2, 3, 3, 3],
+ ]
+ # output
+ [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11]
+ [0, 2, 5, 6, 8, 11]
+ 3
+ ```
+
+ """
+ seqlens_in_batch = get_seqlens_in_batch(attention_mask)
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
+ max_seqlen_in_batch = seqlens_in_batch.max().item()
+ cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
+ return indices, cu_seqlens, max_seqlen_in_batch
+
+
+def configure_packing(model_args: "ModelArguments", is_trainable: bool) -> None:
+ if not is_trainable or not model_args.block_diag_attn:
+ return
+
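+ # Monkey-patch the unpad helper so FlashAttention's varlen kernels treat every packed
+ # sequence separately (block-diagonal attention).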
+ import transformers.modeling_flash_attention_utils
+
+ transformers.modeling_flash_attention_utils._get_unpad_data = get_unpad_data
+ logger.info_rank0("Using block diagonal attention for sequence packing without cross-attention.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/quantization.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/quantization.py
new file mode 100644
index 0000000..ffbf582
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/quantization.py
@@ -0,0 +1,183 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's Transformers and Optimum library.
+# https://github.com/huggingface/transformers/blob/v4.41.0/src/transformers/utils/quantization_config.py
+# https://github.com/huggingface/optimum/blob/v1.20.0/optimum/gptq/data.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import random
+from typing import TYPE_CHECKING, Any
+
+import torch
+from datasets import load_dataset
+from transformers import BitsAndBytesConfig, EetqConfig, GPTQConfig, HqqConfig
+from transformers.integrations import is_deepspeed_zero3_enabled
+from transformers.modeling_utils import is_fsdp_enabled
+
+from ...extras import logging
+from ...extras.constants import FILEEXT2TYPE, QuantizationMethod
+from ...extras.misc import check_version, get_current_device
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedTokenizer
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def _get_quantization_dataset(tokenizer: "PreTrainedTokenizer", model_args: "ModelArguments") -> list[dict[str, Any]]:
+ r"""Prepare the tokenized dataset to perform AutoGPTQ. Do not use tensor output for JSON serialization."""
+ if os.path.isfile(model_args.export_quantization_dataset):
+ data_path = FILEEXT2TYPE.get(model_args.export_quantization_dataset.split(".")[-1], None)
+ data_files = model_args.export_quantization_dataset
+ else:
+ data_path = model_args.export_quantization_dataset
+ data_files = None
+
+ dataset = load_dataset(
+ path=data_path,
+ data_files=data_files,
+ split="train",
+ cache_dir=model_args.cache_dir,
+ token=model_args.hf_hub_token,
+ )
+
+ samples = []
+ maxlen = model_args.export_quantization_maxlen
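+ # Randomly draw calibration samples longer than `maxlen` tokens and keep a random
+ # window of `maxlen` tokens from each.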
+ for _ in range(model_args.export_quantization_nsamples):
+ n_try = 0
+ while True:
+ if n_try > 100:
+ raise ValueError("Cannot find satisfying example, considering decrease `export_quantization_maxlen`.")
+
+ sample_idx = random.randint(0, len(dataset) - 1)
+ sample: dict[str, torch.Tensor] = tokenizer(dataset[sample_idx]["text"], return_tensors="pt")
+ n_try += 1
+ if sample["input_ids"].size(1) > maxlen:
+ break # TODO: fix large maxlen
+
+ word_idx = random.randint(0, sample["input_ids"].size(1) - maxlen - 1)
+ input_ids = sample["input_ids"][:, word_idx : word_idx + maxlen]
+ attention_mask = sample["attention_mask"][:, word_idx : word_idx + maxlen]
+ samples.append({"input_ids": input_ids.tolist(), "attention_mask": attention_mask.tolist()})
+
+ return samples
+
+
+def configure_quantization(
+ config: "PretrainedConfig",
+ tokenizer: "PreTrainedTokenizer",
+ model_args: "ModelArguments",
+ init_kwargs: dict[str, Any],
+) -> None:
+ r"""Priority: PTQ-quantized (train/infer) > AutoGPTQ (export) > On-the-fly quantization (train/infer)."""
+ if getattr(config, "quantization_config", None): # ptq
+ if model_args.quantization_bit is not None:
+ logger.warning_rank0("`quantization_bit` will not affect on the PTQ-quantized models.")
+
+ if is_deepspeed_zero3_enabled() or is_fsdp_enabled():
+ raise ValueError("DeepSpeed ZeRO-3 or FSDP is incompatible with PTQ-quantized models.")
+
+ quantization_config: dict[str, Any] = getattr(config, "quantization_config", None)
+ quant_method = quantization_config.get("quant_method", "")
+
+ if quant_method == QuantizationMethod.GPTQ:
+ check_version("auto_gptq>=0.5.0", mandatory=True)
+ quantization_config.pop("disable_exllama", None) # remove deprecated args
+ quantization_config["use_exllama"] = False # disable exllama
+
+ if quant_method == QuantizationMethod.AWQ:
+ check_version("autoawq", mandatory=True)
+
+ if quant_method == QuantizationMethod.AQLM:
+ check_version("aqlm>=1.1.0", mandatory=True)
+ quantization_config["bits"] = 2
+
+ quant_bits = quantization_config.get("bits", "?")
+ logger.info_rank0(f"Loading {quant_bits}-bit {quant_method.upper()}-quantized model.")
+
+ elif model_args.export_quantization_bit is not None: # auto-gptq
+ if model_args.export_quantization_bit not in [8, 4, 3, 2]:
+ raise ValueError("AutoGPTQ only accepts 2/3/4/8-bit quantization.")
+
+ check_version("optimum>=1.17.0", mandatory=True)
+ check_version("auto_gptq>=0.5.0", mandatory=True)
+ from accelerate.utils import get_max_memory
+
+ if getattr(config, "model_type", None) == "chatglm":
+ raise ValueError("ChatGLM model is not supported yet.")
+
+ init_kwargs["quantization_config"] = GPTQConfig(
+ bits=model_args.export_quantization_bit,
+ dataset=_get_quantization_dataset(tokenizer, model_args),
+ )
+ init_kwargs["device_map"] = "auto"
+ init_kwargs["max_memory"] = get_max_memory()
+ logger.info_rank0(f"Quantizing model to {model_args.export_quantization_bit} bit with AutoGPTQ.")
+
+ elif model_args.quantization_bit is not None: # on-the-fly
+ if model_args.quantization_method == QuantizationMethod.BNB:
+ if model_args.quantization_bit == 8:
+ check_version("bitsandbytes>=0.37.0", mandatory=True)
+ init_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
+ elif model_args.quantization_bit == 4:
+ check_version("bitsandbytes>=0.39.0", mandatory=True)
+ init_kwargs["quantization_config"] = BitsAndBytesConfig(
+ load_in_4bit=True,
+ bnb_4bit_compute_dtype=model_args.compute_dtype,
+ bnb_4bit_use_double_quant=model_args.double_quantization,
+ bnb_4bit_quant_type=model_args.quantization_type,
+ bnb_4bit_quant_storage=model_args.compute_dtype, # crucial for fsdp+qlora
+ )
+ else:
+ raise ValueError("Bitsandbytes only accepts 4-bit or 8-bit quantization.")
+
+ # Do not assign device map if:
+ # 1. deepspeed zero3 or fsdp (train)
+ # 2. auto quantization device map (inference)
+ if is_deepspeed_zero3_enabled() or is_fsdp_enabled() or model_args.quantization_device_map == "auto":
+ if model_args.quantization_bit != 4:
+ raise ValueError("Only 4-bit quantized model can use fsdp+qlora or auto device map.")
+
+ check_version("bitsandbytes>=0.43.0", mandatory=True)
+ else:
+ init_kwargs["device_map"] = {"": get_current_device()} # change auto device map for inference
+
+ logger.info_rank0(f"Quantizing model to {model_args.quantization_bit} bit with bitsandbytes.")
+ elif model_args.quantization_method == QuantizationMethod.HQQ:
+ if model_args.quantization_bit not in [8, 6, 5, 4, 3, 2, 1]:
+ raise ValueError("HQQ only accepts 1/2/3/4/5/6/8-bit quantization.")
+
+ if is_deepspeed_zero3_enabled() or is_fsdp_enabled():
+ raise ValueError("HQQ quantization is incompatible with DeepSpeed ZeRO-3 or FSDP.")
+
+ check_version("hqq", mandatory=True)
+ init_kwargs["quantization_config"] = HqqConfig(
+ nbits=model_args.quantization_bit, quant_zero=False, quant_scale=False, axis=0
+ ) # use ATEN kernel (axis=0) for performance
+ logger.info_rank0(f"Quantizing model to {model_args.quantization_bit} bit with HQQ.")
+ elif model_args.quantization_method == QuantizationMethod.EETQ:
+ if model_args.quantization_bit != 8:
+ raise ValueError("EETQ only accepts 8-bit quantization.")
+
+ if is_deepspeed_zero3_enabled() or is_fsdp_enabled():
+ raise ValueError("EETQ quantization is incompatible with DeepSpeed ZeRO-3 or FSDP.")
+
+ check_version("eetq", mandatory=True)
+ init_kwargs["quantization_config"] = EetqConfig()
+ logger.info_rank0(f"Quantizing model to {model_args.quantization_bit} bit with EETQ.")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/rope.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/rope.py
new file mode 100644
index 0000000..30d0fdd
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/rope.py
@@ -0,0 +1,71 @@
+# Copyright 2025 LMSYS and the LlamaFactory team.
+# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
+#
+# This code is inspired by the LMSYS's FastChat library.
+# https://github.com/lm-sys/FastChat/blob/v0.2.30/fastchat/train/train.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import TYPE_CHECKING
+
+from ...extras import logging
+from ...extras.constants import RopeScaling
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def configure_rope(config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool) -> None:
+ if model_args.rope_scaling is None:
+ return
+
+ if not hasattr(config, "rope_scaling"):
+ logger.warning_rank0("Current model does not support RoPE scaling.")
+ return
+
+ rope_kwargs = {"rope_type": getattr(model_args.rope_scaling, "value", model_args.rope_scaling)} # handle enum
+ if model_args.model_max_length is not None:
+ if is_trainable and model_args.rope_scaling == RopeScaling.DYNAMIC:
+ logger.warning_rank0(
+ "Dynamic NTK scaling may not work well with fine-tuning. "
+ "See: https://github.com/huggingface/transformers/pull/24653"
+ )
+
+ current_max_length = getattr(config, "max_position_embeddings", None)
+ if (not current_max_length) or model_args.model_max_length <= current_max_length:
+ logger.warning_rank0("Input length is smaller than max length. Disabling rope scaling.")
+ return
+
+ logger.info_rank0(f"Enlarge max model length from {current_max_length} to {model_args.model_max_length}.")
+ setattr(config, "max_position_embeddings", model_args.model_max_length)
+ rope_kwargs["factor"] = float(math.ceil(model_args.model_max_length / current_max_length))
+ if model_args.rope_scaling == RopeScaling.DYNAMIC:
+ rope_kwargs["original_max_position_embeddings"] = current_max_length
+ elif model_args.rope_scaling == RopeScaling.LLAMA3:
+ rope_kwargs["original_max_position_embeddings"] = current_max_length
+ rope_kwargs["low_freq_factor"] = 1.0
+ rope_kwargs["high_freq_factor"] = 4.0
+ else:
+ rope_kwargs["factor"] = 2.0
+
+ setattr(config, "rope_scaling", rope_kwargs)
+ logger.info_rank0(
+ f"Using {rope_kwargs['rope_type']} scaling strategy and setting scaling factor to {rope_kwargs['factor']}."
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py
new file mode 100644
index 0000000..8bb6aa6
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/unsloth.py
@@ -0,0 +1,96 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Any, Optional
+
+from ...extras import logging
+from ...extras.misc import get_current_device
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedModel
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def _get_unsloth_kwargs(
+ config: "PretrainedConfig", model_name_or_path: str, model_args: "ModelArguments"
+) -> dict[str, Any]:
+ return {
+ "model_name": model_name_or_path,
+ "max_seq_length": model_args.model_max_length or 4096,
+ "dtype": model_args.compute_dtype,
+ "load_in_4bit": model_args.quantization_bit == 4,
+ "token": model_args.hf_hub_token,
+ "device_map": {"": get_current_device()},
+ "rope_scaling": getattr(config, "rope_scaling", None),
+ "fix_tokenizer": False,
+ "trust_remote_code": model_args.trust_remote_code,
+ "use_gradient_checkpointing": "unsloth",
+ }
+
+
+def load_unsloth_pretrained_model(
+ config: "PretrainedConfig", model_args: "ModelArguments"
+) -> Optional["PreTrainedModel"]:
+ r"""Optionally load pretrained model with unsloth. Used in training."""
+ from unsloth import FastLanguageModel # type: ignore
+
+ unsloth_kwargs = _get_unsloth_kwargs(config, model_args.model_name_or_path, model_args)
+ try:
+ model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
+ except NotImplementedError:
+ logger.warning_rank0("Unsloth does not support model type {}.".format(getattr(config, "model_type", None)))
+ model = None
+ model_args.use_unsloth = False
+
+ return model
+
+
+def get_unsloth_peft_model(
+ model: "PreTrainedModel", model_args: "ModelArguments", peft_kwargs: dict[str, Any]
+) -> "PreTrainedModel":
+ r"""Get the peft model for the pretrained model with unsloth. Used in training."""
+ from unsloth import FastLanguageModel # type: ignore
+
+ unsloth_peft_kwargs = {
+ "model": model,
+ "max_seq_length": model_args.model_max_length,
+ "use_gradient_checkpointing": "unsloth",
+ }
+ return FastLanguageModel.get_peft_model(**peft_kwargs, **unsloth_peft_kwargs)
+
+
+def load_unsloth_peft_model(
+ config: "PretrainedConfig", model_args: "ModelArguments", is_trainable: bool
+) -> "PreTrainedModel":
+ r"""Load peft model with unsloth. Used in both training and inference."""
+ from unsloth import FastLanguageModel # type: ignore
+
+ unsloth_kwargs = _get_unsloth_kwargs(config, model_args.adapter_name_or_path[0], model_args)
+ try:
+ if not is_trainable:
+ unsloth_kwargs["use_gradient_checkpointing"] = False
+
+ model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
+ except NotImplementedError:
+ raise ValueError("Unsloth does not support model type {}.".format(getattr(config, "model_type", None)))
+
+ if not is_trainable:
+ FastLanguageModel.for_inference(model)
+
+ return model
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/valuehead.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/valuehead.py
new file mode 100644
index 0000000..137c6b7
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/valuehead.py
@@ -0,0 +1,72 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+import torch
+from transformers.utils import cached_file
+
+from ...extras import logging
+from ...extras.constants import V_HEAD_SAFE_WEIGHTS_NAME, V_HEAD_WEIGHTS_NAME
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel
+
+ from ...hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def load_valuehead_params(path_or_repo_id: str, model_args: "ModelArguments") -> dict[str, torch.Tensor]:
+ r"""Load value head parameters from Hugging Face Hub or local disk.
+
+ Returns: dict with keys `v_head.summary.weight` and `v_head.summary.bias`.
+ """
+ kwargs = {"path_or_repo_id": path_or_repo_id, "cache_dir": model_args.cache_dir, "token": model_args.hf_hub_token}
+ err_text = ""
+
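+ # Try the safetensors value-head weights first, then fall back to the legacy PyTorch .bin format.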
+ try:
+ from safetensors import safe_open
+
+ vhead_file = cached_file(filename=V_HEAD_SAFE_WEIGHTS_NAME, **kwargs)
+ with safe_open(vhead_file, framework="pt", device="cpu") as f:
+ return {key: f.get_tensor(key) for key in f.keys()}
+ except Exception as err:
+ err_text = str(err)
+
+ try:
+ vhead_file = cached_file(filename=V_HEAD_WEIGHTS_NAME, **kwargs)
+ return torch.load(vhead_file, map_location="cpu")
+ except Exception as err:
+ err_text = str(err)
+
+ logger.info_rank0(f"Provided path ({path_or_repo_id}) does not contain value head weights: {err_text}.")
+ logger.info_rank0("Ignore the above message if you are not resuming the training of a value head model.")
+ return None
+
+
+def prepare_valuehead_model(model: "PreTrainedModel") -> None:
+ if getattr(model.config, "model_type", None) == "llava":
+ setattr(model, "lm_head", model.language_model.get_output_embeddings())
+ setattr(model, "_keys_to_ignore_on_save", ["lm_head.weight"])
+
+ if getattr(model.config, "model_type", None) == "chatglm":
+ setattr(model, "lm_head", model.transformer.output_layer)
+ setattr(model, "_keys_to_ignore_on_save", ["lm_head.weight"])
+
+ if getattr(model.config, "model_type", None) == "internlm2":
+ setattr(model, "lm_head", model.output)
+ setattr(model, "_keys_to_ignore_on_save", ["lm_head.weight"])
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/visual.py b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/visual.py
new file mode 100644
index 0000000..901010c
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/model_utils/visual.py
@@ -0,0 +1,295 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's Transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/llava/modeling_llava.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Optional
+
+import torch
+import transformers
+import transformers.models
+from transformers.activations import ACT2FN
+
+from ...extras import logging
+
+
+if TYPE_CHECKING:
+ from transformers import LlavaConfig, PretrainedConfig, PreTrainedModel
+
+ from ...hparams import FinetuningArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+transformers_logger = transformers.utils.logging.get_logger(__name__)
+
+
+@dataclass
+class CompositeModel:
+ model_type: str
+ projector_key: str
+ vision_model_keys: list[str]
+ language_model_keys: list[str]
+ lora_conflict_keys: list[str]
+
+ def get_projector(self, module: "torch.nn.Module") -> "torch.nn.Module":
+ # resolve dotted keys such as "visual.merger" via nested attribute lookup
+ for key in self.projector_key.split("."):
+ module = getattr(module, key)
+
+ return module
+
+
+COMPOSITE_MODELS: dict[str, "CompositeModel"] = {}
+
+
+def _register_composite_model(
+ model_type: str,
+ projector_key: Optional[str] = None,
+ vision_model_keys: Optional[list[str]] = None,
+ language_model_keys: Optional[list[str]] = None,
+ lora_conflict_keys: Optional[list[str]] = None,
+):
+ r"""Register a new composite model.
+
+ Args:
+ model_type: model type
+ projector_key: multi_modal_projector
+ vision_model_keys: vision_tower
+ language_model_keys: language_model
+ lora_conflict_keys: None
+
+ """
+ COMPOSITE_MODELS[model_type] = CompositeModel(
+ model_type=model_type,
+ projector_key=projector_key or "multi_modal_projector",
+ vision_model_keys=vision_model_keys or ["vision_tower"],
+ language_model_keys=language_model_keys or ["language_model"],
+ lora_conflict_keys=lora_conflict_keys or [],
+ )
+
+
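+# Yi-VL uses a deeper projector than the stock Llava one (which, to the best of our
+# knowledge, is Linear -> activation -> Linear): here each Linear is followed by a
+# LayerNorm, with the activation between the two Linear/LayerNorm pairs.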
+class LlavaMultiModalProjectorForYiVL(torch.nn.Module):
+ def __init__(self, config: "LlavaConfig") -> None:
+ super().__init__()
+
+ self.config = config
+ if config is None:
+ return
+
+ self.linear_1 = torch.nn.Linear(config.vision_config.hidden_size, config.text_config.hidden_size, bias=True)
+ self.linear_2 = torch.nn.LayerNorm(config.text_config.hidden_size, bias=True)
+ self.linear_3 = torch.nn.Linear(config.text_config.hidden_size, config.text_config.hidden_size, bias=True)
+ self.linear_4 = torch.nn.LayerNorm(config.text_config.hidden_size, bias=True)
+ self.act = ACT2FN[config.projector_hidden_act]
+
+ def forward(self, image_features: "torch.Tensor") -> "torch.Tensor":
+ hidden_states = self.linear_1(image_features)
+ hidden_states = self.linear_2(hidden_states)
+ hidden_states = self.act(hidden_states)
+ hidden_states = self.linear_3(hidden_states)
+ hidden_states = self.linear_4(hidden_states)
+ if hidden_states.dtype == torch.float32:
+ if torch.is_autocast_enabled():
+ target_dtype = torch.get_autocast_gpu_dtype()
+ elif hasattr(self.config, "_pre_quantization_dtype"):
+ target_dtype = self.config._pre_quantization_dtype
+ else:
+ target_dtype = self.linear_1.weight.dtype
+
+ transformers_logger.warning_once("The hidden states seem to have been silently cast to float32.")
+ hidden_states = hidden_states.to(target_dtype)
+
+ return hidden_states
+
+
+class LlavaMultiModalProjectorForYiVLForVLLM(LlavaMultiModalProjectorForYiVL):
+ def __init__(self, vision_hidden_size: int, text_hidden_size: int, projector_hidden_act: str) -> None:
+ super().__init__(config=None)
+
+ self.linear_1 = torch.nn.Linear(vision_hidden_size, text_hidden_size, bias=True)
+ self.linear_2 = torch.nn.LayerNorm(text_hidden_size, bias=True)
+ self.linear_3 = torch.nn.Linear(text_hidden_size, text_hidden_size, bias=True)
+ self.linear_4 = torch.nn.LayerNorm(text_hidden_size, bias=True)
+ self.act = ACT2FN[projector_hidden_act]
+
+
+def autocast_projector_dtype(model: "PreTrainedModel", model_args: "ModelArguments") -> None:
+ r"""Cast projector output to half precision for fine-tuning quantized VLMs."""
+
+ def _mm_projector_forward_post_hook(
+ module: "torch.nn.Module", args: tuple["torch.Tensor"], output: "torch.Tensor"
+ ) -> "torch.Tensor":
+ return output.to(model_args.compute_dtype)
+
+ if getattr(model, "quantization_method", None):
+ model_type = getattr(model.config, "model_type", None)
+ if model_type in COMPOSITE_MODELS:
+ mm_projector = COMPOSITE_MODELS[model_type].get_projector(model)
+ else:
+ return
+
+ logger.info_rank0(f"Casting multimodal projector outputs in {model_args.compute_dtype}.")
+ mm_projector.register_forward_hook(_mm_projector_forward_post_hook)
+
+
+def configure_visual_model(config: "PretrainedConfig") -> None:
+ r"""Patch VLMs before loading them."""
+ if getattr(config, "text_config", None) and not getattr(config, "hidden_size", None):
+ # required for ds zero3 and valuehead models
+ setattr(config, "hidden_size", getattr(config.text_config, "hidden_size", None))
+
+ if getattr(config, "is_yi_vl_derived_model", None):
+ logger.info_rank0("Detected Yi-VL model, applying projector patch.")
+ transformers.models.llava.modeling_llava.LlavaMultiModalProjector = LlavaMultiModalProjectorForYiVL
+
+
+def get_forbidden_modules(config: "PretrainedConfig", finetuning_args: "FinetuningArguments") -> set[str]:
+ r"""Freeze vision tower and language model for VLM full/freeze tuning."""
+ model_type = getattr(config, "model_type", None)
+ forbidden_modules = set()
+ if model_type in COMPOSITE_MODELS:
+ if finetuning_args.freeze_vision_tower:
+ vision_model_keys = COMPOSITE_MODELS[model_type].vision_model_keys
+ logger.info_rank0(f"Set vision model not trainable: {vision_model_keys}.")
+ forbidden_modules.update(vision_model_keys)
+
+ if finetuning_args.freeze_multi_modal_projector:
+ projector_key = COMPOSITE_MODELS[model_type].projector_key
+ logger.info_rank0(f"Set multi model projector not trainable: {projector_key}.")
+ forbidden_modules.add(projector_key)
+
+ if finetuning_args.freeze_language_model:
+ language_model_keys = COMPOSITE_MODELS[model_type].language_model_keys
+ logger.info_rank0(f"Set language model not trainable: {language_model_keys}.")
+ forbidden_modules.update(language_model_keys)
+
+ return forbidden_modules
+
+
+def patch_target_modules(
+ model: "PreTrainedModel", finetuning_args: "FinetuningArguments", target_modules: list[str]
+) -> list[str]:
+ r"""Freeze vision tower for VLM LoRA tuning."""
+ model_type = getattr(model.config, "model_type", None)
+ if model_type in COMPOSITE_MODELS:
+ forbidden_modules = get_forbidden_modules(model.config, finetuning_args)
+ forbidden_modules.update(COMPOSITE_MODELS[model_type].lora_conflict_keys)
+ module_names = []
+ for name, _ in model.named_modules():
+ if any(target_module in name for target_module in target_modules) and not any(
+ forbidden_module in name for forbidden_module in forbidden_modules
+ ):
+ module_names.append(name)
+
+ return module_names
+ else:
+ return target_modules
+
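+# Illustrative example of patch_target_modules() above (a sketch, assuming a qwen2_vl-style
+# layout and freeze_vision_tower=True): with target_modules=["q_proj", "v_proj"], a module
+# named "model.layers.0.self_attn.q_proj" is kept, while "visual.blocks.0.attn.q_proj" is
+# dropped because "visual.blocks" is forbidden and "patch_embed" is a LoRA-conflict key.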
+
+_register_composite_model(
+ model_type="internvl",
+)
+
+
+_register_composite_model(
+ model_type="gemma3",
+)
+
+
+_register_composite_model(
+ model_type="llama4",
+ vision_model_keys=["vision_model"],
+)
+
+
+_register_composite_model(
+ model_type="llava",
+)
+
+
+_register_composite_model(
+ model_type="llava_next",
+)
+
+
+_register_composite_model(
+ model_type="llava_next_video",
+)
+
+
+_register_composite_model(
+ model_type="minicpmv",
+ projector_key="resampler",
+ vision_model_keys=["vpm"],
+ language_model_keys=["llm"],
+)
+
+
+_register_composite_model(
+ model_type="minicpmo",
+ projector_key="resampler",
+ vision_model_keys=["vpm", "apm", "audio_avg_pooler", "audio_projection_layer", "tts"],
+ language_model_keys=["llm"],
+ lora_conflict_keys=["audio_projection_layer"],
+)
+
+
+_register_composite_model(
+ model_type="paligemma",
+)
+
+
+_register_composite_model(
+ model_type="video_llava",
+)
+
+
+_register_composite_model(
+ model_type="mllama",
+ vision_model_keys=["vision_model"],
+)
+
+
+_register_composite_model(
+ model_type="qwen2_audio",
+ vision_model_keys=["audio_tower"],
+)
+
+
+_register_composite_model(
+ model_type="qwen2_5_omni_thinker",
+ projector_key="visual.merger",
+ vision_model_keys=["visual.patch_embed", "visual.blocks", "audio_tower"],
+ language_model_keys=["model", "lm_head"],
+ lora_conflict_keys=["patch_embed"],
+)
+
+
+_register_composite_model(
+ model_type="qwen2_vl",
+ projector_key="visual.merger",
+ vision_model_keys=["visual.patch_embed", "visual.blocks"],
+ language_model_keys=["model", "lm_head"],
+ lora_conflict_keys=["patch_embed"],
+)
+
+
+_register_composite_model(
+ model_type="qwen2_5_vl",
+ projector_key="visual.merger",
+ vision_model_keys=["visual.patch_embed", "visual.blocks"],
+ language_model_keys=["model", "lm_head"],
+ lora_conflict_keys=["patch_embed"],
+)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/model/patcher.py b/post-training/LLaMA-Factory/src/llamafactory/model/patcher.py
new file mode 100644
index 0000000..28014de
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/model/patcher.py
@@ -0,0 +1,203 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from types import MethodType
+from typing import TYPE_CHECKING, Any
+
+import torch
+from peft import PeftModel
+from transformers import PreTrainedModel, PreTrainedTokenizerBase
+from transformers.integrations import is_deepspeed_zero3_enabled
+from transformers.modeling_utils import is_fsdp_enabled
+
+from ..extras import logging
+from ..extras.misc import infer_optim_dtype
+from ..extras.packages import is_transformers_version_greater_than
+from .model_utils.attention import configure_attn_implementation, print_attn_implementation
+from .model_utils.checkpointing import prepare_model_for_training
+from .model_utils.embedding import resize_embedding_layer
+from .model_utils.kv_cache import configure_kv_cache
+from .model_utils.longlora import configure_longlora
+from .model_utils.moe import add_z3_leaf_module, configure_moe
+from .model_utils.packing import configure_packing
+from .model_utils.quantization import configure_quantization
+from .model_utils.rope import configure_rope
+from .model_utils.valuehead import prepare_valuehead_model
+from .model_utils.visual import autocast_projector_dtype, configure_visual_model
+
+
+if TYPE_CHECKING:
+ from transformers import PretrainedConfig, PreTrainedTokenizer, ProcessorMixin
+ from trl import AutoModelForCausalLMWithValueHead
+
+ from ..hparams import ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def patch_tokenizer(tokenizer: "PreTrainedTokenizer", model_args: "ModelArguments") -> None:
+ if "PreTrainedTokenizerBase" not in str(tokenizer._pad.__func__):
+ tokenizer._pad = MethodType(PreTrainedTokenizerBase._pad, tokenizer)
+
+ if model_args.model_max_length is not None and tokenizer.model_max_length < model_args.model_max_length:
+ tokenizer.model_max_length = model_args.model_max_length # enlarge the tokenizer max length
+
+ if model_args.new_special_tokens is not None:
+ num_added_tokens = tokenizer.add_special_tokens(
+ dict(additional_special_tokens=model_args.new_special_tokens),
+ replace_additional_special_tokens=False,
+ )
+ logger.info_rank0("Add {} to special tokens.".format(",".join(model_args.new_special_tokens)))
+ if num_added_tokens > 0 and not model_args.resize_vocab:
+ model_args.resize_vocab = True
+ logger.warning_rank0("New tokens have been added, changed `resize_vocab` to True.")
+
+
+def patch_processor(
+ processor: "ProcessorMixin",
+ tokenizer: "PreTrainedTokenizer",
+ model_args: "ModelArguments",
+) -> None:
+ setattr(processor, "tokenizer", tokenizer)
+ setattr(processor, "image_max_pixels", model_args.image_max_pixels)
+ setattr(processor, "image_min_pixels", model_args.image_min_pixels)
+ setattr(processor, "image_do_pan_and_scan", model_args.image_do_pan_and_scan)
+ setattr(processor, "video_max_pixels", model_args.video_max_pixels)
+ setattr(processor, "video_min_pixels", model_args.video_min_pixels)
+ setattr(processor, "video_fps", model_args.video_fps)
+ setattr(processor, "video_maxlen", model_args.video_maxlen)
+ setattr(processor, "audio_sampling_rate", model_args.audio_sampling_rate)
+ setattr(processor, "use_audio_in_video", model_args.use_audio_in_video)
+
+
+def patch_config(
+ config: "PretrainedConfig",
+ tokenizer: "PreTrainedTokenizer",
+ model_args: "ModelArguments",
+ init_kwargs: dict[str, Any],
+ is_trainable: bool,
+) -> None:
+ if model_args.compute_dtype is None: # priority: bf16 > fp16 > fp32
+ if model_args.infer_dtype != "auto" and not is_trainable:
+ model_args.compute_dtype = getattr(torch, model_args.infer_dtype)
+ else:
+ model_args.compute_dtype = infer_optim_dtype(model_dtype=getattr(config, "torch_dtype", None))
+
+ configure_attn_implementation(config, model_args, is_trainable)
+ configure_rope(config, model_args, is_trainable)
+ configure_longlora(config, model_args, is_trainable)
+ configure_quantization(config, tokenizer, model_args, init_kwargs)
+ configure_moe(config, model_args, is_trainable)
+ configure_visual_model(config)
+ configure_packing(model_args, is_trainable)
+ configure_kv_cache(config, model_args, is_trainable)
+
+ if getattr(config, "model_type", None) == "qwen":
+ setattr(config, "use_flash_attn", model_args.flash_attn == "fa2")
+ for dtype_name, dtype in [("fp16", torch.float16), ("bf16", torch.bfloat16), ("fp32", torch.float32)]:
+ setattr(config, dtype_name, model_args.compute_dtype == dtype)
+
+ if getattr(config, "model_type", None) == "minicpmo":
+ setattr(config, "init_audio", True)
+ setattr(config, "init_tts", False)
+
+ # replace the top-k gating method
+ if getattr(config, "model_type", None) == "kimi_vl" and is_trainable:
+ setattr(config.text_config, "topk_method", "greedy")
+
+ if "LlavaLlamaForCausalLM" in getattr(config, "architectures", []):
+ raise ValueError("Please download llava models with hf-compatible format: https://huggingface.co/llava-hf")
+
+ if getattr(config, "model_type", None) == "internlm3" and not is_transformers_version_greater_than("4.47.1"):
+ raise RuntimeError("InternLM3 model requires transformers>=4.47.1, please upgrade it.")
+
+ # deepspeed zero3 is not compatible with low_cpu_mem_usage
+ init_kwargs["low_cpu_mem_usage"] = model_args.low_cpu_mem_usage and (not is_deepspeed_zero3_enabled())
+
+ # do not cast the model's data type when using DeepSpeed ZeRO-3 without QLoRA
+ if not (is_deepspeed_zero3_enabled() and model_args.quantization_bit is None):
+ init_kwargs["torch_dtype"] = model_args.compute_dtype
+
+ if init_kwargs["low_cpu_mem_usage"] and not is_fsdp_enabled(): # fsdp does not need device map
+ if "device_map" not in init_kwargs and model_args.device_map:
+ init_kwargs["device_map"] = model_args.device_map # device map requires low_cpu_mem_usage=True
+
+ if init_kwargs.get("device_map", None) == "auto":
+ init_kwargs["offload_folder"] = model_args.offload_folder
+
+
+def patch_model(
+ model: "PreTrainedModel",
+ tokenizer: "PreTrainedTokenizer",
+ model_args: "ModelArguments",
+ is_trainable: bool,
+ add_valuehead: bool,
+) -> None:
+ gen_config = model.generation_config # check and fix generation config
+ if not gen_config.do_sample and (
+ (gen_config.temperature is not None and gen_config.temperature != 1.0)
+ or (gen_config.top_p is not None and gen_config.top_p != 1.0)
+ or (gen_config.typical_p is not None and gen_config.typical_p != 1.0)
+ ):
+ gen_config.do_sample = True
+
+ if getattr(model.config, "model_type", None) not in ["minicpmv", "minicpmo"] and "GenerationMixin" not in str(
+ model.generate.__func__
+ ):
+ model.generate = MethodType(PreTrainedModel.generate, model)
+
+ if add_valuehead:
+ prepare_valuehead_model(model)
+
+ if model_args.resize_vocab:
+ resize_embedding_layer(model, tokenizer)
+
+ if is_trainable:
+ prepare_model_for_training(model, model_args)
+ autocast_projector_dtype(model, model_args)
+ add_z3_leaf_module(model)
+
+ if not model_args.use_unsloth:
+ print_attn_implementation(model.config)
+
+ try:
+ model.add_model_tags(["llama-factory"])
+ except Exception:
+ logger.warning_rank0("Cannot properly tag the model.")
+
+
+def patch_valuehead_model(model: "AutoModelForCausalLMWithValueHead") -> None:
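+ # The TRL value-head wrapper does not expose these methods itself, so thin delegating
+ # implementations are bound onto the instance (our reading of why this patch exists).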
+ def tie_weights(self: "AutoModelForCausalLMWithValueHead") -> None:
+ if isinstance(self.pretrained_model, PreTrainedModel):
+ self.pretrained_model.tie_weights()
+
+ def get_input_embeddings(self: "AutoModelForCausalLMWithValueHead") -> torch.nn.Module:
+ if isinstance(self.pretrained_model, PreTrainedModel):
+ return self.pretrained_model.get_input_embeddings()
+
+ def get_output_embeddings(self: "AutoModelForCausalLMWithValueHead") -> torch.nn.Module:
+ if isinstance(self.pretrained_model, PreTrainedModel):
+ return self.pretrained_model.get_output_embeddings()
+
+ def create_or_update_model_card(self: "AutoModelForCausalLMWithValueHead", output_dir: str) -> None:
+ if isinstance(self.pretrained_model, PeftModel):
+ self.pretrained_model.create_or_update_model_card(output_dir)
+
+ ignore_modules = [name for name, _ in model.named_parameters() if "pretrained_model" in name]
+ setattr(model, "_keys_to_ignore_on_save", ignore_modules)
+ setattr(model, "tie_weights", MethodType(tie_weights, model))
+ setattr(model, "get_input_embeddings", MethodType(get_input_embeddings, model))
+ setattr(model, "get_output_embeddings", MethodType(get_output_embeddings, model))
+ setattr(model, "create_or_update_model_card", MethodType(create_or_update_model_card, model))
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/callbacks.py b/post-training/LLaMA-Factory/src/llamafactory/train/callbacks.py
new file mode 100644
index 0000000..c1bd159
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/callbacks.py
@@ -0,0 +1,385 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+import signal
+import sys
+import time
+from concurrent.futures import ThreadPoolExecutor
+from datetime import timedelta
+from typing import TYPE_CHECKING, Any, Optional
+
+import torch
+import transformers
+from peft import PeftModel
+from transformers import PreTrainedModel, ProcessorMixin, TrainerCallback
+from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR, has_length
+from transformers.utils import (
+ SAFE_WEIGHTS_NAME,
+ WEIGHTS_NAME,
+ is_safetensors_available,
+)
+from typing_extensions import override
+
+from ..extras import logging
+from ..extras.constants import TRAINER_LOG, V_HEAD_SAFE_WEIGHTS_NAME, V_HEAD_WEIGHTS_NAME
+from ..extras.misc import get_peak_memory, is_env_enabled, use_ray
+
+
+if is_safetensors_available():
+ from safetensors import safe_open
+ from safetensors.torch import save_file
+
+
+if TYPE_CHECKING:
+ from transformers import TrainerControl, TrainerState, TrainingArguments
+ from trl import AutoModelForCausalLMWithValueHead
+
+ from ..hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+def fix_valuehead_checkpoint(
+ model: "AutoModelForCausalLMWithValueHead", output_dir: str, safe_serialization: bool
+) -> None:
+ r"""Fix the valuehead checkpoint files.
+
+ The model is already unwrapped.
+
+ There are three cases:
+ 1. full tuning without ds_zero3: state_dict = {"model.layers.*": ..., "v_head.summary.*": ...}
+ 2. lora tuning without ds_zero3: state_dict = {"v_head.summary.*": ...}
+ 3. under deepspeed zero3: state_dict = {"pretrained_model.model.layers.*": ..., "v_head.summary.*": ...}
+
+ We assume `stage3_gather_16bit_weights_on_model_save=true`.
+ """
+ if not isinstance(model.pretrained_model, (PreTrainedModel, PeftModel)):
+ return
+
+ if safe_serialization:
+ path_to_checkpoint = os.path.join(output_dir, SAFE_WEIGHTS_NAME)
+ with safe_open(path_to_checkpoint, framework="pt", device="cpu") as f:
+ state_dict: dict[str, torch.Tensor] = {key: f.get_tensor(key) for key in f.keys()}
+ else:
+ path_to_checkpoint = os.path.join(output_dir, WEIGHTS_NAME)
+ state_dict: dict[str, torch.Tensor] = torch.load(path_to_checkpoint, map_location="cpu")
+
+ os.remove(path_to_checkpoint)
+ decoder_state_dict, v_head_state_dict = {}, {}
+ for name, param in state_dict.items():
+ if name.startswith("v_head."):
+ v_head_state_dict[name] = param
+ else:
+ decoder_state_dict[name.replace("pretrained_model.", "", 1)] = param
+
+ model.pretrained_model.save_pretrained(
+ output_dir, state_dict=decoder_state_dict or None, safe_serialization=safe_serialization
+ )
+
+ if safe_serialization:
+ save_file(v_head_state_dict, os.path.join(output_dir, V_HEAD_SAFE_WEIGHTS_NAME), metadata={"format": "pt"})
+ else:
+ torch.save(v_head_state_dict, os.path.join(output_dir, V_HEAD_WEIGHTS_NAME))
+
+ logger.info_rank0(f"Value head model saved at: {output_dir}")
+
+
+class FixValueHeadModelCallback(TrainerCallback):
+ r"""A callback for fixing the checkpoint for valuehead models."""
+
+ @override
+ def on_save(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ output_dir = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
+ fix_valuehead_checkpoint(
+ model=kwargs.pop("model"), output_dir=output_dir, safe_serialization=args.save_safetensors
+ )
+
+
+class SaveProcessorCallback(TrainerCallback):
+ r"""A callback for saving the processor."""
+
+ def __init__(self, processor: "ProcessorMixin") -> None:
+ self.processor = processor
+
+ @override
+ def on_save(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ output_dir = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
+ self.processor.save_pretrained(output_dir)
+
+ @override
+ def on_train_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ self.processor.save_pretrained(args.output_dir)
+
+
+class PissaConvertCallback(TrainerCallback):
+ r"""A callback for converting the PiSSA adapter to a normal one."""
+
+ @override
+ def on_train_begin(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ model = kwargs.pop("model")
+ pissa_init_dir = os.path.join(args.output_dir, "pissa_init")
+ logger.info_rank0(f"Initial PiSSA adapter will be saved at: {pissa_init_dir}.")
+ if isinstance(model, PeftModel):
+ init_lora_weights = getattr(model.peft_config["default"], "init_lora_weights")
+ setattr(model.peft_config["default"], "init_lora_weights", True)
+ model.save_pretrained(pissa_init_dir, safe_serialization=args.save_safetensors)
+ setattr(model.peft_config["default"], "init_lora_weights", init_lora_weights)
+
+ @override
+ def on_train_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ model = kwargs.pop("model")
+ pissa_init_dir = os.path.join(args.output_dir, "pissa_init")
+ pissa_backup_dir = os.path.join(args.output_dir, "pissa_backup")
+ pissa_convert_dir = os.path.join(args.output_dir, "pissa_converted")
+ logger.info_rank0(f"Converted PiSSA adapter will be saved at: {pissa_convert_dir}.")
+ # 1. save a pissa backup with init_lora_weights: True
+ # 2. save a converted lora with init_lora_weights: pissa
+ # 3. load the pissa backup with init_lora_weights: True
+ # 4. delete the initial adapter and change init_lora_weights to pissa
+ if isinstance(model, PeftModel):
+ init_lora_weights = getattr(model.peft_config["default"], "init_lora_weights")
+ setattr(model.peft_config["default"], "init_lora_weights", True)
+ model.save_pretrained(pissa_backup_dir, safe_serialization=args.save_safetensors)
+ setattr(model.peft_config["default"], "init_lora_weights", init_lora_weights)
+ model.save_pretrained(
+ pissa_convert_dir,
+ safe_serialization=args.save_safetensors,
+ path_initial_model_for_weight_conversion=pissa_init_dir,
+ )
+ model.load_adapter(pissa_backup_dir, "default", is_trainable=True)
+ model.set_adapter("default")
+ setattr(model.peft_config["default"], "init_lora_weights", init_lora_weights)
+
+
+class LogCallback(TrainerCallback):
+ r"""A callback for logging training and evaluation status."""
+
+ def __init__(self) -> None:
+ # Progress
+ self.start_time = 0
+ self.cur_steps = 0
+ self.max_steps = 0
+ self.elapsed_time = ""
+ self.remaining_time = ""
+ self.thread_pool: Optional[ThreadPoolExecutor] = None
+ # Status
+ self.aborted = False
+ self.do_train = False
+ # Web UI
+ self.webui_mode = is_env_enabled("LLAMABOARD_ENABLED")
+ if self.webui_mode and not use_ray():
+ signal.signal(signal.SIGABRT, self._set_abort)
+ self.logger_handler = logging.LoggerHandler(os.getenv("LLAMABOARD_WORKDIR"))
+ logging.add_handler(self.logger_handler)
+ transformers.logging.add_handler(self.logger_handler)
+
+ def _set_abort(self, signum, frame) -> None:
+ self.aborted = True
+
+ def _reset(self, max_steps: int = 0) -> None:
+ self.start_time = time.time()
+ self.cur_steps = 0
+ self.max_steps = max_steps
+ self.elapsed_time = ""
+ self.remaining_time = ""
+
+ def _timing(self, cur_steps: int) -> None:
+ cur_time = time.time()
+ elapsed_time = cur_time - self.start_time
+ avg_time_per_step = elapsed_time / cur_steps if cur_steps != 0 else 0
+ remaining_time = (self.max_steps - cur_steps) * avg_time_per_step
+ self.cur_steps = cur_steps
+ self.elapsed_time = str(timedelta(seconds=int(elapsed_time)))
+ self.remaining_time = str(timedelta(seconds=int(remaining_time)))
+
+ def _write_log(self, output_dir: str, logs: dict[str, Any]) -> None:
+ with open(os.path.join(output_dir, TRAINER_LOG), "a", encoding="utf-8") as f:
+ f.write(json.dumps(logs) + "\n")
+
+ def _create_thread_pool(self, output_dir: str) -> None:
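+ # a single worker serializes writes to the trainer log file without blocking the training loop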
+ os.makedirs(output_dir, exist_ok=True)
+ self.thread_pool = ThreadPoolExecutor(max_workers=1)
+
+ def _close_thread_pool(self) -> None:
+ if self.thread_pool is not None:
+ self.thread_pool.shutdown(wait=True)
+ self.thread_pool = None
+
+ @override
+ def on_init_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if (
+ args.should_save
+ and os.path.exists(os.path.join(args.output_dir, TRAINER_LOG))
+ and args.overwrite_output_dir
+ ):
+ logger.warning_rank0_once("Previous trainer log in this folder will be deleted.")
+ os.remove(os.path.join(args.output_dir, TRAINER_LOG))
+
+ @override
+ def on_train_begin(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if args.should_save:
+ self.do_train = True
+ self._reset(max_steps=state.max_steps)
+ self._create_thread_pool(output_dir=args.output_dir)
+
+ @override
+ def on_train_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ self._close_thread_pool()
+
+ @override
+ def on_substep_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if self.aborted:
+ control.should_epoch_stop = True
+ control.should_training_stop = True
+
+ @override
+ def on_step_end(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if self.aborted:
+ control.should_epoch_stop = True
+ control.should_training_stop = True
+
+ @override
+ def on_evaluate(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if not self.do_train:
+ self._close_thread_pool()
+
+ @override
+ def on_predict(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if not self.do_train:
+ self._close_thread_pool()
+
+ @override
+ def on_log(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if not args.should_save:
+ return
+
+ self._timing(cur_steps=state.global_step)
+ logs = dict(
+ current_steps=self.cur_steps,
+ total_steps=self.max_steps,
+ loss=state.log_history[-1].get("loss"),
+ eval_loss=state.log_history[-1].get("eval_loss"),
+ predict_loss=state.log_history[-1].get("predict_loss"),
+ reward=state.log_history[-1].get("reward"),
+ accuracy=state.log_history[-1].get("rewards/accuracies"),
+ lr=state.log_history[-1].get("learning_rate"),
+ epoch=state.log_history[-1].get("epoch"),
+ percentage=round(self.cur_steps / self.max_steps * 100, 2) if self.max_steps != 0 else 100,
+ elapsed_time=self.elapsed_time,
+ remaining_time=self.remaining_time,
+ )
+ if state.num_input_tokens_seen:
+ logs["throughput"] = round(state.num_input_tokens_seen / (time.time() - self.start_time), 2)
+ logs["total_tokens"] = state.num_input_tokens_seen
+
+ if is_env_enabled("RECORD_VRAM"):
+ vram_allocated, vram_reserved = get_peak_memory()
+ logs["vram_allocated"] = round(vram_allocated / (1024**3), 2)
+ logs["vram_reserved"] = round(vram_reserved / (1024**3), 2)
+
+ logs = {k: v for k, v in logs.items() if v is not None}
+ if self.webui_mode and all(key in logs for key in ("loss", "lr", "epoch")):
+ log_str = f"'loss': {logs['loss']:.4f}, 'learning_rate': {logs['lr']:2.4e}, 'epoch': {logs['epoch']:.2f}"
+ for extra_key in ("reward", "accuracy", "throughput"):
+ if logs.get(extra_key):
+ log_str += f", '{extra_key}': {logs[extra_key]:.2f}"
+
+ logger.info_rank0("{" + log_str + "}")
+
+ if self.thread_pool is not None:
+ self.thread_pool.submit(self._write_log, args.output_dir, logs)
+
+ @override
+ def on_prediction_step(
+ self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs
+ ):
+ if self.do_train:
+ return
+
+ if self.aborted:
+ sys.exit(0)
+
+ if not args.should_save:
+ return
+
+ eval_dataloader = kwargs.pop("eval_dataloader", None)
+ if has_length(eval_dataloader):
+ if self.max_steps == 0:
+ self._reset(max_steps=len(eval_dataloader))
+ self._create_thread_pool(output_dir=args.output_dir)
+
+ self._timing(cur_steps=self.cur_steps + 1)
+ if self.cur_steps % 5 == 0 and self.thread_pool is not None:
+ logs = dict(
+ current_steps=self.cur_steps,
+ total_steps=self.max_steps,
+ percentage=round(self.cur_steps / self.max_steps * 100, 2) if self.max_steps != 0 else 100,
+ elapsed_time=self.elapsed_time,
+ remaining_time=self.remaining_time,
+ )
+ self.thread_pool.submit(self._write_log, args.output_dir, logs)
+
+
+class ReporterCallback(TrainerCallback):
+ r"""A callback for reporting training status to external logger."""
+
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ ) -> None:
+ self.model_args = model_args
+ self.data_args = data_args
+ self.finetuning_args = finetuning_args
+ self.generating_args = generating_args
+ os.environ["WANDB_PROJECT"] = os.getenv("WANDB_PROJECT", "llamafactory")
+
+ @override
+ def on_train_begin(self, args: "TrainingArguments", state: "TrainerState", control: "TrainerControl", **kwargs):
+ if not state.is_world_process_zero:
+ return
+
+ if "wandb" in args.report_to:
+ import wandb
+
+ wandb.config.update(
+ {
+ "model_args": self.model_args.to_dict(),
+ "data_args": self.data_args.to_dict(),
+ "finetuning_args": self.finetuning_args.to_dict(),
+ "generating_args": self.generating_args.to_dict(),
+ }
+ )
+
+ if self.finetuning_args.use_swanlab:
+ import swanlab # type: ignore
+
+ swanlab.config.update(
+ {
+ "model_args": self.model_args.to_dict(),
+ "data_args": self.data_args.to_dict(),
+ "finetuning_args": self.finetuning_args.to_dict(),
+ "generating_args": self.generating_args.to_dict(),
+ }
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/dpo/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/__init__.py
new file mode 100644
index 0000000..73c1a4a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_dpo
+
+
+__all__ = ["run_dpo"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py
new file mode 100644
index 0000000..80f67c6
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py
@@ -0,0 +1,297 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/trl/trainer/dpo_trainer.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import warnings
+from collections import defaultdict
+from contextlib import nullcontext
+from types import MethodType
+from typing import TYPE_CHECKING, Literal, Optional, Union
+
+import torch
+import torch.nn.functional as F
+from transformers import Trainer
+from trl import DPOTrainer
+from trl.trainer import disable_dropout_in_model
+from typing_extensions import override
+
+from ...extras.constants import IGNORE_INDEX
+from ...extras.packages import is_transformers_version_greater_than
+from ..callbacks import SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler, get_batch_logps, nested_detach
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, ProcessorMixin
+
+ from ...hparams import FinetuningArguments
+
+
+class CustomDPOTrainer(DPOTrainer):
+ def __init__(
+ self,
+ model: Union["PreTrainedModel", torch.nn.Module],
+ ref_model: Optional[Union["PreTrainedModel", torch.nn.Module]],
+ finetuning_args: "FinetuningArguments",
+ processor: Optional["ProcessorMixin"],
+ disable_dropout: bool = True,
+ **kwargs,
+ ):
+ if is_transformers_version_greater_than("4.46"):
+ kwargs["processing_class"] = kwargs.pop("tokenizer")
+
+ if disable_dropout:
+ disable_dropout_in_model(model)
+ if ref_model is not None:
+ disable_dropout_in_model(ref_model)
+
+ self.finetuning_args = finetuning_args
+ self.f_divergence_type = "reverse_kl"
+ self.reference_free = False
+ self.use_dpo_data_collator = True # hack to avoid warning
+ self.generate_during_eval = False # disable at evaluation
+ self.label_pad_token_id = IGNORE_INDEX
+ self.padding_value = 0
+ self.is_encoder_decoder = model.config.is_encoder_decoder
+ self.precompute_ref_log_probs = False
+ self._precomputed_train_ref_log_probs = False
+ self._precomputed_eval_ref_log_probs = False
+ self._peft_has_been_casted_to_bf16 = False
+
+ self.ref_model = ref_model
+ self._stored_metrics = defaultdict(lambda: defaultdict(list))
+
+ # dpo hyperparams
+ self.beta = finetuning_args.pref_beta
+ self.loss_type = finetuning_args.pref_loss
+ self.ftx_gamma = finetuning_args.pref_ftx
+ self.label_smoothing = finetuning_args.dpo_label_smoothing
+ self.simpo_gamma = finetuning_args.simpo_gamma
+
+ Trainer.__init__(self, model=model, **kwargs)
+ self.model_accepts_loss_kwargs = False # overwrite trainer's default behavior
+ if not hasattr(self, "accelerator"):
+ raise AttributeError("Please update `transformers`.")
+
+ warnings.simplefilter("ignore") # remove gc warnings on ref model
+
+ if ref_model is not None:
+ if self.is_deepspeed_enabled:
+ if not (
+ getattr(ref_model, "is_loaded_in_8bit", False) or getattr(ref_model, "is_loaded_in_4bit", False)
+ ): # quantized models are already set on the correct device
+ self.ref_model = self._prepare_deepspeed(self.ref_model)
+ else:
+ self.ref_model = self.accelerator.prepare_model(self.ref_model, evaluation_mode=True)
+ self.ref_model.eval()
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ @override
+ def create_optimizer(self) -> "torch.optim.Optimizer":
+ if self.optimizer is None:
+ self.optimizer = create_custom_optimizer(self.model, self.args, self.finetuning_args)
+ return super().create_optimizer()
+
+ @override
+ def create_scheduler(
+ self, num_training_steps: int, optimizer: Optional["torch.optim.Optimizer"] = None
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(self.args, num_training_steps, optimizer)
+ return super().create_scheduler(num_training_steps, optimizer)
+
+ @override
+ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
+ if self.finetuning_args.disable_shuffling:
+ return torch.utils.data.SequentialSampler(self.train_dataset)
+
+ return super()._get_train_sampler()
+
+ @override
+ def get_batch_samples(self, *args, **kwargs):
+ r"""Replace the method of DPO Trainer with the one of the standard Trainer."""
+ return Trainer.get_batch_samples(self, *args, **kwargs)
+
+ def odds_ratio_loss(self, chosen_logps: "torch.Tensor", rejected_logps: "torch.Tensor") -> "torch.Tensor":
+ r"""Compute ORPO's odds ratio (OR) loss for batched log probabilities of the policy model."""
+ log_odds = (chosen_logps - rejected_logps) - (
+ torch.log1p(-torch.exp(chosen_logps)) - torch.log1p(-torch.exp(rejected_logps))
+ )
+ sft_loss = -chosen_logps
+ odds_ratio_loss = -F.logsigmoid(log_odds)
+ orpo_loss = sft_loss + self.beta * odds_ratio_loss
+ return orpo_loss
+
+ def simpo_loss(self, chosen_logps: "torch.Tensor", rejected_logps: "torch.Tensor") -> "torch.Tensor":
+ r"""Compute SimPO loss for batched log probabilities of the policy model."""
+ pi_logratios = chosen_logps - rejected_logps
+ gamma_logratios = self.simpo_gamma / self.beta
+ logits = pi_logratios - gamma_logratios
+ simpo_loss = -F.logsigmoid(self.beta * logits)
+ return simpo_loss
+
+ def compute_preference_loss(
+ self,
+ policy_chosen_logps: "torch.Tensor",
+ policy_rejected_logps: "torch.Tensor",
+ reference_chosen_logps: Optional["torch.Tensor"],
+ reference_rejected_logps: Optional["torch.Tensor"],
+ ) -> tuple["torch.Tensor", "torch.Tensor", "torch.Tensor"]:
+ r"""Compute loss for preference learning."""
+ if not self.finetuning_args.use_ref_model:
+ if self.loss_type == "orpo":
+ losses = self.odds_ratio_loss(policy_chosen_logps, policy_rejected_logps)
+ elif self.loss_type == "simpo":
+ losses = self.simpo_loss(policy_chosen_logps, policy_rejected_logps)
+ else:
+ raise NotImplementedError(f"Unknown loss type: {self.loss_type}.")
+
+ chosen_rewards = self.beta * policy_chosen_logps.to(self.accelerator.device).detach()
+ rejected_rewards = self.beta * policy_rejected_logps.to(self.accelerator.device).detach()
+ else:
+ losses, chosen_rewards, rejected_rewards = self.dpo_loss(
+ policy_chosen_logps, policy_rejected_logps, reference_chosen_logps, reference_rejected_logps
+ )
+
+ return losses, chosen_rewards, rejected_rewards
+
+ @override
+ def concatenated_forward(
+ self, model: "PreTrainedModel", batch: dict[str, "torch.Tensor"]
+ ) -> tuple["torch.Tensor", "torch.Tensor", "torch.Tensor", "torch.Tensor", "torch.Tensor"]:
+ r"""Compute the sum log probabilities of the labels under given logits if loss_type is not IPO, ORPO or SimPO.
+
+ Otherwise the average log probabilities.
+ """
+ if self.finetuning_args.use_ref_model:
+ batch = nested_detach(batch, clone=True) # avoid error
+
+ all_logits: torch.Tensor = model(**batch, return_dict=True, use_cache=False).logits.to(torch.float32)
+ all_logps, valid_length = get_batch_logps(logits=all_logits, labels=batch["labels"])
+ if self.loss_type in ["ipo", "orpo", "simpo"]:
+ all_logps = all_logps / valid_length
+
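+ # The pairwise collator is assumed to stack all chosen examples first and all rejected
+ # examples second, so the first half of the batch is chosen and the second half rejected.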
+ batch_size = batch["input_ids"].size(0) // 2
+ chosen_logps, rejected_logps = all_logps.split(batch_size, dim=0)
+ chosen_logits, rejected_logits = all_logits.split(batch_size, dim=0)
+ chosen_length, _ = valid_length.split(batch_size, dim=0)
+
+ if self.loss_type in ["ipo", "orpo", "simpo"]:
+ return chosen_logps, rejected_logps, chosen_logits, rejected_logits, chosen_logps
+ else:
+ return chosen_logps, rejected_logps, chosen_logits, rejected_logits, chosen_logps / chosen_length
+
+ @override
+ def compute_reference_log_probs(
+ self, model: "PreTrainedModel", batch: dict[str, "torch.Tensor"]
+ ) -> tuple[Optional["torch.Tensor"], Optional["torch.Tensor"]]:
+ r"""Compute log probabilities of the reference model."""
+ if not self.finetuning_args.use_ref_model:
+ return None, None
+
+ if self.ref_model is None:
+ ref_model = model
+ ref_context = self.accelerator.unwrap_model(model).disable_adapter()
+ else:
+ ref_model = self.ref_model
+ ref_context = nullcontext()
+
+ with torch.no_grad(), ref_context:
+ reference_chosen_logps, reference_rejected_logps, *_ = self.concatenated_forward(ref_model, batch)
+
+ return reference_chosen_logps, reference_rejected_logps
+
+ @override
+ def get_batch_loss_metrics(
+ self,
+ model: "PreTrainedModel",
+ batch: dict[str, "torch.Tensor"],
+ train_eval: Literal["train", "eval"] = "train",
+ ) -> tuple["torch.Tensor", dict[str, "torch.Tensor"]]:
+ r"""Compute the DPO loss and other metrics for the given batch of inputs for train or test."""
+ metrics = {}
+ (
+ policy_chosen_logps,
+ policy_rejected_logps,
+ policy_chosen_logits,
+ policy_rejected_logits,
+ policy_chosen_logps_avg,
+ ) = self.concatenated_forward(model, batch)
+
+ reference_chosen_logps, reference_rejected_logps = self.compute_reference_log_probs(model, batch)
+ losses, chosen_rewards, rejected_rewards = self.compute_preference_loss(
+ policy_chosen_logps,
+ policy_rejected_logps,
+ reference_chosen_logps,
+ reference_rejected_logps,
+ )
+ sft_loss = -policy_chosen_logps_avg
+ if self.ftx_gamma > 1e-6:
+ losses += self.ftx_gamma * sft_loss
+
+ prefix = "eval_" if train_eval == "eval" else ""
+ metrics[f"{prefix}rewards/chosen"] = chosen_rewards.mean().item()
+ metrics[f"{prefix}rewards/rejected"] = rejected_rewards.mean().item()
+ metrics[f"{prefix}rewards/accuracies"] = (chosen_rewards > rejected_rewards).float().mean().item()
+ metrics[f"{prefix}rewards/margins"] = (chosen_rewards - rejected_rewards).mean().item()
+ metrics[f"{prefix}logps/chosen"] = policy_chosen_logps.mean().item()
+ metrics[f"{prefix}logps/rejected"] = policy_rejected_logps.mean().item()
+ metrics[f"{prefix}logits/chosen"] = policy_chosen_logits.mean().item()
+ metrics[f"{prefix}logits/rejected"] = policy_rejected_logits.mean().item()
+ if self.loss_type == "orpo":
+ metrics[f"{prefix}sft_loss"] = sft_loss.mean().item()
+ metrics[f"{prefix}odds_ratio_loss"] = ((losses - sft_loss) / self.beta).mean().item()
+
+ return losses.mean(), metrics
+
+ @override
+ def compute_loss(
+ self, model: "PreTrainedModel", inputs: dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
+ ) -> Union["torch.Tensor", tuple["torch.Tensor", list["torch.Tensor"]]]:
+ r"""Subclass and override to accept extra kwargs."""
+ return super().compute_loss(model, inputs, return_outputs)
+
+ @override
+ def log(self, logs: dict[str, float], *args, **kwargs) -> None:
+ r"""Log `logs` on the various objects watching training, including stored metrics."""
+ # logs either has "loss" or "eval_loss"
+ train_eval = "train" if "loss" in logs else "eval"
+ # Add averaged stored metrics to logs
+ key_list, metric_list = [], []
+ for key, metrics in self._stored_metrics[train_eval].items():
+ key_list.append(key)
+ metric_list.append(torch.tensor(metrics, dtype=torch.float).to(self.accelerator.device).mean().item())
+
+ del self._stored_metrics[train_eval]
+ if len(metric_list) < 10: # pad to a fixed length so that all-reduce sees the same shape on every rank
+ for i in range(10 - len(metric_list)):
+ key_list.append(f"dummy_{i}")
+ metric_list.append(0.0)
+
+ metric_list = torch.tensor(metric_list, dtype=torch.float).to(self.accelerator.device)
+ metric_list = self.accelerator.reduce(metric_list, "mean").tolist()
+ for key, metric in zip(key_list, metric_list): # add remaining items
+ if not key.startswith("dummy_"):
+ logs[key] = metric
+
+ return Trainer.log(self, logs, *args, **kwargs)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py
new file mode 100644
index 0000000..97262ad
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py
@@ -0,0 +1,113 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/examples/scripts/dpo.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from ...data import PairwiseDataCollatorWithPadding, get_dataset, get_template_and_fix_tokenizer
+from ...extras.constants import IGNORE_INDEX
+from ...extras.misc import calculate_tps
+from ...extras.ploting import plot_loss
+from ...hparams import ModelArguments
+from ...model import load_model, load_tokenizer
+from ..trainer_utils import create_modelcard_and_push, create_ref_model
+from .trainer import CustomDPOTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments
+
+
+def run_dpo(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="rm", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
+
+ data_collator = PairwiseDataCollatorWithPadding(
+ template=template,
+ model=model,
+ pad_to_multiple_of=8,
+ label_pad_token_id=IGNORE_INDEX if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id,
+ **tokenizer_module,
+ )
+
+ # Create reference model
+ if finetuning_args.use_ref_model:
+ if finetuning_args.ref_model is None and (not training_args.do_train): # use the model itself
+ ref_model = model
+ else:
+ ref_model = create_ref_model(model_args, finetuning_args)
+ else:
+ ref_model = None
+
+ # Update arguments
+ training_args.remove_unused_columns = False # important for multimodal and pairwise datasets
+
+ # Initialize our Trainer
+ trainer = CustomDPOTrainer(
+ model=model,
+ ref_model=ref_model,
+ args=training_args,
+ finetuning_args=finetuning_args,
+ data_collator=data_collator,
+ callbacks=callbacks,
+ **dataset_module,
+ **tokenizer_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ trainer.save_model()
+ if finetuning_args.include_effective_tokens_per_second:
+ train_result.metrics["effective_tokens_per_sec"] = calculate_tps(
+ dataset_module["train_dataset"], train_result.metrics, stage="rm"
+ )
+
+ trainer.log_metrics("train", train_result.metrics)
+ trainer.save_metrics("train", train_result.metrics)
+ trainer.save_state()
+ if trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ keys = ["loss", "rewards/accuracies"]
+ if isinstance(dataset_module.get("eval_dataset"), dict):
+ keys += [f"eval_{key}_loss" for key in dataset_module["eval_dataset"].keys()]
+ else:
+ keys += ["eval_loss"]
+
+ plot_loss(training_args.output_dir, keys=keys)
+
+ # Evaluation
+ if training_args.do_eval:
+ metrics = trainer.evaluate(metric_key_prefix="eval")
+ if id(model) == id(ref_model): # unable to compute rewards if the reference model is the model itself
+ remove_keys = [key for key in metrics.keys() if "rewards" in key]
+ for key in remove_keys:
+ metrics.pop(key)
+ trainer.log_metrics("eval", metrics)
+ trainer.save_metrics("eval", metrics)
+
+ # Create model card
+ create_modelcard_and_push(trainer, model_args, data_args, training_args, finetuning_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/kto/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/kto/__init__.py
new file mode 100644
index 0000000..491b067
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/kto/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_kto
+
+
+__all__ = ["run_kto"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/kto/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/kto/trainer.py
new file mode 100644
index 0000000..0323041
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/kto/trainer.py
@@ -0,0 +1,298 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/trl/trainer/kto_trainer.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import warnings
+from collections import defaultdict
+from contextlib import nullcontext
+from types import MethodType
+from typing import TYPE_CHECKING, Literal, Optional, Union
+
+import torch
+from transformers import Trainer
+from trl import KTOTrainer
+from trl.trainer import disable_dropout_in_model
+from typing_extensions import override
+
+from ...extras.constants import IGNORE_INDEX
+from ...extras.packages import is_transformers_version_greater_than
+from ..callbacks import SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler, get_batch_logps, nested_detach
+
+
+if TYPE_CHECKING:
+ import torch.utils.data
+ from transformers import PreTrainedModel, ProcessorMixin
+
+ from ...hparams import FinetuningArguments
+
+
+class CustomKTOTrainer(KTOTrainer):
+ def __init__(
+ self,
+ model: Union["PreTrainedModel", torch.nn.Module],
+ ref_model: Optional[Union["PreTrainedModel", torch.nn.Module]],
+ finetuning_args: "FinetuningArguments",
+ processor: Optional["ProcessorMixin"],
+ disable_dropout: bool = True,
+ **kwargs,
+ ):
+ if is_transformers_version_greater_than("4.46"):
+ kwargs["processing_class"] = kwargs.pop("tokenizer")
+
+ if disable_dropout:
+ disable_dropout_in_model(model)
+ if ref_model is not None:
+ disable_dropout_in_model(ref_model)
+
+ self.finetuning_args = finetuning_args
+ self.reference_free = False
+ self.use_dpo_data_collator = True # hack to avoid warning
+ self.generate_during_eval = False # disable at evaluation
+ self.label_pad_token_id = IGNORE_INDEX
+ self.padding_value = 0
+ self.is_encoder_decoder = model.config.is_encoder_decoder
+ self.precompute_ref_log_probs = False
+ self._precomputed_train_ref_log_probs = False
+ self._precomputed_eval_ref_log_probs = False
+ self._peft_has_been_casted_to_bf16 = False
+
+ self.ref_model = ref_model
+ self._stored_metrics = defaultdict(lambda: defaultdict(list))
+
+ # kto hyperparams
+ self.beta = finetuning_args.pref_beta
+ self.desirable_weight = finetuning_args.kto_chosen_weight
+ self.undesirable_weight = finetuning_args.kto_rejected_weight
+ self.ftx_gamma = finetuning_args.pref_ftx
+
+ Trainer.__init__(self, model=model, **kwargs)
+ self.model_accepts_loss_kwargs = False # overwrite trainer's default behavior
+ if not hasattr(self, "accelerator"):
+ raise AttributeError("Please update `transformers`.")
+
+ warnings.simplefilter("ignore") # remove gc warnings on ref model
+
+ if ref_model is not None:
+ if self.is_deepspeed_enabled:
+ if not (
+ getattr(ref_model, "is_loaded_in_8bit", False) or getattr(ref_model, "is_loaded_in_4bit", False)
+ ): # quantized models are already set on the correct device
+ self.ref_model = self._prepare_deepspeed(self.ref_model)
+ else:
+ self.ref_model = self.accelerator.prepare_model(self.ref_model, evaluation_mode=True)
+ self.ref_model.eval()
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ @override
+ def create_optimizer(self) -> "torch.optim.Optimizer":
+ if self.optimizer is None:
+ self.optimizer = create_custom_optimizer(self.model, self.args, self.finetuning_args)
+ return super().create_optimizer()
+
+ @override
+ def create_scheduler(
+ self, num_training_steps: int, optimizer: Optional["torch.optim.Optimizer"] = None
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(self.args, num_training_steps, optimizer)
+ return super().create_scheduler(num_training_steps, optimizer)
+
+ @override
+ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
+ r"""Replace the sequential sampler of KTO Trainer created by trl with the random sampler."""
+ if self.finetuning_args.disable_shuffling:
+ return torch.utils.data.SequentialSampler(self.train_dataset)
+
+ return Trainer._get_train_sampler(self)
+
+ @override
+ def get_batch_samples(self, *args, **kwargs):
+ r"""Replace the method of KTO Trainer with the one of the standard Trainer."""
+ return Trainer.get_batch_samples(self, *args, **kwargs)
+
+ @override
+ def forward(
+ self, model: "PreTrainedModel", batch: dict[str, "torch.Tensor"], prefix: Literal["", "kl_"] = ""
+ ) -> tuple["torch.Tensor", "torch.Tensor", "torch.Tensor"]:
+ r"""Run forward pass and computes the log probabilities."""
+ batch = nested_detach(batch, clone=True) # avoid error
+ model_inputs = {
+ "input_ids": batch[f"{prefix}input_ids"],
+ "attention_mask": batch[f"{prefix}attention_mask"],
+ }
+ if f"{prefix}token_type_ids" in batch:
+ model_inputs["token_type_ids"] = batch[f"{prefix}token_type_ids"]
+
+ if "pixel_values" in batch:
+ model_inputs["pixel_values"] = batch["pixel_values"]
+
+ if "image_sizes" in batch:
+ model_inputs["image_sizes"] = batch["image_sizes"]
+
+ if "image_grid_thw" in batch:
+ model_inputs["image_grid_thw"] = batch["image_grid_thw"]
+
+ if "aspect_ratio_ids" in batch:
+ model_inputs["aspect_ratio_ids"] = batch["aspect_ratio_ids"]
+
+ if "aspect_ratio_mask" in batch:
+ model_inputs["aspect_ratio_mask"] = batch["aspect_ratio_mask"]
+
+ if f"{prefix}cross_attention_mask" in batch:
+ model_inputs["cross_attention_mask"] = batch[f"{prefix}cross_attention_mask"]
+
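+        # upcast logits to fp32 before computing log-probabilities; `logps / valid_length`
+        # is the length-normalized (per-token) log-probability of each sequence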
+ logits = model(**model_inputs, return_dict=True, use_cache=False).logits.to(torch.float32)
+ logps, valid_length = get_batch_logps(logits=logits, labels=batch[f"{prefix}labels"])
+ return logits, logps, logps / valid_length
+
+ @override
+ def concatenated_forward(
+ self, model: "PreTrainedModel", batch: dict[str, "torch.Tensor"]
+ ) -> tuple["torch.Tensor", "torch.Tensor", "torch.Tensor", "torch.Tensor", "torch.Tensor", "torch.Tensor"]:
+ target_logits, target_logps, target_logps_avg = self.forward(model, batch)
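+        # the "kl_"-prefixed inputs form the batch used to estimate the KL reference term
+        # of the KTO loss, so no gradients are required for this forward pass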
+ with torch.no_grad():
+ _, kl_logps, _ = self.forward(model, batch, prefix="kl_")
+
+ if len(target_logps) != len(batch["kto_tags"]):
+ raise ValueError("Mismatched shape of inputs and labels.")
+
+ chosen_logits = target_logits[batch["kto_tags"]]
+ chosen_logps = target_logps[batch["kto_tags"]]
+ rejected_logits = target_logits[~batch["kto_tags"]]
+ rejected_logps = target_logps[~batch["kto_tags"]]
+ chosen_logps_avg = target_logps_avg[batch["kto_tags"]]
+ return chosen_logps, rejected_logps, chosen_logits, rejected_logits, kl_logps, chosen_logps_avg
+
+ @override
+ def compute_reference_log_probs(
+ self, model: "PreTrainedModel", batch: dict[str, "torch.Tensor"]
+ ) -> tuple["torch.Tensor", "torch.Tensor", "torch.Tensor"]:
+ r"""Compute log probabilities of the reference model."""
+ if self.ref_model is None:
+ ref_model = model
+ ref_context = self.accelerator.unwrap_model(model).disable_adapter()
+ else:
+ ref_model = self.ref_model
+ ref_context = nullcontext()
+
+ with torch.no_grad(), ref_context:
+ reference_chosen_logps, reference_rejected_logps, _, _, reference_kl_logps, _ = self.concatenated_forward(
+ ref_model, batch
+ )
+
+ return reference_chosen_logps, reference_rejected_logps, reference_kl_logps
+
+ @override
+ def get_batch_loss_metrics(
+ self,
+ model: "PreTrainedModel",
+ batch: dict[str, "torch.Tensor"],
+ ) -> tuple["torch.Tensor", dict[str, "torch.Tensor"]]:
+ r"""Compute the DPO loss and other metrics for the given batch of inputs for train or test."""
+ metrics = {}
+ (
+ policy_chosen_logps,
+ policy_rejected_logps,
+ policy_chosen_logits,
+ policy_rejected_logits,
+ policy_kl_logps,
+ policy_chosen_logps_avg,
+ ) = self.concatenated_forward(model, batch)
+ reference_chosen_logps, reference_rejected_logps, reference_kl_logps = self.compute_reference_log_probs(
+ model, batch
+ )
+ losses, chosen_rewards, rejected_rewards, kl = self.kto_loss(
+ policy_chosen_logps,
+ policy_rejected_logps,
+ policy_kl_logps,
+ reference_chosen_logps,
+ reference_rejected_logps,
+ reference_kl_logps,
+ )
+ losses = losses.nanmean()
+
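+        # the auxiliary SFT term is averaged over the chosen (desirable) examples only,
+        # so rescale it by batch size / number of chosen examples before adding it to the KTO loss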
+ if self.ftx_gamma > 1e-6 and len(policy_chosen_logps) > 0: # remember to rescale
+ sft_loss = -policy_chosen_logps_avg
+ losses += self.ftx_gamma * sft_loss.nanmean() / len(policy_chosen_logps) * len(batch["labels"])
+
+ num_chosen = len(chosen_rewards)
+ num_rejected = len(rejected_rewards)
+ if num_chosen > 0:
+ metrics["rewards/chosen_sum"] = chosen_rewards.nansum().item()
+ metrics["logps/chosen_sum"] = policy_chosen_logps.nansum().item()
+ metrics["logits/chosen_sum"] = policy_chosen_logits.nansum().item()
+ metrics["count/chosen"] = float(num_chosen)
+
+ if num_rejected > 0:
+ metrics["rewards/rejected_sum"] = rejected_rewards.nansum().item()
+ metrics["logps/rejected_sum"] = policy_rejected_logps.nansum().item()
+ metrics["logits/rejected_sum"] = policy_rejected_logits.nansum().item()
+ metrics["count/rejected"] = float(num_rejected)
+
+ metrics["kl"] = kl.item()
+ return losses, metrics
+
+ @override
+ def compute_loss(
+ self, model: "PreTrainedModel", inputs: dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
+ ) -> Union["torch.Tensor", tuple["torch.Tensor", list["torch.Tensor"]]]:
+ r"""Subclass and override to accept extra kwargs."""
+ return super().compute_loss(model, inputs, return_outputs)
+
+ @override
+ def log(self, logs: dict[str, float], *args, **kwargs) -> None:
+ r"""Log `logs` on the various objects watching training, including stored metrics."""
+ # logs either has "loss" or "eval_loss"
+ train_eval = "train" if "loss" in logs else "eval"
+ prefix = "eval_" if train_eval == "eval" else ""
+ # Add averaged stored metrics to logs
+ key_list, metric_list = [], []
+ for key, metrics in self._stored_metrics[train_eval].items():
+ key_list.append(key)
+ metric_list.append(torch.tensor(metrics, dtype=torch.float).to(self.accelerator.device).sum().item())
+
+ del self._stored_metrics[train_eval]
+        if len(metric_list) < 9:  # pad to a fixed length of 9 for the all-reduce
+ for i in range(9 - len(metric_list)):
+ key_list.append(f"dummy_{i}")
+ metric_list.append(0.0)
+
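+        # sum-reduce the metric sums and counts across all processes, then recover the averages below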
+ metric_list = torch.tensor(metric_list, dtype=torch.float).to(self.accelerator.device)
+ metric_list = self.accelerator.reduce(metric_list, "sum").tolist()
+ metric_dict: dict[str, float] = dict(zip(key_list, metric_list))
+ for split in ["chosen", "rejected"]: # accumulate average metrics from sums and lengths
+ if f"count/{split}" in metric_dict:
+ for key in ("rewards", "logps", "logits"):
+ logs[f"{prefix}{key}/{split}"] = metric_dict[f"{key}/{split}_sum"] / metric_dict[f"count/{split}"]
+ del metric_dict[f"{key}/{split}_sum"]
+ del metric_dict[f"count/{split}"]
+
+ if f"{prefix}rewards/chosen" in logs and f"{prefix}rewards/rejected" in logs: # calculate reward margin
+ logs[f"{prefix}rewards/margins"] = logs[f"{prefix}rewards/chosen"] - logs[f"{prefix}rewards/rejected"]
+
+ for key, metric in metric_dict.items(): # add remaining items
+ if not key.startswith("dummy_"):
+ logs[key] = metric
+
+ return Trainer.log(self, logs, *args, **kwargs)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/kto/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/kto/workflow.py
new file mode 100644
index 0000000..7b16d1d
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/kto/workflow.py
@@ -0,0 +1,104 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/examples/scripts/kto.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from ...data import KTODataCollatorWithPadding, get_dataset, get_template_and_fix_tokenizer
+from ...extras.constants import IGNORE_INDEX
+from ...extras.ploting import plot_loss
+from ...hparams import ModelArguments
+from ...model import load_model, load_tokenizer
+from ..trainer_utils import create_modelcard_and_push, create_ref_model
+from .trainer import CustomKTOTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments
+
+
+def run_kto(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="kto", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
+
+ data_collator = KTODataCollatorWithPadding(
+ template=template,
+ model=model,
+ pad_to_multiple_of=8,
+ label_pad_token_id=IGNORE_INDEX if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id,
+ **tokenizer_module,
+ )
+
+ # Create reference model
+ if finetuning_args.ref_model is None and (not training_args.do_train): # use the model itself
+ ref_model = model
+ else:
+ ref_model = create_ref_model(model_args, finetuning_args)
+
+ # Update arguments
+ training_args.remove_unused_columns = False # important for multimodal and pairwise dataset
+
+ # Initialize our Trainer
+ trainer = CustomKTOTrainer(
+ model=model,
+ ref_model=ref_model,
+ args=training_args,
+ finetuning_args=finetuning_args,
+ data_collator=data_collator,
+ callbacks=callbacks,
+ **dataset_module,
+ **tokenizer_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ trainer.save_model()
+ trainer.log_metrics("train", train_result.metrics)
+ trainer.save_metrics("train", train_result.metrics)
+ trainer.save_state()
+ if trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ keys = ["loss", "rewards/chosen"]
+ if isinstance(dataset_module.get("eval_dataset"), dict):
+ keys += [f"eval_{key}_loss" for key in dataset_module["eval_dataset"].keys()]
+ else:
+ keys += ["eval_loss"]
+
+ plot_loss(training_args.output_dir, keys=keys)
+
+ # Evaluation
+ if training_args.do_eval:
+ metrics = trainer.evaluate(metric_key_prefix="eval")
+ if id(model) == id(ref_model): # unable to compute rewards without a reference model
+ remove_keys = [key for key in metrics.keys() if "rewards" in key]
+ for key in remove_keys:
+ metrics.pop(key)
+ trainer.log_metrics("eval", metrics)
+ trainer.save_metrics("eval", metrics)
+
+ # Create model card
+ create_modelcard_and_push(trainer, model_args, data_args, training_args, finetuning_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/ppo/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/__init__.py
new file mode 100644
index 0000000..ed9bc4d
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_ppo
+
+
+__all__ = ["run_ppo"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/ppo/ppo_utils.py b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/ppo_utils.py
new file mode 100644
index 0000000..9d462e7
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/ppo_utils.py
@@ -0,0 +1,80 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from contextlib import nullcontext
+from typing import TYPE_CHECKING, Literal, Optional
+
+import torch
+from transformers.integrations import is_deepspeed_zero3_enabled
+
+from ...extras.packages import is_requests_available
+
+
+if is_requests_available():
+ import requests
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel
+ from trl import AutoModelForCausalLMWithValueHead
+
+
+def get_rewards_from_server(server_url: str, messages: list[str]) -> list["torch.Tensor"]:
+ r"""Get reward scores from the API server."""
+ headers = {"Content-Type": "application/json"}
+ payload = {"model": "model", "messages": messages}
+ response = requests.post(server_url, json=payload, headers=headers)
+ rewards = json.loads(response.text)["scores"]
+ return torch.Tensor(rewards)
+
+
+def replace_model(model: "AutoModelForCausalLMWithValueHead", target: Literal["default", "reward"]) -> None:
+ r"""Replace the default/reward modules in the model. The model is already unwrapped."""
+ v_head_layer = model.v_head.summary
+ if is_deepspeed_zero3_enabled():
+ import deepspeed # type: ignore
+
+ params = [v_head_layer.weight, v_head_layer.bias]
+ context_maybe_zero3 = deepspeed.zero.GatheredParameters(params, modifier_rank=0)
+ else:
+ context_maybe_zero3 = nullcontext()
+
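+    # activate the target LoRA adapter and swap the stored value-head weights for that target
+    # into the live v_head, stashing the default head first when switching to the reward head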
+ model.pretrained_model.set_adapter(target) # set the LoRA adapter to be active
+ with context_maybe_zero3:
+ if target == "reward": # save default head temporarily
+ setattr(model, "default_head_weight", v_head_layer.weight.data.detach().clone())
+ setattr(model, "default_head_bias", v_head_layer.bias.data.detach().clone())
+
+ device = v_head_layer.weight.device
+ v_head_layer.weight.data = model.get_buffer(f"{target}_head_weight").detach().clone().to(device)
+ v_head_layer.bias.data = model.get_buffer(f"{target}_head_bias").detach().clone().to(device)
+
+
+def dump_layernorm(model: "PreTrainedModel") -> dict[str, "torch.Tensor"]:
+ r"""Dump the layernorm parameters in the model. The model is already unwrapped (and gathered)."""
+ layer_norm_params = {}
+ for name, param in model.named_parameters():
+ if param.data.dtype == torch.float32:
+ layer_norm_params[name] = param.data.detach().clone()
+ param.data = param.data.to(model.config.torch_dtype)
+
+ return layer_norm_params
+
+
+def restore_layernorm(model: "PreTrainedModel", layernorm_params: Optional[dict[str, "torch.Tensor"]] = None) -> None:
+ r"""Restore the layernorm parameters in the model. The model is already unwrapped (and gathered)."""
+ for name, param in model.named_parameters():
+ if name in layernorm_params:
+ param.data = layernorm_params[name]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py
new file mode 100644
index 0000000..1449745
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py
@@ -0,0 +1,503 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/trl/trainer/ppo_trainer.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+import os
+import sys
+import warnings
+from types import MethodType
+from typing import TYPE_CHECKING, Any, Optional
+
+import torch
+from accelerate.utils import DistributedDataParallelKwargs
+from tqdm import tqdm
+from transformers import GenerationConfig, Trainer, TrainerControl, TrainerState
+from transformers.optimization import get_scheduler
+from transformers.trainer import DEFAULT_CALLBACKS
+from transformers.trainer_callback import CallbackHandler
+from transformers.trainer_pt_utils import remove_dummy_checkpoint
+from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
+from transformers.utils import SAFE_WEIGHTS_NAME, WEIGHTS_NAME
+from trl import PPOConfig, PPOTrainer
+from trl.core import PPODecorators, logprobs_from_logits
+from trl.models.utils import unwrap_model_for_generation
+from typing_extensions import override
+
+from ...extras import logging
+from ...extras.misc import AverageMeter, count_parameters, get_current_device, get_logits_processor
+from ..callbacks import FixValueHeadModelCallback, SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler
+from .ppo_utils import dump_layernorm, get_rewards_from_server, replace_model, restore_layernorm
+
+
+if TYPE_CHECKING:
+ from datasets import Dataset
+ from transformers import (
+ DataCollatorWithPadding,
+ PreTrainedTokenizer,
+ ProcessorMixin,
+ Seq2SeqTrainingArguments,
+ TrainerCallback,
+ )
+ from trl import AutoModelForCausalLMWithValueHead
+
+ from ...hparams import FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class CustomPPOTrainer(PPOTrainer, Trainer):
+ r"""Inherit PPOTrainer."""
+
+ def __init__(
+ self,
+ model_args: "ModelArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ callbacks: Optional[list["TrainerCallback"]],
+ model: "AutoModelForCausalLMWithValueHead",
+ reward_model: Optional["AutoModelForCausalLMWithValueHead"],
+ ref_model: Optional["AutoModelForCausalLMWithValueHead"],
+ tokenizer: "PreTrainedTokenizer",
+ processor: Optional["ProcessorMixin"],
+ data_collator: "DataCollatorWithPadding",
+ train_dataset: Optional["Dataset"] = None,
+ eval_dataset: Optional["Dataset"] = None,
+ ) -> None:
+ if eval_dataset is not None:
+ raise NotImplementedError("PPOTrainer does not support eval dataset yet.")
+
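+        # a PPO batch buffers `ppo_buffer_size` backward batches of rollouts; the inner
+        # optimization epochs split it into mini-batches of the per-device train batch size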
+ backward_batch_size = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
+ ppo_config = PPOConfig(
+ model_name=model_args.model_name_or_path,
+ learning_rate=training_args.learning_rate,
+ mini_batch_size=training_args.per_device_train_batch_size,
+ batch_size=backward_batch_size * finetuning_args.ppo_buffer_size,
+ gradient_accumulation_steps=training_args.gradient_accumulation_steps,
+ ppo_epochs=finetuning_args.ppo_epochs,
+ max_grad_norm=training_args.max_grad_norm,
+ seed=training_args.seed,
+ optimize_device_cache=True,
+ target=finetuning_args.ppo_target,
+ use_score_scaling=finetuning_args.ppo_score_norm,
+ use_score_norm=finetuning_args.ppo_score_norm,
+ whiten_rewards=finetuning_args.ppo_whiten_rewards,
+ accelerator_kwargs={"step_scheduler_with_optimizer": False},
+ log_with=training_args.report_to[0] if training_args.report_to else None,
+ project_kwargs={"logging_dir": training_args.logging_dir},
+ )
+
+ # Add deepspeed config
+ if training_args.deepspeed_plugin is not None:
+ ppo_config.accelerator_kwargs["kwargs_handlers"] = [
+ DistributedDataParallelKwargs(find_unused_parameters=training_args.ddp_find_unused_parameters)
+ ]
+ ppo_config.accelerator_kwargs["deepspeed_plugin"] = training_args.deepspeed_plugin
+ if ppo_config.log_with is not None:
+ logger.warning_rank0("PPOTrainer cannot use external logger when DeepSpeed is enabled.")
+ ppo_config.log_with = None
+
+ # Create optimizer and scheduler
+ if training_args.max_steps > 0:
+ num_training_steps = training_args.max_steps
+ else:
+ total_train_batch_size = backward_batch_size * finetuning_args.ppo_buffer_size * training_args.world_size
+ num_training_steps = training_args.num_train_epochs * math.ceil(
+ len(train_dataset) / total_train_batch_size
+ )
+
+ optimizer = self.create_optimizer(model, training_args, finetuning_args)
+ scheduler = self.create_scheduler(training_args, num_training_steps, optimizer)
+
+ PPOTrainer.__init__(
+ self,
+ config=ppo_config,
+ model=model,
+ ref_model=ref_model,
+ tokenizer=tokenizer,
+ dataset=train_dataset,
+ optimizer=optimizer,
+ data_collator=data_collator,
+ lr_scheduler=scheduler,
+ )
+
+ self.args = training_args
+ self.model_args = model_args
+ self.finetuning_args = finetuning_args
+ self.reward_model = reward_model
+ self.current_device = get_current_device() # patch for deepspeed training
+
+ self.generation_config = GenerationConfig(
+ pad_token_id=self.tokenizer.pad_token_id,
+ eos_token_id=[self.tokenizer.eos_token_id] + self.tokenizer.additional_special_tokens_ids,
+ **generating_args.to_dict(),
+ )
+
+ self.state = TrainerState()
+ self.control = TrainerControl()
+ self.is_deepspeed_enabled = getattr(self.accelerator.state, "deepspeed_plugin", None) is not None
+ self.is_fsdp_enabled = getattr(self.accelerator.state, "fsdp_plugin", None) is not None
+ callbacks = DEFAULT_CALLBACKS if callbacks is None else DEFAULT_CALLBACKS + callbacks
+ self.callback_handler = CallbackHandler(
+ callbacks, self.accelerator.unwrap_model(self.model), self.tokenizer, self.optimizer, self.lr_scheduler
+ )
+ if self.args.max_steps > 0:
+ logger.info_rank0("max_steps is given, it will override any value given in num_train_epochs")
+
+ self.amp_context = torch.autocast(self.current_device.type)
+ warnings.simplefilter("ignore") # remove gc warnings on ref model
+
+ if finetuning_args.reward_model_type == "full":
+ if self.is_deepspeed_enabled:
+ if not (
+ getattr(reward_model.pretrained_model, "is_loaded_in_8bit", False)
+ or getattr(reward_model.pretrained_model, "is_loaded_in_4bit", False)
+ ): # quantized models are already set on the correct device
+ self.reward_model = self._prepare_deepspeed(self.reward_model)
+ else:
+ self.reward_model = self.accelerator.prepare_model(self.reward_model, evaluation_mode=True)
+
+ self.add_callback(FixValueHeadModelCallback)
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ def ppo_train(self, resume_from_checkpoint: Optional[str] = None) -> None:
+ r"""Implement training loop for the PPO stage, like _inner_training_loop() in Huggingface's Trainer."""
+ if resume_from_checkpoint is not None:
+            raise ValueError("`resume_from_checkpoint` will be supported in a future version.")
+
+ total_train_batch_size = (
+ self.args.per_device_train_batch_size
+ * self.args.gradient_accumulation_steps
+ * self.finetuning_args.ppo_buffer_size
+ * self.args.world_size
+ )
+ if self.args.max_steps > 0:
+ num_examples = total_train_batch_size * self.args.max_steps
+ num_train_epochs = sys.maxsize
+ max_steps = self.args.max_steps
+ steps_in_epoch = self.args.max_steps
+ else:
+ len_dataloader = len(self.dataloader)
+ num_examples = len(self.dataset)
+ num_train_epochs = self.args.num_train_epochs
+ max_steps = math.ceil(num_train_epochs * len_dataloader)
+ steps_in_epoch = len_dataloader
+
+ self.state.max_steps = max_steps
+ self.state.num_train_epochs = num_train_epochs
+ self.state.is_local_process_zero = self.is_local_process_zero()
+ self.state.is_world_process_zero = self.is_world_process_zero()
+
+ logger.info_rank0("***** Running training *****")
+ logger.info_rank0(f" Num examples = {num_examples:,}")
+ logger.info_rank0(f" Num Epochs = {num_train_epochs:,}")
+ logger.info_rank0(f" Instantaneous batch size per device = {self.args.per_device_train_batch_size:,}")
+ logger.info_rank0(
+ f" Total train batch size (w. parallel, buffer, distributed & accumulation) = {total_train_batch_size:,}"
+ )
+ logger.info_rank0(f" Gradient Accumulation steps = {self.args.gradient_accumulation_steps:,}")
+ logger.info_rank0(f" Num optimization epochs per batch = {self.finetuning_args.ppo_epochs:,}")
+ logger.info_rank0(f" Total training steps = {max_steps:,}")
+ logger.info_rank0(f" Number of trainable parameters = {count_parameters(self.model)[0]:,}")
+
+ dataiter = iter(self.dataloader)
+ loss_meter = AverageMeter()
+ reward_meter = AverageMeter()
+ self.callback_handler.on_train_begin(self.args, self.state, self.control)
+
+ for step in tqdm(range(max_steps), disable=not self.is_local_process_zero()):
+ try:
+ batch = next(dataiter)
+ except StopIteration:
+ dataiter = iter(self.dataloader)
+ batch = next(dataiter)
+
+ # Get inputs
+ self.model.eval()
+ self.tokenizer.padding_side = "right" # change padding side
+ queries, responses, rewards = [], [], []
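+            # rollout phase: generate responses mini-batch by mini-batch and score them with the reward model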
+ for idx in range(0, self.config.batch_size, self.config.mini_batch_size):
+ mini_batch = {
+ "input_ids": batch["input_ids"][idx : idx + self.config.mini_batch_size],
+ "attention_mask": batch["attention_mask"][idx : idx + self.config.mini_batch_size],
+ }
+ mini_batch_queries, mini_batch_responses = self.get_inputs(mini_batch)
+ mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
+ queries.extend(mini_batch_queries)
+ responses.extend(mini_batch_responses)
+ rewards.extend(mini_batch_rewards)
+
+ # Run PPO step
+ self.model.train()
+ stats = self.step(queries, responses, rewards)
+ self.tokenizer.padding_side = "left" # restore padding side
+ loss_meter.update(float(stats["ppo/loss/total"]), n=len(rewards))
+ reward_meter.update(torch.stack(rewards).mean().item(), n=len(rewards))
+
+ if self.config.log_with is not None:
+ try:
+ batch["query"] = self.tokenizer.batch_decode(queries, skip_special_tokens=True)
+ batch["response"] = self.tokenizer.batch_decode(responses, skip_special_tokens=True)
+ self.log_stats(stats, batch, rewards)
+ except Exception:
+ logger.warning_rank0("Failed to save stats due to unknown errors.")
+
+ self.state.global_step += 1
+ self.callback_handler.on_step_end(self.args, self.state, self.control)
+
+ if self.is_local_process_zero() and (step + 1) % self.args.logging_steps == 0:
+ logs = dict(
+ loss=round(loss_meter.avg, 4),
+ reward=round(reward_meter.avg, 4),
+ learning_rate=stats["ppo/learning_rate"],
+ epoch=round(step / steps_in_epoch, 2),
+ )
+ tqdm.write(str(logs))
+ logs["step"] = step
+ self.state.log_history.append(logs)
+ self.callback_handler.on_log(self.args, self.state, self.control, logs)
+ loss_meter.reset()
+ reward_meter.reset()
+
+ if (step + 1) % self.args.save_steps == 0: # save checkpoint
+ self.save_model(
+ os.path.join(self.args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{self.state.global_step}")
+ )
+ self.callback_handler.on_save(self.args, self.state, self.control)
+
+ if self.control.should_epoch_stop or self.control.should_training_stop:
+ break
+
+ self.callback_handler.on_train_end(self.args, self.state, self.control)
+
+ @override
+ def create_optimizer(
+ self,
+ model: "AutoModelForCausalLMWithValueHead",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ ) -> "torch.optim.Optimizer":
+ optimizer = create_custom_optimizer(model, training_args, finetuning_args)
+ if optimizer is None:
+ decay_params, nodecay_params = [], []
+ decay_param_names = self.get_decay_parameter_names(model)
+ for name, param in model.named_parameters():
+ if param.requires_grad:
+ if name in decay_param_names:
+ decay_params.append(param)
+ else:
+ nodecay_params.append(param)
+
+ optim_class, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
+ param_groups = [
+ dict(params=nodecay_params),
+ dict(params=decay_params, weight_decay=training_args.weight_decay),
+ ]
+ optimizer = optim_class(param_groups, **optim_kwargs)
+
+ return optimizer
+
+ @override
+ def create_scheduler(
+ self, training_args: "Seq2SeqTrainingArguments", num_training_steps: int, optimizer: "torch.optim.Optimizer"
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(training_args, num_training_steps, optimizer)
+ lr_scheduler = get_scheduler(
+ training_args.lr_scheduler_type,
+ optimizer=optimizer,
+ num_warmup_steps=training_args.get_warmup_steps(num_training_steps),
+ num_training_steps=num_training_steps,
+ )
+ return lr_scheduler
+
+ @torch.no_grad()
+ def get_inputs(self, batch: dict[str, "torch.Tensor"]) -> tuple[list["torch.Tensor"], list["torch.Tensor"]]:
+ r"""Generate model's responses given queries."""
+ if batch["input_ids"].size(0) == 1: # handle llama2 ppo with gradient accumulation > 1
+ start_index = (batch["input_ids"][0] != self.tokenizer.pad_token_id).nonzero()[0].item()
+ for k, v in batch.items():
+ batch[k] = v[:, start_index:]
+
+ with unwrap_model_for_generation(self.model, self.accelerator) as unwrapped_model:
+ unwrapped_model: AutoModelForCausalLMWithValueHead = self.accelerator.unwrap_model(self.model)
+ if self.model_args.upcast_layernorm:
+ layernorm_params = dump_layernorm(unwrapped_model)
+
+ generate_output: torch.Tensor = unwrapped_model.generate(
+ generation_config=self.generation_config, logits_processor=get_logits_processor(), **batch
+ )
+ if self.model_args.upcast_layernorm:
+ restore_layernorm(unwrapped_model, layernorm_params)
+
+ query = batch["input_ids"].detach().cpu()
+ response = generate_output[:, batch["input_ids"].size(-1) :].detach().cpu()
+ queries, responses = [], []
+ for i in range(len(query)):
+ query_start_index = (query[i] != self.tokenizer.pad_token_id).nonzero()[0].item()
+ response_indexes = (response[i] != self.tokenizer.pad_token_id).nonzero()
+
+ if len(response_indexes) == 0: # allow empty response
+ response_length = 1
+ elif self.tokenizer.eos_token_id == self.tokenizer.pad_token_id: # include eos token
+ response_length = response_indexes[-1].item() + 2
+ else:
+ response_length = response_indexes[-1].item() + 1
+
+ queries.append(query[i, query_start_index:]) # remove padding from left
+ responses.append(response[i, :response_length]) # remove padding from right
+
+ return queries, responses
+
+ @torch.no_grad()
+ def get_rewards(
+ self,
+ queries: list["torch.Tensor"],
+ responses: list["torch.Tensor"],
+ ) -> list["torch.Tensor"]:
+ r"""Compute scores using given reward model.
+
+ Both inputs and outputs are put on CPU.
+ """
+ if self.finetuning_args.reward_model_type == "api":
+ token_ids = [torch.cat((q, r), dim=-1).tolist() for q, r in zip(queries, responses)]
+ messages = self.tokenizer.batch_decode(token_ids, skip_special_tokens=False)
+ return get_rewards_from_server(self.reward_model, messages)
+
+ batch: dict[str, torch.Tensor] = self.prepare_model_inputs(queries, responses)
+ unwrapped_model: AutoModelForCausalLMWithValueHead = self.accelerator.unwrap_model(self.model)
+
+ if self.finetuning_args.reward_model_type == "lora":
+ replace_model(unwrapped_model, target="reward")
+ reward_model = self.model
+ else:
+ reward_model = self.reward_model
+
+ with unwrap_model_for_generation(reward_model, self.accelerator), self.amp_context: # support bf16
+ values: torch.Tensor = reward_model(**batch, return_dict=True, use_cache=False)[-1]
+
+ if self.finetuning_args.reward_model_type == "lora":
+ replace_model(unwrapped_model, target="default")
+
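+        # the reward of each sequence is the value predicted at its last non-padding token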
+ rewards = values.gather(dim=-1, index=(batch["attention_mask"].sum(dim=-1, keepdim=True) - 1))
+ return rewards.float().detach() # use fp32 type
+
+ @override
+ @PPODecorators.empty_device_cache()
+ def batched_forward_pass(
+ self,
+ model: "AutoModelForCausalLMWithValueHead",
+ queries: "torch.Tensor",
+ responses: "torch.Tensor",
+ model_inputs: dict[str, Any],
+ return_logits: bool = False,
+ response_masks: Optional["torch.Tensor"] = None,
+ ) -> tuple["torch.Tensor", Optional["torch.Tensor"], "torch.Tensor", "torch.Tensor"]:
+ r"""Calculate model outputs in multiple batches.
+
+ Subclass and override to inject custom behavior.
+ """
+ bs = len(queries)
+ fbs = self.config.mini_batch_size
+ all_logprobs = []
+ all_logits = []
+ all_masks = []
+ all_values = []
+
+ for i in range(math.ceil(bs / fbs)):
+ input_kwargs = {key: value[i * fbs : (i + 1) * fbs] for key, value in model_inputs.items()}
+ query_batch = queries[i * fbs : (i + 1) * fbs]
+ response_batch = responses[i * fbs : (i + 1) * fbs]
+ if response_masks is not None:
+ response_masks_batch = response_masks[i * fbs : (i + 1) * fbs]
+ input_ids = input_kwargs["input_ids"]
+ attention_mask = input_kwargs["attention_mask"]
+
+ with self.amp_context: # support bf16
+ logits, _, values = model(**input_kwargs, return_dict=True, use_cache=False)
+
+ logprobs = logprobs_from_logits(logits[:, :-1, :], input_ids[:, 1:])
+ masks = torch.zeros_like(attention_mask)
+ masks[:, :-1] = attention_mask[:, 1:]
+
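+            # restrict the mask to response tokens: positions before the response starts
+            # and after it ends (padding) are zeroed out for every sample in the mini-batch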
+ for j in range(len(query_batch)):
+ start = len(query_batch[j]) - 1
+ if attention_mask[j, 0] == 0: # offset left padding
+ start += attention_mask[j, :].nonzero()[0].item()
+ end = start + len(response_batch[j])
+
+ if response_masks is not None:
+ response_masks_batch = torch.cat((torch.zeros_like(query_batch[j]), response_masks_batch[j]))[1:]
+
+ masks[j, :start] = 0
+ masks[j, end:] = 0
+ if response_masks is not None:
+ masks[j, start:end] = masks[j, start:end] * response_masks_batch[j][start:end]
+
+ if return_logits:
+ all_logits.append(logits)
+ else:
+ del logits
+
+ all_values.append(values)
+ all_logprobs.append(logprobs)
+ all_masks.append(masks)
+
+ return (
+ torch.cat(all_logprobs),
+ torch.cat(all_logits)[:, :-1] if return_logits else None,
+ torch.cat(all_values)[:, :-1],
+ torch.cat(all_masks)[:, :-1],
+ )
+
+ @override
+ def save_model(self, output_dir: Optional[str] = None) -> None:
+ r"""Save model checkpoint.
+
+ Subclass and override to inject custom behavior.
+ """
+ if output_dir is None:
+ output_dir = self.args.output_dir
+
+ if self.is_fsdp_enabled or self.is_deepspeed_enabled:
+ try:
+ state_dict = self.accelerator.get_state_dict(self.model) # must be called at all ranks
+ if self.args.should_save:
+ self._save(output_dir, state_dict=state_dict)
+ except ValueError:
+ logger.warning_rank0(
+ " stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead,"
+ " use zero_to_fp32.py to recover weights"
+ )
+ if self.args.should_save:
+ self._save(output_dir, state_dict={})
+ # remove the dummy state_dict
+ remove_dummy_checkpoint(self.args.should_save, output_dir, [WEIGHTS_NAME, SAFE_WEIGHTS_NAME])
+ self.model.save_checkpoint(output_dir)
+
+ elif self.args.should_save:
+ unwrapped_model: AutoModelForCausalLMWithValueHead = self.accelerator.unwrap_model(self.model)
+ self._save(output_dir, state_dict=unwrapped_model.state_dict())
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py
new file mode 100644
index 0000000..fa6629a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py
@@ -0,0 +1,79 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's TRL library.
+# https://github.com/huggingface/trl/blob/v0.8.0/examples/scripts/ppo.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from ...data import MultiModalDataCollatorForSeq2Seq, get_dataset, get_template_and_fix_tokenizer
+from ...extras.ploting import plot_loss
+from ...model import load_model, load_tokenizer
+from ..callbacks import fix_valuehead_checkpoint
+from ..trainer_utils import create_ref_model, create_reward_model
+from .trainer import CustomPPOTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+def run_ppo(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="ppo", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train, add_valuehead=True)
+
+ tokenizer.padding_side = "left" # use left-padding in generation while using right-padding in training
+ data_collator = MultiModalDataCollatorForSeq2Seq(template=template, model=model, **tokenizer_module)
+
+ # Create reference model and reward model
+ ref_model = create_ref_model(model_args, finetuning_args, add_valuehead=True)
+ reward_model = create_reward_model(model, model_args, finetuning_args)
+
+ # Initialize our Trainer
+ ppo_trainer: CustomPPOTrainer = CustomPPOTrainer(
+ model_args=model_args,
+ training_args=training_args,
+ finetuning_args=finetuning_args,
+ generating_args=generating_args,
+ callbacks=callbacks,
+ model=model,
+ reward_model=reward_model,
+ ref_model=ref_model,
+ data_collator=data_collator,
+ **dataset_module,
+ **tokenizer_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ ppo_trainer.save_model()
+ if training_args.should_save:
+ fix_valuehead_checkpoint(model, training_args.output_dir, training_args.save_safetensors)
+
+        ppo_trainer.save_state()  # must be called after save_model so that the output folder exists
+ if ppo_trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ plot_loss(training_args.output_dir, keys=["loss", "reward"])
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/pt/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/pt/__init__.py
new file mode 100644
index 0000000..1f5c289
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/pt/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_pt
+
+
+__all__ = ["run_pt"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/pt/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/pt/trainer.py
new file mode 100644
index 0000000..8495bbb
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/pt/trainer.py
@@ -0,0 +1,81 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from types import MethodType
+from typing import TYPE_CHECKING, Optional
+
+import torch
+from transformers import Trainer
+from typing_extensions import override
+
+from ...extras.packages import is_transformers_version_greater_than
+from ..callbacks import SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler
+
+
+if TYPE_CHECKING:
+ from transformers import ProcessorMixin
+
+ from ...hparams import FinetuningArguments
+
+
+class CustomTrainer(Trainer):
+ r"""Inherit Trainer for custom optimizer."""
+
+ def __init__(
+ self, finetuning_args: "FinetuningArguments", processor: Optional["ProcessorMixin"], **kwargs
+ ) -> None:
+ if is_transformers_version_greater_than("4.46"):
+ kwargs["processing_class"] = kwargs.pop("tokenizer")
+
+ super().__init__(**kwargs)
+ if processor is not None:
+ # avoid wrong loss under gradient accumulation
+ # https://github.com/huggingface/transformers/pull/36044#issuecomment-2746657112
+ self.model_accepts_loss_kwargs = False
+
+ self.finetuning_args = finetuning_args
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ @override
+ def create_optimizer(self) -> "torch.optim.Optimizer":
+ if self.optimizer is None:
+ self.optimizer = create_custom_optimizer(self.model, self.args, self.finetuning_args)
+ return super().create_optimizer()
+
+ @override
+ def create_scheduler(
+ self, num_training_steps: int, optimizer: Optional["torch.optim.Optimizer"] = None
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(self.args, num_training_steps, optimizer)
+ return super().create_scheduler(num_training_steps, optimizer)
+
+ @override
+ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
+ if self.finetuning_args.disable_shuffling:
+ return torch.utils.data.SequentialSampler(self.train_dataset)
+
+ return super()._get_train_sampler()
+
+ @override
+ def compute_loss(self, model, inputs, *args, **kwargs):
+ return super().compute_loss(model, inputs, *args, **kwargs)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/pt/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/pt/workflow.py
new file mode 100644
index 0000000..c6f4829
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/pt/workflow.py
@@ -0,0 +1,90 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/language-modeling/run_clm.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import TYPE_CHECKING, Optional
+
+from transformers import DataCollatorForLanguageModeling
+
+from ...data import get_dataset, get_template_and_fix_tokenizer
+from ...extras.ploting import plot_loss
+from ...model import load_model, load_tokenizer
+from ..trainer_utils import create_modelcard_and_push
+from .trainer import CustomTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments, ModelArguments
+
+
+def run_pt(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="pt", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
+ data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
+
+ # Initialize our Trainer
+ trainer = CustomTrainer(
+ model=model,
+ args=training_args,
+ finetuning_args=finetuning_args,
+ data_collator=data_collator,
+ callbacks=callbacks,
+ **dataset_module,
+ **tokenizer_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ trainer.save_model()
+ trainer.log_metrics("train", train_result.metrics)
+ trainer.save_metrics("train", train_result.metrics)
+ trainer.save_state()
+ if trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ keys = ["loss"]
+ if isinstance(dataset_module.get("eval_dataset"), dict):
+ keys += [f"eval_{key}_loss" for key in dataset_module["eval_dataset"].keys()]
+ else:
+ keys += ["eval_loss"]
+
+ plot_loss(training_args.output_dir, keys=keys)
+
+ # Evaluation
+ if training_args.do_eval:
+ metrics = trainer.evaluate(metric_key_prefix="eval")
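+        # perplexity is the exponential of the average eval cross-entropy loss; guard against overflow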
+ try:
+ perplexity = math.exp(metrics["eval_loss"])
+ except OverflowError:
+ perplexity = float("inf")
+
+ metrics["perplexity"] = perplexity
+ trainer.log_metrics("eval", metrics)
+ trainer.save_metrics("eval", metrics)
+
+ # Create model card
+ create_modelcard_and_push(trainer, model_args, data_args, training_args, finetuning_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/rm/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/rm/__init__.py
new file mode 100644
index 0000000..f0e8a45
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/rm/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_rm
+
+
+__all__ = ["run_rm"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/rm/metric.py b/post-training/LLaMA-Factory/src/llamafactory/train/rm/metric.py
new file mode 100644
index 0000000..a7c3c43
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/rm/metric.py
@@ -0,0 +1,51 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Optional
+
+import numpy as np
+
+from ...extras.misc import numpify
+
+
+if TYPE_CHECKING:
+ from transformers import EvalPrediction
+
+
+@dataclass
+class ComputeAccuracy:
+ r"""Compute reward accuracy and support `batch_eval_metrics`."""
+
+ def _dump(self) -> Optional[dict[str, float]]:
+ result = None
+ if hasattr(self, "score_dict"):
+ result = {k: float(np.mean(v)) for k, v in self.score_dict.items()}
+
+ self.score_dict = {"accuracy": []}
+ return result
+
+ def __post_init__(self):
+ self._dump()
+
+ def __call__(self, eval_preds: "EvalPrediction", compute_result: bool = True) -> Optional[dict[str, float]]:
+ chosen_scores, rejected_scores = numpify(eval_preds.predictions[0]), numpify(eval_preds.predictions[1])
+ if not chosen_scores.shape:
+ self.score_dict["accuracy"].append(chosen_scores > rejected_scores)
+ else:
+ for i in range(len(chosen_scores)):
+ self.score_dict["accuracy"].append(chosen_scores[i] > rejected_scores[i])
+
+ if compute_result:
+ return self._dump()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/rm/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/rm/trainer.py
new file mode 100644
index 0000000..8c14b0a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/rm/trainer.py
@@ -0,0 +1,129 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/trainer.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from types import MethodType
+from typing import TYPE_CHECKING, Optional, Union
+
+import torch
+from transformers import Trainer
+from typing_extensions import override
+
+from ...extras import logging
+from ...extras.packages import is_transformers_version_greater_than
+from ..callbacks import FixValueHeadModelCallback, SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, ProcessorMixin
+ from transformers.trainer import PredictionOutput
+
+ from ...hparams import FinetuningArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class PairwiseTrainer(Trainer):
+ r"""Inherits Trainer to compute pairwise loss."""
+
+ def __init__(
+ self, finetuning_args: "FinetuningArguments", processor: Optional["ProcessorMixin"], **kwargs
+ ) -> None:
+ if is_transformers_version_greater_than("4.46"):
+ kwargs["processing_class"] = kwargs.pop("tokenizer")
+
+ super().__init__(**kwargs)
+ self.model_accepts_loss_kwargs = False # overwrite trainer's default behavior
+ self.finetuning_args = finetuning_args
+ self.can_return_loss = True # override property to return eval_loss
+ self.add_callback(FixValueHeadModelCallback)
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ @override
+ def create_optimizer(self) -> "torch.optim.Optimizer":
+ if self.optimizer is None:
+ self.optimizer = create_custom_optimizer(self.model, self.args, self.finetuning_args)
+ return super().create_optimizer()
+
+ @override
+ def create_scheduler(
+ self, num_training_steps: int, optimizer: Optional["torch.optim.Optimizer"] = None
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(self.args, num_training_steps, optimizer)
+ return super().create_scheduler(num_training_steps, optimizer)
+
+ @override
+ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
+ if self.finetuning_args.disable_shuffling:
+ return torch.utils.data.SequentialSampler(self.train_dataset)
+
+ return super()._get_train_sampler()
+
+ @override
+ def compute_loss(
+ self, model: "PreTrainedModel", inputs: dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
+ ) -> Union["torch.Tensor", tuple["torch.Tensor", list["torch.Tensor"]]]:
+ r"""Compute pairwise loss. The first n examples are chosen and the last n examples are rejected.
+
+ Subclass and override to inject custom behavior.
+
+ Note that the first element will be removed from the output tuple.
+ See: https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/trainer.py#L3842
+ """
+ _, _, values = model(**inputs, output_hidden_states=True, return_dict=True, use_cache=False)
+ batch_size = inputs["input_ids"].size(0) // 2
+ chosen_masks, rejected_masks = torch.split(inputs["attention_mask"], batch_size, dim=0)
+ chosen_rewards, rejected_rewards = torch.split(values, batch_size, dim=0)
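+        # use the value-head output at the last non-padding token as the scalar score of each sequence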
+ chosen_scores = chosen_rewards.gather(dim=-1, index=(chosen_masks.sum(dim=-1, keepdim=True) - 1))
+ rejected_scores = rejected_rewards.gather(dim=-1, index=(rejected_masks.sum(dim=-1, keepdim=True) - 1))
+ chosen_scores, rejected_scores = chosen_scores.squeeze(), rejected_scores.squeeze()
+
+ loss = -torch.nn.functional.logsigmoid(chosen_scores.float() - rejected_scores.float()).mean()
+ if return_outputs:
+ return loss, (loss, chosen_scores, rejected_scores)
+ else:
+ return loss
+
+ def save_predictions(self, predict_results: "PredictionOutput") -> None:
+ r"""Save model predictions to `output_dir`.
+
+        A custom behavior that is not contained in Seq2SeqTrainer.
+ """
+ if not self.is_world_process_zero():
+ return
+
+ output_prediction_file = os.path.join(self.args.output_dir, "generated_predictions.jsonl")
+ logger.info_rank0(f"Saving prediction results to {output_prediction_file}")
+ chosen_scores, rejected_scores = predict_results.predictions
+
+ with open(output_prediction_file, "w", encoding="utf-8") as writer:
+ res: list[str] = []
+ for c_score, r_score in zip(chosen_scores, rejected_scores):
+ res.append(json.dumps({"chosen": round(float(c_score), 2), "rejected": round(float(r_score), 2)}))
+
+ writer.write("\n".join(res))
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/rm/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/rm/workflow.py
new file mode 100644
index 0000000..18d562e
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/rm/workflow.py
@@ -0,0 +1,101 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/summarization/run_summarization.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from ...data import PairwiseDataCollatorWithPadding, get_dataset, get_template_and_fix_tokenizer
+from ...extras.ploting import plot_loss
+from ...model import load_model, load_tokenizer
+from ..callbacks import fix_valuehead_checkpoint
+from ..trainer_utils import create_modelcard_and_push
+from .metric import ComputeAccuracy
+from .trainer import PairwiseTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments, ModelArguments
+
+
+def run_rm(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="rm", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train, add_valuehead=True)
+ data_collator = PairwiseDataCollatorWithPadding(
+ template=template, model=model, pad_to_multiple_of=8, **tokenizer_module
+ )
+
+ # Update arguments
+ training_args.remove_unused_columns = False # important for multimodal and pairwise datasets
+
+ # Initialize our Trainer
+ trainer = PairwiseTrainer(
+ model=model,
+ args=training_args,
+ finetuning_args=finetuning_args,
+ data_collator=data_collator,
+ callbacks=callbacks,
+ compute_metrics=ComputeAccuracy(),
+ **dataset_module,
+ **tokenizer_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ trainer.save_model()
+ if training_args.should_save:
+ fix_valuehead_checkpoint(model, training_args.output_dir, training_args.save_safetensors)
+
+ trainer.log_metrics("train", train_result.metrics)
+ trainer.save_metrics("train", train_result.metrics)
+ trainer.save_state()
+ if trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ keys = ["loss"]
+ if isinstance(dataset_module.get("eval_dataset"), dict):
+ keys += sum(
+ [[f"eval_{key}_loss", f"eval_{key}_accuracy"] for key in dataset_module["eval_dataset"].keys()], []
+ )
+ else:
+ keys += ["eval_loss", "eval_accuracy"]
+
+ plot_loss(training_args.output_dir, keys=keys)
+
+ # Evaluation
+ if training_args.do_eval:
+ metrics = trainer.evaluate(metric_key_prefix="eval")
+ trainer.log_metrics("eval", metrics)
+ trainer.save_metrics("eval", metrics)
+
+ # Predict
+ if training_args.do_predict:
+ predict_results = trainer.predict(dataset_module["eval_dataset"], metric_key_prefix="predict")
+ trainer.log_metrics("predict", predict_results.metrics)
+ trainer.save_metrics("predict", predict_results.metrics)
+ trainer.save_predictions(predict_results)
+
+ # Create model card
+ create_modelcard_and_push(trainer, model_args, data_args, training_args, finetuning_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/sft/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/train/sft/__init__.py
new file mode 100644
index 0000000..6107a9a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/sft/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .workflow import run_sft
+
+
+__all__ = ["run_sft"]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/sft/metric.py b/post-training/LLaMA-Factory/src/llamafactory/train/sft/metric.py
new file mode 100644
index 0000000..f4f73ee
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/sft/metric.py
@@ -0,0 +1,134 @@
+# Copyright 2025 HuggingFace Inc., THUDM, and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library and THUDM's ChatGLM implementation.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/summarization/run_summarization.py
+# https://github.com/THUDM/ChatGLM-6B/blob/main/ptuning/main.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from dataclasses import dataclass
+from typing import TYPE_CHECKING, Optional
+
+import numpy as np
+import torch
+from transformers.utils import is_jieba_available, is_nltk_available
+
+from ...extras.constants import IGNORE_INDEX
+from ...extras.misc import numpify
+from ...extras.packages import is_rouge_available
+
+
+if TYPE_CHECKING:
+ from transformers import EvalPrediction, PreTrainedTokenizer
+
+
+if is_jieba_available():
+ import jieba # type: ignore
+
+
+if is_nltk_available():
+ from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu # type: ignore
+
+
+if is_rouge_available():
+ from rouge_chinese import Rouge # type: ignore
+
+
+def eval_logit_processor(logits: "torch.Tensor", labels: "torch.Tensor") -> "torch.Tensor":
+ r"""Compute the token with the largest likelihood to reduce memory footprint."""
+ if isinstance(logits, (list, tuple)):
+ if logits[0].dim() == 3: # (batch_size, seq_len, vocab_size)
+ logits = logits[0]
+ else: # moe models have aux loss
+ logits = logits[1]
+
+ if logits.dim() != 3:
+ raise ValueError("Cannot process the logits.")
+
+ return torch.argmax(logits, dim=-1)
+
+
+@dataclass
+class ComputeAccuracy:
+ r"""Compute accuracy and support `batch_eval_metrics`."""
+
+ def _dump(self) -> Optional[dict[str, float]]:
+ result = None
+ if hasattr(self, "score_dict"):
+ result = {k: float(np.mean(v)) for k, v in self.score_dict.items()}
+
+ self.score_dict = {"accuracy": []}
+ return result
+
+ def __post_init__(self):
+ self._dump()
+
+ def __call__(self, eval_preds: "EvalPrediction", compute_result: bool = True) -> Optional[dict[str, float]]:
+ preds, labels = numpify(eval_preds.predictions), numpify(eval_preds.label_ids)
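+ # preds are token-id predictions (see `eval_logit_processor`), so position t predicts the
+ # token at position t + 1; dropping the last pred and the first label aligns the two, and
+ # IGNORE_INDEX positions (prompt / padding) are excluded from the accuracy.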
+ for i in range(len(preds)):
+ pred, label = preds[i, :-1], labels[i, 1:]
+ label_mask = label != IGNORE_INDEX
+ self.score_dict["accuracy"].append(np.mean(pred[label_mask] == label[label_mask]))
+
+ if compute_result:
+ return self._dump()
+
+
+@dataclass
+class ComputeSimilarity:
+ r"""Compute text similarity scores and support `batch_eval_metrics`.
+
+ Wraps the tokenizer into metric functions, used in CustomSeq2SeqTrainer.
+ """
+
+ tokenizer: "PreTrainedTokenizer"
+
+ def _dump(self) -> Optional[dict[str, float]]:
+ result = None
+ if hasattr(self, "score_dict"):
+ result = {k: float(np.mean(v)) for k, v in self.score_dict.items()}
+
+ self.score_dict = {"rouge-1": [], "rouge-2": [], "rouge-l": [], "bleu-4": []}
+ return result
+
+ def __post_init__(self):
+ self._dump()
+
+ def __call__(self, eval_preds: "EvalPrediction", compute_result: bool = True) -> Optional[dict[str, float]]:
+ preds, labels = numpify(eval_preds.predictions), numpify(eval_preds.label_ids)
+
+ preds = np.where(preds != IGNORE_INDEX, preds, self.tokenizer.pad_token_id)
+ labels = np.where(labels != IGNORE_INDEX, labels, self.tokenizer.pad_token_id)
+
+ decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
+ decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
+
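+ # jieba word segmentation lets rouge_chinese score word-level n-grams, while BLEU-4 below
+ # is computed over raw characters with smoothing.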
+ for pred, label in zip(decoded_preds, decoded_labels):
+ hypothesis = list(jieba.cut(pred))
+ reference = list(jieba.cut(label))
+
+ if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
+ result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
+ else:
+ rouge = Rouge()
+ scores = rouge.get_scores(" ".join(hypothesis), " ".join(reference))
+ result = scores[0]
+
+ for k, v in result.items():
+ self.score_dict[k].append(round(v["f"] * 100, 4))
+
+ bleu_score = sentence_bleu([list(label)], list(pred), smoothing_function=SmoothingFunction().method3)
+ self.score_dict["bleu-4"].append(round(bleu_score * 100, 4))
+
+ if compute_result:
+ return self._dump()
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/sft/trainer.py b/post-training/LLaMA-Factory/src/llamafactory/train/sft/trainer.py
new file mode 100644
index 0000000..fece1bd
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/sft/trainer.py
@@ -0,0 +1,165 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/trainer_seq2seq.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from types import MethodType
+from typing import TYPE_CHECKING, Any, Optional, Union
+
+import numpy as np
+import torch
+from transformers import Seq2SeqTrainer
+from typing_extensions import override
+
+from ...extras import logging
+from ...extras.constants import IGNORE_INDEX
+from ...extras.packages import is_transformers_version_greater_than
+from ..callbacks import SaveProcessorCallback
+from ..trainer_utils import create_custom_optimizer, create_custom_scheduler
+
+
+if TYPE_CHECKING:
+ from torch.utils.data import Dataset
+ from transformers import PreTrainedTokenizer, ProcessorMixin
+ from transformers.trainer import PredictionOutput
+
+ from ...hparams import FinetuningArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class CustomSeq2SeqTrainer(Seq2SeqTrainer):
+ r"""Inherits Seq2SeqTrainer to compute generative metrics such as BLEU and ROUGE."""
+
+ def __init__(
+ self,
+ finetuning_args: "FinetuningArguments",
+ processor: Optional["ProcessorMixin"],
+ gen_kwargs: Optional[dict[str, Any]] = None,
+ **kwargs,
+ ) -> None:
+ if is_transformers_version_greater_than("4.46"):
+ kwargs["processing_class"] = kwargs.pop("tokenizer")
+ else:
+ self.processing_class: PreTrainedTokenizer = kwargs.get("tokenizer")
+
+ super().__init__(**kwargs)
+ if processor is not None:
+ # avoid wrong loss under gradient accumulation
+ # https://github.com/huggingface/transformers/pull/36044#issuecomment-2746657112
+ self.model_accepts_loss_kwargs = False
+
+ self.finetuning_args = finetuning_args
+ if gen_kwargs is not None:
+ # https://github.com/huggingface/transformers/blob/v4.45.0/src/transformers/trainer_seq2seq.py#L287
+ self._gen_kwargs = gen_kwargs
+
+ if processor is not None:
+ self.add_callback(SaveProcessorCallback(processor))
+
+ if finetuning_args.use_badam:
+ from badam import BAdamCallback, clip_grad_norm_old_version # type: ignore
+
+ self.accelerator.clip_grad_norm_ = MethodType(clip_grad_norm_old_version, self.accelerator)
+ self.add_callback(BAdamCallback)
+
+ @override
+ def create_optimizer(self) -> "torch.optim.Optimizer":
+ if self.optimizer is None:
+ self.optimizer = create_custom_optimizer(self.model, self.args, self.finetuning_args)
+ return super().create_optimizer()
+
+ @override
+ def create_scheduler(
+ self, num_training_steps: int, optimizer: Optional["torch.optim.Optimizer"] = None
+ ) -> "torch.optim.lr_scheduler.LRScheduler":
+ create_custom_scheduler(self.args, num_training_steps, optimizer)
+ return super().create_scheduler(num_training_steps, optimizer)
+
+ @override
+ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
+ if self.finetuning_args.disable_shuffling:
+ return torch.utils.data.SequentialSampler(self.train_dataset)
+
+ return super()._get_train_sampler()
+
+ @override
+ def compute_loss(self, model, inputs, *args, **kwargs):
+ return super().compute_loss(model, inputs, *args, **kwargs)
+
+ @override
+ def prediction_step(
+ self,
+ model: "torch.nn.Module",
+ inputs: dict[str, Union["torch.Tensor", Any]],
+ prediction_loss_only: bool,
+ ignore_keys: Optional[list[str]] = None,
+ **gen_kwargs,
+ ) -> tuple[Optional[float], Optional["torch.Tensor"], Optional["torch.Tensor"]]:
+ r"""Remove the prompt part in the generated tokens.
+
+ Subclass and override to inject custom behavior.
+ """
+ if self.args.predict_with_generate: # do not pass labels to model when generate
+ labels = inputs.pop("labels", None)
+ else:
+ labels = inputs.get("labels")
+
+ loss, generated_tokens, _ = super().prediction_step(
+ model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys, **gen_kwargs
+ )
+ if generated_tokens is not None and self.args.predict_with_generate:
+ generated_tokens[:, : inputs["input_ids"].size(-1)] = self.processing_class.pad_token_id
+ generated_tokens = generated_tokens.contiguous()
+
+ return loss, generated_tokens, labels
+
+ def save_predictions(
+ self, dataset: "Dataset", predict_results: "PredictionOutput", skip_special_tokens: bool = True
+ ) -> None:
+ r"""Save model predictions to `output_dir`.
+
+ A custom behavior that is not contained in Seq2SeqTrainer.
+ """
+ if not self.is_world_process_zero():
+ return
+
+ output_prediction_file = os.path.join(self.args.output_dir, "generated_predictions.jsonl")
+ logger.info_rank0(f"Saving prediction results to {output_prediction_file}")
+
+ labels = np.where(
+ predict_results.label_ids != IGNORE_INDEX, predict_results.label_ids, self.processing_class.pad_token_id
+ )
+ preds = np.where(
+ predict_results.predictions != IGNORE_INDEX,
+ predict_results.predictions,
+ self.processing_class.pad_token_id,
+ )
+
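+ # With `predict_with_generate`, prediction_step replaced the prompt tokens with pad ids, so
+ # each row starts with pads; rotate them to the end so decoding yields the response first.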
+ for i in range(len(preds)):
+ pad_len = np.nonzero(preds[i] != self.processing_class.pad_token_id)[0]
+ if len(pad_len): # move pad token to last
+ preds[i] = np.concatenate((preds[i][pad_len[0] :], preds[i][: pad_len[0]]), axis=-1)
+
+ decoded_inputs = self.processing_class.batch_decode(dataset["input_ids"], skip_special_tokens=False)
+ decoded_preds = self.processing_class.batch_decode(preds, skip_special_tokens=skip_special_tokens)
+ decoded_labels = self.processing_class.batch_decode(labels, skip_special_tokens=skip_special_tokens)
+
+ with open(output_prediction_file, "w", encoding="utf-8") as f:
+ for text, pred, label in zip(decoded_inputs, decoded_preds, decoded_labels):
+ f.write(json.dumps({"prompt": text, "predict": pred, "label": label}, ensure_ascii=False) + "\n")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/sft/workflow.py b/post-training/LLaMA-Factory/src/llamafactory/train/sft/workflow.py
new file mode 100644
index 0000000..9e6c549
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/sft/workflow.py
@@ -0,0 +1,140 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by HuggingFace's transformers library.
+# https://github.com/huggingface/transformers/blob/v4.40.0/examples/pytorch/summarization/run_summarization.py
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional
+
+from ...data import SFTDataCollatorWith4DAttentionMask, get_dataset, get_template_and_fix_tokenizer
+from ...extras.constants import IGNORE_INDEX
+from ...extras.logging import get_logger
+from ...extras.misc import calculate_tps
+from ...extras.ploting import plot_loss
+from ...model import load_model, load_tokenizer
+from ..trainer_utils import create_modelcard_and_push
+from .metric import ComputeAccuracy, ComputeSimilarity, eval_logit_processor
+from .trainer import CustomSeq2SeqTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import Seq2SeqTrainingArguments, TrainerCallback
+
+ from ...hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
+
+
+logger = get_logger(__name__)
+
+
+def run_sft(
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "Seq2SeqTrainingArguments",
+ finetuning_args: "FinetuningArguments",
+ generating_args: "GeneratingArguments",
+ callbacks: Optional[list["TrainerCallback"]] = None,
+):
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
+ model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
+
+ if getattr(model, "is_quantized", False) and not training_args.do_train:
+ setattr(model, "_hf_peft_config_loaded", True) # hack here: make model compatible with prediction
+
+ data_collator = SFTDataCollatorWith4DAttentionMask(
+ template=template,
+ model=model if not training_args.predict_with_generate else None,
+ pad_to_multiple_of=8 if training_args.do_train else None, # for shift short attention
+ label_pad_token_id=IGNORE_INDEX if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id,
+ block_diag_attn=model_args.block_diag_attn,
+ attn_implementation=getattr(model.config, "_attn_implementation", None),
+ compute_dtype=model_args.compute_dtype,
+ **tokenizer_module,
+ )
+
+ # Override the decoding parameters of Seq2SeqTrainer
+ training_args.generation_max_length = training_args.generation_max_length or data_args.cutoff_len
+ training_args.generation_num_beams = data_args.eval_num_beams or training_args.generation_num_beams
+ training_args.remove_unused_columns = False # important for multimodal datasets
+
+ # Metric utils
+ metric_module = {}
+ if training_args.predict_with_generate:
+ metric_module["compute_metrics"] = ComputeSimilarity(tokenizer=tokenizer)
+ elif finetuning_args.compute_accuracy:
+ metric_module["compute_metrics"] = ComputeAccuracy()
+ metric_module["preprocess_logits_for_metrics"] = eval_logit_processor
+
+ # Keyword arguments for `model.generate`
+ gen_kwargs = generating_args.to_dict(obey_generation_config=True)
+ gen_kwargs["eos_token_id"] = [tokenizer.eos_token_id] + tokenizer.additional_special_tokens_ids
+ gen_kwargs["pad_token_id"] = tokenizer.pad_token_id
+
+ # Initialize our Trainer
+ trainer = CustomSeq2SeqTrainer(
+ model=model,
+ args=training_args,
+ finetuning_args=finetuning_args,
+ data_collator=data_collator,
+ callbacks=callbacks,
+ gen_kwargs=gen_kwargs,
+ **dataset_module,
+ **tokenizer_module,
+ **metric_module,
+ )
+
+ # Training
+ if training_args.do_train:
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
+ trainer.save_model()
+ if finetuning_args.include_effective_tokens_per_second:
+ train_result.metrics["effective_tokens_per_sec"] = calculate_tps(
+ dataset_module["train_dataset"], train_result.metrics, stage="sft"
+ )
+
+ trainer.log_metrics("train", train_result.metrics)
+ trainer.save_metrics("train", train_result.metrics)
+ trainer.save_state()
+ if trainer.is_world_process_zero() and finetuning_args.plot_loss:
+ keys = ["loss"]
+ if isinstance(dataset_module.get("eval_dataset"), dict):
+ keys += sum(
+ [[f"eval_{key}_loss", f"eval_{key}_accuracy"] for key in dataset_module["eval_dataset"].keys()], []
+ )
+ else:
+ keys += ["eval_loss", "eval_accuracy"]
+
+ plot_loss(training_args.output_dir, keys=keys)
+
+ if training_args.predict_with_generate:
+ tokenizer.padding_side = "left" # use left-padding in generation
+
+ # Evaluation
+ if training_args.do_eval:
+ metrics = trainer.evaluate(metric_key_prefix="eval", **gen_kwargs)
+ trainer.log_metrics("eval", metrics)
+ trainer.save_metrics("eval", metrics)
+
+ # Predict
+ if training_args.do_predict:
+ logger.warning_rank0_once("Batch generation can be very slow. Consider using `scripts/vllm_infer.py` instead.")
+ predict_results = trainer.predict(dataset_module["eval_dataset"], metric_key_prefix="predict", **gen_kwargs)
+ trainer.log_metrics("predict", predict_results.metrics)
+ trainer.save_metrics("predict", predict_results.metrics)
+ trainer.save_predictions(dataset_module["eval_dataset"], predict_results, generating_args.skip_special_tokens)
+
+ # Create model card
+ create_modelcard_and_push(trainer, model_args, data_args, training_args, finetuning_args)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/test_utils.py b/post-training/LLaMA-Factory/src/llamafactory/train/test_utils.py
new file mode 100644
index 0000000..6e4c4ff
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/test_utils.py
@@ -0,0 +1,119 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Optional, Union
+
+import torch
+from peft import PeftModel
+from transformers import AutoModelForCausalLM
+from trl import AutoModelForCausalLMWithValueHead
+
+from ..data import get_dataset, get_template_and_fix_tokenizer
+from ..extras.misc import get_current_device
+from ..hparams import get_infer_args, get_train_args
+from ..model import load_model, load_tokenizer
+
+
+if TYPE_CHECKING:
+ from peft import LoraModel
+ from transformers import PreTrainedModel
+
+ from ..data.data_utils import DatasetModule
+
+
+def compare_model(model_a: "torch.nn.Module", model_b: "torch.nn.Module", diff_keys: list[str] = []) -> None:
+ state_dict_a = model_a.state_dict()
+ state_dict_b = model_b.state_dict()
+ assert set(state_dict_a.keys()) == set(state_dict_b.keys())
+ for name in state_dict_a.keys():
+ if any(key in name for key in diff_keys):
+ assert torch.allclose(state_dict_a[name], state_dict_b[name], rtol=1e-4, atol=1e-5) is False
+ else:
+ assert torch.allclose(state_dict_a[name], state_dict_b[name], rtol=1e-4, atol=1e-5) is True
+
+
+def check_lora_model(model: "LoraModel") -> tuple[set[str], set[str]]:
+ linear_modules, extra_modules = set(), set()
+ for name, param in model.named_parameters():
+ if any(module in name for module in ["lora_A", "lora_B"]):
+ linear_modules.add(name.split(".lora_", maxsplit=1)[0].split(".")[-1])
+ assert param.requires_grad is True
+ assert param.dtype == torch.float32
+ elif "modules_to_save" in name:
+ extra_modules.add(name.split(".modules_to_save", maxsplit=1)[0].split(".")[-1])
+ assert param.requires_grad is True
+ assert param.dtype == torch.float32
+ else:
+ assert param.requires_grad is False
+ assert param.dtype == torch.float16
+
+ return linear_modules, extra_modules
+
+
+def load_train_model(add_valuehead: bool = False, **kwargs) -> "PreTrainedModel":
+ model_args, _, _, finetuning_args, _ = get_train_args(kwargs)
+ tokenizer = load_tokenizer(model_args)["tokenizer"]
+ return load_model(tokenizer, model_args, finetuning_args, is_trainable=True, add_valuehead=add_valuehead)
+
+
+def load_infer_model(add_valuehead: bool = False, **kwargs) -> "PreTrainedModel":
+ model_args, _, finetuning_args, _ = get_infer_args(kwargs)
+ tokenizer = load_tokenizer(model_args)["tokenizer"]
+ return load_model(tokenizer, model_args, finetuning_args, is_trainable=False, add_valuehead=add_valuehead)
+
+
+def load_reference_model(
+ model_path: str,
+ lora_path: Optional[str] = None,
+ use_lora: bool = False,
+ use_pissa: bool = False,
+ is_trainable: bool = False,
+ add_valuehead: bool = False,
+) -> Union["PreTrainedModel", "LoraModel"]:
+ current_device = get_current_device()
+ if add_valuehead:
+ model: AutoModelForCausalLMWithValueHead = AutoModelForCausalLMWithValueHead.from_pretrained(
+ model_path, torch_dtype=torch.float16, device_map=current_device
+ )
+ if not is_trainable:
+ model.v_head = model.v_head.to(torch.float16)
+
+ return model
+
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map=current_device)
+ if use_lora or use_pissa:
+ model = PeftModel.from_pretrained(
+ model, lora_path, subfolder="pissa_init" if use_pissa else None, is_trainable=is_trainable
+ )
+ for param in filter(lambda p: p.requires_grad, model.parameters()):
+ param.data = param.data.to(torch.float32)
+
+ return model
+
+
+def load_dataset_module(**kwargs) -> "DatasetModule":
+ model_args, data_args, training_args, _, _ = get_train_args(kwargs)
+ tokenizer_module = load_tokenizer(model_args)
+ template = get_template_and_fix_tokenizer(tokenizer_module["tokenizer"], data_args)
+ dataset_module = get_dataset(template, model_args, data_args, training_args, kwargs["stage"], **tokenizer_module)
+ return dataset_module
+
+
+def patch_valuehead_model() -> None:
+ def post_init(self: "AutoModelForCausalLMWithValueHead", state_dict: dict[str, "torch.Tensor"]) -> None:
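+ # keep only the value-head weights: strip the "v_head." prefix (7 characters) and load them
+ # into the v_head module, ignoring all backbone weights in the checkpoint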
+ state_dict = {k[7:]: state_dict[k] for k in state_dict.keys() if k.startswith("v_head.")}
+ self.v_head.load_state_dict(state_dict, strict=False)
+ del state_dict
+
+ AutoModelForCausalLMWithValueHead.post_init = post_init
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/trainer_utils.py b/post-training/LLaMA-Factory/src/llamafactory/train/trainer_utils.py
new file mode 100644
index 0000000..89459a8
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/trainer_utils.py
@@ -0,0 +1,665 @@
+# Copyright 2025 HuggingFace Inc. and the LlamaFactory team.
+#
+# This code is inspired by the original GaLore implementation: https://github.com/jiaweizzhao/GaLore
+# and the original LoRA+ implementation: https://github.com/nikhil-ghosh-berkeley/loraplus
+# and the original BAdam implementation: https://github.com/Ledzy/BAdam
+# and HuggingFace's TRL library: https://github.com/huggingface/trl
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections.abc import Mapping
+from pathlib import Path
+from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+
+import torch
+from transformers import Trainer
+from transformers.integrations import is_deepspeed_zero3_enabled
+from transformers.modeling_utils import is_fsdp_enabled
+from transformers.optimization import get_scheduler
+from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
+from transformers.trainer_pt_utils import get_parameter_names
+from typing_extensions import override
+
+from ..extras import logging
+from ..extras.constants import IGNORE_INDEX, SWANLAB_CONFIG
+from ..extras.packages import is_apollo_available, is_galore_available, is_ray_available
+from ..hparams import FinetuningArguments, ModelArguments
+from ..model import find_all_linear_modules, load_model, load_tokenizer, load_valuehead_params
+
+
+if is_galore_available():
+ from galore_torch import GaLoreAdafactor, GaLoreAdamW, GaLoreAdamW8bit # type: ignore
+
+
+if is_apollo_available():
+ from apollo_torch import APOLLOAdamW # type: ignore
+
+
+if is_ray_available():
+ import ray
+ from ray.train import RunConfig, ScalingConfig
+ from ray.train.torch import TorchTrainer
+
+
+if TYPE_CHECKING:
+ from transformers import PreTrainedModel, TrainerCallback, TrainerState
+ from trl import AutoModelForCausalLMWithValueHead
+
+ from ..hparams import DataArguments, RayArguments, TrainingArguments
+
+
+logger = logging.get_logger(__name__)
+
+
+class DummyOptimizer(torch.optim.Optimizer):
+ r"""A dummy optimizer used for the GaLore or APOLLO algorithm."""
+
+ def __init__(
+ self, lr: float = 1e-3, optimizer_dict: Optional[dict["torch.nn.Parameter", "torch.optim.Optimizer"]] = None
+ ) -> None:
+ dummy_tensor = torch.randn(1, 1)
+ self.optimizer_dict = optimizer_dict
+ super().__init__([dummy_tensor], {"lr": lr})
+
+ @override
+ def zero_grad(self, set_to_none: bool = True) -> None:
+ pass
+
+ @override
+ def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
+ pass
+
+
+def create_modelcard_and_push(
+ trainer: "Trainer",
+ model_args: "ModelArguments",
+ data_args: "DataArguments",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> None:
+ kwargs = {
+ "tasks": "text-generation",
+ "finetuned_from": model_args.model_name_or_path,
+ "tags": ["llama-factory", finetuning_args.finetuning_type],
+ }
+ if data_args.dataset is not None:
+ kwargs["dataset"] = data_args.dataset
+
+ if model_args.use_unsloth:
+ kwargs["tags"] = kwargs["tags"] + ["unsloth"]
+
+ if not training_args.do_train:
+ pass
+ elif training_args.push_to_hub:
+ trainer.push_to_hub(**kwargs)
+ else:
+ trainer.create_model_card(license="other", **kwargs) # prevent from connecting to hub
+
+
+def create_ref_model(
+ model_args: "ModelArguments", finetuning_args: "FinetuningArguments", add_valuehead: bool = False
+) -> Optional[Union["PreTrainedModel", "AutoModelForCausalLMWithValueHead"]]:
+ r"""Create reference model for PPO/DPO training. Evaluation mode is not supported.
+
+ The valuehead parameter is randomly initialized since it is useless for PPO training.
+ """
+ if finetuning_args.ref_model is not None:
+ ref_model_args = ModelArguments.copyfrom(
+ model_args,
+ model_name_or_path=finetuning_args.ref_model,
+ adapter_name_or_path=finetuning_args.ref_model_adapters,
+ quantization_bit=finetuning_args.ref_model_quantization_bit,
+ )
+ ref_finetuning_args = FinetuningArguments()
+ tokenizer = load_tokenizer(ref_model_args)["tokenizer"]
+ ref_model = load_model(
+ tokenizer, ref_model_args, ref_finetuning_args, is_trainable=False, add_valuehead=add_valuehead
+ )
+ logger.info_rank0(f"Created reference model from {finetuning_args.ref_model}")
+ else:
+ if finetuning_args.finetuning_type == "lora":
+ ref_model = None
+ else:
+ ref_model_args = ModelArguments.copyfrom(model_args)
+ ref_finetuning_args = FinetuningArguments()
+ tokenizer = load_tokenizer(ref_model_args)["tokenizer"]
+ ref_model = load_model(
+ tokenizer, ref_model_args, ref_finetuning_args, is_trainable=False, add_valuehead=add_valuehead
+ )
+ logger.info_rank0("Created reference model from the model itself.")
+
+ return ref_model
+
+
+def create_reward_model(
+ model: "AutoModelForCausalLMWithValueHead", model_args: "ModelArguments", finetuning_args: "FinetuningArguments"
+) -> Optional["AutoModelForCausalLMWithValueHead"]:
+ r"""Create reward model for PPO training."""
+ if finetuning_args.reward_model_type == "api":
+ assert finetuning_args.reward_model.startswith("http"), "Please provide full url."
+ logger.info_rank0(f"Use reward server {finetuning_args.reward_model}")
+ return finetuning_args.reward_model
+ elif finetuning_args.reward_model_type == "lora":
+ model.pretrained_model.load_adapter(finetuning_args.reward_model, "reward")
+ for name, param in model.named_parameters(): # https://github.com/huggingface/peft/issues/1090
+ if "default" in name:
+ param.data = param.data.to(torch.float32) # trainable params should be in fp32
+ vhead_params = load_valuehead_params(finetuning_args.reward_model, model_args)
+ assert vhead_params is not None, "Reward model is not correctly loaded."
+ model.register_buffer("reward_head_weight", vhead_params["v_head.summary.weight"], persistent=False)
+ model.register_buffer("reward_head_bias", vhead_params["v_head.summary.bias"], persistent=False)
+ model.register_buffer(
+ "default_head_weight", torch.zeros_like(vhead_params["v_head.summary.weight"]), persistent=False
+ )
+ model.register_buffer(
+ "default_head_bias", torch.zeros_like(vhead_params["v_head.summary.bias"]), persistent=False
+ )
+ logger.info_rank0(f"Loaded adapter weights of reward model from {finetuning_args.reward_model}")
+ return None
+ else:
+ reward_model_args = ModelArguments.copyfrom(
+ model_args,
+ model_name_or_path=finetuning_args.reward_model,
+ adapter_name_or_path=finetuning_args.reward_model_adapters,
+ quantization_bit=finetuning_args.reward_model_quantization_bit,
+ )
+ reward_finetuning_args = FinetuningArguments()
+ tokenizer = load_tokenizer(reward_model_args)["tokenizer"]
+ reward_model = load_model(
+ tokenizer, reward_model_args, reward_finetuning_args, is_trainable=False, add_valuehead=True
+ )
+ logger.info_rank0(f"Loaded full weights of reward model from {finetuning_args.reward_model}")
+ logger.warning_rank0("Please ensure the ppo model and reward model share SAME tokenizer and vocabulary.")
+ return reward_model
+
+
+def _get_decay_parameter_names(model: "PreTrainedModel") -> list[str]:
+ r"""Return a list of names of parameters with weight decay. (weights in non-layernorm layers)."""
+ decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)
+ decay_parameters = [name for name in decay_parameters if "bias" not in name]
+ return decay_parameters
+
+
+def _create_galore_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> "torch.optim.Optimizer":
+ if len(finetuning_args.galore_target) == 1 and finetuning_args.galore_target[0] == "all":
+ galore_targets = find_all_linear_modules(model, finetuning_args.freeze_vision_tower)
+ else:
+ galore_targets = finetuning_args.galore_target
+
+ galore_params: list[torch.nn.Parameter] = []
+ for name, module in model.named_modules():
+ if isinstance(module, torch.nn.Linear) and any(target in name for target in galore_targets):
+ for param in module.parameters():
+ if param.requires_grad and len(param.shape) > 1:
+ galore_params.append(param)
+
+ galore_kwargs = {
+ "rank": finetuning_args.galore_rank,
+ "update_proj_gap": finetuning_args.galore_update_interval,
+ "scale": finetuning_args.galore_scale,
+ "proj_type": finetuning_args.galore_proj_type,
+ }
+
+ id_galore_params = {id(param) for param in galore_params}
+ decay_params, nodecay_params = [], [] # they are non-galore parameters
+ trainable_params: list[torch.nn.Parameter] = [] # galore_params + decay_params + nodecay_params
+ decay_param_names = _get_decay_parameter_names(model)
+ for name, param in model.named_parameters():
+ if param.requires_grad:
+ trainable_params.append(param)
+ if id(param) not in id_galore_params:
+ if name in decay_param_names:
+ decay_params.append(param)
+ else:
+ nodecay_params.append(param)
+
+ _, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
+
+ if training_args.optim == "adamw_torch":
+ optim_class = GaLoreAdamW
+ elif training_args.optim in ["adamw_bnb_8bit", "adamw_8bit", "paged_adamw_8bit"]:
+ optim_class = GaLoreAdamW8bit
+ elif training_args.optim == "adafactor":
+ optim_class = GaLoreAdafactor
+ else:
+ raise NotImplementedError(f"Unknown optim: {training_args.optim}.")
+
+ if finetuning_args.galore_layerwise:
+ logger.warning_rank0("The displayed gradient norm will be all zeros in layerwise GaLore.")
+ if training_args.gradient_accumulation_steps != 1:
+ raise ValueError("Per-layer GaLore does not support gradient accumulation.")
+
+ optimizer_dict: dict[torch.Tensor, torch.optim.Optimizer] = {}
+ for param in nodecay_params:
+ param_groups = [dict(params=[param], weight_decay=0.0)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+ for param in decay_params:
+ param_groups = [dict(params=[param], weight_decay=training_args.weight_decay)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+ for param in galore_params: # galore params have weight decay
+ param_groups = [dict(params=[param], weight_decay=training_args.weight_decay, **galore_kwargs)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+
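+ # The hook below steps (and zero-grads) each parameter's own optimizer right after its
+ # gradient is accumulated, so there is no global optimizer.step(); the DummyOptimizer
+ # returned afterwards only satisfies the Trainer interface.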
+ def optimizer_hook(param: "torch.nn.Parameter"):
+ if param.grad is not None:
+ optimizer_dict[param].step()
+ optimizer_dict[param].zero_grad()
+
+ for param in trainable_params:
+ param.register_post_accumulate_grad_hook(optimizer_hook)
+
+ optimizer = DummyOptimizer(lr=training_args.learning_rate, optimizer_dict=optimizer_dict)
+ else:
+ param_groups = [
+ dict(params=nodecay_params, weight_decay=0.0),
+ dict(params=decay_params, weight_decay=training_args.weight_decay),
+ dict(params=galore_params, weight_decay=training_args.weight_decay, **galore_kwargs),
+ ]
+ optimizer = optim_class(param_groups, **optim_kwargs)
+
+ logger.info_rank0(
+ f"Using GaLore optimizer with args: {galore_kwargs}. "
+ "It may cause hanging at the start of training, wait patiently."
+ )
+ return optimizer
+
+
+def _create_apollo_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> "torch.optim.Optimizer":
+ if len(finetuning_args.apollo_target) == 1 and finetuning_args.apollo_target[0] == "all":
+ apollo_targets = find_all_linear_modules(model, finetuning_args.freeze_vision_tower)
+ else:
+ apollo_targets = finetuning_args.apollo_target
+
+ apollo_params: list[torch.nn.Parameter] = []
+ for name, module in model.named_modules():
+ if isinstance(module, torch.nn.Linear) and any(target in name for target in apollo_targets):
+ for param in module.parameters():
+ if param.requires_grad and len(param.shape) > 1:
+ apollo_params.append(param)
+
+ apollo_kwargs = {
+ "rank": finetuning_args.apollo_rank,
+ "proj": finetuning_args.apollo_proj,
+ "proj_type": finetuning_args.apollo_proj_type,
+ "update_proj_gap": finetuning_args.apollo_update_interval,
+ "scale": finetuning_args.apollo_scale,
+ "scale_type": finetuning_args.apollo_scale_type,
+ "scale_front": finetuning_args.apollo_scale_front,
+ }
+
+ id_apollo_params = {id(param) for param in apollo_params}
+ decay_params, nodecay_params = [], [] # they are non-apollo parameters
+ trainable_params: list[torch.nn.Parameter] = [] # apollo_params + decay_params + nodecay_params
+ decay_param_names = _get_decay_parameter_names(model)
+ for name, param in model.named_parameters():
+ if param.requires_grad:
+ trainable_params.append(param)
+ if id(param) not in id_apollo_params:
+ if name in decay_param_names:
+ decay_params.append(param)
+ else:
+ nodecay_params.append(param)
+
+ _, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
+
+ if training_args.optim == "adamw_torch":
+ optim_class = APOLLOAdamW
+ else:
+ raise NotImplementedError(f"Unknown optim: {training_args.optim}.")
+
+ if finetuning_args.apollo_layerwise:
+ logger.warning_rank0("The displayed gradient norm will be all zeros in layerwise APOLLO.")
+ if training_args.gradient_accumulation_steps != 1:
+ raise ValueError("Per-layer APOLLO does not support gradient accumulation.")
+
+ optimizer_dict: dict[torch.Tensor, torch.optim.Optimizer] = {}
+ for param in nodecay_params:
+ param_groups = [dict(params=[param], weight_decay=0.0)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+ for param in decay_params:
+ param_groups = [dict(params=[param], weight_decay=training_args.weight_decay)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+ for param in apollo_params: # apollo params have weight decay
+ param_groups = [dict(params=[param], weight_decay=training_args.weight_decay, **apollo_kwargs)]
+ optimizer_dict[param] = optim_class(param_groups, **optim_kwargs)
+
+ def optimizer_hook(param: "torch.nn.Parameter"):
+ if param.grad is not None:
+ optimizer_dict[param].step()
+ optimizer_dict[param].zero_grad()
+
+ for param in trainable_params:
+ param.register_post_accumulate_grad_hook(optimizer_hook)
+
+ optimizer = DummyOptimizer(lr=training_args.learning_rate, optimizer_dict=optimizer_dict)
+ else:
+ param_groups = [
+ dict(params=nodecay_params, weight_decay=0.0),
+ dict(params=decay_params, weight_decay=training_args.weight_decay),
+ dict(params=apollo_params, weight_decay=training_args.weight_decay, **apollo_kwargs),
+ ]
+ optimizer = optim_class(param_groups, **optim_kwargs)
+
+ logger.info_rank0(f"Using APOLLO optimizer with args: {apollo_kwargs}.")
+ return optimizer
+
+
+def _create_loraplus_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> "torch.optim.Optimizer":
+ default_lr = training_args.learning_rate
+ loraplus_lr = training_args.learning_rate * finetuning_args.loraplus_lr_ratio
+ embedding_lr = finetuning_args.loraplus_lr_embedding
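+ # LoRA+: the B matrices (and other 1-D trainable params) are trained with
+ # lr * loraplus_lr_ratio, the A matrices keep the base lr, and `lora_embedding_B`
+ # uses the dedicated `loraplus_lr_embedding`.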
+
+ decay_param_names = _get_decay_parameter_names(model)
+ param_dict: dict[str, list[torch.nn.Parameter]] = {
+ "lora_a": [],
+ "lora_b": [],
+ "lora_b_nodecay": [],
+ "embedding": [],
+ }
+ for name, param in model.named_parameters():
+ if param.requires_grad:
+ if "lora_embedding_B" in name:
+ param_dict["embedding"].append(param)
+ elif "lora_B" in name or param.ndim == 1:
+ if name in decay_param_names:
+ param_dict["lora_b"].append(param)
+ else:
+ param_dict["lora_b_nodecay"].append(param)
+ else:
+ param_dict["lora_a"].append(param)
+
+ optim_class, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
+ param_groups = [
+ dict(params=param_dict["lora_a"], lr=default_lr, weight_decay=training_args.weight_decay),
+ dict(params=param_dict["lora_b"], lr=loraplus_lr, weight_decay=training_args.weight_decay),
+ dict(params=param_dict["lora_b_nodecay"], lr=loraplus_lr, weight_decay=0.0),
+ dict(params=param_dict["embedding"], lr=embedding_lr, weight_decay=training_args.weight_decay),
+ ]
+ optimizer = optim_class(param_groups, **optim_kwargs)
+ logger.info_rank0(f"Using LoRA+ optimizer with loraplus lr ratio {finetuning_args.loraplus_lr_ratio:.2f}.")
+ return optimizer
+
+
+def _create_badam_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> "torch.optim.Optimizer":
+ decay_params, nodecay_params = [], []
+ decay_param_names = _get_decay_parameter_names(model)
+ for name, param in model.named_parameters():
+ if param.requires_grad:
+ if name in decay_param_names:
+ decay_params.append(param)
+ else:
+ nodecay_params.append(param)
+
+ optim_class, optim_kwargs = Trainer.get_optimizer_cls_and_kwargs(training_args)
+ param_groups = [
+ dict(params=nodecay_params, weight_decay=0.0),
+ dict(params=decay_params, weight_decay=training_args.weight_decay),
+ ]
+
+ if finetuning_args.badam_mode == "layer":
+ from badam import BlockOptimizer # type: ignore
+
+ base_optimizer = optim_class(param_groups, **optim_kwargs)
+ optimizer = BlockOptimizer(
+ base_optimizer=base_optimizer,
+ named_parameters_list=list(model.named_parameters()),
+ block_prefix_list=None,
+ switch_block_every=finetuning_args.badam_switch_interval,
+ start_block=finetuning_args.badam_start_block,
+ switch_mode=finetuning_args.badam_switch_mode,
+ verbose=finetuning_args.badam_verbose,
+ ds_zero3_enabled=is_deepspeed_zero3_enabled(),
+ )
+ logger.info_rank0(
+ f"Using BAdam optimizer with layer-wise update, switch mode is {finetuning_args.badam_switch_mode}, "
+ f"switch block every {finetuning_args.badam_switch_interval} steps, "
+ f"default start block is {finetuning_args.badam_start_block}"
+ )
+
+ elif finetuning_args.badam_mode == "ratio":
+ from badam import BlockOptimizerRatio # type: ignore
+
+ assert finetuning_args.badam_update_ratio > 1e-6
+ optimizer = BlockOptimizerRatio(
+ param_groups=param_groups,
+ named_parameters_list=list(model.named_parameters()),
+ update_ratio=finetuning_args.badam_update_ratio,
+ mask_mode=finetuning_args.badam_mask_mode,
+ verbose=finetuning_args.badam_verbose,
+ include_embedding=False,
+ **optim_kwargs,
+ )
+ logger.info_rank0(
+ f"Using BAdam optimizer with ratio-based update, update ratio is {finetuning_args.badam_update_ratio}, "
+ f"mask mode is {finetuning_args.badam_mask_mode}"
+ )
+
+ return optimizer
+
+
+def _create_adam_mini_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+) -> "torch.optim.Optimizer":
+ from adam_mini import Adam_mini # type: ignore
+
+ hidden_size = getattr(model.config, "hidden_size", None)
+ num_q_head = getattr(model.config, "num_attention_heads", None)
+ num_kv_head = getattr(model.config, "num_key_value_heads", None)
+
+ optimizer = Adam_mini(
+ named_parameters=model.named_parameters(),
+ lr=training_args.learning_rate,
+ betas=(training_args.adam_beta1, training_args.adam_beta2),
+ eps=training_args.adam_epsilon,
+ weight_decay=training_args.weight_decay,
+ model_sharding=is_fsdp_enabled() or is_deepspeed_zero3_enabled(),
+ dim=hidden_size,
+ n_heads=num_q_head,
+ n_kv_heads=num_kv_head,
+ )
+ logger.info_rank0("Using Adam-mini optimizer.")
+ return optimizer
+
+
+def create_custom_optimizer(
+ model: "PreTrainedModel",
+ training_args: "TrainingArguments",
+ finetuning_args: "FinetuningArguments",
+) -> Optional["torch.optim.Optimizer"]:
+ if finetuning_args.use_galore:
+ return _create_galore_optimizer(model, training_args, finetuning_args)
+
+ if finetuning_args.use_apollo:
+ return _create_apollo_optimizer(model, training_args, finetuning_args)
+
+ if finetuning_args.loraplus_lr_ratio is not None:
+ return _create_loraplus_optimizer(model, training_args, finetuning_args)
+
+ if finetuning_args.use_badam:
+ return _create_badam_optimizer(model, training_args, finetuning_args)
+
+ if finetuning_args.use_adam_mini:
+ return _create_adam_mini_optimizer(model, training_args)
+
+
+def create_custom_scheduler(
+ training_args: "TrainingArguments",
+ num_training_steps: int,
+ optimizer: Optional["torch.optim.Optimizer"] = None,
+) -> None:
+ if training_args.lr_scheduler_type == "warmup_stable_decay":
+ num_warmup_steps = training_args.get_warmup_steps(num_training_steps)
+ remaining_steps = num_training_steps - num_warmup_steps
+ num_stable_steps = remaining_steps // 3 # use 1/3 for stable by default
+ num_decay_steps = remaining_steps - num_stable_steps
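+ # e.g. with 1,000 training steps and 100 warmup steps: 900 remaining -> 300 stable + 600 decay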
+ scheduler_kwargs = training_args.lr_scheduler_kwargs or {}
+ default_kwargs = {
+ "num_stable_steps": num_stable_steps,
+ "num_decay_steps": num_decay_steps,
+ }
+ for key, value in default_kwargs.items():
+ if key not in scheduler_kwargs:
+ scheduler_kwargs[key] = value
+
+ training_args.lr_scheduler_kwargs = scheduler_kwargs
+
+ if optimizer is not None and isinstance(optimizer, DummyOptimizer):
+ optimizer_dict = optimizer.optimizer_dict
+ scheduler_dict: dict[torch.nn.Parameter, torch.optim.lr_scheduler.LRScheduler] = {}
+
+ for param in optimizer_dict.keys():
+ scheduler_dict[param] = get_scheduler(
+ training_args.lr_scheduler_type,
+ optimizer=optimizer_dict[param],
+ num_warmup_steps=training_args.get_warmup_steps(num_training_steps),
+ num_training_steps=num_training_steps,
+ scheduler_specific_kwargs=training_args.lr_scheduler_kwargs,
+ )
+
+ def scheduler_hook(param: "torch.nn.Parameter"):
+ scheduler_dict[param].step()
+
+ for param in optimizer_dict.keys():
+ param.register_post_accumulate_grad_hook(scheduler_hook)
+
+
+def get_batch_logps(
+ logits: "torch.Tensor", labels: "torch.Tensor", label_pad_token_id: int = IGNORE_INDEX
+) -> tuple["torch.Tensor", "torch.Tensor"]:
+ r"""Compute the log probabilities of the given labels under the given logits.
+
+ Returns:
+ logps: A tensor of shape (batch_size,) containing the sum of log probabilities.
+ valid_length: A tensor of shape (batch_size,) containing the number of non-masked tokens.
+
+ """
+ if logits.shape[:-1] != labels.shape:
+ raise ValueError("Logits (batchsize x seqlen) and labels must have the same shape.")
+
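+ # Shift so that logits at position t score the token at position t + 1, replace masked label
+ # positions with a dummy token id before the gather, and sum per-token log-probs over the
+ # non-masked positions only.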
+ labels = labels[:, 1:].clone()
+ logits = logits[:, :-1, :]
+ loss_mask = labels != label_pad_token_id
+ labels[labels == label_pad_token_id] = 0 # dummy token
+ per_token_logps = torch.gather(logits.log_softmax(-1), dim=2, index=labels.unsqueeze(2)).squeeze(2)
+ return (per_token_logps * loss_mask).sum(-1), loss_mask.sum(-1)
+
+
+def nested_detach(
+ tensors: Union["torch.Tensor", list["torch.Tensor"], tuple["torch.Tensor"], dict[str, "torch.Tensor"]],
+ clone: bool = False,
+):
+ r"""Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."""
+ if isinstance(tensors, (list, tuple)):
+ return type(tensors)(nested_detach(t, clone=clone) for t in tensors)
+ elif isinstance(tensors, Mapping):
+ return type(tensors)({k: nested_detach(t, clone=clone) for k, t in tensors.items()})
+
+ if isinstance(tensors, torch.Tensor):
+ if clone:
+ return tensors.detach().clone()
+ else:
+ return tensors.detach()
+ else:
+ return tensors
+
+
+def get_swanlab_callback(finetuning_args: "FinetuningArguments") -> "TrainerCallback":
+ r"""Get the callback for logging to SwanLab."""
+ import swanlab # type: ignore
+ from swanlab.integration.transformers import SwanLabCallback # type: ignore
+
+ if finetuning_args.swanlab_api_key is not None:
+ swanlab.login(api_key=finetuning_args.swanlab_api_key)
+
+ if finetuning_args.swanlab_lark_webhook_url is not None:
+ from swanlab.plugin.notification import LarkCallback # type: ignore
+
+ lark_callback = LarkCallback(
+ webhook_url=finetuning_args.swanlab_lark_webhook_url,
+ secret=finetuning_args.swanlab_lark_secret,
+ )
+ swanlab.register_callbacks([lark_callback])
+
+ class SwanLabCallbackExtension(SwanLabCallback):
+ def setup(self, args: "TrainingArguments", state: "TrainerState", model: "PreTrainedModel", **kwargs):
+ if not state.is_world_process_zero:
+ return
+
+ super().setup(args, state, model, **kwargs)
+ try:
+ if hasattr(self, "_swanlab"):
+ swanlab_public_config = self._swanlab.get_run().public.json()
+ else: # swanlab <= 0.4.9
+ swanlab_public_config = self._experiment.get_run().public.json()
+ except Exception:
+ swanlab_public_config = {}
+
+ with open(os.path.join(args.output_dir, SWANLAB_CONFIG), "w") as f:
+ f.write(json.dumps(swanlab_public_config, indent=2))
+
+ swanlab_callback = SwanLabCallbackExtension(
+ project=finetuning_args.swanlab_project,
+ workspace=finetuning_args.swanlab_workspace,
+ experiment_name=finetuning_args.swanlab_run_name,
+ mode=finetuning_args.swanlab_mode,
+ config={"Framework": "🦙LlamaFactory"},
+ logdir=finetuning_args.swanlab_logdir,
+ )
+ return swanlab_callback
+
+
+def get_ray_trainer(
+ training_function: Callable,
+ train_loop_config: dict[str, Any],
+ ray_args: "RayArguments",
+) -> "TorchTrainer":
+ if not ray_args.use_ray:
+ raise ValueError("Ray was not enabled. Please set `USE_RAY=1` to enable ray.")
+
+ if ray_args.ray_init_kwargs is not None:
+ ray.init(**ray_args.ray_init_kwargs)
+
+ trainer = TorchTrainer(
+ training_function,
+ train_loop_config=train_loop_config,
+ scaling_config=ScalingConfig(
+ num_workers=ray_args.ray_num_workers,
+ resources_per_worker=ray_args.resources_per_worker,
+ placement_strategy=ray_args.placement_strategy,
+ use_gpu=True,
+ ),
+ run_config=RunConfig(
+ name=ray_args.ray_run_name,
+ storage_path=Path(ray_args.ray_storage_path).absolute().as_posix(),
+ ),
+ )
+ return trainer
diff --git a/post-training/LLaMA-Factory/src/llamafactory/train/tuner.py b/post-training/LLaMA-Factory/src/llamafactory/train/tuner.py
new file mode 100644
index 0000000..3adb382
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/train/tuner.py
@@ -0,0 +1,195 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import shutil
+from typing import TYPE_CHECKING, Any, Optional
+
+import torch
+import torch.distributed as dist
+from transformers import PreTrainedModel
+
+from ..data import get_template_and_fix_tokenizer
+from ..extras import logging
+from ..extras.constants import V_HEAD_SAFE_WEIGHTS_NAME, V_HEAD_WEIGHTS_NAME
+from ..extras.misc import infer_optim_dtype
+from ..extras.packages import is_ray_available
+from ..hparams import get_infer_args, get_ray_args, get_train_args, read_args
+from ..model import load_model, load_tokenizer
+from .callbacks import LogCallback, PissaConvertCallback, ReporterCallback
+from .dpo import run_dpo
+from .kto import run_kto
+from .ppo import run_ppo
+from .pt import run_pt
+from .rm import run_rm
+from .sft import run_sft
+from .trainer_utils import get_ray_trainer, get_swanlab_callback
+
+
+if is_ray_available():
+ import ray
+ from ray.train.huggingface.transformers import RayTrainReportCallback
+
+
+if TYPE_CHECKING:
+ from transformers import TrainerCallback
+
+
+logger = logging.get_logger(__name__)
+
+
+def _training_function(config: dict[str, Any]) -> None:
+ args = config.get("args")
+ callbacks: list[Any] = config.get("callbacks")
+ model_args, data_args, training_args, finetuning_args, generating_args = get_train_args(args)
+
+ callbacks.append(LogCallback())
+ if finetuning_args.pissa_convert:
+ callbacks.append(PissaConvertCallback())
+
+ if finetuning_args.use_swanlab:
+ callbacks.append(get_swanlab_callback(finetuning_args))
+
+ callbacks.append(ReporterCallback(model_args, data_args, finetuning_args, generating_args)) # add to last
+
+ if finetuning_args.stage == "pt":
+ run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
+ elif finetuning_args.stage == "sft":
+ run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
+ elif finetuning_args.stage == "rm":
+ run_rm(model_args, data_args, training_args, finetuning_args, callbacks)
+ elif finetuning_args.stage == "ppo":
+ run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
+ elif finetuning_args.stage == "dpo":
+ run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
+ elif finetuning_args.stage == "kto":
+ run_kto(model_args, data_args, training_args, finetuning_args, callbacks)
+ else:
+ raise ValueError(f"Unknown task: {finetuning_args.stage}.")
+
+ if is_ray_available() and ray.is_initialized():
+ return # if ray is initialized, it will destroy the process group on return
+
+ try:
+ if dist.is_initialized():
+ dist.destroy_process_group()
+ except Exception as e:
+ logger.warning(f"Failed to destroy process group: {e}.")
+
+
+def run_exp(args: Optional[dict[str, Any]] = None, callbacks: Optional[list["TrainerCallback"]] = None) -> None:
+ args = read_args(args)
+ if "-h" in args or "--help" in args:
+ get_train_args(args)
+
+ ray_args = get_ray_args(args)
+ callbacks = callbacks or []
+ if ray_args.use_ray:
+ callbacks.append(RayTrainReportCallback())
+ trainer = get_ray_trainer(
+ training_function=_training_function,
+ train_loop_config={"args": args, "callbacks": callbacks},
+ ray_args=ray_args,
+ )
+ trainer.fit()
+ else:
+ _training_function(config={"args": args, "callbacks": callbacks})
+
+
+def export_model(args: Optional[dict[str, Any]] = None) -> None:
+ model_args, data_args, finetuning_args, _ = get_infer_args(args)
+
+ if model_args.export_dir is None:
+ raise ValueError("Please specify `export_dir` to save model.")
+
+ if model_args.adapter_name_or_path is not None and model_args.export_quantization_bit is not None:
+ raise ValueError("Please merge adapters before quantizing the model.")
+
+ tokenizer_module = load_tokenizer(model_args)
+ tokenizer = tokenizer_module["tokenizer"]
+ processor = tokenizer_module["processor"]
+ template = get_template_and_fix_tokenizer(tokenizer, data_args)
+    model = load_model(tokenizer, model_args, finetuning_args)  # must follow the tokenizer fix to resize the vocab
+
+ if getattr(model, "quantization_method", None) is not None and model_args.adapter_name_or_path is not None:
+ raise ValueError("Cannot merge adapters to a quantized model.")
+
+ if not isinstance(model, PreTrainedModel):
+ raise ValueError("The model is not a `PreTrainedModel`, export aborted.")
+
+ if getattr(model, "quantization_method", None) is not None: # quantized model adopts float16 type
+ setattr(model.config, "torch_dtype", torch.float16)
+ else:
+ if model_args.infer_dtype == "auto":
+ output_dtype = getattr(model.config, "torch_dtype", torch.float32)
+ if output_dtype == torch.float32: # if infer_dtype is auto, try using half precision first
+ output_dtype = infer_optim_dtype(torch.bfloat16)
+ else:
+ output_dtype = getattr(torch, model_args.infer_dtype)
+
+ setattr(model.config, "torch_dtype", output_dtype)
+ model = model.to(output_dtype)
+ logger.info_rank0(f"Convert model dtype to: {output_dtype}.")
+
+ model.save_pretrained(
+ save_directory=model_args.export_dir,
+ max_shard_size=f"{model_args.export_size}GB",
+ safe_serialization=(not model_args.export_legacy_format),
+ )
+ if model_args.export_hub_model_id is not None:
+ model.push_to_hub(
+ model_args.export_hub_model_id,
+ token=model_args.hf_hub_token,
+ max_shard_size=f"{model_args.export_size}GB",
+ safe_serialization=(not model_args.export_legacy_format),
+ )
+
+ if finetuning_args.stage == "rm":
+ if model_args.adapter_name_or_path is not None:
+ vhead_path = model_args.adapter_name_or_path[-1]
+ else:
+ vhead_path = model_args.model_name_or_path
+
+ if os.path.exists(os.path.join(vhead_path, V_HEAD_SAFE_WEIGHTS_NAME)):
+ shutil.copy(
+ os.path.join(vhead_path, V_HEAD_SAFE_WEIGHTS_NAME),
+ os.path.join(model_args.export_dir, V_HEAD_SAFE_WEIGHTS_NAME),
+ )
+ logger.info_rank0(f"Copied valuehead to {model_args.export_dir}.")
+ elif os.path.exists(os.path.join(vhead_path, V_HEAD_WEIGHTS_NAME)):
+ shutil.copy(
+ os.path.join(vhead_path, V_HEAD_WEIGHTS_NAME),
+ os.path.join(model_args.export_dir, V_HEAD_WEIGHTS_NAME),
+ )
+ logger.info_rank0(f"Copied valuehead to {model_args.export_dir}.")
+
+ try:
+ tokenizer.padding_side = "left" # restore padding side
+ tokenizer.init_kwargs["padding_side"] = "left"
+ tokenizer.save_pretrained(model_args.export_dir)
+ if model_args.export_hub_model_id is not None:
+ tokenizer.push_to_hub(model_args.export_hub_model_id, token=model_args.hf_hub_token)
+
+ if processor is not None:
+ processor.save_pretrained(model_args.export_dir)
+ if model_args.export_hub_model_id is not None:
+ processor.push_to_hub(model_args.export_hub_model_id, token=model_args.hf_hub_token)
+
+ except Exception as e:
+ logger.warning_rank0(f"Cannot save tokenizer, please copy the files manually: {e}.")
+
+ ollama_modelfile = os.path.join(model_args.export_dir, "Modelfile")
+ with open(ollama_modelfile, "w", encoding="utf-8") as f:
+ f.write(template.get_ollama_modelfile(tokenizer))
+ logger.info_rank0(f"Ollama modelfile saved in {ollama_modelfile}")
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/webui/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/chatter.py b/post-training/LLaMA-Factory/src/llamafactory/webui/chatter.py
new file mode 100644
index 0000000..a2242bb
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/chatter.py
@@ -0,0 +1,230 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections.abc import Generator
+from typing import TYPE_CHECKING, Any, Optional
+
+from transformers.utils import is_torch_npu_available
+
+from ..chat import ChatModel
+from ..data import Role
+from ..extras.constants import PEFT_METHODS
+from ..extras.misc import torch_gc
+from ..extras.packages import is_gradio_available
+from .common import get_save_dir, load_config
+from .locales import ALERTS
+
+
+if TYPE_CHECKING:
+ from ..chat import BaseEngine
+ from .manager import Manager
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+def _escape_html(text: str) -> str:
+ r"""Escape HTML characters."""
+    return text.replace("<", "&lt;").replace(">", "&gt;")
+
+
+def _format_response(text: str, lang: str, escape_html: bool, thought_words: tuple[str, str]) -> str:
+ r"""Post-process the response text.
+
+ Based on: https://huggingface.co/spaces/Lyte/DeepSeek-R1-Distill-Qwen-1.5B-Demo-GGUF/blob/main/app.py
+ """
+ if thought_words[0] not in text:
+ return _escape_html(text) if escape_html else text
+
+ text = text.replace(thought_words[0], "")
+ result = text.split(thought_words[1], maxsplit=1)
+ if len(result) == 1:
+ summary = ALERTS["info_thinking"][lang]
+ thought, answer = text, ""
+ else:
+ summary = ALERTS["info_thought"][lang]
+ thought, answer = result
+
+ if escape_html:
+ thought, answer = _escape_html(thought), _escape_html(answer)
+
+    return (
+        f"<details open><summary class='thinking-summary'><span>{summary}</span></summary>\n\n"
+        f"<div class='thinking-container'>\n{thought}\n</div>\n</details>{answer}"
+    )
+
+
+class WebChatModel(ChatModel):
+ def __init__(self, manager: "Manager", demo_mode: bool = False, lazy_init: bool = True) -> None:
+ self.manager = manager
+ self.demo_mode = demo_mode
+ self.engine: Optional[BaseEngine] = None
+
+ if not lazy_init: # read arguments from command line
+ super().__init__()
+
+ if demo_mode and os.getenv("DEMO_MODEL") and os.getenv("DEMO_TEMPLATE"): # load demo model
+ model_name_or_path = os.getenv("DEMO_MODEL")
+ template = os.getenv("DEMO_TEMPLATE")
+ infer_backend = os.getenv("DEMO_BACKEND", "huggingface")
+ super().__init__(
+ dict(model_name_or_path=model_name_or_path, template=template, infer_backend=infer_backend)
+ )
+
+ @property
+ def loaded(self) -> bool:
+ return self.engine is not None
+
+ def load_model(self, data) -> Generator[str, None, None]:
+ get = lambda elem_id: data[self.manager.get_elem_by_id(elem_id)]
+ lang, model_name, model_path = get("top.lang"), get("top.model_name"), get("top.model_path")
+ finetuning_type, checkpoint_path = get("top.finetuning_type"), get("top.checkpoint_path")
+ user_config = load_config()
+
+ error = ""
+ if self.loaded:
+ error = ALERTS["err_exists"][lang]
+ elif not model_name:
+ error = ALERTS["err_no_model"][lang]
+ elif not model_path:
+ error = ALERTS["err_no_path"][lang]
+ elif self.demo_mode:
+ error = ALERTS["err_demo"][lang]
+
+ if error:
+ gr.Warning(error)
+ yield error
+ return
+
+ yield ALERTS["info_loading"][lang]
+ args = dict(
+ model_name_or_path=model_path,
+ cache_dir=user_config.get("cache_dir", None),
+ finetuning_type=finetuning_type,
+ template=get("top.template"),
+ rope_scaling=get("top.rope_scaling") if get("top.rope_scaling") != "none" else None,
+ flash_attn="fa2" if get("top.booster") == "flashattn2" else "auto",
+ use_unsloth=(get("top.booster") == "unsloth"),
+ enable_liger_kernel=(get("top.booster") == "liger_kernel"),
+ infer_backend=get("infer.infer_backend"),
+ infer_dtype=get("infer.infer_dtype"),
+ vllm_enforce_eager=True,
+ trust_remote_code=True,
+ )
+
+ # checkpoints
+ if checkpoint_path:
+ if finetuning_type in PEFT_METHODS: # list
+ args["adapter_name_or_path"] = ",".join(
+ [get_save_dir(model_name, finetuning_type, adapter) for adapter in checkpoint_path]
+ )
+ else: # str
+ args["model_name_or_path"] = get_save_dir(model_name, finetuning_type, checkpoint_path)
+
+ # quantization
+ if get("top.quantization_bit") != "none":
+ args["quantization_bit"] = int(get("top.quantization_bit"))
+ args["quantization_method"] = get("top.quantization_method")
+ args["double_quantization"] = not is_torch_npu_available()
+
+ super().__init__(args)
+ yield ALERTS["info_loaded"][lang]
+
+ def unload_model(self, data) -> Generator[str, None, None]:
+ lang = data[self.manager.get_elem_by_id("top.lang")]
+
+ if self.demo_mode:
+ gr.Warning(ALERTS["err_demo"][lang])
+ yield ALERTS["err_demo"][lang]
+ return
+
+ yield ALERTS["info_unloading"][lang]
+ self.engine = None
+ torch_gc()
+ yield ALERTS["info_unloaded"][lang]
+
+ @staticmethod
+ def append(
+ chatbot: list[dict[str, str]],
+ messages: list[dict[str, str]],
+ role: str,
+ query: str,
+ escape_html: bool,
+ ) -> tuple[list[dict[str, str]], list[dict[str, str]], str]:
+ r"""Add the user input to chatbot.
+
+ Inputs: infer.chatbot, infer.messages, infer.role, infer.query, infer.escape_html
+ Output: infer.chatbot, infer.messages, infer.query
+ """
+ return (
+ chatbot + [{"role": "user", "content": _escape_html(query) if escape_html else query}],
+ messages + [{"role": role, "content": query}],
+ "",
+ )
+
+ def stream(
+ self,
+ chatbot: list[dict[str, str]],
+ messages: list[dict[str, str]],
+ lang: str,
+ system: str,
+ tools: str,
+ image: Optional[Any],
+ video: Optional[Any],
+ audio: Optional[Any],
+ max_new_tokens: int,
+ top_p: float,
+ temperature: float,
+ skip_special_tokens: bool,
+ escape_html: bool,
+ ) -> Generator[tuple[list[dict[str, str]], list[dict[str, str]]], None, None]:
+ r"""Generate output text in stream.
+
+ Inputs: infer.chatbot, infer.messages, infer.system, infer.tools, infer.image, infer.video, ...
+ Output: infer.chatbot, infer.messages
+ """
+ chatbot.append({"role": "assistant", "content": ""})
+ response = ""
+ for new_text in self.stream_chat(
+ messages,
+ system,
+ tools,
+ images=[image] if image else None,
+ videos=[video] if video else None,
+ audios=[audio] if audio else None,
+ max_new_tokens=max_new_tokens,
+ top_p=top_p,
+ temperature=temperature,
+ skip_special_tokens=skip_special_tokens,
+ ):
+ response += new_text
+ if tools:
+ result = self.engine.template.extract_tool(response)
+ else:
+ result = response
+
+ if isinstance(result, list):
+ tool_calls = [{"name": tool.name, "arguments": json.loads(tool.arguments)} for tool in result]
+ tool_calls = json.dumps(tool_calls, ensure_ascii=False)
+ output_messages = messages + [{"role": Role.FUNCTION.value, "content": tool_calls}]
+ bot_text = "```json\n" + tool_calls + "\n```"
+ else:
+ output_messages = messages + [{"role": Role.ASSISTANT.value, "content": result}]
+ bot_text = _format_response(result, lang, escape_html, self.engine.template.thought_words)
+
+ chatbot[-1] = {"role": "assistant", "content": bot_text}
+ yield chatbot, output_messages
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/common.py b/post-training/LLaMA-Factory/src/llamafactory/webui/common.py
new file mode 100644
index 0000000..2387174
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/common.py
@@ -0,0 +1,266 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+import signal
+from collections import defaultdict
+from datetime import datetime
+from typing import Any, Optional, Union
+
+from psutil import Process
+from yaml import safe_dump, safe_load
+
+from ..extras import logging
+from ..extras.constants import (
+ DATA_CONFIG,
+ DEFAULT_TEMPLATE,
+ MULTIMODAL_SUPPORTED_MODELS,
+ SUPPORTED_MODELS,
+ TRAINING_ARGS,
+ DownloadSource,
+)
+from ..extras.misc import use_modelscope, use_openmind
+
+
+logger = logging.get_logger(__name__)
+
+DEFAULT_CACHE_DIR = "cache"
+DEFAULT_CONFIG_DIR = "config"
+DEFAULT_DATA_DIR = "data"
+DEFAULT_SAVE_DIR = "saves"
+USER_CONFIG = "user_config.yaml"
+
+
+def abort_process(pid: int) -> None:
+ r"""Abort the processes recursively in a bottom-up way."""
+ try:
+ children = Process(pid).children()
+ if children:
+ for child in children:
+ abort_process(child.pid)
+
+ os.kill(pid, signal.SIGABRT)
+ except Exception:
+ pass
+
+
+def get_save_dir(*paths: str) -> os.PathLike:
+ r"""Get the path to saved model checkpoints."""
+ if os.path.sep in paths[-1]:
+        logger.warning_rank0("Found complex path, some features may not be available.")
+ return paths[-1]
+
+ paths = (path.replace(" ", "").strip() for path in paths)
+ return os.path.join(DEFAULT_SAVE_DIR, *paths)
+
+
+def _get_config_path() -> os.PathLike:
+ r"""Get the path to user config."""
+ return os.path.join(DEFAULT_CACHE_DIR, USER_CONFIG)
+
+
+def load_config() -> dict[str, Union[str, dict[str, Any]]]:
+ r"""Load user config if exists."""
+ try:
+ with open(_get_config_path(), encoding="utf-8") as f:
+ return safe_load(f)
+ except Exception:
+ return {"lang": None, "last_model": None, "path_dict": {}, "cache_dir": None}
+
+
+def save_config(lang: str, model_name: Optional[str] = None, model_path: Optional[str] = None) -> None:
+ r"""Save user config."""
+ os.makedirs(DEFAULT_CACHE_DIR, exist_ok=True)
+ user_config = load_config()
+ user_config["lang"] = lang or user_config["lang"]
+ if model_name:
+ user_config["last_model"] = model_name
+
+ if model_name and model_path:
+ user_config["path_dict"][model_name] = model_path
+
+ with open(_get_config_path(), "w", encoding="utf-8") as f:
+ safe_dump(user_config, f)
+
+
+def get_model_path(model_name: str) -> str:
+ r"""Get the model path according to the model name."""
+ user_config = load_config()
+ path_dict: dict[DownloadSource, str] = SUPPORTED_MODELS.get(model_name, defaultdict(str))
+ model_path = user_config["path_dict"].get(model_name, "") or path_dict.get(DownloadSource.DEFAULT, "")
+ if (
+ use_modelscope()
+ and path_dict.get(DownloadSource.MODELSCOPE)
+ and model_path == path_dict.get(DownloadSource.DEFAULT)
+ ): # replace hf path with ms path
+ model_path = path_dict.get(DownloadSource.MODELSCOPE)
+
+ if (
+ use_openmind()
+ and path_dict.get(DownloadSource.OPENMIND)
+ and model_path == path_dict.get(DownloadSource.DEFAULT)
+ ): # replace hf path with om path
+ model_path = path_dict.get(DownloadSource.OPENMIND)
+
+ return model_path
+
+
+def get_template(model_name: str) -> str:
+ r"""Get the template name if the model is a chat/distill/instruct model."""
+ return DEFAULT_TEMPLATE.get(model_name, "default")
+
+
+def get_time() -> str:
+ r"""Get current date and time."""
+ return datetime.now().strftime(r"%Y-%m-%d-%H-%M-%S")
+
+
+def is_multimodal(model_name: str) -> bool:
+ r"""Judge if the model is a vision language model."""
+ return model_name in MULTIMODAL_SUPPORTED_MODELS
+
+
+def load_dataset_info(dataset_dir: str) -> dict[str, dict[str, Any]]:
+ r"""Load dataset_info.json."""
+ if dataset_dir == "ONLINE" or dataset_dir.startswith("REMOTE:"):
+ logger.info_rank0(f"dataset_dir is {dataset_dir}, using online dataset.")
+ return {}
+
+ try:
+ with open(os.path.join(dataset_dir, DATA_CONFIG), encoding="utf-8") as f:
+ return json.load(f)
+ except Exception as err:
+ logger.warning_rank0(f"Cannot open {os.path.join(dataset_dir, DATA_CONFIG)} due to {str(err)}.")
+ return {}
+
+
+def load_args(config_path: str) -> Optional[dict[str, Any]]:
+ r"""Load the training configuration from config path."""
+ try:
+ with open(config_path, encoding="utf-8") as f:
+ return safe_load(f)
+ except Exception:
+ return None
+
+
+def save_args(config_path: str, config_dict: dict[str, Any]) -> None:
+ r"""Save the training configuration to config path."""
+ with open(config_path, "w", encoding="utf-8") as f:
+ safe_dump(config_dict, f)
+
+
+def _clean_cmd(args: dict[str, Any]) -> dict[str, Any]:
+ r"""Remove args with NoneType or False or empty string value."""
+ no_skip_keys = ["packing"]
+ return {k: v for k, v in args.items() if (k in no_skip_keys) or (v is not None and v is not False and v != "")}
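+# Illustrative example (values are made up): _clean_cmd({"packing": False, "report_to": "", "lora_rank": 8})
+# returns {"packing": False, "lora_rank": 8}; "packing" survives even when False, empty strings are dropped.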
+
+
+def gen_cmd(args: dict[str, Any]) -> str:
+ r"""Generate CLI commands for previewing."""
+ cmd_lines = ["llamafactory-cli train "]
+ for k, v in _clean_cmd(args).items():
+ if isinstance(v, dict):
+ cmd_lines.append(f" --{k} {json.dumps(v, ensure_ascii=False)} ")
+ elif isinstance(v, list):
+ cmd_lines.append(f" --{k} {' '.join(map(str, v))} ")
+ else:
+ cmd_lines.append(f" --{k} {str(v)} ")
+
+ if os.name == "nt":
+ cmd_text = "`\n".join(cmd_lines)
+ else:
+ cmd_text = "\\\n".join(cmd_lines)
+
+ cmd_text = f"```bash\n{cmd_text}\n```"
+ return cmd_text
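+# Rough sketch of the preview for args {"model_name_or_path": "qwen", "lora_rank": 8} (values are made up):
+#     llamafactory-cli train \
+#         --model_name_or_path qwen \
+#         --lora_rank 8
+# wrapped in a ```bash fenced block for Markdown rendering; on Windows the line continuation is ` instead of \.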
+
+
+def save_cmd(args: dict[str, Any]) -> str:
+ r"""Save CLI commands to launch training."""
+ output_dir = args["output_dir"]
+ os.makedirs(output_dir, exist_ok=True)
+ with open(os.path.join(output_dir, TRAINING_ARGS), "w", encoding="utf-8") as f:
+ safe_dump(_clean_cmd(args), f)
+
+ return os.path.join(output_dir, TRAINING_ARGS)
+
+
+def load_eval_results(path: os.PathLike) -> str:
+ r"""Get scores after evaluation."""
+ with open(path, encoding="utf-8") as f:
+ result = json.dumps(json.load(f), indent=4)
+
+ return f"```json\n{result}\n```\n"
+
+
+def create_ds_config() -> None:
+ r"""Create deepspeed config in the current directory."""
+ os.makedirs(DEFAULT_CACHE_DIR, exist_ok=True)
+ ds_config = {
+ "train_batch_size": "auto",
+ "train_micro_batch_size_per_gpu": "auto",
+ "gradient_accumulation_steps": "auto",
+ "gradient_clipping": "auto",
+ "zero_allow_untested_optimizer": True,
+ "fp16": {
+ "enabled": "auto",
+ "loss_scale": 0,
+ "loss_scale_window": 1000,
+ "initial_scale_power": 16,
+ "hysteresis": 2,
+ "min_loss_scale": 1,
+ },
+ "bf16": {"enabled": "auto"},
+ }
+ offload_config = {
+ "device": "cpu",
+ "pin_memory": True,
+ }
+ ds_config["zero_optimization"] = {
+ "stage": 2,
+ "allgather_partitions": True,
+ "allgather_bucket_size": 5e8,
+ "overlap_comm": True,
+ "reduce_scatter": True,
+ "reduce_bucket_size": 5e8,
+ "contiguous_gradients": True,
+ "round_robin_gradients": True,
+ }
+ with open(os.path.join(DEFAULT_CACHE_DIR, "ds_z2_config.json"), "w", encoding="utf-8") as f:
+ json.dump(ds_config, f, indent=2)
+
+ ds_config["zero_optimization"]["offload_optimizer"] = offload_config
+ with open(os.path.join(DEFAULT_CACHE_DIR, "ds_z2_offload_config.json"), "w", encoding="utf-8") as f:
+ json.dump(ds_config, f, indent=2)
+
+ ds_config["zero_optimization"] = {
+ "stage": 3,
+ "overlap_comm": True,
+ "contiguous_gradients": True,
+ "sub_group_size": 1e9,
+ "reduce_bucket_size": "auto",
+ "stage3_prefetch_bucket_size": "auto",
+ "stage3_param_persistence_threshold": "auto",
+ "stage3_max_live_parameters": 1e9,
+ "stage3_max_reuse_distance": 1e9,
+ "stage3_gather_16bit_weights_on_model_save": True,
+ }
+ with open(os.path.join(DEFAULT_CACHE_DIR, "ds_z3_config.json"), "w", encoding="utf-8") as f:
+ json.dump(ds_config, f, indent=2)
+
+ ds_config["zero_optimization"]["offload_optimizer"] = offload_config
+ ds_config["zero_optimization"]["offload_param"] = offload_config
+ with open(os.path.join(DEFAULT_CACHE_DIR, "ds_z3_offload_config.json"), "w", encoding="utf-8") as f:
+ json.dump(ds_config, f, indent=2)
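+# The calls above write four presets under DEFAULT_CACHE_DIR: ds_z2_config.json,
+# ds_z2_offload_config.json, ds_z3_config.json and ds_z3_offload_config.json.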
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/__init__.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/__init__.py
new file mode 100644
index 0000000..eb3c9d4
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/__init__.py
@@ -0,0 +1,30 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .chatbot import create_chat_box
+from .eval import create_eval_tab
+from .export import create_export_tab
+from .infer import create_infer_tab
+from .top import create_top
+from .train import create_train_tab
+
+
+__all__ = [
+ "create_chat_box",
+ "create_eval_tab",
+ "create_export_tab",
+ "create_infer_tab",
+ "create_top",
+ "create_train_tab",
+]
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/chatbot.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/chatbot.py
new file mode 100644
index 0000000..52217e1
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/chatbot.py
@@ -0,0 +1,132 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from typing import TYPE_CHECKING
+
+from ...data import Role
+from ...extras.packages import is_gradio_available
+from ..locales import ALERTS
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from ..engine import Engine
+
+
+def check_json_schema(text: str, lang: str) -> None:
+ r"""Check if the json schema is valid."""
+ try:
+ tools = json.loads(text)
+ if tools:
+ assert isinstance(tools, list)
+ for tool in tools:
+ if "name" not in tool:
+ raise NotImplementedError("Name not found.")
+ except NotImplementedError:
+ gr.Warning(ALERTS["err_tool_name"][lang])
+ except Exception:
+ gr.Warning(ALERTS["err_json_schema"][lang])
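+# Illustrative examples (tool names are made up): check_json_schema('[{"name": "search"}]', "en")
+# passes silently, while '[{"description": "no name"}]' triggers the err_tool_name warning.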
+
+
+def create_chat_box(
+ engine: "Engine", visible: bool = False
+) -> tuple["Component", "Component", dict[str, "Component"]]:
+ lang = engine.manager.get_elem_by_id("top.lang")
+ with gr.Column(visible=visible) as chat_box:
+ chatbot = gr.Chatbot(type="messages", show_copy_button=True)
+ messages = gr.State([])
+ with gr.Row():
+ with gr.Column(scale=4):
+ with gr.Row():
+ with gr.Column():
+ role = gr.Dropdown(choices=[Role.USER.value, Role.OBSERVATION.value], value=Role.USER.value)
+ system = gr.Textbox(show_label=False)
+ tools = gr.Textbox(show_label=False, lines=3)
+
+ with gr.Column() as mm_box:
+ with gr.Tab("Image"):
+ image = gr.Image(type="pil")
+
+ with gr.Tab("Video"):
+ video = gr.Video()
+
+ with gr.Tab("Audio"):
+ audio = gr.Audio(type="filepath")
+
+ query = gr.Textbox(show_label=False, lines=8)
+ submit_btn = gr.Button(variant="primary")
+
+ with gr.Column(scale=1):
+ max_new_tokens = gr.Slider(minimum=8, maximum=8192, value=1024, step=1)
+ top_p = gr.Slider(minimum=0.01, maximum=1.0, value=0.7, step=0.01)
+ temperature = gr.Slider(minimum=0.01, maximum=1.5, value=0.95, step=0.01)
+ skip_special_tokens = gr.Checkbox(value=True)
+ escape_html = gr.Checkbox(value=True)
+ clear_btn = gr.Button()
+
+ tools.input(check_json_schema, inputs=[tools, engine.manager.get_elem_by_id("top.lang")])
+
+ submit_btn.click(
+ engine.chatter.append,
+ [chatbot, messages, role, query, escape_html],
+ [chatbot, messages, query],
+ ).then(
+ engine.chatter.stream,
+ [
+ chatbot,
+ messages,
+ lang,
+ system,
+ tools,
+ image,
+ video,
+ audio,
+ max_new_tokens,
+ top_p,
+ temperature,
+ skip_special_tokens,
+ escape_html,
+ ],
+ [chatbot, messages],
+ )
+ clear_btn.click(lambda: ([], []), outputs=[chatbot, messages])
+
+ return (
+ chatbot,
+ messages,
+ dict(
+ chat_box=chat_box,
+ role=role,
+ system=system,
+ tools=tools,
+ mm_box=mm_box,
+ image=image,
+ video=video,
+ audio=audio,
+ query=query,
+ submit_btn=submit_btn,
+ max_new_tokens=max_new_tokens,
+ top_p=top_p,
+ temperature=temperature,
+ skip_special_tokens=skip_special_tokens,
+ escape_html=escape_html,
+ clear_btn=clear_btn,
+ ),
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/data.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/data.py
new file mode 100644
index 0000000..8f27bd1
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/data.py
@@ -0,0 +1,122 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from typing import TYPE_CHECKING, Any
+
+from ...extras.constants import DATA_CONFIG
+from ...extras.packages import is_gradio_available
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+
+PAGE_SIZE = 2
+
+
+def prev_page(page_index: int) -> int:
+ return page_index - 1 if page_index > 0 else page_index
+
+
+def next_page(page_index: int, total_num: int) -> int:
+ return page_index + 1 if (page_index + 1) * PAGE_SIZE < total_num else page_index
+
+
+def can_preview(dataset_dir: str, dataset: list) -> "gr.Button":
+ r"""Check if the dataset is a local dataset."""
+ try:
+ with open(os.path.join(dataset_dir, DATA_CONFIG), encoding="utf-8") as f:
+ dataset_info = json.load(f)
+ except Exception:
+ return gr.Button(interactive=False)
+
+ if len(dataset) == 0 or "file_name" not in dataset_info[dataset[0]]:
+ return gr.Button(interactive=False)
+
+ data_path = os.path.join(dataset_dir, dataset_info[dataset[0]]["file_name"])
+ if os.path.isfile(data_path) or (os.path.isdir(data_path) and os.listdir(data_path)):
+ return gr.Button(interactive=True)
+ else:
+ return gr.Button(interactive=False)
+
+
+def _load_data_file(file_path: str) -> list[Any]:
+ with open(file_path, encoding="utf-8") as f:
+ if file_path.endswith(".json"):
+ return json.load(f)
+ elif file_path.endswith(".jsonl"):
+ return [json.loads(line) for line in f]
+ else:
+ return list(f)
+
+
+def get_preview(dataset_dir: str, dataset: list, page_index: int) -> tuple[int, list, "gr.Column"]:
+ r"""Get the preview samples from the dataset."""
+ with open(os.path.join(dataset_dir, DATA_CONFIG), encoding="utf-8") as f:
+ dataset_info = json.load(f)
+
+ data_path = os.path.join(dataset_dir, dataset_info[dataset[0]]["file_name"])
+ if os.path.isfile(data_path):
+ data = _load_data_file(data_path)
+ else:
+ data = []
+ for file_name in os.listdir(data_path):
+ data.extend(_load_data_file(os.path.join(data_path, file_name)))
+
+ return len(data), data[PAGE_SIZE * page_index : PAGE_SIZE * (page_index + 1)], gr.Column(visible=True)
+
+
+def create_preview_box(dataset_dir: "gr.Textbox", dataset: "gr.Dropdown") -> dict[str, "Component"]:
+ data_preview_btn = gr.Button(interactive=False, scale=1)
+ with gr.Column(visible=False, elem_classes="modal-box") as preview_box:
+ with gr.Row():
+ preview_count = gr.Number(value=0, interactive=False, precision=0)
+ page_index = gr.Number(value=0, interactive=False, precision=0)
+
+ with gr.Row():
+ prev_btn = gr.Button()
+ next_btn = gr.Button()
+ close_btn = gr.Button()
+
+ with gr.Row():
+ preview_samples = gr.JSON()
+
+ dataset.change(can_preview, [dataset_dir, dataset], [data_preview_btn], queue=False).then(
+ lambda: 0, outputs=[page_index], queue=False
+ )
+ data_preview_btn.click(
+ get_preview, [dataset_dir, dataset, page_index], [preview_count, preview_samples, preview_box], queue=False
+ )
+ prev_btn.click(prev_page, [page_index], [page_index], queue=False).then(
+ get_preview, [dataset_dir, dataset, page_index], [preview_count, preview_samples, preview_box], queue=False
+ )
+ next_btn.click(next_page, [page_index, preview_count], [page_index], queue=False).then(
+ get_preview, [dataset_dir, dataset, page_index], [preview_count, preview_samples, preview_box], queue=False
+ )
+ close_btn.click(lambda: gr.Column(visible=False), outputs=[preview_box], queue=False)
+ return dict(
+ data_preview_btn=data_preview_btn,
+ preview_count=preview_count,
+ page_index=page_index,
+ prev_btn=prev_btn,
+ next_btn=next_btn,
+ close_btn=close_btn,
+ preview_samples=preview_samples,
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/eval.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/eval.py
new file mode 100644
index 0000000..3804a77
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/eval.py
@@ -0,0 +1,94 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...extras.packages import is_gradio_available
+from ..common import DEFAULT_DATA_DIR
+from ..control import list_datasets
+from .data import create_preview_box
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from ..engine import Engine
+
+
+def create_eval_tab(engine: "Engine") -> dict[str, "Component"]:
+ input_elems = engine.manager.get_base_elems()
+ elem_dict = dict()
+
+ with gr.Row():
+ dataset_dir = gr.Textbox(value=DEFAULT_DATA_DIR, scale=2)
+ dataset = gr.Dropdown(multiselect=True, allow_custom_value=True, scale=4)
+ preview_elems = create_preview_box(dataset_dir, dataset)
+
+ input_elems.update({dataset_dir, dataset})
+ elem_dict.update(dict(dataset_dir=dataset_dir, dataset=dataset, **preview_elems))
+
+ with gr.Row():
+ cutoff_len = gr.Slider(minimum=4, maximum=131072, value=1024, step=1)
+ max_samples = gr.Textbox(value="100000")
+ batch_size = gr.Slider(minimum=1, maximum=1024, value=2, step=1)
+ predict = gr.Checkbox(value=True)
+
+ input_elems.update({cutoff_len, max_samples, batch_size, predict})
+ elem_dict.update(dict(cutoff_len=cutoff_len, max_samples=max_samples, batch_size=batch_size, predict=predict))
+
+ with gr.Row():
+ max_new_tokens = gr.Slider(minimum=8, maximum=4096, value=512, step=1)
+ top_p = gr.Slider(minimum=0.01, maximum=1, value=0.7, step=0.01)
+ temperature = gr.Slider(minimum=0.01, maximum=1.5, value=0.95, step=0.01)
+ output_dir = gr.Textbox()
+
+ input_elems.update({max_new_tokens, top_p, temperature, output_dir})
+ elem_dict.update(dict(max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature, output_dir=output_dir))
+
+ with gr.Row():
+ cmd_preview_btn = gr.Button()
+ start_btn = gr.Button(variant="primary")
+ stop_btn = gr.Button(variant="stop")
+
+ with gr.Row():
+ resume_btn = gr.Checkbox(visible=False, interactive=False)
+ progress_bar = gr.Slider(visible=False, interactive=False)
+
+ with gr.Row():
+ output_box = gr.Markdown()
+
+ elem_dict.update(
+ dict(
+ cmd_preview_btn=cmd_preview_btn,
+ start_btn=start_btn,
+ stop_btn=stop_btn,
+ resume_btn=resume_btn,
+ progress_bar=progress_bar,
+ output_box=output_box,
+ )
+ )
+ output_elems = [output_box, progress_bar]
+
+ cmd_preview_btn.click(engine.runner.preview_eval, input_elems, output_elems, concurrency_limit=None)
+ start_btn.click(engine.runner.run_eval, input_elems, output_elems)
+ stop_btn.click(engine.runner.set_abort)
+ resume_btn.change(engine.runner.monitor, outputs=output_elems, concurrency_limit=None)
+
+ dataset.focus(list_datasets, [dataset_dir], [dataset], queue=False)
+
+ return elem_dict
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/export.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/export.py
new file mode 100644
index 0000000..bb458e6
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/export.py
@@ -0,0 +1,158 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from collections.abc import Generator
+from typing import TYPE_CHECKING, Union
+
+from ...extras.constants import PEFT_METHODS
+from ...extras.misc import torch_gc
+from ...extras.packages import is_gradio_available
+from ...train.tuner import export_model
+from ..common import get_save_dir, load_config
+from ..locales import ALERTS
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from ..engine import Engine
+
+
+GPTQ_BITS = ["8", "4", "3", "2"]
+
+
+def can_quantize(checkpoint_path: Union[str, list[str]]) -> "gr.Dropdown":
+ if isinstance(checkpoint_path, list) and len(checkpoint_path) != 0:
+ return gr.Dropdown(value="none", interactive=False)
+ else:
+ return gr.Dropdown(interactive=True)
+
+
+def save_model(
+ lang: str,
+ model_name: str,
+ model_path: str,
+ finetuning_type: str,
+ checkpoint_path: Union[str, list[str]],
+ template: str,
+ export_size: int,
+ export_quantization_bit: str,
+ export_quantization_dataset: str,
+ export_device: str,
+ export_legacy_format: bool,
+ export_dir: str,
+ export_hub_model_id: str,
+) -> Generator[str, None, None]:
+ user_config = load_config()
+ error = ""
+ if not model_name:
+ error = ALERTS["err_no_model"][lang]
+ elif not model_path:
+ error = ALERTS["err_no_path"][lang]
+ elif not export_dir:
+ error = ALERTS["err_no_export_dir"][lang]
+ elif export_quantization_bit in GPTQ_BITS and not export_quantization_dataset:
+ error = ALERTS["err_no_dataset"][lang]
+ elif export_quantization_bit not in GPTQ_BITS and not checkpoint_path:
+ error = ALERTS["err_no_adapter"][lang]
+ elif export_quantization_bit in GPTQ_BITS and checkpoint_path and isinstance(checkpoint_path, list):
+ error = ALERTS["err_gptq_lora"][lang]
+
+ if error:
+ gr.Warning(error)
+ yield error
+ return
+
+ args = dict(
+ model_name_or_path=model_path,
+ cache_dir=user_config.get("cache_dir", None),
+ finetuning_type=finetuning_type,
+ template=template,
+ export_dir=export_dir,
+ export_hub_model_id=export_hub_model_id or None,
+ export_size=export_size,
+ export_quantization_bit=int(export_quantization_bit) if export_quantization_bit in GPTQ_BITS else None,
+ export_quantization_dataset=export_quantization_dataset,
+ export_device=export_device,
+ export_legacy_format=export_legacy_format,
+ trust_remote_code=True,
+ )
+
+ if checkpoint_path:
+ if finetuning_type in PEFT_METHODS: # list
+ args["adapter_name_or_path"] = ",".join(
+ [get_save_dir(model_name, finetuning_type, adapter) for adapter in checkpoint_path]
+ )
+ else: # str
+ args["model_name_or_path"] = get_save_dir(model_name, finetuning_type, checkpoint_path)
+
+ yield ALERTS["info_exporting"][lang]
+ export_model(args)
+ torch_gc()
+ yield ALERTS["info_exported"][lang]
+
+
+def create_export_tab(engine: "Engine") -> dict[str, "Component"]:
+ with gr.Row():
+ export_size = gr.Slider(minimum=1, maximum=100, value=5, step=1)
+ export_quantization_bit = gr.Dropdown(choices=["none"] + GPTQ_BITS, value="none")
+ export_quantization_dataset = gr.Textbox(value="data/c4_demo.jsonl")
+ export_device = gr.Radio(choices=["cpu", "auto"], value="cpu")
+ export_legacy_format = gr.Checkbox()
+
+ with gr.Row():
+ export_dir = gr.Textbox()
+ export_hub_model_id = gr.Textbox()
+
+ checkpoint_path: gr.Dropdown = engine.manager.get_elem_by_id("top.checkpoint_path")
+ checkpoint_path.change(can_quantize, [checkpoint_path], [export_quantization_bit], queue=False)
+
+ export_btn = gr.Button()
+ info_box = gr.Textbox(show_label=False, interactive=False)
+
+ export_btn.click(
+ save_model,
+ [
+ engine.manager.get_elem_by_id("top.lang"),
+ engine.manager.get_elem_by_id("top.model_name"),
+ engine.manager.get_elem_by_id("top.model_path"),
+ engine.manager.get_elem_by_id("top.finetuning_type"),
+ engine.manager.get_elem_by_id("top.checkpoint_path"),
+ engine.manager.get_elem_by_id("top.template"),
+ export_size,
+ export_quantization_bit,
+ export_quantization_dataset,
+ export_device,
+ export_legacy_format,
+ export_dir,
+ export_hub_model_id,
+ ],
+ [info_box],
+ )
+
+ return dict(
+ export_size=export_size,
+ export_quantization_bit=export_quantization_bit,
+ export_quantization_dataset=export_quantization_dataset,
+ export_device=export_device,
+ export_legacy_format=export_legacy_format,
+ export_dir=export_dir,
+ export_hub_model_id=export_hub_model_id,
+ export_btn=export_btn,
+ info_box=info_box,
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/infer.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/infer.py
new file mode 100644
index 0000000..677036b
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/infer.py
@@ -0,0 +1,74 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...extras.packages import is_gradio_available
+from ..common import is_multimodal
+from .chatbot import create_chat_box
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from ..engine import Engine
+
+
+def create_infer_tab(engine: "Engine") -> dict[str, "Component"]:
+ input_elems = engine.manager.get_base_elems()
+ elem_dict = dict()
+
+ with gr.Row():
+ infer_backend = gr.Dropdown(choices=["huggingface", "vllm", "sglang"], value="huggingface")
+ infer_dtype = gr.Dropdown(choices=["auto", "float16", "bfloat16", "float32"], value="auto")
+
+ with gr.Row():
+ load_btn = gr.Button()
+ unload_btn = gr.Button()
+
+ info_box = gr.Textbox(show_label=False, interactive=False)
+
+ input_elems.update({infer_backend, infer_dtype})
+ elem_dict.update(
+ dict(
+ infer_backend=infer_backend,
+ infer_dtype=infer_dtype,
+ load_btn=load_btn,
+ unload_btn=unload_btn,
+ info_box=info_box,
+ )
+ )
+
+ chatbot, messages, chat_elems = create_chat_box(engine, visible=False)
+ elem_dict.update(chat_elems)
+
+ load_btn.click(engine.chatter.load_model, input_elems, [info_box]).then(
+ lambda: gr.Column(visible=engine.chatter.loaded), outputs=[chat_elems["chat_box"]]
+ )
+
+ unload_btn.click(engine.chatter.unload_model, input_elems, [info_box]).then(
+ lambda: ([], []), outputs=[chatbot, messages]
+ ).then(lambda: gr.Column(visible=engine.chatter.loaded), outputs=[chat_elems["chat_box"]])
+
+ engine.manager.get_elem_by_id("top.model_name").change(
+ lambda model_name: gr.Column(visible=is_multimodal(model_name)),
+ [engine.manager.get_elem_by_id("top.model_name")],
+ [chat_elems["mm_box"]],
+ )
+
+ return elem_dict
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/top.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/top.py
new file mode 100644
index 0000000..d6df174
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/top.py
@@ -0,0 +1,72 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...data import TEMPLATES
+from ...extras.constants import METHODS, SUPPORTED_MODELS
+from ...extras.packages import is_gradio_available
+from ..common import save_config
+from ..control import can_quantize, can_quantize_to, get_model_info, list_checkpoints
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+
+def create_top() -> dict[str, "Component"]:
+ with gr.Row():
+ lang = gr.Dropdown(choices=["en", "ru", "zh", "ko", "ja"], value=None, scale=1)
+ available_models = list(SUPPORTED_MODELS.keys()) + ["Custom"]
+ model_name = gr.Dropdown(choices=available_models, value=None, scale=3)
+ model_path = gr.Textbox(scale=3)
+
+ with gr.Row():
+ finetuning_type = gr.Dropdown(choices=METHODS, value="lora", scale=1)
+ checkpoint_path = gr.Dropdown(multiselect=True, allow_custom_value=True, scale=6)
+
+ with gr.Row():
+ quantization_bit = gr.Dropdown(choices=["none", "8", "4"], value="none", allow_custom_value=True)
+ quantization_method = gr.Dropdown(choices=["bnb", "hqq", "eetq"], value="bnb")
+ template = gr.Dropdown(choices=list(TEMPLATES.keys()), value="default")
+ rope_scaling = gr.Dropdown(choices=["none", "linear", "dynamic", "yarn", "llama3"], value="none")
+ booster = gr.Dropdown(choices=["auto", "flashattn2", "unsloth", "liger_kernel"], value="auto")
+
+ model_name.change(get_model_info, [model_name], [model_path, template], queue=False).then(
+ list_checkpoints, [model_name, finetuning_type], [checkpoint_path], queue=False
+ )
+ model_name.input(save_config, inputs=[lang, model_name], queue=False)
+ model_path.input(save_config, inputs=[lang, model_name, model_path], queue=False)
+ finetuning_type.change(can_quantize, [finetuning_type], [quantization_bit], queue=False).then(
+ list_checkpoints, [model_name, finetuning_type], [checkpoint_path], queue=False
+ )
+ checkpoint_path.focus(list_checkpoints, [model_name, finetuning_type], [checkpoint_path], queue=False)
+ quantization_method.change(can_quantize_to, [quantization_method], [quantization_bit], queue=False)
+
+ return dict(
+ lang=lang,
+ model_name=model_name,
+ model_path=model_path,
+ finetuning_type=finetuning_type,
+ checkpoint_path=checkpoint_path,
+ quantization_bit=quantization_bit,
+ quantization_method=quantization_method,
+ template=template,
+ rope_scaling=rope_scaling,
+ booster=booster,
+ )
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/components/train.py b/post-training/LLaMA-Factory/src/llamafactory/webui/components/train.py
new file mode 100644
index 0000000..7ca9964
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/components/train.py
@@ -0,0 +1,409 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from transformers.trainer_utils import SchedulerType
+
+from ...extras.constants import TRAINING_STAGES
+from ...extras.misc import get_device_count
+from ...extras.packages import is_gradio_available
+from ..common import DEFAULT_DATA_DIR
+from ..control import change_stage, list_checkpoints, list_config_paths, list_datasets, list_output_dirs
+from .data import create_preview_box
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from ..engine import Engine
+
+
+def create_train_tab(engine: "Engine") -> dict[str, "Component"]:
+ input_elems = engine.manager.get_base_elems()
+ elem_dict = dict()
+
+ with gr.Row():
+ stages = list(TRAINING_STAGES.keys())
+ training_stage = gr.Dropdown(choices=stages, value=stages[0], scale=1)
+ dataset_dir = gr.Textbox(value=DEFAULT_DATA_DIR, scale=1)
+ dataset = gr.Dropdown(multiselect=True, allow_custom_value=True, scale=4)
+ preview_elems = create_preview_box(dataset_dir, dataset)
+
+ input_elems.update({training_stage, dataset_dir, dataset})
+ elem_dict.update(dict(training_stage=training_stage, dataset_dir=dataset_dir, dataset=dataset, **preview_elems))
+
+ with gr.Row():
+ learning_rate = gr.Textbox(value="5e-5")
+ num_train_epochs = gr.Textbox(value="3.0")
+ max_grad_norm = gr.Textbox(value="1.0")
+ max_samples = gr.Textbox(value="100000")
+ compute_type = gr.Dropdown(choices=["bf16", "fp16", "fp32", "pure_bf16"], value="bf16")
+
+ input_elems.update({learning_rate, num_train_epochs, max_grad_norm, max_samples, compute_type})
+ elem_dict.update(
+ dict(
+ learning_rate=learning_rate,
+ num_train_epochs=num_train_epochs,
+ max_grad_norm=max_grad_norm,
+ max_samples=max_samples,
+ compute_type=compute_type,
+ )
+ )
+
+ with gr.Row():
+ cutoff_len = gr.Slider(minimum=4, maximum=131072, value=2048, step=1)
+ batch_size = gr.Slider(minimum=1, maximum=1024, value=2, step=1)
+ gradient_accumulation_steps = gr.Slider(minimum=1, maximum=1024, value=8, step=1)
+ val_size = gr.Slider(minimum=0, maximum=1, value=0, step=0.001)
+ lr_scheduler_type = gr.Dropdown(choices=[scheduler.value for scheduler in SchedulerType], value="cosine")
+
+ input_elems.update({cutoff_len, batch_size, gradient_accumulation_steps, val_size, lr_scheduler_type})
+ elem_dict.update(
+ dict(
+ cutoff_len=cutoff_len,
+ batch_size=batch_size,
+ gradient_accumulation_steps=gradient_accumulation_steps,
+ val_size=val_size,
+ lr_scheduler_type=lr_scheduler_type,
+ )
+ )
+
+ with gr.Accordion(open=False) as extra_tab:
+ with gr.Row():
+ logging_steps = gr.Slider(minimum=1, maximum=1000, value=5, step=5)
+ save_steps = gr.Slider(minimum=10, maximum=5000, value=100, step=10)
+ warmup_steps = gr.Slider(minimum=0, maximum=5000, value=0, step=1)
+ neftune_alpha = gr.Slider(minimum=0, maximum=10, value=0, step=0.1)
+ extra_args = gr.Textbox(value='{"optim": "adamw_torch"}')
+
+ with gr.Row():
+ with gr.Column():
+ packing = gr.Checkbox()
+ neat_packing = gr.Checkbox()
+
+ with gr.Column():
+ train_on_prompt = gr.Checkbox()
+ mask_history = gr.Checkbox()
+
+ with gr.Column():
+ resize_vocab = gr.Checkbox()
+ use_llama_pro = gr.Checkbox()
+
+ with gr.Column():
+ report_to = gr.Dropdown(
+ choices=["none", "all", "wandb", "mlflow", "neptune", "tensorboard"],
+ value=["none"],
+ allow_custom_value=True,
+ multiselect=True,
+ )
+
+ input_elems.update(
+ {
+ logging_steps,
+ save_steps,
+ warmup_steps,
+ neftune_alpha,
+ extra_args,
+ packing,
+ neat_packing,
+ train_on_prompt,
+ mask_history,
+ resize_vocab,
+ use_llama_pro,
+ report_to,
+ }
+ )
+ elem_dict.update(
+ dict(
+ extra_tab=extra_tab,
+ logging_steps=logging_steps,
+ save_steps=save_steps,
+ warmup_steps=warmup_steps,
+ neftune_alpha=neftune_alpha,
+ extra_args=extra_args,
+ packing=packing,
+ neat_packing=neat_packing,
+ train_on_prompt=train_on_prompt,
+ mask_history=mask_history,
+ resize_vocab=resize_vocab,
+ use_llama_pro=use_llama_pro,
+ report_to=report_to,
+ )
+ )
+
+ with gr.Accordion(open=False) as freeze_tab:
+ with gr.Row():
+ freeze_trainable_layers = gr.Slider(minimum=-128, maximum=128, value=2, step=1)
+ freeze_trainable_modules = gr.Textbox(value="all")
+ freeze_extra_modules = gr.Textbox()
+
+ input_elems.update({freeze_trainable_layers, freeze_trainable_modules, freeze_extra_modules})
+ elem_dict.update(
+ dict(
+ freeze_tab=freeze_tab,
+ freeze_trainable_layers=freeze_trainable_layers,
+ freeze_trainable_modules=freeze_trainable_modules,
+ freeze_extra_modules=freeze_extra_modules,
+ )
+ )
+
+ with gr.Accordion(open=False) as lora_tab:
+ with gr.Row():
+ lora_rank = gr.Slider(minimum=1, maximum=1024, value=8, step=1)
+ lora_alpha = gr.Slider(minimum=1, maximum=2048, value=16, step=1)
+ lora_dropout = gr.Slider(minimum=0, maximum=1, value=0, step=0.01)
+ loraplus_lr_ratio = gr.Slider(minimum=0, maximum=64, value=0, step=0.01)
+ create_new_adapter = gr.Checkbox()
+
+ with gr.Row():
+ use_rslora = gr.Checkbox()
+ use_dora = gr.Checkbox()
+ use_pissa = gr.Checkbox()
+ lora_target = gr.Textbox(scale=2)
+ additional_target = gr.Textbox(scale=2)
+
+ input_elems.update(
+ {
+ lora_rank,
+ lora_alpha,
+ lora_dropout,
+ loraplus_lr_ratio,
+ create_new_adapter,
+ use_rslora,
+ use_dora,
+ use_pissa,
+ lora_target,
+ additional_target,
+ }
+ )
+ elem_dict.update(
+ dict(
+ lora_tab=lora_tab,
+ lora_rank=lora_rank,
+ lora_alpha=lora_alpha,
+ lora_dropout=lora_dropout,
+ loraplus_lr_ratio=loraplus_lr_ratio,
+ create_new_adapter=create_new_adapter,
+ use_rslora=use_rslora,
+ use_dora=use_dora,
+ use_pissa=use_pissa,
+ lora_target=lora_target,
+ additional_target=additional_target,
+ )
+ )
+
+ with gr.Accordion(open=False) as rlhf_tab:
+ with gr.Row():
+ pref_beta = gr.Slider(minimum=0, maximum=1, value=0.1, step=0.01)
+ pref_ftx = gr.Slider(minimum=0, maximum=10, value=0, step=0.01)
+ pref_loss = gr.Dropdown(choices=["sigmoid", "hinge", "ipo", "kto_pair", "orpo", "simpo"], value="sigmoid")
+ reward_model = gr.Dropdown(multiselect=True, allow_custom_value=True)
+ with gr.Column():
+ ppo_score_norm = gr.Checkbox()
+ ppo_whiten_rewards = gr.Checkbox()
+
+ input_elems.update({pref_beta, pref_ftx, pref_loss, reward_model, ppo_score_norm, ppo_whiten_rewards})
+ elem_dict.update(
+ dict(
+ rlhf_tab=rlhf_tab,
+ pref_beta=pref_beta,
+ pref_ftx=pref_ftx,
+ pref_loss=pref_loss,
+ reward_model=reward_model,
+ ppo_score_norm=ppo_score_norm,
+ ppo_whiten_rewards=ppo_whiten_rewards,
+ )
+ )
+
+ with gr.Accordion(open=False) as galore_tab:
+ with gr.Row():
+ use_galore = gr.Checkbox()
+ galore_rank = gr.Slider(minimum=1, maximum=1024, value=16, step=1)
+ galore_update_interval = gr.Slider(minimum=1, maximum=2048, value=200, step=1)
+ galore_scale = gr.Slider(minimum=0, maximum=100, value=2.0, step=0.1)
+ galore_target = gr.Textbox(value="all")
+
+ input_elems.update({use_galore, galore_rank, galore_update_interval, galore_scale, galore_target})
+ elem_dict.update(
+ dict(
+ galore_tab=galore_tab,
+ use_galore=use_galore,
+ galore_rank=galore_rank,
+ galore_update_interval=galore_update_interval,
+ galore_scale=galore_scale,
+ galore_target=galore_target,
+ )
+ )
+
+ with gr.Accordion(open=False) as apollo_tab:
+ with gr.Row():
+ use_apollo = gr.Checkbox()
+ apollo_rank = gr.Slider(minimum=1, maximum=1024, value=16, step=1)
+ apollo_update_interval = gr.Slider(minimum=1, maximum=2048, value=200, step=1)
+ apollo_scale = gr.Slider(minimum=0, maximum=100, value=32.0, step=0.1)
+ apollo_target = gr.Textbox(value="all")
+
+ input_elems.update({use_apollo, apollo_rank, apollo_update_interval, apollo_scale, apollo_target})
+ elem_dict.update(
+ dict(
+ apollo_tab=apollo_tab,
+ use_apollo=use_apollo,
+ apollo_rank=apollo_rank,
+ apollo_update_interval=apollo_update_interval,
+ apollo_scale=apollo_scale,
+ apollo_target=apollo_target,
+ )
+ )
+
+ with gr.Accordion(open=False) as badam_tab:
+ with gr.Row():
+ use_badam = gr.Checkbox()
+ badam_mode = gr.Dropdown(choices=["layer", "ratio"], value="layer")
+ badam_switch_mode = gr.Dropdown(choices=["ascending", "descending", "random", "fixed"], value="ascending")
+ badam_switch_interval = gr.Slider(minimum=1, maximum=1024, value=50, step=1)
+ badam_update_ratio = gr.Slider(minimum=0, maximum=1, value=0.05, step=0.01)
+
+ input_elems.update({use_badam, badam_mode, badam_switch_mode, badam_switch_interval, badam_update_ratio})
+ elem_dict.update(
+ dict(
+ badam_tab=badam_tab,
+ use_badam=use_badam,
+ badam_mode=badam_mode,
+ badam_switch_mode=badam_switch_mode,
+ badam_switch_interval=badam_switch_interval,
+ badam_update_ratio=badam_update_ratio,
+ )
+ )
+
+ with gr.Accordion(open=False) as swanlab_tab:
+ with gr.Row():
+ use_swanlab = gr.Checkbox()
+ swanlab_project = gr.Textbox(value="llamafactory")
+ swanlab_run_name = gr.Textbox()
+ swanlab_workspace = gr.Textbox()
+ swanlab_api_key = gr.Textbox()
+ swanlab_mode = gr.Dropdown(choices=["cloud", "local"], value="cloud")
+ swanlab_link = gr.Markdown(visible=False)
+
+ input_elems.update(
+ {
+ use_swanlab,
+ swanlab_project,
+ swanlab_run_name,
+ swanlab_workspace,
+ swanlab_api_key,
+ swanlab_mode,
+ swanlab_link,
+ }
+ )
+ elem_dict.update(
+ dict(
+ swanlab_tab=swanlab_tab,
+ use_swanlab=use_swanlab,
+ swanlab_project=swanlab_project,
+ swanlab_run_name=swanlab_run_name,
+ swanlab_workspace=swanlab_workspace,
+ swanlab_api_key=swanlab_api_key,
+ swanlab_mode=swanlab_mode,
+ swanlab_link=swanlab_link,
+ )
+ )
+
+ with gr.Row():
+ cmd_preview_btn = gr.Button()
+ arg_save_btn = gr.Button()
+ arg_load_btn = gr.Button()
+ start_btn = gr.Button(variant="primary")
+ stop_btn = gr.Button(variant="stop")
+
+ with gr.Row():
+ with gr.Column(scale=3):
+ with gr.Row():
+ current_time = gr.Textbox(visible=False, interactive=False)
+ output_dir = gr.Dropdown(allow_custom_value=True)
+ config_path = gr.Dropdown(allow_custom_value=True)
+
+ with gr.Row():
+ device_count = gr.Textbox(value=str(get_device_count() or 1), interactive=False)
+ ds_stage = gr.Dropdown(choices=["none", "2", "3"], value="none")
+ ds_offload = gr.Checkbox()
+
+ with gr.Row():
+ resume_btn = gr.Checkbox(visible=False, interactive=False)
+ progress_bar = gr.Slider(visible=False, interactive=False)
+
+ with gr.Row():
+ output_box = gr.Markdown()
+
+ with gr.Column(scale=1):
+ loss_viewer = gr.Plot()
+
+ input_elems.update({output_dir, config_path, ds_stage, ds_offload})
+ elem_dict.update(
+ dict(
+ cmd_preview_btn=cmd_preview_btn,
+ arg_save_btn=arg_save_btn,
+ arg_load_btn=arg_load_btn,
+ start_btn=start_btn,
+ stop_btn=stop_btn,
+ current_time=current_time,
+ output_dir=output_dir,
+ config_path=config_path,
+ device_count=device_count,
+ ds_stage=ds_stage,
+ ds_offload=ds_offload,
+ resume_btn=resume_btn,
+ progress_bar=progress_bar,
+ output_box=output_box,
+ loss_viewer=loss_viewer,
+ )
+ )
+ output_elems = [output_box, progress_bar, loss_viewer, swanlab_link]
+
+ cmd_preview_btn.click(engine.runner.preview_train, input_elems, output_elems, concurrency_limit=None)
+ start_btn.click(engine.runner.run_train, input_elems, output_elems)
+ stop_btn.click(engine.runner.set_abort)
+ resume_btn.change(engine.runner.monitor, outputs=output_elems, concurrency_limit=None)
+
+ lang = engine.manager.get_elem_by_id("top.lang")
+ model_name: gr.Dropdown = engine.manager.get_elem_by_id("top.model_name")
+ finetuning_type: gr.Dropdown = engine.manager.get_elem_by_id("top.finetuning_type")
+
+ arg_save_btn.click(engine.runner.save_args, input_elems, output_elems, concurrency_limit=None)
+ arg_load_btn.click(
+ engine.runner.load_args, [lang, config_path], list(input_elems) + [output_box], concurrency_limit=None
+ )
+
+ dataset.focus(list_datasets, [dataset_dir, training_stage], [dataset], queue=False)
+ training_stage.change(change_stage, [training_stage], [dataset, packing], queue=False)
+ reward_model.focus(list_checkpoints, [model_name, finetuning_type], [reward_model], queue=False)
+ model_name.change(list_output_dirs, [model_name, finetuning_type, current_time], [output_dir], queue=False)
+ finetuning_type.change(list_output_dirs, [model_name, finetuning_type, current_time], [output_dir], queue=False)
+ output_dir.change(
+ list_output_dirs, [model_name, finetuning_type, current_time], [output_dir], concurrency_limit=None
+ )
+ output_dir.input(
+ engine.runner.check_output_dir,
+ [lang, model_name, finetuning_type, output_dir],
+ list(input_elems) + [output_box],
+ concurrency_limit=None,
+ )
+ config_path.change(list_config_paths, [current_time], [config_path], queue=False)
+
+ return elem_dict
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/control.py b/post-training/LLaMA-Factory/src/llamafactory/webui/control.py
new file mode 100644
index 0000000..08aed40
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/control.py
@@ -0,0 +1,204 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from typing import Any, Optional
+
+from transformers.trainer_utils import get_last_checkpoint
+
+from ..extras.constants import (
+ CHECKPOINT_NAMES,
+ PEFT_METHODS,
+ RUNNING_LOG,
+ STAGES_USE_PAIR_DATA,
+ SWANLAB_CONFIG,
+ TRAINER_LOG,
+ TRAINING_STAGES,
+)
+from ..extras.packages import is_gradio_available, is_matplotlib_available
+from ..extras.ploting import gen_loss_plot
+from ..model import QuantizationMethod
+from .common import DEFAULT_CONFIG_DIR, DEFAULT_DATA_DIR, get_model_path, get_save_dir, get_template, load_dataset_info
+from .locales import ALERTS
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+def can_quantize(finetuning_type: str) -> "gr.Dropdown":
+ r"""Judge if the quantization is available in this finetuning type.
+
+ Inputs: top.finetuning_type
+ Outputs: top.quantization_bit
+ """
+ if finetuning_type not in PEFT_METHODS:
+ return gr.Dropdown(value="none", interactive=False)
+ else:
+ return gr.Dropdown(interactive=True)
+
+
+def can_quantize_to(quantization_method: str) -> "gr.Dropdown":
+ r"""Get the available quantization bits.
+
+ Inputs: top.quantization_method
+ Outputs: top.quantization_bit
+ """
+ if quantization_method == QuantizationMethod.BITS_AND_BYTES.value:
+ available_bits = ["none", "8", "4"]
+ elif quantization_method == QuantizationMethod.HQQ.value:
+ available_bits = ["none", "8", "6", "5", "4", "3", "2", "1"]
+ elif quantization_method == QuantizationMethod.EETQ.value:
+ available_bits = ["none", "8"]
+
+ return gr.Dropdown(choices=available_bits)
+
+
+def change_stage(training_stage: str = list(TRAINING_STAGES.keys())[0]) -> tuple[list[str], bool]:
+ r"""Modify states after changing the training stage.
+
+ Inputs: train.training_stage
+ Outputs: train.dataset, train.packing
+ """
+ return [], TRAINING_STAGES[training_stage] == "pt"
+
+
+def get_model_info(model_name: str) -> tuple[str, str]:
+ r"""Get the necessary information of this model.
+
+ Inputs: top.model_name
+ Outputs: top.model_path, top.template
+ """
+ return get_model_path(model_name), get_template(model_name)
+
+
+def get_trainer_info(lang: str, output_path: os.PathLike, do_train: bool) -> tuple[str, "gr.Slider", dict[str, Any]]:
+ r"""Get training infomation for monitor.
+
+ If do_train is True:
+ Inputs: top.lang, train.output_path
+ Outputs: train.output_box, train.progress_bar, train.loss_viewer, train.swanlab_link
+ If do_train is False:
+ Inputs: top.lang, eval.output_path
+ Outputs: eval.output_box, eval.progress_bar, None, None
+ """
+ running_log = ""
+ running_progress = gr.Slider(visible=False)
+ running_info = {}
+
+ running_log_path = os.path.join(output_path, RUNNING_LOG)
+ if os.path.isfile(running_log_path):
+ with open(running_log_path, encoding="utf-8") as f:
+ running_log = f.read()[-20000:] # avoid lengthy log
+
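+    # The trainer log is a JSON-lines file: one record per logging step with
+    # current_steps, total_steps, percentage and timing fields.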
+ trainer_log_path = os.path.join(output_path, TRAINER_LOG)
+ if os.path.isfile(trainer_log_path):
+ trainer_log: list[dict[str, Any]] = []
+ with open(trainer_log_path, encoding="utf-8") as f:
+ for line in f:
+ trainer_log.append(json.loads(line))
+
+ if len(trainer_log) != 0:
+ latest_log = trainer_log[-1]
+ percentage = latest_log["percentage"]
+ label = "Running {:d}/{:d}: {} < {}".format(
+ latest_log["current_steps"],
+ latest_log["total_steps"],
+ latest_log["elapsed_time"],
+ latest_log["remaining_time"],
+ )
+ running_progress = gr.Slider(label=label, value=percentage, visible=True)
+
+ if do_train and is_matplotlib_available():
+ running_info["loss_viewer"] = gr.Plot(gen_loss_plot(trainer_log))
+
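+    # If a SwanLab run config exists, surface the cloud experiment URL in the UI.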
+ swanlab_config_path = os.path.join(output_path, SWANLAB_CONFIG)
+ if os.path.isfile(swanlab_config_path):
+ with open(swanlab_config_path, encoding="utf-8") as f:
+ swanlab_public_config = json.load(f)
+ swanlab_link = swanlab_public_config["cloud"]["experiment_url"]
+ if swanlab_link is not None:
+ running_info["swanlab_link"] = gr.Markdown(
+ ALERTS["info_swanlab_link"][lang] + swanlab_link, visible=True
+ )
+
+ return running_log, running_progress, running_info
+
+
+def list_checkpoints(model_name: str, finetuning_type: str) -> "gr.Dropdown":
+ r"""List all available checkpoints.
+
+ Inputs: top.model_name, top.finetuning_type
+ Outputs: top.checkpoint_path
+ """
+ checkpoints = []
+ if model_name:
+ save_dir = get_save_dir(model_name, finetuning_type)
+ if save_dir and os.path.isdir(save_dir):
+ for checkpoint in os.listdir(save_dir):
+ if os.path.isdir(os.path.join(save_dir, checkpoint)) and any(
+ os.path.isfile(os.path.join(save_dir, checkpoint, name)) for name in CHECKPOINT_NAMES
+ ):
+ checkpoints.append(checkpoint)
+
+ if finetuning_type in PEFT_METHODS:
+ return gr.Dropdown(value=[], choices=checkpoints, multiselect=True)
+ else:
+ return gr.Dropdown(value=None, choices=checkpoints, multiselect=False)
+
+
+def list_config_paths(current_time: str) -> "gr.Dropdown":
+ r"""List all the saved configuration files.
+
+ Inputs: train.current_time
+ Outputs: train.config_path
+ """
+ config_files = [f"{current_time}.yaml"]
+ if os.path.isdir(DEFAULT_CONFIG_DIR):
+ for file_name in os.listdir(DEFAULT_CONFIG_DIR):
+ if file_name.endswith(".yaml") and file_name not in config_files:
+ config_files.append(file_name)
+
+ return gr.Dropdown(choices=config_files)
+
+
+def list_datasets(dataset_dir: str = None, training_stage: str = list(TRAINING_STAGES.keys())[0]) -> "gr.Dropdown":
+ r"""List all available datasets in the dataset dir for the training stage.
+
+ Inputs: *.dataset_dir, *.training_stage
+ Outputs: *.dataset
+ """
+ dataset_info = load_dataset_info(dataset_dir if dataset_dir is not None else DEFAULT_DATA_DIR)
+ ranking = TRAINING_STAGES[training_stage] in STAGES_USE_PAIR_DATA
+ datasets = [k for k, v in dataset_info.items() if v.get("ranking", False) == ranking]
+ return gr.Dropdown(choices=datasets)
+
+
+def list_output_dirs(model_name: Optional[str], finetuning_type: str, current_time: str) -> "gr.Dropdown":
+ r"""List all the directories that can resume from.
+
+ Inputs: top.model_name, top.finetuning_type, train.current_time
+ Outputs: train.output_dir
+ """
+ output_dirs = [f"train_{current_time}"]
+ if model_name:
+ save_dir = get_save_dir(model_name, finetuning_type)
+ if save_dir and os.path.isdir(save_dir):
+ for folder in os.listdir(save_dir):
+ output_dir = os.path.join(save_dir, folder)
+ if os.path.isdir(output_dir) and get_last_checkpoint(output_dir) is not None:
+ output_dirs.append(folder)
+
+ return gr.Dropdown(choices=output_dirs)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/css.py b/post-training/LLaMA-Factory/src/llamafactory/webui/css.py
new file mode 100644
index 0000000..b7e4c35
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/css.py
@@ -0,0 +1,67 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+CSS = r"""
+.duplicate-button {
+ margin: auto !important;
+ color: white !important;
+ background: black !important;
+ border-radius: 100vh !important;
+}
+
+.thinking-summary {
+ padding: 8px !important;
+}
+
+.thinking-summary span {
+ border-radius: 4px !important;
+ padding: 4px !important;
+ cursor: pointer !important;
+ font-size: 14px !important;
+ background: rgb(245, 245, 245) !important;
+}
+
+.dark .thinking-summary span {
+ background: rgb(73, 73, 73) !important;
+}
+
+.thinking-container {
+ border-left: 2px solid #a6a6a6 !important;
+ padding-left: 10px !important;
+ margin: 4px 0 !important;
+}
+
+.thinking-container p {
+ color: #a6a6a6 !important;
+}
+
+.modal-box {
+ position: fixed !important;
+ top: 50%;
+ left: 50%;
+    transform: translate(-50%, -50%); /* center horizontally and vertically */
+ max-width: 1000px;
+ max-height: 750px;
+ overflow-y: auto;
+ background-color: var(--input-background-fill);
+ flex-wrap: nowrap !important;
+ border: 2px solid black !important;
+ z-index: 1000;
+ padding: 10px;
+}
+
+.dark .modal-box {
+ border: 2px solid white !important;
+}
+"""
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/engine.py b/post-training/LLaMA-Factory/src/llamafactory/webui/engine.py
new file mode 100644
index 0000000..8844cac
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/engine.py
@@ -0,0 +1,81 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING, Any
+
+from .chatter import WebChatModel
+from .common import create_ds_config, get_time, load_config
+from .locales import LOCALES
+from .manager import Manager
+from .runner import Runner
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+
+class Engine:
+ r"""A general engine to control the behaviors of Web UI."""
+
+ def __init__(self, demo_mode: bool = False, pure_chat: bool = False) -> None:
+ self.demo_mode = demo_mode
+ self.pure_chat = pure_chat
+ self.manager = Manager()
+ self.runner = Runner(self.manager, demo_mode)
+ self.chatter = WebChatModel(self.manager, demo_mode, lazy_init=(not pure_chat))
+ if not demo_mode:
+ create_ds_config()
+
+ def _update_component(self, input_dict: dict[str, dict[str, Any]]) -> dict["Component", "Component"]:
+ r"""Update gradio components according to the (elem_id, properties) mapping."""
+ output_dict: dict[Component, Component] = {}
+ for elem_id, elem_attr in input_dict.items():
+ elem = self.manager.get_elem_by_id(elem_id)
+ output_dict[elem] = elem.__class__(**elem_attr)
+
+ return output_dict
+
+ def resume(self):
+ r"""Get the initial value of gradio components and restores training status if necessary."""
+ user_config = load_config() if not self.demo_mode else {} # do not use config in demo mode
+ lang = user_config.get("lang", None) or "en"
+ init_dict = {"top.lang": {"value": lang}, "infer.chat_box": {"visible": self.chatter.loaded}}
+
+ if not self.pure_chat:
+ current_time = get_time()
+ init_dict["train.current_time"] = {"value": current_time}
+ init_dict["train.output_dir"] = {"value": f"train_{current_time}"}
+ init_dict["train.config_path"] = {"value": f"{current_time}.yaml"}
+ init_dict["eval.output_dir"] = {"value": f"eval_{current_time}"}
+ init_dict["infer.mm_box"] = {"visible": False}
+
+ if user_config.get("last_model", None):
+ init_dict["top.model_name"] = {"value": user_config["last_model"]}
+
+ yield self._update_component(init_dict)
+
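+        # If a job is still running (e.g. after a page refresh), restore its latest
+        # outputs and tick the matching resume checkbox so monitoring continues.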
+ if self.runner.running and not self.demo_mode and not self.pure_chat:
+ yield {elem: elem.__class__(value=value) for elem, value in self.runner.running_data.items()}
+ if self.runner.do_train:
+ yield self._update_component({"train.resume_btn": {"value": True}})
+ else:
+ yield self._update_component({"eval.resume_btn": {"value": True}})
+
+ def change_lang(self, lang: str):
+ r"""Update the displayed language of gradio components."""
+ return {
+ elem: elem.__class__(**LOCALES[elem_name][lang])
+ for elem_name, elem in self.manager.get_elem_iter()
+ if elem_name in LOCALES
+ }
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/interface.py b/post-training/LLaMA-Factory/src/llamafactory/webui/interface.py
new file mode 100644
index 0000000..691a88a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/interface.py
@@ -0,0 +1,106 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import platform
+
+from ..extras.misc import fix_proxy, is_env_enabled
+from ..extras.packages import is_gradio_available
+from .common import save_config
+from .components import (
+ create_chat_box,
+ create_eval_tab,
+ create_export_tab,
+ create_infer_tab,
+ create_top,
+ create_train_tab,
+)
+from .css import CSS
+from .engine import Engine
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+def create_ui(demo_mode: bool = False) -> "gr.Blocks":
+ engine = Engine(demo_mode=demo_mode, pure_chat=False)
+ hostname = os.getenv("HOSTNAME", os.getenv("COMPUTERNAME", platform.node())).split(".")[0]
+
+ with gr.Blocks(title=f"LLaMA Board ({hostname})", css=CSS) as demo:
+ if demo_mode:
+ gr.HTML("LLaMA Board: A One-stop Web UI for Getting Started with LLaMA Factory ")
+ gr.HTML(
+ '"
+ )
+ gr.DuplicateButton(value="Duplicate Space for private use", elem_classes="duplicate-button")
+
+ engine.manager.add_elems("top", create_top())
+ lang: gr.Dropdown = engine.manager.get_elem_by_id("top.lang")
+
+ with gr.Tab("Train"):
+ engine.manager.add_elems("train", create_train_tab(engine))
+
+ with gr.Tab("Evaluate & Predict"):
+ engine.manager.add_elems("eval", create_eval_tab(engine))
+
+ with gr.Tab("Chat"):
+ engine.manager.add_elems("infer", create_infer_tab(engine))
+
+ if not demo_mode:
+ with gr.Tab("Export"):
+ engine.manager.add_elems("export", create_export_tab(engine))
+
+ demo.load(engine.resume, outputs=engine.manager.get_elem_list(), concurrency_limit=None)
+ lang.change(engine.change_lang, [lang], engine.manager.get_elem_list(), queue=False)
+ lang.input(save_config, inputs=[lang], queue=False)
+
+ return demo
+
+
+def create_web_demo() -> "gr.Blocks":
+ engine = Engine(pure_chat=True)
+ hostname = os.getenv("HOSTNAME", os.getenv("COMPUTERNAME", platform.node())).split(".")[0]
+
+ with gr.Blocks(title=f"LLaMA Factory Web Demo ({hostname})", css=CSS) as demo:
+ lang = gr.Dropdown(choices=["en", "ru", "zh", "ko", "ja"], scale=1)
+ engine.manager.add_elems("top", dict(lang=lang))
+
+ _, _, chat_elems = create_chat_box(engine, visible=True)
+ engine.manager.add_elems("infer", chat_elems)
+
+ demo.load(engine.resume, outputs=engine.manager.get_elem_list(), concurrency_limit=None)
+ lang.change(engine.change_lang, [lang], engine.manager.get_elem_list(), queue=False)
+ lang.input(save_config, inputs=[lang], queue=False)
+
+ return demo
+
+
+def run_web_ui() -> None:
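+    # GRADIO_IPV6 / GRADIO_SHARE toggle IPv6 binding and public share links;
+    # GRADIO_SERVER_NAME overrides the default bind address.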
+ gradio_ipv6 = is_env_enabled("GRADIO_IPV6")
+ gradio_share = is_env_enabled("GRADIO_SHARE")
+ server_name = os.getenv("GRADIO_SERVER_NAME", "[::]" if gradio_ipv6 else "0.0.0.0")
+ print("Visit http://ip:port for Web UI, e.g., http://127.0.0.1:7860")
+ fix_proxy(ipv6_enabled=gradio_ipv6)
+ create_ui().queue().launch(share=gradio_share, server_name=server_name, inbrowser=True)
+
+
+def run_web_demo() -> None:
+ gradio_ipv6 = is_env_enabled("GRADIO_IPV6")
+ gradio_share = is_env_enabled("GRADIO_SHARE")
+ server_name = os.getenv("GRADIO_SERVER_NAME", "[::]" if gradio_ipv6 else "0.0.0.0")
+ print("Visit http://ip:port for Web UI, e.g., http://127.0.0.1:7860")
+ fix_proxy(ipv6_enabled=gradio_ipv6)
+ create_web_demo().queue().launch(share=gradio_share, server_name=server_name, inbrowser=True)
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/locales.py b/post-training/LLaMA-Factory/src/llamafactory/webui/locales.py
new file mode 100644
index 0000000..ad8ebeb
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/locales.py
@@ -0,0 +1,2880 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
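+# Mapping of UI element name -> language code ("en", "ru", "zh", "ko", "ja") ->
+# Gradio component attributes (label, info, value) used for localization.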
+LOCALES = {
+ "lang": {
+ "en": {
+ "label": "Language",
+ },
+ "ru": {
+ "label": "Язык",
+ },
+ "zh": {
+ "label": "语言",
+ },
+ "ko": {
+ "label": "언어",
+ },
+ "ja": {
+ "label": "言語",
+ },
+ },
+ "model_name": {
+ "en": {
+ "label": "Model name",
+ "info": "Input the initial name to search for the model.",
+ },
+ "ru": {
+ "label": "Название модели",
+ "info": "Введите начальное имя для поиска модели.",
+ },
+ "zh": {
+ "label": "模型名称",
+ "info": "输入首单词以检索模型。",
+ },
+ "ko": {
+ "label": "모델 이름",
+ "info": "모델을 검색할 초기 이름을 입력하세요.",
+ },
+ "ja": {
+ "label": "モデル名",
+ "info": "モデルを検索するための初期名を入力してください。",
+ },
+ },
+ "model_path": {
+ "en": {
+ "label": "Model path",
+ "info": "Path to pretrained model or model identifier from Hugging Face.",
+ },
+ "ru": {
+ "label": "Путь к модели",
+ "info": "Путь к предварительно обученной модели или идентификатор модели от Hugging Face.",
+ },
+ "zh": {
+ "label": "模型路径",
+ "info": "本地模型的文件路径或 Hugging Face 的模型标识符。",
+ },
+ "ko": {
+ "label": "모델 경로",
+ "info": "사전 훈련된 모델의 경로 또는 Hugging Face의 모델 식별자.",
+ },
+ "ja": {
+ "label": "モデルパス",
+ "info": "事前学習済みモデルへのパス、または Hugging Face のモデル識別子。",
+ },
+ },
+ "finetuning_type": {
+ "en": {
+ "label": "Finetuning method",
+ },
+ "ru": {
+ "label": "Метод дообучения",
+ },
+ "zh": {
+ "label": "微调方法",
+ },
+ "ko": {
+ "label": "파인튜닝 방법",
+ },
+ "ja": {
+ "label": "ファインチューニング方法",
+ },
+ },
+ "checkpoint_path": {
+ "en": {
+ "label": "Checkpoint path",
+ },
+ "ru": {
+ "label": "Путь контрольной точки",
+ },
+ "zh": {
+ "label": "检查点路径",
+ },
+ "ko": {
+ "label": "체크포인트 경로",
+ },
+ "ja": {
+ "label": "チェックポイントパス",
+ },
+ },
+ "quantization_bit": {
+ "en": {
+ "label": "Quantization bit",
+ "info": "Enable quantization (QLoRA).",
+ },
+ "ru": {
+ "label": "Уровень квантования",
+ "info": "Включить квантование (QLoRA).",
+ },
+ "zh": {
+ "label": "量化等级",
+ "info": "启用量化(QLoRA)。",
+ },
+ "ko": {
+ "label": "양자화 비트",
+ "info": "양자화 활성화 (QLoRA).",
+ },
+ "ja": {
+ "label": "量子化ビット",
+ "info": "量子化を有効にする (QLoRA)。",
+ },
+ },
+ "quantization_method": {
+ "en": {
+ "label": "Quantization method",
+ "info": "Quantization algorithm to use.",
+ },
+ "ru": {
+ "label": "Метод квантования",
+ "info": "Алгоритм квантования, который следует использовать.",
+ },
+ "zh": {
+ "label": "量化方法",
+ "info": "使用的量化算法。",
+ },
+ "ko": {
+ "label": "양자화 방법",
+ "info": "사용할 양자화 알고리즘.",
+ },
+ "ja": {
+ "label": "量子化方法",
+ "info": "使用する量子化アルゴリズム。",
+ },
+ },
+ "template": {
+ "en": {
+ "label": "Chat template",
+ "info": "The chat template used in constructing prompts.",
+ },
+ "ru": {
+ "label": "Шаблон чата",
+ "info": "Шаблон чата используемый для составления подсказок.",
+ },
+ "zh": {
+ "label": "对话模板",
+ "info": "构建提示词时使用的模板。",
+ },
+ "ko": {
+ "label": "채팅 템플릿",
+ "info": "프롬프트 작성에 사용되는 채팅 템플릿.",
+ },
+ "ja": {
+ "label": "チャットテンプレート",
+ "info": "プロンプトの構築に使用されるチャットテンプレート。",
+ },
+ },
+ "rope_scaling": {
+ "en": {
+ "label": "RoPE scaling",
+ "info": "RoPE scaling method to use.",
+ },
+ "ru": {
+ "label": "Масштабирование RoPE",
+ "info": "Метод масштабирования RoPE для использования.",
+ },
+ "zh": {"label": "RoPE 插值方法", "info": "RoPE 插值时使用的方法。"},
+ "ko": {
+ "label": "RoPE 스케일링",
+ "info": "사용할 RoPE 스케일링 방법.",
+ },
+ "ja": {
+ "label": "RoPE スケーリング",
+ "info": "使用する RoPE スケーリング方法。",
+ },
+ },
+ "booster": {
+ "en": {
+ "label": "Booster",
+ "info": "Approach used to boost training speed.",
+ },
+ "ru": {
+ "label": "Ускоритель",
+ "info": "Подход, используемый для ускорения обучения.",
+ },
+ "zh": {"label": "加速方式", "info": "使用的加速方法。"},
+ "ko": {
+ "label": "부스터",
+ "info": "훈련 속도를 향상시키기 위해 사용된 접근 방식.",
+ },
+ "ja": {
+ "label": "ブースター",
+ "info": "トレーニング速度を向上させるためのアプローチ。",
+ },
+ },
+ "training_stage": {
+ "en": {
+ "label": "Stage",
+ "info": "The stage to perform in training.",
+ },
+ "ru": {
+ "label": "Этап",
+ "info": "Этап выполнения обучения.",
+ },
+ "zh": {
+ "label": "训练阶段",
+ "info": "目前采用的训练方式。",
+ },
+ "ko": {
+ "label": "학습 단계",
+ "info": "수행할 학습 방법.",
+ },
+ "ja": {
+ "label": "ステージ",
+ "info": "トレーニングで実行するステージ。",
+ },
+ },
+ "dataset_dir": {
+ "en": {
+ "label": "Data dir",
+ "info": "Path to the data directory.",
+ },
+ "ru": {
+ "label": "Директория данных",
+ "info": "Путь к директории данных.",
+ },
+ "zh": {
+ "label": "数据路径",
+ "info": "数据文件夹的路径。",
+ },
+ "ko": {
+ "label": "데이터 디렉토리",
+ "info": "데이터 디렉토리의 경로.",
+ },
+ "ja": {
+ "label": "データディレクトリ",
+ "info": "データディレクトリへのパス。",
+ },
+ },
+ "dataset": {
+ "en": {
+ "label": "Dataset",
+ },
+ "ru": {
+ "label": "Набор данных",
+ },
+ "zh": {
+ "label": "数据集",
+ },
+ "ko": {
+ "label": "데이터셋",
+ },
+ "ja": {
+ "label": "データセット",
+ },
+ },
+ "data_preview_btn": {
+ "en": {
+ "value": "Preview dataset",
+ },
+ "ru": {
+ "value": "Просмотреть набор данных",
+ },
+ "zh": {
+ "value": "预览数据集",
+ },
+ "ko": {
+ "value": "데이터셋 미리보기",
+ },
+ "ja": {
+ "value": "データセットをプレビュー",
+ },
+ },
+ "preview_count": {
+ "en": {
+ "label": "Count",
+ },
+ "ru": {
+ "label": "Количество",
+ },
+ "zh": {
+ "label": "数量",
+ },
+ "ko": {
+ "label": "개수",
+ },
+ "ja": {
+ "label": "カウント",
+ },
+ },
+ "page_index": {
+ "en": {
+ "label": "Page",
+ },
+ "ru": {
+ "label": "Страница",
+ },
+ "zh": {
+ "label": "页数",
+ },
+ "ko": {
+ "label": "페이지",
+ },
+ "ja": {
+ "label": "ページ",
+ },
+ },
+ "prev_btn": {
+ "en": {
+ "value": "Prev",
+ },
+ "ru": {
+ "value": "Предыдущая",
+ },
+ "zh": {
+ "value": "上一页",
+ },
+ "ko": {
+ "value": "이전",
+ },
+ "ja": {
+ "value": "前へ",
+ },
+ },
+ "next_btn": {
+ "en": {
+ "value": "Next",
+ },
+ "ru": {
+ "value": "Следующая",
+ },
+ "zh": {
+ "value": "下一页",
+ },
+ "ko": {
+ "value": "다음",
+ },
+ "ja": {
+ "value": "次へ",
+ },
+ },
+ "close_btn": {
+ "en": {
+ "value": "Close",
+ },
+ "ru": {
+ "value": "Закрыть",
+ },
+ "zh": {
+ "value": "关闭",
+ },
+ "ko": {
+ "value": "닫기",
+ },
+ "ja": {
+ "value": "閉じる",
+ },
+ },
+ "preview_samples": {
+ "en": {
+ "label": "Samples",
+ },
+ "ru": {
+ "label": "Примеры",
+ },
+ "zh": {
+ "label": "样例",
+ },
+ "ko": {
+ "label": "샘플",
+ },
+ "ja": {
+ "label": "サンプル",
+ },
+ },
+ "learning_rate": {
+ "en": {
+ "label": "Learning rate",
+ "info": "Initial learning rate for AdamW.",
+ },
+ "ru": {
+ "label": "Скорость обучения",
+ "info": "Начальная скорость обучения для AdamW.",
+ },
+ "zh": {
+ "label": "学习率",
+ "info": "AdamW 优化器的初始学习率。",
+ },
+ "ko": {
+ "label": "학습률",
+ "info": "AdamW의 초기 학습률.",
+ },
+ "ja": {
+ "label": "学習率",
+ "info": "AdamW の初期学習率。",
+ },
+ },
+ "num_train_epochs": {
+ "en": {
+ "label": "Epochs",
+ "info": "Total number of training epochs to perform.",
+ },
+ "ru": {
+ "label": "Эпохи",
+ "info": "Общее количество эпох обучения.",
+ },
+ "zh": {
+ "label": "训练轮数",
+ "info": "需要执行的训练总轮数。",
+ },
+ "ko": {
+ "label": "에포크",
+ "info": "수행할 총 학습 에포크 수.",
+ },
+ "ja": {
+ "label": "エポック数",
+ "info": "実行するトレーニングの総エポック数。",
+ },
+ },
+ "max_grad_norm": {
+ "en": {
+ "label": "Maximum gradient norm",
+ "info": "Norm for gradient clipping.",
+ },
+ "ru": {
+ "label": "Максимальная норма градиента",
+ "info": "Норма для обрезки градиента.",
+ },
+ "zh": {
+ "label": "最大梯度范数",
+ "info": "用于梯度裁剪的范数。",
+ },
+ "ko": {
+ "label": "최대 그레디언트 노름(norm)",
+ "info": "그레디언트 클리핑을 위한 노름(norm).",
+ },
+ "ja": {
+ "label": "最大勾配ノルム",
+ "info": "勾配クリッピングのためのノルム。",
+ },
+ },
+ "max_samples": {
+ "en": {
+ "label": "Max samples",
+ "info": "Maximum samples per dataset.",
+ },
+ "ru": {
+ "label": "Максимальное количество образцов",
+ "info": "Максимальное количество образцов на набор данных.",
+ },
+ "zh": {
+ "label": "最大样本数",
+ "info": "每个数据集的最大样本数。",
+ },
+ "ko": {
+ "label": "최대 샘플 수",
+ "info": "데이터셋 당 최대 샘플 수.",
+ },
+ "ja": {
+ "label": "最大サンプル数",
+ "info": "データセットごとの最大サンプル数。",
+ },
+ },
+ "compute_type": {
+ "en": {
+ "label": "Compute type",
+ "info": "Whether to use mixed precision training.",
+ },
+ "ru": {
+ "label": "Тип вычислений",
+ "info": "Использовать ли обучение смешанной точности.",
+ },
+ "zh": {
+ "label": "计算类型",
+ "info": "是否使用混合精度训练。",
+ },
+ "ko": {
+ "label": "연산 유형",
+ "info": "혼합 정밀도 훈련을 사용할지 여부.",
+ },
+ "ja": {
+ "label": "計算タイプ",
+ "info": "混合精度トレーニングを使用するかどうか。",
+ },
+ },
+ "cutoff_len": {
+ "en": {
+ "label": "Cutoff length",
+ "info": "Max tokens in input sequence.",
+ },
+ "ru": {
+ "label": "Длина обрезки",
+ "info": "Максимальное количество токенов во входной последовательности.",
+ },
+ "zh": {
+ "label": "截断长度",
+ "info": "输入序列分词后的最大长度。",
+ },
+ "ko": {
+ "label": "컷오프 길이",
+ "info": "입력 시퀀스의 최대 토큰 수.",
+ },
+ "ja": {
+ "label": "カットオフ長",
+ "info": "入力シーケンスの最大トークン数。",
+ },
+ },
+ "batch_size": {
+ "en": {
+ "label": "Batch size",
+ "info": "Number of samples processed on each GPU.",
+ },
+ "ru": {
+ "label": "Размер пакета",
+ "info": "Количество образцов для обработки на каждом GPU.",
+ },
+ "zh": {
+ "label": "批处理大小",
+ "info": "每个 GPU 处理的样本数量。",
+ },
+ "ko": {
+ "label": "배치 크기",
+ "info": "각 GPU에서 처리되는 샘플 수.",
+ },
+ "ja": {
+ "label": "バッチサイズ",
+ "info": "各 GPU で処理されるサンプル数。",
+ },
+ },
+ "gradient_accumulation_steps": {
+ "en": {
+ "label": "Gradient accumulation",
+ "info": "Number of steps for gradient accumulation.",
+ },
+ "ru": {
+ "label": "Накопление градиента",
+ "info": "Количество шагов накопления градиента.",
+ },
+ "zh": {
+ "label": "梯度累积",
+ "info": "梯度累积的步数。",
+ },
+ "ko": {
+ "label": "그레디언트 누적",
+ "info": "그레디언트 누적 단계 수.",
+ },
+ "ja": {
+ "label": "勾配累積",
+ "info": "勾配累積のステップ数。",
+ },
+ },
+ "val_size": {
+ "en": {
+ "label": "Val size",
+ "info": "Percentage of validation set from the entire dataset.",
+ },
+ "ru": {
+ "label": "Размер валидации",
+ "info": "Пропорция данных в наборе для разработки.",
+ },
+ "zh": {
+ "label": "验证集比例",
+ "info": "验证集占全部样本的百分比。",
+ },
+ "ko": {
+ "label": "검증 데이터셋 크기",
+ "info": "개발 데이터셋에서 검증 데이터의 비율.",
+ },
+ "ja": {
+ "label": "検証セットサイズ",
+ "info": "データセット全体に対する検証セットの割合。",
+ },
+ },
+ "lr_scheduler_type": {
+ "en": {
+ "label": "LR scheduler",
+ "info": "Name of the learning rate scheduler.",
+ },
+ "ru": {
+ "label": "Планировщик скорости обучения",
+ "info": "Название планировщика скорости обучения.",
+ },
+ "zh": {
+ "label": "学习率调节器",
+ "info": "学习率调度器的名称。",
+ },
+ "ko": {
+ "label": "LR 스케줄러",
+ "info": "학습률 스케줄러의 이름.",
+ },
+ "ja": {
+ "label": "学習率スケジューラ",
+ "info": "学習率スケジューラの名前。",
+ },
+ },
+ "extra_tab": {
+ "en": {
+ "label": "Extra configurations",
+ },
+ "ru": {
+ "label": "Дополнительные конфигурации",
+ },
+ "zh": {
+ "label": "其它参数设置",
+ },
+ "ko": {
+ "label": "추가 구성(configuration)",
+ },
+ "ja": {
+ "label": "追加設定",
+ },
+ },
+ "logging_steps": {
+ "en": {
+ "label": "Logging steps",
+ "info": "Number of steps between two logs.",
+ },
+ "ru": {
+ "label": "Шаги логирования",
+ "info": "Количество шагов между двумя записями в журнале.",
+ },
+ "zh": {
+ "label": "日志间隔",
+ "info": "每两次日志输出间的更新步数。",
+ },
+ "ko": {
+ "label": "로깅 스텝",
+ "info": "이전 로깅과 다음 로깅 간 스텝 수.",
+ },
+ "ja": {
+ "label": "ロギングステップ",
+ "info": "2 つのログ間のステップ数。",
+ },
+ },
+ "save_steps": {
+ "en": {
+ "label": "Save steps",
+ "info": "Number of steps between two checkpoints.",
+ },
+ "ru": {
+ "label": "Шаги сохранения",
+ "info": "Количество шагов между двумя контрольными точками.",
+ },
+ "zh": {
+ "label": "保存间隔",
+ "info": "每两次断点保存间的更新步数。",
+ },
+ "ko": {
+ "label": "저장 스텝",
+ "info": "이전 체크포인트와 다음 체크포인트 사이의 스텝 수.",
+ },
+ "ja": {
+ "label": "保存ステップ",
+ "info": "2 つのチェックポイント間のステップ数。",
+ },
+ },
+ "warmup_steps": {
+ "en": {
+ "label": "Warmup steps",
+ "info": "Number of steps used for warmup.",
+ },
+ "ru": {
+ "label": "Шаги прогрева",
+ "info": "Количество шагов, используемых для прогрева.",
+ },
+ "zh": {
+ "label": "预热步数",
+ "info": "学习率预热采用的步数。",
+ },
+ "ko": {
+ "label": "Warmup 스텝",
+ "info": "Warmup에 사용되는 스텝 수.",
+ },
+ "ja": {
+ "label": "ウォームアップステップ",
+ "info": "ウォームアップに使用されるステップ数。",
+ },
+ },
+ "neftune_alpha": {
+ "en": {
+ "label": "NEFTune alpha",
+ "info": "Magnitude of noise adding to embedding vectors.",
+ },
+ "ru": {
+ "label": "NEFTune alpha",
+ "info": "Величина шума, добавляемого к векторам вложений.",
+ },
+ "zh": {
+ "label": "NEFTune 噪声参数",
+ "info": "嵌入向量所添加的噪声大小。",
+ },
+ "ko": {
+ "label": "NEFTune 알파",
+ "info": "임베딩 벡터에 추가되는 노이즈의 크기.",
+ },
+ "ja": {
+ "label": "NEFTune alpha",
+ "info": "埋め込みベクトルに追加されるノイズの大きさ。",
+ },
+ },
+ "extra_args": {
+ "en": {
+ "label": "Extra arguments",
+ "info": "Extra arguments passed to the trainer in JSON format.",
+ },
+ "ru": {
+ "label": "Дополнительные аргументы",
+ "info": "Дополнительные аргументы, которые передаются тренеру в формате JSON.",
+ },
+ "zh": {
+ "label": "额外参数",
+ "info": "以 JSON 格式传递给训练器的额外参数。",
+ },
+ "ko": {
+ "label": "추가 인수",
+ "info": "JSON 형식으로 트레이너에게 전달할 추가 인수입니다.",
+ },
+ "ja": {
+ "label": "追加引数",
+ "info": "JSON 形式でトレーナーに渡される追加引数。",
+ },
+ },
+ "packing": {
+ "en": {
+ "label": "Pack sequences",
+ "info": "Pack sequences into samples of fixed length.",
+ },
+ "ru": {
+ "label": "Упаковка последовательностей",
+ "info": "Упаковка последовательностей в образцы фиксированной длины.",
+ },
+ "zh": {
+ "label": "序列打包",
+ "info": "将序列打包为等长样本。",
+ },
+ "ko": {
+ "label": "시퀀스 패킹",
+ "info": "고정된 길이의 샘플로 시퀀스를 패킹합니다.",
+ },
+ "ja": {
+ "label": "シーケンスパッキング",
+ "info": "シーケンスを固定長のサンプルにパッキングします。",
+ },
+ },
+ "neat_packing": {
+ "en": {
+ "label": "Use neat packing",
+ "info": "Avoid cross-attention between packed sequences.",
+ },
+ "ru": {
+ "label": "Используйте аккуратную упаковку",
+ "info": "избегайте перекрестного внимания между упакованными последовательностями.",
+ },
+ "zh": {
+ "label": "使用无污染打包",
+ "info": "避免打包后的序列产生交叉注意力。",
+ },
+ "ko": {
+ "label": "니트 패킹 사용",
+ "info": "패킹된 시퀀스 간의 크로스 어텐션을 피합니다.",
+ },
+ "ja": {
+ "label": "無汚染パッキングを使用",
+ "info": "パッキング後のシーケンス間のクロスアテンションを避けます。",
+ },
+ },
+ "train_on_prompt": {
+ "en": {
+ "label": "Train on prompt",
+ "info": "Disable the label mask on the prompt (only for SFT).",
+ },
+ "ru": {
+ "label": "Тренировка на подсказке",
+ "info": "Отключить маску меток на подсказке (только для SFT).",
+ },
+ "zh": {
+ "label": "学习提示词",
+ "info": "不在提示词的部分添加掩码(仅适用于 SFT)。",
+ },
+ "ko": {
+ "label": "프롬프트도 학습",
+ "info": "프롬프트에서 라벨 마스킹을 비활성화합니다 (SFT에만 해당).",
+ },
+ "ja": {
+ "label": "プロンプトで学習",
+ "info": "プロンプト部分にマスクを追加しない(SFT のみ)。",
+ },
+ },
+ "mask_history": {
+ "en": {
+ "label": "Mask history",
+ "info": "Train on the last turn only (only for SFT).",
+ },
+ "ru": {
+ "label": "История масок",
+ "info": "Тренироваться только на последнем шаге (только для SFT).",
+ },
+ "zh": {
+ "label": "不学习历史对话",
+ "info": "仅学习最后一轮对话(仅适用于 SFT)。",
+ },
+ "ko": {
+ "label": "히스토리 마스킹",
+ "info": "대화 데이터의 마지막 턴만 학습합니다 (SFT에만 해당).",
+ },
+ "ja": {
+ "label": "履歴をマスク",
+ "info": "最後のターンのみを学習する(SFT のみ)。",
+ },
+ },
+ "resize_vocab": {
+ "en": {
+ "label": "Resize token embeddings",
+ "info": "Resize the tokenizer vocab and the embedding layers.",
+ },
+ "ru": {
+ "label": "Изменение размера токенных эмбеддингов",
+ "info": "Изменить размер словаря токенизатора и слоев эмбеддинга.",
+ },
+ "zh": {
+ "label": "更改词表大小",
+ "info": "更改分词器词表和嵌入层的大小。",
+ },
+ "ko": {
+ "label": "토큰 임베딩의 사이즈 조정",
+ "info": "토크나이저 어휘와 임베딩 레이어의 크기를 조정합니다.",
+ },
+ "ja": {
+ "label": "トークン埋め込みのサイズ変更",
+ "info": "トークナイザーの語彙と埋め込み層のサイズを変更します。",
+ },
+ },
+ "use_llama_pro": {
+ "en": {
+ "label": "Enable LLaMA Pro",
+ "info": "Make the parameters in the expanded blocks trainable.",
+ },
+ "ru": {
+ "label": "Включить LLaMA Pro",
+ "info": "Сделать параметры в расширенных блоках обучаемыми.",
+ },
+ "zh": {
+ "label": "使用 LLaMA Pro",
+ "info": "仅训练块扩展后的参数。",
+ },
+ "ko": {
+ "label": "LLaMA Pro 사용",
+ "info": "확장된 블록의 매개변수를 학습 가능하게 만듭니다.",
+ },
+ "ja": {
+ "label": "LLaMA Pro を有効化",
+ "info": "拡張ブロックのパラメータのみをトレーニングします。",
+ },
+ },
+ "report_to": {
+ "en": {
+ "label": "Enable external logger",
+ "info": "Use TensorBoard or wandb to log experiment.",
+ },
+ "ru": {
+ "label": "Включить внешний регистратор",
+ "info": "Использовать TensorBoard или wandb для ведения журнала экспериментов.",
+ },
+ "zh": {
+ "label": "启用外部记录面板",
+ "info": "使用 TensorBoard 或 wandb 记录实验。",
+ },
+ "ko": {
+ "label": "외부 logger 활성화",
+ "info": "TensorBoard 또는 wandb를 사용하여 실험을 기록합니다.",
+ },
+ "ja": {
+ "label": "外部ロガーを有効化",
+ "info": "TensorBoard または wandb を使用して実験を記録します。",
+ },
+ },
+ "freeze_tab": {
+ "en": {
+ "label": "Freeze tuning configurations",
+ },
+ "ru": {
+ "label": "конфигурации для настройки заморозки",
+ },
+ "zh": {
+ "label": "部分参数微调设置",
+ },
+ "ko": {
+ "label": "Freeze tuning 설정",
+ },
+ "ja": {
+ "label": "フリーズチューニング設定",
+ },
+ },
+ "freeze_trainable_layers": {
+ "en": {
+ "label": "Trainable layers",
+ "info": "Number of the last(+)/first(-) hidden layers to be set as trainable.",
+ },
+ "ru": {
+ "label": "Обучаемые слои",
+ "info": "Количество последних (+)/первых (-) скрытых слоев, которые будут установлены как обучаемые.",
+ },
+ "zh": {
+ "label": "可训练层数",
+ "info": "最末尾(+)/最前端(-)可训练隐藏层的数量。",
+ },
+ "ko": {
+ "label": "학습 가능한 레이어",
+ "info": "학습 가능하게 설정할 마지막(+)/처음(-) 히든 레이어의 수.",
+ },
+ "ja": {
+ "label": "学習可能なレイヤー",
+ "info": "最後(+)/最初(-)の学習可能な隠れ層の数。",
+ },
+ },
+ "freeze_trainable_modules": {
+ "en": {
+ "label": "Trainable modules",
+ "info": "Name(s) of trainable modules. Use commas to separate multiple modules.",
+ },
+ "ru": {
+ "label": "Обучаемые модули",
+ "info": "Название обучаемых модулей. Используйте запятые для разделения нескольких модулей.",
+ },
+ "zh": {
+ "label": "可训练模块",
+ "info": "可训练模块的名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "학습 가능한 모듈",
+ "info": "학습 가능한 모듈의 이름. 여러 모듈을 구분하려면 쉼표(,)를 사용하세요.",
+ },
+ "ja": {
+ "label": "学習可能なモジュール",
+ "info": "学習可能なモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "freeze_extra_modules": {
+ "en": {
+ "label": "Extra modules (optional)",
+ "info": (
+ "Name(s) of modules apart from hidden layers to be set as trainable. "
+ "Use commas to separate multiple modules."
+ ),
+ },
+ "ru": {
+ "label": "Дополнительные модули (опционально)",
+ "info": (
+ "Имена модулей, кроме скрытых слоев, которые следует установить в качестве обучаемых. "
+ "Используйте запятые для разделения нескольких модулей."
+ ),
+ },
+ "zh": {
+ "label": "额外模块(非必填)",
+ "info": "除隐藏层以外的可训练模块名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "추가 모듈 (선택 사항)",
+ "info": "히든 레이어 외에 학습 가능하게 설정할 모듈의 이름. 모듈 간에는 쉼표(,)로 구분하십시오.",
+ },
+ "ja": {
+ "label": "追加モジュール(オプション)",
+ "info": "隠れ層以外の学習可能なモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "lora_tab": {
+ "en": {
+ "label": "LoRA configurations",
+ },
+ "ru": {
+ "label": "Конфигурации LoRA",
+ },
+ "zh": {
+ "label": "LoRA 参数设置",
+ },
+ "ko": {
+ "label": "LoRA 구성",
+ },
+ "ja": {
+ "label": "LoRA 設定",
+ },
+ },
+ "lora_rank": {
+ "en": {
+ "label": "LoRA rank",
+ "info": "The rank of LoRA matrices.",
+ },
+ "ru": {
+ "label": "Ранг матриц LoRA",
+ "info": "Ранг матриц LoRA.",
+ },
+ "zh": {
+ "label": "LoRA 秩",
+ "info": "LoRA 矩阵的秩大小。",
+ },
+ "ko": {
+ "label": "LoRA 랭크",
+ "info": "LoRA 행렬의 랭크.",
+ },
+ "ja": {
+ "label": "LoRA ランク",
+ "info": "LoRA 行列のランク。",
+ },
+ },
+ "lora_alpha": {
+ "en": {
+ "label": "LoRA alpha",
+ "info": "Lora scaling coefficient.",
+ },
+ "ru": {
+ "label": "LoRA alpha",
+ "info": "Коэффициент масштабирования LoRA.",
+ },
+ "zh": {
+ "label": "LoRA 缩放系数",
+ "info": "LoRA 缩放系数大小。",
+ },
+ "ko": {
+ "label": "LoRA 알파",
+ "info": "LoRA 스케일링 계수.",
+ },
+ "ja": {
+ "label": "LoRA alpha",
+ "info": "LoRA スケーリング係数。",
+ },
+ },
+ "lora_dropout": {
+ "en": {
+ "label": "LoRA dropout",
+ "info": "Dropout ratio of LoRA weights.",
+ },
+ "ru": {
+ "label": "Вероятность отсева LoRA",
+ "info": "Вероятность отсева весов LoRA.",
+ },
+ "zh": {
+ "label": "LoRA 随机丢弃",
+ "info": "LoRA 权重随机丢弃的概率。",
+ },
+ "ko": {
+ "label": "LoRA 드롭아웃",
+ "info": "LoRA 가중치의 드롭아웃 비율.",
+ },
+ "ja": {
+ "label": "LoRA ドロップアウト",
+ "info": "LoRA 重みのドロップアウト確率。",
+ },
+ },
+ "loraplus_lr_ratio": {
+ "en": {
+ "label": "LoRA+ LR ratio",
+ "info": "The LR ratio of the B matrices in LoRA.",
+ },
+ "ru": {
+ "label": "LoRA+ LR коэффициент",
+ "info": "Коэффициент LR матриц B в LoRA.",
+ },
+ "zh": {
+ "label": "LoRA+ 学习率比例",
+ "info": "LoRA+ 中 B 矩阵的学习率倍数。",
+ },
+ "ko": {
+ "label": "LoRA+ LR 비율",
+ "info": "LoRA에서 B 행렬의 LR 비율.",
+ },
+ "ja": {
+ "label": "LoRA+ LR 比率",
+ "info": "LoRA+ の B 行列の学習率倍率。",
+ },
+ },
+ "create_new_adapter": {
+ "en": {
+ "label": "Create new adapter",
+ "info": "Create a new adapter with randomly initialized weight upon the existing one.",
+ },
+ "ru": {
+ "label": "Создать новый адаптер",
+ "info": "Создать новый адаптер с случайной инициализацией веса на основе существующего.",
+ },
+ "zh": {
+ "label": "新建适配器",
+ "info": "在现有的适配器上创建一个随机初始化后的新适配器。",
+ },
+ "ko": {
+ "label": "새 어댑터 생성",
+ "info": "기존 어댑터 위에 무작위로 초기화된 가중치를 가진 새 어댑터를 생성합니다.",
+ },
+ "ja": {
+ "label": "新しいアダプターを作成",
+ "info": "既存のアダプター上にランダムに初期化された新しいアダプターを作成します。",
+ },
+ },
+ "use_rslora": {
+ "en": {
+ "label": "Use rslora",
+ "info": "Use the rank stabilization scaling factor for LoRA layer.",
+ },
+ "ru": {
+ "label": "Использовать rslora",
+ "info": "Использовать коэффициент масштабирования стабилизации ранга для слоя LoRA.",
+ },
+ "zh": {
+ "label": "使用 rslora",
+ "info": "对 LoRA 层使用秩稳定缩放方法。",
+ },
+ "ko": {
+ "label": "rslora 사용",
+ "info": "LoRA 레이어에 랭크 안정화 스케일링 계수를 사용합니다.",
+ },
+ "ja": {
+ "label": "rslora を使用",
+ "info": "LoRA 層にランク安定化スケーリング方法を使用します。",
+ },
+ },
+ "use_dora": {
+ "en": {
+ "label": "Use DoRA",
+ "info": "Use weight-decomposed LoRA.",
+ },
+ "ru": {
+ "label": "Используйте DoRA",
+ "info": "Используйте LoRA с декомпозицией весов.",
+ },
+ "zh": {
+ "label": "使用 DoRA",
+ "info": "使用权重分解的 LoRA。",
+ },
+ "ko": {
+ "label": "DoRA 사용",
+ "info": "가중치-분해 LoRA를 사용합니다.",
+ },
+ "ja": {
+ "label": "DoRA を使用",
+ "info": "重み分解された LoRA を使用します。",
+ },
+ },
+ "use_pissa": {
+ "en": {
+ "label": "Use PiSSA",
+ "info": "Use PiSSA method.",
+ },
+ "ru": {
+ "label": "используйте PiSSA",
+ "info": "Используйте метод PiSSA.",
+ },
+ "zh": {
+ "label": "使用 PiSSA",
+ "info": "使用 PiSSA 方法。",
+ },
+ "ko": {
+ "label": "PiSSA 사용",
+ "info": "PiSSA 방법을 사용합니다.",
+ },
+ "ja": {
+ "label": "PiSSA を使用",
+ "info": "PiSSA メソッドを使用します。",
+ },
+ },
+ "lora_target": {
+ "en": {
+ "label": "LoRA modules (optional)",
+ "info": "Name(s) of modules to apply LoRA. Use commas to separate multiple modules.",
+ },
+ "ru": {
+ "label": "Модули LoRA (опционально)",
+ "info": "Имена модулей для применения LoRA. Используйте запятые для разделения нескольких модулей.",
+ },
+ "zh": {
+ "label": "LoRA 作用模块(非必填)",
+ "info": "应用 LoRA 的模块名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "LoRA 모듈 (선택 사항)",
+ "info": "LoRA를 적용할 모듈의 이름. 모듈 간에는 쉼표(,)로 구분하십시오.",
+ },
+ "ja": {
+ "label": "LoRA モジュール(オプション)",
+ "info": "LoRA を適用するモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "additional_target": {
+ "en": {
+ "label": "Additional modules (optional)",
+ "info": (
+ "Name(s) of modules apart from LoRA layers to be set as trainable. "
+ "Use commas to separate multiple modules."
+ ),
+ },
+ "ru": {
+ "label": "Дополнительные модули (опционально)",
+ "info": (
+ "Имена модулей, кроме слоев LoRA, которые следует установить в качестве обучаемых. "
+ "Используйте запятые для разделения нескольких модулей."
+ ),
+ },
+ "zh": {
+ "label": "附加模块(非必填)",
+ "info": "除 LoRA 层以外的可训练模块名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "추가 모듈 (선택 사항)",
+ "info": "LoRA 레이어 외에 학습 가능하게 설정할 모듈의 이름. 모듈 간에는 쉼표(,)로 구분하십시오.",
+ },
+ "ja": {
+ "label": "追加モジュール(オプション)",
+ "info": "LoRA 層以外の学習可能なモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "rlhf_tab": {
+ "en": {
+ "label": "RLHF configurations",
+ },
+ "ru": {
+ "label": "Конфигурации RLHF",
+ },
+ "zh": {
+ "label": "RLHF 参数设置",
+ },
+ "ko": {
+ "label": "RLHF 구성",
+ },
+ "ja": {
+ "label": "RLHF 設定",
+ },
+ },
+ "pref_beta": {
+ "en": {
+ "label": "Beta value",
+ "info": "Value of the beta parameter in the loss.",
+ },
+ "ru": {
+ "label": "Бета значение",
+ "info": "Значение параметра бета в функции потерь.",
+ },
+ "zh": {
+ "label": "Beta 参数",
+ "info": "损失函数中 beta 超参数大小。",
+ },
+ "ko": {
+ "label": "베타 값",
+ "info": "손실 함수에서 베타 매개 변수의 값.",
+ },
+ "ja": {
+ "label": "Beta 値",
+ "info": "損失関数における beta ハイパーパラメータの値。",
+ },
+ },
+ "pref_ftx": {
+ "en": {
+ "label": "Ftx gamma",
+ "info": "The weight of SFT loss in the final loss.",
+ },
+ "ru": {
+ "label": "Ftx гамма",
+ "info": "Вес потери SFT в итоговой потере.",
+ },
+ "zh": {
+ "label": "Ftx gamma",
+ "info": "损失函数中 SFT 损失的权重大小。",
+ },
+ "ko": {
+ "label": "Ftx 감마",
+ "info": "최종 로스 함수에서 SFT 로스의 가중치.",
+ },
+ "ja": {
+ "label": "Ftx gamma",
+ "info": "損失関数における SFT 損失の重み。",
+ },
+ },
+ "pref_loss": {
+ "en": {
+ "label": "Loss type",
+ "info": "The type of the loss function.",
+ },
+ "ru": {
+ "label": "Тип потерь",
+ "info": "Тип функции потерь.",
+ },
+ "zh": {
+ "label": "损失类型",
+ "info": "损失函数的类型。",
+ },
+ "ko": {
+ "label": "로스 유형",
+ "info": "로스 함수의 유형.",
+ },
+ "ja": {
+ "label": "損失タイプ",
+ "info": "損失関数のタイプ。",
+ },
+ },
+ "reward_model": {
+ "en": {
+ "label": "Reward model",
+ "info": "Adapter of the reward model in PPO training.",
+ },
+ "ru": {
+ "label": "Модель вознаграждения",
+ "info": "Адаптер модели вознаграждения для обучения PPO.",
+ },
+ "zh": {
+ "label": "奖励模型",
+ "info": "PPO 训练中奖励模型的适配器路径。",
+ },
+ "ko": {
+ "label": "리워드 모델",
+ "info": "PPO 학습에서 사용할 리워드 모델의 어댑터.",
+ },
+ "ja": {
+ "label": "報酬モデル",
+ "info": "PPO トレーニングにおける報酬モデルのアダプター。",
+ },
+ },
+ "ppo_score_norm": {
+ "en": {
+ "label": "Score norm",
+ "info": "Normalizing scores in PPO training.",
+ },
+ "ru": {
+ "label": "Норма оценок",
+ "info": "Нормализация оценок в тренировке PPO.",
+ },
+ "zh": {
+ "label": "归一化分数",
+ "info": "PPO 训练中归一化奖励分数。",
+ },
+ "ko": {
+ "label": "스코어 정규화",
+ "info": "PPO 학습에서 스코어를 정규화합니다.",
+ },
+ "ja": {
+ "label": "スコア正規化",
+ "info": "PPO トレーニングにおける報酬スコアの正規化。",
+ },
+ },
+ "ppo_whiten_rewards": {
+ "en": {
+ "label": "Whiten rewards",
+ "info": "Whiten the rewards in PPO training.",
+ },
+ "ru": {
+ "label": "Белые вознаграждения",
+ "info": "Осветлите вознаграждения в обучении PPO.",
+ },
+ "zh": {
+ "label": "白化奖励",
+ "info": "PPO 训练中将奖励分数做白化处理。",
+ },
+ "ko": {
+ "label": "보상 백화",
+ "info": "PPO 훈련에서 보상을 백화(Whiten)합니다.",
+ },
+ "ja": {
+ "label": "報酬のホワイトニング",
+ "info": "PPO トレーニングにおいて報酬スコアをホワイトニング処理します。",
+ },
+ },
+ "galore_tab": {
+ "en": {
+ "label": "GaLore configurations",
+ },
+ "ru": {
+ "label": "Конфигурации GaLore",
+ },
+ "zh": {
+ "label": "GaLore 参数设置",
+ },
+ "ko": {
+ "label": "GaLore 구성",
+ },
+ "ja": {
+ "label": "GaLore 設定",
+ },
+ },
+ "use_galore": {
+ "en": {
+ "label": "Use GaLore",
+ "info": "Use [GaLore](https://github.com/jiaweizzhao/GaLore) optimizer.",
+ },
+ "ru": {
+ "label": "Использовать GaLore",
+ "info": "Используйте оптимизатор [GaLore](https://github.com/jiaweizzhao/GaLore).",
+ },
+ "zh": {
+ "label": "使用 GaLore",
+ "info": "使用 [GaLore](https://github.com/jiaweizzhao/GaLore) 优化器。",
+ },
+ "ko": {
+ "label": "GaLore 사용",
+ "info": "[GaLore](https://github.com/jiaweizzhao/GaLore) 최적화를 사용하세요.",
+ },
+ "ja": {
+ "label": "GaLore を使用",
+ "info": "[GaLore](https://github.com/jiaweizzhao/GaLore) オプティマイザーを使用します。",
+ },
+ },
+ "galore_rank": {
+ "en": {
+ "label": "GaLore rank",
+ "info": "The rank of GaLore gradients.",
+ },
+ "ru": {
+ "label": "Ранг GaLore",
+ "info": "Ранг градиентов GaLore.",
+ },
+ "zh": {
+ "label": "GaLore 秩",
+ "info": "GaLore 梯度的秩大小。",
+ },
+ "ko": {
+ "label": "GaLore 랭크",
+ "info": "GaLore 그레디언트의 랭크.",
+ },
+ "ja": {
+ "label": "GaLore ランク",
+ "info": "GaLore 勾配のランク。",
+ },
+ },
+ "galore_update_interval": {
+ "en": {
+ "label": "Update interval",
+ "info": "Number of steps to update the GaLore projection.",
+ },
+ "ru": {
+ "label": "Интервал обновления",
+ "info": "Количество шагов для обновления проекции GaLore.",
+ },
+ "zh": {
+ "label": "更新间隔",
+ "info": "相邻两次投影更新的步数。",
+ },
+ "ko": {
+ "label": "업데이트 간격",
+ "info": "GaLore 프로젝션을 업데이트할 간격의 스텝 수.",
+ },
+ "ja": {
+ "label": "更新間隔",
+ "info": "隣接する 2 回の投影更新間のステップ数。",
+ },
+ },
+ "galore_scale": {
+ "en": {
+ "label": "GaLore scale",
+ "info": "GaLore scaling coefficient.",
+ },
+ "ru": {
+ "label": "LoRA Alpha",
+ "info": "Коэффициент масштабирования GaLore.",
+ },
+ "zh": {
+ "label": "GaLore 缩放系数",
+ "info": "GaLore 缩放系数大小。",
+ },
+ "ko": {
+ "label": "GaLore 스케일",
+ "info": "GaLore 스케일링 계수.",
+ },
+ "ja": {
+ "label": "GaLore スケール",
+ "info": "GaLore スケーリング係数。",
+ },
+ },
+ "galore_target": {
+ "en": {
+ "label": "GaLore modules",
+ "info": "Name(s) of modules to apply GaLore. Use commas to separate multiple modules.",
+ },
+ "ru": {
+ "label": "Модули GaLore",
+ "info": "Имена модулей для применения GaLore. Используйте запятые для разделения нескольких модулей.",
+ },
+ "zh": {
+ "label": "GaLore 作用模块",
+ "info": "应用 GaLore 的模块名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "GaLore 모듈",
+ "info": "GaLore를 적용할 모듈의 이름. 모듈 간에는 쉼표(,)로 구분하십시오.",
+ },
+ "ja": {
+ "label": "GaLore モジュール",
+ "info": "GaLore を適用するモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "apollo_tab": {
+ "en": {
+ "label": "APOLLO configurations",
+ },
+ "ru": {
+ "label": "Конфигурации APOLLO",
+ },
+ "zh": {
+ "label": "APOLLO 参数设置",
+ },
+ "ko": {
+ "label": "APOLLO 구성",
+ },
+ "ja": {
+ "label": "APOLLO 設定",
+ },
+ },
+ "use_apollo": {
+ "en": {
+ "label": "Use APOLLO",
+ "info": "Use [APOLLO](https://github.com/zhuhanqing/APOLLO) optimizer.",
+ },
+ "ru": {
+ "label": "Использовать APOLLO",
+ "info": "Используйте оптимизатор [APOLLO](https://github.com/zhuhanqing/APOLLO).",
+ },
+ "zh": {
+ "label": "使用 APOLLO",
+ "info": "使用 [APOLLO](https://github.com/zhuhanqing/APOLLO) 优化器。",
+ },
+ "ko": {
+ "label": "APOLLO 사용",
+ "info": "[APOLLO](https://github.com/zhuhanqing/APOLLO) 최적화를 사용하세요.",
+ },
+ "ja": {
+ "label": "APOLLO を使用",
+ "info": "[APOLLO](https://github.com/zhuhanqing/APOLLO) オプティマイザーを使用します。",
+ },
+ },
+ "apollo_rank": {
+ "en": {
+ "label": "APOLLO rank",
+ "info": "The rank of APOLLO gradients.",
+ },
+ "ru": {
+ "label": "Ранг APOLLO",
+ "info": "Ранг градиентов APOLLO.",
+ },
+ "zh": {
+ "label": "APOLLO 秩",
+ "info": "APOLLO 梯度的秩大小。",
+ },
+ "ko": {
+ "label": "APOLLO 랭크",
+ "info": "APOLLO 그레디언트의 랭크.",
+ },
+ "ja": {
+ "label": "APOLLO ランク",
+ "info": "APOLLO 勾配のランク。",
+ },
+ },
+ "apollo_update_interval": {
+ "en": {
+ "label": "Update interval",
+ "info": "Number of steps to update the APOLLO projection.",
+ },
+ "ru": {
+ "label": "Интервал обновления",
+ "info": "Количество шагов для обновления проекции APOLLO.",
+ },
+ "zh": {
+ "label": "更新间隔",
+ "info": "相邻两次投影更新的步数。",
+ },
+ "ko": {
+ "label": "업데이트 간격",
+ "info": "APOLLO 프로젝션을 업데이트할 간격의 스텝 수.",
+ },
+ "ja": {
+ "label": "更新間隔",
+ "info": "隣接する 2 回の投影更新間のステップ数。",
+ },
+ },
+ "apollo_scale": {
+ "en": {
+ "label": "APOLLO scale",
+ "info": "APOLLO scaling coefficient.",
+ },
+ "ru": {
+ "label": "LoRA Alpha",
+ "info": "Коэффициент масштабирования APOLLO.",
+ },
+ "zh": {
+ "label": "APOLLO 缩放系数",
+ "info": "APOLLO 缩放系数大小。",
+ },
+ "ko": {
+ "label": "APOLLO 스케일",
+ "info": "APOLLO 스케일링 계수.",
+ },
+ "ja": {
+ "label": "APOLLO スケール",
+ "info": "APOLLO スケーリング係数。",
+ },
+ },
+ "apollo_target": {
+ "en": {
+ "label": "APOLLO modules",
+ "info": "Name(s) of modules to apply APOLLO. Use commas to separate multiple modules.",
+ },
+ "ru": {
+ "label": "Модули APOLLO",
+ "info": "Имена модулей для применения APOLLO. Используйте запятые для разделения нескольких модулей.",
+ },
+ "zh": {
+ "label": "APOLLO 作用模块",
+ "info": "应用 APOLLO 的模块名称。使用英文逗号分隔多个名称。",
+ },
+ "ko": {
+ "label": "APOLLO 모듈",
+ "info": "APOLLO를 적용할 모듈의 이름. 모듈 간에는 쉼표(,)로 구분하십시오.",
+ },
+ "ja": {
+ "label": "APOLLO モジュール",
+ "info": "APOLLO を適用するモジュールの名前。複数のモジュールを区切るにはカンマを使用します。",
+ },
+ },
+ "badam_tab": {
+ "en": {
+ "label": "BAdam configurations",
+ },
+ "ru": {
+ "label": "Конфигурации BAdam",
+ },
+ "zh": {
+ "label": "BAdam 参数设置",
+ },
+ "ko": {
+ "label": "BAdam 설정",
+ },
+ "ja": {
+ "label": "BAdam 設定",
+ },
+ },
+ "use_badam": {
+ "en": {
+ "label": "Use BAdam",
+ "info": "Enable the [BAdam](https://github.com/Ledzy/BAdam) optimizer.",
+ },
+ "ru": {
+ "label": "Использовать BAdam",
+ "info": "Включите оптимизатор [BAdam](https://github.com/Ledzy/BAdam).",
+ },
+ "zh": {
+ "label": "使用 BAdam",
+ "info": "使用 [BAdam](https://github.com/Ledzy/BAdam) 优化器。",
+ },
+ "ko": {
+ "label": "BAdam 사용",
+ "info": "[BAdam](https://github.com/Ledzy/BAdam) 옵티마이저를 사용합니다.",
+ },
+ "ja": {
+ "label": "BAdam を使用",
+ "info": "[BAdam](https://github.com/Ledzy/BAdam) オプティマイザーを使用します。",
+ },
+ },
+ "badam_mode": {
+ "en": {
+ "label": "BAdam mode",
+ "info": "Whether to use layer-wise or ratio-wise BAdam optimizer.",
+ },
+ "ru": {
+ "label": "Режим BAdam",
+ "info": "Использовать ли оптимизатор BAdam с послоевой или пропорциональной настройкой.",
+ },
+ "zh": {
+ "label": "BAdam 模式",
+ "info": "使用 layer-wise 或 ratio-wise BAdam 优化器。",
+ },
+ "ko": {
+ "label": "BAdam 모드",
+ "info": "레이어-BAdam 옵티마이저인지 비율-BAdam 옵티마이저인지.",
+ },
+ "ja": {
+ "label": "BAdam モード",
+ "info": "layer-wise または ratio-wise BAdam オプティマイザーを使用します。",
+ },
+ },
+ "badam_switch_mode": {
+ "en": {
+ "label": "Switch mode",
+ "info": "The strategy of picking block to update for layer-wise BAdam.",
+ },
+ "ru": {
+ "label": "Режим переключения",
+ "info": "Стратегия выбора блока для обновления для послойного BAdam.",
+ },
+ "zh": {
+ "label": "切换策略",
+ "info": "Layer-wise BAdam 优化器的块切换策略。",
+ },
+ "ko": {
+ "label": "스위치 모드",
+ "info": "레이어-BAdam을 위한 블록 선택 전략.",
+ },
+ "ja": {
+ "label": "切り替え戦略",
+ "info": "Layer-wise BAdam オプティマイザーのブロック切り替え戦略。",
+ },
+ },
+ "badam_switch_interval": {
+ "en": {
+ "label": "Switch interval",
+ "info": "Number of steps to update the block for layer-wise BAdam.",
+ },
+ "ru": {
+ "label": "Интервал переключения",
+ "info": "количество шагов для обновления блока для пошагового BAdam.",
+ },
+ "zh": {
+ "label": "切换频率",
+ "info": "Layer-wise BAdam 优化器的块切换频率。",
+ },
+ "ko": {
+ "label": "전환 간격",
+ "info": "레이어-BAdam을 위한 블록 업데이트 간 스텝 수.",
+ },
+ "ja": {
+ "label": "切り替え頻度",
+ "info": "Layer-wise BAdam オプティマイザーのブロック切り替え頻度。",
+ },
+ },
+ "badam_update_ratio": {
+ "en": {
+ "label": "Update ratio",
+ "info": "The ratio of the update for ratio-wise BAdam.",
+ },
+ "ru": {
+ "label": "Коэффициент обновления",
+ "info": "Коэффициент обновления для BAdam с учётом соотношений.",
+ },
+ "zh": {
+ "label": "Block 更新比例",
+ "info": "Ratio-wise BAdam 优化器的更新比例。",
+ },
+ "ko": {
+ "label": "업데이트 비율",
+ "info": "비율-BAdam의 업데이트 비율.",
+ },
+ "ja": {
+ "label": "ブロック更新比率",
+ "info": "Ratio-wise BAdam オプティマイザーの更新比率。",
+ },
+ },
+ "swanlab_tab": {
+ "en": {
+ "label": "SwanLab configurations",
+ },
+ "ru": {
+ "label": "Конфигурации SwanLab",
+ },
+ "zh": {
+ "label": "SwanLab 参数设置",
+ },
+ "ko": {
+ "label": "SwanLab 설정",
+ },
+ "ja": {
+ "label": "SwanLab 設定",
+ },
+ },
+ "use_swanlab": {
+ "en": {
+ "label": "Use SwanLab",
+ "info": "Enable [SwanLab](https://swanlab.cn/) for experiment tracking and visualization.",
+ },
+ "ru": {
+ "label": "Использовать SwanLab",
+ "info": "Включить [SwanLab](https://swanlab.cn/) для отслеживания и визуализации экспериментов.",
+ },
+ "zh": {
+ "label": "使用 SwanLab",
+ "info": "启用 [SwanLab](https://swanlab.cn/) 进行实验跟踪和可视化。",
+ },
+ "ko": {
+ "label": "SwanLab 사용",
+ "info": "[SwanLab](https://swanlab.cn/) 를 사용하여 실험을 추적하고 시각화합니다.",
+ },
+ "ja": {
+ "label": "SwanLab を使用",
+ "info": "[SwanLab](https://swanlab.cn/) を有効にして実験の追跡と可視化を行います。",
+ },
+ },
+ "swanlab_project": {
+ "en": {
+ "label": "SwanLab project",
+ },
+ "ru": {
+ "label": "SwanLab Проект",
+ },
+ "zh": {
+ "label": "SwanLab 项目名",
+ },
+ "ko": {
+ "label": "SwanLab 프로젝트",
+ },
+ "ja": {
+ "label": "SwanLab プロジェクト",
+ },
+ },
+ "swanlab_run_name": {
+ "en": {
+ "label": "SwanLab experiment name (optional)",
+ },
+ "ru": {
+ "label": "SwanLab Имя эксперимента (опционально)",
+ },
+ "zh": {
+ "label": "SwanLab 实验名(非必填)",
+ },
+ "ko": {
+ "label": "SwanLab 실험 이름 (선택 사항)",
+ },
+ "ja": {
+ "label": "SwanLab 実験名(オプション)",
+ },
+ },
+ "swanlab_workspace": {
+ "en": {
+ "label": "SwanLab workspace (optional)",
+ "info": "Workspace for SwanLab. Defaults to the personal workspace.",
+ },
+ "ru": {
+ "label": "SwanLab Рабочая область (опционально)",
+ "info": "Рабочая область SwanLab, если не заполнено, то по умолчанию в личной рабочей области.",
+ },
+ "zh": {
+ "label": "SwanLab 工作区(非必填)",
+ "info": "SwanLab 的工作区,默认在个人工作区下。",
+ },
+ "ko": {
+ "label": "SwanLab 작업 영역 (선택 사항)",
+ "info": "SwanLab 조직의 작업 영역, 비어 있으면 기본적으로 개인 작업 영역에 있습니다.",
+ },
+ "ja": {
+ "label": "SwanLab ワークスペース(オプション)",
+ "info": "SwanLab のワークスペース。デフォルトでは個人ワークスペースです。",
+ },
+ },
+ "swanlab_api_key": {
+ "en": {
+ "label": "SwanLab API key (optional)",
+ "info": "API key for SwanLab.",
+ },
+ "ru": {
+ "label": "SwanLab API ключ (опционально)",
+ "info": "API ключ для SwanLab.",
+ },
+ "zh": {
+ "label": "SwanLab API 密钥(非必填)",
+ "info": "用于在编程环境登录 SwanLab,已登录则无需填写。",
+ },
+ "ko": {
+ "label": "SwanLab API 키 (선택 사항)",
+ "info": "SwanLab의 API 키.",
+ },
+ "ja": {
+ "label": "SwanLab API キー(オプション)",
+ "info": "SwanLab の API キー。",
+ },
+ },
+ "swanlab_mode": {
+ "en": {
+ "label": "SwanLab mode",
+ "info": "Cloud or offline version.",
+ },
+ "ru": {
+ "label": "SwanLab Режим",
+ "info": "Версия в облаке или локальная версия.",
+ },
+ "zh": {
+ "label": "SwanLab 模式",
+ "info": "使用云端版或离线版 SwanLab。",
+ },
+ "ko": {
+ "label": "SwanLab 모드",
+ "info": "클라우드 버전 또는 오프라인 버전.",
+ },
+ "ja": {
+ "label": "SwanLab モード",
+ "info": "クラウド版またはオフライン版 SwanLab を使用します。",
+ },
+ },
+ "swanlab_logdir": {
+ "en": {
+ "label": "SwanLab log directory",
+ "info": "The log directory for SwanLab.",
+ },
+ "ru": {
+ "label": "SwanLab 로그 디렉토리",
+ "info": "SwanLab의 로그 디렉토리.",
+ },
+ "zh": {
+ "label": "SwanLab 日志目录",
+ "info": "SwanLab 的日志目录。",
+ },
+ "ko": {
+ "label": "SwanLab 로그 디렉토리",
+ "info": "SwanLab의 로그 디렉토리.",
+ },
+ "ja": {
+ "label": "SwanLab ログ ディレクトリ",
+ "info": "SwanLab のログ ディレクトリ。",
+ },
+ },
+ "cmd_preview_btn": {
+ "en": {
+ "value": "Preview command",
+ },
+ "ru": {
+ "value": "Просмотр команды",
+ },
+ "zh": {
+ "value": "预览命令",
+ },
+ "ko": {
+ "value": "명령어 미리보기",
+ },
+ "ja": {
+ "value": "コマンドをプレビュー",
+ },
+ },
+ "arg_save_btn": {
+ "en": {
+ "value": "Save arguments",
+ },
+ "ru": {
+ "value": "Сохранить аргументы",
+ },
+ "zh": {
+ "value": "保存训练参数",
+ },
+ "ko": {
+ "value": "Argument 저장",
+ },
+ "ja": {
+ "value": "引数を保存",
+ },
+ },
+ "arg_load_btn": {
+ "en": {
+ "value": "Load arguments",
+ },
+ "ru": {
+ "value": "Загрузить аргументы",
+ },
+ "zh": {
+ "value": "载入训练参数",
+ },
+ "ko": {
+ "value": "Argument 불러오기",
+ },
+ "ja": {
+ "value": "引数を読み込む",
+ },
+ },
+ "start_btn": {
+ "en": {
+ "value": "Start",
+ },
+ "ru": {
+ "value": "Начать",
+ },
+ "zh": {
+ "value": "开始",
+ },
+ "ko": {
+ "value": "시작",
+ },
+ "ja": {
+ "value": "開始",
+ },
+ },
+ "stop_btn": {
+ "en": {
+ "value": "Abort",
+ },
+ "ru": {
+ "value": "Прервать",
+ },
+ "zh": {
+ "value": "中断",
+ },
+ "ko": {
+ "value": "중단",
+ },
+ "ja": {
+ "value": "中断",
+ },
+ },
+ "output_dir": {
+ "en": {
+ "label": "Output dir",
+ "info": "Directory for saving results.",
+ },
+ "ru": {
+ "label": "Выходной каталог",
+ "info": "Каталог для сохранения результатов.",
+ },
+ "zh": {
+ "label": "输出目录",
+ "info": "保存结果的路径。",
+ },
+ "ko": {
+ "label": "출력 디렉토리",
+ "info": "결과를 저장할 디렉토리.",
+ },
+ "ja": {
+ "label": "出力ディレクトリ",
+ "info": "結果を保存するパス。",
+ },
+ },
+ "config_path": {
+ "en": {
+ "label": "Config path",
+ "info": "Path to config saving arguments.",
+ },
+ "ru": {
+ "label": "Путь к конфигурации",
+ "info": "Путь для сохранения аргументов конфигурации.",
+ },
+ "zh": {
+ "label": "配置路径",
+ "info": "保存训练参数的配置文件路径。",
+ },
+ "ko": {
+ "label": "설정 경로",
+ "info": "Arguments 저장 파일 경로.",
+ },
+ "ja": {
+ "label": "設定パス",
+ "info": "トレーニングパラメータを保存する設定ファイルのパス。",
+ },
+ },
+ "device_count": {
+ "en": {
+ "label": "Device count",
+ "info": "Number of devices available.",
+ },
+ "ru": {
+ "label": "Количество устройств",
+ "info": "Количество доступных устройств.",
+ },
+ "zh": {
+ "label": "设备数量",
+ "info": "当前可用的运算设备数。",
+ },
+ "ko": {
+ "label": "디바이스 수",
+ "info": "사용 가능한 디바이스 수.",
+ },
+ "ja": {
+ "label": "デバイス数",
+ "info": "現在利用可能な演算デバイス数。",
+ },
+ },
+ "ds_stage": {
+ "en": {
+ "label": "DeepSpeed stage",
+ "info": "DeepSpeed stage for distributed training.",
+ },
+ "ru": {
+ "label": "Этап DeepSpeed",
+ "info": "Этап DeepSpeed для распределенного обучения.",
+ },
+ "zh": {
+ "label": "DeepSpeed stage",
+ "info": "多卡训练的 DeepSpeed stage。",
+ },
+ "ko": {
+ "label": "DeepSpeed 단계",
+ "info": "분산 학습을 위한 DeepSpeed 단계.",
+ },
+ "ja": {
+ "label": "DeepSpeed stage",
+ "info": "マルチ GPU トレーニングの DeepSpeed stage。",
+ },
+ },
+ "ds_offload": {
+ "en": {
+ "label": "Enable offload",
+ "info": "Enable DeepSpeed offload (slow down training).",
+ },
+ "ru": {
+ "label": "Включить выгрузку",
+ "info": "включить выгрузку DeepSpeed (замедлит обучение).",
+ },
+ "zh": {
+ "label": "使用 offload",
+ "info": "使用 DeepSpeed offload(会减慢速度)。",
+ },
+ "ko": {
+ "label": "오프로딩 활성화",
+ "info": "DeepSpeed 오프로딩 활성화 (훈련 속도 느려짐).",
+ },
+ "ja": {
+ "label": "オフロードを使用",
+ "info": "DeepSpeed オフロードを使用します(速度が遅くなります)。",
+ },
+ },
+ "output_box": {
+ "en": {
+ "value": "Ready.",
+ },
+ "ru": {
+ "value": "Готово.",
+ },
+ "zh": {
+ "value": "准备就绪。",
+ },
+ "ko": {
+ "value": "준비 완료.",
+ },
+ "ja": {
+ "value": "準備完了。",
+ },
+ },
+ "loss_viewer": {
+ "en": {
+ "label": "Loss",
+ },
+ "ru": {
+ "label": "Потери",
+ },
+ "zh": {
+ "label": "损失",
+ },
+ "ko": {
+ "label": "손실",
+ },
+ "ja": {
+ "label": "損失",
+ },
+ },
+ "predict": {
+ "en": {
+ "label": "Save predictions",
+ },
+ "ru": {
+ "label": "Сохранить предсказания",
+ },
+ "zh": {
+ "label": "保存预测结果",
+ },
+ "ko": {
+ "label": "예측 결과 저장",
+ },
+ "ja": {
+ "label": "予測結果を保存",
+ },
+ },
+ "infer_backend": {
+ "en": {
+ "label": "Inference engine",
+ },
+ "ru": {
+ "label": "Инференс движок",
+ },
+ "zh": {
+ "label": "推理引擎",
+ },
+ "ko": {
+ "label": "추론 엔진",
+ },
+ "ja": {
+ "label": "推論エンジン",
+ },
+ },
+ "infer_dtype": {
+ "en": {
+ "label": "Inference data type",
+ },
+ "ru": {
+ "label": "Тип данных для вывода",
+ },
+ "zh": {
+ "label": "推理数据类型",
+ },
+ "ko": {
+ "label": "추론 데이터 유형",
+ },
+ "ja": {
+ "label": "推論データタイプ",
+ },
+ },
+ "load_btn": {
+ "en": {
+ "value": "Load model",
+ },
+ "ru": {
+ "value": "Загрузить модель",
+ },
+ "zh": {
+ "value": "加载模型",
+ },
+ "ko": {
+ "value": "모델 불러오기",
+ },
+ "ja": {
+ "value": "モデルを読み込む",
+ },
+ },
+ "unload_btn": {
+ "en": {
+ "value": "Unload model",
+ },
+ "ru": {
+ "value": "Выгрузить модель",
+ },
+ "zh": {
+ "value": "卸载模型",
+ },
+ "ko": {
+ "value": "모델 언로드",
+ },
+ "ja": {
+ "value": "モデルをアンロード",
+ },
+ },
+ "info_box": {
+ "en": {
+ "value": "Model unloaded, please load a model first.",
+ },
+ "ru": {
+ "value": "Модель не загружена, загрузите модель сначала.",
+ },
+ "zh": {
+ "value": "模型未加载,请先加载模型。",
+ },
+ "ko": {
+ "value": "모델이 언로드되었습니다. 모델을 먼저 불러오십시오.",
+ },
+ "ja": {
+ "value": "モデルがロードされていません。最初にモデルをロードしてください。",
+ },
+ },
+ "role": {
+ "en": {
+ "label": "Role",
+ },
+ "ru": {
+ "label": "Роль",
+ },
+ "zh": {
+ "label": "角色",
+ },
+ "ko": {
+ "label": "역할",
+ },
+ "ja": {
+ "label": "役割",
+ },
+ },
+ "system": {
+ "en": {
+ "placeholder": "System prompt (optional)",
+ },
+ "ru": {
+ "placeholder": "Системный запрос (по желанию)",
+ },
+ "zh": {
+ "placeholder": "系统提示词(非必填)",
+ },
+ "ko": {
+ "placeholder": "시스템 프롬프트 (선택 사항)",
+ },
+ "ja": {
+ "placeholder": "システムプロンプト(オプション)",
+ },
+ },
+ "tools": {
+ "en": {
+ "placeholder": "Tools (optional)",
+ },
+ "ru": {
+ "placeholder": "Инструменты (по желанию)",
+ },
+ "zh": {
+ "placeholder": "工具列表(非必填)",
+ },
+ "ko": {
+ "placeholder": "툴 (선택 사항)",
+ },
+ "ja": {
+ "placeholder": "ツールリスト(オプション)",
+ },
+ },
+ "image": {
+ "en": {
+ "label": "Image (optional)",
+ },
+ "ru": {
+ "label": "Изображение (по желанию)",
+ },
+ "zh": {
+ "label": "图像(非必填)",
+ },
+ "ko": {
+ "label": "이미지 (선택 사항)",
+ },
+ "ja": {
+ "label": "画像(オプション)",
+ },
+ },
+ "video": {
+ "en": {
+ "label": "Video (optional)",
+ },
+ "ru": {
+ "label": "Видео (по желанию)",
+ },
+ "zh": {
+ "label": "视频(非必填)",
+ },
+ "ko": {
+ "label": "비디오 (선택 사항)",
+ },
+ "ja": {
+ "label": "動画(オプション)",
+ },
+ },
+ "query": {
+ "en": {
+ "placeholder": "Input...",
+ },
+ "ru": {
+ "placeholder": "Ввод...",
+ },
+ "zh": {
+ "placeholder": "输入...",
+ },
+ "ko": {
+ "placeholder": "입력...",
+ },
+ "ja": {
+ "placeholder": "入力...",
+ },
+ },
+ "submit_btn": {
+ "en": {
+ "value": "Submit",
+ },
+ "ru": {
+ "value": "Отправить",
+ },
+ "zh": {
+ "value": "提交",
+ },
+ "ko": {
+ "value": "제출",
+ },
+ "ja": {
+ "value": "送信",
+ },
+ },
+ "max_length": {
+ "en": {
+ "label": "Maximum length",
+ },
+ "ru": {
+ "label": "Максимальная длина",
+ },
+ "zh": {
+ "label": "最大长度",
+ },
+ "ko": {
+ "label": "최대 길이",
+ },
+ "ja": {
+ "label": "最大長",
+ },
+ },
+ "max_new_tokens": {
+ "en": {
+ "label": "Maximum new tokens",
+ },
+ "ru": {
+ "label": "Максимальное количество новых токенов",
+ },
+ "zh": {
+ "label": "最大生成长度",
+ },
+ "ko": {
+ "label": "응답의 최대 길이",
+ },
+ "ja": {
+ "label": "最大生成長",
+ },
+ },
+ "top_p": {
+ "en": {
+ "label": "Top-p",
+ },
+ "ru": {
+ "label": "Лучшие-p",
+ },
+ "zh": {
+ "label": "Top-p 采样值",
+ },
+ "ko": {
+ "label": "Top-p",
+ },
+ "ja": {
+ "label": "Top-p",
+ },
+ },
+ "temperature": {
+ "en": {
+ "label": "Temperature",
+ },
+ "ru": {
+ "label": "Температура",
+ },
+ "zh": {
+ "label": "温度系数",
+ },
+ "ko": {
+ "label": "온도",
+ },
+ "ja": {
+ "label": "温度",
+ },
+ },
+ "skip_special_tokens": {
+ "en": {
+ "label": "Skip special tokens",
+ },
+ "ru": {
+ "label": "Пропустить специальные токены",
+ },
+ "zh": {
+ "label": "跳过特殊 token",
+ },
+ "ko": {
+ "label": "스페셜 토큰을 건너뛰기",
+ },
+ "ja": {
+ "label": "スペシャルトークンをスキップ",
+ },
+ },
+ "escape_html": {
+ "en": {
+ "label": "Escape HTML tags",
+ },
+ "ru": {
+ "label": "Исключить HTML теги",
+ },
+ "zh": {
+ "label": "转义 HTML 标签",
+ },
+ "ko": {
+ "label": "HTML 태그 이스케이프",
+ },
+ "ja": {
+ "label": "HTML タグをエスケープ",
+ },
+ },
+ "clear_btn": {
+ "en": {
+ "value": "Clear history",
+ },
+ "ru": {
+ "value": "Очистить историю",
+ },
+ "zh": {
+ "value": "清空历史",
+ },
+ "ko": {
+ "value": "기록 지우기",
+ },
+ "ja": {
+ "value": "履歴をクリア",
+ },
+ },
+ "export_size": {
+ "en": {
+ "label": "Max shard size (GB)",
+ "info": "The maximum size for a model file.",
+ },
+ "ru": {
+ "label": "Максимальный размер фрагмента (ГБ)",
+ "info": "Максимальный размер файла модели.",
+ },
+ "zh": {
+ "label": "最大分块大小(GB)",
+ "info": "单个模型文件的最大大小。",
+ },
+ "ko": {
+ "label": "최대 샤드 크기 (GB)",
+ "info": "모델 파일의 최대 크기.",
+ },
+ "ja": {
+ "label": "最大シャードサイズ(GB)",
+ "info": "単一のモデルファイルの最大サイズ。",
+ },
+ },
+ "export_quantization_bit": {
+ "en": {
+ "label": "Export quantization bit.",
+ "info": "Quantizing the exported model.",
+ },
+ "ru": {
+ "label": "Экспорт бита квантования",
+ "info": "Квантование экспортируемой модели.",
+ },
+ "zh": {
+ "label": "导出量化等级",
+ "info": "量化导出模型。",
+ },
+ "ko": {
+ "label": "양자화 비트 내보내기",
+ "info": "내보낸 모델의 양자화.",
+ },
+ "ja": {
+ "label": "量子化ビットをエクスポート",
+ "info": "エクスポートするモデルを量子化します。",
+ },
+ },
+ "export_quantization_dataset": {
+ "en": {
+ "label": "Export quantization dataset",
+ "info": "The calibration dataset used for quantization.",
+ },
+ "ru": {
+ "label": "Экспорт набора данных для квантования",
+ "info": "Набор данных калибровки, используемый для квантования.",
+ },
+ "zh": {
+ "label": "导出量化数据集",
+ "info": "量化过程中使用的校准数据集。",
+ },
+ "ko": {
+ "label": "양자화 데이터셋 내보내기",
+ "info": "양자화에 사용되는 교정 데이터셋.",
+ },
+ "ja": {
+ "label": "量子化データセットをエクスポート",
+ "info": "量子化プロセスで使用されるキャリブレーションデータセット。",
+ },
+ },
+ "export_device": {
+ "en": {
+ "label": "Export device",
+ "info": "Which device should be used to export model.",
+ },
+ "ru": {
+ "label": "Экспорт устройство",
+ "info": "Какое устройство следует использовать для экспорта модели.",
+ },
+ "zh": {
+ "label": "导出设备",
+ "info": "导出模型使用的设备类型。",
+ },
+ "ko": {
+ "label": "내보낼 장치",
+ "info": "모델을 내보내는 데 사용할 장치.",
+ },
+ "ja": {
+ "label": "エクスポートデバイス",
+ "info": "モデルをエクスポートするために使用するデバイスタイプ。",
+ },
+ },
+ "export_legacy_format": {
+ "en": {
+ "label": "Export legacy format",
+ "info": "Do not use safetensors to save the model.",
+ },
+ "ru": {
+ "label": "Экспорт в устаревший формат",
+ "info": "Не использовать safetensors для сохранения модели.",
+ },
+ "zh": {
+ "label": "导出旧格式",
+ "info": "不使用 safetensors 格式保存模型。",
+ },
+ "ko": {
+ "label": "레거시 형식 내보내기",
+ "info": "모델을 저장하는 데 safetensors를 사용하지 않습니다.",
+ },
+ "ja": {
+ "label": "レガシーフォーマットをエクスポート",
+ "info": "safetensors フォーマットを使用せずにモデルを保存します。",
+ },
+ },
+ "export_dir": {
+ "en": {
+ "label": "Export dir",
+ "info": "Directory to save exported model.",
+ },
+ "ru": {
+ "label": "Каталог экспорта",
+ "info": "Каталог для сохранения экспортированной модели.",
+ },
+ "zh": {
+ "label": "导出目录",
+ "info": "保存导出模型的文件夹路径。",
+ },
+ "ko": {
+ "label": "내보내기 디렉토리",
+ "info": "내보낸 모델을 저장할 디렉토리.",
+ },
+ "ja": {
+ "label": "エクスポートディレクトリ",
+ "info": "エクスポートしたモデルを保存するフォルダのパス。",
+ },
+ },
+ "export_hub_model_id": {
+ "en": {
+ "label": "HF Hub ID (optional)",
+ "info": "Repo ID for uploading model to Hugging Face hub.",
+ },
+ "ru": {
+ "label": "HF Hub ID (опционально)",
+ "info": "Идентификатор репозитория для загрузки модели на Hugging Face hub.",
+ },
+ "zh": {
+ "label": "HF Hub ID(非必填)",
+ "info": "用于将模型上传至 Hugging Face Hub 的仓库 ID。",
+ },
+ "ko": {
+ "label": "HF 허브 ID (선택 사항)",
+ "info": "모델을 Hugging Face 허브에 업로드하기 위한 레포 ID.",
+ },
+ "ja": {
+ "label": "HF Hub ID(オプション)",
+ "info": "Hugging Face Hub にモデルをアップロードするためのリポジトリ ID。",
+ },
+ },
+ "export_btn": {
+ "en": {
+ "value": "Export",
+ },
+ "ru": {
+ "value": "Экспорт",
+ },
+ "zh": {
+ "value": "开始导出",
+ },
+ "ko": {
+ "value": "내보내기",
+ },
+ "ja": {
+ "value": "エクスポート",
+ },
+ },
+}
+
+
+ALERTS = {
+ "err_conflict": {
+ "en": "A process is in running, please abort it first.",
+ "ru": "Процесс уже запущен, пожалуйста, сначала прервите его.",
+ "zh": "任务已存在,请先中断训练。",
+ "ko": "프로세스가 실행 중입니다. 먼저 중단하십시오.",
+ "ja": "プロセスが実行中です。最初に中断してください。",
+ },
+ "err_exists": {
+ "en": "You have loaded a model, please unload it first.",
+ "ru": "Вы загрузили модель, сначала разгрузите ее.",
+ "zh": "模型已存在,请先卸载模型。",
+ "ko": "모델이 로드되었습니다. 먼저 언로드하십시오.",
+ "ja": "モデルがロードされています。最初にアンロードしてください。",
+ },
+ "err_no_model": {
+ "en": "Please select a model.",
+ "ru": "Пожалуйста, выберите модель.",
+ "zh": "请选择模型。",
+ "ko": "모델을 선택하십시오.",
+ "ja": "モデルを選択してください。",
+ },
+ "err_no_path": {
+ "en": "Model not found.",
+ "ru": "Модель не найдена.",
+ "zh": "模型未找到。",
+ "ko": "모델을 찾을 수 없습니다.",
+ "ja": "モデルが見つかりません。",
+ },
+ "err_no_dataset": {
+ "en": "Please choose a dataset.",
+ "ru": "Пожалуйста, выберите набор данных.",
+ "zh": "请选择数据集。",
+ "ko": "데이터 세트를 선택하십시오.",
+ "ja": "データセットを選択してください。",
+ },
+ "err_no_adapter": {
+ "en": "Please select an adapter.",
+ "ru": "Пожалуйста, выберите адаптер.",
+ "zh": "请选择适配器。",
+ "ko": "어댑터를 선택하십시오.",
+ "ja": "アダプターを選択してください。",
+ },
+ "err_no_output_dir": {
+ "en": "Please provide output dir.",
+ "ru": "Пожалуйста, укажите выходную директорию.",
+ "zh": "请填写输出目录。",
+ "ko": "출력 디렉토리를 제공하십시오.",
+ "ja": "出力ディレクトリを入力してください。",
+ },
+ "err_no_reward_model": {
+ "en": "Please select a reward model.",
+ "ru": "Пожалуйста, выберите модель вознаграждения.",
+ "zh": "请选择奖励模型。",
+ "ko": "리워드 모델을 선택하십시오.",
+ "ja": "報酬モデルを選択してください。",
+ },
+ "err_no_export_dir": {
+ "en": "Please provide export dir.",
+ "ru": "Пожалуйста, укажите каталог для экспорта.",
+ "zh": "请填写导出目录。",
+ "ko": "Export 디렉토리를 제공하십시오.",
+ "ja": "エクスポートディレクトリを入力してください。",
+ },
+ "err_gptq_lora": {
+ "en": "Please merge adapters before quantizing the model.",
+ "ru": "Пожалуйста, объедините адаптеры перед квантованием модели.",
+ "zh": "量化模型前请先合并适配器。",
+ "ko": "모델을 양자화하기 전에 어댑터를 병합하십시오.",
+ "ja": "モデルを量子化する前にアダプターをマージしてください。",
+ },
+ "err_failed": {
+ "en": "Failed.",
+ "ru": "Ошибка.",
+ "zh": "训练出错。",
+ "ko": "실패했습니다.",
+ "ja": "失敗しました。",
+ },
+ "err_demo": {
+ "en": "Training is unavailable in demo mode, duplicate the space to a private one first.",
+ "ru": "Обучение недоступно в демонстрационном режиме, сначала скопируйте пространство в частное.",
+ "zh": "展示模式不支持训练,请先复制到私人空间。",
+ "ko": "데모 모드에서는 훈련을 사용할 수 없습니다. 먼저 프라이빗 레포지토리로 작업 공간을 복제하십시오.",
+ "ja": "デモモードではトレーニングは利用できません。最初にプライベートスペースに複製してください。",
+ },
+ "err_tool_name": {
+ "en": "Tool name not found.",
+ "ru": "Имя инструмента не найдено.",
+ "zh": "工具名称未找到。",
+ "ko": "툴 이름을 찾을 수 없습니다.",
+ "ja": "ツール名が見つかりません。",
+ },
+ "err_json_schema": {
+ "en": "Invalid JSON schema.",
+ "ru": "Неверная схема JSON.",
+ "zh": "Json 格式错误。",
+ "ko": "잘못된 JSON 스키마입니다.",
+ "ja": "JSON スキーマが無効です。",
+ },
+ "err_config_not_found": {
+ "en": "Config file is not found.",
+ "ru": "Файл конфигурации не найден.",
+ "zh": "未找到配置文件。",
+ "ko": "Config 파일을 찾을 수 없습니다.",
+ "ja": "設定ファイルが見つかりません。",
+ },
+ "warn_no_cuda": {
+ "en": "CUDA environment was not detected.",
+ "ru": "Среда CUDA не обнаружена.",
+ "zh": "未检测到 CUDA 环境。",
+ "ko": "CUDA 환경이 감지되지 않았습니다.",
+ "ja": "CUDA 環境が検出されませんでした。",
+ },
+ "warn_output_dir_exists": {
+ "en": "Output dir already exists, will resume training from here.",
+ "ru": "Выходной каталог уже существует, обучение будет продолжено отсюда.",
+ "zh": "输出目录已存在,将从该断点恢复训练。",
+ "ko": "출력 디렉토리가 이미 존재합니다. 위 출력 디렉토리에 저장된 학습을 재개합니다.",
+ "ja": "出力ディレクトリが既に存在します。このチェックポイントからトレーニングを再開します。",
+ },
+ "info_aborting": {
+ "en": "Aborted, wait for terminating...",
+ "ru": "Прервано, ожидание завершения...",
+ "zh": "训练中断,正在等待进程结束……",
+ "ko": "중단되었습니다. 종료를 기다리십시오...",
+ "ja": "トレーニングが中断されました。プロセスの終了を待っています...",
+ },
+ "info_aborted": {
+ "en": "Ready.",
+ "ru": "Готово.",
+ "zh": "准备就绪。",
+ "ko": "준비되었습니다.",
+ "ja": "準備完了。",
+ },
+ "info_finished": {
+ "en": "Finished.",
+ "ru": "Завершено.",
+ "zh": "训练完毕。",
+ "ko": "완료되었습니다.",
+ "ja": "トレーニングが完了しました。",
+ },
+ "info_config_saved": {
+ "en": "Arguments have been saved at: ",
+ "ru": "Аргументы были сохранены по адресу: ",
+ "zh": "训练参数已保存至:",
+ "ko": "매개변수가 저장되었습니다: ",
+ "ja": "トレーニングパラメータが保存されました: ",
+ },
+ "info_config_loaded": {
+ "en": "Arguments have been restored.",
+ "ru": "Аргументы были восстановлены.",
+ "zh": "训练参数已载入。",
+ "ko": "매개변수가 복원되었습니다.",
+ "ja": "トレーニングパラメータが読み込まれました。",
+ },
+ "info_loading": {
+ "en": "Loading model...",
+ "ru": "Загрузка модели...",
+ "zh": "加载中……",
+ "ko": "모델 로딩 중...",
+ "ja": "モデルをロード中...",
+ },
+ "info_unloading": {
+ "en": "Unloading model...",
+ "ru": "Выгрузка модели...",
+ "zh": "卸载中……",
+ "ko": "모델 언로딩 중...",
+ "ja": "モデルをアンロード中...",
+ },
+ "info_loaded": {
+ "en": "Model loaded, now you can chat with your model!",
+ "ru": "Модель загружена, теперь вы можете общаться с вашей моделью!",
+ "zh": "模型已加载,可以开始聊天了!",
+ "ko": "모델이 로드되었습니다. 이제 모델과 채팅할 수 있습니다!",
+ "ja": "モデルがロードされました。チャットを開始できます!",
+ },
+ "info_unloaded": {
+ "en": "Model unloaded.",
+ "ru": "Модель выгружена.",
+ "zh": "模型已卸载。",
+ "ko": "모델이 언로드되었습니다.",
+ "ja": "モデルがアンロードされました。",
+ },
+ "info_thinking": {
+ "en": "🌀 Thinking...",
+ "ru": "🌀 Думаю...",
+ "zh": "🌀 思考中...",
+ "ko": "🌀 생각 중...",
+ "ja": "🌀 考えています...",
+ },
+ "info_thought": {
+ "en": "✅ Thought",
+ "ru": "✅ Думать закончено",
+ "zh": "✅ 思考完成",
+ "ko": "✅ 생각이 완료되었습니다",
+ "ja": "✅ 思考完了",
+ },
+ "info_exporting": {
+ "en": "Exporting model...",
+ "ru": "Экспорт модели...",
+ "zh": "正在导出模型……",
+ "ko": "모델 내보내기 중...",
+ "ja": "モデルをエクスポート中...",
+ },
+ "info_exported": {
+ "en": "Model exported.",
+ "ru": "Модель экспортирована.",
+ "zh": "模型导出完成。",
+ "ko": "모델이 내보내졌습니다.",
+ "ja": "モデルのエクスポートが完了しました。",
+ },
+ "info_swanlab_link": {
+ "en": "### SwanLab Link\n",
+ "ru": "### SwanLab ссылка\n",
+ "zh": "### SwanLab 链接\n",
+ "ko": "### SwanLab 링크\n",
+ "ja": "### SwanLab リンク\n",
+ },
+}
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/manager.py b/post-training/LLaMA-Factory/src/llamafactory/webui/manager.py
new file mode 100644
index 0000000..e762fa6
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/manager.py
@@ -0,0 +1,70 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from collections.abc import Generator
+from typing import TYPE_CHECKING
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+
+class Manager:
+ r"""A class to manage all the gradio components in Web UI."""
+
+ def __init__(self) -> None:
+ self._id_to_elem: dict[str, "Component"] = {}
+ self._elem_to_id: dict["Component", str] = {}
+
+ def add_elems(self, tab_name: str, elem_dict: dict[str, "Component"]) -> None:
+ r"""Add elements to manager."""
+ for elem_name, elem in elem_dict.items():
+ elem_id = f"{tab_name}.{elem_name}"
+ self._id_to_elem[elem_id] = elem
+ self._elem_to_id[elem] = elem_id
+
+ def get_elem_list(self) -> list["Component"]:
+ r"""Return the list of all elements."""
+ return list(self._id_to_elem.values())
+
+ def get_elem_iter(self) -> Generator[tuple[str, "Component"], None, None]:
+ r"""Return an iterator over all elements with their names."""
+ for elem_id, elem in self._id_to_elem.items():
+ yield elem_id.split(".")[-1], elem
+
+ def get_elem_by_id(self, elem_id: str) -> "Component":
+ r"""Get element by id.
+
+ Example: top.lang, train.dataset
+ """
+ return self._id_to_elem[elem_id]
+
+ def get_id_by_elem(self, elem: "Component") -> str:
+ r"""Get id by element."""
+ return self._elem_to_id[elem]
+
+ def get_base_elems(self) -> set["Component"]:
+ r"""Get the base elements that are commonly used."""
+ return {
+ self._id_to_elem["top.lang"],
+ self._id_to_elem["top.model_name"],
+ self._id_to_elem["top.model_path"],
+ self._id_to_elem["top.finetuning_type"],
+ self._id_to_elem["top.checkpoint_path"],
+ self._id_to_elem["top.quantization_bit"],
+ self._id_to_elem["top.quantization_method"],
+ self._id_to_elem["top.template"],
+ self._id_to_elem["top.rope_scaling"],
+ self._id_to_elem["top.booster"],
+ }
diff --git a/post-training/LLaMA-Factory/src/llamafactory/webui/runner.py b/post-training/LLaMA-Factory/src/llamafactory/webui/runner.py
new file mode 100644
index 0000000..4fbebde
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/llamafactory/webui/runner.py
@@ -0,0 +1,499 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import os
+from collections.abc import Generator
+from copy import deepcopy
+from subprocess import Popen, TimeoutExpired
+from typing import TYPE_CHECKING, Any, Optional
+
+from transformers.trainer import TRAINING_ARGS_NAME
+from transformers.utils import is_torch_npu_available
+
+from ..extras.constants import LLAMABOARD_CONFIG, PEFT_METHODS, TRAINING_STAGES
+from ..extras.misc import is_accelerator_available, torch_gc, use_ray
+from ..extras.packages import is_gradio_available
+from .common import (
+ DEFAULT_CACHE_DIR,
+ DEFAULT_CONFIG_DIR,
+ abort_process,
+ gen_cmd,
+ get_save_dir,
+ load_args,
+ load_config,
+ load_eval_results,
+ save_args,
+ save_cmd,
+)
+from .control import get_trainer_info
+from .locales import ALERTS, LOCALES
+
+
+if is_gradio_available():
+ import gradio as gr
+
+
+if TYPE_CHECKING:
+ from gradio.components import Component
+
+ from .manager import Manager
+
+
+class Runner:
+ r"""A class to manage the running status of the trainers."""
+
+ def __init__(self, manager: "Manager", demo_mode: bool = False) -> None:
+ r"""Init a runner."""
+ self.manager = manager
+ self.demo_mode = demo_mode
+ """ Resume """
+ self.trainer: Optional[Popen] = None
+ self.do_train = True
+ self.running_data: dict[Component, Any] = None
+ """ State """
+ self.aborted = False
+ self.running = False
+
+ def set_abort(self) -> None:
+ self.aborted = True
+ if self.trainer is not None:
+ abort_process(self.trainer.pid)
+
+ def _initialize(self, data: dict["Component", Any], do_train: bool, from_preview: bool) -> str:
+ r"""Validate the configuration."""
+ get = lambda elem_id: data[self.manager.get_elem_by_id(elem_id)]
+ lang, model_name, model_path = get("top.lang"), get("top.model_name"), get("top.model_path")
+ dataset = get("train.dataset") if do_train else get("eval.dataset")
+
+ if self.running:
+ return ALERTS["err_conflict"][lang]
+
+ if not model_name:
+ return ALERTS["err_no_model"][lang]
+
+ if not model_path:
+ return ALERTS["err_no_path"][lang]
+
+ if not dataset:
+ return ALERTS["err_no_dataset"][lang]
+
+ if not from_preview and self.demo_mode:
+ return ALERTS["err_demo"][lang]
+
+ if do_train:
+ if not get("train.output_dir"):
+ return ALERTS["err_no_output_dir"][lang]
+
+ try:
+ json.loads(get("train.extra_args"))
+ except json.JSONDecodeError:
+ return ALERTS["err_json_schema"][lang]
+
+ stage = TRAINING_STAGES[get("train.training_stage")]
+ if stage == "ppo" and not get("train.reward_model"):
+ return ALERTS["err_no_reward_model"][lang]
+ else:
+ if not get("eval.output_dir"):
+ return ALERTS["err_no_output_dir"][lang]
+
+ if not from_preview and not is_accelerator_available():
+ gr.Warning(ALERTS["warn_no_cuda"][lang])
+
+ return ""
+
+ def _finalize(self, lang: str, finish_info: str) -> str:
+ r"""Clean the cached memory and resets the runner."""
+ finish_info = ALERTS["info_aborted"][lang] if self.aborted else finish_info
+ gr.Info(finish_info)
+ self.trainer = None
+ self.aborted = False
+ self.running = False
+ self.running_data = None
+ torch_gc()
+ return finish_info
+
+ def _parse_train_args(self, data: dict["Component", Any]) -> dict[str, Any]:
+ r"""Build and validate the training arguments."""
+ get = lambda elem_id: data[self.manager.get_elem_by_id(elem_id)]
+ model_name, finetuning_type = get("top.model_name"), get("top.finetuning_type")
+ user_config = load_config()
+
+ args = dict(
+ stage=TRAINING_STAGES[get("train.training_stage")],
+ do_train=True,
+ model_name_or_path=get("top.model_path"),
+ cache_dir=user_config.get("cache_dir", None),
+ preprocessing_num_workers=16,
+ finetuning_type=finetuning_type,
+ template=get("top.template"),
+ rope_scaling=get("top.rope_scaling") if get("top.rope_scaling") != "none" else None,
+ flash_attn="fa2" if get("top.booster") == "flashattn2" else "auto",
+ use_unsloth=(get("top.booster") == "unsloth"),
+ enable_liger_kernel=(get("top.booster") == "liger_kernel"),
+ dataset_dir=get("train.dataset_dir"),
+ dataset=",".join(get("train.dataset")),
+ cutoff_len=get("train.cutoff_len"),
+ learning_rate=float(get("train.learning_rate")),
+ num_train_epochs=float(get("train.num_train_epochs")),
+ max_samples=int(get("train.max_samples")),
+ per_device_train_batch_size=get("train.batch_size"),
+ gradient_accumulation_steps=get("train.gradient_accumulation_steps"),
+ lr_scheduler_type=get("train.lr_scheduler_type"),
+ max_grad_norm=float(get("train.max_grad_norm")),
+ logging_steps=get("train.logging_steps"),
+ save_steps=get("train.save_steps"),
+ warmup_steps=get("train.warmup_steps"),
+ neftune_noise_alpha=get("train.neftune_alpha") or None,
+ packing=get("train.packing") or get("train.neat_packing"),
+ neat_packing=get("train.neat_packing"),
+ train_on_prompt=get("train.train_on_prompt"),
+ mask_history=get("train.mask_history"),
+ resize_vocab=get("train.resize_vocab"),
+ use_llama_pro=get("train.use_llama_pro"),
+ report_to=get("train.report_to"),
+ use_galore=get("train.use_galore"),
+ use_apollo=get("train.use_apollo"),
+ use_badam=get("train.use_badam"),
+ use_swanlab=get("train.use_swanlab"),
+ output_dir=get_save_dir(model_name, finetuning_type, get("train.output_dir")),
+ fp16=(get("train.compute_type") == "fp16"),
+ bf16=(get("train.compute_type") == "bf16"),
+ pure_bf16=(get("train.compute_type") == "pure_bf16"),
+ plot_loss=True,
+ trust_remote_code=True,
+ ddp_timeout=180000000,
+ include_num_input_tokens_seen=True,
+ )
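+ # JSON from "train.extra_args" overrides the defaults assembled above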
+ args.update(json.loads(get("train.extra_args")))
+
+ # checkpoints
+ if get("top.checkpoint_path"):
+ if finetuning_type in PEFT_METHODS: # list
+ args["adapter_name_or_path"] = ",".join(
+ [get_save_dir(model_name, finetuning_type, adapter) for adapter in get("top.checkpoint_path")]
+ )
+ else: # str
+ args["model_name_or_path"] = get_save_dir(model_name, finetuning_type, get("top.checkpoint_path"))
+
+ # quantization
+ if get("top.quantization_bit") != "none":
+ args["quantization_bit"] = int(get("top.quantization_bit"))
+ args["quantization_method"] = get("top.quantization_method")
+ args["double_quantization"] = not is_torch_npu_available()
+
+ # freeze config
+ if args["finetuning_type"] == "freeze":
+ args["freeze_trainable_layers"] = get("train.freeze_trainable_layers")
+ args["freeze_trainable_modules"] = get("train.freeze_trainable_modules")
+ args["freeze_extra_modules"] = get("train.freeze_extra_modules") or None
+
+ # lora config
+ if args["finetuning_type"] == "lora":
+ args["lora_rank"] = get("train.lora_rank")
+ args["lora_alpha"] = get("train.lora_alpha")
+ args["lora_dropout"] = get("train.lora_dropout")
+ args["loraplus_lr_ratio"] = get("train.loraplus_lr_ratio") or None
+ args["create_new_adapter"] = get("train.create_new_adapter")
+ args["use_rslora"] = get("train.use_rslora")
+ args["use_dora"] = get("train.use_dora")
+ args["pissa_init"] = get("train.use_pissa")
+ args["pissa_convert"] = get("train.use_pissa")
+ args["lora_target"] = get("train.lora_target") or "all"
+ args["additional_target"] = get("train.additional_target") or None
+
+ if args["use_llama_pro"]:
+ args["freeze_trainable_layers"] = get("train.freeze_trainable_layers")
+
+ # rlhf config
+ if args["stage"] == "ppo":
+ if finetuning_type in PEFT_METHODS:
+ args["reward_model"] = ",".join(
+ [get_save_dir(model_name, finetuning_type, adapter) for adapter in get("train.reward_model")]
+ )
+ else:
+ args["reward_model"] = get_save_dir(model_name, finetuning_type, get("train.reward_model"))
+
+ args["reward_model_type"] = "lora" if finetuning_type == "lora" else "full"
+ args["ppo_score_norm"] = get("train.ppo_score_norm")
+ args["ppo_whiten_rewards"] = get("train.ppo_whiten_rewards")
+ args["top_k"] = 0
+ args["top_p"] = 0.9
+ elif args["stage"] in ["dpo", "kto"]:
+ args["pref_beta"] = get("train.pref_beta")
+ args["pref_ftx"] = get("train.pref_ftx")
+ args["pref_loss"] = get("train.pref_loss")
+
+ # galore config
+ if args["use_galore"]:
+ args["galore_rank"] = get("train.galore_rank")
+ args["galore_update_interval"] = get("train.galore_update_interval")
+ args["galore_scale"] = get("train.galore_scale")
+ args["galore_target"] = get("train.galore_target")
+
+ # apollo config
+ if args["use_apollo"]:
+ args["apollo_rank"] = get("train.apollo_rank")
+ args["apollo_update_interval"] = get("train.apollo_update_interval")
+ args["apollo_scale"] = get("train.apollo_scale")
+ args["apollo_target"] = get("train.apollo_target")
+
+ # badam config
+ if args["use_badam"]:
+ args["badam_mode"] = get("train.badam_mode")
+ args["badam_switch_mode"] = get("train.badam_switch_mode")
+ args["badam_switch_interval"] = get("train.badam_switch_interval")
+ args["badam_update_ratio"] = get("train.badam_update_ratio")
+
+ # report_to
+ if "none" in args["report_to"]:
+ args["report_to"] = "none"
+ elif "all" in args["report_to"]:
+ args["report_to"] = "all"
+
+ # swanlab config
+ if get("train.use_swanlab"):
+ args["swanlab_project"] = get("train.swanlab_project")
+ args["swanlab_run_name"] = get("train.swanlab_run_name")
+ args["swanlab_workspace"] = get("train.swanlab_workspace")
+ args["swanlab_api_key"] = get("train.swanlab_api_key")
+ args["swanlab_mode"] = get("train.swanlab_mode")
+
+ # eval config
+ if get("train.val_size") > 1e-6 and args["stage"] != "ppo":
+ args["val_size"] = get("train.val_size")
+ args["eval_strategy"] = "steps"
+ args["eval_steps"] = args["save_steps"]
+ args["per_device_eval_batch_size"] = args["per_device_train_batch_size"]
+
+ # ds config
+ if get("train.ds_stage") != "none":
+ ds_stage = get("train.ds_stage")
+ ds_offload = "offload_" if get("train.ds_offload") else ""
+ args["deepspeed"] = os.path.join(DEFAULT_CACHE_DIR, f"ds_z{ds_stage}_{ds_offload}config.json")
+
+ return args
+
+ def _parse_eval_args(self, data: dict["Component", Any]) -> dict[str, Any]:
+ r"""Build and validate the evaluation arguments."""
+ get = lambda elem_id: data[self.manager.get_elem_by_id(elem_id)]
+ model_name, finetuning_type = get("top.model_name"), get("top.finetuning_type")
+ user_config = load_config()
+
+ args = dict(
+ stage="sft",
+ model_name_or_path=get("top.model_path"),
+ cache_dir=user_config.get("cache_dir", None),
+ preprocessing_num_workers=16,
+ finetuning_type=finetuning_type,
+ quantization_method=get("top.quantization_method"),
+ template=get("top.template"),
+ rope_scaling=get("top.rope_scaling") if get("top.rope_scaling") != "none" else None,
+ flash_attn="fa2" if get("top.booster") == "flashattn2" else "auto",
+ use_unsloth=(get("top.booster") == "unsloth"),
+ dataset_dir=get("eval.dataset_dir"),
+ eval_dataset=",".join(get("eval.dataset")),
+ cutoff_len=get("eval.cutoff_len"),
+ max_samples=int(get("eval.max_samples")),
+ per_device_eval_batch_size=get("eval.batch_size"),
+ predict_with_generate=True,
+ max_new_tokens=get("eval.max_new_tokens"),
+ top_p=get("eval.top_p"),
+ temperature=get("eval.temperature"),
+ output_dir=get_save_dir(model_name, finetuning_type, get("eval.output_dir")),
+ trust_remote_code=True,
+ )
+
+ if get("eval.predict"):
+ args["do_predict"] = True
+ else:
+ args["do_eval"] = True
+
+ # checkpoints
+ if get("top.checkpoint_path"):
+ if finetuning_type in PEFT_METHODS: # list
+ args["adapter_name_or_path"] = ",".join(
+ [get_save_dir(model_name, finetuning_type, adapter) for adapter in get("top.checkpoint_path")]
+ )
+ else: # str
+ args["model_name_or_path"] = get_save_dir(model_name, finetuning_type, get("top.checkpoint_path"))
+
+ # quantization
+ if get("top.quantization_bit") != "none":
+ args["quantization_bit"] = int(get("top.quantization_bit"))
+ args["quantization_method"] = get("top.quantization_method")
+ args["double_quantization"] = not is_torch_npu_available()
+
+ return args
+
+ def _preview(self, data: dict["Component", Any], do_train: bool) -> Generator[dict["Component", str], None, None]:
+ r"""Preview the training commands."""
+ output_box = self.manager.get_elem_by_id("{}.output_box".format("train" if do_train else "eval"))
+ error = self._initialize(data, do_train, from_preview=True)
+ if error:
+ gr.Warning(error)
+ yield {output_box: error}
+ else:
+ args = self._parse_train_args(data) if do_train else self._parse_eval_args(data)
+ yield {output_box: gen_cmd(args)}
+
+ def _launch(self, data: dict["Component", Any], do_train: bool) -> Generator[dict["Component", Any], None, None]:
+ r"""Start the training process."""
+ output_box = self.manager.get_elem_by_id("{}.output_box".format("train" if do_train else "eval"))
+ error = self._initialize(data, do_train, from_preview=False)
+ if error:
+ gr.Warning(error)
+ yield {output_box: error}
+ else:
+ self.do_train, self.running_data = do_train, data
+ args = self._parse_train_args(data) if do_train else self._parse_eval_args(data)
+
+ os.makedirs(args["output_dir"], exist_ok=True)
+ save_args(os.path.join(args["output_dir"], LLAMABOARD_CONFIG), self._build_config_dict(data))
+
+ env = deepcopy(os.environ)
+ env["LLAMABOARD_ENABLED"] = "1"
+ env["LLAMABOARD_WORKDIR"] = args["output_dir"]
+ if args.get("deepspeed", None) is not None:
+ env["FORCE_TORCHRUN"] = "1"
+
+ # NOTE: DO NOT USE shell=True to avoid security risk
+ self.trainer = Popen(["llamafactory-cli", "train", save_cmd(args)], env=env)
+ yield from self.monitor()
+
+ def _build_config_dict(self, data: dict["Component", Any]) -> dict[str, Any]:
+ r"""Build a dictionary containing the current training configuration."""
+ config_dict = {}
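+ # skip fields that are machine- or run-specific so the saved config can be reused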
+ skip_ids = ["top.lang", "top.model_path", "train.output_dir", "train.config_path"]
+ for elem, value in data.items():
+ elem_id = self.manager.get_id_by_elem(elem)
+ if elem_id not in skip_ids:
+ config_dict[elem_id] = value
+
+ return config_dict
+
+ def preview_train(self, data):
+ yield from self._preview(data, do_train=True)
+
+ def preview_eval(self, data):
+ yield from self._preview(data, do_train=False)
+
+ def run_train(self, data):
+ yield from self._launch(data, do_train=True)
+
+ def run_eval(self, data):
+ yield from self._launch(data, do_train=False)
+
+ def monitor(self):
+ r"""Monitorgit the training progress and logs."""
+ self.aborted = False
+ self.running = True
+
+ get = lambda elem_id: self.running_data[self.manager.get_elem_by_id(elem_id)]
+ lang, model_name, finetuning_type = get("top.lang"), get("top.model_name"), get("top.finetuning_type")
+ output_dir = get("{}.output_dir".format("train" if self.do_train else "eval"))
+ output_path = get_save_dir(model_name, finetuning_type, output_dir)
+
+ output_box = self.manager.get_elem_by_id("{}.output_box".format("train" if self.do_train else "eval"))
+ progress_bar = self.manager.get_elem_by_id("{}.progress_bar".format("train" if self.do_train else "eval"))
+ loss_viewer = self.manager.get_elem_by_id("train.loss_viewer") if self.do_train else None
+ swanlab_link = self.manager.get_elem_by_id("train.swanlab_link") if self.do_train else None
+
+ running_log = ""
+ while self.trainer is not None:
+ if self.aborted:
+ yield {
+ output_box: ALERTS["info_aborting"][lang],
+ progress_bar: gr.Slider(visible=False),
+ }
+ else:
+ running_log, running_progress, running_info = get_trainer_info(lang, output_path, self.do_train)
+ return_dict = {
+ output_box: running_log,
+ progress_bar: running_progress,
+ }
+ if "loss_viewer" in running_info:
+ return_dict[loss_viewer] = running_info["loss_viewer"]
+
+ if "swanlab_link" in running_info:
+ return_dict[swanlab_link] = running_info["swanlab_link"]
+
+ yield return_dict
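+ # poll the trainer every two seconds; TimeoutExpired just means it is still running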
+ try:
+ self.trainer.wait(2)
+ self.trainer = None
+ except TimeoutExpired:
+ continue
+
+ if self.do_train:
+ if os.path.exists(os.path.join(output_path, TRAINING_ARGS_NAME)) or use_ray():
+ finish_info = ALERTS["info_finished"][lang]
+ else:
+ finish_info = ALERTS["err_failed"][lang]
+ else:
+ if os.path.exists(os.path.join(output_path, "all_results.json")) or use_ray():
+ finish_info = load_eval_results(os.path.join(output_path, "all_results.json"))
+ else:
+ finish_info = ALERTS["err_failed"][lang]
+
+ return_dict = {
+ output_box: self._finalize(lang, finish_info) + "\n\n" + running_log,
+ progress_bar: gr.Slider(visible=False),
+ }
+ yield return_dict
+
+ def save_args(self, data):
+ r"""Save the training configuration to config path."""
+ output_box = self.manager.get_elem_by_id("train.output_box")
+ error = self._initialize(data, do_train=True, from_preview=True)
+ if error:
+ gr.Warning(error)
+ return {output_box: error}
+
+ lang = data[self.manager.get_elem_by_id("top.lang")]
+ config_path = data[self.manager.get_elem_by_id("train.config_path")]
+ os.makedirs(DEFAULT_CONFIG_DIR, exist_ok=True)
+ save_path = os.path.join(DEFAULT_CONFIG_DIR, config_path)
+
+ save_args(save_path, self._build_config_dict(data))
+ return {output_box: ALERTS["info_config_saved"][lang] + save_path}
+
+ def load_args(self, lang: str, config_path: str):
+ r"""Load the training configuration from config path."""
+ output_box = self.manager.get_elem_by_id("train.output_box")
+ config_dict = load_args(os.path.join(DEFAULT_CONFIG_DIR, config_path))
+ if config_dict is None:
+ gr.Warning(ALERTS["err_config_not_found"][lang])
+ return {output_box: ALERTS["err_config_not_found"][lang]}
+
+ output_dict: dict[Component, Any] = {output_box: ALERTS["info_config_loaded"][lang]}
+ for elem_id, value in config_dict.items():
+ output_dict[self.manager.get_elem_by_id(elem_id)] = value
+
+ return output_dict
+
+ def check_output_dir(self, lang: str, model_name: str, finetuning_type: str, output_dir: str):
+ r"""Restore the training status if output_dir exists."""
+ output_box = self.manager.get_elem_by_id("train.output_box")
+ output_dict: dict[Component, Any] = {output_box: LOCALES["output_box"][lang]["value"]}
+ if model_name and output_dir and os.path.isdir(get_save_dir(model_name, finetuning_type, output_dir)):
+ gr.Warning(ALERTS["warn_output_dir_exists"][lang])
+ output_dict[output_box] = ALERTS["warn_output_dir_exists"][lang]
+
+ output_dir = get_save_dir(model_name, finetuning_type, output_dir)
+ config_dict = load_args(os.path.join(output_dir, LLAMABOARD_CONFIG)) # load llamaboard config
+ for elem_id, value in config_dict.items():
+ output_dict[self.manager.get_elem_by_id(elem_id)] = value
+
+ return output_dict
diff --git a/post-training/LLaMA-Factory/src/train.py b/post-training/LLaMA-Factory/src/train.py
new file mode 100644
index 0000000..eba846a
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/train.py
@@ -0,0 +1,28 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from llamafactory.train.tuner import run_exp
+
+
+def main():
+ run_exp()
+
+
+def _mp_fn(index):
+ # For xla_spawn (TPUs)
+ run_exp()
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/src/webui.py b/post-training/LLaMA-Factory/src/webui.py
new file mode 100644
index 0000000..f13d2f2
--- /dev/null
+++ b/post-training/LLaMA-Factory/src/webui.py
@@ -0,0 +1,31 @@
+# Copyright 2025 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from llamafactory.extras.misc import fix_proxy, is_env_enabled
+from llamafactory.webui.interface import create_ui
+
+
+def main():
+ gradio_ipv6 = is_env_enabled("GRADIO_IPV6")
+ gradio_share = is_env_enabled("GRADIO_SHARE")
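+ # bind to all interfaces by default, using the IPv6 wildcard when GRADIO_IPV6 is set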
+ server_name = os.getenv("GRADIO_SERVER_NAME", "[::]" if gradio_ipv6 else "0.0.0.0")
+ print("Visit http://ip:port for Web UI, e.g., http://127.0.0.1:7860")
+ fix_proxy(ipv6_enabled=gradio_ipv6)
+ create_ui().queue().launch(share=gradio_share, server_name=server_name, inbrowser=True)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/post-training/LLaMA-Factory/test.json b/post-training/LLaMA-Factory/test.json
new file mode 100644
index 0000000..6219b30
--- /dev/null
+++ b/post-training/LLaMA-Factory/test.json
@@ -0,0 +1,500 @@
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Real Estate Agency \n \n \n\n\n \n \n
\n
Our Services \n
Whether you're looking for a new home, a rental property, or a commercial property, we have the expertise and resources to help you.
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n\n\nWelcome to our Real Estate Agency \n\n At our Real Estate Agency, we are dedicated to providing our clients with the best possible service. We understand that buying or selling a property can be a big decision, which is why we offer a wide range of services to help you find the perfect property. Whether you're looking for a new home, a rental property, or a commercial property, we have the expertise and resources to help you.\n
\n \n\n\n \n\n\n
\n
\n
Real Estate Agency \n\n\n
\n
Follow Us \n\n\n
\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original web page you provided. I've included placeholders for images and links where necessary.\n\n```html\n\n\n\n \n \n Healthcare Provider \n \n \n\n\n \n \n
\n
About Us \n
We are committed to providing the best care possible. Our team of experienced healthcare professionals is dedicated to providing high-quality, compassionate care to our patients. We believe in treating everyone with respect and empathy, regardless of their background or condition.
\n
\n
\n
Services \n
We offer a wide range of services to meet the needs of our patients. These include primary care, specialty care, and more. Our team of healthcare professionals is here to help you every step of the way.
\n
\n
\n
Doctors \n
Our team of healthcare professionals is some of the best in the industry. They are experts in their fields and are dedicated to providing the best care possible.
\n
\n
\n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links where necessary. The layout and styling are designed to match the original web page you provided.", "label": "\n \n\n\n\nHealthcare Provider \n
\n\n \n\nAbout Us \n\n At Healthcare Provider, we are committed to providing the best care possible. Our team of experienced healthcare professionals is dedicated to providing high-quality, compassionate care to our patients. We believe in treating everyone with respect and empathy, regardless of their background or condition.\n
\n \n\nServices \n\n We offer a wide range of services to meet the needs of our patients. These include primary care, specialty care, and more. Our team of healthcare professionals is here to help you every step of the way.\n
\n \n\nDoctors \n\n Our team of healthcare professionals is some of the best in the industry. They are experts in their fields and are dedicated to providing the best care possible.\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholders for images and links where necessary.\n\n```html\n\n\n\n \n \n Tech Company \n \n \n\n\n \n Welcome to Tech Company \n At Tech Company, we are dedicated to providing the best technology solutions for our clients. Our team of experts is committed to delivering high-quality, reliable, and innovative products and services. We are passionate about technology and are always looking for new ways to improve and innovate. Whether you're a small business or a large corporation, we're here to help you achieve your technology goals.
\n \n \n Home \n About \n Contact \n \n \n At Tech Company, we are dedicated to providing the best technology solutions for our clients. Our team of experts is committed to delivering high-quality, reliable, and innovative products and services. We are passionate about technology and are always looking for new ways to improve and innovate. Whether you're a small business or a large corporation, we're here to help you achieve your technology goals.
\n \n \n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links where necessary. The layout and design should closely resemble the original screenshot.", "label": "\n \n\n\n\nTech Company \n
\n\n \n\n
Welcome to Tech Company \n
\n At Tech Company, we are dedicated to providing the best technology solutions for our clients. Our team of experts is committed to delivering high-quality, reliable, and innovative products and services. We are passionate about technology and are always looking for new ways to improve and innovate. Whether you're a small business or a large corporation, we're here to help you achieve your technology goals.\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Our Educational Institution \n \n \n\n\n \n \n
\n
At our educational institution, we are committed to providing a high-quality education that prepares students for success in their chosen fields. Our experienced faculty, state-of-the-art facilities, and comprehensive curriculum ensure that our students are well-equipped to excel in their studies and beyond. \n
Our mission is to provide a supportive and enriching learning environment where students can thrive academically and personally. We believe in the power of education to shape the future and are dedicated to helping our students achieve their goals.
\n
\n \n
\n\n\n```", "label": "\n \n\n\n
\n
\n
Welcome to Our Educational Institution \n
\n At our educational institution, we are committed to providing a high-quality education that prepares students for success in their chosen fields. Our experienced faculty, state-of-the-art facilities, and comprehensive curriculum ensure that our students are well-equipped to excel in their studies and beyond.\n
\n
\n Learn More\n \n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholder images and links where necessary.\n\n```html\n\n\n\n \n \n Travel Agency \n \n \n\n\n \n \n
\n
Explore the World \n
We offer a wide range of travel packages to destinations all over the world. Whether you're looking for a relaxing beach vacation, an adventurous trek, or a cultural experience, we've got you covered.
\n
\n
\n
\n\n\n```\n\nThis code will create a web page with a header, content area, and a call-to-action (CTA) button. The layout and styling are designed to match the original screenshot.", "label": "\n \n\n\n
\n
\n
Travel Agency \n
\n Welcome to our travel agency, where we specialize in creating unforgettable travel experiences. We offer a wide range of travel packages to destinations all over the world. Whether you're looking for a relaxing beach vacation, an adventurous trek, or a cultural experience, we've got you covered.\n
\n
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Welcome to Our Real Estate Agency \n \n \n\n\n \n \n
\n
\n
At our real estate agency, we are committed to providing our clients with the highest level of service and expertise. We understand that buying or selling a home is a big decision, and we are here to help you every step of the way. Whether you're looking for a home, selling your current home, or investing in real estate, we have the resources and expertise to help you.
\n
\n
\n
Our Team of Experienced Real Estate Professionals \n
Our team of experienced real estate professionals is dedicated to providing you with comprehensive market analysis, property research, and expert advice. We understand that buying or selling a home can be a complex process, which is why we offer a wide range of services to meet your unique needs.
\n
We are committed to providing you with the best possible service and we are always here to help you. Whether you're looking for a home, selling your home, or investing in real estate, we are here to support you.
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n
\n
\n
Welcome to Our Real Estate Agency \n
At our real estate agency, we are committed to providing our clients with the highest level of service and expertise. We understand that buying or selling a home is a big decision, and we are here to help you every step of the way. Whether you're looking for your dream home, selling your current home, or investing in real estate, we have the resources and expertise to help you.
\n
Our team of experienced real estate professionals is dedicated to providing you with comprehensive market analysis, property research, and expert advice. We understand that buying or selling a home can be a complex process, which is why we offer a wide range of services to meet your unique needs.
\n
We are committed to providing you with the best possible service and we are always here to help you. Whether you're looking for a home, selling your home, or investing in real estate, we are here to support you.
\n
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Fashion Brand \n \n \n\n\n \n \n
\n
Our Collections \n
Our collections are designed to cater to a wide range of tastes and preferences, from classic to modern, from casual to formal, and everything in between. We believe that fashion is just as much about what you wear as it is about how you live.
\n
\n
\n
\n
Product 1 \n
Product description 1
\n
\n
\n
\n
Product 2 \n
Product description 2
\n
\n
\n
\n
Product 3 \n
Product description 3
\n
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n
\n
Welcome to our fashion brand, where we believe in the power of style and self-expression. We are a team of passionate individuals who are dedicated to creating unique and timeless pieces that are not only fashionable but also comfortable and wearable. Our collections are designed to cater to a wide range of tastes and preferences, from classic to modern, from casual to formal, and everything in between. We believe that fashion is not just about what you wear, but also about how you live. That's why we offer a wide range of products that are not only stylish but also functional. Whether you're looking for a wardrobe upgrade or simply want to express your individuality, we have something for everyone.
\n
\n
\n
At our brand, we are committed to sustainability and ethical production. We use only the highest quality materials and we work closely with our suppliers to ensure that our products are not only stylish but also ethically sourced. We believe that fashion should not only be about what you wear, but also about how you live. That's why we offer a wide range of products that are not only stylish but also functional. Whether you're looking for a wardrobe upgrade or simply want to express your individuality, we have something for everyone.
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Online Education Platform \n \n \n\n\n \n \n
\n
About Us \n
Our online education platform is designed to provide flexible, interactive, and engaging learning experiences. We offer a wide range of courses in various subjects, from science and art to technology and business, and everything in between. Our courses are taught by experienced professionals in their respective fields, and they are designed to be accessible to all, regardless of your previous knowledge or experience.
\n
\n
\n
Our Courses \n
We offer a variety of courses to suit your learning needs. Whether you're just starting out or looking to advance your skills, we have something for everyone. Our courses are designed to be interactive and engaging, with a mix of video lectures, quizzes, and practical exercises.
\n
\n
\n
Why Choose Us? \n
Our online education platform is committed to providing high-quality education to everyone, anywhere, at any time. We believe in the power of education to change lives, and we're committed to making that a reality.
\n
\n
\n\n\n```", "label": "\n \n\n\n\nAbout Us \nOur online education platform is designed to provide flexible, interactive, and engaging learning experiences. We offer a wide range of courses in various subjects, from science to art, from technology to business, and everything in between. Our courses are taught by experienced professionals in their respective fields, and they are designed to be accessible to all, regardless of your previous knowledge or experience.
\n \n\nOur Courses \nWe offer a variety of courses to suit your learning needs. Whether you're just starting out or looking to advance your skills, we have something for everyone. Our courses are designed to be interactive and engaging, with a mix of video lectures, quizzes, and practical exercises.
\n \n\nWhy Choose Us? \nOur online education platform is committed to providing high-quality education to everyone, anywhere, at any time. We believe in the power of education to change lives, and we're committed to making that a reality.
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholders for images and links where necessary.\n\n```html\n\n\n\n \n \n Tech Company \n \n \n\n\n \n \n \n
Welcome to Tech Company \n
At Tech Company, we are dedicated to providing the best technology solutions for our clients. Our team of experts is committed to delivering high-quality, reliable, and innovative products and services. We are passionate about technology and are always looking for new ways to improve and innovate.
\n
\n \n \n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. You can replace the `placeholder.jpg` with your actual image URL and adjust the links as needed.", "label": "\n \n\n\n\nTech Company \n
\n\n \n\nWelcome to Tech Company \n\n At Tech Company, we are dedicated to providing the best technology solutions for our clients. Our team of experts is committed to delivering high-quality, reliable, and innovative products and services. We are passionate about technology and are always looking for new ways to improve and innovate.\n
\n\n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Art Gallery \n \n \n\n\n \n \n
\n
\n
Card 1 \n
Explore the world of art with our collection of stunning pieces. From paintings to sculptures, we have something for everyone.
\n
\n
\n
\n
Card 2 \n
Discover the beauty of nature through our collection of nature-inspired artwork. Each piece is a work of art in itself.
\n
\n
\n
\n
Card 3 \n
Explore the power of abstract art with our collection of abstract pieces. Each piece is a unique expression of the artist's soul.
\n
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n
\n
Art Gallery \n
\n Welcome to our Art Gallery, where we showcase a wide variety of artwork pieces from various artists. Our mission is to provide a platform for artists to showcase their work and for art enthusiasts to discover new and exciting pieces. We believe in the power of art to inspire, educate, and bring people together.\n
\n
\n\n Explore Artwork\n \n
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "To create an HTML and Tailwind CSS version of the web page that matches the original, you'll need to ensure that the layout, text, and images are styled according to the provided screenshot. Below is a basic implementation using Tailwind CSS. Note that Tailwind CSS is a utility-first CSS framework, so you'll need to include it in your project to use its utility classes.\n\n### HTML and Tailwind CSS Implementation\n\n```html\n\n\n\n \n \n Our Tech Company \n \n\n\n \n
\n
\n
\n
\n
\n
Welcome to Our Tech Company \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
\n
\n
About \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
Services \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
\n
\n
Contact \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
Phone \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
\n
\n
Email \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
Address \n
\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n
\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n
\n
\n
\n\n\n```\n\n### Explanation:\n\n1. **Container and Padding**: The `container` class from Tailwind CSS is used to create a responsive container with padding.\n2. **Images**: The company logo is placed in the left column using an `img` tag.\n3. **Text Sections**: Each section (About, Services, Contact, Phone, Email, Address) is styled using Tailwind CSS classes to ensure consistency and responsiveness.\n4. **Flexbox Layout**: The layout uses Flexbox to arrange the content in a grid format, making it responsive and easy to read on different screen sizes.\n\nThis implementation should closely match the appearance of the original web page as shown in the screenshot.", "label": "\n \n\n\n \n \n\n\n \n\n\nWelcome to Our Tech Company \n\n At our tech company, we are dedicated to providing innovative and reliable technology solutions to meet your business needs. Our team of experts is committed to delivering high-quality products and services that exceed your expectations.\n
\n\n We specialize in a wide range of services, including software development, IT consulting, and cybersecurity. Whether you're a small business or a large corporation, we have the expertise to help you achieve your goals.\n
\n \n\nContact Us \n\n Phone: 123-456-7890\n
\n\n Email: info@techcompany.com\n
\n\n Address: 123 Tech Street, City, State, Zip\n
\n \n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. Note that I've used placeholder images and links for the elements that are not specified in the original screenshot.\n\n```html\n\n\n\n \n \n Travel Agency \n \n \n\n\n \n \n
\n
Welcome to our travel agency \n
We are a professional travel agency that offers a wide range of services to make your travel experience unforgettable.
\n
Discover the world with us and create memories that last a lifetime.
\n
\n
\n
\n\n\n```\n\nThis code will create a web page with a header, content area, and a call-to-action (CTA) button. The elements are styled using Tailwind CSS, and the overall layout is designed to match the original screenshot.", "label": "\n \n\n\n
\n
\n
\n
\n
Travel Agency \n
\n
\nBook Now \n
\n
\n
\n
\n
\n
Welcome to our travel agency \n
We are a professional travel agency that offers a wide range of services to make your travel experience unforgettable.
\n
Learn More \n
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholders for images and links where necessary.\n\n```html\n\n\n\n \n \n Non-Profit Organization \n \n \n\n\n \n \n
\n
\n
Success Stories \n
We have been successful in raising funds for numerous charities and causes. We are proud to have helped so many people in need.
\n
\n
\n
Volunteer Opportunities \n
We are always looking for volunteers to help us with our mission. If you are interested in volunteering, please contact us.
\n
\n
\n
Donation \n
If you would like to donate, please use the form below.
\n
\n
\n
Contact Information \n
Phone: 123-456-7890 Email: info@nonprofit.org
\n
\n
\n
Volunteer Sign-up \n
If you are interested in volunteering, please sign up below.
\n
\n
\n \n
\n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. You can replace the placeholder text with your actual content.", "label": "\n \n\n\n
\n
\n
\n
Non-Profit Organization \n
Mission \n
\n Our mission is to provide a platform for individuals to donate to causes they care about. We believe in the power of community and the importance of giving back.\n
\n
Success Stories \n
\n We have been successful in raising funds for numerous charities and causes. We are proud to have helped so many people in need.\n
\n
Volunteer Opportunities \n
\n We are always looking for volunteers to help us with our mission. If you are interested in volunteering, please contact us.\n
\n
\n
\n
\n
Donation \n
\n If you would like to donate, please use the form below.\n
\n\n
\n
\n
Contact Information \n
\n Phone: 123-456-7890\n Email: info@nonprofit.org\n
\n
\n
\n
Volunteer Sign-up \n
\n If you are interested in volunteering, please sign up below.\n
\n\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Healthcare Provider \n \n \n\n\n \n \n
\n
Our Services \n
We offer a wide range of services including physical therapy, occupational therapy, speech therapy, and more.
\n
\n
\n
About Us \n
We are a team of experienced healthcare professionals dedicated to providing the best care possible.
\n
\n \n
\n\n\n```", "label": "\n \n\n\n\nOur Services \nWe offer a wide range of services including physical therapy, occupational therapy, speech therapy, and more.
\n \n\nAbout Us \nWe are a team of experienced healthcare professionals dedicated to providing the best care possible.
\n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "To create an HTML and Tailwind CSS version of the web page, we need to ensure that the layout, size, text, and color of all elements match the original screenshot. Here's a step-by-step guide to achieve this:\n\n1. **HTML Structure**: We'll start by creating the basic HTML structure.\n2. **Tailwind CSS Classes**: We'll use Tailwind CSS classes to style the elements according to the screenshot.\n\nHere's the complete code:\n\n```html\n\n\n\n \n \n Web Page \n \n\n\n \n
\n
\n
Company Name \n
Categories
\n
Category 1
\n
Category 2
\n
Category 3
\n
\n
Product Image
\n
\n
\n
\n
Product Image \n
Categories
\n
Category 1
\n
Category 2
\n
Category 3
\n
\n
Product Image
\n
\n
\n
\n
\n
Customer Reviews \n
This is a great product. Highly recommend it!
\n
John Doe
\n
\n
\n\n\n```\n\n### Explanation:\n\n1. **HTML Structure**:\n - The `container` class from Tailwind CSS is used to center the content and add some padding.\n - The `flex` and `flex-col` classes are used to create a flexible layout with two columns.\n - The `md:w-1/2` class is used to make the columns responsive, so they take up half the width on medium and larger screens.\n\n2. **Tailwind CSS Classes**:\n - `text-2xl` and `font-bold` are used to style the headings.\n - `mb-4` is used to add margin at the bottom of each paragraph.\n - `w-full` and `h-48` are used to set the width and height of the images.\n - `object-cover` is used to cover the image with the background.\n\nThis code should create a web page that looks exactly like the original screenshot, with the same layout, size, text, and color of all elements.", "label": "\n \n\n\n\n
\n
Company Name \n
\n\n \n
\n \n\n\n
Categories \n
\nCategory 1 \nCategory 2 \nCategory 3 \n \n
\n\n
Product Image \n
\n
\n\n
Customer Reviews \n
\n
\"This is a great product. I highly recommend it.\"
\n
- John Doe
\n
\n
\n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot.\n\n```html\n\n\n\n \n \n Food Delivery Service \n \n \n\n\n \n Welcome to Our Food Delivery Service \n At our food delivery service, we are committed to providing you with the best service and quality food. We use only the freshest ingredients and our experienced chefs to prepare your meals. Our delivery service is fast, reliable, and convenient. We are proud to offer a wide variety of dishes from classic comfort food to unique and exotic dishes. We also offer a wide range of promotions and discounts to make your meal experience even more enjoyable.
\n \n \n Featured Dishes \n Customer Reviews \n \n \n \n Featured Dishes \n Discover our delicious and diverse menu of dishes. From classic comfort food to unique and exotic dishes, we have something for everyone.
\n \n \n \n Customer Reviews \n Read what our satisfied customers have to say about our food delivery service.
\n \n \n \n \n\n\n```\n\nThis code uses Tailwind CSS for styling and ensures that the layout and elements match the original screenshot. The `placeholder.jpg` images are placeholders and should be replaced with actual images.", "label": "\n \n\n\n
Welcome to Our Food Delivery Service \n
\n At our food delivery service, we are committed to providing you with the best service and quality food. We use only the freshest ingredients and our experienced chefs to prepare your meals. Our delivery service is fast, reliable, and convenient. We are proud to offer a wide variety of dishes, from classic comfort food to unique and exotic dishes. We also offer a wide range of promotions and discounts to make your meal experience even more enjoyable.\n
\n
\n
\n
Featured Dishes \n\n\n
\n
Customer Reviews \n\n\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "To create an HTML and Tailwind CSS version of the web page that matches the original screenshot, you can use the following code. This code includes the necessary HTML structure, Tailwind CSS classes, and placeholder images and links.\n\n```html\n\n\n\n \n \n Company Website \n \n\n\n \n
\n
\n
Welcome to Our Company \n
We offer a wide range of high-quality products and services to our customers.
\n
\n
\n
\n
Our Services \n
We provide top-notch services to our customers. Learn more about our company and our mission.
\n
\n
\n\n\n```\n\n### Explanation:\n1. **HTML Structure**: The HTML structure is basic with a `div` container for the main content.\n2. **Tailwind CSS**: The `tailwind.min.css` link is included to apply Tailwind CSS classes.\n3. **Background Color**: The `bg-gray-100` class is used to set the background color to light gray.\n4. **Flexbox Layout**: The `flex` and `items-center` classes are used to center the content both horizontally and vertically.\n5. **Images and Text**: Placeholder images and text are used to match the original screenshot. You can replace `placeholder.jpg` with actual image URLs if available.\n6. **Text Alignment**: The `text-center` class is used to center the text within the container.\n\nThis code should create a web page that looks similar to the original screenshot, with the specified layout, text, and images.", "label": "\n \n\n\n \n \n\n\nHome \nProducts \nServices \nAbout Us \n \n \n\n\n
\n
Products \n
We offer a wide range of high-quality products.
\n
\n
\n
Services \n
We provide top-notch services to our customers.
\n
\n
\n
About Us \n
Learn more about our company and our mission.
\n
\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Pharmaceutical Company \n \n \n\n\n \n Pharmaceutical Company \n \n \n
\n About Us \n We are a leading pharmaceutical company dedicated to providing high-quality, safe, and effective medicines to our patients. Our mission is to improve the health and well-being of our communities through our innovative medicines.
\n \n
\n Our Services \n We offer a wide range of services including prescription drugs, over-the-counter medicines, and health and wellness products. We also provide expert advice and consultation on a range of health issues.
\n \n
\n Request a Consultation \n If you're not sure what you need, or if you're not sure how to take your medicines, we're here to help. Contact us today to schedule a consultation.
\n \n
\n \n\n\n```", "label": "\n \n\n\nPharmaceutical Company \n \n\n\nAbout Us \n\n We are a leading pharmaceutical company dedicated to providing high-quality, safe, and effective medicines to our patients. Our mission is to improve the health and well-being of our communities through our innovative medicines.\n
\n \n\nOur Services \n\n We offer a wide range of services including prescription drugs, over-the-counter medicines, and health and wellness products. We also provide expert advice and consultation on a range of health issues.\n
\n \n\nRequest a Consultation \n\n If you're not sure what you need, or if you're not sure how to take your medicines, we're here to help. Contact us today to schedule a consultation.\n
\nRequest a Consultation \n \n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "To create an HTML and Tailwind CSS version of the web page, we need to ensure that the layout, text, and colors match the original screenshot. Here's a step-by-step guide to achieve this:\n\n1. **HTML Structure**: We'll start with the basic HTML structure.\n2. **Tailwind CSS**: We'll use Tailwind CSS classes to style the elements.\n\nHere's the complete code:\n\n```html\n\n\n\n \n \n Home Decor Store \n \n\n\n \n\n \n \n
\n
Welcome to Home Decor Store \n
\n At Home Decor Store, we offer a wide range of high-quality home decor items to make your living space more beautiful and functional. From stylish lamps and cozy throws to elegant furniture and unique accessories, we have something for everyone.\n
\n
\n Shop Now\n \n
\n
\n
Featured Products \n
\n
\n
\n
\n
Lampshade \n
Elegant and stylish, our lampshades are perfect for adding a touch of sophistication to any room.
\n
Price: $50
\n
\n Buy Now\n \n
\n
\n
\n
\n
\n
Throw Blanket \n
Soft and cozy, our throw blankets are perfect for adding warmth and comfort to any space.
\n
Price: $30
\n
\n Buy Now\n \n
\n
\n
\n
\n
\n \n\n \n\n\n```\n\n### Explanation:\n1. **Header**: The header section includes navigation links for \"Home,\" \"Inspiration,\" \"Products,\" and \"About Us.\" The logo is placed on the left, and the navigation links are on the right.\n2. **Main Content**: The main content section includes a welcome message, a featured product section, and a \"Shop Now\" button.\n3. **Footer**: The footer section is similar to the header, with navigation links for \"Home,\" \"Inspiration,\" \"Products,\" and \"About Us.\"\n\nThis code uses Tailwind CSS classes to style the elements, ensuring that the layout and design match the original screenshot.", "label": "\n \n\n\n \n\n\n \n \n\n\n\n\n\n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "To create an HTML and Tailwind CSS version of the web page that matches the original screenshot, you can use the following code. This code includes the necessary HTML structure, Tailwind CSS classes, and placeholder images and links.\n\n```html\n\n\n\n \n \n Welcome to our Tech Company \n \n \n\n\n \n \n
\n
Our Services \n
Our team of experts is dedicated to delivering the best possible service, ensuring your business runs smoothly and efficiently.
\n
\n \n
\n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. The `header` and `content` sections are styled to match the original screenshot, and the `footer` section is also styled to match the original layout.", "label": "\n \n\n\n\nWelcome to our Tech Company \n\n At our tech company, we are committed to providing innovative and efficient solutions for your business needs. Our team of experts is dedicated to delivering the best possible service, ensuring your business runs smoothly and efficiently.\n
\nLearn More \n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n My Fashion Brand \n \n \n\n\n \n \n
\n
\n
\n
Product 1 \n
Product 1 description goes here.
\n
\n
\n
\n
Product 2 \n
Product 2 description goes here.
\n
\n
\n
\n
Product 3 \n
Product 3 description goes here.
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n\n\n \n\n\n Welcome to our fashion brand, where we believe in the power of style to elevate your everyday. Our mission is to create timeless pieces that are not just fashionable, but also comfortable, durable, and made with the finest materials. We are committed to providing our customers with high-quality, stylish clothing that is not only fashionable but also functional. We believe that fashion should not only be about what you wear, but also about how you live. That's why we offer a wide range of products that cater to all tastes and styles. From casual wear to formal attire, our collection is designed to make you feel confident and beautiful. We are constantly updating our collection with the latest trends and styles, ensuring that you always have something new to wear. So, come and explore our fashion brand and discover the perfect pieces that will elevate your style and your everyday.\n
\n \n\n\n
\n
Contact Us
\n
123 Street, City, Country
\n
Phone: 123-456-7890
\n
Email: info@fashionbrand.com
\n
\n
\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Travel Agency \n \n \n\n\n \n Welcome to Educational Institution \n At Educational Institution, we are committed to providing the best education possible. Our experienced and dedicated staff are here to help you succeed in your studies. We offer a wide range of programs, from online courses to in-person classes. Our flexible schedules and affordable tuition make us a great choice for students of all ages and backgrounds.
\n \n \n Home \n About \n Contact \n \n \n We believe in the power of education to change lives. That's why we are dedicated to providing a supportive and inclusive learning environment. Our programs are designed to help you reach your full potential, no matter where you are in your education journey.
\n \n \n\n\n```", "label": "\n \n\n\n\nEducational Institution \n
\n\n \n\n
Welcome to Educational Institution \n
\n At Educational Institution, we are committed to providing the best education possible. Our experienced and dedicated staff are here to help you succeed in your studies. We offer a wide range of programs, from online courses to in-person classes. Our flexible schedules and affordable tuition make us a great choice for students of all ages and backgrounds.\n
\n
\n We believe in the power of education to change lives. That's why we are dedicated to providing a supportive and inclusive learning environment. Our programs are designed to help you reach your full potential, no matter where you are in your education journey.\n
\n
\n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Gaming Company \n \n \n\n\n \n \n
\n
Our Games \n
At Gaming Company, we are constantly working on new and exciting games. From fast-paced action to relaxing puzzle games, we have something for everyone. Check out our latest releases and stay tuned for future updates.
\n
\n
\n
Our Games \n
At Gaming Company, we are constantly working on new and exciting games. From fast-paced action to relaxing puzzle games, we have something for everyone. Check out our latest releases and stay tuned for future updates.
\n
\n \n \n
\n\n\n```", "label": "\n \n\n\n\n\nWelcome to Gaming Company \n\n We are a leading provider of innovative and exciting gaming experiences. Our mission is to bring the joy of gaming to everyone, no matter where they are. We believe in the power of games to inspire, engage, and connect people from all walks of life.\n
\n \n\nOur Games \n\n At Gaming Company, we are constantly working on new and exciting games. From fast-paced action to relaxing puzzle games, we have something for everyone. Check out our latest releases and stay tuned for future updates.\n
\n \n\nJoin Our Community \n\n We believe in the power of community. That's why we have a vibrant and active community of gamers who are always ready to share their experiences, ask questions, and help each other out. Join us today and become a part of our community.\n
\n \n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "```html\n\n\n\n \n \n Financial Advisory Firm \n \n \n\n\n \n \n
\n
\n
Personal Financial Planning \n
Personal Financial Planning is a comprehensive approach to managing your finances. We help you create a financial plan that aligns with your goals and objectives.
\n
\n
\n
Investment Planning \n
Investment Planning involves selecting and managing investments to achieve your financial goals. We help you make informed investment decisions.
\n
\n
\n
Retirement Planning \n
Retirement Planning is the process of planning for your retirement. We help you create a retirement plan that ensures you have a comfortable retirement.
\n
\n
\n
Tax Planning \n
Tax Planning involves managing your taxes to minimize your tax liability. We help you understand your tax obligations and make informed tax decisions.
\n
\n
\n \n
\n\n\n```", "label": "\n \n\n\n
\n
\n
\n
\n
\n
Welcome to Our Financial Advisory Firm \n
\n At our firm, we are committed to providing our clients with the best possible financial advice. We understand that financial planning is a complex process that requires a deep understanding of your individual needs and circumstances. That's why we offer a wide range of services to help you achieve your financial goals.\n
\n
\n
\n
\n\n
\n
\n
Our Services \n
\nPersonal Financial Planning \nInvestment Planning \nRetirement Planning \nTax Planning \n \n
\n
\n
\n
\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "\n \n\n \n
\n
\n Travel Destinations\n
\n
\n Welcome to our Travel Agency\n
\n
\n We are a travel agency that offers a wide range of services to make your travel experience unforgettable.\n
\n
\n
\n
\n
\n
\n
\n
Popular Destinations \n
\n Discover the best destinations in the world. From bustling cities to serene beaches, our agency has you covered.\n
\n
\n
\n
\n
\n
\n
\n
\n
Beach Destinations \n
\n Escape to the sun and sand with our beach destinations. From tropical islands to coastal towns, we have you covered.\n
\n
\n
\n
\n
\n
\n\n", "label": "\n \n\n\n
\n
\n
Welcome to our Travel Agency \n
We are a travel agency that offers a wide range of services to make your travel experience unforgettable.
\n
\n
\n
\n\n
Popular Destinations \n
\n
\n
\n
\n
Beach Destinations \n
Discover the best beaches in the world.
\n
\n
\n\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot.\n\n```html\n\n\n\n \n \n Travel Agency \n \n \n\n\n \n Welcome to our Travel Agency \n We are a dynamic travel agency that offers a wide range of travel destinations. Whether you're looking for a relaxing beach vacation, an adventurous mountain trek, or a cultural city break, we've got you covered.
\n \n \n Home \n Destinations \n About Us \n Contact Us \n \n \n Explore Destinations \n Discover the world with us. From exotic beaches to breathtaking mountains, we have something for everyone.
\n \n \n\n\n```\n\nThis code uses Tailwind CSS for styling and ensures that the layout and design closely match the original screenshot. The `header`, `nav`, `main`, and `footer` sections are styled to match the original layout and content.", "label": "\n \n\n\n\n\nWelcome to our Travel Agency \nWe are a dynamic travel agency that offers a wide range of travel destinations. Whether you're looking for a relaxing beach vacation, an adventurous mountain trek, or a cultural city break, we've got you covered.
\nExplore Destinations \n \n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholders for images and links where necessary.\n\n```html\n\n\n\n \n \n Marketing Agency \n \n \n\n\n \n
\n
\n
\n
\n
Our Services \n
We are a marketing firm that specializes in providing top-notch services to businesses of all sizes.
\n
\n
\n
\n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. You can replace the placeholder images and links with your actual content.", "label": "\n \n\n\n\n\nWe are a marketing firm that specializes in providing top-notch services to businesses of all sizes. \n\n Get Started\n \n \n \n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I'll use placeholder images and links where necessary.\n\n```html\n\n\n\n \n \n Real Estate Company \n \n \n\n\n \n\n \n \n Featured Listings \n \n \n\n \n Search Properties \n \n \n\n \n Featured Listings \n \n \n\n \n\n \n \n\n \n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. You can replace the placeholder text and images with your actual content.", "label": "\n \n\n\n\n\n\n\n\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. Note that I've used placeholder images and links for the images and links, as you requested.\n\n```html\n\n\n\n \n \n Education Platform \n \n \n\n\n \n \n
\n
\n
Featured Courses \n
We offer a wide range of courses to help you reach your educational goals.
\n
View Courses \n
\n
\n
Course Categories \n
Explore our diverse course categories to find the one that suits you best.
\n
View Categories \n
\n
\n
Pricing Information \n
Discover our flexible pricing options to fit your budget.
\n
View Pricing \n
\n
\n
\n\n\n```\n\nThis code will create a web page with a header, a content section with three sections (Featured Courses, Course Categories, and Pricing Information), and links to view each section. The background colors, padding, and other styles are tailored to match the original screenshot.", "label": "\n \n\n\n\n
\n
Welcome to Our Education Platform \n
We offer a wide range of courses to help you reach your educational goals.
\n
Enroll Now \n
\n
\n \n\n
\n
\n
Featured Courses \n\n\n
\n
Course Categories \n\nPricing Information \n\n\n
\n
\n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. I've included placeholders for images and links as per your instructions.\n\n```html\n\n\n\n \n \n Non-profit Organization \n \n \n\n\n \n \n
Your tagline here
\n
\n
Story Title 1
\n
Your story content goes here
\n
\n
\n
Our Impact
\n
\n
Story Title 2
\n
Your story content goes here
\n
\n
\n
Story Title 3
\n
Your story content goes here
\n
\n
\n
\n\n\n```\n\nThis code will create a web page with the same layout and styling as the original screenshot. The placeholders for images and links are included for demonstration purposes.", "label": "\n \n\n\nNon-profit Organization \nYour tagline goes here
\nDonate or Get Involved \n \n\nOur Impact \n\n
Story Title 1 \n
Your story content goes here
\n
\n\n
Story Title 2 \n
Your story content goes here
\n
\n\n
Story Title 3 \n
Your story content goes here
\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Sure! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot.\n\n```html\n\n\n\n \n \n Tech Startup \n \n \n\n\n \n \n Home \n About \n Contact \n \n Welcome to Tech Startup \n At Tech Startup, we are passionate about creating innovative and playful designs. Our team of experts is dedicated to delivering the best possible user experience. We believe in the power of design to bring joy and excitement to our users.
\n \n\n \n Tech Startup \n We are committed to using our skills and knowledge to create a better future for all.
\n Learn More \n \n\n \n\n\n```\n\nThis code uses Tailwind CSS for styling and includes placeholders for images and links. You can replace the placeholder images and links with your actual content.", "label": "\n \n\n\n\nTech Startup \n
\n\n\n \n\nWelcome to Tech Startup \nAt Tech Startup, we are passionate about creating innovative and playful designs. Our team of experts is dedicated to delivering the best possible user experience. We believe in the power of design to bring joy and excitement to our users.
\nOur mission is to make the world a better place, one design at a time. We are committed to using our skills and knowledge to create a better future for all.
\n \n\n\n
\n
Tech Startup
\n
123 Main St, Anytown, USA
\n
(123) 456-7890
\n
info@techstartup.com
\n
\n
\n
Follow us on:
\n
Facebook
\n
Twitter
\n
Instagram
\n
\n
\n \n\n\n"}
+{"prompt": "system\nYou are a helpful assistant.\nuser\nHere is a screenshot of a web page. Please write a HTML and Tailwind CSS to make it look exactly like the original web page. Pay attention to things like size, text, position, and color of all the elements, as well as the overall layout. If it involves any images, use 'placeholder.jpg' as the placeholder. If it involves any links, use '#' as the placeholder.\nassistant\n", "predict": "Certainly! Below is the HTML and Tailwind CSS code to create a web page that looks exactly like the original screenshot. Note that I've used placeholder images and links for the elements that are not directly provided in the screenshot.\n\n```html\n\n\n\n \n \n Our Educational Institution \n \n \n\n\n \n
\n
Welcome to Our Educational Institution \n
We provide top-notch education with a modern, clean design.
\n
\n