artpli committed
Commit ab91129 · verified · 1 Parent(s): 5bda05b

Update README.md

Files changed (1)
  1. README.md +59 -39
README.md CHANGED
@@ -12,16 +12,35 @@ tags:
  - text-to-motion
  ---

- # FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
-
  <div align="center">
- <img src="./assets/hi_logo.png" alt="FRoM-W1" width="7.5%">

- The **H**umanoid **I**ntelligence Team from FudanNLP and OpenMOSS
- </div>

- <div align="center">
- <a href="https://github.com/OpenMOSS/FRoM-W1">💻Github</a>&emsp;<a href="https://arxiv.org/abs/2601.12799">📄Paper</a>&emsp;<a href="https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets">🤗Datasets</a>&emsp;<a href="https://huggingface.co/OpenMOSS-Team/FRoM-W1">🤗Models</a>
  </div>

  ## Introduction
@@ -29,67 +48,68 @@ tags:
  <img src="./assets/FRoM-W1-Teaser.png" alt="FRoM-W1" width="50%">
  </div>

- Humanoid robots are capable of performing various actions such as greeting, dancing, and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility.
- In this work, we present **FRoM-W1[^1]**, an open-source framework designed to achieve general humanoid whole-body motion control using natural language.
- To universally understand natural language and generate corresponding motions, as well as enable various humanoid robots to stably execute these motions in the physical world under gravity, **FRoM-W1** operates in two stages:
- (a) **H-GPT**: Utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors.
- We further leverage the Chain-of-Thought technique to improve the model’s generalization in instruction understanding.
- (b) **H-ACT**: After retargeting generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and further fine-tuned through reinforcement learning in physical simulation enables humanoid robots to accurately and stably perform corresponding actions.
- It is then deployed on real robots via a modular simulation-to-reality module.
- We extensively evaluate our framework on the Unitree H1 and G1 robots, demonstrating successful language-to-motion generation and stable execution in both simulation and real-world settings.
- We fully open-source the entire **FRoM-W1** framework and hope it will advance the development of humanoid intelligence.

  [^1]: **F**oundational Humanoid **Ro**bot **M**odel - **W**hole-Body Control, Version **1**

  ## Release Timeline
- We will gradually release the paper, data, codebase, model checkpoints, and the real-robot deployment framework for **FRoM-W1** in the next week or two.

  Here is the current release progress:
- - [**2026/01/21**] 🎉🎉 We have released the [Technical Report](https://arxiv.org/abs/2601.12799) of FRoM-W1!
- - [**2025/12/18**] We have released the CoT data of Motion-X on **[HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets)**.
- - [**2025/12/17**] We have released the **perturbed text data** (i.e., **δ-Humanml3d-X**) on **[HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets)**.
  - [**2025/12/17**] We have released the code to train and evaluate other baselines: [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [MLD](https://github.com/ChenFengYe/motion-latent-diffusion), and [MotionDiffuse](https://github.com/mingyuan-zhang/MotionDiffuse) on HumanML3D-X at [`baselines`](./baselines).
  - [**2025/12/16**] We have released the code to train and evaluate the baseline [T2M-GPT](https://github.com/Mael-zys/T2M-GPT) on HumanML3D-X at [`baselines/T2M-GPT`](./baselines/T2M-GPT).
- - [**2025/12/14**] We have released the **CoT data** of HumanML3D-X on **[HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets)**.
- - [**2025/12/13**] We have uploaded the checkpoints for HGPT, Baselines (SMPL-X version of T2M, MotionDiffuse, MLD, T2M-GPT), and the SMPL-X Motion Generation eval model on **[HuggingFace Models](https://huggingface.co/OpenMOSS-Team/FRoM-W1)**.
- - [**2025/12/10**] We have uploaded the initial version of the code for two core modules, **[H-GPT](./H-GPT/README.md)** and **[H-ACT](./H-ACT/README.md)** !
- - [**2025/12/10**] We have released our lightweight, modular humanoid-robot deployment framework [**RoboJuDo**](https://github.com/HansZ8/RoboJuDo)!
  - [**2025/12/10**] We are thrilled to initiate the release of **FRoM-W1**!

  ## Usage
  <div align="center">
  <img src="./assets/FRoM-W1-Overview.png" alt="overview" width="80%">
  </div>

- The complete **FRoM-W1** workflow is illustrated above:
-
- - **H-GPT**
- Deploy **H-GPT** via command-line tools or a web interface to convert natural-language commands into human motion representations.
- This module provides full training, inference, and evaluation code, and pretrained models are available on HuggingFace.

  <div align="center">
  <img src="./assets/FRoM-W1-HGPT.png" alt="fromw1-hgpt" width="80%">
  </div>

- - **H-ACT**
- **H-ACT** converts the motion representations from H-GPT into SMPL-X motion sequences and further retargets them to various humanoid robots.
- The resulting motions can be used both for training control policies and executing actions on real robots using our deployment pipeline.

  <div align="center">
  <img src="./assets/FRoM-W1-HACT.png" alt="fromw1-hact" width="80%">
  </div>

  ## Citation
- If you find our work useful, please cite it for now in the following way:
  ```bibtex
- @misc{FRoM-W1,
- author = {Peng Li, Zihan Zhuang, Yangfan Gao, Yi Dong, Sixian Li, Changhao Jiang, Shihan Dou, Zhiheng Xi, Enyu Zhou, Jixuan Huang, Hui Li, Jingjing Gong, Xingjun Ma, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Xipeng Qiu},
- title = {FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions},
- url = {https://github.com/OpenMOSS/FRoM-W1},
- year = {2025}
  }
  ```
- Welcome to star⭐ our GitHub Repo, raise issues, and submit PRs!
 
  - text-to-motion
  ---

  <div align="center">

+ # FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
+
+ <img src="./assets/hi_logo.jpg" alt="FRoM-W1" width="7.5%">
+
+ The Humanoid Intelligence Team from FudanNLP and OpenMOSS
+
+ <p align="center">
+ <a href="https://openmoss.github.io/FRoM-W1/">
+ <img src="https://img.shields.io/badge/Project-Webpage-blue.svg" alt="Project Webpage"/>
+ </a>
+ <a href="https://arxiv.org/abs/2601.12799">
+ <img src="https://img.shields.io/badge/arXiv-2601.12799-b31b1b.svg" alt="Paper on arXiv"/>
+ </a>
+ <a href="https://github.com/OpenMOSS/FRoM-W1">
+ <img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub Code"/>
+ </a>
+ <a href="https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets">
+ <img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Data-yellow.svg" alt="Hugging Face Data"/>
+ </a>
+ <a href="https://huggingface.co/OpenMOSS-Team/FRoM-W1">
+ <img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow.svg" alt="Hugging Face Model"/>
+ </a>
+ <a href="LICENSE">
+ <img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License"/>
+ </a>
+ </p>

  </div>

  ## Introduction

  <img src="./assets/FRoM-W1-Teaser.png" alt="FRoM-W1" width="50%">
  </div>
 
+ Humanoid robots are capable of performing various actions such as greeting, dancing, and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility. In this work, we present **FRoM-W1**[^1], an open-source framework designed to achieve general humanoid whole-body motion control using natural language.

+ To universally understand natural language and generate corresponding motions, as well as to enable various humanoid robots to stably execute these motions in the physical world under gravity, **FRoM-W1** operates in two stages:
+
+ **(a) H-GPT**
+ Utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors. We further leverage the Chain-of-Thought technique to improve the model's generalization in instruction understanding.
+
+ **(b) H-ACT**
+ After retargeting generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and further fine-tuned through reinforcement learning in physical simulation enables humanoid robots to accurately and stably perform the corresponding actions. It is then deployed on real robots via a modular simulation-to-reality module.
+
+ We extensively evaluate **FRoM-W1** on the Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation, and the reinforcement learning fine-tuning we introduce consistently improves both motion-tracking accuracy and task success rates on these humanoid robots. We open-source the entire **FRoM-W1** framework and hope it will advance the development of humanoid intelligence.
 
  [^1]: **F**oundational Humanoid **Ro**bot **M**odel - **W**hole-Body Control, Version **1**

  ## Release Timeline
+ We will gradually release the paper, data, codebase, model checkpoints, and the real-robot deployment framework for **FRoM-W1**.

  Here is the current release progress:
+ - [**2026/01/21**] 🎉🎉🎉 We have released the **[Technical Report](https://arxiv.org/abs/2601.12799)** of FRoM-W1!
+ - [**2025/12/18**] We have released the CoT data of Motion-X on [HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets) (see the download sketch after this list).
+ - [**2025/12/17**] We have released the perturbed text data, i.e., δHumanML3D-X, on [HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets).
  - [**2025/12/17**] We have released the code to train and evaluate other baselines: [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [MLD](https://github.com/ChenFengYe/motion-latent-diffusion), and [MotionDiffuse](https://github.com/mingyuan-zhang/MotionDiffuse) on HumanML3D-X at [`baselines`](./baselines).
  - [**2025/12/16**] We have released the code to train and evaluate the baseline [T2M-GPT](https://github.com/Mael-zys/T2M-GPT) on HumanML3D-X at [`baselines/T2M-GPT`](./baselines/T2M-GPT).
+ - [**2025/12/14**] We have released the CoT data of HumanML3D-X on [HuggingFace Datasets](https://huggingface.co/datasets/OpenMOSS-Team/FRoM-W1-Datasets).
+ - [**2025/12/13**] We have uploaded the checkpoints for H-GPT, the baselines (SMPL-X versions of T2M, MotionDiffuse, MLD, and T2M-GPT), and the SMPL-X motion generation evaluation model on [HuggingFace Models](https://huggingface.co/OpenMOSS-Team/FRoM-W1).
+ - [**2025/12/10**] We have uploaded the initial version of the code for two core modules, [H-GPT](./H-GPT/README.md) and [H-ACT](./H-ACT/README.md)!
+ - [**2025/12/10**] We have released our lightweight, modular humanoid-robot deployment framework **[RoboJuDo](https://github.com/HansZ8/RoboJuDo)**!
  - [**2025/12/10**] We are thrilled to initiate the release of **FRoM-W1**!
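Both the released data and the checkpoints in this timeline are hosted on the Hugging Face Hub, so they can be fetched with the standard `huggingface_hub` client (`pip install huggingface_hub`). The sketch below is only a convenience, not part of the official instructions: the repo IDs come from the links above, while the local directory names are illustrative and the internal layout of each repo is not described in this README.

```python
# Minimal sketch: pull the FRoM-W1 data and model repos from the Hugging Face Hub.
# Repo IDs come from the links in the timeline above; local paths are illustrative.
from huggingface_hub import snapshot_download

# Datasets repo (CoT data, δHumanML3D-X perturbed texts, ...).
data_dir = snapshot_download(
    repo_id="OpenMOSS-Team/FRoM-W1-Datasets",
    repo_type="dataset",
    local_dir="./data/FRoM-W1-Datasets",
)

# Model repo (H-GPT, SMPL-X baselines, and the evaluation-model checkpoints).
model_dir = snapshot_download(
    repo_id="OpenMOSS-Team/FRoM-W1",
    repo_type="model",
    local_dir="./checkpoints/FRoM-W1",
)

print("data:", data_dir)
print("models:", model_dir)
```

If only a single file is needed, `hf_hub_download` works the same way with an explicit `filename`; which files to grab depends on the repo layout, which is not spelled out here.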

  ## Usage
+ The complete **FRoM-W1** workflow is illustrated below:

  <div align="center">
  <img src="./assets/FRoM-W1-Overview.png" alt="overview" width="80%">
  </div>

+ - **[H-GPT](./H-GPT/README.md)**: Deploy **H-GPT** via command-line tools or a web interface to convert natural-language commands into human motion representations. We provide the complete code for training, inference, and evaluation in this module, with pretrained models available on HuggingFace.

  <div align="center">
  <img src="./assets/FRoM-W1-HGPT.png" alt="fromw1-hgpt" width="80%">
  </div>

+ - **[H-ACT](./H-ACT/README.md)**: **H-ACT** converts the motion representations from H-GPT into SMPL-X motion sequences and further retargets them to various humanoid robots. The resulting motions can be used both for training control policies and for executing actions on real robots via our deployment pipeline. An end-to-end sketch of this two-stage workflow follows the figure below.

  <div align="center">
  <img src="./assets/FRoM-W1-HACT.png" alt="fromw1-hact" width="80%">
  </div>
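To make the hand-off between the two stages concrete, here is a purely illustrative, self-contained Python sketch of the workflow described above. None of the names in it come from the FRoM-W1 codebase; they are hypothetical stubs standing in for the real entry points under `./H-GPT` and `./H-ACT`, and only the data flow (instruction → motion representation → SMPL-X sequence → retargeted robot motion → tracking controller) follows the README.

```python
# Purely illustrative pseudocode of the FRoM-W1 two-stage hand-off.
# Every name below (MotionRepr, hgpt_generate, to_smplx, retarget, hact_track)
# is a hypothetical placeholder invented for this sketch; the real entry points
# live in ./H-GPT and ./H-ACT and are documented in their READMEs.
from dataclasses import dataclass
from typing import List


@dataclass
class MotionRepr:
    """Stand-in for whatever motion representation H-GPT emits."""
    tokens: List[int]


def hgpt_generate(instruction: str) -> MotionRepr:
    """Stage 1 (H-GPT): language instruction -> human motion representation."""
    return MotionRepr(tokens=[0, 1, 2])  # dummy output


def to_smplx(motion: MotionRepr) -> List[List[float]]:
    """Decode the representation into an SMPL-X pose sequence (dummy frames)."""
    return [[0.0] * 63 for _ in motion.tokens]


def retarget(smplx_frames: List[List[float]], robot: str) -> List[List[float]]:
    """Map human SMPL-X motion to robot-specific joint targets (identity stub)."""
    return smplx_frames


def hact_track(robot_motion: List[List[float]], robot: str) -> None:
    """Stage 2 (H-ACT): an RL-finetuned controller tracks the motion,
    in simulation first and then on the real robot via the sim-to-real module."""
    print(f"tracking {len(robot_motion)} frames on {robot}")


if __name__ == "__main__":
    repr_ = hgpt_generate("wave with the right hand, then walk forward")
    frames = retarget(to_smplx(repr_), robot="unitree_g1")
    hact_track(frames, robot="unitree_g1")
```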

+ For now, please refer to the preview code in the corresponding folders; we will provide a quick-start example and more detailed README documentation later.
+
  ## Citation
+ If you find our work useful, please cite it as follows:
  ```bibtex
+ @misc{li2026fromw1generalhumanoidwholebody,
+ title={FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions},
+ author={Peng Li and Zihan Zhuang and Yangfan Gao and Yi Dong and Sixian Li and Changhao Jiang and Shihan Dou and Zhiheng Xi and Enyu Zhou and Jixuan Huang and Hui Li and Jingjing Gong and Xingjun Ma and Tao Gui and Zuxuan Wu and Qi Zhang and Xuanjing Huang and Yu-Gang Jiang and Xipeng Qiu},
+ year={2026},
+ eprint={2601.12799},
+ archivePrefix={arXiv},
+ primaryClass={cs.RO},
+ url={https://arxiv.org/abs/2601.12799},
  }
  ```
+ You are welcome to star ⭐ our GitHub repo, raise issues, and submit PRs!