arxiv:2502.06635

Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

Published on Feb 10
· Submitted by aaabiao on Feb 11

Abstract

Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. The training process primarily focused on Chinese data, with a small proportion of English data included, addressing gaps in existing open-source LLMs by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper provides a comprehensive summary of the project's key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a valuable resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training script are available at https://github.com/zhanshijinwat/Steel-LLM.

Community


Introducing Steel-LLM: A Fully Open-Source, Resource-Efficient Chinese-Centric Language Model

Steel-LLM is a 1-billion-parameter language model developed with limited computational resources (just 8 GPUs) and a commitment to full transparency. Launched in March 2024, the project aims to fill a gap in open-source LLMs by training primarily on Chinese-language data, with a small proportion of English data included.

Whether you're a small research team or an individual practitioner, Steel-LLM provides practical guidance and detailed insights into model development, making it an invaluable resource for the LLM community.

Join us in advancing open-source AI. Explore Steel-LLM today:
https://github.com/zhanshijinwat/Steel-LLM

