Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Abstract
Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. Training focused primarily on Chinese data, with a small proportion of English data included. The project addresses gaps in existing open-source LLMs by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper provides a comprehensive summary of the project's key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a valuable resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training script are available at https://github.com/zhanshijinwat/Steel-LLM.
Community
Introducing Steel-LLM: A Fully Open-Source, Resource-Efficient Chinese-Centric Language Model
Discover Steel-LLM, a groundbreaking 1-billion-parameter language model developed with limited computational resources (just 8 GPUs) and a commitment to full transparency. Launched in March 2024, Steel-LLM is designed to bridge a gap in open-source LLMs by focusing on Chinese-language data while incorporating a small proportion of English.
Whether you're a small research team or an individual practitioner, Steel-LLM provides practical guidance and detailed insights into model development, making it an invaluable resource for the LLM community.
Join us in advancing open-source AI. Explore Steel-LLM today:
https://github.com/zhanshijinwat/Steel-LLM
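For readers who want to try the model directly, below is a minimal loading sketch using Hugging Face Transformers. It assumes the released checkpoints are published in a Transformers-compatible format; the repository ID used here is a hypothetical placeholder inferred from the GitHub handle, so check the project README for the actual checkpoint location.

    # Minimal sketch: loading a Steel-LLM checkpoint with Hugging Face Transformers.
    # The repo ID below is a hypothetical placeholder; see
    # https://github.com/zhanshijinwat/Steel-LLM for the actual checkpoint location.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "zhanshijinwat/Steel-LLM"  # hypothetical repo ID

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

    # Steel-LLM is Chinese-centric, so a Chinese prompt makes a natural smoke test.
    prompt = "请简要介绍一下人工智能。"  # "Briefly introduce artificial intelligence."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))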