---
library_name: transformers
license: mit
datasets:
- MBZUAI/LaMini-instruction
---
# Saving 77% of the Parameters in Large Language Models Technical Report
This repository contains experiment results for the [Saving 77% of the Parameters in Large Language Models Technical Report (PDF)](https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT).

## Abstract
This technical report demonstrates that large language models (LLMs) can maintain their learning capacity while reducing their non-embedding parameters by up to 77%. We achieve this by adapting a parameter reduction technique originally developed for computer vision, replacing dense layers with an optimized subnetwork that contains grouped pointwise convolutions. Using Microsoft's phi-3-mini-4k-instruct as our baseline, we show that our optimized model (kphi-3) achieves comparable validation loss while using only 15-23% of the original non-embedding parameters. All experiments were conducted on a single NVIDIA L4 GPU within a 3-day timeframe, supporting the democratization of AI research. Our findings suggest that current LLM architectures may be substantially overparameterized, opening possibilities for more efficient model training and deployment.

## Key Findings
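- Reduced non-embedding parameters by 77% (kphi-3 with 3 layers) while reaching a validation loss comparable to the baseline (1.57 vs. 1.58).
- Observed a smaller gap between training and validation loss in the optimized models, suggesting better generalization.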
- Improved output quality in qualitative testing.

## Implementation Details
- Base Model: [kphi-3](https://github.com/joaopauloschuler/less-parameters-llm).
- Training Dataset: [LaMini](https://huggingface.co/datasets/MBZUAI/LaMini-instruction).
- Architecture: Modified transformer decoder with grouped pointwise convolutions.
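
To give a feel for why grouped pointwise convolutions save parameters, the sketch below compares the weight count of a dense layer with that of a grouped pointwise (kernel size 1) convolution of the same input and output width. It is only an illustration under stated assumptions: the layer sizes and the group count are hypothetical, biases are ignored, and the function names are made up for this example. The actual kphi-3 subnetwork is described in the technical report and may differ.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weights in a dense layer: every input connects to every output."""
    return d_in * d_out


def grouped_pointwise_params(d_in: int, d_out: int, groups: int) -> int:
    """Weights in a grouped pointwise convolution (kernel size 1).

    Inputs and outputs are split into `groups` equal slices, and each
    output slice only sees its own input slice.
    """
    if d_in % groups or d_out % groups:
        raise ValueError("d_in and d_out must both be divisible by groups")
    return (d_in // groups) * (d_out // groups) * groups


# Hypothetical sizes chosen for illustration only.
D_IN, D_OUT, GROUPS = 3072, 8192, 16

dense = dense_params(D_IN, D_OUT)
grouped = grouped_pointwise_params(D_IN, D_OUT, GROUPS)
print(f"dense:   {dense:,} weights")
print(f"grouped: {grouped:,} weights ({100 * grouped / dense:.2f}% of dense)")
```

With `g` groups, the weight count drops by a factor of `g`, at the cost of each output only depending on one slice of the input.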

## Results
The following table shows LaMini training results for the baseline and the optimized models. Columns, from left to right: experiment label, model name, number of transformer decoder layers, intermediate dimensions, number of non-embedding parameters, non-embedding parameters as a percentage of the baseline, training loss, and validation loss. Bold marks the best value in each column.

| label | model | layers | interm. dims. | non-emb. params. | % | Train Loss | Val. Loss |
|:-----:|:------:|:-------:|:-------------:|:----------------:|:---:|:----------:|:----------:|
| [JP47D54C](https://github.com/joaopauloschuler/less-parameters-llm/tree/main/raw/JP47D54C_Baseline_2T.ipynb) | phi-3 | 2 | 8192 | [227M](https://huggingface.co/schuler/experimental-JP47D54C) | | **1.08** | 1.58 |
| [JP47D55C](https://github.com/joaopauloschuler/less-parameters-llm/tree/main/raw/JP47D55C_kphi3_2T.ipynb) | kphi-3 | 2 | 9216 | [**35M**](https://huggingface.co/schuler/experimental-JP47D55C) | **15%** | 1.26 | 1.60 |
| [JP47D56C](https://github.com/joaopauloschuler/less-parameters-llm/tree/main/raw/JP47D56C_kphi3_3T.ipynb) | kphi-3 | 3 | 9216 | [53M](https://huggingface.co/schuler/experimental-JP47D56C) | 23% | 1.21 | **1.57** |
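
The percentage column and the headline 77% figure can be reproduced from the parameter counts in the table. The short sketch below does that arithmetic; the counts are the rounded values from the table (in millions), and the helper name is made up for this example.

```python
# Non-embedding parameter counts in millions, copied from the results table.
NON_EMBEDDING_PARAMS_M = {
    "JP47D54C (phi-3, 2 layers)": 227,
    "JP47D55C (kphi-3, 2 layers)": 35,
    "JP47D56C (kphi-3, 3 layers)": 53,
}
BASELINE_M = NON_EMBEDDING_PARAMS_M["JP47D54C (phi-3, 2 layers)"]


def percent_of_baseline(params_m: float, baseline_m: float) -> int:
    """Return params_m as a whole-number percentage of baseline_m."""
    return round(100 * params_m / baseline_m)


for label, params_m in NON_EMBEDDING_PARAMS_M.items():
    pct = percent_of_baseline(params_m, BASELINE_M)
    print(f"{label}: {pct}% of baseline, {100 - pct}% saved")
```

The 3-layer kphi-3 model keeps 23% of the baseline's non-embedding parameters, which is where the 77% saving comes from.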

## Quick Links
- [📄 Full Technical Report (PDF)](https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT)
- [🤗 Model Checkpoints on HuggingFace](https://huggingface.co/schuler/)
- [📊 Raw Experiment Files](https://github.com/joaopauloschuler/less-parameters-llm/tree/main/raw)

## Usage
```python
import torch
from transformers import AutoModelForCausalLM, GenerationConfig, LlamaTokenizer, pipeline

REPO_NAME = 'schuler/experimental-JP47D54C'

def load_model(repo_name):
    """Load the tokenizer, generation config, and model from the given repository."""
    tokenizer = LlamaTokenizer.from_pretrained(repo_name, trust_remote_code=True)
    generator_conf = GenerationConfig.from_pretrained(repo_name)
    model = AutoModelForCausalLM.from_pretrained(
        repo_name,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        attn_implementation="eager",
    )
    # model.to('cuda')  # uncomment to run on a GPU
    return tokenizer, generator_conf, model

tokenizer, generator_conf, model = load_model(REPO_NAME)
# Let any loading error surface here instead of failing later with an undefined generator.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

def print_test(prompt):
    """Generate a completion for the prompt and print the raw pipeline output."""
    print(generator(prompt, max_new_tokens=256, do_sample=True, top_p=0.25, repetition_penalty=1.2))

print_test("<|user|>\nHello\n<|end|>\n<|assistant|>\n")
print_test("<|user|>Hello\n<|end|><|assistant|>")
print_test("<|user|>\nWhat is the human body?\n<|end|>\n<|assistant|>\n")
print_test("<|user|>What is the human body?\n<|end|><|assistant|>")
print_test("<|user|>What is biology?\n<|end|><|assistant|>")
print_test("<|user|>Can you comment about democracy?\n<|end|><|assistant|>")
print_test("<|user|>Can you provide detailed comments about the concept of democracy?\n<|end|><|assistant|>")
print_test("<|user|>Please give me a detailed description of the python computer language.\n<|end|><|assistant|>")
print_test("<|user|>If you had a difficult task to complete, how would you complete it?\n<|end|><|assistant|>")
print_test("<|user|>Before replying to my question, I would like you to provide two candidate solutions and do some reflexion about these solution. Then, you'll pick the best as the final reply. What is best: eating healthy or eating economically?\n<|end|><|assistant|>")
```

## Output Examples
```
[{'generated_text': "<|user|>\nHello\n<|end|>\n<|assistant|>\n Well, what's your instruction? Please provide more context or complete the task. <|end|>"}]
[{'generated_text': "<|user|>I have a question to you. Before you reply to my question, I need to to comment details about the subject without trying to directly reply to the question. As an example, if I ask you about modern architecture, you'll start describing what is architecture without specifics about modernism. Then, later you can reply to the question. Therefore, here we go with the question: what is ancient history?<|end|><|assistant|> What is the history of architecture in modernist and when were it first mentioned? <|end|>"}]
[{'generated_text': '<|user|>Hello\n<|end|><|assistant|> I\'m sorry, I don\'t understand what you are asking. Could you please provide more context or specify which "european" refers to? <|end|>'}]
[{'generated_text': '<|user|>\nWhat is the human body?\n<|end|>\n<|assistant|>\n The human body has several organs including the heart, lungs, liver, kidneys, and brain. <|end|>'}]
[{'generated_text': '<|user|>What is the human body?\n<|end|><|assistant|> The human body has various systems including the respiratory system, digestive system, nervous system, and many others. However, it primarily consists of organs such as the liver, kidneys, lungs, and gonorrhea, that work together to facilitate breathing and regulate blood flow in oxygen and carbon dioxide. <|end|>'}]
[{'generated_text': '<|user|>What is biology?\n<|end|><|assistant|> Biotech is a type of food that is broken down into chemicals, such as glucose and oils. <|end|>'}]
[{'generated_text': '<|user|>Can you comment about democracy?\n<|end|><|assistant|> Certainly, I\'ll respond with the instructions or question "I will respond concisely." <|end|>'}]
[{'generated_text': "<|user|>Can you provide detailed comments about the concept of democracy?\n<|end|><|assistant|> Sure, here's a brief review of some basic political theories:\n\n1. AI-assisted decision to choose between two candidates or parties based on their interests and desires.\n2. The right to choose their candidate for membership in the public eye by deciding what they want to achieve in the next election.\n3. The leftist stump has a mixed tone among leftist political allies and is critical of the government.\n4. The importance of individual rights and freedoms in order to prevent such atrocities from occurring.\n5. Demonitarian regimes are committed against the collective good, but they are also responsible for enforcing democratic laws and ensuring that all votes are accountable to voters.\n6. The right to hold differing opinions within the government, including whether it should be approved by voters who believe that it will lead to increased transparency and trustworthiness in government.\n7. This includes factors beyond just nationalizing certain issues related to democracy, such as the rule of law, protection of human rights, and international relations.\n8. The use of social media can help shape the future of democracy by influencing how we communicate with"}]
[{'generated_text': "<|user|>Please give me a detailed description of the python computer language.\n<|end|><|assistant|> The Python computer is a pointing-out machine with a simple interface and can be used for data analysis, scientific computing, web development, and more. It's designed to work with different libraries and software tools such as TensorFlow, PyTorch, etc., to create dynamic or web pages. The basic syntax of Python include:\n\n1. Syntax - This system has a long history, which dates back to ancient Greece, where it was not written in the 3rd century BC.\n2. Algorithms - These are central quintessential games that provide insights into the evolution of human thinking and behavior.\n3. Community order - A large geographic area with a relatively small population size.\n4. Codeacademy - There are many different types of sergeeze systems available, each with its own strengths and weaknesses.\n5. Language - An additive coding system consisting of a set of characters, including numbers, symbols, and sounds.\n6. Inflation - This system uses a set of rules to regulate the amount of time and resources required to execute them.\n7. Group model - The group provides organic codecs, which means they need to carry out specific tasks within a"}]
[{'generated_text': "<|user|>If you had a difficult task to complete, how would you complete it?\n<|end|><|assistant|> Here are some steps I can take if you haven't completed the assignment:\n1. Read the instructions thoroughly before starting the task.\n2. Identify and prioritize your tasks based on their importance and deadline.\n3. Break down each subtopic into smaller sections or subtops that may have different impacting what was learned during the process.\n4. Create an outline of how you want to achieve the rest of the content.\n5. Be flexible with your schedule as you adjust your approach as needed. <|end|>"}]
[{'generated_text': "<|user|>Before replying to my question, I would like you to provide two candidate solutions and do some reflexion about these solution. Then, you'll pick the best as the final reply. What is best: eating healthy or eating economically?\n<|end|><|assistant|> Eating healthy or eating economy depends on various factors such as the type of food being considered, how well it meets the criteria for improvement, and the effectiveness of new medical treatments. However, based solely on a given set of guidelines, they may be more appropriate in terms of the final response that will help the overall process finish later. <|end|>"}]
```

## Citing this Model
```
@article{SchulerRojas_2025,
  title={Saving 77\% of the Parameters in Large Language Models Technical Report},
  url={https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT},
  author={Schwarz Schuler, Joao Paulo and Rojas Gómez, Alejandra},
  year={2025}
}
```