david-thrower committed · verified
Commit 87ee3f5 · Parent(s): 64e94ef

Update model card

Files changed (1): README.md (+112 -71)

README.md CHANGED
@@ -7,136 +7,172 @@ tags: []

  <!-- Provide a quick summary of what the model is/does. -->

-

  ## Model Details

- ### Model Description

- <!-- Provide a longer summary of what this model is. -->

  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
  - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]

  ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

  ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

  ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

- Use the code below to get started with the model.
-
- [More Information Needed]

  ## Training Details

  ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

  ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

  #### Preprocessing [optional]

- [More Information Needed]

  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

  #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

  ### Testing Data, Factors & Metrics

  #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]

  #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-

  ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

@@ -144,56 +180,61 @@ Use the code below to get started with the model.

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

  ## Technical Specifications [optional]

  ### Model Architecture and Objective

- [More Information Needed]

  ### Compute Infrastructure

- [More Information Needed]

  #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]

  ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

  **BibTeX:**

- [More Information Needed]

  **APA:**

- [More Information Needed]

  ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

  ## More Information [optional]

- [More Information Needed]

  ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]
  <!-- Provide a quick summary of what the model is/does. -->

+ This is a vanilla example of a model produced by DoRA fine-tuning. The Jupyter notebook at https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it provides a template for DoRA/LoRA fine-tuning of Gemma 2-2B instruct. The template can easily be modified to create a custom LLM that solves a real-world problem.
  ## Model Details

+ Gemma-2 Fine-Tuning with LoRA and DoRA: A Practical, Plug-and-Play Template
+
+ ### Overview
+
+ The notebook where this model was developed provides a practical, simple-case template for fine-tuning Gemma-2 models (2B, 9B, 27B) using Weight-Decomposed Low-Rank Adaptation (DoRA), a refinement of Low-Rank Adaptation (LoRA), on a free-tier Google Colab GPU. This approach allows efficient customization of Gemma-2 for specific tasks without the computational overhead of full fine-tuning. The basic concepts are discussed, but the notebook is meant to be a practical template that any developer, at any level, can "just plug and play" without needing a PhD in math.
+
+ ## Model Description
+
+ ### The Problem: "Off-the-Shelf" LLMs Are Great, but They Are Jacks of All Trades and Masters of None
+
+ Gemma-2 offers impressive performance, especially for its size, excelling at code generation, complex question answering, and following nuanced instructions. The quality of its writing, the explanations it generates, and its human-like style are also rather impressive. However, like other pre-trained LLMs, its performance on a niche task usually needs to be enhanced a bit, and that is where fine-tuning on task-specific data comes in. Traditional fine-tuning is computationally expensive, involves thousands of dollars in compute resources, and leaves a gaping carbon footprint, making it impractical for many users.

+ ### LoRA: A Second-Generation Approach to Parameter-Efficient Fine-Tuning
+
+ - LoRA addresses this challenge by freezing the pre-trained model weights (in other words, leaving the existing model as-is) and training a small set of new weights that are added in parallel to some of the model's layers.
+ - The benefit: this drastically reduces the number of trainable parameters, enabling efficient fine-tuning on consumer-grade hardware. It provides accuracy almost as good as full fine-tuning while requiring as little as 1% of the compute resources.
+ - The drawback we really want to avoid here: the accuracy of the models that classic LoRA produces is usually lower than that of full fine-tuning.
+ - For advanced users: these adapters are low-rank matrices (adapter weights) injected alongside specific layers, usually the query, key, and value projection layers.
+
+ ### DoRA: A Third-Generation Approach to Parameter-Efficient Fine-Tuning, Used Here
+
+ - Weight-Decomposed Low-Rank Adaptation (DoRA) builds upon LoRA by adding a weight decomposition that improves accuracy without much additional computational expense. You don't really need to understand what is happening under the hood to use it. This template is fairly robust and should work reasonably well on a lot of data sets.
+ - The benefits: like conventional LoRA, we leave the model's original weights as-is and train only added adapters that account for less than 1% of the model's weights. Unlike conventional LoRA, DoRA will often create models that are as accurate as those produced by expensive full fine-tuning, and if not equally accurate, very close to it in most cases, provided the work is done correctly, carefully optimized, and run on the right training data.
+ - **For advanced users:** DoRA decomposes the weight updates into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas magnitude is handled by a separate learnable parameter. You can read more in the resources below; to stay true to the scope of this notebook, which is to serve as a practical template and guide for reaching a proof-of-concept or MVP custom LLM that advanced users can refine later, we refer you to the paper and other academic materials rather than going deep into the details. A configuration sketch follows this list.
+   - https://arxiv.org/abs/2402.09353
+   - https://www.youtube.com/watch?v=J2WzLS9TggQ
+
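+ As a minimal sketch (not the notebook's exact code), enabling DoRA with the Hugging Face peft library looks roughly like this; the rank, alpha, and target modules below are illustrative assumptions, and the notebook holds the actual configuration:
+
+ ```python
+ # Minimal DoRA setup sketch using Hugging Face peft >= 0.9 (values are illustrative).
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+
+ base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
+
+ config = LoraConfig(
+     r=8,                                            # adapter rank (assumed)
+     lora_alpha=16,                                  # scaling factor (assumed)
+     target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections
+     use_dora=True,                                  # magnitude/direction decomposition
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base, config)
+ model.print_trainable_parameters()  # adapters are well under 1% of all weights
+ ```
+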
+ ### Why This Template (at the GitHub Link Above)?
+
+ - Practical, Plug and Play: if you don't understand the theory discussed here, no problem. If you understand the basics of Python and follow the instructions, this template can easily be used to fine-tune your own custom LLM at no cost to you. If you are a developer, you can use other tutorials to integrate the model you create into a chatbot UI like one of these to make a practical app (see the sketch after this list):
+   - https://www.gradio.app/docs/gradio/chatinterface
+   - https://reflex.dev/docs/getting-started/chatapp-tutorial/
+ - Free-Tier Colab Ready: designed to run efficiently on Google Colab's free T4 GPUs, making powerful LLM customization accessible to everyone.
+ - Scalable: easily adaptable to the larger Gemma-2 models (9B, 27B) by simply changing the model_name and running in a suitable environment with more resources.
+ - Simple and Customizable: provides a clear and concise code structure that can be easily modified for various tasks and datasets.
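+
+ As a hypothetical illustration of the chatbot-UI idea above (this is not code from the notebook, and the reply function is a stub to replace with a call to your fine-tuned model), a Gradio wrapper can be as small as this:
+
+ ```python
+ # Hypothetical Gradio chat wrapper (the reply function is a stub).
+ import gradio as gr
+
+ def reply(message, history):
+     # Placeholder: call your fine-tuned model here and return its response text.
+     return f"You said: {message}"
+
+ gr.ChatInterface(reply).launch()
+ ```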
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

+ - **Developed by:** David Thrower and Cerebros AutoML
+ - **Funded by [optional]:** David Thrower
+ - **Shared by [optional]:** David Thrower
+ - **Model type:** LLM, fork of Gemma 2-2B-IT
  - **Language(s) (NLP):** [More Information Needed]
+ - **License:** Gemma, Cerebros modified Apache 2.0
+ - **Finetuned from model [optional]:** Gemma 2-2B-IT

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

+ - **Repository:** https://huggingface.co/google/gemma-2-2b-it
  - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it

  ## Uses

+ The model and the associated Jupyter notebook demonstrate how easily you can build a custom LLM for your own purposes, using a simple template to fine-tune Gemma-family models with DoRA/LoRA on free-tier Google Colab notebooks.

  ### Direct Use

+ This model is a simple vanilla demo. The Jupyter notebook associated with it will enable you to build a fine-tuned LLM for nearly any task you want.

  ### Downstream Use [optional]

+ Follow the fine-tuning template and fine-tune the model to do whatever you want, as long as it is legal and ethical.

  ### Out-of-Scope Use

+ - Anything that Cerebros modified Apache 2.0 excludes:
+   - Anything Apache 2.0 excludes
+   - Military use, except as explicitly authorized by the author
+   - Law enforcement use intended to aid in making decisions that lead to anyone being incarcerated, in any way managing an incarceration operation, criminal prosecution operation, jail, or prison, or participating in decisions that flag citizens for investigation or exclusion from public locations, whether physical or virtual
+   - Use in committing property or violent crimes
+   - Use in any application supporting the adult films industry
+   - Use in any application supporting or in any way promoting the alcoholic beverages, firearms, and/or tobacco industries
+   - Any use supporting the trade, marketing, or administration of prescription drugs which are commonly abused
+   - Use in a manner intended to identify or discriminate against anyone on any ethnic, ideological, religious, racial, demographic, familial status, family of origin, sex or gender, gender identity, sexual orientation, status of being a victim of any crime, history or present status of being a good-faith litigant, national origin (including citizenship or lawful resident status), disability, age, pregnancy, parental status, mental health, income, or socioeconomic / credit status basis (which includes lawful credit, tenant, and HR screening other than screening for criminal history)
+   - Promoting controversial services such as abortion, via any and all types of marketing, market targeting, operational, administrative, or financial support for providers of such services
+   - Any use supporting any operation which attempts to sway public opinion, political alignment, or purchasing habits via means such as:
+     - Misleading the public to believe that the opinions promoted by said operation are those of a different group of people than those which the campaign portrays them as being; for example, a political group attempting to cast an image that a given political alignment is that of low-income rural citizens, when such is not consistent with known statistics on that population (commonly referred to as astroturfing)
+     - Leading the public to believe premises that contradict duly accepted scientific findings, implausible doctrines, or premises that are generally regarded as heretical or occult
+   - Promoting or managing any operation profiting from dishonest or unorthodox marketing practices or marketing unorthodox products generally regarded as a junk deal to consumers or employees (e.g., multi-level marketing operations, "businesses" that rely on 1099 contractors not ensured a regular wage for all hours worked, companies having any full-time employee paid less than $40,000 per year at the time of this writing weighted to BLS inflation, short-term consumer lenders and retailers / car dealers offering credit to consumers who could not be approved for the same loan by an FDIC-insured bank, operations that make sales through telemarketing or automated phone calls, non-opt-in email distribution marketing, vacation timeshare operations, etc.)
+   - Any use that supports copyright, trademark, patent, or trade secret infringement
+   - Any use that may reasonably be deemed negligent
+   - Any use intended to prevent Cerebros from operating their own commercial distribution of Cerebros, or any attempt to gain a de facto monopoly on commercial or managed-platform use of this or a derivative work
+   - Any use in an AI system that is inherently designed to avoid contact from customers, employees, applicants, or citizens, or that otherwise makes decisions significantly affecting a person's life or finances without human review of ALL decisions made by said system that have an unfavorable impact on a person
+     - Example of an acceptable use under this term:
+       - An IVR or email routing system that predicts which department a customer's inquiry should be routed to
+     - Examples of unacceptable uses under this term:
+       - An IVR system designed to make it cumbersome for a customer to reach a human representative at a company (e.g., the system has no option to reach a human representative, or the option is in a nested layer of a multi-layer menu of options)
+       - Email screening applications that only allow selected categories of email from known customers, employees, constituents, etc. to appear in a business or government representative's email inbox, blindly discarding or obfuscating all other inquiries
+ - Anything that violates Google's terms of use for Gemma and derivative works:
+   - https://ai.google.dev/gemma/terms

  ## Bias, Risks, and Limitations

+ - It is always your responsibility to screen the models you create for bias and for proper operating characteristics, as required by the professional ethics of your use case.
+ - Being a fork of Gemma, this model benefits from the reasonable efforts made at the foundation-model level to ensure fairness.

  ### Recommendations

+ - Follow the DIY DoRA fine-tuning notebook at https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it (download the .ipynb file and run it in Google Colab).
+ - Adapt the training data set to suit your own use case.
+ - Make contributions to the notebook and extend it.

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
  ## How to Get Started with the Model

+ Use the code below to get started with the model, and see https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it (the .ipynb file found there) for the full fine-tuning template.
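+
+ As a minimal sketch, assuming this repository's checkpoint id (the id below is a placeholder, not a confirmed id), loading and prompting the model with 🤗 transformers looks roughly like this:
+
+ ```python
+ # Minimal inference sketch with 🤗 transformers (checkpoint id is a placeholder).
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "david-thrower/MODEL-ID"  # placeholder: substitute this repo's actual id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # Gemma-2 instruct models expect a chat template; apply it before generating.
+ messages = [{"role": "user", "content": "Summarize what DoRA fine-tuning does."}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```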
 
  ## Training Details

+ This is a basic vanilla example and a template for training your own fine-tuned Gemma 2 model.
+
  ### Training Data

+ A simple-case vanilla data set meant to emulate a start-up's proof of concept / MVP. It is meant to be replaced with your own data.

  ### Training Procedure

+ See the Jupyter notebook at the repository link. Run it in Google Colab and modify it for your use case. A rough sketch of what such a training step can look like follows.
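+
+ As a hedged illustration only (the notebook itself may use a different trainer), a supervised fine-tuning step with trl and a DoRA adapter could look roughly like this; the data file, output directory, and hyperparameters are placeholders:
+
+ ```python
+ # Hypothetical training sketch using a recent trl + peft (values are placeholders).
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from trl import SFTConfig, SFTTrainer
+
+ # Placeholder training file: replace with your own task-specific data.
+ dataset = load_dataset("json", data_files="train.jsonl", split="train")
+
+ trainer = SFTTrainer(
+     model="google/gemma-2-2b-it",   # trl can load the base model from a string id
+     train_dataset=dataset,
+     args=SFTConfig(output_dir="gemma-2-2b-it-dora", num_train_epochs=1),
+     peft_config=LoraConfig(r=8, use_dora=True, task_type="CAUSAL_LM"),
+ )
+ trainer.train()
+ trainer.save_model("gemma-2-2b-it-dora")
+ ```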
 
  #### Preprocessing [optional]

+ N/A

  #### Training Hyperparameters

+ See the Jupyter notebook at the repository link.

  #### Speeds, Sizes, Times [optional]

+ N/A

  ## Evaluation

+ Contributions welcome!

  ### Testing Data, Factors & Metrics

  #### Testing Data

+ Contributions welcome!

  #### Factors

+ Contributions welcome!

  ## Model Examination [optional]

+ Contributions welcome!

 
  ## Environmental Impact

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** Google Colab T4
+ - **Hours used:** 0.15
+ - **Cloud Provider:** GCP
+ - **Compute Region:** N/A
+ - **Carbon Emitted:** 2 g

  ## Technical Specifications [optional]

+ N/A
+
  ### Model Architecture and Objective

+ Gemma 2 CausalLM, not otherwise specified.

  ### Compute Infrastructure

+ N/A

  #### Hardware

+ T4 GPU

  #### Software

+ N/A

  ## Citation [optional]

+ N/A

  **BibTeX:**

+ N/A

  **APA:**

+ N/A

  ## Glossary [optional]

+ N/A


  ## More Information [optional]

+ N/A

  ## Model Card Authors [optional]

+ N/A

  ## Model Card Contact

+ David Thrower
+ david@cerebros.one
+ (239) 645-3585
+ https://www.linkedin.com/in/david-thrower-%F0%9F%8C%BB-2972482a