david-thrower committed · verified
Commit 87ee3f5 · Parent(s): 64e94ef

Update model card

Files changed (1): README.md (+112 -71)

README.md CHANGED
@@ -7,136 +7,172 @@ tags: []

  <!-- Provide a quick summary of what the model is/does. -->

-

  ## Model Details

- ### Model Description

- <!-- Provide a longer summary of what this model is. -->

  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
  - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]

  ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

  ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

  ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

- Use the code below to get started with the model.
-
- [More Information Needed]

  ## Training Details

  ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

  ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

  #### Preprocessing [optional]

- [More Information Needed]

  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

  #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

  ### Testing Data, Factors & Metrics

  #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]

  #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-

  ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

@@ -144,56 +180,61 @@ Use the code below to get started with the model.

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

  ## Technical Specifications [optional]

  ### Model Architecture and Objective

- [More Information Needed]

  ### Compute Infrastructure

- [More Information Needed]

  #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]

  ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

  **BibTeX:**

- [More Information Needed]

  **APA:**

- [More Information Needed]

  ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

  ## More Information [optional]

- [More Information Needed]

  ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]
  <!-- Provide a quick summary of what the model is/does. -->

+ This is a vanilla example of a model produced by DoRA fine-tuning. The Jupyter notebook at https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it provides a template for DoRA/LoRA fine-tuning of Gemma 2-2B instruct. The template can easily be modified to create a custom LLM that solves a real-world problem.
  ## Model Details

+ Gemma-2 Fine-Tuning with LoRA and DoRA: A Practical, Plug-and-Play Template
+
+ ### Overview
+
+ The notebook where this model was developed provides a practical, simple-case template for fine-tuning Gemma-2 models (2B, 9B, 27B) using Weight-Decomposed Low-Rank Adaptation (DoRA), a refinement of Low-Rank Adaptation (LoRA), on a free-tier Google Colab GPU. This approach allows efficient customization of Gemma-2 for specific tasks without the computational overhead of full fine-tuning. The basic concepts are discussed, but the notebook is meant to be a practical template that any developer, at any level, can "just plug and play" without needing a PhD in math.
+
+ ## Model Description
+
+ ### The Problem: "Off-the-Shelf" LLMs Are Great, but They Are Jacks of All Trades and Masters of None
+
+ Gemma-2 offers impressive performance, especially for its size, excelling at code generation, complex question answering, and following nuanced instructions. The quality of its writing, the explanations it generates, and its human-like style are also rather impressive. However, like other pre-trained LLMs, its performance on a niche task usually needs to be enhanced a bit, and that is where fine-tuning on task-specific data comes in. Traditional fine-tuning is computationally expensive, involves thousands of dollars in compute resources, and leaves a gaping carbon footprint, making it impractical for many users.

+ ### LoRA: A Second-Generation Approach to Parameter-Efficient Fine-Tuning
+
+ - LoRA addresses this challenge by freezing the pre-trained model weights (in other words, leaving the existing model as-is) and training a small set of new weights that are added in parallel to some of the model's layers.
+ - The benefit: this drastically reduces the number of trainable parameters, enabling efficient fine-tuning on consumer-grade hardware. It provides accuracy almost as good as full fine-tuning while requiring as little as 1% of the compute resources.
+ - The drawback we really want to avoid here: the accuracy of the models that classic LoRA produces is usually lower than that of full fine-tuning.
+ - For advanced users: these adapters are low-rank matrices (adapter weights) injected alongside specific layers, usually the query, key, and value projection layers.
+
+ ### DoRA: A Third-Generation Approach to Parameter-Efficient Fine-Tuning, Used Here
+
+ - Weight-Decomposed Low-Rank Adaptation (DoRA) builds upon LoRA by adding a weight decomposition that improves accuracy without much additional computational expense. You don't really need to understand what is happening under the hood to use it. This template is fairly robust and should work reasonably well on a lot of data sets.
+ - The benefits: like conventional LoRA, we leave the model's original weights as-is and train only added adapters that account for less than 1% of the model's weights. Unlike conventional LoRA, DoRA will often create models that are as accurate as those produced by expensive full fine-tuning, and if not equally accurate, very close to it in most cases, provided the work is done correctly, carefully optimized, and run on the right training data.
+ - **For advanced users:** DoRA decomposes the weight updates into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas magnitude is handled by a separate learnable parameter. You can read more in the resources below; to stay true to the scope of this notebook, which is to serve as a practical template and guide for reaching a proof-of-concept or MVP custom LLM that advanced users can refine later, we refer you to the paper and other academic materials rather than going deep into the details. A configuration sketch follows this list.
+   - https://arxiv.org/abs/2402.09353
+   - https://www.youtube.com/watch?v=J2WzLS9TggQ
+
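+ As a minimal sketch (not the notebook's exact code), enabling DoRA with the Hugging Face peft library looks roughly like this; the rank, alpha, and target modules below are illustrative assumptions, and the notebook holds the actual configuration:
+
+ ```python
+ # Minimal DoRA setup sketch using Hugging Face peft >= 0.9 (values are illustrative).
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+
+ base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
+
+ config = LoraConfig(
+     r=8,                                            # adapter rank (assumed)
+     lora_alpha=16,                                  # scaling factor (assumed)
+     target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections
+     use_dora=True,                                  # magnitude/direction decomposition
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base, config)
+ model.print_trainable_parameters()  # adapters are well under 1% of all weights
+ ```
+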
+ ### Why This Template (at the GitHub Link Above)?
+
+ - Practical, Plug and Play: if you don't understand the theory discussed here, no problem. If you understand the basics of Python and follow the instructions, this template can easily be used to fine-tune your own custom LLM at no cost to you. If you are a developer, you can use other tutorials to integrate the model you create into a chatbot UI like one of these to make a practical app (see the sketch after this list):
+   - https://www.gradio.app/docs/gradio/chatinterface
+   - https://reflex.dev/docs/getting-started/chatapp-tutorial/
+ - Free-Tier Colab Ready: designed to run efficiently on Google Colab's free T4 GPUs, making powerful LLM customization accessible to everyone.
+ - Scalable: easily adaptable to the larger Gemma-2 models (9B, 27B) by simply changing the model_name and running in a suitable environment with more resources.
+ - Simple and Customizable: provides a clear and concise code structure that can be easily modified for various tasks and datasets.
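+
+ As a hypothetical illustration of the chatbot-UI idea above (this is not code from the notebook, and the reply function is a stub to replace with a call to your fine-tuned model), a Gradio wrapper can be as small as this:
+
+ ```python
+ # Hypothetical Gradio chat wrapper (the reply function is a stub).
+ import gradio as gr
+
+ def reply(message, history):
+     # Placeholder: call your fine-tuned model here and return its response text.
+     return f"You said: {message}"
+
+ gr.ChatInterface(reply).launch()
+ ```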
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

+ - **Developed by:** David Thrower and Cerebros AutoML
+ - **Funded by [optional]:** David Thrower
+ - **Shared by [optional]:** David Thrower
+ - **Model type:** LLM, fork of Gemma 2-2B-IT
  - **Language(s) (NLP):** [More Information Needed]
+ - **License:** Gemma, Cerebros modified Apache 2.0
+ - **Finetuned from model [optional]:** Gemma 2-2B-IT

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

+ - **Repository:** https://huggingface.co/google/gemma-2-2b-it
  - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it

  ## Uses

+ The model and the associated Jupyter notebook demonstrate how easily you can build a custom LLM for your own purposes, using a simple template to fine-tune Gemma-family models with DoRA/LoRA on free-tier Google Colab notebooks.

  ### Direct Use

+ This model is a simple vanilla demo. The Jupyter notebook associated with it will enable you to build a fine-tuned LLM for nearly any task you want.

  ### Downstream Use [optional]

+ Follow the fine-tuning template and fine-tune the model to do whatever you want, as long as it is legal and ethical.

  ### Out-of-Scope Use

+ - Anything that Cerebros modified Apache 2.0 excludes:
+   - Anything Apache 2.0 excludes
+   - Military use, except as explicitly authorized by the author
+   - Law enforcement use intended to aid in making decisions that lead to anyone being incarcerated, in any way managing an incarceration operation, criminal prosecution operation, jail, or prison, or participating in decisions that flag citizens for investigation or exclusion from public locations, whether physical or virtual
+   - Use in committing property or violent crimes
+   - Use in any application supporting the adult films industry
+   - Use in any application supporting or in any way promoting the alcoholic beverages, firearms, and/or tobacco industries
+   - Any use supporting the trade, marketing, or administration of prescription drugs which are commonly abused
+   - Use in a manner intended to identify or discriminate against anyone on any ethnic, ideological, religious, racial, demographic, familial status, family of origin, sex or gender, gender identity, sexual orientation, status of being a victim of any crime, history or present status of being a good-faith litigant, national origin (including citizenship or lawful resident status), disability, age, pregnancy, parental status, mental health, income, or socioeconomic / credit status basis (which includes lawful credit, tenant, and HR screening other than screening for criminal history)
+   - Promoting controversial services such as abortion, via any and all types of marketing, market targeting, operational, administrative, or financial support for providers of such services
+   - Any use supporting any operation which attempts to sway public opinion, political alignment, or purchasing habits via means such as:
+     - Misleading the public to believe that the opinions promoted by said operation are those of a different group of people than those which the campaign portrays them as being; for example, a political group attempting to cast an image that a given political alignment is that of low-income rural citizens, when such is not consistent with known statistics on that population (commonly referred to as astroturfing)
+     - Leading the public to believe premises that contradict duly accepted scientific findings, implausible doctrines, or premises that are generally regarded as heretical or occult
+   - Promoting or managing any operation profiting from dishonest or unorthodox marketing practices or marketing unorthodox products generally regarded as a junk deal to consumers or employees (e.g., multi-level marketing operations, "businesses" that rely on 1099 contractors not ensured a regular wage for all hours worked, companies having any full-time employee paid less than $40,000 per year at the time of this writing weighted to BLS inflation, short-term consumer lenders and retailers / car dealers offering credit to consumers who could not be approved for the same loan by an FDIC-insured bank, operations that make sales through telemarketing or automated phone calls, non-opt-in email distribution marketing, vacation timeshare operations, etc.)
+   - Any use that supports copyright, trademark, patent, or trade secret infringement
+   - Any use that may reasonably be deemed negligent
+   - Any use intended to prevent Cerebros from operating their own commercial distribution of Cerebros, or any attempt to gain a de facto monopoly on commercial or managed-platform use of this or a derivative work
+   - Any use in an AI system that is inherently designed to avoid contact from customers, employees, applicants, or citizens, or that otherwise makes decisions significantly affecting a person's life or finances without human review of ALL decisions made by said system that have an unfavorable impact on a person
+     - Example of an acceptable use under this term:
+       - An IVR or email routing system that predicts which department a customer's inquiry should be routed to
+     - Examples of unacceptable uses under this term:
+       - An IVR system designed to make it cumbersome for a customer to reach a human representative at a company (e.g., the system has no option to reach a human representative, or the option is in a nested layer of a multi-layer menu of options)
+       - Email screening applications that only allow selected categories of email from known customers, employees, constituents, etc. to appear in a business or government representative's email inbox, blindly discarding or obfuscating all other inquiries
+ - Anything that violates Google's terms of use for Gemma and derivative works:
+   - https://ai.google.dev/gemma/terms

  ## Bias, Risks, and Limitations

+ - It is always your responsibility to screen the models you create for bias and for proper operating characteristics, as required by the professional ethics of your use case.
+ - Being a fork of Gemma, this model benefits from the reasonable efforts made at the foundation-model level to ensure fairness.

  ### Recommendations

+ - Follow the DIY DoRA fine-tuning notebook at https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it (download the .ipynb file and run it in Google Colab).
+ - Adapt the training data set to suit your own use case.
+ - Make contributions to the notebook and extend it.

  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
  ## How to Get Started with the Model

+ Use the code below to get started with the model, and see https://github.com/david-thrower/DoRA-fine-tuning-gemma-2-2b-it (the .ipynb file found there) for the full fine-tuning template.
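+
+ As a minimal sketch, assuming this repository's checkpoint id (the id below is a placeholder, not a confirmed id), loading and prompting the model with 🤗 transformers looks roughly like this:
+
+ ```python
+ # Minimal inference sketch with 🤗 transformers (checkpoint id is a placeholder).
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "david-thrower/MODEL-ID"  # placeholder: substitute this repo's actual id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # Gemma-2 instruct models expect a chat template; apply it before generating.
+ messages = [{"role": "user", "content": "Summarize what DoRA fine-tuning does."}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```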
 
  ## Training Details

+ This is a basic vanilla example and a template for training your own fine-tuned Gemma 2 model.
+
  ### Training Data

+ A simple-case vanilla data set meant to emulate a start-up's proof of concept / MVP. It is meant to be replaced with your own data.

  ### Training Procedure

+ See the Jupyter notebook at the repository link. Run it in Google Colab and modify it for your use case. A rough sketch of what such a training step can look like follows.
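+
+ As a hedged illustration only (the notebook itself may use a different trainer), a supervised fine-tuning step with trl and a DoRA adapter could look roughly like this; the data file, output directory, and hyperparameters are placeholders:
+
+ ```python
+ # Hypothetical training sketch using a recent trl + peft (values are placeholders).
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from trl import SFTConfig, SFTTrainer
+
+ # Placeholder training file: replace with your own task-specific data.
+ dataset = load_dataset("json", data_files="train.jsonl", split="train")
+
+ trainer = SFTTrainer(
+     model="google/gemma-2-2b-it",   # trl can load the base model from a string id
+     train_dataset=dataset,
+     args=SFTConfig(output_dir="gemma-2-2b-it-dora", num_train_epochs=1),
+     peft_config=LoraConfig(r=8, use_dora=True, task_type="CAUSAL_LM"),
+ )
+ trainer.train()
+ trainer.save_model("gemma-2-2b-it-dora")
+ ```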
 
  #### Preprocessing [optional]

+ N/A

  #### Training Hyperparameters

+ See the Jupyter notebook at the repository link.

  #### Speeds, Sizes, Times [optional]

+ N/A

  ## Evaluation

+ Contributions welcome!

  ### Testing Data, Factors & Metrics

  #### Testing Data

+ Contributions welcome!

  #### Factors

+ Contributions welcome!

  ## Model Examination [optional]

+ Contributions welcome!

 
  ## Environmental Impact

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** Google Colab T4
+ - **Hours used:** 0.15
+ - **Cloud Provider:** GCP
+ - **Compute Region:** N/A
+ - **Carbon Emitted:** 2 g

  ## Technical Specifications [optional]

+ N/A
+
  ### Model Architecture and Objective

+ Gemma 2 CausalLM, not otherwise specified.

  ### Compute Infrastructure

+ N/A

  #### Hardware

+ T4 GPU

  #### Software

+ N/A

  ## Citation [optional]

+ N/A

  **BibTeX:**

+ N/A

  **APA:**

+ N/A

  ## Glossary [optional]

+ N/A


  ## More Information [optional]

+ N/A

  ## Model Card Authors [optional]

+ N/A

  ## Model Card Contact

+ David Thrower
+ david@cerebros.one
+ (239) 645-3585
+ https://www.linkedin.com/in/david-thrower-%F0%9F%8C%BB-2972482a