---
base_model: lvwerra/gpt2-imdb
pipeline_tag: text-generation
license: mit
---

# Purpose of this finetuning

<!-- Provide a quick summary of what the model is/does. -->

Finetunes the base model [GPT2-IMDB](https://huggingface.co/lvwerra/gpt2-imdb) using [this BERT sentiment classifier](https://huggingface.co/lvwerra/distilbert-imdb) as a reward function.

- The goal is to train the GPT2 model to continue a movie review and generate negative sentiment (a minimal reward sketch follows this list).
- A separate training run generates positive movie reviews. The eventual goal is to interpolate the weight spaces of the 'positively finetuned' and 'negatively finetuned' models as per the [rewarded-soups paper](https://arxiv.org/abs/2306.04488) and test whether this yields (qualitatively) neutral reviews (a weight-interpolation sketch appears at the end of this card).
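
As a sketch of how the reward can be computed from the classifier (the exact reward code used for this run is an assumption; this mirrors the common TRL sentiment setup), the NEGATIVE-class probability serves as the reward:

```python
from transformers import pipeline

# Sentiment classifier used as the reward model.
# top_k=None returns scores for both POSITIVE and NEGATIVE labels.
sentiment_pipe = pipeline(
    "text-classification",
    model="lvwerra/distilbert-imdb",
    top_k=None,
)

def negative_reward(texts):
    """Reward for each text = predicted probability of the NEGATIVE class."""
    outputs = sentiment_pipe(texts, truncation=True)
    return [
        next(s["score"] for s in scores if s["label"] == "NEGATIVE")
        for scores in outputs
    ]

# A clearly negative continuation should score close to 1.0.
print(negative_reward(["This movie was dreadful, boring, and a waste of time."]))
```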
## Model Params

Here are the training parameters:
- base_model = 'lvwerra/gpt2-imdb'
- dataset = stanfordnlp/imdb
- batch_size = 16
- learning_rate = 1.41e-5
- output_max_length = 16
- output_min_length = 4

Exact training time wasn't recorded, but it was under a couple of hours on a single A6000 GPU.
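
For reference, here is a minimal sketch of how a run with these parameters can be set up with TRL's `PPOTrainer` (this follows TRL's standard gpt2-imdb sentiment example and its pre-0.12 `PPOConfig` API; the exact script used for this model is an assumption, and `negative_reward` is the sketch from above):

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import LengthSampler

# PPO configuration matching the parameters listed above.
config = PPOConfig(
    model_name="lvwerra/gpt2-imdb",
    learning_rate=1.41e-5,
    batch_size=16,
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

# Build short query prompts from the start of each IMDB review.
dataset = load_dataset("stanfordnlp/imdb", split="train")

def tokenize(sample):
    sample["input_ids"] = tokenizer.encode(sample["text"])[:8]
    sample["query"] = tokenizer.decode(sample["input_ids"])
    return sample

dataset = dataset.map(tokenize)
dataset.set_format(type="torch")

def collator(data):
    return {key: [d[key] for d in data] for key in data[0]}

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer,
                         dataset=dataset, data_collator=collator)

# Sample response lengths between output_min_length and output_max_length.
length_sampler = LengthSampler(4, 16)

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    response_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, length_sampler=length_sampler,
        do_sample=True, pad_token_id=tokenizer.eos_token_id,
    )
    texts = [q + r for q, r in
             zip(batch["query"], tokenizer.batch_decode(response_tensors))]
    # Higher reward for more negative continuations.
    rewards = [torch.tensor(r) for r in negative_reward(texts)]
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```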
### Results
![image/png](https://cdn-uploads.huggingface.co/production/uploads/671ad995ca9561981190dbb4/ndneRnA3jP563cKMEtMth.png)
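
For the eventual rewarded-soups experiment, here is a minimal sketch of linear weight interpolation between the two finetuned checkpoints (the positive/negative model ids below are hypothetical placeholders, and this is a sketch of the paper's core idea rather than its full method):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical repository ids for the two finetuned checkpoints.
NEG_MODEL = "user/gpt2-imdb-neg-sentiment"
POS_MODEL = "user/gpt2-imdb-pos-sentiment"

neg = AutoModelForCausalLM.from_pretrained(NEG_MODEL)
pos = AutoModelForCausalLM.from_pretrained(POS_MODEL)

def interpolate_state_dicts(model_a, model_b, lam):
    """Rewarded-soups style merge: theta = (1 - lam) * theta_a + lam * theta_b."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged = {}
    for name, a in sd_a.items():
        b = sd_b[name]
        # Only interpolate floating-point weights; copy other buffers as-is.
        merged[name] = (1.0 - lam) * a + lam * b if a.dtype.is_floating_point else a
    return merged

# lam = 0.5: per the hypothesis above, this should produce
# qualitatively neutral reviews.
merged = AutoModelForCausalLM.from_pretrained(NEG_MODEL)
merged.load_state_dict(interpolate_state_dicts(neg, pos, 0.5))
```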