---
datasets:
- stanfordnlp/imdb
base_model:
- lvwerra/gpt2-imdb
pipeline_tag: text-generation
license: mit
---

# Purpose of this finetuning

Finetune the base model [GPT2-IMDB](https://huggingface.co/lvwerra/gpt2-imdb) using [this DistilBERT sentiment classifier](https://huggingface.co/lvwerra/distilbert-imdb) as a reward function.
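
The exact training script is not included here, but the following is a minimal sketch of one way to wire this up, assuming the TRL `PPOTrainer` API (TRL <= 0.11) and using the classifier's raw NEGATIVE logit as the per-sample reward; the hyperparameters come from the "Model Params" section below:

```python
import torch
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer
from trl.core import LengthSampler

# Hyperparameters from the "Model Params" section of this card
config = PPOConfig(
    model_name="lvwerra/gpt2-imdb",
    learning_rate=1.41e-5,
    batch_size=16,
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# Sample response lengths between output_min_length and output_max_length
output_length_sampler = LengthSampler(4, 16)

# Reward model: the DistilBERT IMDB sentiment classifier
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def negative_sentiment_rewards(texts):
    """Return the classifier's raw NEGATIVE logit as one scalar reward per text."""
    outputs = sentiment_pipe(texts, top_k=None, function_to_apply="none")
    return [
        torch.tensor(next(s["score"] for s in out if s["label"] == "NEGATIVE"))
        for out in outputs
    ]

# Each PPO step then passes these rewards to the trainer, e.g.:
# stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```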

- The goal is to train the GPT2 model to continue a movie review prompt with negative sentiment.
- A separate training run is done to generate positive movie reviews. The eventual goal is to interpolate the weight spaces of the 'positively finetuned' and 'negatively finetuned' models, as in the [rewarded-soups paper](https://arxiv.org/abs/2306.04488), and test whether this yields (qualitatively) neutral reviews; a sketch of that interpolation follows this list.
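
A minimal sketch of the weight-space interpolation, assuming both runs are saved as plain `transformers` checkpoints (the repo ids below are placeholders, not real checkpoints):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo ids for the two finetuned checkpoints
pos_model = AutoModelForCausalLM.from_pretrained("your-username/gpt2-imdb-positive")
neg_model = AutoModelForCausalLM.from_pretrained("your-username/gpt2-imdb-negative")

lam = 0.5  # interpolation coefficient; lam=1.0 recovers the positive model

# Linear interpolation of the two weight sets, as in rewarded soups
with torch.no_grad():
    merged_state = {
        name: lam * param + (1 - lam) * neg_model.state_dict()[name]
        for name, param in pos_model.state_dict().items()
    }

merged_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
merged_model.load_state_dict(merged_state)
```

Both checkpoints share the GPT2 architecture, so their state dicts have identical keys and shapes and can be averaged parameter-by-parameter.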

## Model Params

Here are the training parameters:

- base_model = 'lvwerra/gpt2-imdb'
- dataset = 'stanfordnlp/imdb'
- batch_size = 16
- learning_rate = 1.41e-5
- output_max_length = 16
- output_min_length = 4

Exact wall-clock training time was not recorded, but it was under a couple of hours on a single A6000 GPU.
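
Once trained, the checkpoint can be used as a regular text-generation model. A quick usage sketch (the repo id is a placeholder for this model's actual Hub id):

```python
from transformers import pipeline

# Placeholder repo id; substitute this model's actual Hub id
generator = pipeline("text-generation", model="your-username/gpt2-imdb-negative")

prompt = "This movie was"
print(generator(prompt, max_new_tokens=16, do_sample=True)[0]["generated_text"])
```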

### Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/671ad995ca9561981190dbb4/ndneRnA3jP563cKMEtMth.png)