Update README.md
README.md CHANGED
@@ -12,4 +12,10 @@ tags:
 - sft
 ---
 
-**athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1** further pretrained on 1 epoch of the dirty stories from nothingiisreal/Reddit-Dirty-And-WritingPrompts, with all scores below 2 dropped.
+**athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1** further pretrained on 1 epoch of the dirty stories from nothingiisreal/Reddit-Dirty-And-WritingPrompts, with all scores below 2 dropped.
+
+-----
+
+Why do this? I have a niche use case where I cannot increase compute over 8b, and L3/3.1 are the only models in this size category that meet my needs for logic. However, both versions of L3/3.1 have the damn repetition/token overconfidence problem, and this is meant to disrupt that certainty without disrupting the model's ability to function.
+
+By the way, I *think* it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p
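
For anyone reproducing the data prep described in the diff, here is a minimal sketch of the "all scores below 2 dropped" step, assuming the nothingiisreal/Reddit-Dirty-And-WritingPrompts dataset exposes a numeric `score` column and a `train` split (both are assumptions, not confirmed from the dataset card):

```python
# Minimal sketch of the score filter described in the README diff.
# Assumption: the dataset has a "train" split with a numeric "score" column.
from datasets import load_dataset

ds = load_dataset("nothingiisreal/Reddit-Dirty-And-WritingPrompts", split="train")

# Keep only stories scored 2 or higher before the 1-epoch continued-pretraining run.
filtered = ds.filter(lambda ex: ex["score"] >= 2)
print(f"kept {len(filtered)} of {len(ds)} examples")
```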
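On the lm_head-vs-embeddings question at the end of the README, one cheap check that avoids another training run is to measure how far each module drifted from the base instruct model. This is an illustrative sketch only, not the author's procedure; the base-model ID and the mean-absolute-delta metric are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: the checkpoint was trained from meta-llama/Llama-3.1-8B-Instruct.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)
tuned = AutoModelForCausalLM.from_pretrained(
    "athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1", torch_dtype=torch.bfloat16
)

def drift(a: torch.Tensor, b: torch.Tensor) -> float:
    # Mean absolute weight delta, computed in float32 to avoid bf16 rounding noise.
    return (a.float() - b.float()).abs().mean().item()

print("embed_tokens drift:", drift(base.model.embed_tokens.weight, tuned.model.embed_tokens.weight))
print("lm_head drift:     ", drift(base.lm_head.weight, tuned.lm_head.weight))
```

A noticeably larger lm_head drift would weakly support the lm_head hypothesis, but it still would not prove which module actually drives the looping.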