
Experimental model focused on RP and storytelling. The training method attempts to bring some of the intrigue and style of the base model back into the instruct model.

This model was trained in four stages (use with Llama-3-8B-Instruct or abliterated Llama-3-8B-Instruct variants).

Base Model -- 1 GB of semi-structured pretraining data (context lengths drawn roughly uniformly around 4096 tokens, between 512 and 8192; a chunking sketch follows the list below)

  • Base pretraining phase 1 (Constant LR, text completion -- 20,000 steps, 2/3 epoch)
  • Base pretraining phase 2 (Cosine LR, text completion -- 10,000 steps, 1/3 epoch)
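The card does not show how the pretraining text was chunked to produce that length spread. Below is a minimal sketch, assuming plain-text documents and a Hugging Face tokenizer; the chunk_documents helper and its defaults are hypothetical, not the author's actual pipeline.

import random

def chunk_documents(docs, tokenizer, min_len=512, max_len=8192, seed=3407):
    # Hypothetical helper: split raw documents into chunks whose token counts
    # are drawn uniformly from [min_len, max_len], matching the 512-8192 spread above.
    rng = random.Random(seed)
    chunks = []
    for doc in docs:
        ids = tokenizer(doc, add_special_tokens=False)["input_ids"]
        pos = 0
        while pos < len(ids):
            target = rng.randint(min_len, max_len)  # uniform chunk length
            chunks.append({"text": tokenizer.decode(ids[pos:pos + target])})
            pos += target
    return chunks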

Merge LoRA into instruct model -- 100 MB of structured story-instruct data (all samples aim to fill most of the 8192-token context; a formatting sketch follows the list below)

  • Story-instruct tune phase 1 (Constant LR, ~1250 steps, 1 epoch)
  • Story-instruct tune phase 2 (Cosine LR, ~1250 steps, 1 epoch)
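The exact prompt format of the story-instruct data is not shown. A minimal formatting sketch, assuming the instruct model's own Llama-3 chat template and hypothetical "instruction"/"story" field names:

def build_story_sample(example, tokenizer, ctx_len=8192):
    # Hypothetical sketch: render one instruction/story pair with the instruct
    # model's chat template and check how close it comes to the 8192-token target.
    messages = [
        {"role": "user", "content": example["instruction"]},  # assumed field name
        {"role": "assistant", "content": example["story"]},   # assumed field name
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    n_tokens = len(tokenizer(text, add_special_tokens=False)["input_ids"])
    return {"text": text, "near_full_ctx": n_tokens >= int(0.9 * ctx_len)}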

Trained using https://github.com/unslothai/unsloth. Rough script:
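The script references model, tokenizer, max_seq_length, and train_dataset without defining them. A minimal setup sketch is shown first; the model name, dataset file, and 4-bit choice are assumptions, not the author's actual values.

import torch
from datasets import load_dataset
from unsloth import FastLanguageModel

max_seq_length = 8192  # matches the 8192-token context targeted above

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Meta-Llama-3-8B-Instruct",  # assumption; or an abliterated variant
    max_seq_length = max_seq_length,
    load_in_4bit = False,
)

train_dataset = load_dataset("json", data_files = "train.jsonl", split = "train")  # hypothetical file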

from trl import SFTTrainer
from transformers import TrainingArguments, IntervalStrategy

# Attach LoRA adapters to every attention and MLP projection.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0.05,  # 0 for base pretraining
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = True,    # rank-stabilized LoRA scaling
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        warmup_steps = 45,
        num_train_epochs = 2,  # 1 for base pretraining
        fp16 = not torch.cuda.is_bf16_supported(),  # fall back to fp16 where bf16 is unsupported
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 15,
        logging_dir = "logs",
        report_to = "tensorboard",
        output_dir = "outputs",
        save_strategy = IntervalStrategy.STEPS,
        save_steps = 100,
        save_total_limit = 30,
        optim = "adamw_torch_fused",
        lr_scheduler_type = "cosine",  # "constant" for phase 1 of each stage, "cosine" for phase 2
        learning_rate = 5e-5,
        weight_decay = 0.10,  # 0.15 for base pretraining
        adam_beta1 = 0.88,    # 0.9 for base pretraining
        adam_beta2 = 0.99,    # 0.999 for base pretraining
    ),
)

trainer.train()
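The "merge LoRA into instruct model" step is not included in the script. One way to fold the trained adapter back into the base with peft, assuming the adapter was saved to outputs/final-lora and the same base checkpoint as above (both assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumption; use whichever instruct/abliterated base was trained on
    torch_dtype = torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "outputs/final-lora")  # hypothetical adapter path
merged = merged.merge_and_unload()  # bake the LoRA weights into the base model
merged.save_pretrained("Llama-3-8B-story-merged")
AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct").save_pretrained("Llama-3-8B-story-merged")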