Requires a custom training notebook that will be provided soon.
Distilling SDXL with T5 attention masking: the goal is to teach SDXL's CLIP_L and CLIP_G text encoders to expect the T5 attention mask. Additional finetuning, interpolation, and distillation are still required for full cohesion. This is an ongoing training effort that interpolates T5 into SDXL via a teacher/student process; a sketch of the masked distillation objective follows below.
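The core idea can be illustrated with a masked feature-matching loss: the student's text-conditioning features are pushed toward the teacher's T5 features, with padded tokens excluded via the T5 attention mask. The snippet below is a minimal sketch only, not the notebook's actual code; `masked_distill_loss` and its arguments are hypothetical names.

```python
import torch

def masked_distill_loss(student_feats, teacher_feats, t5_attention_mask):
    """MSE between student and teacher token features, counting only unmasked T5 tokens.

    student_feats, teacher_feats: (batch, seq_len, dim)
    t5_attention_mask:            (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = t5_attention_mask.unsqueeze(-1).to(student_feats.dtype)  # (B, T, 1)
    sq_err = (student_feats - teacher_feats).pow(2) * mask
    # Average over unmasked tokens and feature dimensions only
    denom = mask.sum().clamp(min=1) * student_feats.shape[-1]
    return sq_err.sum() / denom
```

During training, the teacher features would come from cached T5 encoder outputs (see `cache_dir` in the config below), while the student features come from the CLIP_L/CLIP_G path being adapted.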
```python
import torch

config = {
    "epochs": 10,
    "batch_size": 64,
    "learning_rate": 1e-6,  # Lower learning rate for stability
    "save_interval_steps": 10,  # Save a checkpoint every 10 training steps
    "test_save_interval_steps": 10,  # Save test images every 10 training steps
    "checkpoint_dir": "./checkpoints",  # Full diffusers checkpoint folder
    "compact_model_dir": "./compact_model",  # For the final compact model (not used for caching)
    "baseline_test_dir": "./baseline_test",  # For baseline images & captions
    "cache_dir": "./cache",  # Folder for caching T5 outputs and teacher features
    "num_generated_captions": 128,  # Number of captions to generate for training
    "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
    "model_name": "my_interpolative_distillation",  # Folder name for checkpoints
    "seed": 420,
    "device": torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu"),
    "inference_steps": 50,
    "height": 1024,
    "width": 1024,
    "guidance_scale": 7.5,
    "inference_interval": 10,
    "max_caption_length": 512,
    # Batch size for teacher feature caching (set very low to reduce VRAM usage)
    "cache_teacher_batch_size": 64,
}
```
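As a rough illustration of how this config might be consumed at the top of the notebook (seeding, directory setup, and loading the SDXL teacher), the sketch below uses assumed variable names and fp16 loading; it is not the provided notebook's code.

```python
import os
import torch
from diffusers import StableDiffusionXLPipeline

torch.manual_seed(config["seed"])
for key in ("checkpoint_dir", "compact_model_dir", "baseline_test_dir", "cache_dir"):
    os.makedirs(config[key], exist_ok=True)

# Load the frozen SDXL teacher from the configured base model
teacher_pipe = StableDiffusionXLPipeline.from_pretrained(
    config["model_id"],
    torch_dtype=torch.float16,
    use_safetensors=True,
).to(config["device"])
```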