This model is Llama 3.2 3B converted to mLlama's architecture. Only the text weights carried over from Llama 3.2 3B are trained; the cross-attention layers and the vision encoder added by the conversion are untrained.
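
To picture what such a conversion does to the layer stack, here is a minimal sketch, not the script used for this model. It assumes the text layers keep their original order and that new cross-attention layers are slotted in at chosen indices of the converted stack. The function name `interleave_layers`, the layer count of 28, and the cross-attention positions are all illustrative assumptions; check the model's `config.json` for the real values.

```python
def interleave_layers(num_text_layers, cross_attention_positions):
    """Sketch of how a text-only stack maps into an interleaved stack.

    Returns (layout, mapping):
      layout  - list of layer kinds in the converted stack
      mapping - dict from original text-layer index to new index
    """
    cross = set(cross_attention_positions)
    total = num_text_layers + len(cross)
    if any(p < 0 or p >= total for p in cross):
        raise ValueError("cross-attention position outside converted stack")

    layout, mapping = [], {}
    src = 0  # index of the next original text layer to place
    for new_idx in range(total):
        if new_idx in cross:
            # Newly added layer: no pretrained weights exist for it.
            layout.append("cross_attn")
        else:
            # Existing text layer: weights are copied from the source model.
            layout.append("self_attn")
            mapping[src] = new_idx
            src += 1
    return layout, mapping


if __name__ == "__main__":
    # Assumed values for illustration only: 28 text layers and
    # hypothetical cross-attention slots at every fifth index from 3.
    layout, mapping = interleave_layers(28, [3, 8, 13, 18, 23, 28, 33])
    for i, kind in enumerate(layout):
        note = " (untrained)" if kind == "cross_attn" else ""
        print(f"layer {i:2d}: {kind}{note}")
```

Under these assumptions, every `self_attn` entry corresponds to a trained Llama 3.2 3B layer, while every `cross_attn` entry is one of the untrained layers mentioned above.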