Getting shape mismatch while loading saved Pixtral model
Hi, thank you for creating this transformers-compatible version of Pixtral. I am saving the model to my local drive and then loading it again. However, I get a size mismatch for the QKV matrices of "language_model", as shown below. I would appreciate some help. Thanks!
>>> from transformers import LlavaForConditionalGeneration
>>> model_id = "mistral-community/pixtral-12b"
>>> model = LlavaForConditionalGeneration.from_pretrained(model_id)
Loading checkpoint shards: 100%|██████████| 6/6 [00:04<00:00, 1.27it/s]
>>> model.save_pretrained("pixtral-12b", from_pt = True)
>>> model2 = LlavaForConditionalGeneration.from_pretrained("pixtral-12b")
Loading checkpoint shards: 100%|██████████| 11/11 [00:02<00:00, 4.68it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4224, in from_pretrained
) = cls._load_pretrained_model(
File "/data/sandbox/anaconda/envs/pixtral/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4852, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
size mismatch for language_model.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for language_model.model.layers.1.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
......
Just updating that replacing the config.json works. Basically, when I do save_pretrained, the config.json that gets saved is different from the one in this repo. Replacing it with the config.json from this repo works. I am wondering why save_pretrained doesn't save the correct config? Thanks.
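For anyone hitting this before a proper fix, here is a minimal sketch of that workaround (assuming huggingface_hub is installed and the save directory from the repro above):

```python
from huggingface_hub import hf_hub_download
import shutil

# Replace the config.json written by save_pretrained with the original
# config.json from the Hub repo.
original_config = hf_hub_download("mistral-community/pixtral-12b", "config.json")
shutil.copy(original_config, "pixtral-12b/config.json")
```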
Hey, thanks for reporting. This is related to the default values we have in the Mistral config within transformers. Saving the config does not store head_dim, which causes errors when loading it back. I will make an easy fix by updating the config for now.
UPDATE: sorry, I realized this cannot be fixed by just updating the config and needs a fix at the transformers level. Will submit a PR soon.
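You can confirm what is (not) being written with a quick check on the saved directory (a sketch, using the paths from the repro above):

```python
import json

# Inspect the config written by save_pretrained; on affected versions,
# head_dim is missing from the text_config section.
with open("pixtral-12b/config.json") as f:
    saved = json.load(f)

print("head_dim" in saved.get("text_config", {}))  # False on affected versions
```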
@RaushanTurganbay I've also been tracking this issue. The issue is that while the config specifies text_config.model_type to be mistral, the default config's text_config is created with model_type llama:
config = LlavaConfig.from_pretrained("path")
type(config.text_config)
# <class 'transformers.models.mistral.configuration_mistral.MistralConfig'>
default_config = LlavaConfig()
type(default_config.text_config)
# <class 'transformers.models.llama.configuration_llama.LlamaConfig'>
This mismatch between the default config type and the loaded config type causes issues when save_pretrained serializes the config via to_json_string with use_diff=True.
It just so happens that the default LlamaConfig().head_dim is 128, which is the true value that should be saved. When the diff between the default config and the config being saved is computed, the value 128 in the saving config is compared to the value 128 in the default config; they are equal, and therefore head_dim is not written to the saved config.json. When the model is later reloaded without head_dim, the Mistral config falls back to hidden_size // num_attention_heads = 5120 // 32 = 160, which is exactly where the 5120 and 1280 projection shapes in the traceback come from.
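Here is a small sketch of that behaviour (assuming an affected transformers version; the exact shape of the diff output may vary):

```python
import json
from transformers import LlavaConfig

# The diff is computed against a default LlavaConfig, whose default text
# backbone is a LlamaConfig with head_dim = 4096 // 32 = 128. Pixtral's
# head_dim=128 therefore looks like a default and is dropped from the output.
config = LlavaConfig.from_pretrained("mistral-community/pixtral-12b")
print(config.text_config.head_dim)  # 128

serialized = json.loads(config.to_json_string(use_diff=True))
print("head_dim" in serialized.get("text_config", {}))  # False on affected versions
```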
I believe the best fix is to set is_composition=True, which matches how other composed configs are created:
https://github.com/huggingface/transformers/pull/36077
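For context, the direction is roughly the following (a sketch only, assuming the to_diff_dict behaviour described above; see the PR itself for the actual change):

```python
from transformers import PretrainedConfig

class LlavaConfig(PretrainedConfig):
    model_type = "llava"
    # Marking the config as a composition makes to_diff_dict skip the diff
    # against a default class instance, so sub-config values like head_dim
    # are serialized even when they happen to match an unrelated default.
    is_composition = True
    ...
```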