llama.cpp convert problem report (about `tokenizer.json`)

#2
by DataSoul - opened

I attempted to convert this model to gguf using the convert_hf_to_gguf.py script from llama.cpp, but encountered an error:

```
FileNotFoundError: File not found: F:\OpensourceAI-models\SuperNova-Medius\tokenizer.model
Exception: data did not match any variant of untagged enum ModelWrapper at line 757443 column 3
```

After downloading `tokenizer.json` from Qwen2.5-14B and replacing this model's file of the same name with it, I was able to convert the model to gguf successfully.

I made a rough comparison of the two `tokenizer.json` files and found them mostly identical apart from some formatting differences. This model's `tokenizer.json` has one additional line, `"ignore_merges": false`, while the other parts appear unchanged.
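The comparison above can be done programmatically. The sketch below is a minimal illustration, not the actual diff I ran: `diff_model_keys` is a hypothetical helper, and the two dicts are toy fragments that only mimic the shape of the real files (a full `tokenizer.json` is far larger).

```python
import json

def diff_model_keys(a: dict, b: dict) -> set:
    """Return top-level keys of the BPE "model" section that appear
    in one tokenizer.json but not the other (hypothetical helper)."""
    keys_a = set(a.get("model", {}))
    keys_b = set(b.get("model", {}))
    return keys_a ^ keys_b  # symmetric difference

# Toy fragments standing in for the two files (not the real contents):
this_model = {"model": {"type": "BPE", "vocab": {}, "merges": [],
                        "ignore_merges": False}}
qwen_model = {"model": {"type": "BPE", "vocab": {}, "merges": []}}

print(diff_model_keys(this_model, qwen_model))
```

On real files you would load each with `json.load(open(path, encoding="utf-8"))` first; here the difference reported is just the extra `ignore_merges` key.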

I am unsure of the cause of this issue, nor do I know whether others might encounter the same problem, so I am reporting it here for reference.
