

---
title: 【Exploration】Converting BlueLM-7B-Chat to a Standard GGUF Model
urlname: Convert-BlueLM-7B-Chat-to-the-standard-GGUF-model
index_img: https://api.limour.top/randomImg?d=2024-02-03 22:38:07
date: 2024-02-04 06:38:07
tags: [llama, exploration]
---

Prepare the model

  • Runtime environment

    # conda create -n llamaConvert python=3.10 git -c conda-forge
    # conda activate llamaConvert
    # cd D:\llama
    # git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
    # cd llama.cpp
    # python -m pip install -r requirements.txt
    # pip install tiktoken
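    # Download through the hf-mirror.com mirror by pointing HF_ENDPOINT at it: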
    $env:HF_ENDPOINT="https://hf-mirror.com"; python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='vivo-ai/BlueLM-7B-Chat-32K', local_dir=r'D:\models\BlueLM-7B')"
    # Use vivo-ai/BlueLM-7B-Chat instead; not sure how to adapt the 32K version's ntkmixed length-extrapolation scheme
    
  • Initial model structure

    BlueLMForCausalLM(
      (model): BlueLMModel(
        (embed_tokens): Embedding(100096, 4096, padding_idx=3)
        (embed_layer_norm): LayerNorm((4096,), eps=1e-06, elementwise_affine=True)
        (layers): ModuleList(
          (0-31): 32 x BlueLMDecoderLayer(
            (self_attn): BlueLMAttention(
              (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): BlueLMRotaryEmbedding()
            )
            (mlp): BlueLMMLP(
              (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
              (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (act_fn): SiLU()
              (dropout): Dropout(p=0, inplace=False)
            )
            (input_layernorm): BlueLMRMSNorm()
            (post_attention_layernorm): BlueLMRMSNorm()
          )
        )
        (norm): BlueLMRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=100096, bias=False)
    )
    

    Fold embed_layer_norm into the embeddings
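
llama.cpp's converter expects the standard LLaMA layout, which has no LayerNorm on the embedding output, so the extra embed_layer_norm has to be folded into the embedding weights first. Loading the checkpoint on a machine without flash_attn can fail at import time; besides editing modeling_bluelm.py as noted in the code below, a minimal sketch of an alternative is to stub the module before calling from_pretrained (the attribute names here are assumptions; adjust them to whatever the remote code actually imports):

```python
import sys, types

# Hypothetical stub: satisfy `import flash_attn` inside modeling_bluelm.py
# without installing the package. We only need the module graph for weight
# surgery, never a real fused-attention forward pass.
fa = types.ModuleType("flash_attn")
fa.flash_attn_func = None         # assumed import target
fa.flash_attn_varlen_func = None  # assumed import target
sys.modules["flash_attn"] = fa    # must run before from_pretrained()
```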

    from transformers import AutoModelForCausalLM
    import torch
    
    # Beforehand, change the parts of modeling_bluelm.py that use flash_attn to None; the model is never actually run, only its structure is needed
    tmp = AutoModelForCausalLM.from_pretrained(r'D:\models\BlueLM-7B',
                                     torch_dtype=torch.bfloat16,
                                     trust_remote_code=True)
    
    test_i = torch.arange(0, 10, dtype=torch.long)
    
    embedding = tmp.model.embed_tokens
    layer_norm = tmp.model.embed_layer_norm
    
    test_o_o = embedding(test_i)
    test_o_o = layer_norm(test_o_o)
    
    for param in embedding.parameters():
        if len(param.shape) > 1:
            # LayerNorm normalizes over the last dim, i.e. row-wise on the
            # (100096, 4096) embedding matrix, so folding it in is exact
            param.data = layer_norm(param.data)
    
    test_o_c = embedding(test_i)
    
    print(torch.allclose(test_o_o, test_o_c, atol=1e-4))
    
    del tmp.model.embed_layer_norm
    tmp.save_pretrained(r'D:\models\BlueLM')
    # Remember to manually copy over the files that are still missing
    # Also delete the rope scaling type from config.json
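
Why the folding is exact: an embedding lookup only selects one row of the weight matrix, and LayerNorm normalizes each row independently over the hidden dimension, so applying it to the whole matrix ahead of time commutes with the lookup. This is what the `torch.allclose` check above verifies on tokens 0 through 9:

$$E' := \mathrm{LN}(E), \qquad \mathrm{embed}'(t) = E'_{t,:} = \mathrm{LN}\!\left(E_{t,:}\right) = \mathrm{LN}(\mathrm{embed}(t))$$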
    
  • Structure after removing embed_layer_norm

    BlueLMForCausalLM(
      (model): BlueLMModel(
        (embed_tokens): Embedding(100096, 4096, padding_idx=3)
        (layers): ModuleList(
          (0-31): 32 x BlueLMDecoderLayer(
            (self_attn): BlueLMAttention(
              (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): BlueLMRotaryEmbedding()
            )
            (mlp): BlueLMMLP(
              (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
              (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
              (act_fn): SiLU()
              (dropout): Dropout(p=0, inplace=False)
            )
            (input_layernorm): BlueLMRMSNorm()
            (post_attention_layernorm): BlueLMRMSNorm()
          )
        )
        (norm): BlueLMRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=100096, bias=False)
    )
    

    Test run

    With embed_layer_norm removed, the layout now matches a stock LLaMA model apart from the larger vocabulary, so llama.cpp's convert.py can handle it directly.

    conda activate llamaConvert
    cd D:\llama\llama.cpp
    # --padvocab pads the tokenizer with dummy tokens to match the model's 100096-entry embedding
    python convert.py D:\models\BlueLM --padvocab
    # => Wrote D:\models\BlueLM\ggml-model-f16.gguf
    conda activate llamaCpp
    cd D:\llama-cublas
    # Quantize the f16 GGUF to Q5_K_M, then smoke-test with 25 layers offloaded to the GPU
    .\quantize.exe D:\models\BlueLM\ggml-model-f16.gguf D:\models\BlueLM\ggml-model-Q5_K_M.gguf Q5_K_M
    .\main.exe -m D:\models\BlueLM\ggml-model-Q5_K_M.gguf -ngl 25 -c 1024 --interactive-first
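
For non-interactive smoke tests, the prompt can be assembled in BlueLM's chat format. A minimal sketch, assuming the `[|Human|]:` / `[|AI|]:` turn markers shown on the BlueLM model card (verify against the released tokenizer config before relying on them):

```python
def bluelm_prompt(history, query):
    # Assumption: BlueLM's model card formats turns as
    #   [|Human|]:{question}[|AI|]:{answer}
    # with no separators between turns.
    parts = [f"[|Human|]:{q}[|AI|]:{a}" for q, a in history]
    parts.append(f"[|Human|]:{query}[|AI|]:")
    return "".join(parts)

# Pass the result to main.exe via -p "..." and stop generation at the
# next user turn with the reverse prompt -r "[|Human|]:"
print(bluelm_prompt([], "Who wrote Romance of the Three Kingdoms?"))
```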