limour-dev committed 7 months ago
parent commit 3cae2ea3a4

+38 -1
source/_posts/2024-01-01-【记录】win10平台6G显存运行Qwen-1-8B.md

{% note info %}
With 6 GB of VRAM, the most usable model in my experience is Yi-6B-Chat-Q4_K_M.
tigerbot-13b runs at 4.6 tokens/s on an R5 5600H, with about 60% CPU utilization at 3.5 GHz; this points to a memory-bandwidth bottleneck rather than a compute one.
{% endnote %}
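A quick back-of-the-envelope check of that claim; the ~8 GB weight size below is my assumption for a ~4-bit quant of a 13B model, not a measured number:

```python
# Each generated token has to stream essentially all weights from RAM once.
weights_gb = 8.0    # assumption: a ~4-bit quant of a 13B model is ~8 GB on disk
tokens_per_s = 4.6  # measured speed from the note above
print(f"required bandwidth ~ {weights_gb * tokens_per_s:.0f} GB/s")  # ~37 GB/s
# Dual-channel DDR4-3200 peaks at 51.2 GB/s and sustains noticeably less in
# practice, so ~37 GB/s is close to the ceiling, consistent with a RAM bottleneck.
```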

## Appendix: Quantizing on Colab
+ [llm2gguf.ipynb](https://colab.research.google.com/drive/173nKy0pYTtzS-xw09qBdO5lBEHlwEhDQ?usp=sharing)
+ [The quantized results](https://huggingface.co/datasets/Limour/CausalLM-7B-GGUF)
### Install llama.cpp
```ipython
# Shallow-clone llama.cpp and install the Python deps for its convert script
!git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
%cd llama.cpp
!python -m pip install -r requirements.txt
!pip install tiktoken  # needed by some tokenizers, e.g. Qwen-style models
```
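Optionally, confirm the clone landed where the later steps expect it and that the conversion script is present; this check is my own addition, not part of the original notebook:

```ipython
import os
# After the %cd above, the working directory is /content/llama.cpp
assert os.path.exists('/content/llama.cpp/convert.py'), 'convert.py missing'
!git -C /content/llama.cpp log --oneline -1  # show which commit was cloned
```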
### Convert the model
```ipython
import os
from huggingface_hub import snapshot_download
from huggingface_hub import login, CommitScheduler
from google.colab import userdata
# login(token=os.environ.get("HF_TOKEN"), write_permission=True)
login(token=userdata.get('HF_TOKEN'), write_permission=True)  # token from Colab secrets
!mkdir -p /content/CausalLM
# Download only the PyTorch .bin weights; skip the other formats
snapshot_download(repo_id='CausalLM/7B', local_dir=r'/content/CausalLM', ignore_patterns=['*.h5', '*.ot', '*.msgpack', '*.safetensors'])
# Writes /content/CausalLM/ggml-model-f16.gguf
!python convert.py --vocab-type bpe --pad-vocab --outtype f16 /content/CausalLM
```
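If the conversion succeeded, the output is a GGUF file, and GGUF files start with the 4-byte ASCII magic `GGUF`; a quick check against the default output path used above:

```ipython
with open('/content/CausalLM/ggml-model-f16.gguf', 'rb') as f:
    magic = f.read(4)
print(magic)  # expect b'GGUF'
assert magic == b'GGUF', 'conversion did not produce a GGUF file'
```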
### Quantize the model
```ipython
# Build llama.cpp with CMake to get the quantize binary
!mkdir build
%cd build
!apt install -y cmake
!cmake ..
!cmake --build . --config Release
!/content/llama.cpp/build/bin/quantize --help  # lists the supported quant types
!mkdir -p /content/CausalLM-7B-GGUF
# Requantize the f16 GGUF down to the 4-bit non-linear IQ4_NL format
!/content/llama.cpp/build/bin/quantize /content/CausalLM/ggml-model-f16.gguf /content/CausalLM-7B-GGUF/causallm_7b.IQ4_NL.gguf IQ4_NL
```
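A quick size comparison confirms the quantization actually shrank the model; I would expect the IQ4_NL file to come in somewhere around 30% of the f16 size (a rough expectation, not a guarantee):

```ipython
import os
for p in ('/content/CausalLM/ggml-model-f16.gguf',
          '/content/CausalLM-7B-GGUF/causallm_7b.IQ4_NL.gguf'):
    print(f'{p}: {os.path.getsize(p) / 2**30:.2f} GiB')
```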
### Upload the model
```ipython
# Commit the folder to the dataset repo on a background thread
# (CommitScheduler pushes every 5 minutes by default)
scheduler = CommitScheduler(repo_id='Limour/CausalLM-7B-GGUF', repo_type='dataset', folder_path='/content/CausalLM-7B-GGUF')
```
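Because the scheduler uploads asynchronously, the Colab runtime has to stay alive until the commit has gone through; `scheduler.trigger()` forces an immediate commit. As a final sanity check, here is a minimal sketch of pulling the published file back down and loading it; it assumes `llama-cpp-python`, which the steps above do not install:

```ipython
!pip install llama-cpp-python

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

scheduler.trigger().result()  # force an immediate commit and wait for it
path = hf_hub_download(repo_id='Limour/CausalLM-7B-GGUF', repo_type='dataset',
                       filename='causallm_7b.IQ4_NL.gguf')
llm = Llama(model_path=path, n_ctx=2048)
print(llm('Hello', max_tokens=16)['choices'][0]['text'])
```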