limour-dev, 7 months ago
Commit faac132500

+ 32 - 17
source/_posts/2024-01-01-【记录】win10平台6G显存运行Qwen-1-8B.md

@@ -99,38 +99,53 @@ tigerbot-13b on the R5 5600H: inference speed 4.6 tokens/s, CPU usage 60%, freq
 {% endnote %}
 
 ## Extra: Quantizing on Colab
-+ [llm2gguf.ipynb](https://colab.research.google.com/drive/173nKy0pYTtzS-xw09qBdO5lBEHlwEhDQ?usp=sharing)
-+ [Quantized results](https://huggingface.co/datasets/Limour/CausalLM-7B-GGUF)
++ [llm2gguf.ipynb](https://colab.research.google.com/drive/1JT3XFjD7CTRB97pu3QpeGuzWA1yYEAM7?usp=sharing)
++ [Quantized results](https://huggingface.co/Limour/CausalLM-14B-GGUF)
 ### Install llama.cpp
 ```ipython
 !git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
-%cd llama.cpp
-!python -m pip install -r requirements.txt
-!pip install tiktoken
+%cd /content/llama.cpp
+!LLAMA_CUDA=1 make -j
 ```
-### Convert the model
+### Compute the imatrix
+```ipython
+%cd /content
+!wget -O transient.txt.gz https://huggingface.co/datasets/Limour/b-corpus/resolve/main/00-preview/00-transient.txt.gz?download=true
+!gunzip transient.txt.gz
+!mkdir -p /content/CausalLM-14B-GGUF
+!wget -O /content/CausalLM-14B-GGUF/causallm_14b.Q8_0.gguf  https://huggingface.co/TheBloke/CausalLM-14B-GGUF/resolve/main/causallm_14b.Q8_0.gguf?download=true
+!/content/llama.cpp/imatrix -m /content/CausalLM-14B-GGUF/causallm_14b.Q8_0.gguf -f /content/transient.txt -ngl 36
+```
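The calibration corpus is decompressed with `gunzip` above. As a minimal pure-Python alternative (hypothetical file names, standard-library only), the same step can be sketched with the `gzip` module:

```python
import gzip
import shutil

def gunzip_file(src: str, dst: str) -> None:
    # Decompress a gzip-compressed corpus file; roughly equivalent to
    # `gunzip src`, except the original .gz file is left in place.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```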
+### Log in to Hugging Face
 ```ipython
-from huggingface_hub import snapshot_download
-from huggingface_hub import login, CommitScheduler
 from google.colab import userdata
+from huggingface_hub import login
 # login(token=os.environ.get("HF_TOKEN"), write_permission=True)
 login(token=userdata.get('HF_TOKEN'), write_permission=True)
+# from huggingface_hub import notebook_login
+# notebook_login()
+```
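The commented-out lines above show two ways to supply the token. A hedged sketch of a fallback helper (assuming the secret is named `HF_TOKEN` in both places) that works inside and outside Colab:

```python
import os

def get_hf_token() -> "str | None":
    # Prefer Colab's secret store when available; outside Colab the
    # google.colab import fails and we fall back to the environment
    # variable of the same name (assumption: both are called HF_TOKEN).
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get("HF_TOKEN")
    except ImportError:
        return os.environ.get("HF_TOKEN")
```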
+### (Skipped) Convert the model
+```ipython
+%cd llama.cpp
+!python -m pip install -r requirements.txt
+!pip install tiktoken
+from huggingface_hub import snapshot_download
+!mkdir -p /content/CausalLM
 snapshot_download(repo_id='CausalLM/7B', local_dir=r'/content/CausalLM', ignore_patterns=['*.h5', '*.ot', '*.msgpack', '*.safetensors'])
 !python convert.py --vocab-type bpe --pad-vocab --outtype f16 /content/CausalLM 
 ```
 ### Quantize the model
 ```ipython
-!mkdir build
-%cd build
-!apt install cmake
-!cmake ..
-!cmake --build . --config Release
-!/content/llama.cpp/build/bin/quantize --help
-!mkdir -p /content/CausalLM-7B-GGUF
-!/content/llama.cpp/build/bin/quantize /content/CausalLM/ggml-model-f16.gguf /content/CausalLM-7B-GGUF/causallm_7b.IQ4_NL.gguf IQ4_NL
+!/content/llama.cpp/quantize --allow-requantize --imatrix /content/imatrix.dat /content/CausalLM-14B-GGUF/causallm_14b.Q8_0.gguf /content/CausalLM-14B-GGUF/causallm_14b.IQ3_XS.gguf IQ3_XS
 ```
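Requantizing from Q8_0 down to IQ3_XS shrinks the file substantially. A rough back-of-the-envelope sketch, assuming approximate llama.cpp averages of about 8.5 bits per weight for Q8_0 and about 3.3 for IQ3_XS (both figures are estimates, not exact):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Rough file-size estimate: bytes ≈ parameters * bits_per_weight / 8.
    return n_params * bits_per_weight / 8 / 1e9

# For a 14B-parameter model (approximate bits-per-weight assumptions):
print(gguf_size_gb(14e9, 8.5))  # Q8_0 source
print(gguf_size_gb(14e9, 3.3))  # IQ3_XS target
```

Under these assumptions the IQ3_XS file comes out well under half the Q8_0 size, which is why the imatrix step matters: it helps the aggressive 3-bit quant keep quality.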
 ### Upload the model
 ```ipython
-CommitScheduler(repo_id='Limour/CausalLM-7B-GGUF', repo_type='dataset', folder_path='/content/CausalLM-7B-GGUF')
+from huggingface_hub import HfApi
+api = HfApi()
+api.upload_file(
+    path_or_fileobj="/content/CausalLM-14B-GGUF/causallm_14b.IQ3_XS.gguf",
+    path_in_repo="causallm_14b.IQ3_XS.gguf",
+    repo_id="Limour/CausalLM-14B-GGUF"
+)
 ```
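Note that, unlike the old `CommitScheduler` call which targeted a dataset repo (`repo_type='dataset'`), `upload_file` defaults to `repo_type="model"`, matching the new model repo. To keep `path_in_repo` from drifting out of sync with the local file, it can simply be derived from the path (hypothetical path mirroring the cell above):

```python
from pathlib import Path

# Derive the in-repo filename from the local path so the two always match.
local_path = "/content/CausalLM-14B-GGUF/causallm_14b.IQ3_XS.gguf"
path_in_repo = Path(local_path).name
print(path_in_repo)  # causallm_14b.IQ3_XS.gguf
```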