TheBloke committed on
Commit ceec419
1 Parent(s): d80582f

Upload README.md

Files changed (1)
  1. README.md +98 -34
README.md CHANGED
@@ -1,19 +1,21 @@
1
  ---
 
 
 
 
2
  language:
3
  - en
4
  - de
5
  - es
6
  - fr
7
  license: unknown
8
- datasets:
9
- - tiiuae/falcon-refinedweb
10
- model_name: Falcon 180B
11
- inference: false
12
  model_creator: Technology Innovation Institute
13
- model_link: https://huggingface.co/tiiuae/falcon-180B
14
  model_type: falcon
 
 
 
15
  quantized_by: TheBloke
16
- base_model: tiiuae/falcon-180B
17
  ---
18
 
19
  <!-- header start -->
@@ -37,23 +39,25 @@ base_model: tiiuae/falcon-180B
37
  - Model creator: [Technology Innovation Institute](https://huggingface.co/tiiuae)
38
  - Original model: [Falcon 180B](https://huggingface.co/tiiuae/falcon-180B)
39
 
 
40
  ## Description
41
 
42
  This repo contains GGUF format model files for [Technology Innovation Institute's Falcon 180B](https://huggingface.co/tiiuae/falcon-180B).
43
 
 
44
  <!-- README_GGUF.md-about-gguf start -->
45
  ### About GGUF
46
 
47
  GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
48
 
49
- The key benefit of GGUF is that it is a extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.
50
 
51
- Here are a list of clients and libraries that are known to support GGUF:
52
- * [llama.cpp](https://github.com/ggerganov/llama.cpp).
53
- * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions.
54
- * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with full GPU accel across multiple platforms and GPU architectures. Especially good for story telling.
55
- * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS.
56
  * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection.
 
57
  * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
58
  * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
59
  * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
@@ -76,12 +80,14 @@ Here are a list of clients and libraries that are known to support GGUF:
76
  ```
77
 
78
  <!-- prompt-template end -->
 
 
79
  <!-- compatibility_gguf start -->
80
  ## Compatibility
81
 
82
- These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9)
83
 
84
- They are now also compatible with many third party UIs and libraries - please see the list at the top of the README.
85
 
86
  ## Explanation of quantisation methods
87
  <details>
@@ -154,21 +160,75 @@ del falcon-180b.Q8_0.gguf-split-a falcon-180b.Q8_0.gguf-split-b
154
  </details>
155
  <!-- README_GGUF.md-provided-files end -->
156
 
157
- <!-- README_GGUF.md-how-to-run start -->
158
- ## Example `llama.cpp` command
159
 
160
- Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9) or later.
161
 
162
- For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.
163
 
 
 
164
  ```
165
- ./main -t 10 -ngl 32 -m falcon-180b.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
166
  ```
167
- Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`. If offloading all layers to GPU, set `-t 1`.
168
 
169
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
170
 
171
- Change `-c 4096` to the desired sequence length for this model. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
172
 
173
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
174
 
@@ -182,35 +242,37 @@ Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://git
182
 
183
  You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
184
 
185
- ### How to load this model from Python using ctransformers
186
 
187
  #### First install the package
188
 
189
- ```bash
 
 
190
  # Base ctransformers with no GPU acceleration
191
- pip install ctransformers>=0.2.24
192
  # Or with CUDA GPU acceleration
193
- pip install ctransformers[cuda]>=0.2.24
194
- # Or with ROCm GPU acceleration
195
- CT_HIPBLAS=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
196
- # Or with Metal GPU acceleration for macOS systems
197
- CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
198
  ```
199
 
200
- #### Simple example code to load one of these GGUF models
201
 
202
  ```python
203
  from ctransformers import AutoModelForCausalLM
204
 
205
  # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
206
- llm = AutoModelForCausalLM.from_pretrained("TheBloke/Falcon-180B-GGUF", model_file="falcon-180b.q4_K_M.gguf", model_type="falcon", gpu_layers=50)
207
 
208
  print(llm("AI is going to"))
209
  ```
210
 
211
  ## How to use with LangChain
212
 
213
- Here's guides on using llama-cpp-python or ctransformers with LangChain:
214
 
215
  * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
216
  * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
@@ -225,10 +287,12 @@ For further support, and discussions on these models and AI in general, join us
225
 
226
  [TheBloke AI's Discord server](https://discord.gg/theblokeai)
227
 
228
- ## Thanks, and how to contribute.
229
 
230
  Thanks to the [chirper.ai](https://chirper.ai) team!
231
 
 
 
232
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
233
 
234
  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
@@ -240,7 +304,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
240
 
241
  **Special thanks to**: Aemon Algiz.
242
 
243
- **Patreon special mentions**: Russ Johnson, J, alfie_i, Alex, NimbleBox.ai, Chadd, Mandus, Nikolai Manek, Ken Nordquist, ya boyyy, Illia Dulskyi, Viktor Bowallius, vamX, Iucharbius, zynix, Magnesian, Clay Pascal, Pierre Kircher, Enrico Ros, Tony Hughes, Elle, Andrey, knownsqashed, Deep Realms, Jerry Meng, Lone Striker, Derek Yates, Pyrater, Mesiah Bishop, James Bentley, Femi Adebogun, Brandon Frisco, SuperWojo, Alps Aficionado, Michael Dempsey, Vitor Caleffi, Will Dee, Edmond Seymore, usrbinkat, LangChain4j, Kacper Wikieł, Luke Pendergrass, John Detwiler, theTransient, Nathan LeClaire, Tiffany J. Kim, biorpg, Eugene Pentland, Stanislav Ovsiannikov, Fred von Graf, terasurfer, Kalila, Dan Guido, Nitin Borwankar, 阿明, Ai Maven, John Villwock, Gabriel Puliatti, Stephen Murray, Asp the Wyvern, danny, Chris Smitley, ReadyPlayerEmma, S_X, Daniel P. Andersen, Olakabola, Jeffrey Morgan, Imad Khwaja, Caitlyn Gatomon, webtim, Alicia Loh, Trenton Dambrowitz, Swaroop Kallakuri, Erik Bjäreholt, Leonard Tan, Spiking Neurons AB, Luke @flexchar, Ajan Kanaga, Thomas Belote, Deo Leter, RoA, Willem Michiel, transmissions 11, subjectnull, Matthew Berman, Joseph William Delisle, David Ziegler, Michael Davis, Johann-Peter Hartmann, Talal Aujan, senxiiz, Artur Olbinski, Rainer Wilmers, Spencer Kim, Fen Risland, Cap'n Zoog, Rishabh Srivastava, Michael Levine, Geoffrey Montalvo, Sean Connelly, Alexandros Triantafyllidis, Pieter, Gabriel Tamborski, Sam, Subspace Studios, Junyu Yang, Pedro Madruga, Vadim, Cory Kujawski, K, Raven Klaugh, Randy H, Mano Prime, Sebastain Graf, Space Cruiser
244
 
245
 
246
  Thank you to all my generous patrons and donaters!
 
1
  ---
2
+ base_model: tiiuae/falcon-180B
3
+ datasets:
4
+ - tiiuae/falcon-refinedweb
5
+ inference: false
6
  language:
7
  - en
8
  - de
9
  - es
10
  - fr
11
  license: unknown
 
 
 
 
12
  model_creator: Technology Innovation Institute
13
+ model_name: Falcon 180B
14
  model_type: falcon
15
+ prompt_template: '{prompt}
16
+
17
+ '
18
  quantized_by: TheBloke
 
19
  ---
20
 
21
  <!-- header start -->
 
39
  - Model creator: [Technology Innovation Institute](https://huggingface.co/tiiuae)
40
  - Original model: [Falcon 180B](https://huggingface.co/tiiuae/falcon-180B)
41
 
42
+ <!-- description start -->
43
  ## Description
44
 
45
  This repo contains GGUF format model files for [Technology Innovation Institute's Falcon 180B](https://huggingface.co/tiiuae/falcon-180B).
46
 
47
+ <!-- description end -->
48
  <!-- README_GGUF.md-about-gguf start -->
49
  ### About GGUF
50
 
51
  GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
52
 
53
+ Here is an incomplete list of clients and libraries that are known to support GGUF:
54
 
55
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
56
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
57
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling.
58
+ * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
 
59
  * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection.
60
+ * [Faraday.dev](https://faraday.dev/), an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
61
  * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
62
  * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
63
  * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
 
80
  ```
81
 
82
  <!-- prompt-template end -->
83
+
84
+
85
  <!-- compatibility_gguf start -->
86
  ## Compatibility
87
 
88
+ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221)
89
 
90
+ They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
91
 
92
  ## Explanation of quantisation methods
93
  <details>
 
160
  </details>
161
  <!-- README_GGUF.md-provided-files end -->
162
 
163
+ <!-- README_GGUF.md-how-to-download start -->
164
+ ## How to download GGUF files
165
+
166
+ **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
167
+
168
+ The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
169
+ - LM Studio
170
+ - LoLLMS Web UI
171
+ - Faraday.dev
172
+
173
+ ### In `text-generation-webui`
174
+
175
+ Under Download Model, you can enter the model repo: TheBloke/Falcon-180B-GGUF and below it, a specific filename to download, such as: falcon-180b.Q4_K_M.gguf.
176
+
177
+ Then click Download.
178
+
179
+ ### On the command line, including multiple files at once
180
+
181
+ I recommend using the `huggingface-hub` Python library:
182
 
183
+ ```shell
184
+ pip3 install huggingface-hub
185
+ ```
186
+
187
+ Then you can download any individual model file to the current directory, at high speed, with a command like this:
188
+
189
+ ```shell
190
+ huggingface-cli download TheBloke/Falcon-180B-GGUF falcon-180b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
191
+ ```
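
The same single-file download can also be scripted from Python. This is a minimal sketch, assuming the `huggingface-hub` package from the install step is available; `hf_hub_download` is that library's function, while `download_kwargs` and `download_model_file` are hypothetical helper names used only for illustration.

```python
REPO_ID = "TheBloke/Falcon-180B-GGUF"
FILENAME = "falcon-180b.Q4_K_M.gguf"


def download_kwargs(local_dir="."):
    """Arguments mirroring the CLI call: repo, one file, target directory."""
    return {"repo_id": REPO_ID, "filename": FILENAME, "local_dir": local_dir}


def download_model_file(local_dir="."):
    """Download one GGUF file and return its local path."""
    # Imported lazily so that defining the helpers does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**download_kwargs(local_dir))


# Usage (this fetches a very large file, so it is left commented out):
# path = download_model_file()
# print(path)
```

Keeping the arguments in a separate helper makes it easy to check what will be requested before starting a long download.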
192
+
193
+ <details>
194
+ <summary>More advanced huggingface-cli download usage</summary>
195
+
196
+ You can also download multiple files at once with a pattern:
197
+
198
+ ```shell
199
+ huggingface-cli download TheBloke/Falcon-180B-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
200
+ ```
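
To preview which files a pattern such as `*Q4_K*gguf` would select, the matching can be approximated locally. This is a sketch that assumes `--include` uses shell-style (fnmatch-like) matching; the file list is illustrative, and `falcon-180b.Q4_K_S.gguf` is a hypothetical name that this README does not confirm.

```python
from fnmatch import fnmatchcase

# Illustrative file names; only some of them appear in this README.
files = [
    "falcon-180b.Q4_K_M.gguf",
    "falcon-180b.Q4_K_S.gguf",  # hypothetical file name
    "falcon-180b.Q8_0.gguf-split-a",
    "README.md",
]

pattern = "*Q4_K*gguf"

# Keep only the names that match the whole pattern (case-sensitive).
selected = [name for name in files if fnmatchcase(name, pattern)]
print(selected)
```

Under this assumption, a split file such as `falcon-180b.Q8_0.gguf-split-a` is not matched by a pattern that ends in `gguf`, so a broader pattern like `*Q8_0*` would be needed for split files.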
201
+
202
+ For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
203
+
204
+ To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:
205
 
206
+ ```shell
207
+ pip3 install hf_transfer
208
+ ```
209
+
210
+ And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
211
 
212
+ ```shell
213
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/Falcon-180B-GGUF falcon-180b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
214
  ```
215
+
216
+ Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
217
+ </details>
218
+ <!-- README_GGUF.md-how-to-download end -->
219
+
220
+ <!-- README_GGUF.md-how-to-run start -->
221
+ ## Example `llama.cpp` command
222
+
223
+ Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
224
+
225
+ ```shell
226
+ ./main -ngl 32 -m falcon-180b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
227
  ```
 
228
 
229
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
230
 
231
+ Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
232
 
233
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
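
The adjustments described above can be sketched as a small Python helper that assembles the argument list. It uses only the flags from the example command; `build_main_command` and its parameter names are hypothetical, and the default values simply mirror the example rather than being recommendations.

```python
import shlex


def build_main_command(model_path, prompt, n_gpu_layers=32, context_length=2048):
    """Assemble the ./main argument list; omit -ngl when nothing is offloaded."""
    cmd = ["./main"]
    if n_gpu_layers > 0:
        # Only pass -ngl when some layers should be offloaded to the GPU.
        cmd += ["-ngl", str(n_gpu_layers)]
    cmd += [
        "-m", model_path,
        "--color",
        "-c", str(context_length),
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", prompt,
    ]
    return cmd


# CPU-only variant: no -ngl flag is emitted.
cpu_only = build_main_command("falcon-180b.Q4_K_M.gguf", "AI is going to", n_gpu_layers=0)
print(shlex.join(cpu_only))  # shell-quoted form of the command
```

The returned list can be passed directly to `subprocess.run` if launching `llama.cpp` from a script is preferred over typing the command.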
234
 
 
242
 
243
  You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
244
 
245
+ ### How to load this model in Python code, using ctransformers
246
 
247
  #### First install the package
248
 
249
+ Run one of the following commands, according to your system:
250
+
251
+ ```shell
252
  # Base ctransformers with no GPU acceleration
253
+ pip install ctransformers
254
  # Or with CUDA GPU acceleration
255
+ pip install ctransformers[cuda]
256
+ # Or with AMD ROCm GPU acceleration (Linux only)
257
+ CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
258
+ # Or with Metal GPU acceleration for macOS systems only
259
+ CT_METAL=1 pip install ctransformers --no-binary ctransformers
260
  ```
261
 
262
+ #### Simple ctransformers example code
263
 
264
  ```python
265
  from ctransformers import AutoModelForCausalLM
266
 
267
  # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
268
+ llm = AutoModelForCausalLM.from_pretrained("TheBloke/Falcon-180B-GGUF", model_file="falcon-180b.Q4_K_M.gguf", model_type="falcon", gpu_layers=50)
269
 
270
  print(llm("AI is going to"))
271
  ```
272
 
273
  ## How to use with LangChain
274
 
275
+ Here are guides on using llama-cpp-python and ctransformers with LangChain:
276
 
277
  * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
278
  * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
 
287
 
288
  [TheBloke AI's Discord server](https://discord.gg/theblokeai)
289
 
290
+ ## Thanks, and how to contribute
291
 
292
  Thanks to the [chirper.ai](https://chirper.ai) team!
293
 
294
+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
295
+
296
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
297
 
298
  If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
 
304
 
305
  **Special thanks to**: Aemon Algiz.
306
 
307
+ **Patreon special mentions**: Pierre Kircher, Stanislav Ovsiannikov, Michael Levine, Eugene Pentland, Andrey, 준교 김, Randy H, Fred von Graf, Artur Olbinski, Caitlyn Gatomon, terasurfer, Jeff Scroggin, James Bentley, Vadim, Gabriel Puliatti, Harry Royden McLaughlin, Sean Connelly, Dan Guido, Edmond Seymore, Alicia Loh, subjectnull, AzureBlack, Manuel Alberto Morcote, Thomas Belote, Lone Striker, Chris Smitley, Vitor Caleffi, Johann-Peter Hartmann, Clay Pascal, biorpg, Brandon Frisco, sidney chen, transmissions 11, Pedro Madruga, jinyuan sun, Ajan Kanaga, Emad Mostaque, Trenton Dambrowitz, Jonathan Leane, Iucharbius, usrbinkat, vamX, George Stoitzev, Luke Pendergrass, theTransient, Olakabola, Swaroop Kallakuri, Cap'n Zoog, Brandon Phillips, Michael Dempsey, Nikolai Manek, danny, Matthew Berman, Gabriel Tamborski, alfie_i, Raymond Fosdick, Tom X Nguyen, Raven Klaugh, LangChain4j, Magnesian, Illia Dulskyi, David Ziegler, Mano Prime, Luis Javier Navarrete Lozano, Erik Bjäreholt, 阿明, Nathan Dryer, Alex, Rainer Wilmers, zynix, TL, Joseph William Delisle, John Villwock, Nathan LeClaire, Willem Michiel, Joguhyik, GodLy, OG, Alps Aficionado, Jeffrey Morgan, ReadyPlayerEmma, Tiffany J. Kim, Sebastain Graf, Spencer Kim, Michael Davis, webtim, Talal Aujan, knownsqashed, John Detwiler, Imad Khwaja, Deo Leter, Jerry Meng, Elijah Stavena, Rooh Singh, Pieter, SuperWojo, Alexandros Triantafyllidis, Stephen Murray, Ai Maven, ya boyyy, Enrico Ros, Ken Nordquist, Deep Realms, Nicholas, Spiking Neurons AB, Elle, Will Dee, Jack West, RoA, Luke @flexchar, Viktor Bowallius, Derek Yates, Subspace Studios, jjj, Toran Billups, Asp the Wyvern, Fen Risland, Ilya, NimbleBox.ai, Chadd, Nitin Borwankar, Emre, Mandus, Leonard Tan, Kalila, K, Trailburnt, S_X, Cory Kujawski
308
 
309
 
310
  Thank you to all my generous patrons and donaters!