Sébastien De Greef committed on
Commit c9e7f26
1 parent: 255d69d

feat: Add HuggingFace Embeddings Leaderboard link to embeddings.qmd

src/llms/embeddings.qmd CHANGED
@@ -1,6 +1,11 @@
-# Embeddings
+---
+title: Embeddings
+---
+
 Embeddings in Large Language Models (LLMs) are a foundational component in the field of natural language processing (NLP). These embeddings transform words, phrases, or even longer texts into a vector space, capturing the semantic meaning that enables LLMs to perform a variety of language-based tasks with remarkable proficiency. This article focuses on the role of embeddings in LLMs, how they are generated, and their impact on the performance of these models.
 
+[HuggingFace Embeddings Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
+
 ## What are Embeddings in LLMs?
 In the context of LLMs, embeddings are dense vector representations of text. Each vector aims to encapsulate aspects of linguistic meaning such as syntax, semantics, and context. Unlike simpler models that might use one-hot encoding, LLM embeddings map words or tokens to vectors in a way that reflects their semantic and contextual relationships.
 
src/llms/tokenizers.qmd CHANGED
@@ -4,8 +4,8 @@ title: Tokenizers
 
 Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down text into smaller components, such as words, phrases, or symbols. These smaller components are called tokens. Tokenizers, the tools that perform tokenization, play a crucial role in preparing text for various NLP tasks like machine translation, sentiment analysis, and text summarization. This article provides an exhaustive overview of tokenizers, exploring their types, how they function, their importance, and the challenges they present.
 
-[Excellent video of Andrej Karpathy about Tokenizers](https://www.youtube.com/watch?v=zduSFxRajkE)
-[Online Tokenizer Playground](https://gpt-tokenizer.dev/)
+* [Excellent video of Andrej Karpathy about Tokenizers](https://www.youtube.com/watch?v=zduSFxRajkE)
+* [Online Tokenizer Playground](https://gpt-tokenizer.dev/)
 
 ## What is Tokenization?
 
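For the tokenization step described in tokenizers.qmd, here is a minimal sketch of splitting text into tokens and ids. It assumes the `transformers` package is installed; the `gpt2` tokenizer is used only as a familiar example of a byte-pair-encoding (BPE) tokenizer and is not referenced in the file itself.

```python
# Minimal sketch: inspect how a subword (BPE) tokenizer breaks text apart.
# Assumes `transformers` is installed; "gpt2" is an illustrative choice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization is a fundamental step in NLP."
tokens = tokenizer.tokenize(text)  # list of subword strings (BPE pieces)
ids = tokenizer.encode(text)       # integer ids the model actually consumes

print(tokens)
print(ids)
print(tokenizer.decode(ids))       # decoding the ids round-trips the text
```

Playing with inputs like rare words or emoji in the linked Tokenizer Playground shows the same effect interactively: common words map to single tokens, while unusual strings are split into several pieces.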