XsoraS committed
Commit 839d517
1 Parent(s): ad93a84

Update README.md

Files changed (1)
  1. README.md +2 -5
README.md CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: text-generation
 
 # Fine-Tuning NeyabAI on Custom Dataset:
 
-This repository demonstrates how to fine-tune the GPT-2 language model on a custom dataset using PyTorch and Hugging Face's Transformers library. The code provides an end-to-end example, from loading the dataset to training the model and evaluating its performance.
+This repository demonstrates how to fine-tune the NeyabAI(GPT-2) language model on a custom dataset using PyTorch and Hugging Face's Transformers library. The code provides an end-to-end example, from loading the dataset to training the model and evaluating its performance.
 
 ## Requirements
 
@@ -26,15 +26,12 @@ This repository demonstrates how to fine-tune the GPT-2 language model on a cust
 - NumPy
 
 You can install the required packages using pip:
-
 ```bash
 pip install torch transformers numpy
 ```
 
 ## Fine-Tuning Script
-
 The following script outlines the steps for fine-tuning GPT-2 on a custom dataset:
-
 ```python
 from transformers import GPT2LMHeadModel, GPT2TokenizerFast, AdamW
 import torch
@@ -52,7 +49,7 @@ dataset = ["Your custom dataset goes here."] # Replace with your actual dataset
 
 # Tokenization function
 def tokenize_function(examples):
-    return tokenizer(examples, padding='max_length', truncation=True, max_length=400)
+    return tokenizer(examples, padding='max_length', truncation=True, max_length=512)
 
 # Tokenize the dataset
 tokenized_inputs = [tokenize_function(text) for text in dataset]
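
The change above raises the README example's truncation limit from 400 to 512 tokens. For context, here is a minimal runnable sketch of that tokenization step; the `tokenizer` setup, the `pad_token` assignment, and the tensor conversion are assumptions (the full README script is not part of this diff), shown only to illustrate how the updated `max_length` is used.

```python
# Minimal sketch of the updated tokenization step (max_length=512).
# Assumptions: the README builds `tokenizer` from the "gpt2" checkpoint and
# later converts the tokenized dataset to tensors; neither detail appears
# in this diff.
from transformers import GPT2TokenizerFast
import torch

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default

dataset = ["Your custom dataset goes here."]  # placeholder, as in the README

def tokenize_function(examples):
    # Pad or truncate every example to the new 512-token limit.
    return tokenizer(examples, padding='max_length', truncation=True, max_length=512)

tokenized_inputs = [tokenize_function(text) for text in dataset]

# Stack the per-example encodings into batch tensors for training.
input_ids = torch.tensor([enc['input_ids'] for enc in tokenized_inputs])
attention_mask = torch.tensor([enc['attention_mask'] for enc in tokenized_inputs])
```

Both 400 and 512 fit within GPT-2's 1024-token context window, so the change mainly affects how long padded sequences are and therefore memory use per batch.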