# Fine-Tuning NeyabAI on a Custom Dataset

This repository demonstrates how to fine-tune the NeyabAI (GPT-2) language model on a custom dataset using PyTorch and Hugging Face's Transformers library. The code provides an end-to-end example, from loading the dataset to training the model and evaluating its performance.

## Requirements

- PyTorch
- Transformers
- NumPy

You can install the required packages using pip:

```bash
pip install torch transformers numpy
```

## Fine-Tuning Script

The following script outlines the steps for fine-tuning GPT-2 on a custom dataset:

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch
from torch.optim import AdamW  # transformers' bundled AdamW is deprecated; torch's is a drop-in replacement

# Load the pretrained tokenizer and model. These lines are elided in the diff
# view; standard from_pretrained loading is assumed, with "gpt2" standing in
# for the actual checkpoint name.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 defines no padding token, so reuse the end-of-text token; otherwise
# padding='max_length' below raises an error
tokenizer.pad_token = tokenizer.eos_token

dataset = ["Your custom dataset goes here."]  # Replace with your actual dataset

# Tokenization function: pad or truncate every example to a fixed length
def tokenize_function(examples):
    return tokenizer(examples, padding='max_length', truncation=True, max_length=512)

# Tokenize the dataset
tokenized_inputs = [tokenize_function(text) for text in dataset]
```
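
Each call to `tokenize_function` returns a `BatchEncoding`, a dict-like object holding the `input_ids` and `attention_mask` lists for one example; the sketch at the end of this section converts these into tensors for training.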
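
The script above stops at tokenization, while the introduction promises training and evaluation as well. The sketch below fills in those remaining steps under stated assumptions: the learning rate, epoch count, one-example-at-a-time processing, and perplexity as the evaluation metric are illustrative choices, not part of the original script.

```python
import torch
from torch.optim import AdamW

# Training: a standard causal-LM loop over the tokenized examples
model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

for epoch in range(3):  # assumed epoch count
    for encoding in tokenized_inputs:
        input_ids = torch.tensor([encoding["input_ids"]])
        attention_mask = torch.tensor([encoding["attention_mask"]])
        # For causal-LM fine-tuning the labels are the inputs themselves;
        # positions set to -100 are ignored by the loss, so mask out padding
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")

# Evaluation: the exponentiated mean loss is the model's perplexity. A real
# run would use a held-out split; the toy dataset is reused here only to
# illustrate the calls.
model.eval()
with torch.no_grad():
    losses = []
    for encoding in tokenized_inputs:
        input_ids = torch.tensor([encoding["input_ids"]])
        attention_mask = torch.tensor([encoding["attention_mask"]])
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        losses.append(model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss)
    print(f"perplexity: {torch.exp(torch.stack(losses).mean()).item():.2f}")
```

In practice you would batch the examples with a `DataLoader` and evaluate on a separate split; the single-example loop keeps the sketch aligned with the list of encodings produced above.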