# Fine-Tuning NeyabAI on a Custom Dataset

This repository demonstrates how to fine-tune the NeyabAI (GPT-2) language model on a custom dataset using PyTorch and Hugging Face's Transformers library. The code provides an end-to-end example, from loading the dataset to training the model and evaluating its performance.

## Requirements

- PyTorch
- Transformers
- NumPy

You can install the required packages using pip:

```bash
pip install torch transformers numpy
```

## Fine-Tuning Script

The following script outlines the steps for fine-tuning GPT-2 on a custom dataset:

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch
from torch.optim import AdamW  # transformers' bundled AdamW is deprecated; torch's is a drop-in replacement

# Load the pretrained tokenizer and model. These lines are elided in the diff
# view; standard from_pretrained loading is assumed, with "gpt2" standing in
# for the actual checkpoint name.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 defines no padding token, so reuse the end-of-text token; otherwise
# padding='max_length' below raises an error
tokenizer.pad_token = tokenizer.eos_token

dataset = ["Your custom dataset goes here."]  # Replace with your actual dataset

# Tokenization function: pad or truncate every example to a fixed length
def tokenize_function(examples):
    return tokenizer(examples, padding='max_length', truncation=True, max_length=512)

# Tokenize the dataset
tokenized_inputs = [tokenize_function(text) for text in dataset]
```
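
Each call to `tokenize_function` returns a `BatchEncoding`, a dict-like object holding the `input_ids` and `attention_mask` lists for one example; the sketch at the end of this section converts these into tensors for training.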
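
The script above stops at tokenization, while the introduction promises training and evaluation as well. The sketch below fills in those remaining steps under stated assumptions: the learning rate, epoch count, one-example-at-a-time processing, and perplexity as the evaluation metric are illustrative choices, not part of the original script.

```python
import torch
from torch.optim import AdamW

# Training: a standard causal-LM loop over the tokenized examples
model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

for epoch in range(3):  # assumed epoch count
    for encoding in tokenized_inputs:
        input_ids = torch.tensor([encoding["input_ids"]])
        attention_mask = torch.tensor([encoding["attention_mask"]])
        # For causal-LM fine-tuning the labels are the inputs themselves;
        # positions set to -100 are ignored by the loss, so mask out padding
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")

# Evaluation: the exponentiated mean loss is the model's perplexity. A real
# run would use a held-out split; the toy dataset is reused here only to
# illustrate the calls.
model.eval()
with torch.no_grad():
    losses = []
    for encoding in tokenized_inputs:
        input_ids = torch.tensor([encoding["input_ids"]])
        attention_mask = torch.tensor([encoding["attention_mask"]])
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        losses.append(model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss)
    print(f"perplexity: {torch.exp(torch.stack(losses).mean()).item():.2f}")
```

In practice you would batch the examples with a `DataLoader` and evaluate on a separate split; the single-example loop keeps the sketch aligned with the list of encodings produced above.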