File size: 1,788 Bytes
fcda435
 
 
cd7229c
 
ad23529
 
 
0451f0f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ad23529
2b48317
 
ad23529
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
---
datasets:
- samhog/psychology-10k
---

# Psychology Alpaca 🍩
This is a LLaMA-7B language model trained on 10.000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU from Google Colab. The model shows some knowledge in the field of psychology and generally performs better than its base model parent.

### Background 💡
This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* and *AI feedback*.

### Paper 📜
"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found [here](https://www.diva-portal.org/smash/record.jsf?dswid=3835&pid=diva2%3A1782683&c=2&searchType=SIMPLE&language=en&query=rlhf&af=%5B%5D&aq=%5B%5B%5D%5D&aq2=%5B%5B%5D%5D&aqe=%5B%5D&noOfRows=50&sortOrder=author_sort_asc&sortOrder2=title_sort_asc&onlyFullText=false&sf=undergraduate)!

### Usage 🏂
```
from peft import PeftModel
from transformers import LLaMATokenizer, LLaMAForCausalLM, GenerationConfig

tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load model weights
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Add Peft layer to initial weights in order to get the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
```

**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)


**Authors:**
Samuel Höglund, samhog@kth.se;
Josef Khedri, jkhedri@kth.se