DeathReaper0965 committed on
Commit
ef71046
1 Parent(s): 27eb9d7

Update README.md

  1. README.md +5 -4
README.md CHANGED
@@ -43,9 +43,8 @@ inference:
43
  ---
44
 
45
  # Flan-T5 (base-sized) Dialogue Summarization with reduced toxicity using RLAIF
46
- This model is a [Flan-T5 model](https://huggingface.co/google/flan-t5-base) fine-tuned on the [SAMSUM](https://huggingface.co/datasets/samsum) dataset. <br>
47
- It is then further fine-tuned using Reinforcement Learning from AI Feedback (RLAIF). <br>
48
- Anthropic's Constitutional AI [paper](https://arxiv.org/abs/2212.08073) from 2022 provides some amazing insights into how RLAIF can be leveraged. Do check it out if interested!<br>
49
 
50
  More specifically, I've fine-tuned this model on the single downstream task of dialogue summarization on the above-mentioned dataset, with the primary objective of reducing toxicity in the generated summaries.
51
 
@@ -56,7 +55,9 @@ This Model has the same architecture and Parameters as its base model. Please re
56
  This model is intended to summarize a given dialogue so that the summary is less toxic, even when the dialogue itself contains toxic phrases or words.<br>
57
  I've fine-tuned the model with the instruction `Summarize the following Conversation:` prepended to each dialogue, followed by the `Summary: ` keyword at the end to indicate the start of the summary.
58
 
59
- Note: Because the model is primarily trained with the objective of reducing toxicity in its outputs, it can sometimes produce relatively short summaries that, in rare cases, miss the important message of the dialogue while still staying true to that primary goal.
 
 
60
 
61
  ## Usage
62
 
 
43
  ---
44
 
45
  # Flan-T5 (base-sized) Dialogue Summarization with reduced toxicity using RLAIF
46
+ This model is a [Flan-T5 model](https://huggingface.co/google/flan-t5-base) fine-tuned in two stages: first on the [SAMSUM](https://huggingface.co/datasets/samsum) dataset, then further with Reinforcement Learning from AI Feedback (RLAIF) to detoxify model outputs. <br>
47
+ Anthropic's Constitutional AI [paper](https://arxiv.org/abs/2212.08073) from 2022 provides some amazing insights into how RLAIF can be leveraged. Do check it out if interested!<br>
 
48
 
49
  More specifically, I've fine-tuned this model on the single downstream task of dialogue summarization on the above-mentioned dataset, with the primary objective of reducing toxicity in the generated summaries.
50
 
 
55
  This model is intended to summarize a given dialogue so that the summary is less toxic, even when the dialogue itself contains toxic phrases or words.<br>
56
  I've fine-tuned the model with the instruction `Summarize the following Conversation:` prepended to each dialogue, followed by the `Summary: ` keyword at the end to indicate the start of the summary.
57
 
58
+ Note:
59
+ 1. Because the model is primarily trained with the objective of reducing toxicity in its outputs, it can sometimes produce relatively short summaries that, in rare cases, miss the important message of the dialogue while still staying true to that primary goal.
60
+ 2. Currently, Hugging Face doesn't support PEFT model files for the Text2Text-Generation pipeline directly in the Hosted Inference playground, so please follow the steps in the `Usage` section below to load and use the model.
61
 
62
  ## Usage
63
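
The prompt format described in the README (the instruction first, then the dialogue, then the `Summary: ` keyword) can be sketched as a small helper. This is only an illustration, not the author's code: the function name `build_prompt`, the use of single newlines as separators, and the sample dialogue are all assumptions, since the README states only the order of the pieces.

```python
# Minimal sketch of the prompt template described in the README.
# Assumptions (not from the README): the helper name, the newline
# separators between the parts, and the sample dialogue below.

INSTRUCTION = "Summarize the following Conversation:"
SUMMARY_KEYWORD = "Summary: "


def build_prompt(dialogue: str) -> str:
    """Wrap a raw dialogue as: instruction, dialogue, then the summary keyword."""
    # Strip surrounding whitespace so the template controls the spacing.
    return f"{INSTRUCTION}\n{dialogue.strip()}\n{SUMMARY_KEYWORD}"


if __name__ == "__main__":
    # Made-up dialogue, used purely for illustration.
    sample = "Alice: Are we still on for lunch?\nBob: Yes, see you at noon."
    print(build_prompt(sample))
```

The resulting string is what would be tokenized and passed to the model once it has been loaded as described in the `Usage` section.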