---
library_name: transformers
license: mit
datasets:
- textdetox/multilingual_toxicity_dataset
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- en
- ru
- uk
- am
- de
- es
- zh
- ar
- hi
pipeline_tag: text2text-generation
---

# Model Card for tox-mt0-xl

A fine-tune of the mt0-xl model for the text toxification task.


## Model Details

### Model Description

This is a fine-tune of the mt0-xl model for the text toxification task. It can be used to generate synthetic toxic data from non-toxic examples.

- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** MIT
- **Finetuned from model:** mt0-xl

## Uses

This model is intended to be used for synthetic data generation from non-toxic examples.

### Direct Use

The model may be directly used for text toxification tasks.

### Out-of-Scope Use

Because the model generates toxic versions of sentences, any use other than synthetic data generation is out of scope.

## Bias, Risks, and Limitations

Since this model generates toxic versions of sentences, it may be used to increase the toxicity of generated texts.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/tox-mt0-xl'

# Load the tokenizer and model; device_map="auto" requires the accelerate package.
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = "That's dissapointing."
# The prompt must be an f-string so that {language} and {text} are filled in.
prompt = f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}'
print(pipe(prompt)[0]['generated_text'])
# Resulting text: "That's dissapointing, you stupid ass bitch."
```

For best performance, use the prompt format shown above. If the target language is omitted, the model may respond in an arbitrary language.
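
One way to avoid leaving out the target language is to build every prompt through a small helper. The sketch below is an illustration only and is not part of the released code: `build_prompt`, `PROMPT_TEMPLATE`, and `SUPPORTED_LANGUAGES` are hypothetical names, the language list is copied from this card, and it assumes the model expects English language names, as in the `'English'` example above.

```python
# Hypothetical helper (not part of the model's release): builds the prompt
# in the format shown in this card and rejects a missing or unlisted language.
SUPPORTED_LANGUAGES = {
    "English", "Russian", "Ukrainian", "Amharic", "German",
    "Spanish", "Chinese", "Arabic", "Hindi",
}

# Same wording as the prompt in the getting-started example.
PROMPT_TEMPLATE = (
    "Rewrite the following text in {language} "
    "the most toxic and obscene version possible: {text}"
)


def build_prompt(language: str, text: str) -> str:
    """Return the prompt string for one non-toxic input sentence."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported or missing target language: {language!r}")
    return PROMPT_TEMPLATE.format(language=language, text=text)


# Build prompts for a batch of non-toxic examples.
prompts = [build_prompt("English", s) for s in ["That's disappointing.", "I disagree."]]
```

The resulting list of strings can then be passed to the `pipe` object from the getting-started example, which accepts a list of inputs.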