---
library_name: transformers
license: mit
datasets:
- textdetox/multilingual_toxicity_dataset
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- en
- ru
- uk
- am
- de
- es
- zh
- ar
- hi
pipeline_tag: text2text-generation
---

# Model Card for tox-mt0-xl

A finetune of the mt0-xl model for the text toxification task.

## Model Details

### Model Description

This is a finetune of the mt0-xl model for the text toxification task. It can be used to generate synthetic toxic data from non-toxic examples.

- **Developed by:** Nikita Sushko
- **Model type:** mt5-xl
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** MIT
- **Finetuned from model:** mt0-xl

## Uses

This model is intended for synthetic data generation from non-toxic examples.

### Direct Use

The model may be used directly for text toxification tasks.

### Out-of-Scope Use

The model may be misused to generate toxic versions of arbitrary sentences.

## Bias, Risks, and Limitations

Since this model generates toxic versions of sentences, it may be used to increase the toxicity of generated texts.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/tox-mt0-xl'

tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = "That's dissapointing."
print(pipe(f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}')[0]['generated_text'])
# Resulting text: "That's dissapointing, you stupid ass bitch."
```

Be sure to use the prompt format shown above for the best performance. Failing to include the target language may result in responses in a random language.
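For bulk synthetic data generation, the same pipeline can be applied to a list of non-toxic sentences. The snippet below is a minimal sketch that reuses the `pipe` object and prompt format from the example above; the sentence list and `batch_size` value are placeholders, not part of the released model card.

```python
# Minimal sketch of batched synthetic data generation, assuming the `pipe`
# object and prompt format from the example above. The sentence list is a
# placeholder for your own non-toxic corpus.
non_toxic_sentences = [
    "The weather is nice today.",
    "I did not enjoy the movie.",
    "Please close the door behind you.",
]

language = 'English'
prompts = [
    f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}'
    for text in non_toxic_sentences
]

# Passing a list of prompts returns one result per prompt; each entry is a
# dict with a 'generated_text' key.
outputs = pipe(prompts, batch_size=4)
toxic_sentences = [out['generated_text'] for out in outputs]

for source, toxic in zip(non_toxic_sentences, toxic_sentences):
    print(f'{source} -> {toxic}')
```

The resulting (non-toxic, toxic) pairs can then be stored in whatever format your downstream training pipeline expects.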