---
datasets:
- laion/dalle-3-dataset
language:
- en
tags:
- art
- image-to-text
- image-captioning
---

# DALL·E 3 Image prompt reverse-engineering

A pre-trained BLIP image-captioning model fine-tuned on a mixture of `laion/dalle-3-dataset` and semi-automatically gathered `(image, prompt)` pairs from DALL·E 3.

It takes a generated image as input and outputs a candidate prompt for producing such an image, which can then serve as a starting point for generating similar images.

### Usage:

Loading the model and preprocessor:
```python
import torch
from transformers import BlipForConditionalGeneration, AutoProcessor

# Run on GPU if available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = BlipForConditionalGeneration.from_pretrained("blip-dalle3-img2prompt").to(device)
processor = AutoProcessor.from_pretrained("blip-dalle3-img2prompt")
```

Inference example on an image from `laion/dalle-3-dataset`:
```python
from datasets import load_dataset

dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")  # small slice for a fast download in this toy example

img_index = 0  # pick any example from the slice
example = dataset[img_index]
image = example["image"]
caption = example["caption"]

# Preprocess the image and move the tensors to the same device as the model
inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

# Generate a candidate prompt and decode it to text
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Generated caption: {generated_caption}\nReal caption: {caption}")
```
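
The same pipeline works for any image loaded outside the dataset. A minimal sketch, assuming the model and processor from above are already loaded and that `my_dalle3_image.png` is a placeholder path to a locally saved generation:
```python
from PIL import Image

# Load a locally saved DALL·E 3 generation (placeholder path)
image = Image.open("my_dalle3_image.png").convert("RGB")

inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
prompt = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(prompt)
```
`model.generate` also accepts the standard decoding parameters of the `transformers` library, such as `num_beams` or `do_sample=True`, which can be worth tuning if the greedy output is too generic.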