---
title: Image-to-Audio Story Generator
emoji: 🐒
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: unknown
---


# 🖼️ Image to 🎧 Audio Story Generator

This project showcases an end-to-end pipeline that transforms an image into an audio story using various AI models and tools.

## 🌟 Overview

The goal of this project is to leverage AI capabilities to convert an uploaded image into an audio story. 
It uses a combination of image captioning, text generation, and text-to-speech models.
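
As a rough illustration of how the three stages chain together, here is a minimal sketch. The names `run_pipeline` and `StoryResult` are hypothetical and not taken from `app.py`; the stages are passed in as plain callables so the flow can be tried without any model or API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryResult:
    """Outputs of the three stages (hypothetical container)."""
    caption: str
    story: str
    audio: bytes


def run_pipeline(
    image_bytes: bytes,
    captioner: Callable[[bytes], str],
    storyteller: Callable[[str], str],
    synthesizer: Callable[[str], bytes],
) -> StoryResult:
    """Chain the stages: image -> caption -> story -> audio."""
    caption = captioner(image_bytes)
    story = storyteller(caption)
    audio = synthesizer(story)
    return StoryResult(caption=caption, story=story, audio=audio)


# Stand-in stages so the sketch runs offline; real stages would call the models.
demo = run_pipeline(
    b"fake-image-bytes",
    captioner=lambda img: "a monkey sitting on a branch",
    storyteller=lambda cap: f"Once upon a time there was {cap}. The end.",
    synthesizer=lambda text: text.encode("utf-8"),
)
print(demo.caption)
print(demo.story)
```

Keeping each stage behind a callable makes it easy to swap a model or to test the flow with fakes.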

## 🚀 Features

### 📷 Image Captioning
- Utilizes Salesforce's `blip-image-captioning-base` model to generate textual descriptions of uploaded images.
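
A minimal sketch of the captioning step, assuming the model is loaded through the `transformers` `image-to-text` pipeline (the README does not say how `app.py` loads it). The helper names are hypothetical. The pipeline is imported lazily, so the output-parsing logic can be exercised without downloading the model.

```python
from typing import Any, Callable, Dict, List, Optional

MODEL_ID = "Salesforce/blip-image-captioning-base"


def load_captioner() -> Callable[[str], List[Dict[str, Any]]]:
    """Build the captioning pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy dependency

    return pipeline("image-to-text", model=MODEL_ID)


def extract_caption(result: List[Dict[str, Any]]) -> str:
    """Pull the caption out of the pipeline's list-of-dicts output."""
    if not result:
        raise ValueError("captioning model returned no output")
    return result[0]["generated_text"].strip()


def caption_image(
    image_path: str,
    captioner: Optional[Callable[[str], List[Dict[str, Any]]]] = None,
) -> str:
    """Caption an image file; pass `captioner` to reuse a loaded pipeline."""
    captioner = captioner or load_captioner()
    return extract_caption(captioner(image_path))
```

In a long-running app, loading the pipeline once and passing it in as `captioner` avoids reloading the model for every upload.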

### ✍️ Text Generation (Story Creation)
- Employs Meta's `llama-2-70b-chat` model to write a short story based on the image caption. The story is limited to 100 words or fewer and ends on a positive note.
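
The constraints above can be captured in a prompt builder. This is only a sketch: the prompt wording and the helper names are assumptions, not the text used in `app.py`, and the call to the hosted model is left out.

```python
MAX_WORDS = 100  # word limit stated in this README


def build_story_prompt(caption: str, max_words: int = MAX_WORDS) -> str:
    """Turn an image caption into a story prompt (hypothetical wording)."""
    return (
        "You are a storyteller. Write a short story inspired by the scene below.\n"
        f"The story must end on a positive note and use at most {max_words} words.\n\n"
        f"Scene: {caption.strip()}\n"
        "Story:"
    )


def within_word_limit(story: str, max_words: int = MAX_WORDS) -> bool:
    """Check a generated story against the word limit (whitespace-split count)."""
    return len(story.split()) <= max_words


print(build_story_prompt("a monkey sitting on a branch"))
```

Language models do not always respect length instructions, so a check like `within_word_limit` can decide whether to retry or trim the output.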

### 🔊 Text-to-Speech Conversion
- Uses the `espnet/kan-bayashi_ljspeech_vits` model, hosted on Hugging Face, to convert the generated story into an audio file.
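
One way to reach a hosted model is an HTTP request to the Hugging Face Inference API. The sketch below assumes the `api-inference.huggingface.co/models/<model-id>` endpoint pattern and a JSON body with an `inputs` field; both should be checked against the current Hugging Face documentation, since `app.py` may call the model differently. The helper names and the `.flac` extension are assumptions as well.

```python
import json
import os
import urllib.request
from typing import Optional

TTS_MODEL = "espnet/kan-bayashi_ljspeech_vits"
# Assumed endpoint pattern for the hosted Inference API; verify before relying on it.
API_URL = f"https://api-inference.huggingface.co/models/{TTS_MODEL}"


def build_tts_request(text: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for speech synthesis."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, out_path: str = "audio.flac", token: Optional[str] = None) -> str:
    """Send the request and write the returned audio bytes to `out_path`."""
    token = token or os.environ["HUGGINGFACEHUB_API_TOKEN"]
    with urllib.request.urlopen(build_tts_request(text, token), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

Splitting request construction from sending keeps the network call in one place and lets the payload be inspected without a token.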

### 🌐 Streamlit Web App
- Built using Streamlit, allowing users to upload images and visualize the generated image caption, story, and audio.
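
The upload-and-display flow might look roughly like the sketch below. `render_app` and its three callable parameters are hypothetical stand-ins for whatever `app.py` defines; only the Streamlit calls themselves (`st.file_uploader`, `st.image`, `st.spinner`, `st.expander`, `st.write`, `st.audio`) are real API.

```python
from typing import Callable

SUPPORTED_TYPES = ["jpg", "jpeg", "png"]  # formats listed in this README


def render_app(
    caption_fn: Callable[[bytes], str],
    story_fn: Callable[[str], str],
    audio_fn: Callable[[str], bytes],
) -> None:
    """Render the upload widget and show caption, story, and audio."""
    import streamlit as st  # lazy import so the module loads without Streamlit

    st.title("Image to Audio Story Generator")
    uploaded = st.file_uploader("Upload an image", type=SUPPORTED_TYPES)
    if uploaded is None:
        return  # nothing to process until the user picks a file

    st.image(uploaded, caption="Uploaded image")
    with st.spinner("Generating caption, story, and audio..."):
        caption = caption_fn(uploaded.getvalue())
        story = story_fn(caption)
        audio = audio_fn(story)

    with st.expander("Image caption"):
        st.write(caption)
    with st.expander("Story"):
        st.write(story)
    st.audio(audio)  # pass format=... if the audio is not WAV
```

A script launched with `streamlit run` would call `render_app(...)` at module level, passing in the real captioning, story, and speech functions.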

## 📝 Usage

To use this application:

1. Clone this repository.
2. Install the required dependencies using `pip install -r requirements.txt`.
3. Set up the necessary environment variables:
   - `TOGETHER_API_KEY`: your Together AI API key.
   - `HUGGINGFACEHUB_API_TOKEN`: your Hugging Face API token.
4. Run the Streamlit app with `streamlit run app.py`.
5. Upload an image file (supported formats: jpg, jpeg, png).
6. Wait for the AI processing to generate the story and audio.
7. Access the image caption, story, and audio outputs.
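
Because step 3 is easy to miss, a small startup check can report which variables are unset before any model is called. This is a sketch; `missing_env_vars` is a hypothetical helper, not part of `app.py`.

```python
import os
from typing import List, Mapping

REQUIRED_ENV_VARS = ("TOGETHER_API_KEY", "HUGGINGFACEHUB_API_TOKEN")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run it before `streamlit run app.py`, or call `missing_env_vars()` at the top of the app and stop early with a clear message.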

## 📁 Code Structure

- `app.py`: Contains the Streamlit web application code, integrating all functionalities.
- `README.md`: Documentation explaining the project, usage instructions, and dependencies.
- `requirements.txt`: Lists all necessary libraries.

## 🙌 Credits

This project was created with love by @Aditya-Neural-Net-Ninja. 
It makes use of cutting-edge AI models for image analysis, natural language processing, and text-to-speech conversion. 
Special thanks to Streamlit and Hugging Face for their incredible platforms.


**Note:** The application needs valid API credentials for both Together AI and Hugging Face to run successfully.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference