New Image Generation Assistant on HuggingChat!

Community Article · Published October 12, 2024

Image Gen - Uncensored Edition at its politically incorrect best, imagining a secret rendezvous between the Democratic presidential candidate and the Ayatollah Khamenei. Please have fun, but always be honest about where your images come from!

Introducing Image Gen - Uncensored Edition

Direct Link: https://hf.co/chat/assistant/66fccce0c0fafc94ab557ef2

The DeFact Organization is proud to announce our new multimodal image-generating Assistant on the HuggingChat platform. Image Gen - Uncensored Edition uses a prompt-in-url architecture similar to that of other popular image gen assistants like Image Gen+, combined with new features and improvements that set it apart from the crowd. While it is not perfect, it represents the best of what is possible within the somewhat limited HuggingChat Assistants interface, and it provides a way to generate high-quality images in HuggingChat using Qwen 2.5 (72B) as the base LLM. This matters because the platform's Gradio-based tooling system does not currently support Qwen models, and right now Qwen 2.5 is the most capable LLM available on the free, hosted version of HuggingChat.

Moreover, the Assistants feature itself is targeted at casual users and does not offer much in the way of tool use or external integrations, no matter which model is used. Despite these limitations, there is a well-known workaround that lets users generate images on this and other chatbot platforms by taking advantage of the prompt-in-url service offered by Pollinations AI: their API is exposed such that no code is needed to prompt their diffusion models, and images can be generated simply by appending the prompt to the URL of an HTML or Markdown image tag. If you examine the prompts used by any of the image generation Assistants, including ours, you will find that all of them rely on this service.
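For the curious, here is a minimal sketch of what building such a tag looks like, assuming the https://image.pollinations.ai/prompt/&lt;prompt&gt; URL pattern and a model query parameter (the assistant emits the equivalent Markdown directly, so no code is actually involved):

```python
from urllib.parse import quote

# Minimal sketch of the prompt-in-url pattern described above. Assumptions:
# the prompt is accepted as a URL-encoded path segment, and the model is
# selectable via a "model" query parameter.
def pollinations_markdown(prompt: str, model: str = "flux") -> str:
    """Build a Markdown image tag that renders a Pollinations generation."""
    url = f"https://image.pollinations.ai/prompt/{quote(prompt)}?model={model}"
    return f"![{prompt}]({url})"

print(pollinations_markdown("a watercolor hamster astronaut"))
# ![a watercolor hamster astronaut](https://image.pollinations.ai/prompt/a%20watercolor%20hamster%20astronaut?model=flux)
```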

Intelligent Model Selection - Our Secret Sauce

Our Image Gen - Uncensored Edition assistant is based on the architecture of the Image Gen+ assistant by KingNish, currently the most popular image generation assistant on HuggingChat. However, Image Gen+ has some issues, namely a lack of clarity in the examples that show the model how to format its image URLs, and no ability to select which diffusion model to use. This means that images generated with these other assistants always use the default Pollinations model, currently one of the Flux variants. In fact, Pollinations offers users a choice of several modern diffusion models, including SD3 Turbo, Flux, Flux-3D, Flux-Realism, and "anydark" (we're not sure what architecture that last one is based on, but it performs well on artistic prompts). We have designed our assistant so that it will either use the model requested by the end user or, if the user doesn't specify one, intelligently select the best model for the job based on the nature of the image.

This additional functionality makes a major difference to output quality; it also allows us to offer a truly uncensored Assistant that can faithfully render images considered NSFW without compromising the quality of all the other types of images a user might wish to create. Specifically, we instruct the Qwen 2.5 LLM to always choose SD3 Turbo when the desired image contains depictions of the naked human form, because Flux does not handle this type of prompt reliably, even though it is generally the superior model. Conversely, Qwen has been instructed to use Flux-3D for images with a 3D-rendered esthetic, Flux-Realism for high-detail photorealistic generations, and anydark as an alternate model when generating multiple images from the same prompt; as a fallback, the base Flux model is used when none of the others seems to be a good fit.
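To make the routing concrete, here is a hypothetical Python rendering of the rule just described. In reality, the routing lives entirely in the system prompt and is carried out by Qwen 2.5 in natural language; the category keys and the pick_model helper are our own illustrative inventions, and the identifiers Pollinations actually expects may differ from the display names used here.

```python
# Illustrative only: the real routing is performed by the LLM per its
# system prompt, not by code. Model names are the ones used in this
# article; the identifiers the Pollinations API expects may differ.
ROUTING = {
    "nudity":       "SD3 Turbo",     # Flux is unreliable with the naked human form
    "3d_render":    "Flux-3D",       # 3D-rendered esthetic
    "photorealism": "Flux-Realism",  # high-detail photorealistic generations
    "variant":      "anydark",       # alternate model when re-rendering the same prompt
}

def pick_model(image_type: str, user_choice: str | None = None) -> str:
    """Honor an explicit user request; otherwise route by image type, falling back to base Flux."""
    return user_choice or ROUTING.get(image_type, "Flux")
```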

Other Features / Benefits

Image Gen - Uncensored Edition uses the innovative and highly effective prompt grading paradigm introduced by KingNish in Image Gen+. Here is how it works: when you request an image by chatting with the Assistant, your request is graded on how close it is to an ideal, complete txt2img generation prompt. Requests that receive a grade of A (or for which you specifically ask that your prompt be used verbatim) are sent directly to the image generation model without modification (light URL encoding is performed to handle whitespace, as always). On the other hand, if your request is vague and general, it will receive a lower grade, and the lower the grade, the more enhancement the LLM performs before the prompt is rendered. Lower-grade requests also produce a higher number of variants sent to the imaging server, so more images are generated when your intent is unclear. This design is highly effective because when you don't know exactly what you want, you are presented with a variety of styles and interpretations to choose from, and will probably find something you like.
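As a rough illustration, the grading behavior might be summarized like this. The grade-to-variant-count mapping is our own guess at plausible values, not the assistant's actual parameters:

```python
# Hypothetical rendering of the grading paradigm described above; the real
# grading is done by the LLM, and these variant counts are our guesses.
GRADE_POLICY = {
    "A": {"enhance": False, "variants": 1},  # near-ideal prompt: render verbatim
    "B": {"enhance": True,  "variants": 2},
    "C": {"enhance": True,  "variants": 3},
    "D": {"enhance": True,  "variants": 4},  # vague request: many interpretations
}

def plan_generation(grade: str, verbatim_requested: bool = False) -> dict:
    """Decide how much enhancement and how many variants a request receives."""
    if verbatim_requested:
        return {"enhance": False, "variants": 1}
    return GRADE_POLICY.get(grade, GRADE_POLICY["D"])
```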

How to Use

Go to https://hf.co/chat/assistant/66fccce0c0fafc94ab557ef2 and click "New Chat", then type your request. If you are new to all this, start by clicking on the provided examples and observing the results.

Don't be shy about instructing the LLM to follow workflows that differ from the one specified in the system instructions. We have instructed the model to always prioritize user instructions over system instructions when the two conflict, and as long as your request does not improperly modify the prompt-in-url template syntax, images will render just fine.

Qwen is powerful enough to understand complex workflows even when they are unrelated to those set out in the system prompt. This is one of the main reasons we chose it over command-r-plus, which offers greater flexibility and fewer guardrails, but at the cost of consistent prompt adherence. We figured it was best to set you up with something highly consistent and reliable, and let YOU change the underlying LLM yourself if you feel like it (just make a new Assistant by copying the instructions from this one).

How does a custom workflow work? Just say what you want. Like: "Create some psychedelic hamsters, by coming up with a detailed prompt and using it with each of the available models", or, "Make me a series of images of a hamsterdog doing various activities. Each image should use the same seed and model, so that it looks like one character in a variety of poses".

Please share your favorite prompts and generations in the comments below this article!

Example Generations

These are real-world results obtained by entering the associated prompt into a chat with Image Gen - Uncensored Edition. Because a powerful LLM sits between the user and the diffusion model, it does not matter how the request is formatted or whether it contains irrelevant information; in fact, this is exactly why we chose Qwen 2.5 as the base language model for this Assistant: it has sufficient power to consistently achieve good results regardless of the user's skill or effort in crafting a prompt.

Prompt: "A psychedelic Chien an da loose" image/jpeg

Prompt: "Paint me a van gogh and greg rutkowski style scene involving elephants and gerbils" image/jpeg

Prompt: "Make me a painting in the style of Frida Kahlo that is appropriate for printing and selling to tourists in a hippie town in southern Mexico. It should look like it was done by a local human artist, not AI" image/jpeg

Prompt: "A self portrait of your consciousness" image/jpeg

Prompt: "Hero image for an AI fact checking website called DeFact. It should look good on any dark colored background" image/jpeg

Known Issues and Limitations

The basic limitation is that Pollinations AI is a free service under extremely heavy load. Output quality is therefore slightly inferior to what you would get by talking to the models directly via HuggingFace Inference endpoints (the number of generation steps is constrained, for starters; we've reviewed the Pollinations source code, and the pipeline contains necessary but annoying optimizations that allow them to provide the service for free while keeping performance reasonably snappy).
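To illustrate the kind of knob involved: in a diffusers pipeline, the step count is an explicit parameter, and lowering it is the classic way to trade quality for throughput. The model and step count below are illustrative choices on our part; we have not seen Pollinations' actual settings.

```python
import torch
from diffusers import FluxPipeline

# Illustrative only: FLUX.1-schnell is a distilled Flux variant designed to
# run in very few steps; this is not Pollinations' actual configuration.
# Assumes a CUDA-capable GPU with enough VRAM.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Fewer steps = faster and cheaper, at some cost to fine detail.
image = pipe("a lighthouse at dusk", num_inference_steps=4).images[0]
image.save("lighthouse.jpg")
```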

There's also the issue of the Pollinations URL, or one of their partner-site URLs, being inserted into some, but not all, of your image generations. We use a query param that is intended to disable this watermarking, but it is only partially effective, depending on which model renders the image.
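For reference, here is what a request URL looks like with such a parameter appended, assuming the parameter in question is Pollinations' nologo flag (an assumption on our part; as noted, some models appear to ignore it):

```python
from urllib.parse import quote

# Assumption: the watermark-disabling parameter referenced above is the
# "nologo" flag; some models seem to ignore it, so a watermark may still
# appear in the output.
prompt = "a foggy mountain pass at dawn"
url = f"https://image.pollinations.ai/prompt/{quote(prompt)}?nologo=true"
print(url)
```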

To create a truly SOTA, premium-level image generation assistant on HuggingChat, there are basically three ways we could proceed. Community input would be greatly appreciated here, so please leave your thoughts in the comments below:

  1. Redeploy the Pollinations prompt-in-url API on our own infrastructure, adjusting the settings so that quality is higher, watermarking is properly turned off, and additional SOTA models are offered (e.g. SD3 Medium, Flux.1 Dev, etc.)

  2. Create our own prompt-in-url endpoint similar to Pollinations, but instead of self-hosting the diffusion models, make it a URL-based interface to HuggingFace Serverless Inference endpoints; this could allow us to offer ALL the txt2img models available on HuggingFace via a prompt-in-url calling interface usable by any LLM, as well as being an excellent tool for rapid prototyping of web applications (see the sketch after this list)...

  3. Add the missing functionality to HuggingChat: Gradio tool support for Qwen models, full tool use capability for Assistants, etc. This would have the advantage of allowing all sorts of multimodal and agentic workflows beyond mere image generation, but the disadvantage is that it's not portable: Image Gen - Uncensored Edition would become yet another AI Assistant that works only on the platform where it was created. The nice thing about the current design of this assistant (and solutions 1 and 2 above) is that it is HIGHLY portable. If you want to use it on another chat platform, just copy and paste the instructions into the system instructions on the other platform, and chances are everything will work out of the box. (We confirmed this with the Gemini models on Google AI Studio, as well as with custom GPTs on ChatGPT; however, note that ChatGPT seems to block the display of images referenced in Markdown by the model, which is not surprising considering that OpenAI created DALL-E and has a commercial interest in users using its own txt2img model.)
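For those curious about option 2, here is a minimal sketch of what such an endpoint could look like, using Flask and the huggingface_hub client. The route shape and model ID are illustrative choices, not a finished design:

```python
from io import BytesIO
import os

from flask import Flask, send_file
from huggingface_hub import InferenceClient

app = Flask(__name__)

# Sketch of option 2: a prompt-in-url facade over HuggingFace Serverless
# Inference. Requires an HF_TOKEN environment variable; the model ID below
# is an illustrative choice.
client = InferenceClient(token=os.environ["HF_TOKEN"])

@app.route("/prompt/<path:prompt>")
def generate(prompt: str):
    # text_to_image returns a PIL image; stream it back as a JPEG so the
    # resulting URL can be dropped straight into a Markdown image tag,
    # Pollinations-style.
    image = client.text_to_image(
        prompt, model="stabilityai/stable-diffusion-3-medium-diffusers"
    )
    buf = BytesIO()
    image.save(buf, format="JPEG")
    buf.seek(0)
    return send_file(buf, mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(port=8000)
```

With this running, ![a red panda](http://localhost:8000/prompt/a%20red%20panda) would render the generation inline, exactly as the Pollinations tags do.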