- AI evolution. Advanced deep learning models now generate images rivaling human-created visuals, revolutionizing the creative process for designers, marketers and artists.
- DALL-E impact. OpenAI's DALL-E has been significant in the development of AI-generated art technologies, with popular AI art generators like Bing Image Creator using the technology.
- Brands benefit. Businesses harness generative AI for image creation, concept development and editing, saving time and money while maintaining quality in the final product.
In recent years, advanced deep learning models have emerged that can generate images from text descriptions with remarkable quality. These AI-generated images now rival human-created visuals, expanding the creative toolkit for designers, marketers and artists alike. A 2022 Shutterstock report indicated that 29% of those polled currently use generative AI and that 14% use it for work. This article explores how AI image generation works and how brands are using it, and offers insights on how to use this technology most effectively.
DALL-E: How It Works
In January 2021, OpenAI, the organization behind the popular ChatGPT generative AI model, announced DALL-E, an AI image generation model. Generating images with DALL-E is a fairly simple process: users type a description of the desired image into a text box, and after a short wait, the model generates an image based on that description. The latest version, DALL-E 2, generates more realistic, higher-resolution images and can also edit existing images, create variations of an image that keep its basic features, and combine the features of two images.
Although not all AI image generation models are derived from DALL-E, it has played a significant role in the development of AI-generated art technologies. Many of the most popular AI art generators, such as Bing Image Creator, are indeed powered by OpenAI's DALL-E technology. As such, it makes sense to discuss how DALL-E, and most generative AI image creation tools, are able to generate images from text descriptions.
Trained on hundreds of millions to billions of image-text pairs, such a model learns which visual features correspond to which words and phrases. This lets it interpret the context and meaning of a prompt and produce visually accurate images. Diffusion-based models, including DALL-E 2, work through a series of iterative refinements: they start from random noise and progressively refine it until the image matches the description. The original DALL-E took a different approach, processing text and image together as a single stream of tokens; its attention mechanism allows each image token to attend to all text tokens.
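The single-stream idea can be pictured as one sequence in which the prompt's text tokens come first and the image tokens follow. The minimal sketch below assumes a plain causal mask (each token sees itself and everything before it) and toy sequence lengths; the real DALL-E used far longer sequences and sparse attention patterns among image tokens, so treat this only as an illustration of why every image token can see the whole prompt. The function names are hypothetical.

```python
from typing import List


def build_stream_mask(num_text: int, num_image: int) -> List[List[bool]]:
    """Return mask[i][j] == True if token i may attend to token j.

    Positions 0..num_text-1 are text tokens; the remaining positions are
    image tokens. A causal mask lets each position see itself and every
    earlier position in the combined stream.
    """
    total = num_text + num_image
    return [[j <= i for j in range(total)] for i in range(total)]


def image_sees_all_text(mask: List[List[bool]], num_text: int) -> bool:
    """Check that every image token can attend to every text token."""
    return all(
        all(mask[i][j] for j in range(num_text))
        for i in range(num_text, len(mask))
    )


if __name__ == "__main__":
    n_text, n_image = 4, 3  # toy sizes, for illustration only
    mask = build_stream_mask(n_text, n_image)
    for i, row in enumerate(mask):
        kind = "text " if i < n_text else "image"
        print(kind, "".join("x" if ok else "." for ok in row))
    print(image_sees_all_text(mask, n_text))
```

Because the text block precedes every image position, a mask that is causal over the combined stream automatically gives each image token a view of the entire prompt.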
The "attention mechanism" refers to a technique that is used in deep learning, particularly in sequence-to-sequence (Seq2Seq) models, such as those used in machine translation and natural language processing tasks. Put simply, the attention mechanism aims to improve the model's ability to focus on the most relevant parts of the input sequence while generating output, thus enhancing its performance.
From this point, it gets a bit more complex. The attention mechanism computes soft weights for each token in the input sequence. Each input token is assigned a value vector computed from its word embedding. The output of the attention mechanism is the weighted average of these value vectors, where each weight reflects how relevant the mechanism calculates that input token to be to the current output token. This is how the AI model gauges the importance of each word in the description the user provides. For example, if the user submitted “Try to create a cartoon of a dog wearing a business suit, walking down the sidewalk,” words such as “try,” “to,” “create,” “a,” “of,” “down” and “the” would carry less weight than “cartoon,” “dog,” “business suit” and “sidewalk” when generating the image.
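The weighted-average computation described above can be sketched in a few lines. This is a minimal sketch of scaled dot-product attention, the variant used in Transformer models, for a single query (one output position). The 2-D vectors are hand-picked toy numbers; a real model derives queries, keys and values from learned projections of token embeddings, and it learns which words matter rather than following a fixed list of unimportant words.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into soft weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector],
           values: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention for one query.

    Returns the soft weight of each input token and the output vector,
    which is the weighted average of the value vectors.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    output = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, output


if __name__ == "__main__":
    # Toy 2-D vectors, hand-picked for illustration (not learned values).
    tokens = ["a", "dog", "suit"]
    keys = [[0.1, 0.0], [2.0, 0.0], [0.0, 2.0]]
    values = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    query = [2.0, 0.0]  # an output position that is "looking for" the dog
    weights, output = attend(query, keys, values)
    for tok, w in zip(tokens, weights):
        print(f"{tok:>5}: {w:.3f}")
    print("output:", [round(x, 3) for x in output])
```

Running it, "dog" receives by far the largest weight, so the output vector is dominated by the dog's value vector, while the filler token "a" contributes very little.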
In a Seq2Seq model, an encoder processes the input, and a decoder generates the output. The attention mechanism enables the model to weigh the importance of different parts of the input sequence when generating each output token. The decoder is the component that actually generates the sequence which forms the response.
Whether or not they are based on DALL-E, most AI image generators share several traits:
- They are designed to refuse requests for sexual, violent or other content that violates the generator’s content policy.
- They use deep learning models to generate realistic images from text or other inputs.
- They can create novel and diverse images that match the user’s imagination and preferences.
- They can handle complex and abstract prompts that involve multiple concepts, attributes and styles.
- They produce better results when the prompt is more specific and detailed.
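Because these tools reward specific prompts, it can help to assemble a prompt from explicit parts (subject, activity, location, style, extra details) rather than typing it ad hoc. A minimal sketch; `PromptSpec` and its fields are hypothetical and not part of any generator's API. The result is just a text string to paste into whichever tool you use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a detailed image prompt."""
    subject: str                    # what the image is of
    action: Optional[str] = None    # what the subject is doing
    setting: Optional[str] = None   # where the scene takes place
    style: Optional[str] = None     # art style, e.g. "a cartoon" or "an oil painting"
    details: List[str] = field(default_factory=list)  # lighting, mood, colors, etc.

    def render(self) -> str:
        """Join the non-empty parts into one comma-separated prompt string."""
        head = f"{self.style} of {self.subject}" if self.style else self.subject
        parts = [head]
        for extra in (self.action, self.setting):
            if extra:
                parts.append(extra)
        parts.extend(self.details)
        return ", ".join(parts)


if __name__ == "__main__":
    vague = PromptSpec(subject="a dog")
    specific = PromptSpec(
        subject="a dog wearing a business suit",
        action="walking down the sidewalk",
        setting="on a busy city street",
        style="a cartoon",
        details=["bright colors", "morning light"],
    )
    print(vague.render())
    print(specific.render())
```

The first prompt renders as just "a dog", while the second renders as "a cartoon of a dog wearing a business suit, walking down the sidewalk, on a busy city street, bright colors, morning light", giving the model far more to work with.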
Now that we have looked under the hood to explain, at least at a basic level, how AI generates images from text descriptions, we can move on to the specifics of the most popular AI image generation models.
Bing Image Creator
Bing Image Creator is a recently released image generation tool from Microsoft. It is powered by DALL-E and is easy to use, and although it’s not perfect, it creates impressive images. It can be used through its website or through Bing Chat. Users describe the image they want, adding context such as location or activity, and select an art style; Bing Image Creator then generates the image.
Users are given 100 image generation “boosts” a day, and once they are used up, generation slows down significantly. Here’s what it created with the following prompt: “A robotic bullfrog smoking a cigarette at the beach.”
Users are presented with four images to select from, and Bing provides tips to fine-tune the image(s), such as requesting a specific style of art.
Stable Diffusion (Dream Studio)
Stable Diffusion was initially released as a tool used on Discord. It was developed by Stability AI, the company founded by Emad Mostaque, with help from the legal, ethics and technology teams at Hugging Face. After extensive testing, the open-source code and AI model were released on Aug. 22, 2022. The announcement stated that “This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.”
Stable Diffusion built on foundations laid by previous generative models like DALL-E 2. Researchers at MIT are exploring continual learning in diffusion models, with the goal of creating an AI system that can "learn" new material without forgetting previously acquired knowledge. Such a system could make AI-generated content more creative and complex and drive a new era of AI-driven art and design. Dream Studio is Stability AI's official web app for Stable Diffusion.
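Diffusion models such as Stable Diffusion, DALL-E 2 and Imagen generate an image by starting from random noise and removing it a little at a time. A minimal sketch of that loop, assuming the linear noise schedule popularized by DDPM and the deterministic DDIM update rule. The trained neural network is replaced by a hypothetical `oracle_noise` function that already knows the target, so the sketch illustrates only the refinement arithmetic, not learning. Stable Diffusion additionally runs this process in a compressed latent space rather than directly on pixels.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def alpha_bar_schedule(steps: int, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> List[float]:
    """Cumulative signal fractions (alpha-bar) for a linear noise schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def ddim_sample(x_T: Vector,
                predict_noise: Callable[[Vector, int], Vector],
                alpha_bars: List[float]) -> Vector:
    """Deterministic DDIM sampling: refine pure noise step by step."""
    x = list(x_T)
    for t in reversed(range(len(alpha_bars))):
        ab_t = alpha_bars[t]
        eps = predict_noise(x, t)
        # Estimate the clean image implied by the current noisy one.
        x0_hat = [(xi - math.sqrt(1 - ab_t) * ei) / math.sqrt(ab_t)
                  for xi, ei in zip(x, eps)]
        # Re-noise that estimate to the (smaller) noise level of the next step.
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = [math.sqrt(ab_prev) * a + math.sqrt(1 - ab_prev) * e
             for a, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    steps = 1000
    alpha_bars = alpha_bar_schedule(steps)
    target = [0.5, -0.2, 0.9]  # stand-in for a tiny "image" (or latent)

    def oracle_noise(x: Vector, t: int) -> Vector:
        # Stand-in for the trained network: it already knows the target,
        # so it can recover the exact noise component of x.
        ab_t = alpha_bars[t]
        return [(xi - math.sqrt(ab_t) * ti) / math.sqrt(1 - ab_t)
                for xi, ti in zip(x, target)]

    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    result = ddim_sample(noise, oracle_noise, alpha_bars)
    print([round(v, 6) for v in result])
```

Each pass estimates the clean image and then adds back a smaller amount of noise. A real network's noise estimates are imperfect, which is why taking many small refinement steps helps.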
Here is an image that was created by Dream Studio using the prompt “A robotic bullfrog smoking a cigarette at the beach.”
Dream Studio offers many more options than the original Stable Diffusion release. For comparison, you can try out the Hugging Face demo of the original Stable Diffusion.
Midjourney
Much like Stable Diffusion when it was first released, Midjourney is an AI image generator that can currently only be used through Discord. The website states that “Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.”
While most of the other image generators require users to refine their text descriptions in order to create high-quality images, Midjourney is able to create high-quality images with less-refined prompts. Here are some recent examples of images that have been created with Midjourney:
It’s important to note that while Midjourney is free to use, non-subscribers do not own the images they create, and Midjourney can use them without notice. Midjourney offers several subscription plans: Basic at $10 per month, Standard at $30 per month and Pro at $60 per month. All subscribers are licensed to use the images they create for any purpose, commercial or otherwise.
Honorable Mention: Imagen
Imagen is Google’s text-to-image diffusion model, notable for its high degree of photorealism and deep level of language understanding. Google built Imagen on the finding that generic large language models, pretrained only on text, are very effective at encoding text for image generation, and that increasing the size of the language model improves both sample fidelity and image-text alignment more than increasing the size of the image diffusion model.
Alongside Imagen, Google introduced DrawBench, a benchmark of prompts for evaluating and comparing text-to-image models. Currently, Imagen is only available to a small number of members of Google’s AI Test Kitchen. Interested parties can sign up to become a Test Kitchen tester and can download the Test Kitchen app for iOS and Android.
How Are Brands Using Image Generation Models?
Nick Gausling, managing director of Romy Group LLC, and author of the book Bots in Suits: Using Generative AI to Revolutionize Your Business, spoke with CMSWire about how and why brands are using generative AI to create images for commercial use.
Many businesses have hired an artist to create images for business use, only to discover that the results were not exactly what they expected and that the process was long and tedious. “Years ago I contracted a professional logo designer and went through a multi-week process of answering concept questions in order to receive a final product,” said Gausling. “More recently I used DALL-E and got a pretty good result in about 10 minutes at less than 1% of the cost. When pretty good is good enough, consider a generative AI tool.”
Human artists absorb artwork and images throughout their lives, whether through study or through exposure on TV, on the internet or in day-to-day life. Gausling explained that generative AI is meant to create new content based on its knowledge base, which is not much different from human artists with similar influences arriving at substantially similar art.
Gausling raised the question of future regulation and legislation of generative AI and suggested that if you're going to use this technology commercially, “it’s a good idea to at least document timelines and the independent co-creative process in case someone later tries to claim infringement against you.”
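One lightweight way to follow that advice is to keep a timestamped log of each step in an asset's creation, recording which steps were AI-generated and which were human work, along with a hash of each resulting file. A minimal record-keeping sketch; the `ProvenanceLog` and `ProvenanceEntry` names and fields are hypothetical, and the placeholder byte strings stand in for real image files.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProvenanceEntry:
    """One step in the creation of an asset (hypothetical schema)."""
    timestamp: str                # UTC time the step was recorded
    step: str                     # e.g. "AI concept draft", "human revision"
    actor: str                    # tool or person responsible for this step
    sha256: str                   # fingerprint of the file produced by this step
    prompt: Optional[str] = None  # text prompt, for AI-generated steps
    notes: str = ""


@dataclass
class ProvenanceLog:
    asset_name: str
    entries: List[ProvenanceEntry] = field(default_factory=list)

    def record(self, step: str, actor: str, content: bytes,
               prompt: Optional[str] = None, notes: str = "") -> ProvenanceEntry:
        """Append a timestamped entry with a SHA-256 hash of the file contents."""
        entry = ProvenanceEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            step=step,
            actor=actor,
            sha256=hashlib.sha256(content).hexdigest(),
            prompt=prompt,
            notes=notes,
        )
        self.entries.append(entry)
        return entry

    def to_json(self) -> str:
        """Serialize the whole log so it can be archived alongside the asset."""
        return json.dumps(
            {"asset": self.asset_name, "entries": [asdict(e) for e in self.entries]},
            indent=2,
        )


if __name__ == "__main__":
    log = ProvenanceLog("logo-concept.png")
    # In practice, pass the bytes of the saved image file instead of placeholders.
    log.record("AI concept draft", "DALL-E", b"draft image bytes",
               prompt="minimalist logo of a mountain and a coffee cup")
    log.record("human revision", "in-house designer", b"revised image bytes",
               notes="Redrew the mountain, changed the palette and typography")
    print(log.to_json())
```

Storing a hash rather than the image keeps the log small, and each entry can later be matched against the archived files to reconstruct which steps were AI-generated and which were human work.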
Other brands are using AI-generated images as the basis for concept creation and design. “The hybrid use of this technology alongside human artists is an exciting prospect that is barely being discussed,” said Gausling. “For example, someone might use generative AI for initial concept creation and getting the design 75% of the way there, then bringing in a human professional to perfect it,” saving money and time while preserving quality in the final product.
Michael Dreas, director of 1WorldSync Studios, an end-to-end product content platform provider, told CMSWire that AI has become very helpful for image editing. “If a customer sends an item to our photography studio and it arrives damaged, AI can help increase the accuracy and speed of image editing which allows us to deliver a better final image product,” said Dreas. “Our CGI services fill in the gaps where customers don’t have a physical package but have the art, allowing us to generate photo-realistic representations of products.” Dreas said that as with other forms of image editing, CGI and 3D model creation benefit from AI. “AI may be able to get you 50-80% of the way to a finished project, but a human is still needed to get the job done.”
What to Be Careful Of?
As mentioned above, businesses that plan to use AI-generated images commercially should read each generator’s terms of service carefully to make sure they have the right to do so. Some tools allow any use for free, while others require users to subscribe before using the images commercially.
Angelo Sorbello, founder of Linkdelta, a generative AI platform, is in a great position to understand the nuances and challenges of artificial intelligence, such as inherent biases. “While deep learning image generation models such as Stable Diffusion, DALL-E, Bing Create, Midjourney, and Imagen have shown impressive results in generating high-quality images, it's important to recognize that they are imperfect,” said Sorbello.
“One major concern is the potential for the models to perpetuate existing biases and discrimination.” This isn’t limited to generative AI, but is a concern for all applications of AI, because the AI model is only as good as the data it is trained on, and that data comes from humans with their own biases and prejudices. “If the models are trained on a biased dataset, they may generate images that perpetuate stereotypes or reinforce existing societal inequalities,” explained Sorbello.
Another concern expressed by many industry leaders is that AI can be used to create what have come to be called “deepfakes.” These images, videos or audio clips appear genuine but are generated by AI. “These models have the potential to be used maliciously to create fake or misleading images for nefarious purposes,” said Sorbello. “For example, they could be used to create realistic but fake images of individuals that can be used to spread false information or impersonate someone else.”
Final Thoughts on Image Generation Models
Generative AI, capable of producing not only text but also high-quality images, has become a valuable asset for brands in designing logos, illustrations, website visuals and conceptual product depictions. To optimize generative AI's potential, brands should be mindful of the tools' terms of service, refine their image generation prompts and master these AI-driven technologies as they would any other software tool. This approach can enhance the creative process, save time on artistic iterations and ultimately reduce costs.