AI image generation holds great promise. A newly released open-source image synthesis model called Stable Diffusion lets anyone with a PC and a decent GPU conjure almost any visual reality they can imagine. It can mimic virtually any visual style, and if you feed it a descriptive phrase, the results appear on your screen like magic.
Depending on one's outlook, this makes some artists very happy and others unhappy, while society as a whole still seems largely unaware of the rapid technological revolution taking place through communities on Twitter, Discord, and GitHub. Arguably, image synthesis may prove as consequential as the invention of the camera, or perhaps the creation of visual art itself. Even our sense of history could be at stake, depending on how things play out. Either way, Stable Diffusion is leading a new wave of deep learning creative tools that promise to revolutionize the creation of visual media.
The Rise of Deep Learning Image Synthesis
Stable Diffusion is the brainchild of Emad Mostaque, a former London hedge fund manager whose goal is to bring novel applications of deep learning to the masses through his company, Stability AI. But modern image synthesis has roots going back to 2014, and Stable Diffusion isn't the first image synthesis model (ISM) to make waves this year.
In April 2022, OpenAI released DALL-E 2, which can transform scenes written in words (called "prompts") into myriad visual styles that can be fantastic, photorealistic, or even mundane. Its results shook social media: those with privileged access to the closed tool conjured astronauts on horseback, teddy bears buying bread in ancient Egypt, novel sculptures in the style of famous artists, and more.
Shortly after DALL-E 2, Google and Meta announced their own text-to-image AI models. Midjourney, which opened as a Discord server in March 2022 and became available to the public a few months later, charges for access and achieves a similar effect, though its output tends toward a more painterly and illustrative quality by default.
Then came Stable Diffusion. On August 22, Stability AI released its open-source image generation model, which is arguably comparable in quality to DALL-E 2. It also launched its own commercial website, called DreamStudio, which sells computing time for generating images with Stable Diffusion. Unlike DALL-E 2, anyone can use it, and because the Stable Diffusion code is open source, projects can build on it with few restrictions.
In the past week alone, dozens of projects have sprung up that take Stable Diffusion in new directions. Unexpected results have come from a technique called "img2img," which has been used to upgrade MS-DOS game art, bring Minecraft graphics to life, transform scenes from Aladdin into 3D, translate childlike doodles into rich illustrations, and much more. Image synthesis can bring rich visualization of ideas to a broad audience, lowering barriers to entry, while also accelerating the workflows of artists who embrace the technology, much as Adobe Photoshop did in the 1990s.
You can run Stable Diffusion locally yourself if you follow a series of somewhat arcane steps. For the past two weeks, we've been running it on a Windows PC with an NVIDIA RTX 3060 12GB GPU. It can generate a 512×512 image in about 10 seconds; on an RTX 3090 Ti, that drops to about four seconds per image. The interfaces are also evolving rapidly, from crude command-line tools and Google Colab notebooks to more polished (but still complex) GUI front ends, with further refinements on the way. So if you're not technically inclined, hold on: simpler solutions are coming. And if all else fails, you can try an online demo.