In January 2021, OpenAI, the company co-founded by Elon Musk and financially backed by Microsoft, unveiled its most ambitious project to date: the DALL-E machine learning system.
This ingenious multimodal artificial intelligence could generate images (albeit rather cartoonish ones) from attributes described by the user. Think of "a cat made of sushi" or "an x-ray of a capybara sitting in a forest." The company recently unveiled the next iteration, DALL-E 2, which offers higher resolution and lower latency than the original.
The first DALL-E could generate images, combine several images into a single collage, and render a scene from different perspective angles. It could even infer elements of an image, such as shadow effects, that the description implied.
"Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in full detail, DALL·E is often able to 'fill in the blanks' when the caption implies that the image must contain some detail that is not explicitly mentioned," the OpenAI team said in 2021.
OpenAI DALL-E cannot be used by everyone
DALL-E was never intended to be a commercial product, and because the OpenAI team treated it as a research tool, its capabilities were somewhat limited. It was also deliberately restricted so that it could not be used to generate misinformation.
DALL-E 2, which uses OpenAI's CLIP image recognition system, builds on those imaging capabilities. Users can now select and edit specific areas of existing images, add or remove items, combine two images into a single collage, and generate variations of an existing image. CLIP is designed to look at an image and summarize its content in terms people can understand. With the new system, the company reversed this process, building an image from its summary.
"DALL-E 1 just took our GPT-3 approach to language and applied it to produce an image: we compressed images into a series of words and just learned to predict what comes next," OpenAI researcher Prafulla Dhariwal told The Verge.
Unlike the original, which anyone can play with on the OpenAI website, the new version is currently available for testing only to verified partners, and even they are limited in what they can upload to or generate with the program.