Meta unveils an AI that generates video based on text prompts
AI models capable of creating images from text have now been joined by AIs that generate video. Meta has unveiled Make-A-Video, a new AI system that can create short videos from brief text descriptions, images, or other videos.
The company explained that the new AI builds on recent progress made by other image-generating models such as DALL-E, Midjourney, and Stable Diffusion, as well as Meta’s own Make-A-Scene. These models create images from a prompt, a brief text description.
Mark Zuckerberg explained that generating videos is more difficult than generating photos because the system must not only generate each pixel correctly but also predict how the pixels will change over time.
Make-A-Video addresses this problem by adding an unsupervised learning layer that lets the system understand motion and apply it to traditional text-to-image generation.
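The idea of layering a motion model on top of a still-image generator can be illustrated with a toy sketch. This is purely hypothetical code, not Meta's architecture: `text_to_image` stands in for a pretrained text-to-image model, and `predict_motion` stands in for the learned temporal layer that rolls each frame forward.

```python
import numpy as np

def text_to_image(prompt, size=8, seed=0):
    # Stand-in for a text-to-image model: returns one random "frame".
    # (A real model would condition on the prompt; here the prompt is unused.)
    rng = np.random.default_rng(seed)
    return rng.random((size, size))

def predict_motion(frame):
    # Stand-in for the learned motion layer: shifts pixels one column right.
    return np.roll(frame, shift=1, axis=1)

def make_a_video(prompt, num_frames=4):
    # Generate a still, then unroll the motion model frame by frame.
    frames = [text_to_image(prompt)]
    for _ in range(num_frames - 1):
        frames.append(predict_motion(frames[-1]))
    return np.stack(frames)  # shape: (num_frames, height, width)

video = make_a_video("a teddy bear painting a portrait")
print(video.shape)  # (4, 8, 8)
```

The point of the sketch is only the separation of concerns: one component handles appearance (the still image), a second handles how that appearance evolves over time.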
The results are quite striking, and the company has published several samples to show what its AI can do, even though it does not yet grant public access.
The Make-A-Video website offers a variety of results, including “A teddy bear painting a portrait”, “Robot dancing at Times Square”, and “A cat watching TV with a remote in his hand”, as well as descriptions like “A young couple walking through heavy rain” or “Hyper-realistic landing on Mars”. These are all grouped under categories like Surrealistic, Realistic, and Stylized.
The examples also show Make-A-Video creating animations from a single image or a few images, and generating variations on existing videos.
Technical documentation indicates that the tool was trained on two large video datasets, WebVid-10M and HD-VILA-100M.
Although Make-A-Video isn’t yet available to users, Zuckerberg said a demo would be available soon. It is currently possible to request access via the Make-A-Video website.
Meta’s researchers made a major leap in AI art generation with Make-A-Video, a new technique that lets you create a video from a text prompt. The results are impressive and varied; some of them, however, are slightly creepy.
Text-to-video models are a natural extension of text-to-image models such as DALL-E, which produce stills from prompts. Although the leap from still image to moving image may seem small, it is difficult to implement in a machine-learning model.
Zuckerberg later stated that the AI research team had written descriptions such as “a teddy bear paints a self-portrait”, “a baby sloth trying to figure out how to use a computer”, “a spaceship landing on Mars”, and “a robot surfing the waves of the ocean”.
Zuckerberg called it a fantastic achievement, noting again that videos are much harder to create than photos: the system must not only generate each pixel correctly but also predict how it will change over time.