Introduction
A few weeks ago, OpenAI introduced its new model, Sora, which can generate videos up to a minute long from a user prompt while maintaining visual quality and adherence to that prompt. This is a major turning point in the field of AI: it could have a large impact on the current market and act as a pivot point for companies that work with images and video or provide similar services. It could also save video editors a great deal of time.
Behind these impressive results, the architecture of the model is very complex, and the mechanisms it relies on are state of the art.
Videos are first compressed into a lower-dimensional latent space and then decomposed into spacetime patches, which act as the input tokens for a transformer. The transformer is the same architecture that underlies most LLMs, with patches playing the role that text tokens play in a language model. Transformers scale effectively, which means that the quality of generated videos improves as training compute increases.
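To make the patch step concrete, here is a minimal NumPy sketch of cutting a latent video into flattened spacetime patches. It is only an illustration: the function name `patchify`, the `(T, H, W, C)` layout, and the patch sizes are assumptions made for this example, not details OpenAI has published about Sora.

```python
import numpy as np

def patchify(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Break each axis into (number of patches, patch size).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-index axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: (num_patches, pt * ph * pw * C).
    return x.reshape(-1, pt * ph * pw * C)

# A toy "latent video": 8 frames of 16x16 with 4 channels.
latent = np.arange(8 * 16 * 16 * 4, dtype=np.float32).reshape(8, 16, 16, 4)
tokens = patchify(latent, pt=2, ph=4, pw=4)
print(tokens.shape)  # (64, 128)
```

Each row of `tokens` is one patch covering 2 frames and a 4x4 spatial region, and the sequence of rows is what a transformer would consume, much like a sequence of text tokens.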
Sora is also trained extensively for language understanding, so it can generate videos that accurately follow the user prompt.
Use
Sora can edit videos and images. Suppose there is a video of children playing football. You can ask Sora to make the weather cloudy or sunny, or to make the field look green and lush.
It can also generate videos from an image prompt. To do this, the model takes an image and an accompanying text prompt as input.
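The editing capability is possible because of SDEdit, which is a diffusion-based editing technique rather than a separate model. OpenAI applied it to Sora to transform the style and environment of input videos.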
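The core idea of SDEdit, adding noise to a guide input and then denoising it, can be sketched in a few lines. This is a toy illustration under loud assumptions: `gaussian_denoiser` stands in for a trained diffusion model by assuming clean data follows a simple Gaussian, and the sampler is a deterministic Euler solver rather than the stochastic reverse process SDEdit uses. All names and parameters here are hypothetical and say nothing about how Sora implements it.

```python
import numpy as np

def gaussian_denoiser(x, sigma, mean, std):
    """Toy stand-in for a trained model: E[x0 | x] when x0 ~ N(mean, std^2)
    and x = x0 + sigma * noise."""
    return (std**2 * x + sigma**2 * mean) / (std**2 + sigma**2)

def sdedit(guide, sigma0, denoise, steps=50, seed=0):
    """Noise the guide up to level sigma0, then denoise back down to zero."""
    if sigma0 <= 0:
        raise ValueError("sigma0 must be positive")
    rng = np.random.default_rng(seed)
    # 1. Perturb the guide with Gaussian noise of strength sigma0.
    x = guide + sigma0 * rng.standard_normal(guide.shape)
    # 2. Walk the noise level from sigma0 down to 0 with Euler steps.
    sigmas = np.linspace(sigma0, 0.0, steps + 1)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        direction = (x - denoise(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * direction
    return x

# The guide sits far from what the toy "model" considers typical data (mean 0).
guide = np.full(16, 5.0)
denoise = lambda x, sigma: gaussian_denoiser(x, sigma, mean=0.0, std=1.0)

faithful = sdedit(guide, sigma0=0.1, denoise=denoise)    # little noise
realistic = sdedit(guide, sigma0=10.0, denoise=denoise)  # a lot of noise
print(np.abs(faithful - guide).mean(), np.abs(realistic - guide).mean())
```

A small `sigma0` keeps the result close to the guide, while a large `sigma0` lets the model pull the result towards its own notion of realistic data. That trade-off between faithfulness and realism is the knob SDEdit exposes for editing.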
Apart from this, Sora can also create seamlessly looping videos, for example a cyclist riding around a circular valley. It also exhibits 3D consistency, long-range coherence and interaction with the environment. For example, when generating a video of a painter working on a canvas, it shows fine details such as brush strokes and keeps the colour consistent across the brush, the canvas and the palette.
This model is useful for people who need minor edits and modifications made to their videos. It can also prove useful for content creators. Other potential applications include:
- Data Visualization
- Social media
- Artificial data generation
Limitations
Like every other AI model and application, Sora has its limitations too. It does not accurately depict the physics of certain interactions, for example glass shattering or a car crash. OpenAI has acknowledged these weaknesses, and improvements can be expected in future versions. There are also broader risks:
- Can generate harmful content
- Can reproduce stereotypes and bias
- Can generate misinformation