Phenaki is a video synthesis software that enables users to generate realistic videos from text sequences. It uses a causal model to compress videos into a discrete token format, creating videos of variable lengths. The software claims to outperform existing per-frame baselines in terms of spatio-temporal quality and number of tokens per video. According to the website, joint training on a large corpus of image-text pairs and a smaller number of video-text examples can help generalize beyond existing video datasets. Some users may find this feature useful for quickly generating videos for different use cases, such as news reporting, video game development or marketing campaigns.
Phenaki differentiates itself from similar software competitors by using a bidirectional masked transformer conditioned on pre-computed text tokens for video tokenization.