Microsoft Unveils Groundbreaking AI Model VASA-1 for Creating Talking and Singing Portrait Videos

Microsoft has recently published a research paper on VASA-1, a new AI model that combines a single portrait photo with an audio clip. The result is a video in which the person in the photo appears to 'talk and sing in a realistic manner'.

The model is primarily intended for creating virtual characters. According to Microsoft, "VASA-1 has the capability to create lip movements that perfectly sync with the audio. Additionally, it can capture a wide range of subtle facial expressions and natural head movements, enhancing the sense of authenticity and vitality."

Microsoft has released several demo videos showcasing VASA-1, including one of a rapping Mona Lisa. Users can also control attributes such as head movement and gaze direction. In offline (batch) mode, VASA-1 generates 512x512-pixel video at 45fps, while the online (streaming) mode supports up to 40fps.

Microsoft has stated that it does not intend to commercialize VASA-1, citing concerns that the model could be misused to create deepfakes.