Israeli AI firm D-ID, which provided technology for projects like Deep Nostalgia, is launching a new platform where users can upload a single image and text to generate a video. With the new site, called Creative Reality Studio, the company is targeting sectors such as corporate training and education, internal and external corporate communications, product marketing and sales.
The platform is simple to use: users upload a picture of a presenter or select one of the pre-made presenters to start creating a video. Paid users can access premium presenters that are more “expressive,” with better facial expressions and hand gestures than the standard ones. Next, users can either type in a script or upload an audio clip of a person’s speech. They can then select a language (the platform supports 119 languages), a voice and a style such as cheerful, sad, excited or friendly.
The company’s AI-based algorithms then generate a video from these inputs, and users can distribute it anywhere. The company claims that generating a clip takes only half the video’s duration, but in our tests it took a few minutes to generate a one-minute video. Generation time may vary depending on the presenter and language you choose.
“The Covid-19 pandemic has accelerated the need for digital content across the globe. A major problem for organizations is the creation of educational content. Reading documents and going through presentations can be dry and boring. In addition, they have to spend thousands of dollars on hiring actors and making educational videos. So we’re using our AI to create presenters and tutors to recreate people and make the content more engaging and effective,” Gil Perry, D-ID’s CEO, told TechCrunch in an interview.
Perry pointed out many use cases for this technology – from a multilingual message from a CEO to employees to personalized wishes for an organization’s users.
D-ID opened the studio for testing in mid-August to iron out bugs before the public launch. While its main focus is serving businesses of all sizes, the company says it is also seeing strong interest from creators on the platform.
D-ID raised $25 million in a Series B funding round led by Macquarie Capital back in March, bringing its total raised to date to $47 million. Until now, the company had relied on others using its API to create content (Deep Nostalgia is a prime example), with clients like Mondelēz, Warner Bros. and India-based short video app Josh. Now, the company is expanding its paid offerings by launching a PowerPoint plugin alongside this self-service platform. The plugin adds an interactive presenter to the deck so users don’t have to simply read slides aloud. They can choose between different avatars, voices and languages, just as on the self-service platform. However, the plugin doesn’t support custom presenters at the moment.
At launch, users will be able to sign up for a free 14-day trial account and create up to five minutes of AI-generated 720p video. After that, they can pay $49 a month for 15 minutes of full HD AI-generated video, access to the PowerPoint plugin, and email support.
Users can also upload their own audio clips for voice cloning. In addition, the company is working on a tool that lets users upload their own footage to train the AI to be more expressive, so it can better mimic the person in the video. All of these features will be limited to the company’s enterprise tier.
While the company faces competition from the likes of Rephrase.ai and Soul Machines in the AI-generated video space, it says hardly any other companies claim to generate high-quality videos from a single image.
Perry said D-ID doesn’t aim to limit itself to corporate training, communications and marketing videos. It also has ambitions to enable real-time translation of video calls and presenter cloning, in which an avatar appears on video in your place while you supply the audio.
The company is also looking to become a key player in web3 and metaverse development. “Given that we have expertise in generating videos from a single image, we’re thinking about ways to create digital avatars for the metaverse,” Perry said.