Google Releases Lyra V2: A Better, Faster, And More Versatile Speech Codec

On Oct 12, 2022

Lyra V2 is built on SoundStream, an end-to-end neural audio codec. The architecture places a residual vector quantizer (RVQ) on either side of the transmission channel: it quantizes the encoded information into a bitstream before transmission and reconstructs it on the decoder side afterward. Because RVQ is built into the design, Lyra V2’s bitrate can be changed at any time by adjusting the number of quantizers used. Using more quantizers produces higher-quality audio at the cost of a higher bitrate. Lyra V2 supports three bitrates: 3.2 kbps, 6 kbps, and 9.2 kbps. This lets developers select the bitrate that best suits their network conditions and quality requirements.
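To make the link between the number of quantizers and quality concrete, here is a minimal toy sketch of residual vector quantization in Python. It is not Lyra's or SoundStream's implementation: the codebooks are random rather than learned, the helper names (make_toy_codebooks, rvq_encode, rvq_decode) are hypothetical, and each toy codebook includes an all-zero entry purely so that an extra stage can never increase the error.

```python
import random

def make_toy_codebooks(num_stages, entries, dim, seed=0):
    """Build random codebooks (an assumption; real codecs learn them)."""
    rng = random.Random(seed)
    codebooks = []
    for stage in range(num_stages):
        scale = 0.5 ** stage  # later stages refine smaller residuals
        book = [[0.0] * dim]  # zero entry: an extra stage never hurts
        book += [[rng.uniform(-scale, scale) for _ in range(dim)]
                 for _ in range(entries - 1)]
        codebooks.append(book)
    return codebooks

def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(vector, codebooks):
    """Quantize stage by stage; each stage codes the previous residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Rebuild the vector by summing the chosen entry of each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

codebooks = make_toy_codebooks(num_stages=4, entries=16, dim=4)
rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(4)]
indices = rvq_encode(x, codebooks)

# Decoding only the first n indices is the same as using n quantizers.
errors = []
for n in range(1, len(codebooks) + 1):
    approx = rvq_decode(indices[:n], codebooks[:n])
    errors.append(squared_error(x, approx))
    # 16 entries per codebook means 4 bits per index
    print(f"{n} quantizer(s): {4 * n} bits, squared error {errors[-1]:.4f}")
```

Because each stage is chosen greedily, the first n indices do not depend on how many stages follow. That is why, in this sketch, dropping trailing indices lowers the bit count at the cost of a coarser reconstruction.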

The Lyra V2 model is exported to TensorFlow Lite, a lightweight cross-platform solution for mobile and embedded devices that supports a wide range of platforms and hardware accelerators. The code has been tested on Android phones and Linux, with experimental Mac and Windows support. iOS is not currently supported, although support is expected to be feasible with additional effort. This approach also makes Lyra compatible with any future platform that supports TensorFlow Lite.

Performance

The new architecture reduces delay from 100 ms in the previous version to 20 ms. In this respect Lyra V2 is comparable to Opus, the most widely used audio codec for WebRTC, whose typical delays are 26.5 ms, 46.5 ms, and 66.5 ms. Lyra V2 also encodes and decodes five times faster than Lyra V1. On a Pixel 6 Pro phone, Lyra V2 takes 0.57 ms to encode and decode a 20 ms audio frame, which is 35 times faster than real time. Because Lyra V2 is less computationally complex than V1, it can run in real time on more phones and uses less battery power overall.
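The figures above can be checked with a little arithmetic. The Python sketch below computes the real-time factor from the quoted timings, and also the payload size per frame at each of the three bitrates. The per-frame payload is an assumption made here for illustration: it supposes one 20 ms frame per packet and ignores any header overhead. The function names are hypothetical.

```python
FRAME_MS = 20               # audio frame duration quoted in the article
CODEC_MS_PER_FRAME = 0.57   # encode + decode time on a Pixel 6 Pro

def real_time_factor(frame_ms, processing_ms):
    """How many times faster than real time the codec processes audio."""
    return frame_ms / processing_ms

def bits_per_frame(bitrate_bps, frame_ms):
    """Payload bits for one frame (assumes one frame per packet, no headers)."""
    return bitrate_bps * frame_ms // 1000

rtf = real_time_factor(FRAME_MS, CODEC_MS_PER_FRAME)
print(f"real-time factor: {rtf:.1f}x")  # 20 / 0.57 is about 35.1

# The three Lyra V2 bitrates, expressed in bits per second
frame_bits = {bps: bits_per_frame(bps, FRAME_MS) for bps in (3200, 6000, 9200)}
for bps, bits in frame_bits.items():
    print(f"{bps / 1000:g} kbps -> {bits} bits ({bits // 8} bytes) per frame")
```

Under these assumptions the three bitrates correspond to 64, 120, and 184 bits per 20 ms frame, that is 8, 15, and 23 bytes.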

Everything offered in Lyra V1 (the build tools, testing frameworks, C++ encoding and decoding API, signal processing toolchain, and example Android app) is still included in Lyra V2, and the Lyra V2 API will look familiar to developers who have used the Lyra V1 API. There are a few changes, however. For instance, the bitrate can now be changed during encoding (more information is available in the release notes). The release also includes .tflite files containing the model definitions and weights. Like V1, this release is a beta version, and the API and bitstream are expected to change. The code for running Lyra is open-sourced under the Apache license.

Reference: https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html


Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.

Asif’s latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.

Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the ‘Influential Journalists in AI’ (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).
