Amr Nour-Eldin is the Vice President of Technology at LXT. Amr is a Ph.D. research scientist with over 16 years of professional experience in speech/audio processing and machine learning in the context of Automatic Speech Recognition (ASR), with particular hands-on focus in recent years on deep learning techniques for streaming, end-to-end speech recognition.
LXT is an emerging leader in AI training data to power intelligent technology for global organizations. In partnership with an international network of contributors, LXT collects and annotates data across multiple modalities with the speed, scale and agility required by the enterprise. Their global expertise spans more than 115 countries and over 780 language locales.
You pursued a Ph.D. in Signal Processing at McGill University. What initially interested you in this field?
I always wanted to study engineering, and really liked the natural sciences in general, but was drawn more specifically to math and physics. I found myself always trying to figure out how nature works and how to apply that understanding to create technology. After high school, I had the opportunity to go into medicine and other professions, but I specifically chose engineering because, in my view, it represented the perfect combination of theory and application in the two fields closest to my heart: math and physics. And then once I had chosen it, there were many potential paths – mechanical, civil, and so on. But I specifically chose electrical engineering because it is the closest, and in my view the toughest, fit for the kind of math and physics problems I always found challenging and hence enjoyed most, and because it is the foundation of the modern technology that has always driven me.
Within electrical engineering, there are various specializations to choose from, which generally fall under two umbrellas: telecommunications and signal processing on one side, and power engineering on the other. When the time came to choose between those two, I chose telecom and signal processing because it’s closer to how we describe nature through physics and equations. You’re talking about signals, whether audio, images or video; understanding how we communicate and what our senses perceive, and how to mathematically represent that information in a way that allows us to leverage that knowledge to create and improve technology.
Could you discuss your research at McGill University on the information-theoretic aspect of artificial bandwidth extension (BWE)?
After I finished my bachelor’s degree, I wanted to keep pursuing the signal processing field academically. After one year of studying photonics as part of a Master’s degree in Physics, I decided to switch back to engineering to pursue my master’s in audio and speech signal processing, focusing on speech recognition. When it came time to do my Ph.D., I wanted to broaden my field a little into general audio and speech processing, as well as the closely related fields of machine learning and information theory, rather than focusing solely on the speech recognition application.
The vehicle for my Ph.D. was the bandwidth extension of narrowband speech. Narrowband speech refers to conventional telephony speech. The frequency content of speech extends to around 20 kilohertz, but the majority of the information content is concentrated up to just 4 kilohertz. Bandwidth extension refers to artificially extending speech content from 3.4 kilohertz, the upper frequency bound in conventional telephony, up to 8 kilohertz or more. To reconstruct that missing higher-frequency content given only the available narrowband content, one first has to quantify the mutual information between the speech content in the two frequency bands, and then use that quantity to train a model that learns this shared information. Once trained, the model can generate highband content from narrowband speech alone, based on what it has learned about the relationship between the available narrowband speech and the missing highband content. Quantifying and representing that shared “mutual information” is where information theory comes in. Information theory is the study of quantifying and representing information in any signal. So my research was about incorporating information theory to improve the artificial bandwidth extension of speech. As such, my Ph.D. was an interdisciplinary research effort that combined signal processing with information theory and machine learning.
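To make the notion of shared information between the two bands concrete, here is a minimal, self-contained Python sketch, illustrative only and not drawn from the research described above, that estimates the mutual information between per-frame narrowband and highband log energies. It uses white noise as a stand-in for speech and scikit-learn's nearest-neighbor MI estimator; with real speech the estimate would be far larger, since the two bands are strongly related, which is exactly what bandwidth extension exploits.

    # Illustrative sketch: estimate mutual information between narrowband and
    # highband frame energies. White noise stands in for speech here, so the
    # estimated MI will be near zero; real speech would yield a much larger value.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    fs, frame_len = 16000, 512
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(fs * 5)  # 5 seconds of noise as a speech stand-in

    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1 / fs)

    # Per-frame log energies in the narrow (0-4 kHz) and high (4-8 kHz) bands
    nb_energy = np.log(spectra[:, freqs < 4000].sum(axis=1) + 1e-12)
    hb_energy = np.log(spectra[:, freqs >= 4000].sum(axis=1) + 1e-12)

    mi = mutual_info_regression(nb_energy.reshape(-1, 1), hb_energy)
    print(f"Estimated mutual information between band energies: {mi[0]:.3f} nats")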
You were a Principal Speech Scientist at Nuance Communications, now a part of Microsoft, for over 16 years. What were some of your key takeaways from this experience?
From my perspective, the most important benefit was that I was always working on state-of-the-art, cutting-edge techniques in signal processing and machine learning and applying that technology to real-world applications. I got the chance to apply those techniques to conversational AI products across multiple domains, ranging from enterprise to healthcare, automotive, and mobility, among others. Some of the specific applications included virtual assistants, interactive voice response, and voicemail-to-text, as well as others where accurate representation and transcription are critical, such as doctor/patient interactions in healthcare. Throughout those 16 years, I was fortunate to witness firsthand, and be part of, the evolution of conversational AI: from the days of statistical modeling using Hidden Markov Models, through the gradual takeover of deep learning, to today, when deep learning dominates almost all aspects of AI, including generative AI as well as traditional predictive or discriminative AI. Another key takeaway from that experience is the crucial role that data, in both quantity and quality, plays as a driver of AI model capabilities and performance.
You’ve published a dozen papers, including in acclaimed IEEE journals and conferences. In your opinion, what is the most groundbreaking paper that you published, and why was it important?
The most impactful one, by number of citations according to Google Scholar, would be a 2008 paper titled “Mel-Frequency Cepstral Coefficient-Based Bandwidth Extension of Narrowband Speech”. At a high level, the paper focuses on how to reconstruct speech content using a feature representation that is widely used in automatic speech recognition (ASR): mel-frequency cepstral coefficients (MFCCs).
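For readers unfamiliar with that representation, the short sketch below shows how MFCCs are typically extracted from narrowband speech with the librosa library; the file name and parameter values are placeholders for illustration, not settings from the paper.

    # Minimal MFCC extraction example (illustrative parameters, hypothetical file)
    import librosa

    y, sr = librosa.load("speech_8k.wav", sr=8000)  # hypothetical narrowband recording
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=256, hop_length=80)
    print(mfcc.shape)  # (13, number_of_frames)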
However, the more innovative paper in my view is the one with the second-most citations, a 2011 paper titled “Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech”. In that work, I proposed a new statistical modeling technique that incorporates temporal information in speech. Its advantage is that it models long-term information in speech with minimal additional complexity, and in a way that still allows wideband speech to be generated in a streaming, real-time fashion.
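As a rough, generic illustration of the broader idea of pairing a joint-GMM feature mapping with stacked past frames as temporal context, the sketch below maps synthetic “narrowband” features to “highband” targets via the GMM conditional mean. It is not the paper's memory-based approximation; the dimensions, component count, and data are invented purely for the example.

    # Generic illustration (not the paper's method): a joint GMM over
    # memory-stacked "narrowband" features and "highband" targets, with
    # prediction via the GMM conditional mean. All data here is synthetic.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    T, ctx = 2000, 3                     # number of frames, past frames stacked
    nb = rng.standard_normal((T, 4))     # toy 4-dim narrowband features
    hb = nb @ rng.standard_normal((4, 2)) + 0.1 * rng.standard_normal((T, 2))

    # Stack each frame with its `ctx` predecessors to add temporal memory
    X = np.hstack([np.roll(nb, s, axis=0) for s in range(ctx + 1)])[ctx:]
    Y = hb[ctx:]

    gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    gmm.fit(np.hstack([X, Y]))           # joint density over (stacked x, y)

    def predict_hb(x):
        """Conditional mean E[y | x] under the joint GMM."""
        dx = X.shape[1]
        mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
        S = gmm.covariances_
        Sxx_inv = np.linalg.inv(S[:, :dx, :dx])
        Syx = S[:, dx:, :dx]
        diffs = x - mu_x
        # Component responsibilities for the observed x (shared constants cancel)
        maha = np.einsum("ki,kij,kj->k", diffs, Sxx_inv, diffs)
        logw = np.log(gmm.weights_) - 0.5 * (np.linalg.slogdet(S[:, :dx, :dx])[1] + maha)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Per-component conditional means, averaged with the responsibilities
        cond = mu_y + np.einsum("kij,kj->ki", Syx @ Sxx_inv, diffs)
        return w @ cond

    print(predict_hb(X[0]), "vs actual", Y[0])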
In June 2023, you were recruited as Vice President of Technology at LXT. What attracted you to this position?
Throughout my academic and professional experience prior to LXT, I had always worked directly with data. In fact, as I noted earlier, one key takeaway from my work in speech science and machine learning was the crucial role data plays in the AI model life cycle. Having enough quality data in the right format was, and continues to be, vital to the success of state-of-the-art deep-learning-based AI. So when I found myself at a stage of my career where I was seeking a startup-like environment where I could learn, broaden my skills, and leverage my speech and AI experience to have the most impact, I was fortunate to have the opportunity to join LXT. It was the perfect fit. Not only is LXT an AI data provider growing at an impressive and consistent pace, but I also saw it as being at the perfect stage in terms of growth in AI know-how, as well as in client size and diversity, and hence in the types of AI and AI data it works with. I relished the opportunity to join and help in its growth journey, and to have a big impact by bringing the perspective of a data end user, having been an AI data scientist and data user for all those years.
What does your average day at LXT look like?
My average day starts with looking into the latest research on one topic or another, which has lately centered around generative AI, and how we can apply that to our customers’ needs. Luckily, I have an excellent team that is very adept at creating and tailoring solutions to our clients’ often-specialized AI data needs. So, I work closely with them to set that agenda.
There is also, of course, annual and quarterly strategic planning: breaking down strategic objectives into individual team goals and keeping up to speed with progress against those plans. As for the feature development we’re doing, we generally have two technology tracks. One is making sure we have the right pieces in place to deliver the best outcomes on our current and incoming projects. The other is improving and expanding our technology capabilities, with a focus on incorporating machine learning into them.
Could you discuss the types of machine learning algorithms that you work on at LXT?
Artificial intelligence solutions are transforming businesses across all industries, and we at LXT are honored to provide the high-quality data to train the machine learning algorithms that power them. Our customers are working on a wide range of applications, including augmented and virtual reality, computer vision, conversational AI, generative AI, search relevance, and speech and natural language processing (NLP), among others. We are dedicated to powering the machine learning algorithms and technologies of the future through data generation and enhancement across every language, culture and modality.
Internally, we’re also incorporating machine learning to improve and optimize our internal processes, ranging from automating our data quality validation, to enabling a human-in-the-loop labeling model across all data modalities we work on.
Speech and audio processing is rapidly approaching near-perfection for English, and specifically for white male speakers. How long do you anticipate it will be until there is an even playing field across all languages, genders, and ethnicities?
This is a complicated question that depends on a number of factors, including economic, political, social, and technological ones. What is clear, though, is that the prevalence of the English language is what drove AI to where it is now. Getting to a level playing field really depends on how quickly the representation of data from different ethnicities and populations grows online; that pace is what will determine when we get there.
However, LXT and similar companies can have a big hand in driving us toward a more level playing field. As long as the data for less well-represented languages, genders and ethnicities is hard to access or simply not available, that change will come more slowly. But we are trying to do our part. With coverage for over 1,000 language locales and experience in 145 countries, LXT helps to make access to more language data possible.
What is your vision for how LXT can accelerate AI efforts for different clients?
Our goal at LXT is to provide the data solutions that enable efficient, accurate, and faster AI development. Through our 12 years of experience in the AI data space, we have not only accumulated extensive know-how about clients’ needs across all aspects of data, but have also continuously fine-tuned our processes to deliver the highest quality data at the quickest pace and the best price points. As a result of our steadfast commitment to providing our clients the optimal combination of AI data quality, efficiency, and pricing, we have become a trusted AI data partner, as evidenced by our repeat clients who return to LXT for their ever-growing and evolving AI data needs. My vision is to cement, improve, and expand that LXT “MO” across all the data modalities we work on, as well as all the types of AI development we now serve, including generative AI. Achieving this goal revolves around strategically expanding our own machine learning and data science capabilities, in terms of both technology and resources.
Thank you for the great interview. Readers who wish to learn more should visit LXT.