While some students are enjoying all that college life has to offer socially, others are capitalizing on every available moment to dream big and think outside the box. That is exactly what Ashray Malhotra and his team at SoundRex did. This creative group of young men used their collective work in school to create a solution to one of our biggest technology issues: having crisp and clear sound during phone calls. Read more to “hear” what they are doing now!
Tamara: Can you share a story that inspired you to get involved in AI?
Ashray: We started SoundRex as a final-year project in college. Since then we’ve been building innovative technologies in the audio and speech domain to help people be more creative. I had been closely following the advancements in AI since my time in college. The results, especially in image processing and vision, were astounding! So we decided to combine our experience in audio technologies with machine learning to solve one of the major problems we had personally been facing: bad audio quality over phone calls and conference calls! We were blown away by the quality of the results we were getting with some of the recent algorithms.
Tamara: Describe your company and the AI/predictive analytics/data analytics products/services you offer.
Ashray: At SoundRex we develop algorithms to enhance audio quality for phone calls, conference calls, and Voice over IP communication. We take jittery audio as input and produce high-quality audio as output. There are two major things we do. First, we significantly reduce the drops you notice in your phone calls. Based on the context of the phone call, we are able to predict what the voice data would have been at a certain time and hence “autofill” the gaps that we hear today as drops in a phone call. Secondly, because of modern-day codecs, phones send only a narrow band of audio frequencies over the air. This is why people sound different over the phone than they do in person. From many hours of audio data, we learnt the pattern of high-frequency sound that corresponds to the lower frequencies. This allows us to recreate the missing parts of your voice at the receiver’s handset, creating great-sounding phone calls and conference calls.
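As a rough illustration of the "autofill" idea described above, the sketch below conceals dropped audio samples by linear interpolation between the last good sample before a gap and the first good sample after it. This is a deliberately naive stand-in and not SoundRex's method, which predicts the missing audio from context using learned models. The function name `conceal_gaps` and the use of `None` to mark lost samples are assumptions made purely for illustration.

```python
def conceal_gaps(samples):
    """Fill runs of None (lost samples) using their known neighbours.

    Hypothetical helper for illustration only: interior gaps are filled by
    linear interpolation, gaps at either edge repeat the nearest known
    sample, and an all-lost input is filled with silence (0.0).
    """
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first known sample after the gap, or n
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < n else None
        length = end - start
        if left is None and right is None:
            fill = [0.0] * length      # nothing known: output silence
        elif left is None:
            fill = [right] * length    # gap at the start: repeat next sample
        elif right is None:
            fill = [left] * length     # gap at the end: repeat last sample
        else:
            steps = length + 1         # number of intervals across the gap
            fill = [left + (right - left) * (k / steps)
                    for k in range(1, steps)]
        out[start:end] = fill
    return out


if __name__ == "__main__":
    received = [0.0, None, None, None, 4.0]
    print(conceal_gaps(received))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Interpolation like this only smooths over very short gaps. A longer drop needs a model of what speech usually sounds like, which is where the learned, context-based prediction described in the interview comes in.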
Tamara: How do you see the AI/data analytics/predictive analysis industry evolving in the future?
Ashray: The progress the AI industry has made in the last decade, due to the combined availability of data, processing power, and algorithms, has been a game changer in many industries. I see two major areas of potential. First, decades of research into AI have created new technologies that entrepreneurs can apply to newer and newer fields, with different business models, to provide value to customers. Secondly, new AI research is producing great contributions thanks to all the amazing people in the community. These will open up even newer applications, some of which seem unfathomable today.
Tamara: What is the biggest challenge facing the industry today in your opinion?
Ashray: In my opinion, one of the biggest challenges facing the industry is the training process of neural networks. The time required to train these networks is often multiple days. If we could cut that down by a few orders of magnitude, iteration on different approaches and ideas would become much faster, in turn improving the quality of products.
Tamara: How do you see your products/services evolving going forward?
Ashray: We initially plan to support high-end phones and devices with some GPU support. With time, we plan to make our models portable enough to be deployed on mid-range and lower-end devices as well.
Tamara: What type of advice would you give my readers about AI?
Ashray: The AI community is surprisingly helpful and open. It is extremely difficult to predict the result of an experiment in machine learning before you actually run it. Therefore, my suggestion to your readers would be to implement some of the recent ML techniques themselves on their own problem sets to see how they work for them. For getting started in this field, I found fast.ai to be an extremely helpful resource.
Tamara: How does AI, particularly your product/service, bring goodness to the world? Can you explain how you help people?
Ashray: Twenty years ago, being able to hear the voice of a relative at the other end of the country was a very big deal. But today, when you shell out $1000 for a smartphone, you expect more than just being able to hear the other person; the quality matters significantly. This is where our technology helps.
Tamara: What would be the funniest or most interesting story that happened to you during your company’s evolution?
Ashray: One of the high points for our company was our experience with the Alchemist Accelerator in San Francisco. The lessons we learnt from our co-participants and mentors were really helpful!
Tamara: What are the 3-5 things that most excite you about AI? Why? (industry specific)
Ashray: AI will help us generalize solutions without having to deal with too many edge cases (trying to deal with 100s of languages and 1000s of accents is nearly impossible with rule-based, non-ML approaches).
Although it is not true today, I expect machine learning models to become simpler to program (as ready-to-use black boxes) than modern-day programming practices. This will help democratize AI and programming for a very wide set of people!
Tamara: What are the 3-5 things that worry you about AI? Why? (industry specific)
Ashray: We need to be very careful about bias in the learnt model. If the model overfits to one particular class of people, then when it enhances the voice of a different class of people, it tends to take heavy inspiration from the first class (which is extremely undesirable).
Tamara: Can you name at least one thing related to AI that we can expect over the next three years?
Ashray: I expect more and more products to use AI with time, some of which are yet to be discovered. Having AI generate new information, rather than only tackling classification or other well-documented problems, is the development I am most looking forward to. The last 5 years have been the era in which AI has outperformed a significant chunk of the Computer Vision literature. We expect that over the next 5 years, we will see similar performance on Speech and Audio.