Ahead of the Big Data & AI Leaders Summit Singapore 2019, we interviewed Richard Cartwright, Director, Speech Analytics at Dolby. With over 20 years’ experience in digital audio signal processing and speech recognition, Richard has designed and coded algorithms that power the audio experience across multiple audio categories from stadium concert sound reinforcement, through bass guitar effects pedals to consumer electronics devices and business teleconferencing. At Dolby, Richard leads a team of researchers and engineers that melds classical signal processing techniques with machine learning.
How important is Natural Language Processing (NLP) to the future of AI?

Natural Language Processing is unquestionably one of the technologies most widely recognised as important right now. So much so that I often see people reaching for an NLP-based solution in fields like speech recognition at the expense of asking whether there are things they could do at a more fundamental audio and acoustics level that would improve performance. In my talk I’ll cover some of the more fundamental opportunities that I think exist. I think we might see more attention land back onto other parts of the chain in future years, as the ability of NLP to improve performance on top of imperfect information from current acoustic modelling techniques plateaus.
What are you most excited about in AI right now?

I’m really excited about the opportunity for greater cross-domain collaboration. What I mean by that is that there’s a heap of hype around AI systems at the moment and I’m seeing a tendency for technologists to latch onto AI techniques hastily without stopping to understand whether some of the more classical techniques could help them. Conversely, some experts in classical fields are a little sceptical about applying AI techniques. I think that as our view of AI technology matures, we’ll need to learn to know when to apply human intelligence and knowledge and when to apply artificial intelligence to build systems that really work well. We’ll only get there as more experts across a range of fields understand AI technology and as the hype around AI technology subsides in favour of the view that machine learning is only one of the many arrows in a technologist’s quiver.

What’s the biggest barrier to integrating AI into consumer electronics?

I would say that a lot of the AI tech that I see people getting excited about hasn’t really gone through a rationalisation process yet. For example, if you’ve got a large neural net model with 25 million parameters in it, do you really deeply understand how much value each of those parameters is adding to the task your consumer is going to care about? Have you done the hard work to find out whether there’s an alternate model formulation, an alternate training technique, or a way to get similar results with a blend of a learnt model and a hand-crafted algorithm? How can you possibly test how your product will respond under all the conditions that your customers will subject it to when you’ve built it from a limited set of training examples? Do you understand the range and minimum required precision of all your values so that you can implement your system efficiently in cheap low-power fixed-point operations? These are the sorts of questions that are key to ask if you want to achieve a successful rollout of AI in consumer electronics products at scale.

What do you think is the biggest myth about AI and machine learning being propagated in your industry?

The myth I hear most often propagated is that domain knowledge is no longer required. AI changes the way we need to apply our domain knowledge, but we certainly still need to apply it. We now need to apply it in constructing the right training sets, choosing the right loss functions and so forth instead of constructing the maths of our algorithms by hand. One of the things I’ll talk about is some examples of things that can go wrong in audio AI systems if you don’t control for basic confounds that any experienced audio signal processing person should know to look out for.

How has the industry’s attitude towards machine learning and other AI changed from when you first entered the field compared to today?

When I first started in speech recognition, technologies like deep learning were seen as quite difficult to apply. You needed to go and find specialist GPU hardware and use quite difficult-to-set-up software to drive it. Today it’s become quite easy for someone to pick up tools like PyTorch or TensorFlow, hire a GPU for the day on Amazon Web Services, and train a network from a bunch of examples without really having to understand the details of what’s going on under the hood. This leads to the attitude that an AI solution is the obvious first thing to try, rather than the attitude of a few years back that it’s a difficult technology that needs a lot of expertise to get right.

Where do you see AI and machine learning leading your company and industry in the future?

I think as an audio signal processing community we haven’t really asked what new types of experiences AI could enable that we couldn’t really deliver in the past. Today we’re using AI to solve classic tasks better - or sometimes with shorter time-to-market - than we did before. I’m thinking of tasks like speech recognition, speaker recognition, noise suppression, and voice activity detection. However, I think AI might lead us to algorithms that work with a much deeper understanding of the human context behind the audio that our algorithms hear. Imagine a home, office, or car in which you can always hear what you want without having to explicitly ask for it, and in which your technology can always hear you whenever you want it to, without invading your privacy or getting in the way when you don’t want it to. That kind of complete understanding of what we humans want is missing from today’s AI, but it’s a direction we want to head in.

What are the main challenges to your role this year?

I lead a research group looking at ways to really effectively meld traditional audio signal processing with AI techniques and multi-device orchestration. This is a giant research area, and as we get deeper and deeper into this work it would be easy for the team to get side-tracked into solving some of the more traditionally-framed problems - for example, building a better speech recognizer or a better beamformer. As the team grows, a constant challenge will be continuing to hold that high-level mission firmly in view without getting lost down a rabbit hole somewhere.

What will you be discussing in your presentation?

I’ll be talking about how speech recognition works at a high level. I’ve got lots of examples of actual sound recordings demonstrating why it’s a difficult problem and the mechanisms by which today’s systems do so much better than the systems of a few years ago. I’ll also be talking about where I see some opportunities to improve and where I think some of the tripping hazards lie. Lastly, I’ll be sharing some of my perspectives on how to blend more traditional expert thinking with machine learning techniques.
To find out more about Richard's presentation at the Big Data & AI Leaders Summit, please visit here.
