We are excited to announce the launch of the “Mozilla Voice Challenge,” a crowdsourcing competition sponsored by Mozilla and posted on the HeroX platform. The goal of the competition is to better define the voice technology space by creating a “stack” of open source technologies to support the development of new voice-enabled products.

(This piece was posted to the Mozilla Open Innovation - Medium blog on July 25, 2019)

The Power of the Voice

Voice-enabled products are in rapid ascent in both consumer and enterprise markets. The expectation is that, in the near future, voice interaction will become a key interface for people’s internet-connected lives.

Unfortunately, the current voice product market is heavily dominated by a few giant tech companies. This is unhealthy, as it stifles competition and prevents the entry of smaller companies with new and innovative products. Mozilla wants to change that. We want to help open up the ecosystem. So far there have been two major components in Mozilla’s open source voice tech efforts outside the Firefox browser:

(1) To address the lack of available training data for machine-learning algorithms that can power new voice-enabled applications, we launched the Common Voice project. The current release already represents the largest public domain transcribed voice dataset, with more than 2,400 hours of voice data and 28 languages represented.

(2) In addition to the data collection, Mozilla’s Machine Learning Group has applied sophisticated machine learning techniques and a variety of innovations to build an open-source speech-to-text engine that approaches human accuracy, as well as a text-to-speech engine. Mozilla believes that this technology, together with the growing Common Voice dataset, can and will enable a wave of innovative products and services, and that it should be available to everyone.

And this is exactly where the new Mozilla Voice Challenge fits in: its objective is to better define the voice technology space by creating a “stack” of open source technologies to support the development of new voice-enabled products.


Stacking the Odds

For the purpose of this competition, we define voice-enabled technologies as technologies that use voice as an interface, allowing people to interact with various connected devices through verbal means — both when speaking and listening.

We envision that some elements of this stack would be the following technologies:

  • Speech-to-text (STT)
  • Text-to-speech (TTS)
  • Natural Language Processing (NLP)
  • Voice-signal processing
  • Keyword spotting
  • Keyword alignment
  • Intent parsing
  • Language parsing: stemming, entity recognition, dialog management, and summarization

We want to improve this list by adding more relevant technologies and also identify any “gaps” in the stack where quality open source projects are not available (see the Challenge description for more details). We’ll then place the updated list in a public repository for open access — and to achieve this, all proposed technologies in the stack need to be open source licensed.


How to Participate

The competition is posted on the HeroX platform. It will run until August 20, 2019, and the submitted proposals will be evaluated by members of Mozilla’s Voice team. Up to $6,000 in prizes will be awarded to the best proposals.

The challenge is open to everyone (except for Mozilla employees and their families), and we especially encourage members of Mozilla’s Common Voice community to take part in it.