
The task of speech recognition (speech-to-text, STT) is seemingly simple - convert a speech (voice) signal into text data. Analysts like Gartner expect the use of speech to text (STT) to only increase in the next decade. There are many approaches to solving this problem, and new breakthrough techniques are constantly emerging. To date, the most successful approaches can be divided into hybrid and end-to-end solutions. In hybrid approaches, the recognition system consists of several components, usually an acoustic model, a pronunciation model, and a language model.
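To make that division of labor concrete, here is a minimal, purely illustrative Python sketch of how a hybrid decoder might combine the three components to score one candidate transcript. The tiny lexicon and all scores below are made up for illustration; real systems use neural acoustic models over audio frames and large n-gram or neural language models:

```python
import math

# 1) Pronunciation model: a lexicon mapping words to phoneme sequences.
lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}

# 2) Acoustic model: per-phoneme log-likelihoods given the audio
#    (in practice a neural network over acoustic frames; hardcoded here).
acoustic_logp = {"HH": -0.2, "AH": -0.4, "L": -0.3, "OW": -0.5,
                 "W": -0.3, "ER": -0.6, "D": -0.4}

# 3) Language model: log-probabilities of words (a toy unigram LM).
lm_logp = {"hello": math.log(0.01), "world": math.log(0.008)}

def score(words, lm_weight=0.8):
    """Combine acoustic and language-model scores for one hypothesis."""
    am = sum(acoustic_logp[p] for w in words for p in lexicon[w])
    lm = sum(lm_logp[w] for w in words)
    return am + lm_weight * lm

print(score(["hello", "world"]))
```

A hybrid decoder evaluates many such hypotheses (typically with beam search over a decoding graph) and returns the highest-scoring one.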

Dear reader, let's talk about something that cannot fail to excite your imagination - the key breakthroughs that have occurred in speech recognition thanks to transformers. This article provides an overview of the main tricks that have been used in Transformer-based architectures for speech recognition. Every particularly exciting idea is highlighted in bold, and along the way there are many links that let you dig into the described techniques in more detail. At the end of the article, you will find benchmarks of Transformer-based speech recognition models.

Developers use speech recognition to create user experiences for a variety of products. Smart voice AI assistants, call center agent enhancement, and conversational voice AI are just a few of the most common uses.

Note: because of Medium's constraints, which make it hard to write LaTeX and code, a notebook containing the same content as this article is available on my Github repository containing all articles with code, and you can execute the notebook of interest using Google Colab.

Therefore CKKS first starts by encoding a vector into a plaintext polynomial. Why do we need polynomials instead of vectors, you might ask? Because polynomials have a “nice” structure, which allows some computations to be done very efficiently while still maintaining a strong level of security. One can see encoding as putting a message into a glass bottle: the message is packed in a container that makes it “easier” to handle, while it is still clearly visible and anyone can recover it.
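As a deliberately toy-sized illustration, here is a minimal numpy sketch of this encoding step via the canonical embedding σ: a vector is identified with the values a polynomial takes at the roots of X^N + 1, and encoding is just solving the corresponding Vandermonde system. The real CKKS encoder additionally multiplies by a scaling factor and rounds the coefficients to integers, which is omitted here:

```python
import numpy as np

M = 8                                        # cyclotomic index; the ring is C[X]/(X^4 + 1)
N = M // 2                                   # polynomial degree bound
xi = np.exp(2j * np.pi / M)                  # primitive M-th root of unity
roots = xi ** (2 * np.arange(N) + 1)         # the N roots of X^N + 1 (odd powers of xi)

def encode(z):
    """sigma^{-1}: vector z in C^N -> coefficients of a polynomial m with m(root_j) = z_j."""
    A = np.vander(roots, N, increasing=True)  # Vandermonde matrix A[j, i] = roots[j]**i
    return np.linalg.solve(A, z)

def decode(coeffs):
    """sigma: evaluate the polynomial at every root of X^N + 1."""
    return np.array([np.polyval(coeffs[::-1], r) for r in roots])

z = np.array([1, 2, 3, 4], dtype=complex)
print(np.round(decode(encode(z)), 6))        # recovers [1, 2, 3, 4]
```

Decoding simply evaluates the polynomial back at those roots, which is why anyone can still “read the message through the glass”.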

As explained before, to perform computation on an encrypted vector, we must first encode it into a polynomial, before encrypting it into a pair of polynomials. This figure from the last article showed the workflow of CKKS:

[Figure: Overview of CKKS (Source: Pauline Troncy)]
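To show what “encrypting into a pair of polynomials” looks like, here is a toy numpy sketch of symmetric-key RLWE encryption over Z_q[X]/(X^N + 1). The parameters are far too small to be secure, and real CKKS uses a public key, much larger rings, and faster (NTT-based) multiplication; this only illustrates why a ciphertext is a pair (b, a) satisfying b + a·s ≈ m:

```python
import numpy as np

N, q = 4, 2**15                      # toy ring degree and modulus (insecure, illustration only)
rng = np.random.default_rng(0)

def ring_mul(a, b):
    """Multiply two coefficient vectors (lowest degree first) in Z_q[X]/(X^N + 1)."""
    full = np.convolve(a, b)         # ordinary polynomial product
    res = np.zeros(N, dtype=np.int64)
    for i, c in enumerate(full):     # reduce modulo X^N + 1, using X^N = -1
        res[i % N] += c if i < N else -c
    return res % q

def encrypt(m, s):
    """Toy symmetric RLWE encryption of a plaintext polynomial m under secret s."""
    a = rng.integers(0, q, N)        # uniformly random polynomial
    e = rng.integers(-2, 3, N)       # small error polynomial
    b = (-ring_mul(a, s) + e + m) % q
    return b, a                      # the ciphertext: a pair of polynomials

def decrypt(ct, s):
    b, a = ct
    m_noisy = (b + ring_mul(a, s)) % q      # equals m + e (mod q)
    return (m_noisy + q // 2) % q - q // 2  # centered lift; small noise e remains

s = rng.integers(-1, 2, N)           # ternary secret key
m = np.array([100, 200, 300, 400])   # an already-encoded plaintext polynomial
print(decrypt(encrypt(m, s), s))     # approximately [100, 200, 300, 400], up to the noise e
```

Decryption recovers the plaintext only approximately, up to the small error e; CKKS embraces this by treating the noise as part of the approximation error of fixed-point arithmetic.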
