Principle and application of voiceprint recognition technology

- Jun 10, 2019-

Voiceprint recognition, also known as speaker recognition, is a biometric technology that determines a speaker's identity from his or her voice. It comes in two forms: speaker identification and speaker verification. Which one is used depends on the task and application. For example, narrowing the range of suspects in a criminal investigation may call for identification, whereas a bank transaction calls for verification. The People's Bank of China has officially released a financial industry standard, "Mobile Finance Based Security Specification for Voiceprint Recognition", which means that voiceprint recognition technology has been recognized by financial regulatory authorities. The standard also gives voiceprint recognition the specification it needed to enter the mobile finance field.

A voiceprint is the sound wave spectrum, as displayed by an electroacoustic instrument, that carries speech information.
Human speech production is a complex physiological and physical process involving the language center of the brain and the vocal organs. The organs people use to speak (tongue, teeth, throat, lungs, and nasal cavity) vary greatly in size and shape from person to person, so the voiceprints of any two people differ. A person's speech acoustics are relatively stable but also variable; they are not absolute or immutable. The variation can come from physiology, pathology, psychology, imitation, or disguise, and it is also affected by environmental interference. Nevertheless, because each person's vocal organs are different, people can in general still tell different voices apart or judge whether two voices belong to the same person.
Depending on the application scenario, voiceprint recognition is divided into Speaker Identification (SI) and Speaker Verification (SV). In SI, a test utterance is compared against several speakers in a known set and the best match is chosen; it is a one-to-many problem. In SV, a test utterance comes with a claimed target user, and the system decides whether the voice really belongs to that user; it is a one-to-one binary classification problem.
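The difference between the two tasks can be illustrated with a minimal sketch. It assumes each voice has already been reduced to a fixed-length numeric vector, and it uses cosine similarity as the matching score; the speaker names, the vector values, and the 0.8 threshold are all made up for illustration and are not taken from any real system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(test_vec, enrolled):
    """SI (one-to-many): return the enrolled speaker whose voiceprint scores highest."""
    return max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))

def verify(test_vec, claimed_vec, threshold=0.8):
    """SV (one-to-one): accept the claim only if the score reaches the threshold."""
    return cosine_similarity(test_vec, claimed_vec) >= threshold

# Hypothetical enrolled voiceprints (made-up numbers, illustration only).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}
test_vec = [0.85, 0.15, 0.35]  # hypothetical voiceprint of the utterance under test

best_match = identify(test_vec, enrolled)             # one-to-many
claim_alice = verify(test_vec, enrolled["alice"])     # one-to-one
claim_bob = verify(test_vec, enrolled["bob"])         # one-to-one
print(best_match, claim_alice, claim_bob)
```

With these made-up vectors the test utterance is identified as "alice", the claim to be alice is accepted, and the claim to be bob is rejected. Note that identification always returns some speaker from the set, while verification can reject; the threshold sets the balance between false accepts and false rejects.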


In terms of application, voiceprint recognition is most promising in the security field, including criminal investigation, access control, and banking transactions. It is also gradually gaining attention in smart homes, both for security and for a smarter experience, for example accurately picking out which utterance is the owner's command amid other speech.


The main tasks of voiceprint recognition include: speech signal processing, voiceprint feature extraction, voiceprint modeling, voiceprint comparison, and decision making.
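The stages listed above can be strung together in a toy sketch. Everything in it is an assumption chosen for brevity: the features (log energy and zero-crossing rate per frame), the model (a plain average of frame features), the Euclidean distance, the 0.5 threshold, and the synthetic sine tones standing in for recorded speech. Practical systems use much richer spectral features and statistical or neural models, so this shows only the shape of the pipeline.

```python
import math

def pre_emphasis(signal, coeff=0.97):
    """Signal processing: boost high frequencies with y[n] = x[n] - coeff * x[n-1]."""
    return [signal[0]] + [signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))]

def frames(signal, size=200, step=100):
    """Signal processing: split the signal into overlapping fixed-length frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def frame_features(frame):
    """Feature extraction (toy): log energy and zero-crossing rate of one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]

def voiceprint(signal):
    """Modeling (toy): average the frame features into one fixed-length vector."""
    feats = [frame_features(f) for f in frames(pre_emphasis(signal))]
    return [sum(col) / len(feats) for col in zip(*feats)]

def distance(a, b):
    """Comparison: Euclidean distance between two voiceprints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_speaker(a, b, threshold=0.5):
    """Decision: accept when the distance is below the (hypothetical) threshold."""
    return distance(a, b) < threshold

def tone(freq_hz, seconds=0.5, rate=8000):
    """Synthetic sine tone used here as a stand-in for recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(int(seconds * rate))]

enrolled = voiceprint(tone(120))   # "enrollment" recording
similar = voiceprint(tone(122))    # nearly the same pitch
different = voiceprint(tone(400))  # clearly different pitch

print(same_speaker(enrolled, similar))
print(same_speaker(enrolled, different))
```

In this sketch the 122 Hz tone is accepted as matching the 120 Hz enrollment and the 400 Hz tone is rejected. Each function corresponds to one stage, so any stage can be swapped for a stronger method without changing the others.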


Voiceprint recognition has the following advantages:
1. Voice carrying voiceprint features is convenient and natural to acquire;
2. Voice is cheap and simple to capture, requiring only a microphone, communication equipment, or the like;
3. It is suitable for remote identity verification;
4. The algorithms for voiceprint identification and verification have low complexity;
5. Combined with other measures, such as content checking through speech recognition, accuracy can be improved.

