Search phrase: [Keywords = "automatic speech recognition"]

Search results

Results: 2

System dedicated to Polish Automatic Speech Recognition - overview of solutions

K. Pondel-Sycz, P. Bilski

Bulletin of the Polish Academy of Sciences Technical Sciences | Early Access | e149818 | DOI: 10.24425/bpasts.2024.149818

Keywords: automatic speech recognition, deep neural networks, transformer, conformer

Abstract

The paper analyzes modern Artificial Intelligence algorithms for an automated system that supports people during conversations in the Polish language. Their task is to perform Automatic Speech Recognition (ASR) and process its output further, for instance to fill in a computer-based form or to apply Natural Language Processing (NLP) to assign the conversation to one of several predefined categories. A state-of-the-art review is required to select the optimal set of tools for processing speech in difficult conditions, which degrade ASR accuracy. The paper presents the top-level architecture of a system applicable to the task. Characteristics of the Polish language are discussed. Next, existing ASR solutions and architectures with End-to-End (E2E) deep neural network (DNN) based ASR models are presented in detail. Differences between Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) and Transformers in the context of ASR technology are also discussed.

An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances

Robert Hossa, Ryszard Makowski

Archives of Acoustics | 2016 | vol. 41 | No 1 | 107-118 | DOI: 10.1515/aoa-2016-0011

Keywords: automatic speech recognition, interindividual difference compensation, speaker clustering, universal background model, GMM weighting factor adaptation

Abstract

The same speech sounds (phones) produced by different speakers can sometimes exhibit significant differences. Therefore, it is essential to use algorithms that compensate for these differences in ASR systems. Speaker clustering is an attractive solution to the compensation problem, as it does not require long utterances or high computational effort at the recognition stage. The report proposes a clustering method based solely on adaptation of the universal background model (UBM) weights. This solution has turned out to be effective even when using a very short utterance. The obtained improvement of frame recognition quality, measured by means of frame error rate, is over 5%. It is noteworthy that this improvement concerns all vowels, even though the clustering discussed in this report was based only on the phoneme a. This indicates a strong correlation between the articulation of different vowels, which is probably related to the size of the vocal tract.
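
The idea of clustering speakers from adapted mixture weights alone can be illustrated with a minimal sketch. It is not the procedure from the paper: the diagonal-covariance GMM, the MAP-style `relevance` factor, the Euclidean nearest-centroid rule, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_gaussian_diag(frames, means, variances):
    """Log-density of each frame under each diagonal-covariance component.
    frames: (T, D), means: (K, D), variances: (K, D) -> (T, K)."""
    diff = frames[:, None, :] - means[None, :, :]               # (T, K, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    return -0.5 * (log_det[None, :] + mahal)

def adapt_weights(frames, ubm_weights, means, variances, relevance=16.0):
    """Adapt only the mixture weights (assumed MAP-style rule);
    means and variances of the background model stay fixed."""
    log_joint = np.log(ubm_weights)[None, :] + log_gaussian_diag(frames, means, variances)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)             # responsibilities, (T, K)
    counts = post.sum(axis=0)                           # soft frame counts per component
    alpha = counts / (counts + relevance)               # data-dependent mixing factor
    adapted = alpha * counts / len(frames) + (1.0 - alpha) * ubm_weights
    return adapted / adapted.sum()                      # renormalize to sum to 1

def assign_cluster(weights, centroids):
    """Index of the nearest cluster centroid in weight space (assumed Euclidean)."""
    return int(np.argmin(np.linalg.norm(centroids - weights, axis=1)))

if __name__ == "__main__":
    # Toy background model with two components; all values are made up.
    rng = np.random.default_rng(0)
    means = np.array([[-2.0, 0.0], [2.0, 0.0]])
    variances = np.ones((2, 2))
    ubm_weights = np.array([0.5, 0.5])
    # A very short "utterance": 20 frames from a hypothetical speaker.
    frames = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(20, 2))
    w = adapt_weights(frames, ubm_weights, means, variances)
    centroids = np.array([[0.6, 0.4], [0.4, 0.6]])      # hypothetical cluster centroids
    print(w, assign_cluster(w, centroids))
```

Because only the weight vector is re-estimated, each speaker is summarized by K numbers, which is one plausible reason such a scheme could work with very short utterances; the relevance factor keeps the estimate close to the background weights when few frames are available.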
