p.2
Challenges in Continuous Speech Recognition
Why are HMMs considered statistically inefficient?
They are not effective for modeling non-linear or near non-linear functions.
p.9
Neural Networks and Their Architectures
What is a key advantage of RNNs in modeling data?
They allow parameter sharing through different layers of the network.
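Illustration: one common reading of this answer is that an RNN unrolled in time behaves like a stack of layers, one per time step, that all reuse the same weights. The scalar sketch below is an assumption-laden toy (the names rnn_forward, w_x, w_h and b are hypothetical, not from the source) that shows the same three parameters applied at every step.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar recurrent cell over a sequence.

    The same parameters (w_x, w_h, b) are reused at every time step,
    which is the parameter sharing the card refers to.
    """
    h = h0
    states = []
    for x in inputs:
        # New hidden state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.5, -0.2], w_x=0.8, w_h=0.5, b=0.0)
print(states)
```

However long the input sequence is, the parameter count stays at three.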
p.1
Neural Networks and Their Architectures
What is the role of deep neural networks in machine learning?
To extract specific features and information from inputs.
p.1
Deep Learning in Speech Recognition
What significant development in machine learning occurred around 2006?
Deep learning arose as a new area of machine learning.
p.4
Types of Speech Recognition Systems
What are the two parts of automatic speaker recognition?
Speaker identification and speaker verification.
p.8
Machine Learning Techniques
What is reinforcement learning?
Learning by interacting with the problem environment, where an agent learns from its own actions.
p.7
Machine Learning Techniques
What are the two main categories of supervised learning?
Regression algorithms and classification algorithms.
p.6
Machine Learning Techniques
What is the purpose of supervised learning?
To produce a classifier function for discrete outputs or a regression function for continuous outputs.
p.10
Systematic Literature Review Methodology
What is the first step in the systematic review process described by Nassif et al.?
Applying inclusion/exclusion criteria to ensure only relevant papers are included.
p.3
Neural Networks and Their Architectures
How do speech spectrogram features compare to MFCC when using deep neural networks?
With deep neural networks, speech spectrogram features outperform MFCC features, in contrast to traditional GMM-HMM systems.
p.8
Machine Learning Techniques
Why is semi-supervised learning appealing?
It requires less human intervention and utilizes cheaper, easier-to-access unlabeled datasets.
p.4
Types of Speech Recognition Systems
What does speaker identification determine?
To which registered speaker a given utterance corresponds.
p.2
Types of Speech Recognition Systems
What models do conventional speech recognition systems typically use?
Hidden Markov Models (HMMs) based on Gaussian Mixture Models (GMMs).
p.6
Machine Learning Techniques
What is supervised learning?
A type of machine learning that uses labeled data to train the algorithm.
p.1
Feature Extraction in Speech Processing
What type of learning does deep learning utilize for feature extraction?
Greedy layerwise unsupervised pre-training.
p.2
Neural Networks and Their Architectures
How do neural networks improve speech recognition?
They allow for discriminative training more efficiently than HMMs.
p.9
Challenges in Continuous Speech Recognition
What challenge do RNNs face in training?
They are considered hard to train to capture long-term dependencies.
p.4
Applications of Speech Recognition
What is one application of speech recognition mentioned in the text?
Dictating to computers instead of typing.
p.6
Machine Learning Techniques
How does the learning process in machine learning occur?
Iteratively from analyzed data and new input data.
p.6
Machine Learning Techniques
What are the different types of data used in machine learning?
Observations, examples, instructions, and direct experience.
p.1
Neural Networks and Their Architectures
What is one advantage of deep learning models over shallower architectures?
They require fewer parameters to represent non-linear functions.
p.8
Machine Learning Techniques
How does reinforcement learning differ from supervised learning?
Reinforcement learning uses direct interactions with the environment to gain knowledge, while supervised learning learns from examples provided by an external supervisor.
p.6
Machine Learning Techniques
What are the five main techniques of machine learning?
Supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning.
p.4
Challenges in Continuous Speech Recognition
What is emotion cue-based speaker recognition?
A field for human-machine interaction that recognizes user emotions from speech.
p.5
Neural Networks and Their Architectures
What class of models consists of a stack of restricted Boltzmann machines?
Deep belief networks (DBNs).
p.7
Machine Learning Techniques
What is the primary goal of regression algorithms?
To uncover the best function that fits points in the training dataset.
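Illustration: for the simplest case, a straight line, "the best function that fits the points" can be found with ordinary least squares. The sketch below assumes a single input variable and made-up data; fit_line is a hypothetical helper name, not something from the source.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy training set lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```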
p.6
Machine Learning Techniques
What does unsupervised learning aim to achieve?
To find common points between inputs in the dataset, often through clustering.
p.10
Systematic Literature Review Methodology
What is the purpose of removing review papers from the list?
They are set aside so that they can be compared with the current review.
p.4
Challenges in Continuous Speech Recognition
What is the challenge in language recognition systems?
Differentiating between closely correlated languages.
p.6
Machine Learning Techniques
What is reinforcement learning?
A type of learning that uses trial and error to maximize a cumulative reward metric.
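Illustration: a minimal sketch of trial and error with a cumulative reward, using an epsilon-greedy agent on a two-armed bandit. This is only one simple instance of reinforcement learning, chosen here for brevity. The fixed payouts, the epsilon value and the name run_bandit are all assumptions made for the example.

```python
import random

def run_bandit(payouts, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known arm, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payouts)  # running mean reward per arm
    counts = [0] * len(payouts)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payouts))  # explore: try a random arm
        else:
            arm = max(range(len(payouts)), key=lambda a: estimates[a])  # exploit
        reward = payouts[arm]  # toy environment: each arm pays a fixed amount
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total_reward = run_bandit([0.2, 0.8])
print(estimates, total_reward)
```

The agent is never told which arm is better; it learns that from the rewards its own actions produce.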
p.10
Systematic Literature Review Methodology
What are the exclusion criteria for the review?
Papers that use deep neural networks in areas other than speech, papers related to speech but not using deep neural networks, and papers with no clear publication information.
p.7
Machine Learning Techniques
What is the main goal of unsupervised learning?
To learn more about the data by identifying the fundamental structure or distribution patterns within it.
p.1
Deep Learning in Speech Recognition
What has been the focus of research in speech processing applications over the past few years?
Utilizing deep learning for speech-related applications.
p.1
Systematic Literature Review Methodology
How many papers were analyzed in the systematic review conducted in the study?
174 papers published between 2006 and 2018.
p.4
Types of Speech Recognition Systems
What is the purpose of speaker verification?
To accept or reject the claimed speaker identity.
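Illustration: the difference between identification (which registered speaker?) and verification (is the claimed identity right?) can be sketched with similarity scores. The example assumes each speaker is represented by a fixed-length feature vector and uses cosine similarity with an arbitrary threshold. The vectors, the threshold and the function names are hypothetical, not taken from the source.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(utterance, enrolled):
    """Identification: pick the registered speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine(utterance, enrolled[name]))

def verify(utterance, claimed, enrolled, threshold=0.8):
    """Verification: accept or reject a single claimed identity."""
    return cosine(utterance, enrolled[claimed]) >= threshold

# Made-up enrollment vectors for two registered speakers.
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
utterance = [0.9, 0.1, 0.2]

print(identify(utterance, enrolled))
print(verify(utterance, "alice", enrolled), verify(utterance, "bob", enrolled))
```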
p.9
Neural Networks and Their Architectures
What are Recurrent Neural Networks (RNNs) primarily used for?
Predicting future data sequences using previous data samples.
p.3
Applications of Speech Recognition
What are some applications of deep learning in speech recognition mentioned in the text?
Feature extraction, language modeling, acoustic models, understanding speech, and dialogue estimation.
p.5
Deep Learning in Speech Recognition
What are the three classes of deep learning?
Unsupervised (generative) learning, supervised learning, and hybrid deep networks.
p.7
Machine Learning Techniques
Name three types of regression algorithms.
Linear regression, multiple linear regression, and polynomial regression.
p.6
Machine Learning Techniques
What is semi-supervised learning?
A combination of supervised and unsupervised learning using both labeled and unlabeled data.
p.10
Systematic Literature Review Methodology
What criteria are used to include papers in the review?
Papers that use deep neural networks or deep learning in the area of speech.
p.7
Machine Learning Techniques
How does unsupervised learning differ from supervised learning?
Unsupervised learning uses an input dataset without any labeled outputs, while supervised learning uses labeled outputs.
p.3
Systematic Literature Review Methodology
What information was extracted from the 174 papers reviewed in the systematic literature review?
Types of speech identified, databases used, languages, environment types, features extracted, publication types, and distribution of papers over the years.
p.9
Systematic Literature Review Methodology
What types of search terms were used in the review?
Terms related to deep neural networks and speech.
p.5
Applications of Speech Recognition
How can CNNs be adapted for speech recognition?
By incorporating speech properties into the architecture.
p.9
Systematic Literature Review Methodology
What digital libraries were used to search for research papers?
Google Scholar, IEEE Xplore, ScienceDirect, ResearchGate, and Springer.
p.10
Systematic Literature Review Methodology
What is the purpose of the data extraction strategy?
To extract needed information to answer the set of research questions.
p.8
Machine Learning Techniques
What are the three main categories of unsupervised learning algorithms?
Clustering, dimensionality reduction, and anomaly detection.
p.6
Machine Learning Techniques
What is machine learning?
A field of study that provides computers with the ability to learn from input data without being explicitly programmed.
p.8
Machine Learning Techniques
What is semi-supervised learning?
A method that falls between supervised and unsupervised learning, using a large amount of unlabeled data and a small amount of labeled data.
p.5
Neural Networks and Their Architectures
What is the main challenge in training deep neural networks with many hidden layers?
The persistent occurrence of local optima in the non-convex objective function.
p.3
Deep Learning in Speech Recognition
What is the focus of the paper by A. B. Nassif et al.?
The use of deep neural networks in speech recognition.
p.3
Deep Learning in Speech Recognition
What advancements in speech recognition were highlighted in the work done by Microsoft since 2009?
Recent advances in deep learning for speech recognition, including its capabilities and limitations.
p.8
Deep Learning in Speech Recognition
What is deep learning?
A sub-field of machine learning based on algorithms that learn from multiple levels to represent complex relations among data.
p.4
Challenges in Continuous Speech Recognition
What are the two branches of emotion recognition?
Emotion identification and emotion verification.
p.1
Feature Extraction in Speech Processing
What does feature learning in deep learning aim to achieve?
Learning the transformation of previously learned features at each new layer.
p.2
Challenges in Continuous Speech Recognition
What is a limitation of neural networks in speech recognition?
They struggle with continuous speech signals because of their inability to model temporal dependencies.
p.3
Applications of Speech Recognition
What does the paper by Li et al. discuss regarding spoken language recognition?
Basics of state-of-the-art solutions from computational and phonological perspectives.
p.5
Neural Networks and Their Architectures
What are the three important concepts utilized by the convolution operator in CNNs?
Sparse interactions, parameter sharing, and equivariant representation.
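Illustration: a small 1-D sketch of the three concepts. Each output depends on only a few neighbouring inputs (sparse interactions), the same kernel weights are used at every position (parameter sharing), and shifting the input shifts the output by the same amount (equivariance). The function computes the unflipped sliding dot product (cross-correlation), as deep learning libraries commonly do. The signal and kernel values are made up for the example.

```python
def conv1d(signal, kernel):
    """Valid 1-D sliding dot product with a single shared kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))  # only k inputs per output
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, -1]                    # two weights reused at every position
signal = [0, 0, 1, 2, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 0, 0]     # same pattern, moved one step right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)
print(out)
print(out_shifted)
```

Comparing the two printed lists shows the response pattern moving one position to the right along with the input.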
p.4
Types of Speech Recognition Systems
What is the process of age recognition by voice?
Estimating the speaker’s age using their speech signals.
p.8
Deep Learning in Speech Recognition
What has contributed to the popularity of deep learning?
Increased processing abilities of computer chips, incorporation of large training datasets, and advances in machine learning.
p.4
Applications of Speech Recognition
What is automatic health recognition?
Using the patient's voice to provide information on their health status.
p.5
Neural Networks and Their Architectures
What is the purpose of convolutional neural networks (CNN)?
To serve as a discriminative deep architecture, applied mainly to computer vision and image recognition tasks.
p.7
Machine Learning Techniques
What is the main aim of classification algorithms?
To uncover the best fit class for the input data by assigning each input to its correct class.
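Illustration: one of the simplest ways to assign an input to a class is a 1-nearest-neighbour rule, which gives the input the label of the closest training example. This is a sketch of one classification algorithm among many, using made-up two-dimensional data; the function name is hypothetical.

```python
def nearest_neighbor_classify(train, query):
    """Return the label of the training example closest to the query.

    train is a list of (feature_vector, label) pairs.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda pair: squared_distance(pair[0], query))
    return label

# Labeled training data (supervised learning needs these labels).
train = [
    ([1.0, 1.0], "low"),
    ([1.2, 0.8], "low"),
    ([5.0, 5.0], "high"),
]

print(nearest_neighbor_classify(train, [4.5, 5.2]))
print(nearest_neighbor_classify(train, [0.9, 1.1]))
```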
p.9
Systematic Literature Review Methodology
What methodology is used in the systematic literature review presented in the paper?
Kitchenham and Charters methodology.
p.9
Systematic Literature Review Methodology
What is the first stage of the systematic literature review process?
Identifying the research questions.
p.5
Neural Networks and Their Architectures
What is the role of pooling layers in CNNs?
To sub-sample the output from the convolutional layer and decrease the data rate.
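Illustration: a minimal sketch of sub-sampling with max pooling over a 1-D feature map. Max pooling is an assumption here (the card does not name a pooling type), as are the window and stride values.

```python
def max_pool_1d(values, window=2, stride=2):
    """Keep the maximum of each window, moving by `stride` positions."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]

feature_map = [1, 3, 2, 5, 4, 0]
pooled = max_pool_1d(feature_map)
print(pooled)  # half as many values as the input
```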
p.10
Systematic Literature Review Methodology
What is QAR 1 in the quality assessment rules?
Is the paper well organized?
p.2
Systematic Literature Review Methodology
What did Morgan's review focus on in speech recognition?
Discriminatively trained feed-forward networks and their effectiveness prior to HMM decoding.
p.10
Systematic Literature Review Methodology
What is the scoring system for QARs?
Scores range from 1 (fully answered) to 0 (not answered at all).
p.8
Deep Learning in Speech Recognition
What distinguishes deep learning architectures from shallow architectures?
Deep learning architectures have multiple layers of non-linear feature transformation, while shallow architectures typically have one or two layers.
p.9
Neural Networks and Their Architectures
What recent advancement has helped improve RNN training?
Hessian-free optimization.
p.4
Types of Speech Recognition Systems
What is accent recognition?
The recognition of a speaker’s regional accent within a predetermined language.
p.5
Neural Networks and Their Architectures
Why did researchers start exploring deep neural networks seriously in recent years?
Because high computational power became more accessible.
p.2
Applications of Speech Recognition
What are some applications of deep neural networks in speech-related fields?
Automatic speech recognition, emotional speech recognition, speaker identification, and speech enhancement.
p.4
Machine Learning Techniques
What is the main challenge in extracting knowledge from data?
The real challenge is in the extraction process itself.
p.7
Applications of Speech Recognition
What is an example of an application of unsupervised learning?
Social information filtering algorithms, like those used by Amazon.com for recommendations.
p.2
Applications of Speech Recognition
What significant improvement did Microsoft's MAVIS achieve?
Reduced word error rate (WER) by 30% compared to GMM-based models.
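Illustration: word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below follows that standard definition. The example sentences are made up, and the card does not say whether the 30% figure is a relative or an absolute reduction.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[-1][-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(wer)
```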
p.3
Challenges in Continuous Speech Recognition
What are the five criteria used to evaluate noise-robust techniques in automatic speech recognition?
Acoustic environment distortion knowledge, model domain vs. feature domain processing, specific environment distortion models, uncertainty processing, and acoustic models trained by the same adaptation process.
p.6
Neural Networks and Their Architectures
What is deep learning?
A type of machine learning that models abstractions in data using a graph with multiple processing layers.
p.10
Systematic Literature Review Methodology
What does a score of 6 or less indicate in the quality assessment?
The paper was excluded from the review.
p.7
Machine Learning Techniques
How does an unsupervised learning algorithm cluster inputs?
By grouping inputs based on the features extracted from each input object.
p.7
Machine Learning Techniques
Can unsupervised learning algorithms assign names to clusters?
No, they do not assign names but can differentiate among clusters.
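Illustration: the two clustering cards above can be sketched with a tiny 1-D k-means. Inputs are grouped purely by their feature values, and the result is a set of numeric cluster indices rather than meaningful names. K-means, the starting centers and the data are assumptions chosen for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means for 1-D data with given starting centers."""
    def nearest(p, cs):
        return min(range(len(cs)), key=lambda i: abs(p - cs[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, labels = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)
print(labels)  # cluster indices only; the algorithm has no names for the groups
```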
p.2
Neural Networks and Their Architectures
What did Hinton et al. conclude about deep neural networks?
They outperform GMM-HMM models on various speech recognition benchmarks.
p.3
Applications of Speech Recognition
What types of recognition can speech signals provide information about?
Speech, speaker, emotion, health, language, accent, age, and gender recognition.
p.10
Systematic Literature Review Methodology
What is the final step in the systematic review process?
Applying quality assessment rules to identify the final list of papers.
p.4
Types of Speech Recognition Systems
What is automatic gender recognition?
The process of recognizing whether the speaker is male or female.
p.3
Applications of Speech Recognition
What is automatic speech recognition?
The capability of a machine or computer to recognize the content of words and phrases in an uttered language.
p.3
Research Gaps and Future Directions
What does the systematic review aim to identify?
Research patterns, gaps, and future directions in the use of deep neural networks in speech recognition.