Pub. Date | : April, 2021 |
---|---|
Product Name | : The IUP Journal of Computer Sciences |
Product Type | : Article |
Product Code | : IJCS10421 |
Author Name | : Pranit Kotkar |
Availability | : YES |
Subject/Domain | : Management |
Download Format | : PDF Format |
No. of Pages | : 13 |
Audio signal processing has become a prominent area of research and is gaining considerable traction in the virtual assistant and BPO industries for designing Artificial Intelligence (AI) solutions. The paper focuses on the development of classification strategies based on information extracted from voice data. The linear Support Vector Machine (SVM) and Random Forest (RF) classifiers rendered the best results for gender identification. For age classification, the methodology that best suited the requirements was Ridge Regression with costs, with the age groups distributed among three classes. Data imbalance led to disappointing results, and the literature review indicated that the Synthetic Minority Oversampling Technique (SMOTE) could potentially improve the output. Identity recognition involved open-set and closed-set classification, and the closed set yielded strong results. The size of the dataset was found to be a limiting factor that stood in the way of higher accuracy. The findings also suggest that the models are trained more efficiently based on subject familiarization.
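To illustrate the oversampling idea the abstract mentions, the following is a minimal sketch of SMOTE-style interpolation, not the paper's implementation. It assumes numeric feature vectors for an under-represented class; the function name `smote_like_oversample`, the parameter `k`, and the sample values are all hypothetical. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D features for an under-represented age class.
minority_class = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_like_oversample(minority_class, n_new=6)
balanced_minority = minority_class + new_points
print(len(balanced_minority))  # 4 original + 6 synthetic = 10
```

In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version; the sketch only shows why the synthetic samples stay within the region already occupied by the minority class.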
The acoustic analysis of data has gained considerable popularity in Artificial Intelligence (AI) research and industrial applications. The BPO industry is a prime beneficiary of the latest developments in voice data analysis. The use of natural language in voice-controlled interfaces is gradually transforming how humans interact with technology (Dale, 2016). The integration of natural language processing into services is leading to a steady rise in the quality of virtual assistants. Alexa, Siri and Amazon Echo are some of the best examples in the virtual assistance landscape. The increasing importance of voice data is attributed to the fact that consumers are more comfortable speaking and listening than typing out their concerns. The National Centre for Voice and Speech reports that the average rate of speech is 150 words per minute (ncvs.org, 2021), whereas typing speed is only around 40 words per minute. An article from the University of Missouri states that we spend about 9% of our communication time writing, 16% reading, 30% speaking and 45% listening (Skill, 2021). These
Machine learning, Biometrics, Model selection, Data imbalance, Closed and open set