Refining Baby Cry Classification using Data Augmentation (Time-Stretching and Pitch-Shifting), MFCC Feature Extraction, and LSTM Modeling

Sanjaya, Samuel Ady (2024) Refining Baby Cry Classification using Data Augmentation (Time-Stretching and Pitch-Shifting), MFCC Feature Extraction, and LSTM Modeling. 7th International Conference on New Media Studies (CONMEDIA).

Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Babies in their early developmental stages cannot communicate their needs through language; when they want to convey discomfort or express a need, they primarily resort to crying. This paper proposes an approach to interpreting those cries that combines Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction with a Long Short-Term Memory (LSTM) network. The dataset used in this research contains 497 baby cry recordings divided into 5 categories: hungry, belly pain, discomfort, burping, and tired. The study focuses on developing a model capable of discerning the underlying message within a baby's cry by leveraging the acoustic characteristics of the cry and the temporal dependencies inherent in the data. In addition to combining MFCC and LSTM, we apply two data augmentations, time stretching and pitch shifting, to improve model performance. After the data has been augmented, we extract MFCC features from the recorded cry samples and use them as input to the LSTM network. The LSTM model is built with 12 hidden layers and trained for 30 epochs. Augmenting the data with time stretching and pitch shifting yielded 96% validation accuracy, compared with only 72% on non-augmented data. The model trained on augmented data also achieved a better loss, which indicates that the model trains better on the larger dataset. In conclusion, combining data augmentation with feature extraction has a significant impact on the model's ability to learn.
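
The augmentation and feature-extraction steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which the abstract does not describe: the overlap-add stretcher is a deliberately naive stand-in for the phase-vocoder methods that audio libraries typically provide, and every function name, frame size, hop size, filter count and the synthetic 440 Hz test tone are assumptions chosen for illustration only.

```python
import numpy as np

def time_stretch_ola(y, rate, frame=1024, hop=256):
    """Naive overlap-add time stretch (assumed stand-in, not the paper's method).
    rate > 1 shortens the signal, rate < 1 lengthens it. Needs len(y) >= frame."""
    window = np.hanning(frame)
    analysis_hop = max(1, int(round(hop * rate)))      # step through the input
    n_frames = 1 + (len(y) - frame) // analysis_hop
    out = np.zeros((n_frames - 1) * hop + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = y[i * analysis_hop : i * analysis_hop + frame]
        out[i * hop : i * hop + frame] += seg * window  # re-place at fixed hop
        norm[i * hop : i * hop + frame] += window
    return out / np.maximum(norm, 1e-8)                 # undo window overlap gain

def pitch_shift(y, n_steps):
    """Shift pitch by n_steps semitones: stretch in time, then resample
    back to the original length with linear interpolation."""
    factor = 2.0 ** (n_steps / 12.0)
    stretched = time_stretch_ola(y, rate=1.0 / factor)
    positions = np.linspace(0, len(stretched) - 1, num=len(y))
    return np.interp(positions, np.arange(len(stretched)), stretched)

def mfcc(y, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Textbook MFCC: framing, Hamming window, power spectrum,
    triangular mel filterbank, log, DCT-II. Returns (n_frames, n_mfcc)."""
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                   # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):                  # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ basis.T

# Synthetic one-second tone standing in for a cry recording (assumption).
sr = 8000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
stretched = time_stretch_ola(tone, rate=1.25)   # shorter copy of the clip
shifted = pitch_shift(tone, n_steps=2)          # same length, higher pitch
features = mfcc(shifted, sr)                    # one row of coefficients per frame
```

Each augmented clip produces a matrix with one row per time frame and one column per coefficient, which is the sequence layout a recurrent network such as an LSTM would consume. In practice an established audio library would normally replace the hand-written functions above.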

Item Type: Article
Keywords: baby cry, MFCC, LSTM, pitch shifting, time stretching
Subjects: 000 Computer Science, Information and General Works > 000 Computer Science, Knowledge and Systems > 006 Special Computer Methods
Divisions: Faculty of Engineering & Informatics > Information System
Depositing User: Administrator UMN Library
Date Deposited: 05 Aug 2025 09:06
Last Modified: 05 Aug 2025 09:06
URI: https://kc.umn.ac.id/id/eprint/39846
