Sequence-Based Facial Emotion Recognition using EfficientNet and LSTM

Tags
Projects
Python
PyTorch
๐Ÿ‘ฅ Authors
Celestine Akpanoko, Alex Esser, Srikanth Narayanan, Chang-Yong Song, Hunter Mast
โš™๏ธ Training
Adam optimizer, early stopping enabled
๐ŸŽฅ Input
Video frame sequences from AFEW-VA
๐ŸŽฏ Target
21-class classification for both valence and arousal
๐Ÿ” Temporal Modeling
LSTM for capturing emotional transitions
๐Ÿง  Model
EfficientNet + LSTM
๐Ÿงช Loss
Multi-task composite loss (valence + arousal)
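The page describes the loss only as a multi-task composite of the valence and arousal objectives. A minimal sketch of one plausible form, assuming each task is a 21-way classification trained with cross-entropy and combined by a hypothetical weighting factor `alpha` (not stated on this page):

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def composite_loss(val_logits, aro_logits, val_target, aro_target, alpha=0.5):
    # alpha is a hypothetical task weight; the page only states that the
    # loss combines the valence and arousal terms.
    return alpha * ce(val_logits, val_target) + (1 - alpha) * ce(aro_logits, aro_target)
```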
Abstract:
We introduce a hybrid emotion recognition model that combines a pre-trained EfficientNet backbone with LSTM to capture the temporal dynamics of facial expressions. Unlike traditional static FER approaches, our model processes video sequences to classify valence and arousal into 21-level categories (from -10 to 10), enabling more accurate emotion estimation over time. The system is trained on the AFEW-VA dataset and outperforms baseline CNN-LSTM architectures, demonstrating the benefits of integrating spatio-temporal modeling in affective computing.
Paper:
Sequence-based FER using EfficientNet and LSTM.pdf
324.4KB
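The abstract maps continuous valence/arousal annotations in [-10, 10] to 21 classes. One straightforward way to do this, assuming rounding to the nearest integer level (the exact binning rule is not stated on this page):

```python
def to_class(v: float) -> int:
    """Map a continuous annotation in [-10, 10] to a class index in [0, 20]."""
    v = max(-10.0, min(10.0, v))  # clamp to the annotated range
    return int(round(v)) + 10     # shift so that -10 -> 0, 0 -> 10, 10 -> 20

def to_value(c: int) -> int:
    """Inverse mapping: class index back to the integer annotation level."""
    return c - 10
```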

Results

| Metric   | Valence | Arousal |
|----------|---------|---------|
| F1 Score | 0.8938  | 0.8223  |
| Accuracy | 0.9222  | 0.8517  |
| CCC      | 0.9738  | 0.9613  |
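CCC in the table is the concordance correlation coefficient, which combines correlation with agreement in mean and scale. For reference, it can be computed as:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two 1-D arrays:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()               # population variances
    cov = ((x - mx) * (y - my)).mean()      # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Perfect agreement gives CCC = 1; unlike Pearson correlation, any systematic bias or scale mismatch lowers the score.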
โ€ข Compared to the ResNet-50 baseline, our model improved the valence CCC by more than 0.6 (0.31 โ†’ 0.97).
โ€ข EfficientNet-LSTM achieved superior spatio-temporal representations despite smaller batch sizes and a longer training time (7 hrs vs. 6 hrs).