An Encoder–Decoder-Based Deep Learning Model for Segmenting Occlusion in the Lower Part of the Face

Authors

  • Mrinmoy Sadhukhan, Department of Computer & System Sciences, Visva-Bharati University, India, https://orcid.org/0000-0002-3246-8295
  • Indrajit Bhattacharya, Department of Computer Applications, Kalyani Government Engineering College, India, https://orcid.org/0000-0002-3881-2755
  • Paramartha Dutta, Department of Computer & System Sciences, Visva-Bharati University, India, https://orcid.org/0000-0003-3946-2440
  • Kaushik Roy, Department of Computer Science, West Bengal State University, India

DOI:

https://doi.org/10.47852/bonviewAIA62026690

Keywords:

face occlusion, segmentation, UNet, self-attention

Abstract

This paper presents a deep learning-based model for accurately segmenting occlusions in the lower facial region. Unlike many existing image segmentation methods that rely heavily on bounding-box-annotated datasets, the proposed model eliminates the need for such supervision, thereby improving generalization to unseen data. The system generates a binary mask that identifies facial areas occluded by masks, incorrectly worn masks, and niqabs (where only the forehead and eye regions remain visible). Segmenting such occlusions is particularly difficult in datasets containing varied types of valid and invalid masks, or similar obstructions, that were not present during training. To address this, the UNet architecture is modified by adding a self-attention block. The model is trained on an augmented dataset and validated on multiple benchmark datasets. To evaluate its practical deployment potential, the model is tested on edge devices within a simulated environment. The proposed model shows improved performance compared with current state-of-the-art approaches, achieving 99.62% training accuracy and 99.48% validation accuracy, with training and validation losses of 0.01 and 0.0172, respectively. Additionally, the model accurately segments diverse facial masks, enabling the identification of individuals attempting to conceal their identities. This capability also supports facial reconstruction by restoring occluded regions, thereby enhancing security applications. Its design balances high accuracy with broad applicability, making it a robust solution for facial occlusion handling and recognition.
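
The abstract does not state where the self-attention block sits in the UNet or how it is parameterized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a feature map flattened into N position vectors of C channels, standard scaled dot-product attention with a residual connection, and a 0.5 threshold for turning per-pixel probabilities into a binary occlusion mask. All function names, weights, and the threshold are hypothetical and chosen for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention with a residual connection.

    tokens: N x C list, one row per spatial position of a flattened
    feature map. w_q, w_k, w_v: C x C projections (assumed shapes).
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual: add the attended features back onto the input features.
    out = [[t + a for t, a in zip(t_row, a_row)]
           for t_row, a_row in zip(tokens, attended)]
    return out, weights

def to_binary_mask(probs, threshold=0.5):
    # Assumed post-processing: 1 marks an occluded pixel, 0 a visible one.
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

# Toy usage: three spatial positions, two channels, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens, identity, identity, identity)
mask = to_binary_mask([[0.2, 0.7], [0.5, 0.49]])  # [[0, 1], [1, 0]]
```

Each row of `weights` sums to 1, so every position's output is its own features plus a weighted mix of all positions. This is how attention lets distant regions of the face inform the prediction at a given pixel.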

 

Received: 3 July 2025 | Revised: 4 January 2026 | Accepted: 22 January 2026

 

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

 

Data Availability Statement

The data that support the findings of this study are openly available in CelebFaces Attributes Dataset at https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html; in RMFD-Real-World Masked Face Dataset at https://github.com/X-zhangyang/Real-World-MaskedFace-Dataset; in FMLD Dataset at https://arxiv.org/abs/1511.06523v1 and https://github.com/borutb-fri/FMLD; in MAFA - Masked Faces at https://www.kaggle.com/datasets/revanthrex/mafadataset; in Augmented CelebA Dataset at https://www.kaggle.com/datasets/mrinmoysadhukhan/augmented-celeba-dataset/data; and in Face Mask Detector at https://www.kaggle.com/datasets/spandanpatnaik09/face-mask-detectormask-not-mask-incorrect-mask.

 

Author Contribution Statement

Mrinmoy Sadhukhan: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization, Project administration. Indrajit Bhattacharya: Conceptualization, Validation, Formal analysis, Investigation, Resources, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration. Paramartha Dutta: Conceptualization, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision. Kaushik Roy: Resources, Data curation, Writing – review & editing.


Published

2026-02-01

Section

Research Article

How to Cite

Sadhukhan, M., Bhattacharya, I., Dutta, P., & Roy, K. (2026). An Encoder–Decoder-Based Deep Learning Model for Segmenting Occlusion in the Lower Part of the Face. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA62026690