Search results

1 – 1 of 1
Open Access
Article
Publication date: 5 December 2022

Kittisak Chotikkakamthorn, Panrasee Ritthipravat, Worapan Kusakunniran, Pimchanok Tuakta and Paitoon Benjapornlert

Mouth segmentation is one of the challenging tasks of development in lip reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently…

Abstract

Purpose

Mouth segmentation is one of the challenging tasks of development in lip reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently, deep learning methods effectively solved mouth segmentation problems with state-of-the-art performances. This study presents a modified Mobile DeepLabV3 based technique with a comprehensive evaluation based on mouth datasets.

Design/methodology/approach

This paper presents a novel approach to mouth segmentation by Mobile DeepLabV3 technique with integrating decode and auxiliary heads. Extensive data augmentation, online hard example mining (OHEM) and transfer learning have been applied. CelebAMask-HQ and the mouth dataset from 15 healthy subjects in the department of rehabilitation medicine, Ramathibodi hospital, are used in validation for mouth segmentation performance.

Findings

Extensive data augmentation, OHEM and transfer learning had been performed in this study. This technique achieved better performance on CelebAMask-HQ than existing segmentation techniques with a mean Jaccard similarity coefficient (JSC), mean classification accuracy and mean Dice similarity coefficient (DSC) of 0.8640, 93.34% and 0.9267, respectively. This technique also achieved better performance on the mouth dataset with a mean JSC, mean classification accuracy and mean DSC of 0.8834, 94.87% and 0.9367, respectively. The proposed technique achieved inference time usage per image of 48.12 ms.

Originality/value

The modified Mobile DeepLabV3 technique was developed with extensive data augmentation, OHEM and transfer learning. This technique gained better mouth segmentation performance than existing techniques. This makes it suitable for implementation in further lip-reading applications.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

1 – 1 of 1