LIP Net: Real-Time Semantic Segmentation of Person Body Parts

Published in ICROS 35th symposium, 2020

Ilyas Talha1, Kim Hyongsuk1,2,*

1Division of Electronic Engineering, Intelligent Robots Research Center, Jeonbuk National University, Jeonju, South Korea
2Division of Electronic and Information Engineering, Jeonbuk National University, Jeonju, South Korea
*Corresponding author: hskim@jbnu.ac.kr

In contrast to high-accuracy state-of-the-art approaches, which assume that abundant computational resources are available to run deep CNNs, this paper approaches human body parsing from a practical perspective, because many applications cannot take advantage of the accuracy demonstrated by recent computationally heavy models.

Abstract

Human visual understanding and pose estimation in in-the-wild scenarios are among the fundamental tasks of computer vision. Traditional deep convolutional networks (DCNs) use pooling or subsampling layers to enlarge the receptive field and gather broader contextual information for segmenting human body parts. However, these subsampling layers reduce the localization accuracy of the DCN. In this work, we propose a novel DCN that uses atrous convolution with different dilation rates to probe the incoming feature maps and gather multi-scale context. We further add a gating mechanism that adaptively recalibrates the convolutional feature responses by learning channel-wise statistics. This gating mechanism helps regulate the flow of salient features to the next stages of the network. As a result, our architecture can focus on different levels of granularity, from local salient regions to global semantic regions, with a minimal parameter budget. Our proposed architecture achieves a processing speed of 49 frames per second on standard-resolution images.
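
The two ideas in the abstract, atrous convolution at several dilation rates and channel-wise gating, can be illustrated with a minimal pure-Python sketch. This is not the network from the paper: it works on 1-D signals for readability, the function names (atrous_conv1d, effective_kernel_size, multi_scale_context, channel_gate) and the rates (1, 2, 4) are hypothetical, and the gate is assumed to be a single fully connected layer followed by a sigmoid over per-channel means, in the style of squeeze-and-excitation, since the abstract does not specify its exact form.

```python
import math


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution, zero-padded, same output length.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows without pooling and without adding weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def multi_scale_context(signal, kernel, rates=(1, 2, 4)):
    """Probe the same input at several dilation rates (rates are assumed)."""
    return [atrous_conv1d(signal, kernel, r) for r in rates]


def channel_gate(channels, weights, biases):
    """Assumed squeeze-and-excitation-style gate.

    Squeeze: reduce each channel to its mean (a channel-wise statistic).
    Excite: one fully connected layer plus sigmoid gives a gate in (0, 1).
    Rescale: multiply every value in a channel by that channel's gate.
    """
    squeezed = [sum(c) / len(c) for c in channels]
    gates = []
    for row, b in zip(weights, biases):
        z = sum(w * s for w, s in zip(row, squeezed)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return [[g * x for x in c] for c, g in zip(channels, gates)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With rate 2 the three taps land two samples apart around the impulse.
    print(atrous_conv1d(impulse, [1, 2, 3], rate=2))
    # A 3-tap kernel spans 3, 5 and 9 samples at rates 1, 2 and 4.
    print([effective_kernel_size(3, r) for r in (1, 2, 4)])
    # Zero weights and biases give a gate of 0.5 for every channel.
    print(channel_gate([[1, 1], [3, 3]], [[0, 0], [0, 0]], [0, 0]))
```

The sketch shows why dilation is attractive under a tight parameter budget: a 3-tap kernel at rate 4 spans 9 samples with the same three weights, while the gate adds only one small weight matrix per block.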

Citation

@article{ilyas2020lip,
title={Low-Cost 3D Sensor System: Using Image-based Laser Triangulation},
author={Ilyas, Talha and Kim, Hyongsuk},
journal={제어로봇시스템학회 국내학술대회 논문집},
pages={110–112},
year={2020}}
