J. Info. Comput. Sci., 17 (2022), pp. 102-117.
[An open-access article; the PDF is free to any online user.]
Most existing models for RGB-D salient object detection (SOD) rely on heavy backbones such as VGG and ResNet, which result in large model sizes and high computational costs. To address this problem, a lightweight two-stage decoder network is proposed. First, the network uses MobileNet-V2 and a customized backbone to extract features from RGB images and depth maps, respectively. To mine and combine cross-modality information, a cross-reference module fuses complementary cues from the two modalities. Next, a feature enhancement module, built from four parallel convolutions with different dilation rates, strengthens the cues in the fused features. Finally, a two-stage decoder predicts the saliency maps, processing high-level and low-level features separately before merging them. Experiments on five benchmark datasets, comparing against ten state-of-the-art models, demonstrate that the proposed model achieves significant improvement with the smallest model size.
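The abstract describes a feature enhancement module made of four parallel convolutions with different dilation rates. A minimal PyTorch sketch of such a block is shown below; the specific dilation rates, channel widths, residual connection, and fusion convolution are assumptions for illustration, not the authors' published configuration.

```python
# Sketch of a feature-enhancement block with four parallel dilated 3x3
# convolutions (dilation rates are assumed, not taken from the paper).
import torch
import torch.nn as nn


class FeatureEnhancement(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 convolution per dilation rate; padding equals the dilation
        # so every branch preserves the spatial resolution of its input.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # Fuse the concatenated branch outputs back to the input width.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        # Residual connection keeps the original fused RGB-depth clues.
        return self.fuse(out) + x


# Usage: enhance a fused RGB-depth feature map of shape (N, C, H, W).
fem = FeatureEnhancement(channels=64)
y = fem(torch.randn(2, 64, 24, 24))  # -> torch.Size([2, 64, 24, 24])
```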