Real-Time Semantic Segmentation of Road Scenes via Hybrid Dilated Grouping Network

Yan Zhang; Xuguang Zhang; Deting Miao; Hui Yu

doi:10.53941/ijndi.2025.100006

Abstract

Real-time semantic segmentation is a critical step in real-world applications such as autonomous driving. Achieving high accuracy while maintaining a high inference speed remains a central challenge for real-time semantic segmentation. To tackle this challenge, we propose a Hybrid Dilated Grouping Network (HDGNet) for real-time semantic segmentation of outdoor scenes, which improves segmentation accuracy while taking inference speed into account. To reduce the number of model parameters and speed up inference, we replace ordinary two-dimensional convolution with factorized convolution. However, simply reducing the number of parameters may sacrifice segmentation accuracy, so we further introduce dilated convolution to extract multi-scale spatial information. The HDG module combines factorized convolution and dilated convolution; it reduces the model parameters and improves inference speed while extracting both local and broader contextual information. Furthermore, to enhance the feature representation ability of the network, we introduce a channel attention mechanism to capture the information interaction between channels. After obtaining the shallow features and the deep high-level semantic information, we design skip connections that fuse the feature branches from different stages to improve segmentation accuracy. Experiments on widely used datasets show that the proposed model achieves superior real-time performance over existing methods while using significantly fewer model parameters.
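The parameter saving from factorized convolution and the receptive-field gain from dilated convolution can be illustrated with simple arithmetic. The sketch below is not the HDG module itself: it assumes a k×k convolution is factorized into a k×1 convolution followed by a 1×k convolution with the same number of input and output channels, it ignores bias terms, and the function names and the 64-channel example are hypothetical choices made only for illustration.

```python
def conv2d_params(c_in, c_out, kh, kw):
    """Number of weights in a 2-D convolution (bias ignored)."""
    return c_in * c_out * kh * kw


def factorized_params(channels, k):
    """Weights in a k x 1 convolution followed by a 1 x k convolution.

    Assumes the channel count stays the same through both layers.
    """
    return conv2d_params(channels, channels, k, 1) + conv2d_params(channels, channels, 1, k)


def effective_kernel(k, dilation):
    """Span (in pixels) covered by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel.
standard = conv2d_params(64, 64, 3, 3)
factorized = factorized_params(64, 3)
saving = 1 - factorized / standard

print(f"standard 3x3:   {standard} weights")
print(f"3x1 then 1x3:   {factorized} weights")
print(f"relative saving: {saving:.1%}")

# Dilation widens the span of a 3-tap kernel without adding weights.
for d in (1, 2, 4, 8):
    print(f"dilation {d}: effective kernel size {effective_kernel(3, d)}")
```

Under these assumptions the factorized pair needs two thirds of the weights of the full 3×3 kernel, while dilation enlarges the covered span at no extra parameter cost, which is the trade-off the abstract describes.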


Scilight Press
