  • Open Access
  • Article
Real-Time Semantic Segmentation of Road Scenes via Hybrid Dilated Grouping Network
  • Yan Zhang¹
  • Xuguang Zhang¹,*
  • Deting Miao¹
  • Hui Yu²

Received: 10 Jan 2024 | Accepted: 23 Apr 2024 | Published: 25 Mar 2025

Abstract

Real-time semantic segmentation is a critical step in many real-world applications, such as autonomous driving systems. Achieving high accuracy while maintaining high inference speed remains a central challenge for real-time semantic segmentation. To tackle this challenge, in this study we propose a Hybrid Dilated Grouping Network (HDGNet) for real-time semantic segmentation of outdoor scenes, which improves segmentation accuracy while also taking inference speed into account. To reduce the number of model parameters and thereby speed up inference, we replace ordinary two-dimensional convolutions with factorized convolutions. However, simply reducing the number of parameters may sacrifice segmentation accuracy, so we further introduce dilated convolution to extract multi-scale spatial information. The HDG module is constructed by combining factorized convolution and dilated convolution; it not only reduces the model parameters and improves inference speed, but also extracts both local and broader contextual information. Furthermore, to enhance the feature representation ability of the network, we introduce a channel attention mechanism to capture the interactions between channels. After obtaining the shallow features and the deep, high-level semantic information, we design skip connections that fuse the feature branches from different stages to improve segmentation accuracy. Experiments conducted on widely used datasets show that the proposed model achieves superior real-time performance compared with existing methods while using significantly fewer model parameters.
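The parameter saving from factorized convolution can be checked with simple arithmetic. The sketch below is a minimal illustration, assuming a k×k convolution is factorized into a k×1 convolution followed by a 1×k convolution, with the channel count C kept constant and biases omitted. The abstract does not state HDGNet's actual kernel sizes or channel widths, so the values of C used here are hypothetical.

```python
def conv_params(c_in, c_out, kh, kw, bias=False):
    """Number of weights in one 2D convolution with a kh x kw kernel."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def standard_params(c, k=3):
    """Ordinary k x k convolution, C channels in and out."""
    return conv_params(c, c, k, k)


def factorized_params(c, k=3):
    """k x 1 convolution followed by a 1 x k convolution (assumed layout)."""
    return conv_params(c, c, k, 1) + conv_params(c, c, 1, k)


# Hypothetical channel widths, only to show the ratio.
for c in (32, 64, 128):
    s, f = standard_params(c), factorized_params(c)
    print(f"C={c}: standard={s}, factorized={f}, saving={1 - f / s:.1%}")
```

Under these assumptions the factorized pair needs 2kC² weights instead of k²C², a one-third reduction for k = 3 regardless of C.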
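Dilated convolution enlarges the region a kernel covers without adding weights. As a minimal sketch, assuming stride 1 and no padding, a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions, and stacking such layers grows the receptive field additively. The 1D function below is an illustrative cross-correlation, not HDGNet's code, and the dilation rates in the example are hypothetical.

```python
def effective_kernel(k, d):
    """Number of input positions spanned by a size-k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 layers given as (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


def dilated_conv1d(x, w, d):
    """'Valid' 1D dilated cross-correlation: the taps are d samples apart."""
    span = effective_kernel(len(w), d)
    return [
        sum(w[j] * x[i + j * d] for j in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


# A 3-tap kernel with d = 2 reads x[i], x[i+2], x[i+4]: 3 weights, span of 5.
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], d=2))  # [6, 9, 12]
# Hypothetical rates 1, 2, 5 on 3x3 kernels: receptive field of 17 per axis.
print(stacked_receptive_field([(3, 1), (3, 2), (3, 5)]))  # 17
```

The same formula applies per axis to factorized 3×1 and 1×3 kernels, which suggests one plausible way dilation and factorization can be combined, as in the HDG module, to widen context at no extra parameter cost.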
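The abstract does not specify which channel attention design HDGNet uses. As one common example, and as an assumption rather than the authors' method, the sketch below implements a squeeze-and-excitation (SE) style block in plain Python: each channel is squeezed to its global average, passed through a two-layer bottleneck (ReLU, then sigmoid), and the resulting per-channel scores rescale the input. The weights w1 and w2 are hypothetical and biases are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def se_channel_attention(feat, w1, w2):
    """SE-style channel attention (assumed design, biases omitted).

    feat: C channels, each an H x W list of lists.
    w1:   (C // r) x C weights of the reduction layer.
    w2:   C x (C // r) weights of the expansion layer.
    Returns the re-weighted features and the per-channel scores.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: FC -> ReLU -> FC -> sigmoid mixes information across channels.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scores = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in channel c by its score.
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(feat, scores)]
    return out, scores


feat = [[[4.0, 4.0], [4.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C=2, H=W=2
out, scores = se_channel_attention(feat, w1=[[1.0, -1.0]], w2=[[1.0], [-1.0]])
print([round(s, 3) for s in scores])  # [0.881, 0.119]
```

Because every score depends on the pooled descriptors of all channels, the block models inter-channel interaction at the cost of only two small weight matrices.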
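Skip connections merge low-resolution, semantically rich deep features with high-resolution shallow features. The abstract does not say how HDGNet combines the branches, so the sketch below assumes one common choice: upsample the deep features with nearest-neighbour interpolation to the shallow resolution, then concatenate the two along the channel axis. Feature maps are plain lists of H × W channels, and the function names are hypothetical.

```python
def upsample_nearest(channel, factor):
    """Repeat each value `factor` times along both spatial axes."""
    return [
        [v for v in row for _ in range(factor)]
        for row in channel
        for _ in range(factor)
    ]


def fuse_skip(shallow, deep, factor):
    """Upsample deep features, then concatenate with shallow ones channel-wise."""
    up = [upsample_nearest(ch, factor) for ch in deep]
    h, w = len(shallow[0]), len(shallow[0][0])
    if len(up[0]) != h or len(up[0][0]) != w:
        raise ValueError("upsampled deep features must match the shallow resolution")
    return shallow + up


shallow = [[[0] * 4 for _ in range(4)]]             # 1 channel, 4 x 4
deep = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]         # 2 channels, 2 x 2
fused = fuse_skip(shallow, deep, factor=2)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
print(fused[1][0])                                  # [1, 1, 2, 2]
```

Element-wise addition after a 1×1 projection is an equally common alternative; concatenation keeps both branches intact at the cost of more input channels for the next layer.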

How to Cite
Zhang, Y.; Zhang, X.; Miao, D.; Yu, H. Real-Time Semantic Segmentation of Road Scenes via Hybrid Dilated Grouping Network. International Journal of Network Dynamics and Intelligence 2025, 4 (1), 100006. https://doi.org/10.53941/ijndi.2025.100006.
Copyright & License
Copyright (c) 2025 by the authors.