Wang, X., Kan, X., Zhang, Z., & Sun, W. An automatic coke optical texture recognition method based on semantic segmentation model. International Journal of Network Dynamics and Intelligence. 2024, 3(4), 100022. doi: https://doi.org/10.53941/ijndi.2024.100022

Article

An automatic coke optical texture recognition method based on semantic segmentation model

Xialin Wang 1, Xiu Kan 1,*, Zhen Zhang 1, and Weizhou Sun 2

1 Shanghai University of Engineering Science, Shanghai 201620, China

2 Anhui University of Technology, Anhui 243002, China

* Correspondence: xiu.kan@sues.edu.cn

Received: 1 October 2023

Accepted: 30 October 2024

Published: 24 December 2024

Abstract: To solve the segmentation problem of coke optical texture in coke photomicrographs, a semantic segmentation method is proposed in this paper based on multi-scale feature fusion and an attention strategy. The multi-scale module is combined with the convolutional block attention module (CBAM) to design a feature extraction strategy, and the Coke-Net network model is established to extract the coke optical texture from coke photomicrographs. The relationship between pixels is fully considered to refine the segmentation edge, and extraction results with spatial consistency are output to complete the precise segmentation of the coke optical structure. Ablation and comparison experiments are used to demonstrate the effectiveness of the proposed method in coke optical texture extraction.

Keywords:

coke optical texture; photomicrograph; semantic segmentation; automatic recognition

1. Introduction

Coke is a kind of solid fuel formed by carbonization of coking coal at high temperature, and plays three major roles in blast furnace ironmaking: the pillar skeleton, heater and reducing agent [1, 2]. The microstructure of coke is closely related to its quality. The study of coke optical texture depends on the understanding of coking coal and coke properties, and is very important for the rapid evaluation of coke quality and the guidance of coal blending and coking [3–6]. Coke optical texture refers to the coke stomatal wall structure presented by coke samples under the polarized light microscope at 200–500 times magnification. Coking coal produces a large number of colloids during the coking process, so that the coal particles are bonded together by interfacial bonding reactions to form the coke stomatal wall structure. Therefore, the quantity and quality of the colloids are related to the properties and strength of the coke, which directly determines the quality of coke [7, 8].

In fact, photomicrographs are widely used in geological engineering, biomedicine, material chemical science and other fields. Especially in biomedicine, watershed [9], edge detection, K-means [10], and deep learning [11] methods are used in cell segmentation and detection. In geological engineering, N. D. Deng et al. [12] used a binary segmentation algorithm to obtain contrast-enhanced segmentation images. The Otsu algorithm [13], watershed region growing, and machine learning-based multivariant classification [14] were used to segment rock components. Coal geology is a branch of geology, and is closely related to the optical texture of coke. Coal is the basic energy source in China, and the optical texture of coke microscopic images can directly reflect the thermoplasticity of coal, which can improve the utilization rate of coal. Due to the proposal of the "double carbon" policy, significant effort is being made to expand green and low-carbon industries and to use energy efficiently.

In the past few decades, many scholars have used image processing methods to analyze and study the microstructure of coke, and have achieved significant results. Image segmentation algorithms represented by the mean shift method [15], watershed algorithm [16], and K-means [17] were applied to the field of coke photomicrograph analysis. In particular, H. Liu et al. [17] used the K-means method to segment various tissue components in coke optical tissue images, where the coke optical tissue was extracted and clustered for tissue segmentation. An improved mean-shift clustering algorithm was proposed in [18] to extract the optical texture of coke, aiming at solving the adhesion of different tissue components and fuzzy boundaries in coke photomicrographs. However, the above methods still need a lot of manual intervention to improve the processing results in applications, and their segmentation accuracy has not reached the practical level.

With the development of deep learning, convolutional neural networks, with their ability to learn features automatically, have achieved better segmentation performance than traditional image processing methods [19–21]. In 2014, J. Long et al. [22] proposed fully convolutional networks (FCN), opening up the application of deep learning to semantic segmentation. In 2015, O. Ronneberger et al. [23] proposed the Unet network, which uses the design idea of the FCN network and adds deconvolution operations and skip connections to the network to enhance the ability to extract local features. Z. Chu et al. [24] combined the residual connection idea [25] with the Unet and designed the ResUnet to achieve automatic segmentation of the sea-route region in satellite remote sensing images. O. Oktay et al. [26] proposed a new attention model, which introduced the attention mechanism into the Unet network for the first time, enabling the network to automatically focus on target structures of different shapes and further improving the accuracy and efficiency of pathology segmentation in medical images. Coke photomicrographs share the high resolution and high complexity of remote sensing images, and the coke optical texture is similar to medical pathologies but differs from other components only in color and texture.

Based on the above analysis, it is obvious that coke photomicrographs are visual images of colloidal structures, which can evaluate the quality of coke objectively. The correlation between coke microstructure and coke quality needs to be clarified. Thus, a fast and automated coke photomicrograph analysis method needs to be designed to achieve automatic identification and characterization of each microscopic component of coke. By adding a novel multi-scale module and the convolutional block attention module (CBAM) [27], a Coke-Net is established to extract coke optical textures from coke photomicrographs.

The rest of this paper is organized as follows. The proposed method is described in detail according to the processing flow in Section 2. The experimental results are presented in Section 3, and the effectiveness of the proposed method is demonstrated. Finally, Section 4 presents a summary of the entire paper.

2. Materials and Methods

2.1. Sample Collection and Image Acquisition

The photomicrographs of coke used in this paper are collected from coke samples that are made by professionals. The sample-making process strictly follows the standard of the China Coal Industry Association [28]. Firstly, the coal to be tested is crushed to a size of 0.1 mm. Then, vitrinite enrichment is carried out, and the enriched vitrinite particles are mixed with charred anthracite coal in a certain ratio. Finally, the charred coke blocks are cut vertically along the central axis to make coke samples. The 3 × 3 mm area of each coke sample is photographed by the mosaic photography method [29] and combined into an image with a resolution of 12659 × 12144 pixels. The image acquisition process and the environment used in this paper are shown in Figure 1. The coke photomicrographs used in this paper are shown in Figure 2.

Figure 1. (a) Image acquisition process and (b) Image acquisition.

Figure 2. Microscopic images of coke samples and partial enlargements.

2.2. Data Preparation and Image Pre-processing

In order to ensure the generalization performance of the model and the accuracy of coke optical texture extraction, the image data of this experiment are sourced from eight different coke samples made by professionals. Two coke photomicrographs with different fields of view are taken for each sample, and each coke photomicrograph is sampled into 272 blocks at 768 × 768 resolution, resulting in a final dataset containing 4352 images. The 2176 coke photomicrographs corresponding to four samples are used as the original training dataset, and the 2176 coke photomicrographs corresponding to the remaining four samples are used as the original testing dataset. 300 images are randomly selected from the original training dataset and divided into a training dataset and a validation dataset according to a 7:3 ratio. A professional is asked to label the coke optical texture in each image, where the coke optical texture is marked as 1 and the other parts are marked as 0. The process of making the dataset is shown in Figure 3.
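As a concrete illustration of the block-sampling step above, the following minimal sketch tiles a large image into non-overlapping fixed-size blocks. The paper does not specify the exact tiling scheme (overlap or border handling), so this version simply discards edge remainders smaller than one tile; the function name and defaults are illustrative.

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int = 768) -> list:
    """Split a large photomicrograph into non-overlapping tile x tile blocks.

    Edge remainders smaller than `tile` are discarded; the paper's exact
    border handling is not stated, so this is an assumed simplification.
    """
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles

# Toy usage: a 4x4 "image" tiled into 2x2 blocks yields four tiles.
demo = np.arange(16).reshape(4, 4)
blocks = tile_image(demo, tile=2)
```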

Figure 3. Flow chart of image splitting and labeling.

2.3. Microscopic Image Analysis

Coke optical texture extraction: A semantic segmentation network, the Coke-Net, is proposed for the deep learning stage. The structure of the Coke-Net is shown in Figure 4. It extracts the coke optical texture from coke photomicrographs.

Figure 4. Structure of Coke-Net

The Coke-Net refers to the Unet and consists of an encoder and a decoder. The encoder of the proposed Coke-Net is composed of four down-sampling modules, each of which includes a Multi-Scale module and a max-pooling layer. The Multi-Scale module is responsible for extracting and outputting the low-resolution feature matrix of the image. The connection module in the middle of the network consists of a Multi-Scale module and a CBAM module. The CBAM module filters out useless information in the underlying features and improves the efficiency of the network. The network decoder consists of four up-sampling modules, each of which has a Multi-Scale module and a deconvolution layer. Correspondingly, the low-resolution feature matrix is restored to its original resolution step by step. Since the max-pooling layer in the encoder tends to lose low-level semantic information such as position and shape, a connection consisting of the CBAM module is set up between each encoder Multi-Scale module and the corresponding decoder Multi-Scale module. As the coke optical texture extraction problem is essentially a pixel-by-pixel binary classification problem, the network finally outputs a probability map through the sigmoid layer. Each position on the probability map corresponds to the probability that the pixel belongs to the coke optical texture. By thresholding the probability map, the final coke optical texture extraction results are obtained. The library and parameters of the networks are shown in Table 1.

Block                            Layer (filter size)    Channel    Output size
Multi-Scale Block 1 / Block 9    Conv2D(3,3) × 3        16         768 × 768
                                 Conv2D(1,1)            16         768 × 768
CBAM Block 1                     –                      –          768 × 768
Multi-Scale Block 2 / Block 8    Conv2D(3,3) × 3        32         384 × 384
                                 Conv2D(1,1)            32         384 × 384
CBAM Block 2                     –                      –          384 × 384
Multi-Scale Block 3 / Block 7    Conv2D(3,3) × 3        64         192 × 192
                                 Conv2D(1,1)            64         192 × 192
CBAM Block 3                     –                      –          192 × 192
Multi-Scale Block 4 / Block 6    Conv2D(3,3) × 3        128        96 × 96
                                 Conv2D(1,1)            128        96 × 96
CBAM Block 4                     –                      –          96 × 96
Multi-Scale Block 5              Conv2D(3,3) × 3        256        48 × 48
                                 Conv2D(1,1)            256        48 × 48
CBAM Block 5                     –                      –          48 × 48
Table 1. Structure of Coke-Net
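Given the probability map produced by the final sigmoid layer, the thresholding step described above can be sketched in a few lines. The 0.5 cut-off used here is an assumed default, since the paper does not state the threshold value.

```python
import numpy as np

def binarize(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold the sigmoid probability map into a binary texture mask.

    Pixels at or above `threshold` are labeled 1 (coke optical texture),
    the rest 0 (background). The 0.5 default is an assumption.
    """
    return (prob_map >= threshold).astype(np.uint8)

# Toy usage on a 2x2 probability map.
probs = np.array([[0.9, 0.2], [0.4, 0.7]])
mask = binarize(probs)
```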

1) Multi-Scale block

The size of the coke optical texture varies, and its Feret diameter often changes in the range of [100 μm, 300 μm]. Hence, a Multi-Scale module is designed to replace the two convolution layers in the network. The Multi-Scale module can reduce the semantic gap between different feature channel layers by fusing the feature maps of adjacent layers [30]. Thus, the Multi-Scale module is used here to improve the semantic segmentation performance of coke optical texture boundaries. As shown in Figure 5, the Multi-Scale module uses one convolution block, two serialized convolution blocks, and three serialized convolution blocks to convolve the input image in parallel. The extracted features are concatenated along the channel dimension and then output through a convolution block. Meanwhile, referring to the residual network, the module input is added to the fused result.

Figure 5. Structure of Multi-Scale block.

Each convolutional block consists of one convolutional layer, one batch normalization layer, and one activation layer. The batch normalization layer effectively reduces the complexity and uncertainty of the network during training, and also lowers the probability of overfitting. The activation layer increases the nonlinearity of the network, which avoids the vanishing-gradient problem to a certain extent and improves the generalization performance of the network.
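A PyTorch sketch of the convolution block and the Multi-Scale module described above: three parallel branches of one, two, and three serialized convolution blocks, channel concatenation, a 1 × 1 fusion block, and a residual connection. The channel arrangement and the 1 × 1 residual projection are assumptions where the paper leaves the details unstated.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution -> batch normalization -> ReLU, as described in the text."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class MultiScaleBlock(nn.Module):
    """Sketch of the Multi-Scale module: three parallel branches with one,
    two, and three serialized conv blocks, concatenated along channels,
    fused by a 1x1 conv block, plus a residual connection to the input."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = ConvBlock(in_ch, out_ch)
        self.b2 = nn.Sequential(ConvBlock(in_ch, out_ch), ConvBlock(out_ch, out_ch))
        self.b3 = nn.Sequential(ConvBlock(in_ch, out_ch), ConvBlock(out_ch, out_ch),
                                ConvBlock(out_ch, out_ch))
        self.fuse = ConvBlock(3 * out_ch, out_ch, k=1)
        # 1x1 projection (an assumption) so the residual matches the output channels.
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
    def forward(self, x):
        y = self.fuse(torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1))
        return y + self.proj(x)

# Toy usage: a 16-channel feature map mapped to 32 channels, same spatial size.
x = torch.randn(1, 16, 64, 64)
y = MultiScaleBlock(16, 32)(x)
```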

2) CBAM block

CBAM reduces the interference of image background regions and other impurities, improves the detection of coke optical texture regions, and thus improves the segmentation efficiency of the model. Therefore, the CBAM module is added to the designed network to enable autonomous learning of feature weights in coke photomicrographs. The CBAM module consists of a channel attention part and a spatial attention part to retain more useful information. The channel attention part focuses on the features related to the optical texture among the multidimensional input features, and the spatial attention part focuses on meaningful components of the coke optical texture. The CBAM module is shown in Figure 6.

Figure 6. CBAM module architecture.

The CBAM module is located between the encoder and decoder, and ensures that the original features of the coke optical texture are completely preserved in the process of network coding. In other words, the CBAM module is responsible for obtaining features from the encoder and producing weights that are fed back to the corresponding decoding module. The working process of the CBAM module is as follows: firstly, average-pooling and max-pooling are applied to the input features with the size of H × W × C, generating two different spatial context descriptors with the size of 1 × 1 × C. Then, both descriptors are forwarded to a shared network, and the resulting channel attention features are merged together. Finally, the channel weights are normalized to [0, 1] using the sigmoid activation function, and the channel attention factor is multiplied with the input feature map to obtain the channel-weighted feature map. The channel attention factor is shown in formulas (1) and (2).

$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$  (1)
$F' = M_c(F) \otimes F$  (2)

The CBAM module generates spatial attention features by utilizing the inter-spatial relationship of features. Firstly, two feature maps with the size of H × W × 1 are aggregated by applying max-pooling and average-pooling along the channel dimension; they denote the channel-wise max-pooled and average-pooled features, respectively. These are then concatenated and convolved by a 7 × 7 convolution layer. The sigmoid activation function is also used to normalize the spatial weights to [0, 1] and obtain the spatial attention factor in formulas (3) and (4).

$M_s(F') = \sigma(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')]))$  (3)
$F'' = M_s(F') \otimes F'$  (4)

The spatial attention is realized by multiplying the input feature matrix with the spatial attention coefficient directly, so that the key image features of the coke optical texture are highlighted and the network pays more attention to this region. The principle is to aggregate the features across channels and then apply a nonlinear transformation through the sigmoid activation function, so as to reflect the importance of different positions of the coke optical texture for the classification task. On the other hand, the basic idea of channel attention is to calibrate the feature weights through global pooling of the feature matrix and feature interaction via a small network. Then, the sigmoid activation function is used to compute the correlation between the different channels of the feature matrix and the coke optical texture classification task.
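The channel-then-spatial weighting described above can be sketched in PyTorch following the standard CBAM formulation of formulas (1)–(4). The reduction ratio r = 8 in the shared MLP is an assumption; the paper does not state it.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sketch of the CBAM module: channel attention from a shared MLP over
    1x1xC avg- and max-pooled descriptors, followed by spatial attention
    from a 7x7 convolution over channel-wise avg- and max-pooled maps."""
    def __init__(self, channels, r=8):
        super().__init__()
        # Shared MLP implemented with 1x1 convolutions (reduction ratio r assumed).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        # Channel attention: shared MLP over the two pooled descriptors, formula (1)-(2).
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over concatenated channel-pooled maps, formula (3)-(4).
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# Toy usage: attention leaves the feature shape unchanged.
x = torch.randn(1, 16, 32, 32)
out = CBAM(16)(x)
```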

3) Loss function

In the task of coke optical texture segmentation, there are two problems in the image data set. First, from the perspective of the data set, the coke optical texture is randomly distributed; when the original microscopic image is divided into image samples with a resolution of 768 × 768, most samples contain no coke optical texture but only image background. This results in an imbalance of positive and negative samples in the coke optical texture data set. Second, from the perspective of a single image sample, even when a sample contains both coke optical texture and background, the proportion of optical tissue is inconsistent with that of the background, and the boundary of the optical tissue is blurred, making the transition to the background unclear.

In order to solve the above problems, this paper adopts the Focal loss function, which optimizes the common binary cross-entropy loss function by introducing the weight factor α and the modulation coefficient γ.

$L_{fl} = -\alpha y (1-p)^{\gamma} \log(p) - (1-\alpha)(1-y) p^{\gamma} \log(1-p)$  (5)

where $y$ represents the true value of pixels in the coke optical texture image, $p$ represents the output predicted value of the Coke-Net, $\alpha$ represents the weight factor that balances the positive and negative samples in the image data set, and $\gamma$ represents the modulation coefficient that controls the weights of easily and difficultly classified pixels in a sample image.

Since the coke optical texture occupies a small proportion of the whole coke photomicrograph, the ratio between the coke optical texture and the background can be balanced by setting the value of the sample balance weight factor $\alpha$. For areas with fuzzy edges and low contrast, the modulation coefficient $\gamma$ can be set to make the Coke-Net strengthen the learning and feature extraction of the edges during training, so as to improve the edge segmentation accuracy of the coke optical texture.
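A NumPy sketch of the binary focal loss in the form of equation (5): α balances positive and negative pixels, and γ down-weights easily classified pixels. The defaults α = 0.25 and γ = 2 are the common choices from the original focal-loss paper, not values reported here.

```python
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels (alpha/gamma defaults assumed)."""
    p = np.clip(y_pred, eps, 1 - eps)
    pos = -alpha * (1 - p) ** gamma * np.log(p)        # loss where y = 1
    neg = -(1 - alpha) * p ** gamma * np.log(1 - p)    # loss where y = 0
    return np.mean(np.where(y_true == 1, pos, neg))

# A confident correct prediction is penalized far less than a confident wrong one.
low = focal_loss(np.array([1.0]), np.array([0.9]))
high = focal_loss(np.array([1.0]), np.array([0.1]))
```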

3. Experimental Results and Discussion

All the experiments in this section are based on the following hardware configuration of the computation platform: CPU: Intel(R) Xeon(R) Silver 4208 CPU @ 2.1GHz, GPU: NVIDIA Quadro P4000, RAM: 64GB. Since PyTorch is a common and stable framework, PyTorch 12.0 is used as the deep learning framework. Moreover, the GPU is used to accelerate the training and testing of the network.

3.1. Evaluation Metrics

Considering the coke optical texture extraction problem as a binary classification problem of pixels on coke photomicrographs, four typical evaluation indexes are selected to evaluate the extraction results: precision, recall, dice-coefficient, and accuracy. The evaluation indexes are determined as follows:

$\mathrm{Precision} = \frac{TP}{TP+FP}$  (6)
$\mathrm{Recall} = \frac{TP}{TP+FN}$  (7)
$\mathrm{Dice} = \frac{2TP}{2TP+FP+FN}$  (8)
$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$  (9)

where TP represents the area where the network correctly outputs the coke optical texture, TN represents the area where the network correctly outputs the background, FP represents the area where the network incorrectly outputs the coke optical texture, and FN represents the area where the network incorrectly outputs the background. Thus, precision represents the proportion of pixels classified as coke optical texture that are indeed coke optical texture. Recall represents the proportion of actual coke optical texture pixels that are correctly classified as coke optical texture. The dice-coefficient reflects the similarity of the extracted coke optical texture to the real results. Accuracy represents the proportion of correctly classified pixels among all pixels in the image.
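The four indexes can be computed directly from the pixel-level counts, as in this short sketch (the counts below are illustrative, not from the paper):

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Compute precision, recall, dice-coefficient, and accuracy
    from pixel-level confusion counts, following the definitions above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, dice, accuracy

# Toy usage with made-up counts.
p, r, d, a = segmentation_metrics(tp=90, fp=10, fn=10, tn=890)
```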

3.2. Model Training

In order to improve the convergence speed of the network and further prevent overfitting, the Adam algorithm is selected as the parameter updating strategy. Adam [31] is suitable for problems with large data or many parameters, and for problems with noise. The basic idea of Adam is to adapt the learning rate based on the first-order moment of the gradient while simultaneously using the second-order moment of the gradient to update the parameters. This helps handle severe fluctuations of the learning rate, or the inability to update parameters, when the gradient noise is too large or too small. In this paper, the initial learning rate is set to 0.0001. The batch size is set to 2 for the training phase and 1 for the testing phase, and a total of 200 training iterations are performed. The training flow of the coke optical texture semantic segmentation network is shown in Algorithm 1 below.

Algorithm 1: Training algorithm of coke optical texture segmentation network
Input: Train dataset $D_{train}$, test dataset $D_{test}$, learning rate $\eta$, loss function $Loss$, maximum train epochs $N$
1: while epoch < N do:
2:   generate a mini-batch from the train dataset;
3:   train the model based on the back-propagation algorithm;
4:   run inference on the test dataset with the current model;
5:   calculate the train loss $Loss_{train}$;
6:   evaluate the model and calculate the test loss $Loss_{test}$;
7:   if $Loss_{test}$ is minimum:
8:     save the model;
9:   end if
10: end while
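Algorithm 1 can be sketched as the following PyTorch training skeleton: Adam updates on mini-batches, per-epoch inference on the test set, and checkpointing whenever the test loss reaches a new minimum. The loader and checkpoint-path names are illustrative, not from the paper.

```python
import torch

def train(model, train_loader, test_loader, loss_fn, lr=1e-4, epochs=200,
          path="best_model.pt"):
    """Skeleton of Algorithm 1: train with Adam, evaluate each epoch,
    and save the parameters whenever the test loss hits a new minimum."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best = float("inf")
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:          # mini-batch training (step 2)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                # back propagation (step 3)
            opt.step()
        model.eval()
        with torch.no_grad():              # inference on the test set (step 4)
            test_loss = sum(loss_fn(model(x), y).item()
                            for x, y in test_loader) / len(test_loader)
        if test_loss < best:               # save the best model (steps 7-8)
            best = test_loss
            torch.save(model.state_dict(), path)
    return best
```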

The change curves of the accuracy and loss rate of the network model designed in this paper are shown in Figure 7. With the increase of the number of iterations, the network accuracy rate keeps rising while the loss rate keeps decreasing, and the network effect reaches the best in about 100 rounds. Therefore, the network parameters are saved as the best network model at 100 rounds.

Figure 7. Accuracy and loss rate change curve.

3.3. Experimental Results

1) Ablation experiment: To verify the effectiveness of the proposed Coke-Net model, this subsection designs ablation experiments to demonstrate that the Multi-Scale module and the CBAM module can effectively improve the segmentation performance. The experiments are conducted with the Unet as the backbone network, and four networks are compared: the Unet, Unet+CBAM, Unet+Multi-Scale+CBAM, and Coke-Net. Experimental results are shown in Table 2 and Figure 8.

Network P R Dice Acc
Unet 0.923 0.938 0.932 0.931
Unet+CBAM 0.946 0.943 0.945 0.943
Unet+Multi-Scale+CBAM 0.959 0.961 0.959 0.963
Coke-Net 0.971 0.967 0.968 0.970
Table 2 Ablation experiment results

Figure 8. Ablation experiment results.

As can be seen from Table 2 and Figure 8, adding the Multi-Scale module and the CBAM module each improves the network's segmentation effect to a certain extent. The dice-coefficient and accuracy of Unet+CBAM are improved by 1.3% and 1.2% compared to the Unet, and the dice-coefficient and accuracy of Unet+Multi-Scale+CBAM are improved by 2.7% and 3.2% compared to the Unet. The dice-coefficient and accuracy of the Coke-Net designed in this paper are the highest among the four networks, reaching 0.968 and 0.970, respectively.

2) Comparison experiment: To verify the practicality of the Coke-Net proposed in this paper, this subsection compares the Coke-Net with three existing models: the Unet++ [32], Attention-Unet [26] and Trans-Unet [33]. The Coke-Net uses the Unet as the baseline and incorporates spatial and channel attention modules. The Unet++ embeds Unets of different depths. Attention modules are added to the Attention-Unet to suppress irrelevant information in the image so as to highlight important local features. Therefore, it is persuasive to conduct comparative experiments with the Unet++, Attention-Unet and Trans-Unet. The experimental results are shown in Table 3 and Figure 9.

Network P R Dice Acc
Unet++ 0.961 0.957 0.954 0.956
Attention-Unet 0.965 0.964 0.960 0.966
Trans-Unet 0.975 0.968 0.965 0.972
Coke-Net 0.979 0.970 0.974 0.977
Table 3 Comparison experiment results

Figure 9. Comparison experiment results.

From Table 3 and Figure 9, it can be seen that the dice-coefficient and accuracy of the Coke-Net are improved, respectively, by 2%, 1.4%, 0.9% and 2.1%, 1.1%, 0.5% compared to the Unet++, Attention-Unet and Trans-Unet. The results indicate that the Coke-Net can effectively extract coke optical texture.

To illustrate the capabilities of the Coke-Net, five typical photomicrographs are selected from the testing dataset, and the four networks are used for coke optical texture extraction experiments, respectively. The experimental results are shown in Figure 10. As shown in Figure 10, the Coke-Net works better and can effectively extract the optical texture in these five photomicrographs. The addition of the attention mechanism makes the extraction accuracy of the Attention-Unet significantly higher than that of the Unet++, but it still misclassifies small areas in the 1st and the 4th photomicrographs. Due to the complex composition of coke optical texture, a small part of inert-maceral-derived components shows blue and purple alternately under refracted light irradiation, which easily affects the segmentation of coke optical texture. As a result, the Unet++, Attention-Unet and Trans-Unet all fail to properly segment the boundaries of the optical texture. The Coke-Net proposed in this paper can overcome the color transformation problem presented by coke at different angles of incident light, and can identify coke optical texture effectively from coke photomicrographs. The extraction results are close to the actual labeling results produced by professionals.

Figure 10. Visual segmentation examples of different models.

4. Conclusions

In this paper, a semantic segmentation method has been proposed to solve the segmentation problem of coke optical texture in coke photomicrographs. Both the Multi-Scale module and the attention mechanism have been taken into consideration in the framework of the proposed semantic network model. Ablation experiments and comparison experiments have been designed, showing that the dice-coefficient and accuracy of the Coke-Net have improved, respectively, by 2%, 1.4%, 0.9%, 2.3%, 0.9% and 2.1%, 1.1%, 0.5%, 2.7%, 0.7% compared to the Unet++, Attention-Unet, Trans-Unet, Unet+CBAM, and Unet+Multi-Scale+CBAM. This demonstrates the effectiveness of the designed Coke-Net model in extracting the coke optical texture.

Author Contributions: Xialin Wang: Conceptualization, methodology, validation, investigation, drafting the original manuscript, as well as reviewing and editing the final version. Xiu Kan: Conceptualization, methodology, validation, investigation, data curation, drafting the original manuscript, and reviewing and editing the final version. Zhen Zhang: Validation, investigation, data curation, reviewing and editing all manuscript versions. Weizhou Sun: Reviewing and editing the manuscript.

Funding: Open Foundation of Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University (MKF202202).

Data Availability Statement: This article introduces the dataset production method and process; no standard dataset is issued with this paper. Readers who require the data and code can contact the author via e-mail (xiu.kan@sues.edu.cn).

Conflicts of Interest: The authors declare no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

References

  1. Ujisawa, Y.; Nakano, K.; Matsukura, Y.; et al. Subjects for achievement of blast furnace operation with low reducing agent rate. ISIJ Int., 2005, 45: 1379−1385. doi: 10.2355/isijinternational.45.1379
  2. Mohanty, A.; Chakladar, S.; Mallick, S.; et al. Structural characterization of coking component of an Indian coking coal. Fuel, 2019, 249: 411−417. doi: 10.1016/j.fuel.2019.03.108
  3. Meng, F.Y.; Gupta, S.; French, D.; et al. Characterization of microstructure and strength of coke particles and their dependence on coal properties. Powder Technol., 2017, 320: 249−256. doi: 10.1016/j.powtec.2017.07.046
  4. Kasai, A.; Kiguchi, J.; Kamijo, T.; et al. Degradation of coke by molten iron oxide in the cohesive zone and dripping zone of a blast furnace. Tetsu-to-Hagane, 1998, 84: 697−701. doi: 10.2355/tetsutohagane1955.84.10_697
  5. Natsui, T.; Sunahara, K.; Ujisawa, Y. Effects of gasification and smelting reduction on coke degradation. Tetsu-to-Hagane, 2006, 92: 841−848. doi: 10.2355/tetsutohagane1955.92.12_841
  6. Hiraki, K.; Hayashizaki, H.; Yamazaki, Y.; et al. The effect of changes in microscopic structures on coke strength in carbonization process. ISIJ Int., 2011, 51: 538−543. doi: 10.2355/isijinternational.51.538
  7. Donskoi, E.; Poliakov, A.; Mahoney, M.R.; et al. Novel optical image analysis coke characterisation and its application to study of the relationships between coke structure, coke strength and parent coal composition. Fuel, 2017, 208: 281−295. doi: 10.1016/j.fuel.2017.07.021
  8. Li, Q.Z.; Zhao, C.S.; Chen, X.P.; et al. Comparison of pulverized coal combustion in air and in O2/CO2 mixtures by thermo-gravimetric analysis. J. Anal. Appl. Pyrol., 2009, 85: 521−528. doi: 10.1016/j.jaap.2008.10.018
  9. Koyuncu, C.F.; Arslan, S.; Durmaz, I.; et al. Smart markers for watershed-based cell segmentation. PLoS One, 2012, 7: e48664. doi: 10.1371/journal.pone.0048664
  10. Zhang, C.C.; Xiao, X.Y.; Li, X.M.; et al. White blood cell segmentation by color-space-based K-Means clustering. Sensors, 2014, 14: 16128−16147. doi: 10.3390/s140916128
  11. Eshraghian, J.K.; Baek, S.; Levi, T.; et al. Nonlinear retinal response modeling for future neuromorphic instrumentation. IEEE Instrum. Meas. Mag., 2020, 23: 21−29. doi: 10.1109/MIM.2020.8979519
  12. Deng, N.D.; Wang, X.; Wang, F.; et al. The application of threshold segmentation algorithm in loess microstructure image analysis. In Proceedings of 2012 National Conference on Information Technology and Computer Science, Lanzhou, China, November 2012; Atlantis Press, 2012; pp. 531–533. doi: 10.2991/citcs.2012.206
  13. Qin, Y.G.; Luo, Z.Q.; Dai, Z.; et al. Three-dimensional structural imaging of rock components and methods for component segmentation and extraction. JOM, 2020, 72: 2198−2206. doi: 10.1007/s11837-020-04133-4
  14. Andrew, M. Correction to: A quantified study of segmentation techniques on synthetic geological XRM and FIB-SEM images. Comput. Geosci., 2018, 22: 1513. doi: 10.1007/s10596-018-9780-2
  15. Wang, P.Z.; Mao, X.Q.; Mao, X.F.; et al. Coke photomicrograph segmentation based on an improved mean shift method. In Proceedings of 2009 Asia-Pacific Conference on Information Processing, Shenzhen, China, 18–19 July 2009; IEEE: New York, 2009; pp. 27–30. doi: 10.1109/APCIP.2009.143
  16. Poliakov, A.; Donskoi, E. Separation of touching particles in optical image analysis of iron ores and its effect on textural and liberation characterization. Eur. J. Mineral., 2019, 31: 485−505. doi: 10.1127/ejm/2019/0031-2844
  17. Liu, H.G.; Zhang, L.H.; Zhou, S.Y.; et al. Research on the method of coke optical tissue segmentation based on adaptive clustering. Int. J. Photoenergy, 2021, 2021: 4378823. doi: 10.1155/2021/4378823
  18. Zhou, F.; Yue, G.X.; Jiang, J.G. A novel intelligent technique for recognition of coke optical texture. J. Softw., 2011, 6: 1476−1483. doi: 10.4304/jsw.6.8.1476-1483
  19. Mao, K.J.; Lu, W.; Wu, K.X.; et al. Bone age assessment method based on fine-grained image classification using multiple regions of interest. Syst. Sci. Control Eng., 2022, 10: 15−23. doi: 10.1080/21642583.2021.2018669
  20. Li, X.; Li, M.L.; Yan, P.F.; et al. Deep learning attention mechanism in medical image analysis: Basics and beyonds. Int. J. Netw. Dyn. Intell., 2023, 2: 93−116. doi: 10.53941/ijndi0201006
  21. Bai, H.Y.; Mao, J.G.; Chan, S.H.G. A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal. Neurocomputing, 2022, 508: 1−18. doi: 10.1016/j.neucom.2022.08.037
  22. Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39: 640−651. doi: 10.1109/TPAMI.2016.2572683
  23. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, 2015; pp. 234–241. doi: 10.1007/978-3-319-24574-4_28
  24. Chu, Z.Q.; Tian, T.; Feng, R.Y.; et al. Sea-Land segmentation with Res-UNet and fully connected CRF. In Proceedings of 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July 2019 - 02 August 2019; IEEE: New York, 2019; pp. 3840–3843. doi: 10.1109/IGARSS.2019.8900625
  25. He, K.M.; Zhang, X.Y.; Ren, S.Q.; et al. Deep residual learning for image recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, 2016; pp. 770–778. doi: 10.1109/CVPR.2016.90
  26. Oktay, O.; Schlemper, J.; Le Folgoc, L.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv: 1804.03999, 2018. doi: 10.48550/arXiv.1804.03999
  27. Xu, Y.J.; Mao, Z.D.; Chen, Z.N.; et al. Context propagation embedding network for weakly supervised semantic segmentation. Multimed. Tools Appl., 2020, 79: 33925. doi: 10.1007/s11042-020-08787-9
  28. China National Coal Industry Association. Method of preparing coal samples for the coal petrographic analysis: GB/T 16773-2008. Standards Press of China, 2008 (in Chinese)
  29. Laraqui, A.; Saaidi, A.; Satori, K. MSIP: Multi-scale image pre-processing method applied in image mosaic. Mult. Tools Appl., 2018, 77: 7517−7537. doi: 10.1007/s11042-017-4659-0
  30. Fan, J.H.; Bocus, M.J.; Hosking, B.; et al. Multi-scale feature fusion: Learning better semantic segmentation for road pothole detection. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS), Montreal, QC, Canada, 11–13 August 2021; IEEE: New York, 2021. doi: 10.1109/ICAS49788.2021.9551165
  31. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; 2015
  32. Zhou, Z.W.; Siddiquee, M.M.R; Tajbakhsh, N.; et al. UNet++: A nested U-Net architecture for medical image segmentation. In Proceedings of the 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain, 20 September 2018; Springer: Berlin/Heidelberg, 2018, pp. 3–11. doi: 10.1007/978-3-030-00889-5_1
  33. Chen, J.N.; Lu, Y.Y.; Yu, Q.H.; et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv: 2102.04306, 2021. doi: 10.48550/arXiv.2102.04306