- 1.
Khan, Y.; Hammarström, D.; Ellefsen, S.; et al. Normalization of gene expression data revisited: The three viewpoints of the transcriptome in human skeletal muscle undergoing load-induced hypertrophy and why they matter. BMC Bioinform. 2022, 23, 241. https://doi.org/10.1186/s12859-022-04791-y
- 2.
Li, J.; Witten, D.M.; Johnstone, I.M.; et al. Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics 2012, 13, 523–538. https://doi.org/10.1093/biostatistics/kxr031.
- 3.
Quackenbush, J. Microarray data normalization and transformation. Nat. Genet. 2002, 32, 496–501. https://doi.org/10.1038/ng1032.
- 4.
Greener, J.G.; Kandathil, S.M.; Moffat, L.; et al. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55. https://doi.org/10.1038/s41580-021-00407-0.
- 5.
Kann, B.H.; Hosny, A.; Aerts, H. Artificial intelligence for clinical oncology. Cancer Cell 2021, 39, 916–927. https://doi.org/10.1016/j.ccell.2021.04.002.
- 6.
Cui, M.; Deng, F.; Disis, M.L.; et al. Advances in the Clinical Application of High-throughput Proteomics. Explor. Res. Hypothesis Med. 2024, 9, 209–220. https://doi.org/10.14218/erhm.2024.00006.
- 7.
Cui, M.; Cheng, C.; Zhang, L. High-throughput proteomics: A methodological mini-review. Lab. Investig. 2022, 102, 1170–1181. https://doi.org/10.1038/s41374-022-00830-7.
- 8.
Liu, D.D.; Zhang, L. Trends in the characteristics of human functional genomic data on the gene expression omnibus, 2001–2017. Lab. Investig. 2019, 99, 118–127. https://doi.org/10.1038/s41374-018-0125-5.
- 9.
Bhandari, N.; Walambe, R.; Kotecha, K.; et al. A comprehensive survey on computational learning methods for analysis of gene expression data. Front. Mol. Biosci. 2022, 9, 907150. https://doi.org/10.3389/fmolb.2022.907150.
- 10.
Conesa, A.; Madrigal, P.; Tarazona, S.; et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016, 17, 13. https://doi.org/10.1186/s13059-016-0881-8.
- 11.
Sharma, A.; Rani, R. A Systematic Review of Applications of Machine Learning in Cancer Prediction and Diagnosis. Arch. Comput. Methods Eng. 2021, 28, 4875–4896. https://doi.org/10.1007/s11831-021-09556-z.
- 12.
Foltz, S.M.; Greene, C.S.; Taroni, J.N. Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously. Commun. Biol. 2023, 6, 222. https://doi.org/10.1038/s42003-023-04588-6.
- 13.
Ghandhi, S.A.; Shuryak, I.; Ponnaiya, B.; et al. Cross-platform validation of a mouse blood gene signature for quantitative reconstruction of radiation dose. Sci. Rep. 2022, 12, 14124. https://doi.org/10.1038/s41598-022-18558-1.
- 14.
Wang, G.; Kitaoka, T.; Crawford, A.; et al. Cross-platform transcriptomic profiling of the response to recombinant human erythropoietin. Sci. Rep. 2021, 11, 21705. https://doi.org/10.1038/s41598-021-00608-9.
- 15.
Angel, P.W.; Rajab, N.; Deng, Y.; et al. A simple, scalable approach to building a cross-platform transcriptome atlas. PLoS Comput. Biol. 2020, 16, e1008219. https://doi.org/10.1371/journal.pcbi.1008219.
- 16.
Franks, J.M.; Cai, G.; Whitfield, M.L. Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data. Bioinformatics 2018, 34, 1868–1874. https://doi.org/10.1093/bioinformatics/bty026.
- 17.
Ritchie, M.D.; Holzinger, E.R.; Li, R.; et al. Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet. 2015, 16, 85–97. https://doi.org/10.1038/nrg3868.
- 18.
Le Cao, K.A.; Rohart, F.; McHugh, L.; et al. YuGene: A simple approach to scale gene expression data derived from different platforms for integrated analyses. Genomics 2014, 103, 239–251. https://doi.org/10.1016/j.ygeno.2014.03.001.
- 19.
Pacini, C.; Dempster, J.M.; Boyle, I.; et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat. Commun. 2021, 12, 1661. https://doi.org/10.1038/s41467-021-21898-7.
- 20.
Nam, A.S.; Chaligne, R.; Landau, D.A. Integrating genetic and non-genetic determinants of cancer evolution by single-cell multi-omics. Nat. Rev. Genet. 2021, 22, 3–18. https://doi.org/10.1038/s41576-020-0265-5.
- 21.
Sharif, M.I.; Li, J.P.; Naz, J.; et al. A comprehensive review on multi-organs tumor detection based on machine learning. Pattern Recognit. Lett. 2020, 131, 30–37.
- 22.
Thalor, A.; Kumar Joon, H.; Singh, G.; et al. Machine learning assisted analysis of breast cancer gene expression profiles reveals novel potential prognostic biomarkers for triple-negative breast cancer. Comput. Struct. Biotechnol. J. 2022, 20, 1618–1631. https://doi.org/10.1016/j.csbj.2022.03.019.
- 23.
Thompson, J.A.; Tan, J.; Greene, C.S. Cross-platform normalization of microarray and RNA-seq data for machine learning applications. PeerJ. 2016, 4, e1621. https://doi.org/10.7717/peerj.1621.
- 24.
Majid, A.; Ali, S.; Iqbal, M.; et al. Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. Comput. Methods Programs Biomed. 2014, 113, 792–808.
- 25.
Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; et al. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17. https://doi.org/10.1016/j.csbj.2014.11.005
- 26.
Maldonado, S.; Weber, R.; Famili, F. Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines. Inf. Sci. 2014, 286, 228–246. https://doi.org/10.1016/j.ins.2014.07.015.
- 27.
Abdulrauf Sharifai, G.; Zainol, Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes 2020, 11, 717. https://doi.org/10.3390/genes11070717.
- 28.
Yijing, L.; Haixiang, G.; Xiao, L.; et al. Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data. Knowl.-Based Syst. 2016, 94, 88–104. https://doi.org/10.1016/j.knosys.2015.11.013.
- 29.
Feng, C.H.; Disis, M.L.; Cheng, C.; et al. Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: Random forest and multinomial logistic regression models. Lab. Investig. 2022, 102, 236–244. https://doi.org/10.1038/s41374-021-00662-x.
- 30.
Hambali, M.A.; Oladele, T.O.; Adewole, K.S. Microarray cancer feature selection: Review, challenges and research directions. Int. J. Cogn. Comput. Eng. 2020, 1, 78–97. https://doi.org/10.1016/j.ijcce.2020.11.001.
- 31.
Zheng, Y.; Li, Y.; Wang, G.; et al. A hybrid feature selection algorithm for microarray data. J. Supercomput. 2018, 76, 3494–3526. https://doi.org/10.1007/s11227-018-2640-y.
- 32.
Bajer, D.; Zorić, B.; Dudjak, M.; et al. Evaluation and analysis of bio-inspired optimization algorithms for feature selection. In Proceedings of the 2019 IEEE 15th International Scientific Conference on Informatics, Poprad, Slovakia, 20–22 November 2019; pp. 000285–000292. https://doi.org/10.1109/Informatics47936.2019.9119256.
- 33.
Deng, F.; Zhao, L.; Yu, N.; et al. Union with recursive feature elimination: A feature selection framework to improve the classification performance of Multicategory Causes of Death in Colorectal Cancer. Lab. Investig. 2024, 104, 100320. https://doi.org/10.1016/j.labinv.2023.100320.
- 34.
Guo, H.; Li, Y.; Jennifer, S.; et al. Learning from class-imbalanced data: Review of methods and applications. Expert. Syst. Appl. 2017, 73, 220–239. https://doi.org/10.1016/j.eswa.2016.12.035.
- 35.
Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Feature selection for high-dimensional data. Progress. Artif. Intell. 2016, 5, 65–75. https://doi.org/10.1007/s13748-015-0080-y.
- 36.
Hira, Z.M.; Gillies, D.F. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Adv. Bioinform. 2015, 198363. https://doi.org/10.1155/2015/198363.
- 37.
da Conceicao Braga, L.; Goncalves, B.O.P.; Coelho, P.L.; et al. Identification of best housekeeping genes for the normalization of RT-qPCR in human cell lines. Acta Histochem. 2022, 124, 151821. https://doi.org/10.1016/j.acthis.2021.151821.
- 38.
Wang, Z.; Lyu, Z.; Pan, L.; et al. Defining housekeeping genes suitable for RNA-seq analysis of the human allograft kidney biopsy tissue. BMC Med. Genom. 2019, 12, 86. https://doi.org/10.1186/s12920-019-0538-z.
- 39.
Ai, C. A Method for Cancer Genomics Feature Selection Based on LASSO-RFE. Iran. J. Sci. Technol. Trans. A Sci. 2022, 46, 731–738. https://doi.org/10.1007/s40995-022-01292-8.
- 40.
Song, Y.; Wang, Y.; Geng, X.; et al. Novel biomarker genes for the prediction of post-hepatectomy survival of patients with NAFLD-related hepatocellular carcinoma. Cancer Cell Int. 2023, 23, 269. https://doi.org/10.1186/s12935-023-03106-2.
- 41.
Song, R.; He, S.; Wu, Y.; et al. Validation of reference genes for the normalization of the RT-qPCR in peripheral blood mononuclear cells of septic patients. Heliyon. 2023, 9, e15269. https://doi.org/10.1016/j.heliyon.2023.e15269.
- 42.
Bairakdar, M.D.; Tewari, A.; Truttmann, M.C. A meta-analysis of RNA-Seq studies to identify novel genes that regulate aging. Exp. Gerontol. 2023, 173, 112107. https://doi.org/10.1016/j.exger.2023.112107.
- 43.
Veryaskina, Y.A.; Titov, S.E.; Ivanov, M.K.; et al. Selection of reference genes for quantitative analysis of microRNA expression in three different types of cancer. PLoS ONE 2022, 17, e0254304. https://doi.org/10.1371/journal.pone.0254304.
- 44.
Echle, A.; Rindtorff, N.T.; Brinker, T.J.; et al. Deep learning in cancer pathology: A new generation of clinical biomarkers. Br. J. Cancer 2021, 124, 686–696. https://doi.org/10.1038/s41416-020-01122-x.
- 45.
Bhuva, D.D.; Cursons, J.; Davis, M.J. Stable gene expression for normalisation and single-sample scoring. Nucleic Acids Res. 2020, 48, e113. https://doi.org/10.1093/nar/gkaa802.
- 46.
Xu, L.; Luo, H.; Wang, R.; et al. Novel reference genes in colorectal cancer identify a distinct subset of high stage tumors and their associated histologically normal colonic tissues. BMC Med. Genet. 2019, 20, 138. https://doi.org/10.1186/s12881-019-0867-y.
- 47.
Yu, A.C.; Mohajer, B.; Eng, J. External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review. Radiol. Artif. Intell. 2022, 4, e210064. https://doi.org/10.1148/ryai.210064.
- 48.
Tong, L.; Wu, P.Y.; Phan, J.H.; et al. Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction. Sci. Rep. 2020, 10, 17925. https://doi.org/10.1038/s41598-020-74567-y.
- 49.
Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524. https://doi.org/10.1016/j.asoc.2019.105524.
- 50.
Jo, J.; Choi, S.; Oh, J.; et al. Conventionally used reference genes are not outstanding for normalization of gene expression in human cancer research. BMC Bioinform. 2019, 20, 245. https://doi.org/10.1186/s12859-019-2809-2.
- 51.
Faraldi, M.; Gomarasca, M.; Sansoni, V.; et al. Normalization strategies differently affect circulating miRNA profile associated with the training status. Sci. Rep. 2019, 9, 1584. https://doi.org/10.1038/s41598-019-38505-x.
- 52.
Evans, C.; Hardin, J.; Stoebel, D.M. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Brief. Bioinform. 2018, 19, 776–792. https://doi.org/10.1093/bib/bbx008.
- 53.
Abbas-Aghababazadeh, F.; Li, Q.; Fridley, B.L. Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing. PLoS ONE 2018, 13, e0206312. https://doi.org/10.1371/journal.pone.0206312.
- 54.
Cheng, L.; Lo, L.Y.; Tang, N.L.; et al. CrossNorm: A novel normalization strategy for microarray data in cancers. Sci. Rep. 2016, 6, 18898. https://doi.org/10.1038/srep18898.
- 55.
Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; et al. The Impact of Normalization Methods on RNA-Seq Data Analysis. Biomed. Res. Int. 2015, 2015, 621690. https://doi.org/10.1155/2015/621690.
- 56.
Schwarzenbach, H.; da Silva, A.M.; Calin, G.; et al. Data Normalization Strategies for MicroRNA Quantification. Clin. Chem. 2015, 61, 1333–1342. https://doi.org/10.1373/clinchem.2015.239459.
- 57.
Li, P.; Piao, Y.; Shon, H.S.; et al. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinform. 2015, 16, 347. https://doi.org/10.1186/s12859-015-0778-7.
- 58.
Risso, D.; Ngai, J.; Speed, T.P.; et al. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 2014, 32, 896–902. https://doi.org/10.1038/nbt.2931.
- 59.
Maza, E.; Frasse, P.; Senin, P.; et al. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun. Integr. Biol. 2013, 6, e25849. https://doi.org/10.4161/cib.25849.
- 60.
Dillies, M.A.; Rau, A.; Aubert, J.; et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinform. 2013, 14, 671–683. https://doi.org/10.1093/bib/bbs046.
- 61.
Hansen, K.D.; Irizarry, R.A.; Wu, Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 2012, 13, 204–216. https://doi.org/10.1093/biostatistics/kxr054.
- 62.
Kim, J.-H. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Comput. Stat. Data Anal. 2009, 53, 3735–3745. https://doi.org/10.1016/j.csda.2009.04.009.
- 63.
Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv 2018, arXiv:1811.12808. https://arxiv.org/abs/1811.12808
- 64.
Conover, W.J.; Tercero-Gómez, V.G.; Cordero-Franco, A.E. The sequential normal scores transformation. Seq. Anal. 2017, 36, 397–414. https://www.tandfonline.com/doi/abs/10.1080/07474946.2017.1360091
- 65.
Brodsky, E.; Darkhovsky, B.S. Non-Parametric Statistical Diagnosis: Problems and Methods; Springer: Dordrecht, The Netherlands, 2013. http://dx.doi.org/10.1007/978-94-015-9530-8
- 66.
Vandesompele, J.; De Preter, K.; Pattyn, F.; et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002, 3, 1–12. https://doi.org/10.1186/gb-2002-3-7-research0034
- 67.
Steinwart, I.; Christmann, A. Support Vector Machines; Springer: New York, NY, USA, 2008. https://doi.org/10.1007/978-0-387-77242-4
- 68.
Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; Wiley: Hoboken, NJ, USA, 2013.
- 69.
Kulkarni, V.Y.; Sinha, P.K. Pruning of random forest classifiers: A survey and future directions. In Proceedings of the 2012 International Conference on Data Science & Engineering (ICDSE), Cochin, India, 18–20 July 2012. https://doi.org/10.1109/ICDSE.2012.6282329.
- 70.
Ma, B.; Meng, F.; Yan, G.; et al. Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Comput. Biol. Med. 2020, 121, 103761. https://doi.org/10.1016/j.compbiomed.2020.103761.
- 71.
Sheridan, R.P.; Wang, W.M.; Liaw, A.; et al. Extreme gradient boosting as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 2016, 56, 2353–2360. https://doi.org/10.1021/acs.jcim.6b00591.
- 72.
Karthik, S.; Sudha, M. A survey on machine learning approaches in gene expression classification in modelling computational diagnostic system for complex diseases. Int. J. Eng. Adv. Technol. 2018, 8, 182–191. https://doi.org/10.35940/ijeat.B5609.12821.
- 73.
Dunne, R.A. A statistical Approach to Neural Networks for Pattern Recognition; John Wiley & Sons: Hoboken, NJ, USA, 2007.
- 74.
Zhou, J.; Gandomi, A.H.; Chen, F.; et al. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 2021, 10, 593. https://doi.org/10.3390/electronics10050593.
- 75.
Handelman, G.S.; Kok, H.K.; Chandra, R.V.; et al. Peering into the black box of artificial intelligence: Evaluation metrics of machine learning methods. Am. J. Roentgenol. 2019, 212, 38–43. https://doi.org/10.2214/AJR.18.20224.
- 76.
Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine learning interpretability: A survey on methods and metrics. Electronics 2019, 8, 832. https://doi.org/10.3390/electronics8080832.
- 77.
Vujović, Ž. Classification model evaluation metrics. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 599–606. https://doi.org/10.14569/IJACSA.2021.0120670.
- 78.
Yu, N.; Deng, F.; Lin, Y.; et al. LIME-explained small-scale tabular transformer used for improving the classification performance of multi-category causes of death in colorectal cancer. In Proceedings of the 2023 IEEE 8th International Conference on Intelligent Informatics and Biomedical Sciences, Okinawa, Japan, 23–25 November 2023; pp. 2665–275. https://doi.org/10.1109/ICIIBMS60103.2023.10347787.
- 79.
Deng, F.; Li, S.-Q.; Zhang, X.-R.; et al. An intelligence method for recognizing multiple defects in rail. Sensors 2021, 21, 8108. https://doi.org/10.3390/s21238108.
- 80.
Deng, F.; Huang, J.; Yuan, X.; et al. Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data. Lab. Investig. 2021, 101, 430–441. https://doi.org/10.1038/s41374-020-00525-x.
- 81.
Molania, R.; Foroutan, M.; Gagnon-Bartsch, J.A.; et al. Removing unwanted variation from large-scale RNA sequencing data with PRPS. Nat. Biotechnol. 2023, 41, 82–95. https://doi.org/10.1038/s41587-022-01440-w.
- 82.
Cui, X.; Churchill, G.A. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 2003, 4, 210. https:// doi.org /10.1186/gb-2003-4-4-210
- 83.
Jiang, K.; Koob, J.; Chen, X.D.; et al. Programmable eukaryotic protein synthesis with RNA sensors by harnessing ADAR. Nat. Biotechnol. 2023, 41, 698–707. https://doi.org/10.1038/s41587-022-01534-5.
- 84.
Graf, J.; Cho, S.; McDonough, E.; et al. FLINO: A new method for immunofluorescence bioimage normalization. Bioinformatics 2022, 38, 520–526. https://doi.org/10.1093/bioinformatics/btab686.
- 85.
Lin, Y.; Golovnina, K.; Chen, Z.X.; et al. Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genom. 2016, 17, 28. https://doi.org/10.1186/s12864-015-2353-z.
- 86.
Venkatesh, B.; Anuradha, J. A review of feature selection and its methods. Cybern. Inf. Technol. 2019, 19, 3–26. https://doi.org/ 10.2478/cait-2019-0001
- 87.
Wu, J.; Kong, L.; Yi, M.; et al. Prediction and screening model for products based on fusion regression and xgboost classification. Comput. Intell. Neurosci. 2022, 2022, 4987639. https://doi.org/10.1155/2022/4987639.
- 88.
Hollmann, N.; Müller, S.; Eggensperger, K.; Hutter, F. Tabpfn: A transformer that solves small tabular classification problems in a second. arXiv 2022, arXiv:2207.01848. https://doi.org/10.48550/arXiv.2207.01848.
- 89.
Tian, Y.; Sun, C.; Poole, B.; et al. What makes for good views for contrastive learning? In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020.
- 90.
Khosla, P.; Teterwak, P.; Wang, C.; et al. Supervised contrastive learning. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020.
- 91.
Pan, Y.; Yao, T.; Li, Y.; et al. Transferrable prototypical networks for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2239–2247. http://doi.org/10.1109/CVPR.2019.00234
- 92.
Huang, X.; Khetan, A.; Cvitkovic, M.; et al. Tabtransformer: Tabular data modeling using contextual embeddings. arXiv 2020, arXiv:2012.06678. https://doi.org/10.48550/arXiv.2012.06678.
- 93.
Somepalli, G.; Goldblum, M.; Schwarzschild, A.; et al. Saint: Improved neural networks for tabular data via row attention and contrastive pre-training. arXiv 2021, arXiv:2106.01342. https://doi.org/10.48550/arXiv.2106.01342.