  • Open Access
  • Article
Normalization and Selecting Non-Differentially Expressed Genes Improve Machine Learning Modelling of Cross-Platform Transcriptomic Data
  • Fei Deng 1,   
  • Catherine H. Feng 1, 2,   
  • Nan Gao 3, 4,   
  • Lanjing Zhang 1, 4, 5, 6, *

Received: 29 Jan 2025 | Revised: 07 Apr 2025 | Accepted: 22 May 2025 | Published: 26 May 2025

Abstract

Normalization is a critical step in quantitative analyses of biological processes. Recent work shows that cross-platform integration and normalization enable machine learning (ML) training on RNA microarray and RNA-seq data, but those studies did not use independent datasets. It is therefore unclear how to improve ML modelling performance on independent microarray-based and RNA-seq-based datasets. Inspired by the housekeeping genes commonly used in experimental biology, this study tests the hypothesis that non-differentially expressed genes (NDEG) may improve normalization of transcriptomic data and, in turn, the cross-platform performance of ML models. Microarray and RNA-seq datasets of TCGA breast cancer were used as independent training and test datasets, respectively, to classify the molecular subtypes of breast cancer. NDEG (p > 0.85) and differentially expressed genes (DEG) (p < 0.05) were selected based on ANOVA p values and used for subsequent data normalization and classification, respectively. Models trained on data from one platform were tested on the other platform. Our data show that NDEG and DEG selection effectively improved classification performance. Normalization methods based on parametric statistical analysis were inferior to those based on nonparametric statistics. In this study, the LOG_QN and LOG_QNZ normalization methods combined with the neural network classification model appeared to achieve better performance. NDEG-based normalization therefore appears useful for cross-platform testing on completely independent datasets. However, more studies are required to examine whether NDEG-based normalization can improve ML classification performance in other datasets and other omic data types.
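
The gene selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_genes_by_anova`, the samples-by-genes array layout, and the use of SciPy's one-way ANOVA are assumptions made for illustration. Only the two p-value thresholds (p > 0.85 for NDEG, p < 0.05 for DEG) come from the abstract.

```python
import numpy as np
from scipy.stats import f_oneway


def select_genes_by_anova(expr, labels, ndeg_p=0.85, deg_p=0.05):
    """Split genes into NDEG and DEG by one-way ANOVA across subtypes.

    expr   : array of shape (n_samples, n_genes), assumed layout
    labels : subtype label for each sample, length n_samples
    Returns (ndeg_idx, deg_idx, pvals); the name and signature are hypothetical.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)

    # One block of samples per subtype, all genes kept as columns.
    groups = [expr[labels == g] for g in np.unique(labels)]

    # axis=0 runs an independent ANOVA for every gene column.
    _, pvals = f_oneway(*groups, axis=0)

    # Thresholds taken from the abstract: p > 0.85 (NDEG), p < 0.05 (DEG).
    ndeg_idx = np.flatnonzero(pvals > ndeg_p)
    deg_idx = np.flatnonzero(pvals < deg_p)
    return ndeg_idx, deg_idx, pvals
```

Following the abstract, the NDEG indices would then feed the normalization step and the DEG indices would serve as classification features. Genes whose p values fall between the two thresholds are used for neither purpose.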

How to Cite
Deng, F.; Feng, C. H.; Gao, N.; Zhang, L. Normalization and Selecting Non-Differentially Expressed Genes Improve Machine Learning Modelling of Cross-Platform Transcriptomic Data. Transactions on Artificial Intelligence 2025, 1 (1), 83–104. https://doi.org/10.53941/tai.2025.100005.
Copyright & License
Copyright (c) 2025 by the authors.