  • Open Access
  • Article

Statistically Dense Intervals in Binary Sequences with Applications to Assessing Local Enrichment in the Human Genome

  • Ben Galili 1,2,*
  • Ofri Kutchinsky 1
  • Shahar Mor 2
  • Zohar Yakhini 1,2

Received: 08 Sep 2025 | Revised: 23 Mar 2026 | Accepted: 31 Mar 2026 | Published: 21 Apr 2026

Abstract

Statistical enrichment tools are highly useful in biological research. Existing approaches to enrichment in ranked or ordered lists either rely on fixed thresholds or, as in GSEA and GOrilla, consider only a prefix (or suffix) of the list. These methods therefore assess whether 1s are extremely dense at one end of the binary vector. Statistical significance can be assessed using, e.g., variants of the Wilcoxon rank-sum test or the minimum hypergeometric (mHG) statistic. In this work, we extend the mHG approach to address enrichment within any index interval of the binary vector. We define the related distributions under a uniform null model and partially characterize them; this partial characterization yields useful bounds for extreme events. We provide the community with imHG, a Python implementation of the method for finding and reporting enriched genomic intervals for any given list of genes of interest. Finally, we analyze several example use cases and describe the results. We show, for example, that genes differentially expressed between lung adenocarcinoma (ADC) and other lung cancer types are enriched in a region of chromosome 3. This example illustrates a typical use case for imHG: assessing enriched intervals for any set of genes of interest.
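To make the statistic concrete, here is a minimal sketch in Python. It is our own illustration, not the imHG package or its API; the function names `hg_tail`, `min_hg_prefix`, `min_hg_interval`, and `permutation_pvalue` are hypothetical. For a binary vector of length N containing B 1s, a window of length n holding b 1s is scored by the hypergeometric tail P(X >= b). The sketch takes the minimum of this tail over all prefixes (an mHG-style scan) or over all index intervals (the interval extension). Because the minimum is taken over many windows, it is not itself a p-value. The sketch therefore estimates significance by Monte Carlo permutation, assuming that the uniform null means every arrangement of the B 1s is equally likely. This permutation test is a swapped-in substitute for the exact distribution and bounds derived in the article. The naive scan costs O(N^3), so it is only suitable for small vectors.

```python
import math
import random
from typing import List, Sequence, Tuple


def hg_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(N, B, n): n positions drawn from N, B of them 1s."""
    # math.comb(a, k) returns 0 when k > a, so impossible terms vanish.
    hits = sum(math.comb(n, i) * math.comb(N - n, B - i)
               for i in range(b, min(n, B) + 1))
    return hits / math.comb(N, B)


def min_hg_prefix(v: Sequence[int]) -> Tuple[float, int]:
    """mHG-style scan: smallest tail over the prefixes v[:n], n = 1..N."""
    N, B = len(v), sum(v)
    best, best_n, b = 1.0, 0, 0
    for n in range(1, N + 1):
        b += v[n - 1]  # number of 1s in v[:n]
        p = hg_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


def min_hg_interval(v: Sequence[int]) -> Tuple[float, int, int]:
    """Interval scan: smallest tail over all half-open intervals v[i:j]."""
    N, B = len(v), sum(v)
    prefix = [0]  # prefix[k] = number of 1s in v[:k]
    for x in v:
        prefix.append(prefix[-1] + x)
    best, best_i, best_j = 1.0, 0, N
    for i in range(N):
        for j in range(i + 1, N + 1):
            p = hg_tail(N, B, j - i, prefix[j] - prefix[i])
            if p < best:
                best, best_i, best_j = p, i, j
    return best, best_i, best_j


def permutation_pvalue(v: Sequence[int], n_perm: int = 1000, seed: int = 0) -> float:
    """Monte Carlo p-value of the interval statistic under random shuffles of v."""
    rng = random.Random(seed)
    observed = min_hg_interval(v)[0]
    w: List[int] = list(v)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(w)  # uniform over arrangements with the same number of 1s
        if min_hg_interval(w)[0] <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    v = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print("best prefix:", min_hg_prefix(v))
    print("best interval:", min_hg_interval(v))
    print("permutation p-value:", permutation_pvalue(v, n_perm=500))
```

Consider the toy vector `[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]` (N = 12, B = 4). The interval scan selects `v[3:7]` with tail 1/495. The best prefix, `v[:7]`, only reaches 35/495. This is the kind of internal cluster of 1s that prefix-based scans underweight.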


How to Cite
Galili, B.; Kutchinsky, O.; Mor, S.; Yakhini, Z. Statistically Dense Intervals in Binary Sequences with Applications to Assessing Local Enrichment in the Human Genome. Bioinformatics Methods and Applications 2026, 1 (1), 3.
Copyright & License
Copyright (c) 2026 by the authors.