  • Open Access
  • Article
High-Dimensional Independence Testing via Maximum and Average Distance Correlations
  • Cencheng Shen 1,*
  • Yuexiao Dong 2

Received: 22 Jan 2025 | Revised: 01 Jun 2025 | Accepted: 05 Jun 2025 | Published: 12 Jun 2025

Abstract

This paper investigates the use of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, compare the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and can use either the Euclidean distance or the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, the average distance correlation, and the original distance correlation across various multivariate dependence scenarios, and we conduct a real data experiment to test the presence of various cancer types and peptide levels in human plasma.

How to Cite
Shen, C.; Dong, Y. High-Dimensional Independence Testing via Maximum and Average Distance Correlations. Applied Statistical Analysis and Computing 2025, 1 (1), 100003.
Copyright & License
Copyright (c) 2025 by the authors.