Computes the Chatterjee correlation coefficient, a robust measure of dependence. Unlike classical correlation coefficients (Pearson, Spearman), Chatterjee's coefficient measures the degree to which y is a function of x (functional dependence), capturing non-linear relationships.

Characteristics:

  • Functional Dependence: The population value is 1 if and only if y is a measurable function of x.

  • Independence: The population value is 0 if and only if x and y are independent.

  • Range: The population coefficient lies in \([0, 1]\). The sample estimate can fall slightly below 0, particularly for small samples.

  • Asymmetry: The measure is asymmetric; in general \(\xi(X, Y) \neq \xi(Y, X)\). It specifically measures whether \(Y = f(X)\).

Usage

chatterjee_correlation(x, y)

Arguments

x

A numeric vector (the predictor/independent variable).

y

A numeric vector (the response/dependent variable).

Value

A numeric vector of length 1 containing the Chatterjee correlation estimate.

Details

The coefficient is calculated using the ranks of y when sorted by x. This implementation computes the sample version of the coefficient as described by Chatterjee (2021).

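Formula: Given pairs \((X_i, Y_i)\), sort them such that \(X_{(1)} \le \dots \le X_{(n)}\), and let \(Y_{(i)}\) be the y value paired with \(X_{(i)}\). Let \(r_i\) be the rank of \(Y_{(i)}\) among all y values. Assuming there are no ties among the \(Y_i\), the coefficient is:

$$ \xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1} $$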
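
The formula above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the name xi_sketch is hypothetical, and the sketch assumes there are no ties in x or y.

```r
# Hypothetical helper (not the package's code): sample xi_n, assuming no ties.
xi_sketch <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  n <- length(x)
  stopifnot(n >= 2)
  # Reorder y by increasing x, then rank the reordered y values.
  r <- rank(y[order(x)])
  # 1 - 3 * sum(|r_{i+1} - r_i|) / (n^2 - 1)
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

# Monotone relationship: the ranks are 1..n, so the sum equals n - 1 and
# xi_n = 1 - 3 / (n + 1) = (n - 2) / (n + 1), i.e. 8 / 11 for n = 10.
xi_sketch(1:10, (1:10)^3)

# Non-monotone relationship on a tiny sample, showing the asymmetry.
x <- c(-3, -2, -1, 0.5, 1.5, 2.5)
y <- x^2
xi_sketch(x, y)  # 1 - 27 / 35 = 8 / 35
xi_sketch(y, x)  # 1 - 45 / 35 = -2 / 7
```

The monotone case shows why the sample value stays below 1 for finite n even under perfect dependence, and the second case shows that swapping the arguments can change the estimate and push it below 0.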

References

Chatterjee, S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116(536), 2009-2022.

Examples

# Functional dependence (Y = X^2)
x <- runif(50, -1, 1)
y <- x^2
chatterjee_correlation(x, y) # Should be high (near 1)
#> [1] 0.03601441

# Independence
x <- runif(50)
y <- runif(50)
chatterjee_correlation(x, y) # Should be low (near 0)
#> [1] -0.05402161

# Asymmetry check: swapping the arguments generally changes the estimate
chatterjee_correlation(x, y)
#> [1] -0.05402161
chatterjee_correlation(y, x)
#> [1] -0.03721489