Causal discovery in heavy-tailed models • causalXtreme

The goal of causalXtreme is to provide an interface to perform causal discovery in linear structural equation models (SEMs) with heavy-tailed noise. For more details, see Gnecco et al. (2019, https://arxiv.org/abs/1908.05097).
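As an illustration of this model class, here is a minimal base-R sketch that simulates a linear SEM with Student-t noise: each variable is a weighted sum of its parents plus its own heavy-tailed noise term. The helper name simulate_linear_sem, the weight matrix B, and the requirement that B be strictly upper triangular (so that 1, ..., p is already a causal order) are our own assumptions; this is not how causalXtreme's simulate_data is implemented.

```r
# Minimal sketch of a linear SEM with heavy-tailed (Student-t) noise.
# Assumptions (hypothetical, not causalXtreme's simulate_data()):
#   - B[i, j] != 0 encodes an edge X_i -> X_j with weight B[i, j];
#   - B is strictly upper triangular, so 1, ..., p is a causal order.
simulate_linear_sem <- function(n, B, df = 1.5) {
  p <- ncol(B)
  stopifnot(nrow(B) == p, all(B[lower.tri(B, diag = TRUE)] == 0))
  X <- matrix(0, nrow = n, ncol = p)
  for (j in seq_len(p)) {
    # parents of X_j come earlier in the order, so their columns are filled
    X[, j] <- X %*% B[, j] + rt(n, df = df)
  }
  X
}

set.seed(1)
B <- matrix(c(0, 0.8,
              0, 0), nrow = 2, byrow = TRUE)  # hypothetical: X1 -> X2, weight 0.8
X <- simulate_linear_sem(n = 500, B = B)
dim(X)
#> [1] 500   2
```

The upper-triangular restriction keeps the sketch short: it lets the loop fill columns left to right without computing a topological order first.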

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("nicolagnecco/causalXtreme")

Example

Let us generate 500 observations from an SEM with two variables, X₁ and X₂, whose noise follows a Student-t distribution with 1.5 degrees of freedom (i.e., heavy-tailed). The function simulate_data returns a list containing:

dataset, the simulated data as a matrix of size n × p, where n = 500 is the number of observations and p = 2 is the number of variables;
dag, the underlying directed acyclic graph (DAG) as an adjacency matrix of size p × p.

library(causalXtreme)

# simulate a random two-variable DAG and 500 heavy-tailed observations from it
set.seed(1)
sem <- simulate_data(n = 500, p = 2, prob_connect = 0.5,
                     distr = "student_t", tail_index = 1.5)

We can now inspect the randomly simulated DAG.

sem$dag
#>      [,1] [,2]
#> [1,]    0    1
#> [2,]    0    0

We read the adjacency matrix as follows: loosely speaking, variable Xᵢ causes variable Xⱼ (there is an edge Xᵢ → Xⱼ) if entry (i, j) of the adjacency matrix is equal to 1. Here, the first variable X₁ causes the second variable X₂, since entry (1, 2) of sem$dag is equal to 1. We can plot the simulated dataset.

plot(sem$dataset, pch = 20,
     xlab = "X1", ylab = "X2")
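The reading rule for the adjacency matrix described above (entry (i, j) equal to 1 means an edge Xᵢ → Xⱼ) can be sketched in base R. The helper dag_edges is hypothetical and not part of causalXtreme; it simply lists every edge encoded by a 0/1 matrix such as sem$dag.

```r
# Hypothetical helper: list the edges encoded by a 0/1 adjacency matrix,
# using the convention that dag[i, j] == 1 means an edge X_i -> X_j.
dag_edges <- function(dag) {
  idx <- which(dag == 1, arr.ind = TRUE)  # (row, col) pairs of nonzero entries
  if (nrow(idx) == 0) return(character(0))
  paste0("X", idx[, "row"], " -> X", idx[, "col"])
}

# Same matrix as the sem$dag printed above
dag <- matrix(c(0, 1,
                0, 0), nrow = 2, byrow = TRUE)
dag_edges(dag)
#> [1] "X1 -> X2"
```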

Next, we estimate the causal direction between X₁ and X₂ by computing the causal tail coefficients Γ₁₂ and Γ₂₁ (see Gnecco et al. 2019, Definition 1).
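For reference, here is our reading of Definition 1 of Gnecco et al. (2019), where F₁ and F₂ are the marginal distribution functions of X₁ and X₂, together with a natural rank-based estimator that averages the empirical distribution function of X₂ over the k largest observations of X₁. The estimator is a simplified, upper-tail-only version; causal_tail_coeff may differ in details such as the default choice of k or how both tails are used.

```latex
\Gamma_{12} = \lim_{u \to 1^-} \mathbb{E}\left[ F_2(X_2) \mid F_1(X_1) > u \right],
\qquad
\hat{\Gamma}_{12} = \frac{1}{k} \sum_{l=1}^{n} \hat{F}_2(X_{l2}) \, \mathbf{1}\{ X_{l1} > X_{(n-k),1} \},
```

Here \hat{F}_2 is the empirical distribution function of X₂ and X_{(n-k),1} is the (n−k)-th order statistic of X₁, so the indicator selects exactly the k largest observations of X₁. Intuitively, if X₁ causes X₂, extreme values of X₁ propagate to X₂, pushing F₂(X₂) towards 1.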

X1 <- sem$dataset[, 1]
X2 <- sem$dataset[, 2]

# gamma_12
causal_tail_coeff(X1, X2)
#> [1] 0.9523333

# gamma_21
causal_tail_coeff(X2, X1)
#> [1] 0.4816667
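To make the computation concrete, here is a simplified base-R sketch: rank-transform the second variable, then average those ranks over the k largest observations of the first variable. The helper names empirical_cdf and gamma_hat, the default k = floor(sqrt(n)), and the simulated data are our own assumptions; this is not the causalXtreme implementation, and its values will generally differ from those returned by causal_tail_coeff.

```r
# Simplified, upper-tail-only estimator of the causal tail coefficient.
# Hypothetical helpers, not causalXtreme's causal_tail_coeff().
empirical_cdf <- function(x) {
  # rank-based estimate of F(x) at each observation, values in (0, 1]
  rank(x, ties.method = "first") / length(x)
}

gamma_hat <- function(x, y, k = floor(sqrt(length(x)))) {
  stopifnot(length(x) == length(y), k >= 1, k < length(x))
  fy <- empirical_cdf(y)
  # indices of the k largest observations of x
  top_k <- order(x, decreasing = TRUE)[seq_len(k)]
  # average of the estimated F_y over those extreme observations
  mean(fy[top_k])
}

# Usage on simulated data where X1 -> X2 (hypothetical setup)
set.seed(1)
n <- 5000
x1 <- rt(n, df = 1.5)
x2 <- x1 + rt(n, df = 1.5)
g12 <- gamma_hat(x1, x2)  # expected to be close to 1
g21 <- gamma_hat(x2, x1)  # expected to be noticeably smaller
```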

From the output of causal_tail_coeff, Γ₁₂ ≈ 0.95 is close to 1, whereas Γ₂₁ ≈ 0.48 is clearly below 1. This is evidence that X₁ causes X₂.
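The comparison of the two coefficients can be written as a small decision rule. This is a simplified heuristic of ours: the helper infer_direction and the margin tol are hypothetical and are not taken from the paper or the package.

```r
# Hypothetical decision rule: favour the direction whose coefficient is
# larger (closer to 1); tol is an arbitrary margin, not from the paper.
infer_direction <- function(g12, g21, tol = 0.05) {
  if (g12 - g21 > tol) {
    "X1 -> X2"
  } else if (g21 - g12 > tol) {
    "X2 -> X1"
  } else {
    "undecided"
  }
}

infer_direction(0.9523333, 0.4816667)
#> [1] "X1 -> X2"
```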

We can also run the extremal ancestral search (EASE) algorithm, which is based on the causal tail coefficients (see Gnecco et al. 2019, Sec. 3.1) and estimates a causal order of the DAG from the data.
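A minimal sketch of how a causal order can be read off a matrix G of causal tail coefficients, with G[i, j] holding Γᵢⱼ: repeatedly place the remaining variable i whose largest coefficient Γⱼᵢ from another remaining variable j is smallest. The rationale, in our reading of the paper's theory, is that Γⱼᵢ = 1 whenever Xⱼ is an ancestor of Xᵢ, so a variable with no remaining ancestors should score below 1. The helper ease_sketch is hypothetical; the package's ease() works directly on the data and may differ in details such as how ties are broken.

```r
# Hypothetical sketch of an EASE-style ordering from a coefficient matrix G,
# where G[i, j] = Gamma_ij; the diagonal is never used.
ease_sketch <- function(G) {
  remaining <- seq_len(nrow(G))
  ord <- integer(0)
  while (length(remaining) > 1) {
    # for each remaining i: largest Gamma_ji over the other remaining j
    scores <- vapply(remaining,
                     function(i) max(G[setdiff(remaining, i), i]),
                     numeric(1))
    pick <- remaining[which.min(scores)]  # most "source-like" variable
    ord <- c(ord, pick)
    remaining <- setdiff(remaining, pick)
  }
  c(ord, remaining)  # append the last remaining variable
}

# Coefficients from the causal_tail_coeff() output above
G <- matrix(c(NA,        0.9523333,
              0.4816667, NA),
            nrow = 2, byrow = TRUE)
ease_sketch(G)
#> [1] 1 2
```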

ease(dat = sem$dataset)
#> [1] 1 2

In this case, the estimated causal order is correct, since X₁ (the cause) is placed before X₂ (the effect).

References

Gnecco, Nicola, Nicolai Meinshausen, Jonas Peters, and Sebastian Engelke. 2019. “Causal Discovery in Heavy-Tailed Models.” arXiv Preprint arXiv:1908.05097.
