Generate data from a random DAG.

simulate_data(
  n,
  p,
  prob_connect,
  distr = c("student_t", "gaussian", "log_normal"),
  tail_index = 1.5,
  has_confounder = FALSE,
  is_nonlinear = FALSE,
  has_uniform_margins = FALSE
)

Arguments

n

Positive integer. The number of observations; must be larger than 1.

p

Positive integer. The number of variables; must be larger than 1.

prob_connect

Numeric, between 0 and 1. The probability that an edge \(i \rightarrow j\) is added to the DAG.

distr

Character. The distribution of the noise. One of:

  • student_t; in this case the user must specify tail_index, i.e., the degrees of freedom,

  • gaussian,

  • log_normal.

tail_index

Positive numeric. The tail index, i.e., degrees of freedom, of the noise.

has_confounder

Boolean. Are there confounders in the system?

is_nonlinear

Boolean. Are the data generated nonlinearly?

has_uniform_margins

Boolean. Are the variables rescaled to have uniform margins between 0 and 1?
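
To make the roles of p, prob_connect, distr and tail_index concrete, here is a minimal base R sketch of one way to draw a random DAG and simulate linear data from it with Student-t noise. It is an illustration under stated assumptions, not the implementation behind simulate_data: the fixed causal order 1, ..., p, the edge-weight range and the variable names weights and noise are choices made for this sketch only.

```r
set.seed(42)

n <- 200             # number of observations
p <- 4               # number of variables
prob_connect <- 0.5  # probability of adding each edge i -> j
tail_index <- 1.5    # degrees of freedom of the Student-t noise

# Sample a random DAG: allowing edges only above the diagonal (i < j)
# guarantees that the graph has no directed cycles.
dag <- matrix(0L, nrow = p, ncol = p)
upper <- upper.tri(dag)
dag[upper] <- rbinom(sum(upper), size = 1, prob = prob_connect)

# Edge weights drawn from an arbitrary range (an assumption of this sketch).
weights <- dag * matrix(runif(p * p, min = 0.5, max = 1.5), nrow = p)

# Generate the variables in the causal order 1, ..., p:
# X_j = sum_i weights[i, j] * X_i + noise_j.
noise <- matrix(rt(n * p, df = tail_index), nrow = n, ncol = p)
dataset <- matrix(0, nrow = n, ncol = p)
for (j in seq_len(p)) {
  dataset[, j] <- dataset %*% weights[, j] + noise[, j]
}

dim(dataset)
```

Because column j of weights is nonzero only in rows i < j, each variable depends only on variables that have already been generated.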

Value

List with the following elements:

  • dataset --- Numeric matrix. The simulated data, with n rows and p columns. Hidden variables are not included in this matrix.

  • dag --- Square binary matrix. The generated DAG, including both the observed variables and the confounders, if has_confounder = TRUE.

  • pos_confounders --- Integer vector. The positions (row and column indices) of the confounders in dag. If has_confounder = FALSE, then pos_confounders = integer(0).
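
Since dag can contain confounders while dataset does not, it can be useful to restrict dag to the observed variables. The sketch below does this with a hypothetical helper, observed_subgraph, which is not part of the documented interface. The toy list only mimics the documented structure of the return value and is not real output of simulate_data.

```r
# Hypothetical helper: drop the confounder rows and columns from dag.
observed_subgraph <- function(sim) {
  keep <- setdiff(seq_len(nrow(sim$dag)), sim$pos_confounders)
  sim$dag[keep, keep, drop = FALSE]
}

# Toy stand-in for the returned list: two observed variables plus one
# confounder stored in position 2 of dag (values made up for illustration).
toy <- list(
  dataset = matrix(rnorm(6), nrow = 3, ncol = 2),
  dag = matrix(c(0, 0, 0,
                 1, 0, 1,
                 0, 0, 0), nrow = 3, byrow = TRUE),
  pos_confounders = 2L
)

dag_observed <- observed_subgraph(toy)

# The restricted graph has one row and one column per column of dataset.
dim(dag_observed)
```

When pos_confounders is integer(0), the helper returns dag unchanged.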