A study in causal discovery from population-based infant birth and death records. Academic Article

abstract

  • In medicine, identifying the causal factors of diseases and outcomes helps us formulate better management, prevention and control strategies for the improvement of health care. With the goal of exploring, evaluating and refining techniques to learn causal relationships from observational data, such as data routinely collected in healthcare settings, we focused on investigating factors that may contribute causally to infant mortality in the United States. We used the U.S. Linked Birth/Infant Death dataset for 1991, which has more than four million records and about 200 variables per record. Our sample consisted of 41,155 records randomly selected from the whole dataset. Each record contained maternal, paternal and child factors and the outcome at the end of the first year, namely whether the infant survived or not. For causal discovery we used a modified Local Causal Discovery (LCD2) algorithm, which uses the framework of causal Bayesian networks to represent causal relationships among model variables. LCD2 takes a dataset as input and outputs causal relationships of the form "variable X causes variable Y". Using the infant birth and death dataset as input, LCD2 output nine purported causal relationships. Eight of the nine relationships seem plausible. Even though we have not yet discovered a clinically novel causal link, we plan to look for novel causal pathways using the full dataset after refining the algorithm and developing a more efficient implementation.
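To make the input/output shape described in the abstract concrete, here is a minimal Python sketch of an LCD-style search. It is not the authors' LCD2 implementation, whose modifications the abstract does not describe. It assumes the commonly stated form of the Local Causal Discovery rule: given a variable W presumed to have no causes among the measured variables, report "X causes Y" when W and X are dependent, X and Y are dependent, and W and Y are independent given X. The function names, the restriction to binary variables, the chi-square test at a 0.05 level, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

# Chi-square critical values at alpha = 0.05 for 1 and 2 degrees of freedom.
CRIT = {1: 3.841, 2: 5.991}


def chi2_stat(pairs):
    """Pearson chi-square statistic for a list of (a, b) value pairs."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def dependent(data, a, b):
    """Treat two binary variables as dependent if independence is rejected."""
    return chi2_stat([(r[a], r[b]) for r in data]) > CRIT[1]


def independent_given(data, a, b, c):
    """Treat a and b as independent given binary c if the chi-square summed
    over both strata of c does not reject independence. Failing to reject is
    only weak evidence of independence; this is a simplification."""
    stat = sum(
        chi2_stat([(r[a], r[b]) for r in data if r[c] == v]) for v in (0, 1)
    )
    return stat <= CRIT[2]


def lcd_sketch(data, roots, variables):
    """Return (x, y) pairs meaning 'x causes y' under the assumed LCD rule.

    roots: variables assumed (by background knowledge) to have no measured causes.
    """
    found = []
    for w in roots:
        for x, y in permutations(variables, 2):
            if w in (x, y):
                continue
            if (
                dependent(data, w, x)
                and dependent(data, x, y)
                and independent_given(data, w, y, x)
            ):
                found.append((x, y))
    return found


# Synthetic toy data (not from the study), built as a chain w -> x -> y.
counts = [
    (1, 1, 1, 320), (1, 1, 0, 80), (1, 0, 1, 20), (1, 0, 0, 80),
    (0, 1, 1, 80), (0, 1, 0, 20), (0, 0, 1, 80), (0, 0, 0, 320),
]
toy = []
for w, x, y, n in counts:
    toy += [{"w": w, "x": x, "y": y}] * n

print(lcd_sketch(toy, roots=["w"], variables=["x", "y"]))
```

On the toy data only the pair ("x", "y") satisfies all three tests, which mirrors the "variable X causes variable Y" output format. A real analysis would need multi-valued variables, a proper p-value computation, and care about the assumptions behind the rule, such as the absence of selection bias and the correctness of the choice of W.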

publication date

  • January 1, 1999