Causal Networks

Overview

I am continuing work started by recent PhD graduate Xin Zhang on the topic of causality.  A causal relationship makes a much stronger statement than an association: its direction indicates that one entity is a direct cause of another.  Learning such relationships, especially from steady state data, is an actively debated area because it is difficult to define when one event actually causes another.  Judea Pearl argues that causality is a man-made concept for explaining events.  His solution for studying causality involves the following three points: 1) treat causation as a summary of behavior under interventions, 2) use equations and graphs as a mathematical language within which causal thoughts can be represented and manipulated, and 3) treat interventions as a surgery over equations.  Under this framework, the Inductive Causation (IC) algorithm was created to learn causal relationships from steady state data.  This algorithm is the basis for our study.
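To make the third point concrete, here is a minimal Python sketch of "intervention as surgery over equations".  It is only an illustration, not part of the IC algorithm: the three-variable chain X -> Y -> Z, its coefficients, and the helper names evaluate and do are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# A structural equation computes one variable from the values of its causes.
Equation = Callable[[Dict[str, float]], float]


def evaluate(model: Dict[str, Equation], order: List[str]) -> Dict[str, float]:
    """Evaluate every equation in causal order (causes before effects)."""
    values: Dict[str, float] = {}
    for name in order:
        values[name] = model[name](values)
    return values


def do(model: Dict[str, Equation], variable: str, value: float) -> Dict[str, Equation]:
    """Return a copy of the model in which the equation for `variable`
    is replaced by a constant; all other equations are left untouched."""
    surgered = dict(model)
    surgered[variable] = lambda _values: value
    return surgered


# Hypothetical toy model (assumed for illustration): X -> Y -> Z
model: Dict[str, Equation] = {
    "X": lambda v: 1.0,
    "Y": lambda v: 2.0 * v["X"],
    "Z": lambda v: v["Y"] + 3.0,
}
order = ["X", "Y", "Z"]

observed = evaluate(model, order)
intervened = evaluate(do(model, "Y", 10.0), order)
```

Setting Y by intervention changes its effect Z but leaves its cause X unchanged, which is the asymmetry that separates a causal statement from an association.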

Abstract

Learning causal relationships between genes from steady state gene expression profiles is an important issue in bioinformatics and computational systems biology.  Among the developed methods, the Inductive Causation (IC) algorithm has been shown to be effective for inferring causal relationships among variables.  However, a recent study in the context of gene regulatory networks shows that the IC algorithm results in low precision and recall rates.  To improve the performance, we propose two algorithms, the modified IC (mIC) algorithm and the mIC_CoD algorithm, that utilize partial prior knowledge of gene topological ordering information for learning causal relationships among genes.  We evaluate the performance of the algorithms on synthetic datasets and show that the precision and recall rates using the mIC and the mIC_CoD algorithms are significantly improved compared with those using the IC algorithm.  We also evaluate the performance of the mIC algorithm against a more conventional Bayesian network (BN) inference method.  The simulation study shows that the mIC algorithm outperforms the Bayesian network method in both precision and recall rates.  We further apply the algorithms to a melanoma microarray dataset, and identify several important causal relationships within a network of genes.  Among the discovered connections, the causal relationships associated with WNT5A, a gene playing an important role in melanoma, are supported by literature.  Current efforts involve further comparisons of the four algorithms (IC, mIC, mIC_CoD, BN) across synthetic data sets of varying numbers of variables and samples, as well as further validation against current real-world data.
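As a rough illustration of how precision and recall can be scored for a learned causal network, the Python sketch below compares directed edges against a known synthetic network.  The scoring convention (an edge counts as correct only if its direction matches), the function name precision_recall, and the example edges are assumptions for this sketch, not the evaluation procedure of the paper.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def precision_recall(true_edges: Iterable[Edge],
                     learned_edges: Iterable[Edge]) -> Tuple[float, float]:
    """Score learned directed edges against the true network.

    Assumption: a learned edge is a true positive only when both its
    endpoints and its direction match an edge in the true network.
    """
    true_set, learned_set = set(true_edges), set(learned_edges)
    true_positives = len(true_set & learned_set)
    precision = true_positives / len(learned_set) if learned_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical four-gene example (made up for illustration).
true_edges = [("A", "B"), ("B", "C"), ("A", "D")]
learned_edges = [("A", "B"), ("C", "B"), ("A", "D"), ("D", "C")]

precision, recall = precision_recall(true_edges, learned_edges)
```

Here two of the four learned edges are correct, and two of the three true edges are recovered; the reversed edge C -> B counts against both rates under this convention.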

Previous Publications

Zhang, Xin, Chitta Baral, and Seungchan Kim. “An Algorithm to Learn Causal Relations Between Genes from Steady State Data: Simulation and Its Application to Melanoma Dataset.” 2005. 524-534.  (PDF)

Slides for previous paper (PPT)

Current Work

The main shortcoming of the project at present is that the algorithms still need to be tested on scores of synthetic data sets of varying sample and variable sizes, as well as on more current real data.  Thus, current efforts involve further comparisons of the four algorithms (IC, mIC, mIC_CoD, BN) across synthetic data sets of varying numbers of variables and samples, as well as further validation against current real-world data for which some form of validation is possible.

ROCKY 08

Current progress of this study was presented by Michael P. Verdicchio at ROCKY08: the 6th annual Rocky Mountain Bioinformatics Conference, presented by the International Society for Computational Biology (ISCB).  (link)  The conference ran from December 4th through the 7th, 2008 in Snowmass Village, Colorado.

Slides for ROCKY ’08 (PPT)
Poster for ROCKY ’08 (PDF 1.66MB)