Archive for the ‘Biological Knowledge’ Category

Learning Bayesian Networks

Over the summer and this semester I have been trying to teach myself about Bayesian networks and their usefulness for modeling biological systems, specifically gene interactions. The task has proved difficult, as my approach has not been very structured. I started by working through the 1995 tutorial by David Heckerman, but at each step I’ve had to make side journeys to fill in gaps in my background knowledge.

So far in our lab seminar series (http://sysbio.fulton.asu.edu) I’ve covered the background in probability theory and in graph theory (Markov chains/blankets/equivalence, faithfulness, d-separation, etc.), and have begun discussing how to actually learn Bayesian networks from data. The last thing discussed was an introduction to learning posterior probability distributions, which will lead into learning network topology, and finally learning both together.

I am posting my slides/notes from the four talks I’ve given, but take them only as an illustration of my progress and not as a good resource. For good resources, check out David Heckerman, Nir Friedman, Dana Pe’er, Eran Segal, Marco Ramoni, Andrew Moore, Jose Pena, Daphne Koller, Richard Neapolitan, and of course the seminal text by Judea Pearl. These are the people I am learning from.

Talk 1 (ppt) (horrible introduction, I didn’t know anything)

Talk 2 (ppt) (getting there, decent probability discussion)

Talk 3 (ppt) (getting better, decent graph discussion)

Talk 4 (pdf) (not so great, mostly review, notes only, no slides used)


Some Progress

BioMart is pretty cool: http://www.biomart.org

I created an XML query using their form, narrowing the data set and species, then uploaded my gene shopping list as Entrez gene IDs. I then specified that I wanted 2000 bp upstream and downstream (2 separate queries) and created the XML files. I then removed line breaks in TextPad and ran the queries on a Linux machine using wget:

wget -O results.txt 'http://www.biomart.org/biomart/martservice?query=MY_XML'

…where MY_XML is the newline-free XML query.  The result (for one upstream and one downstream query) was 2 text files with 2000 nucleotides per line for each of 200 genes.  I’ve sent this off to my biologist colleague to see if it’s indeed what we’re looking for, but I think it is.
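
The manual steps above (strip the line breaks, then paste the XML into the URL) could also be scripted. Here is a minimal Python sketch of that idea. The base URL is the one from the wget command; the function names and the example XML, including its dataset, filter, and attribute names, are placeholders I made up for illustration and not a real BioMart query. Percent-encoding the XML is an extra precaution I am assuming is harmless, since it keeps quotes and angle brackets from confusing the shell or the server.

```python
from urllib.parse import quote

# Base URL taken from the wget command in the post.
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def flatten_query(xml_text):
    """Join a multi-line XML query into one line, dropping indentation.

    Assumes every tag sits on its own line (no tag is split across lines).
    """
    return "".join(line.strip() for line in xml_text.splitlines())


def build_query_url(xml_text, base_url=MARTSERVICE_URL):
    """Return base_url?query=<percent-encoded, newline-free XML>."""
    return base_url + "?query=" + quote(flatten_query(xml_text), safe="")


# Placeholder query: the names below are hypothetical, not real BioMart
# dataset, filter, or attribute names.
EXAMPLE_XML = """
<Query>
    <Dataset name="EXAMPLE_DATASET">
        <Filter name="EXAMPLE_FILTER" value="1,2,3"/>
        <Attribute name="EXAMPLE_ATTRIBUTE"/>
    </Dataset>
</Query>
"""

if __name__ == "__main__":
    # The printed URL can be handed to wget in single quotes.
    print(build_query_url(EXAMPLE_XML))
```

The printed URL can then be passed to wget exactly as in the command above, in place of the hand-edited one.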

My other colleague is looking at our target gene and getting a consensus sequence or a position weight matrix for the binding target sequence.  We will then search for this sequence in the up/downstream sequences of our shopping list of genes and extract the best targets.
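
To make the search step concrete, here is a rough Python sketch of scanning one sequence with a position weight matrix. It is not our actual pipeline: the matrix values, the test sequence, and the function name are all invented for illustration. It assumes a uniform background of 0.25 per base, only looks at the forward strand, and expects no zero probabilities in the matrix (real matrices would need pseudocounts).

```python
import math


def best_pwm_hits(sequence, pwm, top_n=3, background=0.25):
    """Score every window of len(pwm) bases and return the top_n hits.

    pwm is a list of dicts, one per motif position, mapping each base to
    its probability. The score is the sum of log2(p / background) over
    the window. Hits are (score, start_index, window) tuples.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(
            math.log2(pwm[i][base] / background)
            for i, base in enumerate(window)
        )
        hits.append((score, start, window))
    # Highest score first; ties broken by leftmost position.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top_n]


# Made-up matrix whose consensus is AGCT (illustration only).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

if __name__ == "__main__":
    for score, start, window in best_pwm_hits("TTAGCTGGAGCATT", TOY_PWM):
        print(f"{start}\t{window}\t{score:.2f}")
```

Running this over each line of the upstream and downstream files, and keeping the highest-scoring windows per gene, would be one way to rank candidate targets.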


The Search Continues

This is a lot harder than I thought it would be. I guess there isn’t a ready-made tool for every data-acquisition task one might have. An interesting fact: I just downloaded the human genome. It’s just under 3 gigs of text. Talk about needles in a haystack.