BioMart is pretty cool: http://www.biomart.org
I created an XML query using their web form. I narrowed down the dataset and species, then uploaded my gene shopping list as Entrez Gene IDs. Next I specified that I wanted 2000 bp upstream and 2000 bp downstream (as two separate queries) and exported each query as an XML file. Finally, I removed the line breaks from the XML in TextPad and ran the queries on a Linux machine using wget:
wget -O results.txt 'http://www.biomart.org/biomart/martservice?query=MY_XML'
…where MY_XML is the newline-free XML query. The result (one upstream and one downstream query) was two text files, each containing one line of 2000 nucleotides for each of the 200 genes. I've sent these to my biologist colleague to check that they are indeed what we're looking for, but I think they are.
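The TextPad step can also be scripted. Below is a minimal Python sketch that assumes each query was saved to a local XML file. It puts the query on one line by collapsing every whitespace run into a single space, which is slightly different from deleting the line breaks outright. It then percent-encodes the query so characters such as <, > and " survive in the URL, and checks that every line of a result file is 2000 nucleotides long. The helper names flatten_xml, build_query_url and check_sequences are hypothetical. The endpoint is simply the one from the wget command above and may no longer be live.

```python
from collections.abc import Iterable
from urllib.parse import quote

# Endpoint copied from the wget command above; it may have moved since.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def flatten_xml(xml_text: str) -> str:
    """Put the query on one line by collapsing every whitespace run
    (newlines and indentation included) into a single space."""
    return " ".join(xml_text.split())


def build_query_url(xml_text: str) -> str:
    """Flatten the query and percent-encode all of it for the ?query= parameter."""
    return MARTSERVICE + "?query=" + quote(flatten_xml(xml_text), safe="")


def check_sequences(lines: Iterable[str], expected_len: int = 2000) -> list[str]:
    """Return the non-blank lines, raising if any is not expected_len long.

    Assumes the one-sequence-per-line layout described above (no header lines).
    """
    seqs = [line.strip() for line in lines if line.strip()]
    bad = [i for i, s in enumerate(seqs) if len(s) != expected_len]
    if bad:
        raise ValueError(f"{len(bad)} line(s) are not {expected_len} nt long, e.g. {bad[:5]}")
    return seqs
```

For example, build_query_url(open("upstream.xml").read()) produces a URL that can be passed to wget (or to urllib.request.urlopen). Afterwards, with open("results.txt") as fh: seqs = check_sequences(fh) confirms there is one full-length sequence per line. The file name upstream.xml is hypothetical.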
My other colleague is looking at our target gene to derive a consensus sequence or a position weight matrix (PWM) for the binding target sequence. We will then search for this motif in the upstream and downstream sequences of the genes on our shopping list and extract the best-scoring targets.
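For the search step, here is a minimal sketch of PWM scanning in Python. It assumes the motif arrives as a set of aligned, equal-length binding sites. It uses a uniform 0.25 background and a pseudocount of 0.5, both arbitrary choices. Each window is scored as the sum of per-position log2-odds, on both strands, and the consensus is taken as the top-scoring base in each column. The function names are hypothetical. Ranking by raw score is only one possible way to define "best targets"; a score threshold is another.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from equal-length aligned sites (uniform background assumed)."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def consensus(pwm):
    """Highest-scoring base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


def score(pwm, kmer):
    """Sum of per-position log-odds; windows containing N (or any non-ACGT) get -inf."""
    if any(b not in BASES for b in kmer):
        return float("-inf")
    return sum(col[b] for col, b in zip(pwm, kmer))


def scan(pwm, seq, top=5):
    """Score every window on both strands and return the best (score, start, strand) hits."""
    seq = seq.upper()  # so lower-case input still scores
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        kmer = seq[start:start + width]
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        hits.append((score(pwm, kmer), start, "+"))
        hits.append((score(pwm, revcomp), start, "-"))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top]
```

Usage with made-up example sites: pwm = build_pwm(["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]), then for i, seq in enumerate(seqs): print(i, scan(pwm, seq, top=3)). A hit at start s covers positions s to s + len(pwm) - 1 (0-based) of the 2000-nt window on either strand. For "-" hits, the motif reads on the reverse complement of that window.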
