Learning Statistics from ground zero: Python + Khan Academy


I think this is going to be a cool way to learn statistics. I started on the first few problems at Khan Academy, and they’re doing an awesome job of breaking statistics down!

So far I’ve written some short programs for sample mean, sample standard deviation, population mean, and population standard deviation:

http://bpaste.net/show/22474/
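
For anyone who can’t open the paste, here is a minimal sketch of what such programs might look like. This is not the pasted code; the function names are my own. Note that the sample mean and population mean use the same formula, while the sample standard deviation divides by n - 1 instead of n.

```python
import math


def mean(values):
    """Arithmetic mean; same formula for a sample and a population."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / float(len(values))


def population_std(values):
    """Population standard deviation: divide the squared deviations by n."""
    values = list(values)
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / float(len(values)))


def sample_std(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("sample_std requires at least two values")
    xbar = mean(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / float(len(values) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))            # 5.0
print(population_std(data))  # 2.0
print(sample_std(data))      # a bit larger than the population value
```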

 

Programmer to bioinformaticist. How I began migrating my skillset to work with data-driven biology.


STEP 0: Join some IRC channels:

  • #bioinformatics
  • #biology
  • #ai
  • #math
  • #hplusroadmap

The channels above are awesome places, and TBH I’m actively recruiting people to get more discussion going.

STEP 1: Check out the bug trackers for open source projects. Project:

 

Running tests on Biopython’s GFF library


STEP 0: Run the tests!

  • $ cd biopython/gff/Tests
  • $ python test_GFFSeqIOFeatureAdder.py

That’s it! You’ve run the tests!

 

Installing bcbb/gff ( beta ) for gene predictions to protein sequences


STEP 0: Install the GFF library:

  • Within biopython’s default dir
  • $ mkdir BCBio && cd BCBio
  • $ git clone git@github.com:chapmanb/bcbb.git
  • $ cd bcbb/gff/
  • $ sudo python setup.py install

STEP 1: Set up the code!

  • Create the file “gene_prediction_to_protein_sequences.py” and fill it with: http://pastie.org/3177232
  • Modify the main() function call to reflect the locations of the files below:
  • sample FASTA
  • sample GFF

STEP 2: RUN!

  • $ python gene_prediction_to_protein_sequences.py

Expect to get an output like:

output saved to: biopython_tutorials/GFF_tutorials/glimmer-proteins.fa

 

A good app is a FAST app


Getting Django on Heroku prancing 8 times faster.

 

Troubleshooting RESTful API calls on HMMER


STEP 0: Attempt to use example API call:

  • $ curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F algo=phmmer -F seq='
  • curl: (26) failed creating formpost data

So we’re working with a broken API call here. First off, Googling that error turns up a mention of running curl under strace.

STEP 1: Run curl under strace

  • $ strace curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F algo=phmmer -F seq='
  • returns: http://pastie.org/3036474

That’s a substantial amount of text, but let’s work up from the bottom:

STEP 2: Problem?

  • If I had to guess, line 448:
  • open("test.seq", O_RDONLY) = -1 ENOENT (No such file or directory)
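
Scrolling through hundreds of strace lines by eye gets old fast, so here is a rough sketch of how the same hunt could be scripted. The function name and the regex are my own. In the sample trace, only the test.seq line comes from the real output; the other lines are made up for illustration. It lists files that failed to open with ENOENT, last failure first, which matches working up from the bottom.

```python
import re

# Matches failed open()/openat() calls such as:
#   open("test.seq", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_OPEN = re.compile(r'\b(?:open|openat)\(.*?"([^"]+)".*\)\s*=\s*-1\s+ENOENT')


def missing_files(strace_output):
    """Return paths that failed to open with ENOENT, last failure first."""
    hits = []
    for line in strace_output.splitlines():
        match = FAILED_OPEN.search(line)
        if match:
            hits.append(match.group(1))
    return hits[::-1]


# Illustrative trace: only the test.seq line is taken from the post.
sample = '''open("/etc/hypothetical.conf", O_RDONLY) = 3
open("/usr/lib/libfoo.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("test.seq", O_RDONLY) = -1 ENOENT (No such file or directory)
'''

print(missing_files(sample))
```

With a real trace you would save the strace output to a file and pass its contents to missing_files().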

STEP 3: The call expects a test.seq file as input, so put one in the current directory

STEP 4: Run it again!

  • $ curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F algo=phmmer -F seq='
  • Beautiful! We get the expected output!

Note: the sequence can also be included directly in the curl HTTP request: http://pastie.org/3036798

 

Haml, Sass and Compass for Django development on Ubuntu


STEP 0: Install HamlPy

  • $ sudo easy_install hamlpy

STEP 1: Install Sass

  • $ cd ~/
  • $ git clone git://github.com/nex3/sass.git
  • $ cd sass/
  • $ rake install
  • $ cd ~/
  • $ rm -rf sass/

STEP 2: Install Compass ( for this directory structure ). NOTE: Ubuntu doesn’t require quotes around the dir names

  • $ gem install compass
  • $ cd django_projects/mysite/static/
  • $ compass install compass . --syntax sass --sass-dir sass --css-dir css --javascripts-dir javascripts --images-dir images
  • You’ll get console output like this: http://pastie.org/3028330
  • Haml and Pythonize: http://pastie.org/3028383

STEP 3: Run Haml watcher

  • $ cd django_projects/mysite/
  • $ hamlpy-watcher templates/

STEP 4: Run Sass/Compass watcher ( in new terminal )

  • $ cd ~/django_projects/mysite/
  • $ compass watch

Notes:

  • With this setup, all files need to be explicitly generated from their .haml and .sass sources using the watchers
  • Unlike a typical Ruby deployment, run-time compilation of the CSS won’t happen unless you’re running Ruby on your production servers
  • You end up using two code bases: Compass/Sass alongside Django and HamlPy

 

Install NumPy 1.6.1 on Ubuntu 10.04 via command line


STEP 0: Only run this if you’ve attempted previous installations ( this will clean out the needed dirs )

  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/matplotlib*
  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/pylab*
  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/mpl_toolkits/mplot3d
  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/mpl_toolkits/axes_grid
  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/mpl_toolkits/axes_grid1
  • sudo rm -rf /usr/local/lib/python2.6/dist-packages/mpl_toolkits/axisartist
  • sudo rm /usr/local/lib/python2.6/dist-packages/mpl_toolkits/*.py

STEP 1: Tricky workaround to install the dependencies

  • sudo apt-get build-dep python-matplotlib

STEP 2: Now we remove the old NumPy, while the installed dependencies remain

  • sudo apt-get remove python-numpy

STEP 3: Download NumPy 1.6.1

  • cd ~/
  • wget http://downloads.sourceforge.net/project/numpy/NumPy/1.6.1/numpy-1.6.1.tar.gz

STEP 4: Install NumPy 1.6.1

  • tar xzvf numpy-1.6.1.tar.gz
  • cd numpy-1.6.1
  • python setup.py build
  • sudo python setup.py install

… And you’re all done!
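
A quick way to confirm the install took is to ask Python which version it actually imports. This is just a small sketch; installed_version is a helper name I made up. It returns None when the module can’t be imported at all.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# After the steps above this should report the freshly built NumPy.
print(installed_version("numpy"))
```

If this prints an older version, the distro package is probably still being picked up ahead of the one in /usr/local.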

Thanks to the original post on the Ubuntu forums for this clever little workaround. And now, to the hacking!

http://ubuntuforums.org/showthread.php?t=1573925

 

Python GHMM on Ubuntu 10.04 installation


STEP 0: Install prerequisite packages

  • $ sudo apt-get install python-dev libxml2-dev swig libtool

STEP 1: Install GHMM

  • $ svn co https://ghmm.svn.sourceforge.net/svnroot/ghmm/trunk/ghmm
  • $ cd ghmm/
  • $ sh autogen.sh
  • $ ./configure
  • $ make
  • $ sudo make install

Additional troubleshooting here:
http://www.linuxquestions.org/questions/linux-software-2/ghmm-library-277690/

Original library:
http://ghmm.sourceforge.net/installation.html

Keywords: General Hidden Markov Model, Python, Artificial intelligence

 

The evolution of the high-throughput biology lab


The evolution of the biology lab:
- entirely automated
- no human interaction with physical equipment
- run from a UI or an API
- remove human error through compiling / error checking before running a protocol
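
To make that last point concrete, here is a toy sketch of what "compiling" a protocol before running it could mean. Everything in it is hypothetical: the step format, the check_protocol name, and the volume limit are mine, not any real lab system’s API. Wells are plain 0-based indices to keep it short.

```python
# Plate sizes mentioned in this post.
SUPPORTED_PLATES = (96, 384, 1536)


def check_protocol(steps, plate_size, max_volume_ul):
    """Return a list of error messages; an empty list means the protocol may run."""
    if plate_size not in SUPPORTED_PLATES:
        return ["unsupported plate size: %d" % plate_size]
    errors = []
    for number, step in enumerate(steps, 1):
        # Every well referenced by a step must exist on the plate.
        for key in ("source", "dest"):
            well = step[key]
            if not 0 <= well < plate_size:
                errors.append("step %d: %s well %d is off the plate" % (number, key, well))
        # Volumes must be positive and within what a well can hold.
        if not 0 < step["volume_ul"] <= max_volume_ul:
            errors.append("step %d: volume %s uL is out of range" % (number, step["volume_ul"]))
    return errors


protocol = [
    {"source": 0, "dest": 95, "volume_ul": 50},
    {"source": 1, "dest": 96, "volume_ul": 500},  # two mistakes on a 96-well plate
]

for problem in check_protocol(protocol, plate_size=96, max_volume_ul=200):
    print(problem)
```

The mistakes in the second step get caught before any liquid moves, which is the whole point of checking first.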

Think of it in terms of a computer: to interact with it and use it to some end, you don’t need to be working INSIDE of it. Humans in a lab only create errors.

I see the above as clear; if you have a view on how I might be wrong, please voice it.

However, my primary assertion is this:

The evolution of the most powerful systems in biological research ( and thus the individuals who do the most damage ) will abstract scientific protocols to a programming language.

Yes, scientists today like pretty interfaces ( they don’t want to have to think… and this is a self-detrimental post coming from a designer ). However, when you factor in the complex nature of biological research and the need for multiplexing, pretty interfaces stop being enough. Why do we have 96-well microplates, which are now being replaced with 384- or even 1536-well plates?

Multiplexing.

The systems as they are engineered today exist to make small steps towards automation, and slowly automate the research process.  The moment we have a system capable of cross-automating all of the biological research done in a lab…

The very best biologists will be programmers.

Current downfall:

Entirely cross-automating / designing all systems to work together seamlessly.

http://images.sciencedaily.com/2009/04/090402143451-large.jpg