Thursday, January 26, 2012

The annoying silent update of RUM pipeline

Largely because of my thesis project, I need to follow the development of RNA-Seq mapping tools closely. Unfortunately, some of the tools simply fail to run. RUM v1.10 [1] is one of them.


RUM was obtained here. The latest version as of today is v.1.10.
The installation of RUM is slightly different from what you would expect for bioinformatics tools. First you download an install Perl script from the website. Running that script fetches the programs and annotation files from the Amazon cloud.


I worked with a FASTQ dataset that had been dynamically trimmed.
Using RUM.runner.pl with the -variable_read_length parameter and the RUM-supplied hg19 annotation, I could only map <1% of the reads onto the reference, which is ridiculous. I have tried many pipelines with the same dataset and have never seen anything like it. Even Bowtie / BWA alone consider 60% of the reads uniquely mapped.


RUM v.1.10 failed


I ran into another problem while chasing the bug. As mentioned, I had dynamically trimmed the FASTQ reads before proceeding with RUM, so the reads are of variable length. To my surprise, what I got was:
The fowards and reverse files are different sizes.\n$size1 versus $size2.  They should be the exact same size
Evidently, the runner script assumes that the forward and reverse FASTQ files must be the same size in bytes, and my files simply failed that check. But when the reads are of variable length, there is no reason for the two files to be the same size!
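To illustrate why byte size is the wrong thing to compare for trimmed data, here is a minimal Python sketch of a check based on the number of reads instead. It is not what RUM does; the function names are my own, and it assumes the common four-lines-per-record FASTQ layout with no wrapped sequence lines.

```python
def count_fastq_records(path):
    """Count reads, assuming four lines per FASTQ record (an assumption)."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def paired_files_consistent(forward_path, reverse_path):
    """True if both files hold the same number of reads.

    Byte sizes may differ freely when reads have been trimmed to
    variable lengths, so only the record counts are compared.
    """
    return count_fastq_records(forward_path) == count_fastq_records(reverse_path)
```

A pair of trimmed files can pass this check even though their sizes in bytes differ, which is exactly the situation the size comparison rejects.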


Fortunately, a week ago I ran a small-scale test on another server using the same RUM version and the same dataset. Although I did not look into the details, that result did not shock me the way today's did. When I started looking into the runner script, it turned out that the two scripts I had used were different. The RUM.runner.pl I got today and several days ago (yes, I have been installing the pipeline and modifying the runner script for a few days already) was different from the one I downloaded last week.


So, what happened exactly?
Evidently, the developers silently updated the RUM.runner script that the installer fetches from the cloud. So even though the version is still v.1.10, the runner script is actually different. Honestly, I really appreciate the work of the developers of all these mapping tools. I understand that maintenance is hard and largely voluntary after publication. But please make sure the scripts at least work. And the take-home message: document your changes better.
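One cheap way to notice this kind of silent change is to record a checksum of each script right after installing and compare it across machines or installs. A minimal Python sketch follows; the function names are my own and nothing here is part of RUM.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if the two files are byte-for-byte identical (by digest)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

If two copies of RUM.runner.pl that both claim the same version give different digests, the script has changed underneath the version number.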


For those who wish to run RUM successfully, see below.
The old version, which works, can be obtained here: OLD.RUM.runner.pl
The difference between the two runner scripts can be seen here.
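If you have two copies of the runner script of your own and want to compare them, a unified diff is enough. A minimal Python sketch using the standard difflib module; the function name and the file names in the usage comment are placeholders of mine.

```python
import difflib


def script_diff(old_path, new_path):
    """Return the unified diff between two text files as a list of lines."""
    with open(old_path) as old, open(new_path) as new:
        old_lines = old.readlines()
        new_lines = new.readlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_path, tofile=new_path)
    )


# Usage (placeholder file names):
# print("".join(script_diff("OLD.RUM.runner.pl", "RUM.runner.pl")))
```

An empty list means the two files have identical lines.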



References


[1] Grant, G. R., M. H. Farkas, et al. (2011). "Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)." Bioinformatics 27(18): 2518-2528.
