A pixel's journey

Friday, November 15, 2013

Are bioinformaticians (we) free-riders?

A lesson for all bioinformaticians.

Dear ***,

You probably know that I desperately wanted to write up a tool paper last year (***). At that moment my biggest fear was in fact not B**, but the expectation that I would be placed in the middle of nowhere in the author list of their Biology paper. I guessed right.

I am not sure if this is normal, but I just got very frustrated in this lab. Yesterday **** summoned me to his office and told me that my position had been shifted backward (from 2nd to 4th).

The situation is like this. We developed *** and used it to discover those ******* events, all of which were validated. For one of the ******, the Biology lab had already done some work (long before we developed ***). They used it as an example for more in-depth work and submitted the ****** ****. I did the work for the submission phase. My position was 2nd. The 1st author is a former student who devoted his PhD life to it.

The paper came back for revision, with lots of biology questions. I did my part and proposed the Bioinformatics response. I thought it was all right. Yesterday **** told me he had discussed it with ******** in the *****-***** meeting and they had decided to shift me backward, because their lab members worked hard in the revision phase and did many experiments. I had no choice and said it was fair.

Maybe it is fair. I do not dispute that the wet-lab team deserves important positions because they worked hard. I just cannot believe that (1) we did well, so there were few Bioinformatics-related questions, and (2) it is then my fault for contributing little in this revision phase. I also spent my last 1.5 years on this. If I had not proposed, and argued with them, to publish *** independently, what would I have now to show for working on this during my post-doc life?

The wet-lab members all have their own projects, and mine is to help them get things out. Am I a technician?

When they were working hard on wet-lab experiments, were we just sitting down and taking a nap?

In ****'s office, I did not argue at all. I accepted it because it is entirely not up to me, and it was a decision they had already made. **** just relayed the message to me. **** reiterated that we are not pet bioinformaticians, but I think we are just *free-riders* in the biology people's eyes.

**** said to me that the decision is meant to prevent the wet-lab people from thinking that we are free-riders who get into their lab and take important positions in the paper, and that it will put them at ease. So basically it means I am the one to be sacrificed for fear of a morale hazard in the wet-lab team. And by doing so, I can work with the wet-lab team more closely. I have no objection, but again, I know I have lots of morale to spare.

Regards,

******

Saturday, June 22, 2013

PLOS ONE: career suicide?

The biggest recent news in academic publishing is probably the release of the impact factors (IF) by Thomson Reuters. The famous OA journal PLOS ONE saw its biggest drop in IF and is expected to fall even lower.

There have been various discussions about PLOS ONE, for example here at Early Career Ecologists. There must be some reason why authors decide to publish there, be it a timing issue or frustration with the long peer-review process.

Some claim (and in fact it should be so) that the quality of each article should be assessed independently, instead of using the impact of the journal as a proxy. However, PLOS ONE may really be career suicide, especially for junior faculty (at least where I am). The following real scenario says it all.

This comes from an anonymous source at a grant panel meeting. A panel chair assessing some grant proposals went over a PI's CV and spotted 3 PLOS ONE articles. Without going into any detail, he made a disparaging remark about the competence of the PI, something like "This guy does not make quality science". I don't know the final outcome of the proposal, but it didn't look good.

Face the truth in Hong Kong. Only tenured professors can play the PLOS ONE game.

Thursday, January 17, 2013

ViralFusionSeq

Summary

ViralFusionSeq (VFS) is a versatile high-throughput sequencing (HTS) tool for discovering viral integration events and reconstructing fusion transcripts at single-base resolution. VFS combines soft-clipping information, read-pair analysis, and targeted de novo assembly to discover and annotate viral-human fusion events.
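
To give a feel for what "soft-clipping information" means, here is a minimal Python sketch. It is not the VFS implementation. It only parses a SAM CIGAR string and reports how many bases were soft-clipped at each end of a read, since a long soft clip on a read mapped to one genome is the kind of signal that could hint at a fusion breakpoint. The function names `soft_clips` and `is_candidate`, and the 20-base threshold, are my own assumptions for illustration.

```python
import re

# A CIGAR string is a series of <length><operation> pairs (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them here.
    core = [(length, op) for length, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def is_candidate(cigar, min_clip=20):
    """Flag a read whose longest soft clip reaches min_clip bases.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return max(soft_clips(cigar)) >= min_clip
```

A read with CIGAR `30S70M` would be flagged, because its first 30 bases did not align to the reference; a fully aligned `100M` read would not.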

Project

The work was conceived by Jing-Woei Li, with wet-lab support by Nathalie Wong's lab.

Availability

Visit the VFS page at SourceForge

Citation

Li, J.W., et al. (2013) ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution. Bioinformatics.

Wednesday, April 4, 2012

BGI vs Dennis Lo (CUHK) on Trisomy 21 test

Big news for everyone who is into high-throughput sequencing (HTS) and collaborates with BGI.

About a week ago, Mayo Clinic won its patent war against Prometheus [1].

Briefly, a medical test that relies on correlation alone is not eligible for a patent. Only tests that go beyond reciting a law of nature and actually apply that law can be patented.

Dennis Lo was the first to find that fetal DNA can be detected in maternal blood. His team published a high-throughput sequencing based test for Trisomy 21 in STM in 2010 and validated it on a large cohort in 2011 [2] [3]. They subsequently patented the method.

Today, CUHK declared war on BGI, claiming that BGI infringes its patent on using an HTS-based method to diagnose Trisomy 21.

CUHK claims it possesses the exclusive right to use HTS to diagnose Trisomy 21 (the patent is held by Dennis Lo and in part by Sequenom, where Dennis Lo sits on the scientific advisory board).

CUHK currently holds the patent in Hong Kong. CUHK has also filed for a patent in China, but that is still under review. BGI claims CUHK will never get the patent in China (why? you guess). BGI further claims it also contributed to the research. But note that BGI only has a name on the second paper (BMJ). It is unclear how much the director of BGI could have contributed to this project. Two speculations: (1) BGI actually contributed a lot to this project but did not get much authorship, or (2) BGI has a name on it simply because of the CUHK-BGI collaboration that started several years ago.

This war again raises the question of what can be patented. Obviously, for a Trisomy 21 subject, you get more sequence reads from chromosome 21 because of the extra chromosome. Using clinical cases to establish the normal boundary would not be that difficult. Is this HTS-based method just demonstrating correlation alone? Does it hold sufficient innovation beyond reciting a law of nature?

No matter what, I am worried that the future of the collaboration between CUHK and BGI looks grim, especially for the Institute of Trans-Omics.
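
To make the "normal boundary" idea concrete, here is a minimal Python sketch of a z-score on the fraction of reads mapping to chromosome 21. It is my own simplification for illustration, not the patented method or the exact statistics of the papers [2] [3]. The function name `chr21_zscore`, the reference fractions, and the cutoff of 3 are all assumptions, and the numbers are made up.

```python
from statistics import mean, stdev


def chr21_zscore(sample_fraction, reference_fractions):
    """Z-score of a sample's chr21 read fraction against euploid references.

    reference_fractions: chr21 read fractions from pregnancies known to be
    unaffected (these establish the "normal boundary").
    """
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    return (sample_fraction - mu) / sigma


# Made-up numbers, for illustration only.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
CUTOFF = 3.0  # hypothetical decision threshold

for fraction in (0.0130, 0.0140):
    z = chr21_zscore(fraction, reference)
    label = "elevated chr21" if z > CUTOFF else "within normal range"
    print(f"chr21 fraction {fraction:.4f}: z = {z:.2f} ({label})")
```

Seen this way, the test boils down to counting reads and comparing against a reference distribution, which is exactly why the "correlation alone" question above is worth asking.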

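The news report on "CUHK versus BGI" was originally published in traditional Chinese. An English translation follows.

CUHK says its non-invasive Down syndrome test is being counterfeited

The non-invasive Down syndrome test developed by the CUHK Faculty of Medicine, which examines DNA from a blood sample, was launched only in Hong Kong and the United States last year, yet it is already in wide clinical use in mainland China.

Its developer, Professor Dennis Lo Yuk-ming (盧煜明), said yesterday that someone is counterfeiting his technology. However, BGI (華大基因), which provides the testing service to doctors in mainland China and Hong Kong, denied any counterfeiting and said that CUHK cannot obtain patent protection on the mainland.

Dennis Lo: only CUHK has the right to run the test

BGI's website shows that "non-invasive prenatal genetic testing" has been widely rolled out in many mainland provinces and cities, including Shenzhen and Tianjin. Premier Wen Jiabao has also expressed support for BGI's research results, which could help Tianjin reach its goal of becoming the first city in the world without "Down's babies" (Down syndrome), and could even be extended nationwide.

Wang Lizhi (王立志), executive director of BGI, admitted that the core technology BGI uses is the same as CUHK's and that BGI took part in the CUHK research. He said it was only that the final academic paper was published by Dennis Lo, chairman of the Department of Chemical Pathology at the CUHK Faculty of Medicine.

But Lo said that counterfeit tests are now on the market and that CUHK is the only institution with the right to perform this test. He said CUHK was granted a local patent in 1999 and filed an application on the mainland in 2007, which is still under review.

Wang, however, said that guidelines from the mainland health authorities state that clinical tests cannot be granted patents, so CUHK's application will not be approved.

BGI: we took part in the research, this is not counterfeiting

Wang also admitted that BGI works with private obstetricians and gynaecologists in Hong Kong, but declined to disclose how many, stressing that all reports provided carry ISO and mainland certification. He said the test costs less for mainlanders than it does in Hong Kong, but declined to disclose the actual fee.

He added that because BGI took part in CUHK's research, it should own part of the patent rights in the United States, and that he believes BGI would have sufficient grounds if it went to court against CUHK, but he said it would not do so.

Posing as a customer, our reporter phoned 領峰醫務中心, one of the medical centres in Hong Kong that offer the "non-invasive prenatal diagnosis of fetal chromosomal abnormalities" (NIFTY) service. A staff member said the fee was only 6,000 dollars, adding that it was cheaper than CUHK's test and that results would be ready in two weeks.

The staff member also said that the pregnant woman's blood sample would be sent to BGI in Shenzhen for testing, that the company has long worked with obstetrics and gynaecology specialist Lau Tze-kin (劉子建), and that the technology is used in other countries but is just not yet common in Hong Kong.

Lau Tze-kin, who took part in developing the technology in its early years, was formerly an honorary professor at the CUHK Faculty of Medicine but has since been removed from the list. Our newspaper asked Chung Kwok-hung (鍾國衡), head of obstetrics and gynaecology at the CUHK Faculty of Medicine, who denied that Lau's removal was related to the dispute over the use of the technology.

Sources said CUHK suspects that Hong Kong doctors who explain BGI's reports to patients are already infringing the Hong Kong patent. CUHK is seeking legal advice and reserves the right to take action. Our newspaper contacted Lau Tze-kin yesterday but had received no reply by press time.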

References

[1] Justices Back Mayo Clinic Argument on Patents
[2] Maternal Plasma DNA Sequencing Reveals the Genome-Wide Genetic and Mutational Profile of the Fetus
[3] Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study

Thursday, January 26, 2012

The annoying silent update of RUM pipeline

Largely because of my thesis project, I need to closely follow the development of RNA-Seq mapping tools. Unfortunately, some of the tools just fail to run. RUM v1.10 [1] is one of them.

RUM was obtained here. The latest version as of today is v1.10.
The installation of RUM is slightly different from what you would expect for bioinformatics tools. First you have to download an install Perl script from the website. Execute the install script, and the programs and annotation files are fetched from the Amazon cloud.

I worked with a FASTQ dataset that had been dynamically trimmed.
Using RUM.runner.pl with the parameter -variable_read_length and the RUM-supplied hg19 annotation, I could only map <1% of the reads onto the reference, which is ridiculous. I have tried many pipelines with the same dataset and have never seen this before. Even Bowtie / BWA alone considers 60% of the reads as uniquely mapped.

RUM v.1.10 failed

I encountered another problem while chasing the bug. As mentioned, I had dynamically trimmed the FASTQ reads before proceeding with RUM, so the reads are of variable lengths. But then, to my surprise, what I got was:

The fowards and reverse files are different sizes.\n$size1 versus $size2. They should be the exact same size

Obviously, the runner script assumes that the forward and reverse FASTQ files must be the same size (in bytes), and my run failed this check. But when the reads are of variable lengths, the two files should not be expected to be equal in size!
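
A check that would survive trimming is to compare the number of reads rather than the number of bytes. Below is a minimal Python sketch of that idea. It is not RUM code, and it assumes plain, uncompressed, four-line-per-record FASTQ files; the function names `count_fastq_records` and `pairs_are_consistent` are made up for this example.

```python
def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def pairs_are_consistent(forward_path, reverse_path):
    """True if both files hold the same number of reads.

    Unlike a byte-size comparison, this still holds when the reads were
    trimmed to different lengths.
    """
    return count_fastq_records(forward_path) == count_fastq_records(reverse_path)
```

Matching read counts do not prove that the mates are in the same order, so a stricter check would also compare read names pair by pair.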

Fortunately, a week ago I did a small-scale test on another server using the same RUM version and the same dataset. Although I did not look into the details of that result, it did not shock me as much as today's. When I started looking into the details of the runner script, it turned out that the two scripts I had used were different. The RUM.runner.pl that I got today and several days ago (yes, I have been trying to install the pipeline and modifying the runner script for a few days already) was different from the one I downloaded last week.

So, what exactly happened?
Obviously, the developer silently updated the RUM.runner script that the installer script fetches from the cloud. So even though the version is still v1.10, the runner script is actually different. Honestly, I really appreciate the work of the developers of all the mapping tools. I understand that maintenance is hard and largely voluntary after publication. But please at least make sure the scripts work. And the take-home message: document the changes better.
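
One way to catch this kind of silent update on the user side is to record a checksum of every script you download and compare it later. Here is a minimal Python sketch using the standard hashlib module. The function names and the file paths in the usage note are hypothetical, not part of RUM.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_script(path_a, path_b):
    """True if the two files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_script("last_week/RUM.runner.pl", "today/RUM.runner.pl")` (paths invented here) would return False for two downloads that differ, even though both claim to be v1.10.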

For those who wish to run RUM successfully, see below.
The old version, which works, can be obtained here: OLD.RUM.runner.pl
The difference between the two runner scripts can be seen here

References

[1] Grant, G. R., M. H. Farkas, et al. (2011). "Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)." Bioinformatics 27(18): 2518-2528.

Monday, January 23, 2012

The SEQanswers wiki (SEQwiki) selected as the semantic MediaWiki of Jan 2012

SEQwiki is a wiki database of the available tools for analyzing high-throughput sequencing (HTS) data (it currently includes over 500 such tools), a global listing of HTS service providers (it includes over 100), and a set of tutorials explaining how to analyze HTS data to address specific biological questions. SEQwiki is created and maintained by SEQanswers.

Let's see how the scientific community perceives SEQwiki:

...... the community site SEQanswers (http://seqanswers.com/forums/showthread.php?t=43) at present provide the most current census and capabilities list of almost all of the existing alignment and assembly programs [1].

SeqAnswers wiki has a frequently updated list of all types of high throughput sequencing related software (http://seqanswers.com/wiki/Special:BrowseData) including those useful for epigenomics [2].

There is a large community of users and developers for the Illumina platform; the http://seqanswers.com website is an excellent resource when starting to explore the variety of programs available for analyzing the data generated [3].

SEQwiki is available here.

For more, visit the description at Semantic MediaWiki or refer to [4].

References

[1] Flicek P, Birney E: Sense from sequence reads: methods for alignment and assembly. Nature Methods 2009, 6:S6-S12.

[2] Huss M: Introduction into the analysis of high-throughput-sequencing based epigenome data. Briefings in bioinformatics 2010, 11:512-523.

[3] Kircher M, Heyn P, Kelso J: Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 2011, 12:382.

[4] Li J-W, Robison K, Martin M, Sjödin A, Usadel B, Young M, Olivares EC, Bolser DM. The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Nucleic Acids Research. 2011.