Monday, September 7, 2015

ProtAnnot -- Highlight sequence variants that might explain your weird masses


In higher organisms, proteins commonly have a ton of different forms. Splicing is very happy to take a protein that has multiple functions and cut out the part responsible for one of them to make a more specific protein. Of course... these cuts happen at the RNA level and don't follow the same rules as trypsin. To detect these events with proteomics you have two choices: the first is top-down, and the second is shotgun proteomics with a database that knows about the alternative sequences.
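To see why the database matters, here is a minimal Python sketch (my own illustration, not part of ProtAnnot) that digests two made-up isoforms with the simple trypsin rule (cut after K or R, but not before P) and lists the peptides only one isoform can produce. The sequences, the isoform names and the 6-residue minimum length are all hypothetical.

```python
from __future__ import annotations

import re


def tryptic_digest(seq: str, min_len: int = 6) -> set[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P.
    No missed cleavages; peptides shorter than min_len are dropped."""
    pieces = re.split(r"(?<=[KR])(?!P)", seq)  # zero-width split (Python 3.7+)
    return {p for p in pieces if len(p) >= min_len}


def isoform_specific_peptides(isoforms: dict[str, str]) -> dict[str, set[str]]:
    """For each isoform, return the tryptic peptides that no other isoform yields."""
    digests = {name: tryptic_digest(seq) for name, seq in isoforms.items()}
    specific = {}
    for name, peptides in digests.items():
        others = set().union(*(p for n, p in digests.items() if n != name))
        specific[name] = peptides - others
    return specific


# Hypothetical toy protein: exon2 is skipped in the second isoform.
exon1, exon2, exon3 = "MSTEEKAAGHLLPSQ", "VDEALGIKWYTNNPLSR", "GGFEWLLTAPSDK"
isoforms = {
    "canonical": exon1 + exon2 + exon3,
    "exon2_skipped": exon1 + exon3,
}
for name, peps in isoform_specific_peptides(isoforms).items():
    print(name, sorted(peps))
```

In this toy case the only peptide specific to the exon-skipped isoform is the junction peptide AAGHLLPSQGGFEWLLTAPSDK. If that sequence isn't in your FASTA, those spectra simply never get matched.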

ProtAnnot is a new tool, described in this open access paper by Tarun Mall et al., that works as an add-in for the Integrated Genome Browser (IGB). It highlights alternative proteoforms within a sequence. I especially like the trick it uses for data processing: so that your normal IGB session isn't interrupted in any way, ProtAnnot automatically fires up an extra thread on your server to do its computations when you choose to use it.

If you just can't get the masses of your protein to line up or get that last bit of sequence coverage, this tool might be exactly what you need.

Sunday, September 6, 2015

GOFDR! Analyzing proteomics data from the gene ontology level


Shotgun proteomics is amazing at identifying peptide spectral matches (PSMs). This is what we get out of the instrument: an MS/MS spectrum that we can match with high confidence to something in our database. The tricky part is getting relevant biological data back out. Figuring out exactly which PSM belongs to which peptide, and which peptide belongs to which protein, is the hard part. Evolution is working against us here: from a biological standpoint, it is much easier to make proteins with new functions from similar proteins than it is to make a new one from scratch.

There are some really clever people thinking about other ways of inferring biological information, and I think we'll be hearing about a lot of it soon. One new (to me!) approach is called GOFDR. It's from Qiangtian Gong et al., and is described in this new paper here.

The idea is this: cut out the middlemen. We've got the PSM confidently identified. If it comes from a conserved region of a protein, why bother going all the way through trying to infer which peptide and protein it came from? Chances are, if a PSM matches multiple different proteins, those proteins are at least similar in their function. That's the gene ontology part.

Example: this drug leads to upregulation of a peptide that can be linked to any one of 60 different actin variants? Who cares which one it is, it sounds like this drug has a cytoskeletal component!
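Here's a toy Python sketch of the "who cares which actin" logic: given the proteins a peptide could have come from and a protein-to-GO-term lookup, keep only the terms every candidate shares. The protein names and annotations are made up for illustration, and this is not how GOFDR itself maps PSMs to GO terms (their pipeline runs through several programs, including PSI-BLAST).

```python
from __future__ import annotations


def shared_go_terms(candidates: list[str], annotations: dict[str, set[str]]) -> set[str]:
    """GO terms annotated to every protein the peptide could have come from."""
    term_sets = [annotations.get(protein, set()) for protein in candidates]
    if not term_sets:
        return set()
    return set.intersection(*term_sets)


# Hypothetical annotations for three actin variants (toy data, not real GO records).
annotations = {
    "actin_variant_1": {"cytoskeleton", "cell motility"},
    "actin_variant_2": {"cytoskeleton", "cell motility"},
    "actin_variant_3": {"cytoskeleton", "muscle contraction"},
}
peptide_matches = ["actin_variant_1", "actin_variant_2", "actin_variant_3"]
print(shared_go_terms(peptide_matches, annotations))  # {'cytoskeleton'}
```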

That's the "GO" part. The "FDR"? That's because the gene [protein] ontology level is where they want to apply the false discovery rate.
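For the FDR half, here's a minimal Python sketch of plain target-decoy FDR estimation applied to GO terms instead of PSMs. It assumes each GO-level identification already has a score and a target/decoy flag; the scores, the term names and the decoy entries are hypothetical, and GOFDR's actual scoring is more involved than this.

```python
from __future__ import annotations


def go_term_qvalues(hits: list[tuple[str, float, bool]]) -> dict[str, float]:
    """hits: (go_term, score, is_decoy) tuples; higher score = more confident.
    Returns a q-value for each target GO term, using FDR ~ decoys / targets."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value = lowest FDR at this score threshold or any more permissive one.
    qvalues, best = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qvalues[i] = best
    return {term: q for (term, _, decoy), q in zip(ranked, qvalues) if not decoy}


# Hypothetical GO-level hits with made-up scores.
hits = [
    ("cytoskeleton", 9.1, False),
    ("glycolysis", 8.4, False),
    ("decoy_term_1", 7.9, True),
    ("translation", 7.5, False),
    ("decoy_term_2", 6.0, True),
    ("DNA repair", 5.2, False),
]
for term, q in go_term_qvalues(hits).items():
    print(f"{term}: q = {q:.2f}")
```

With these toy numbers, cytoskeleton and glycolysis come out at q = 0, translation at about 0.33 and DNA repair at 0.5.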

Is it simple in this form? Not at all. The pipeline runs the data through multiple programs, including PSI-BLAST, and at the end they find they really have to spend time manually adjusting their scores and thresholds. Is it an interesting way to look at and think about our data? Absolutely.

Friday, September 4, 2015

Wanna MALDI at half-million resolution?


I don't have a ton of MALDI experience. A little here and there, but I've always found it fascinating. I ran one a few times in grad school and I was very turned off by having to calibrate it constantly throughout the day. That might have been the start of my TOF hatred, come to think of it...

What I want is MALDI on a modern Orbitrap, like a Fusion, and that's what the MassTech APMALDI-HR source promises. Some friends of mine got to mess around with one, and what I hear is that it is excellent!

You can check it out here!

Thursday, September 3, 2015

Intelligent optimization of search parameters for best possible data!


This is a very heavy and extremely interesting paper from a search algorithm optimization perspective. Oh, the paper in question is from Sonja Holl et al., and is available here.

You should probably read it yourself (it's open access!), but I'm going to stumble through my layman-level interpretation of what I just read over Holiday Inn coffee that I think was some sort of homeopathic caffeine experiment...

The paper is essentially a meta-analysis of 6 data sets from three different types of instruments. Some come from ion traps, some come from Orbitraps, and some come from something called a Q-TOOF ;)

The goal of the study was to see how much changing the search parameters in a guided way would improve or hurt the results. And the effect is kind of drastic. What they came up with is a new optimization platform for a big and super interesting project called Taverna (will investigate!). The optimization platform in Taverna looks at your data and determines which search parameters you should be using to get the most high-quality peptide spectral matches (PSMs).

The Taverna optimization platform looks at a number of variables, including mass accuracy, isotopic distributions and more peptide-centric parameters like missed cleavages and enzyme fidelity. Up to this point, I was wondering why someone would re-write Preview... but then they make a sharp right turn and incorporate retention time prediction into the algorithm! Interesting, right?!?
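I haven't dug into how Taverna does it, so this is not their algorithm. But as a rough idea of the mass-accuracy piece, here's a minimal Python sketch of a common rule of thumb: take high-confidence PSMs, compute their precursor errors in ppm, and suggest a search window of the mean offset plus or minus three standard deviations. The function names, the 3-SD choice and the toy m/z values are all assumptions.

```python
import statistics


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def suggest_precursor_window(pairs, n_sd: float = 3.0):
    """pairs: (observed_mz, theoretical_mz) for high-confidence PSMs (needs >= 2).
    Returns (systematic offset in ppm, half-width of the suggested window in ppm)."""
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return statistics.mean(errors), n_sd * statistics.stdev(errors)


# Toy data: five confident PSMs at m/z 1000 with errors of about 1-3 ppm.
pairs = [(1000.001, 1000.0), (1000.002, 1000.0), (1000.003, 1000.0),
         (1000.002, 1000.0), (1000.002, 1000.0)]
offset, half_width = suggest_precursor_window(pairs)
print(f"offset {offset:.2f} ppm, window +/- {half_width:.2f} ppm around it")
```

A consistent nonzero offset like this one is usually a hint to recalibrate rather than just widen the window.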

Another interesting plus? It appears to be designed for server-level applications! A nice read even if your neurons aren't firing all the way! Now it's time to figure out what this Taverna thing is all about!

Wednesday, September 2, 2015

Wanna know what's going on in poplar tree proteins?


If you spend a lot of time climbing trees, chances are you hate poplar trees. Wait. What I mean is: if you climbed a lot of trees as a child (because well-adjusted adults don't climb a lot of trees, of course!), chances are you hate poplar trees. They grow too fast, and the branches aren't nearly as strong as their width might suggest.

However, some enterprising geneticists chose a poplar (the western balsa wood poplar; sounds strong, right?) rather than some more appropriate climbing tree as the first tree to have its genome sequenced a few years back. This, of course, opens the poplar tree up to proteomics!

In this new paper (ASAP at JPR) from Phil Loziuk et al., and linked to some guy named Muddiman, this team does a disturbingly thorough job of proteomic characterization of this tree.  They first section the internal areas of the tree into whatever passes as tree organs and then use multistage fractionation and optimized FASP to end up with nearly 10,000 unique protein groups identified on a Q Exactive. The goal of the study was to hunt down transcription factors involved in cellulose production, which is never easy to do thanks to their low copy numbers.  But when you get plant proteins down to the 10K unique level, you are going to be able to find just about anything, including transcription factors.

They pick the most interesting proteins by tree organ and develop an absolute quantification method that can be used routinely to assay the levels of the proteins most deeply involved in cellulose production.
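The details of their assay aren't spelled out here, so this is an assumption: absolute quantification assays like this are commonly built on spiked-in, stable-isotope-labeled (heavy) versions of the target peptides. If that's the setup, the core arithmetic is just the light/heavy peak area ratio times the known amount of heavy standard. A minimal Python sketch, with hypothetical function names and numbers:

```python
def endogenous_fmol(light_area: float, heavy_area: float, heavy_spike_fmol: float) -> float:
    """Amount of endogenous (light) peptide, from its peak area ratio to a
    known amount of co-eluting heavy-labeled standard."""
    if heavy_area <= 0:
        raise ValueError("heavy standard not detected")
    return light_area / heavy_area * heavy_spike_fmol


def fmol_per_ug(light_area: float, heavy_area: float,
                heavy_spike_fmol: float, protein_loaded_ug: float) -> float:
    """Normalize to the total protein injected so different tissues can be compared."""
    return endogenous_fmol(light_area, heavy_area, heavy_spike_fmol) / protein_loaded_ug


# Toy numbers: light peak twice the heavy peak, 50 fmol heavy spiked, 2 ug protein loaded.
print(endogenous_fmol(2.4e6, 1.2e6, 50.0))   # 100.0
print(fmol_per_ug(2.4e6, 1.2e6, 50.0, 2.0))  # 50.0
```

This leans on the light and heavy peptides behaving identically in the LC and the mass spec, which is the whole point of using a heavy-labeled standard.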

I like this paper because it's such a good story: "We set out to understand more about tree growth because it matters to lumber industry scientists... and here is a nice assay you can use." It really highlights how we can sit down with a scientist who has a unique problem and apply our existing tools all the way to a solution.

Oh...and the sample prep/fractionation method is pretty interesting as well!

Tuesday, September 1, 2015

Experimental Null Method!


This one is really interesting, and it's an idea I like more the more I think about it. The authors capture the idea really well in the first figure.

In general, the idea is this: it's hard to find biomarkers because we see hundreds of thousands or millions of things in a standard peptide ID experiment. So we eliminate a ton of stuff from contention by assessing the variability of our entire system (sample prep, LC, mass spec, data processing) through comparing two control groups to one another. This gives us a baseline. Then the things that look weird in our experimental sample can be considered to have some validity if they exceed the total variation observed within our experiment.
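I don't know exactly how the Qu lab implements this, but here's my simplified Python reading of the idea: the largest control-vs-control log2 ratio sets the "null" variability limit, and only features whose treated-vs-control ratio exceeds that limit make the cut. Using the maximum (rather than, say, a percentile) and comparing against a single control group are my assumptions, and the protein names and intensities are made up.

```python
from __future__ import annotations

import math


def log2_ratios(numerator: dict[str, float], denominator: dict[str, float]) -> dict[str, float]:
    """log2(numerator / denominator) for features quantified (> 0) in both."""
    return {
        key: math.log2(numerator[key] / denominator[key])
        for key in numerator
        if key in denominator and numerator[key] > 0 and denominator[key] > 0
    }


def experimental_null_hits(control_a, control_b, treated):
    """Return (null limit, features whose treated/control_a change exceeds it)."""
    null = log2_ratios(control_a, control_b)
    limit = max((abs(v) for v in null.values()), default=0.0)
    observed = log2_ratios(treated, control_a)
    return limit, {k: v for k, v in observed.items() if abs(v) > limit}


control_a = {"protein_1": 100.0, "protein_2": 200.0, "protein_3": 50.0}
control_b = {"protein_1": 110.0, "protein_2": 180.0, "protein_3": 55.0}
treated = {"protein_1": 400.0, "protein_2": 210.0, "protein_3": 52.0}

limit, hits = experimental_null_hits(control_a, control_b, treated)
print(f"null limit = {limit:.3f} log2 units; hits: {hits}")
```

Here only protein_1 (a 4-fold change, log2 = 2) clears the roughly 0.15 log2 null limit; protein_2 and protein_3 move less than the two controls differ from each other.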

Now, the big question in my mind is how do I easily do this and automate it? 'Cause the Qu lab has a pretty great bioinformatician or three... and I don't. But I'm gonna give this one a whirl later with some commercial tools!

Shoutout to @pitman_mark for the heads up on this cool paper!