Sunday, December 13, 2015

proBAMsuite! Great new proteogenomics tools!


Man, I love a software package with a catchy title. And I love a free software package that has a ton of promise!  proBAMsuite has all of these things!

It is a set of R tools meant to help you integrate the data from your next-gen sequencing files with your LC-MS/MS spectra. This is an overview of the steps involved.


Of course, the process isn't trivial. The RNAseq data needs to be lined up and QC'ed, and so do the MS/MS spectra, the PSMs, and the peptide matches. When we're looking at millions of measurements, the number of false discoveries has to go up, just mathematically, never mind the fact that not every MS/MS spectrum or next-gen read is as good as the others.

To keep the false discoveries in check, the suite can control the FDR at the PSM and peptide level. Even cooler, maybe, is this idea:  The decoy matches are kept and allowed to be mapped against the total genomics data, so you can get a good idea of the FDR at the complete, reassembled level!  Total system FDR.
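If the target-decoy idea is new to you, here is a minimal sketch of how an FDR estimate falls out of decoy counts. To be clear, this is illustrative plain Python, not proBAMsuite's R code: the function names, the (score, is_decoy) input, and the simple decoys/targets estimate are all assumptions made for the example.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) pairs; higher score means a better match.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: call it 100% false if only decoys passed, else 0.
        return 1.0 if decoys else 0.0
    return decoys / targets


def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr (or None)."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Made-up scores, purely for illustration.
example_psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.9, False),
]
```

The same counting works at any level (PSM, peptide, or reassembled gene model), as long as the decoys are carried along to that level.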

Why would we go to all this trouble?

1) How bout more data about your protein than you'd maybe even want? Check out the suite's sweet output!



And, of course, more explanations for what those weird MS/MS spectra are!

Open access pre-release of paper here!

Saturday, December 12, 2015

The second version of the OpenMS LFQ nodes is available! Now for PD 2.1!


The label free quan nodes from OpenMS I keep going on about?  Version 2 is now available!  More stable, faster, and works in Proteome Discoverer 2.1.

You can get them here. Once this PC stops looking like this:


I'll install 'em and give 'em a good hard run!

Keep this good code coming, people!

Thursday, December 10, 2015

Find unidentified differentially regulated reporter biomarkers in reporter ion datasets!


I feel kind of smart for this one, though I'm afraid I'm getting to the point where I really really should get an indoor hobby of some kind, since this is most of what I did last weekend. What do you guys do when it's too cold to rock climb but you can't snowboard yet?

Anyway. I have access to an amazingly cool set of TMT/iTRAQ samples. I have access because there is a distinct and observable phenotype. Not a little one, either. The hundreds of samples in group 1 and group 2 are extremely different. Proteomics, so far, has shown just about nothing different between the two. Weird, right?  For years we've been suspecting a novel mutational system or PTM that we've just never seen before, but we've not been able to find a way to hunt it down.

So, here was the thought that killed this last weekend: What if I completely ignored the IDs? What if I only looked at the spectra that showed a significant difference at the reporter ion level?  And then I tried to figure out what they were later?

In PD 2.1 + Quan you can do this. There is a tab in your report that is your "Quan spectra".

You can actually go to that and look at every MS/MS spectrum. You can see the RAW reporter values and you can even see your quantification spectra zoomed in.

So, you can actually go through and see all the stuff that is different. See the reporter ions above? This is exactly the trend I should be seeing in this sample set based on the phenotype. Exactly. And this MS/MS spectrum is the most differentially regulated observation in this entire sample set of 1M or so MS/MS spectra. And this PSM shows up just like this three times in different, overlapping fractions. I think the precursor intensity for this is 1e6-5e6. More importantly, since in PD 2.1 we can plot our reporter ion intensities by their SIGNAL TO NOISE (yay!!!!!!), the S/N of these reporter ions is >500!!!
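For anyone who wants to try the "ignore the IDs, rank by reporter ions" idea outside of the PD interface, here is a rough sketch in Python. It assumes you have somehow gotten the quan spectra out as a list of records; the field names ("scan", "identified", "reporters"), the channel labels, the S/N floor, and both cutoffs are hypothetical choices for illustration, not anything PD defines.

```python
import math


def log2_fold_change(reporters, group_a, group_b, floor=1.0):
    """log2 of (mean reporter S/N in group_b) / (mean reporter S/N in group_a).

    Means are floored at `floor` so an empty channel cannot cause a divide-by-zero.
    """
    mean_a = max(sum(reporters[ch] for ch in group_a) / len(group_a), floor)
    mean_b = max(sum(reporters[ch] for ch in group_b) / len(group_b), floor)
    return math.log2(mean_b / mean_a)


def interesting_spectra(spectra, group_a, group_b, min_mean_sn=100.0, min_abs_log2fc=2.0):
    """Return (scan, log2fc) for unidentified spectra with strong, clean reporter differences."""
    hits = []
    channels = list(group_a) + list(group_b)
    for spec in spectra:
        if spec["identified"]:
            continue  # only chase the spectra nobody could put a name on
        reporters = spec["reporters"]
        mean_sn = sum(reporters[ch] for ch in channels) / len(channels)
        if mean_sn < min_mean_sn:
            continue  # too noisy to trust the ratio
        fc = log2_fold_change(reporters, group_a, group_b)
        if abs(fc) >= min_abs_log2fc:
            hits.append((spec["scan"], fc))
    # Biggest differences first.
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Made-up records, purely for illustration.
example_spectra = [
    {"scan": 1201, "identified": True,
     "reporters": {"126": 510, "127": 530, "128": 495, "129": 505}},
    {"scan": 4410, "identified": False,
     "reporters": {"126": 120, "127": 110, "128": 980, "129": 1010}},
    {"scan": 7733, "identified": False,
     "reporters": {"126": 4, "127": 3, "128": 30, "129": 28}},
]
example_hits = interesting_spectra(example_spectra, ["126", "127"], ["128", "129"])
```

The S/N filter uses the mean across all channels rather than the minimum on purpose: the most interesting spectra are exactly the ones where one group's channels are nearly empty.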

In sum, this is the perfect biomarker for this experiment and maybe the thing we've been trying to find in one form or another for 5 years (Holy cow, I don't think I'm exaggerating. It's 2015?!?!).  Not to get my hopes up too high or anything....

Where it gets difficult, however, is linking that back to the full fragmentation spectra.

For example, check this out, and I'd LOVE it if you guys had advice. I'm putting in a feature request and will be bugging the great people at PD.Support but I'll take any ideas I can get.


Anything from the Protein/Peptide/PSM and MS/MS spectrum tabs can be checked and exported to .dta, .mgf, or whatever. Then I can do big DeltaM searches in Byonic or DeNovoGUI it or PEAKS it.

But I've got to go through one at a time and find the MS/MS spectrum info to export. Kinda looks like next weekend's gonna be a wash if I can't find a shortcut ('cause I have about 200 interesting things to look at now that I have NO idea what the fudge they are!)
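One possible shortcut, sketched here under some big assumptions: if the spectra can be pulled into memory by some other route (a converted peak list, for example), picking out a list of scan numbers and writing them as MGF text is only a few lines of Python. The record layout and function names below are hypothetical; the BEGIN IONS / TITLE / PEPMASS / CHARGE / SCANS / END IONS layout is the usual Mascot Generic Format.

```python
def select_scans(spectra, wanted_scans):
    """Keep only the spectra whose scan number is in wanted_scans."""
    wanted = set(wanted_scans)
    return [spec for spec in spectra if spec["scan"] in wanted]


def to_mgf(spectra):
    """Render spectra as MGF text: one BEGIN IONS ... END IONS block per spectrum."""
    blocks = []
    for spec in spectra:
        lines = [
            "BEGIN IONS",
            f"TITLE=scan={spec['scan']}",
            f"PEPMASS={spec['precursor_mz']:.4f}",
            f"CHARGE={spec['charge']}+",
            f"SCANS={spec['scan']}",
        ]
        # One "m/z intensity" pair per line.
        lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in spec["peaks"]]
        lines.append("END IONS")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Made-up spectra, purely for illustration.
example_spectra = [
    {"scan": 4410, "precursor_mz": 652.3312, "charge": 2,
     "peaks": [(126.1277, 120.0), (175.1190, 3400.5)]},
    {"scan": 4411, "precursor_mz": 498.7710, "charge": 3,
     "peaks": [(129.1315, 88.0)]},
    {"scan": 9002, "precursor_mz": 801.4021, "charge": 2,
     "peaks": [(128.1344, 980.0), (230.1702, 1500.2)]},
]
mgf_text = to_mgf(select_scans(example_spectra, [4410, 9002]))
```

Writing `mgf_text` to a file would give one .mgf with just the interesting scans, ready for a wide-tolerance or de novo search.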

I suspect I'm looking at a PTM but I don't have anything to match any of our normal suspects. Or...I'm looking at unique class-switch sequences in the variable regions of antibodies!  Either way, there are biomarkers in this dataset that traditional peptide searching cannot identify and the dataset is just too big for Byonic WildCard, but here I've vastly reduced (computationally, at least...) the complexity of this problem!  Will I find my biomarkers this way? Who knows, but on some of these hard datasets we need every lead we can get, right?

Again, if you have any advice or thoughts on how I might simplify this, I'd love to hear it!!!

Tuesday, December 8, 2015

Full optimization of a QE HF for TMT quan!

Got a QE HF and wondering how you can best optimize that speedy monster for the best possible TMT 10plex quan? Well, you don't have to do the experiment yourself, cause my buddy Tabiwang (et al.,) already did that for you.

You can check out a description of the method on Accelerating Science here.

And this will directly link you to the poster describing the optimized parameters.

Rumor is that an extensive application note may be in development.

Playing with the OpenMS Proteome Discoverer community nodes!


Hot diggity dog!

I got some samples and got to work playing with the OpenMS PD Community nodes for PD 2.0, which you can get here!  BTW, new and improved nodes are coming for PD 2.1!!!

Here is the processing setup. The LFQ nodes require Sequest and Percolator for now. I looked at my samples and picked a good retention time that made sense. The peaks were real nice so I used a typical retention time of 60 seconds.  I would have used a smaller window with other LFQ software, but this stuff is fast enough that I didn't really care.

Note:  In Spectrum selector "MS Order" MUST say "Any" or it won't work.


These are the settings I used for Consensus. It appears that you only need the two nodes on the right, but I don't see any problems when I use the other nodes. The data may not fully integrate, but it doesn't hurt the output. There is a dramatic difference in speed on my PC when I change the number of cores that the Profiler is allowed to use. If I give it 8 cores this thing is faaaaaassssttttt!!!!

Okay. Boring part over!  How's the data look?


Well, you get these sweet new tabs!  Quantified proteins/ quantified peptides and EVEN BETTER?!?!? Quantified features!!! You get quantification even if you didn't identify stuff.
"Hey, what's that thing that's upregulated 27-fold in the tumor?"  Well, sir/madam, that is your biomarker. Figure out what the heck it is now.

Okay. Sorry for all the scribbles. This isn't my data. This anonymous protein is present in all 12 samples analyzed. The files I put in are labeled in order of their "F" value in PD.  My first file is "Abundance number 1".

I can go into quantified peptides and/or features to see the individual quan values, or I can pop over to the PSM tab and see how the original MS1 intensities look.

Okay. But this is the real test. How do the values compare to the RAW intensity values and XIC areas?


 Really really well. Definitely try out this software.
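If you want to put a number on "really really well" for your own files, one simple option is a Pearson correlation between the log10 LFQ abundances and the log10 XIC areas for the same features. This is just a sketch: the function names are made up here, and it assumes you have already paired up the two sets of values yourself.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log10_correlation(lfq_abundances, xic_areas):
    """Correlate the two measurements on a log10 scale (all values must be > 0)."""
    log_lfq = [math.log10(v) for v in lfq_abundances]
    log_xic = [math.log10(v) for v in xic_areas]
    return pearson(log_lfq, log_xic)
```

Working in log space keeps a handful of very intense features from dominating the comparison.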


Saturday, December 5, 2015

Confirmation of NIA standards for Alzheimer's disease via protein biomarkers


So, I've read 2 biology sciency things today. In both cases, the scientific method was at work (YAY!). Researchers were looking at published results and in the first case (the tardigrade genome I mentioned earlier in the week) striking problems were found with the data.

The second paper is more positive for the studies pre-dating it!  In this paper from Huded et al., some researchers in India decided to test the National Institute on Aging's criteria for Alzheimer's diagnosis and progression.  This requires removal of cerebrospinal fluid (CSF) and testing for a number of known protein biomarkers. Quantification reveals presence and severity of the disease.

Now, there hasn't just been one test on these biomarkers, there have been tons of them. So it would be super weird if they didn't check out properly (or you could blame it on the ELISAs they were using). But, hey! Sometimes you want to verify it yourself, especially if it requires extracting fluid from someone's nervous system!!  On diseases that are this nefarious, every data point is going to help. Let's get early detection and drugs on the market, STAT!

Friday, December 4, 2015

A team at UVA decided to rewrite the textbook on antibody profiling.


This is such a great paper!  AND it's Open Access. Several people who occasionally read my ramblings here and need to see this right now are about to get this link emailed directly to them! You're welcome!

The paper is from Lichao Zhang et al., and some guy named Don Hunt was apparently involved which might explain some things about it.

When I visit people who profile antibodies, they are doing 2 things. First they are getting intact masses on the antibody. In big facilities, maybe it's a whole group of people figuring out what intact protein masses are there.  The second thing is digesting with trypsin and peptide mapping. Between the two groups they pretty much figure out what they're looking at. Groups that use multiple enzymes get better coverage, but you're looking at a ton of runs.

This approach? Kind of a lower-middle down approach with just enough awesome tweaks to maybe get the whole antibody figured out in one shot!

They start with the whole antibody and then they reduce and alkylate it (more on that in a minute). Then they run it through or over an immobilized enzyme I've never ever heard of, aspergillopepsin I, which instantly cuts the antibody into pieces of around 3-9 kDa.  See? Lower-middle-down! What else would you call it?

What else would you call peptides that are 3-9 kDa? Perfect for ETD!  In this case they used an LTQ Orbitrap Velos with ETD. And these perfectly-sized fragments give off amazing levels of coverage. They process everything with ProSightPC BioMarker search functions.

Okay. Neat, right?  But it gets better.

The digestion occurs in a bioreactor. The antibody goes in and comes out digested...and the reaction quenched. Want bigger fragments? Increase the flow rate. Smaller? Decrease it.

One last thing. They alkylate the cysteines with a new reagent. It's called NAEM.


Not only does it alkylate in 10 minutes, but it also puts a positive charge on the cysteines which aids in fragmentation.

How's it work out? Absolutely ridiculous levels of coverage of these huge and hugely important proteins and their PTMs in record instrument time!