Wednesday, December 23, 2015

What is a PFAM? And how do they deal with all this data?


Personally, I think the biologists and biochemists need to hurry up and annotate the function of every protein from every organism under every biological condition. Until they stop slacking and get that stuff done, we need to use some shortcuts to extract biological data from our peptide spectral matches. Fortunately, smart people have been working on this gap for us.

Gene Ontology (GO) is tricky stuff. If we don't know exactly what a gene does, can we infer what the heck it does from its similarity to genes we understand better?

More tricky, and way more biologically relevant? Protein Ontology (PO?)!  One way of getting this data is via PFAM (which you can access here).  I'll be honest. I didn't really know what this was for a long time. I just knew that it was an option in the Annotation node in Proteome Discoverer.  Cool, I have a new column that says that all this stuff that is upregulated shares a PFAM ID (actually, I made that part up. It's never that easy, is it?)

Turns out that the people making PFAM are working really hard making this data:
1) More accurate
2) More relevant
3) More current

As you can imagine, all of this is hard, but...

(holy cow)...

Can you imagine what the 3rd one is like these days?

The amount of sequencing information in databases is increasing EXPONENTIALLY and the current tools for creating PFAM information scale at only a linear rate. It doesn't take a stolen GoogleImage to show that this is a problem, but...I'm nervously waiting for an important phone call...so...
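If you'd rather see the mismatch in numbers than in a stolen picture, here is a minimal Python sketch. Everything in it is an assumption made up for illustration (the starting size, the two-year doubling time, the one-unit-per-year annotation capacity, and the function names); none of it comes from the paper. It only shows how a linear process falls behind an exponential one.

```python
# Toy comparison of exponential data growth versus linear annotation capacity.
# All numbers below are hypothetical and chosen only to illustrate the shape.

def exponential_growth(start, doubling_years, years):
    """Size of something that doubles every `doubling_years` years."""
    return start * 2 ** (years / doubling_years)


def linear_growth(start, per_year, years):
    """Size of something that grows by a fixed amount each year."""
    return start + per_year * years


def backlog_by_year(start=1.0, doubling_years=2.0, per_year=1.0, horizon=10):
    """Return (year, sequences, capacity, gap) tuples in arbitrary units."""
    rows = []
    for year in range(horizon + 1):
        sequences = exponential_growth(start, doubling_years, year)
        capacity = linear_growth(start, per_year, year)
        rows.append((year, sequences, capacity, sequences - capacity))
    return rows


if __name__ == "__main__":
    for year, sequences, capacity, gap in backlog_by_year():
        print(f"year {year:2d}: sequences={sequences:6.1f} "
              f"capacity={capacity:5.1f} gap={gap:6.1f}")
```

With these made-up settings the linear side is actually ahead for the first several years, and then the exponential side runs away from it. That is the whole problem in one loop.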


So, what do we do about it? Well, Robert Finn et al. say in this new open access paper that we fix the algorithms to deal with this glut of data. So they did.

When I clicked on this link in Twitter this morning, I honestly expected a dense paper that I would hardly be able to read and would likely not understand at all. I was pleasantly surprised to find that this team can seriously write and that I not only learned a lot about how PFAM works, but I also (think I) got a good understanding of their challenges and how their new algorithms power through in dealing with them. Solid and interesting paper that makes me want to add this column to all of my processed data from now on!



Tuesday, December 22, 2015

Updated guide to connecting your NanoLC-MS!


Got a Thermo nanoLC? Wanna connect it to a Thermo mass spec? Want every frickin' part number and easy-to-follow diagrams?

TAAADAAA!!! This link will lead you to a new and updated version of the nanoLC connection guide. It is at PlanetOrbitrap so you might need to log in and then re-click the link to get directly in.

Monday, December 21, 2015

PTMs in centromeres!


I had to dig deep in my brain and then finally just look at Google Images to remember what a centromere is and why it's important.  Hopefully the nice sketch I found above clarifies it for you as well. 'Cause it's the region of the chromosome (DNA plus the proteins bound to it) that holds the sister chromatids together. It's gonna be deeply involved in cell/chromosome division, sexual reproduction and probably all sorts of other things.

In this new paper from Aaron Bailey et al., in press (and currently open access) at MCP, this group looked at the post-translational modifications that can show up on these important proteins.

They started with a HeLa cell line that had a stable affinity tag at some centromere and then immunoprecipitated to get at their proteins of interest. Chemicals were used to arrest the cells in certain stages of mitosis or something. Multiple enzymes, including LysC and AspN, were used to get big chunks of the cleaned-up protein for effective PTM identification and localization.
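To see why swapping enzymes gets you bigger chunks, here is a small in-silico digestion sketch in Python. It is only an illustration, not the authors' workflow: the sequence is made up, the `digest` helper is hypothetical, and the cleavage rules are simplified (trypsin cuts after K or R, LysC cuts after K, AspN cuts before D, with no missed cleavages and no proline exception).

```python
# Simplified in-silico digestion. The sequence below is invented for the
# example and is not a real centromere protein.

def digest(sequence, cut_after="", cut_before=""):
    """Split `sequence` after any residue in `cut_after` and
    before any residue in `cut_before`."""
    peptides = []
    start = 0
    for i in range(1, len(sequence)):
        # i is the boundary between sequence[i - 1] and sequence[i]
        if sequence[i - 1] in cut_after or sequence[i] in cut_before:
            peptides.append(sequence[start:i])
            start = i
    peptides.append(sequence[start:])
    return peptides


TOY_SEQUENCE = "MARKTDQRAKSDGRLK"  # hypothetical, lysine/arginine rich

trypsin_peptides = digest(TOY_SEQUENCE, cut_after="KR")
lysc_peptides = digest(TOY_SEQUENCE, cut_after="K")
aspn_peptides = digest(TOY_SEQUENCE, cut_before="D")

if __name__ == "__main__":
    for name, peptides in [("trypsin", trypsin_peptides),
                           ("LysC", lysc_peptides),
                           ("AspN", aspn_peptides)]:
        print(f"{name:8s} {peptides}")
```

On a K/R-rich stretch like this toy one, the trypsin-style rule chops everything into tiny pieces, while the LysC and AspN rules leave longer peptides. Longer peptides can carry several neighboring modification sites on one piece, which helps with localization.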

What did they find?


Sunday, December 20, 2015

BetterExplained -- a great site for math concepts


I seem to have forgotten what little I ever knew about math. This site, BetterExplained, uses clever examples to either teach or remind you of what a math concept is.

Wednesday, December 16, 2015

Open Genomics Engine


Sorry, this is something I just stumbled on that I didn't want to forget about!  I lost the password to my EverNote account...but it does look super cool, right? If you're into that weird DRNA sequencing stuff, that is...

Monday, December 14, 2015

Use protein solubility to get around protein abundance issues in biofluids?


For biofluids, one of the biggest problems is the high abundance junk. "Junk" probably isn't the right word since evolution probably wouldn't have erred toward filling our fluids with albumin if it wasn't important, but...you know what I mean....

In an interesting take on this problem, Bollineni et al. tried a protein solubility approach. Rather than specifically depleting the most abundant proteins using an immuno-affinity approach, they used different concentrations of ammonium sulfate to precipitate or solubilize different populations of plasma proteins. This gave them a less directly biased way of fractionating out the high abundance things.

To my friends out there who are in the "do not deplete!" camp, sure, you're probably going to run into the same problems, like the fraction that has albumin pulling down tons of interesting things with it. But for people who will accept this loss in order to see the stuff that isn't at 1e9 copies per uL, this might be a simple approach to see something different than what your Top 4, 10, or 14 depletion column is giving you.


Sunday, December 13, 2015

proBAMsuite! Great new proteogenomics tools!


Man, I love a software package with a catchy title. And I love a free software package that has a ton of promise!  proBAMsuite has all of these things!

It is a set of R tools that are meant to help you integrate the data from your next gen sequencing files with your LC-MS/MS spectra. This is an overview of the steps involved.


Of course, the process isn't trivial. The RNAseq data needs to be lined up and QC'ed, and so do the MS/MS spectra, the PSMs and the peptide matches. When we're looking at millions of measurements the number of false discoveries has to go up, just mathematically, never mind the fact that not every MS/MS spectrum or next gen read is as good as the others.

To keep the false discoveries in check, the capabilities are in place to control the FDR at the PSM and peptide level. Even cooler, maybe, is this idea:  The decoy matches are kept and allowed to be mapped against the total genomics data, so you can get a good idea of the FDR at the complete, reassembled level!  Total system FDR.
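For anyone who hasn't looked under the hood of a target-decoy FDR estimate, here is a minimal Python sketch of the general idea. To be clear about the assumptions: proBAMsuite itself is a set of R tools and this is not its code or its API. The function names, peptide strings and scores are all hypothetical, and the estimate used here (decoy hits divided by target hits above a score cutoff) is just one common convention.

```python
# Generic target-decoy FDR sketch at the PSM and peptide level.
# All peptides and scores are invented for illustration.

def estimate_fdr(hits, threshold):
    """hits: iterable of (score, is_decoy). FDR is estimated as
    decoys / targets among hits scoring at or above `threshold`."""
    targets = sum(1 for score, is_decoy in hits
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def best_hit_per_peptide(psms):
    """Collapse PSMs (peptide, score, is_decoy) to the single
    best-scoring hit for each peptide sequence."""
    best = {}
    for peptide, score, is_decoy in psms:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, is_decoy)
    return list(best.values())


toy_psms = [
    ("PEPTIDEA", 90, False),
    ("PEPTIDEA", 85, False),
    ("PEPTIDEB", 80, False),
    ("EDITPEPA", 75, True),   # decoy hit
    ("PEPTIDEC", 70, False),
    ("PEPTIDEC", 65, False),
    ("EDITPEPB", 60, True),   # decoy hit
]

psm_hits = [(score, is_decoy) for _, score, is_decoy in toy_psms]
peptide_hits = best_hit_per_peptide(toy_psms)

psm_fdr = estimate_fdr(psm_hits, threshold=60)
peptide_fdr = estimate_fdr(peptide_hits, threshold=60)

if __name__ == "__main__":
    print(f"PSM-level FDR at score >= 60:     {psm_fdr:.2f}")
    print(f"peptide-level FDR at score >= 60: {peptide_fdr:.2f}")
```

In this toy set the same cutoff gives a higher estimate at the peptide level than at the PSM level, because the redundant target PSMs collapse into fewer peptides while each decoy stays its own peptide. That is one reason it matters which level you control the FDR at, and why carrying the decoys all the way through to the genome mapping is a neat idea.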

Why would we go to all this trouble?

1) How 'bout more data about your protein than you'd maybe even want? Check out the suite's sweet output!



And, of course, more explanations for what those weird MS/MS spectra are!

Open access pre-release of paper here!