Saturday, September 12, 2015

pVIEW: tons of tools, including 15N (n15) quantification


I swear I wrote a blog post on this years ago. Seriously. But it took me forever to find this software again and then re-remember how to use it.

pVIEW is a really nice piece of software. It does a ton of different things...including 15N quantification!
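For anyone wondering what 15N quantification has to work with, here is a minimal sketch of the mass shift between a light peptide and its fully 15N-labeled partner. This is not how pVIEW does it internally (I don't know its algorithm); it just assumes complete 15N incorporation and an unmodified peptide. The function names are my own.

```python
# Nitrogen atoms per amino acid residue (backbone N plus any side-chain N).
NITROGENS = {
    "A": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "I": 1, "L": 1,
    "M": 1, "P": 1, "S": 1, "T": 1, "V": 1, "Y": 1,
    "K": 2, "N": 2, "Q": 2, "W": 2, "H": 3, "R": 4,
}

# Mass difference between one 15N and one 14N atom, in daltons.
DELTA_15N = 15.0001088982 - 14.0030740048


def count_nitrogens(peptide):
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(NITROGENS[aa] for aa in peptide)


def n15_mz_shift(peptide, charge):
    """m/z shift between the light and fully 15N-labeled peptide (assumes 100% labeling)."""
    return count_nitrogens(peptide) * DELTA_15N / charge


if __name__ == "__main__":
    # Toy peptide, purely for illustration.
    print(count_nitrogens("PEPTIDER"), round(n15_mz_shift("PEPTIDER", 2), 4))
```

The point is that the shift depends on the sequence, which is why 15N pairs are harder to find than the fixed offsets you get with SILAC.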

It is incredibly user-friendly. If you are using a Thermo Instrument, I highly recommend you download this tool as well:

What's it do? Well, you click on it and show it a directory. And then, without complaining and without any extra steps, it converts your data rapidly and perfectly to mzXML (not sure I capitalized the right stuff). Then you can pull your data right into pVIEW.

pVIEW can be downloaded at the Princeton Proteomics and Mass Spec core website here.


Thursday, September 10, 2015

Run complete programs on any system with BioDocker! Wait...what's a BioDocker?


Okay. I'm going to be pretty excited about this whole thing, because I knew about exactly none of this 20 minutes ago.

It is totally awesome that we have all these talented programmers and bioinformaticians out there writing interesting new code. A problem is that, just like any expert in anything, they start talking their expert-ese and it becomes hard for outsiders to figure out what they are talking about. I take proteomics terms and such for granted all the time, even though I try very hard not to.

This is an acknowledged problem in their field: they can't reach users because sometimes users don't know what a Perl thingy is. Even worse, maybe someone assumes that you have that Perl thing on your PC because they've had it on every PC they've owned in the last 15 years.

An awesome effort is underway and it's called Docker. It's generic for everybody, but what I can understand of it is that it's a "container" for a program that includes all the requirements for running it. Say you need that Perl thing and some Perl add-in things; then they would be included in the Docker container.

A more focused thing for us is BioDocker. Same goal, but specifically for bioinformatics type stuff.  Sounds great, right?!?

BTW, I'm learning all this from Yasset's blog.
Because you know what? They've already constructed two awesome proteomics BioDockers. The first is the all-powerful Trans-Proteomic Pipeline and the second is DIA-Umpire!

Is it simple enough that a dummy like me can use it? Actually...I think it might be...not without challenges, but it's getting there!

If it isn't 100% what we need/want right now, it's a great step in the right direction. Let's get all these awesome tools and put them into an easily digestible format. They get more users, which hopefully translates into more grant justifications and more cool algorithms, and we get better data! Win win win!

Video of Dr. Makarov talking about every Orbitrap!


I just stumbled on this video and it's pretty sweet. It's Alexander Makarov talking about every Orbitrap -- from the classic all the way to the Lumos and the developments each one went through! Worth the half-hour for me!

Wednesday, September 9, 2015

Macrophage S1P Chemosensing -- and an interesting way of integrating genomics and proteomics!


All this next gen sequencing data out there! How do we leverage all of it to our advantage? We can supplement our databases for mutations and we can cross-reference our quan. In this new paper at MCP, Nathan Manes et al., out of the CNPU, describe a different twist on integrating next gen sequencing data with LC-MS/MS.

The model is also super interesting. The study investigates osteoclasts, the cells that destroy bone. During normal maintenance osteoclasts break down bone where appropriate and osteoblasts rebuild it. This is a tightly controlled process (involving chemotaxis), but one that is only partially understood. Dysregulation of this tight process leads to many different diseases, the most common of which is osteoporosis.

The focus of this study is the use of next gen sequencing technology and mass spec to explore that pathway. As a model they have some mouse cells that function like osteoclasts, and they can add the right chemotactic signals to activate them. Cool, right?!?

First they started out with the next gen sequencing, following all the normal protocols (they did deep sequencing via HiSeq), to get a list of transcripts that were differentially regulated in a significant manner.

Then they went a different direction. They used an in-depth literature search to hunt down proteins that have been implicated in these pathways. Some of this info comes from other quantitative proteomics studies and some comes from genomics techniques. Why reproduce data that is already out there for free? Strong protein candidates were filtered, and heavy copies of good peptides were made to develop an absolute quantification SRM method for these targets.
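If the heavy peptide trick is new to you, the arithmetic behind it is simple. Here is a minimal sketch, assuming you spike in a known amount of the heavy peptide and that light and heavy respond identically in the instrument. The function names and the numbers are made up for illustration and are not taken from the paper.

```python
from statistics import median


def absolute_amount_fmol(light_area, heavy_area, heavy_spiked_fmol):
    """Estimate the endogenous (light) peptide amount from one SRM transition."""
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    # Light/heavy area ratio scaled by the known amount of heavy standard.
    return light_area / heavy_area * heavy_spiked_fmol


def peptide_amount_fmol(transitions, heavy_spiked_fmol):
    """Combine several (light_area, heavy_area) transition pairs using the median estimate."""
    estimates = [
        absolute_amount_fmol(light, heavy, heavy_spiked_fmol)
        for light, heavy in transitions
    ]
    return median(estimates)


if __name__ == "__main__":
    # Hypothetical peak areas for three transitions of one peptide, 50 fmol heavy spiked in.
    example = [(2.0e6, 1.0e6), (4.2e5, 2.0e5), (9.0e4, 5.0e4)]
    print(peptide_amount_fmol(example, 50.0))
```

Taking the median across transitions is just one reasonable choice; it keeps a single interfered transition from dragging the number around.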

To wrap it all up, they took the results from their next-gen sequencing and from their absolute LC-MS quan and compared them (they compare strikingly well!). Then they dumped it all into a cool modeling program they developed called Simmune, which you can check out (and download for free) here!

Great, interesting study on an interesting model that uses some really original thinking and tools.

Monday, September 7, 2015

File migration...


Hey guys! The following pages linked from the blog are currently down but are finding their way to new, awesome, permanent homes:

The Orbitrap Methods database is down completely.
The Exactive family cycle time calculators are still available (email me: orsburn@vt.edu and I'll get them to you). My PPT tutorials are also down.
They might be down for a few days. The migration isn't a simple drag and drop, but these new solutions should allow the documents to be accessible to more people...permanently...and will be free for me!

All videos are still up...and also migrating to duplicate locations..w00t!


ProtAnnot -- Highlight sequence variants that might explain your weird masses


In higher organisms, proteins commonly have a ton of different forms. Splicing events are very happy to take a protein that has multiple functions and cleave out one of them to make a more specific protein. Of course...these cleavages occur at the genetic level and don't follow the same rules as trypsin. To detect these events with proteomics you have two choices -- the first is top-down and the second is shotgun proteomics with a database that knows about the alternative sequences.
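To see why the database has to know about the alternative sequences, here is a rough sketch that digests a canonical protein and an exon-skipped variant in silico and pulls out the peptides unique to each. The sequences are toy ones I made up, the digestion rule is the simple "cleave after K or R unless followed by P" with no missed cleavages, and none of this is ProtAnnot code.

```python
def tryptic_peptides(sequence, min_length=6):
    """Fully tryptic peptides: cleave after K/R unless the next residue is P."""
    peptides = set()
    start = 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if len(peptide) >= min_length:
                peptides.add(peptide)
            start = i + 1
    return peptides


def isoform_specific_peptides(canonical, variant, min_length=6):
    """Return (peptides only in canonical, peptides only in variant)."""
    a = tryptic_peptides(canonical, min_length)
    b = tryptic_peptides(variant, min_length)
    return a - b, b - a


# Toy exons (hypothetical); the variant skips the middle one.
EXON1, EXON2, EXON3 = "MAGTLLEKSDF", "VNPQRAAWELTGHKLL", "SPEDIVRGGYTWK"
CANONICAL = EXON1 + EXON2 + EXON3
VARIANT = EXON1 + EXON3

if __name__ == "__main__":
    only_canonical, only_variant = isoform_specific_peptides(CANONICAL, VARIANT)
    print(sorted(only_canonical))
    print(sorted(only_variant))
```

The only peptide that proves the variant exists is the one spanning the new exon junction, and a search engine will never match it unless that junction sequence is in your FASTA.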

ProtAnnot is a new tool, described in this open access paper by Tarun Mall et al., that is an add-in for the Integrated Genome Browser (IGB). It highlights your alternative proteoforms within a sequence. I especially like the trick it does with data processing: if you choose to use ProtAnnot, it automatically fires up an extra thread on your server to do its computations, so your normal session of IGB isn't interrupted in any way.

If you just can't get the masses of your protein to line up or get that last bit of sequence coverage, this tool might be exactly what you need.

Sunday, September 6, 2015

GOFDR! Analyzing proteomics data from the gene ontology level


Shotgun proteomics is amazing at identifying peptide-spectrum matches (PSMs). This is what we get out of the instrument: an MS/MS spectrum that we can match with high confidence to something in our database. The tricky part is getting relevant biological data back out. Figuring out exactly which PSM belongs to which peptide, and which peptide belongs to which protein, is the hard part. Evolution is working against us here -- it is much easier from a biological standpoint to make proteins with new functions from a similar protein than it is to make a new one from scratch.

There are some really clever people thinking about other ways of inferring biological data and I think we'll be hearing about a lot of it soon. One new (to me!) approach is called GOFDR. It's from Qiangtian Gong et al., and is described in this new paper here.

The idea is this: cut out the middlemen. That is, we've got the PSM confidently identified. If it is from a conserved region of a protein, why would we bother going all the way through trying to infer which peptide and protein it is from? Chances are, if it's a PSM that matches multiple different proteins, those proteins are at least similar in their function. That's the gene ontology part.

Example: This drug leads to upregulation of this peptide that can be linked to one of 60 different actin variants? Who cares which one it is; it sounds like this drug has a cytoskeletal component!

That's the "GO" part. The "FDR"? It's because that's the level where they want to apply the false discovery rates: at the gene [protein] ontology level.
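Here is a toy sketch of the "cut out the middlemen" idea as I understand it: skip protein inference and give each PSM whatever GO terms are shared by every protein its peptide could have come from. This is my own simplification and not the GOFDR pipeline. The protein names, peptides, and term labels are hypothetical, and the FDR step is left out entirely.

```python
from collections import defaultdict


def shared_go_terms(proteins, protein_to_go):
    """GO terms common to every candidate protein for one peptide."""
    term_sets = [set(protein_to_go.get(p, ())) for p in proteins]
    return set.intersection(*term_sets) if term_sets else set()


def psm_counts_by_go(psm_peptides, peptide_to_proteins, protein_to_go):
    """Count PSMs per GO term without ever choosing a single protein."""
    counts = defaultdict(int)
    for peptide in psm_peptides:
        proteins = peptide_to_proteins.get(peptide, [])
        for term in shared_go_terms(proteins, protein_to_go):
            counts[term] += 1
    return dict(counts)


# Hypothetical toy data: one peptide shared by three actin-like proteins.
PEPTIDE_TO_PROTEINS = {
    "SHAREDPEPTIDEK": ["actin_1", "actin_2", "actin_3"],
    "PEPTIDEA": ["kinase_1"],
}
PROTEIN_TO_GO = {
    "actin_1": {"cytoskeleton", "muscle contraction"},
    "actin_2": {"cytoskeleton"},
    "actin_3": {"cytoskeleton", "cell motility"},
    "kinase_1": {"phosphorylation"},
}

if __name__ == "__main__":
    psms = ["SHAREDPEPTIDEK", "SHAREDPEPTIDEK", "PEPTIDEA"]
    print(psm_counts_by_go(psms, PEPTIDE_TO_PROTEINS, PROTEIN_TO_GO))
```

The shared peptide can't tell you which actin it came from, but every candidate agrees on "cytoskeleton", so that is the term that gets the evidence.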

Is it simple in this form? Not at all. To run this pipeline the data is run through multiple programs, including PSI-BLAST. At the end they see that they really have to spend time manually adjusting their scores and thresholds. Is it an interesting way to look at and think about our data? Absolutely.