Phage procapsids - how do they work?

Been thinking about bacteriophage capsid assembly a bit today. For all we know about phages, and other double-stranded DNA viruses for that matter, a few steps in their assembly remain difficult to explain.

Some of my notes on a lecture from 2011. "Escape" probably isn't the right word, and that last word should be "scaffold", but you get the idea.

Here's one such question: when phage scaffold proteins undergo proteolysis during capsid maturation, what happens to all the resulting peptides? As far as I know, the scaffold has to get processed at some point during phage particle maturation (see Medina et al. 2010, cited below, for the lambda example), but the capsid is a cramped environment without many exit routes. Do the scaffold bits just leak out through the capsid somewhere? (Perhaps more importantly, what about situations requiring a distinct prohead protease? Where does that enzyme go once it has finished all its protein processing?)

There are some recent modeling-based studies of phage capsid maturation (e.g., Jiang et al. 2015) but they still don't seem to account for proteolysis, just conformation at particular maturation stages.

Has someone already solved this mystery? Is it really a mystery in the first place, or am I just missing something?

* Medina, Elizabeth, Doug Wieczorek, Eva Margarita Medina, Qin Yang, Michael Feiss, and Carlos Enrique Catalano. 2010. “Assembly and Maturation of the Bacteriophage Lambda Procapsid: GpC Is the Viral Protease.” Journal of Molecular Biology 401 (5): 813–30. doi:10.1016/j.jmb.2010.06.060.

* Jiang, Jiajian, Jing Yang, Yuriy V Sereda, and Peter J Ortoleva. 2015. “Early Stage P22 Viral Capsid Self-Assembly Mediated by Scaffolding Protein: Atom-Resolved Model and Molecular Dynamics Simulation.” The Journal of Physical Chemistry B 119 (16): 5156–62. doi:10.1021/acs.jpcb.5b00303.

Automating your listening experience

My randomly-selected music recommendation system is now fully automated! That is, it's now a Twitter bot named Music Suggestron. That's how services get automated these days, right?

Music Suggestron uses Spotify for almost every link at the moment but I'll try to get it to spit out links to other sites soon. There's a decent-looking Python wrapper for the Soundcloud API. For now, though, it's a continuing list of tracks and artists you may have never considered listening to.

The power of two or many more

I'll admit it. I didn't know what ohnologs were until today. That's what I get for preferring microbiology to the study of more complex organisms.

Oh, what's an ohnolog? Let's start with an ortholog: that's a gene found in more than one species that descends from a single gene in the common ancestor of those species. Orthologs shouldn't be confused with other similarly-named terms like homologs or paralogs, though even seasoned evolutionary biologists make the mistake every so often. No, the words "homolog" and "paralog" aren't orthologs unless their ancestral form is just "log". Don't worry about it too much.

Ohnologs are the result of whole genome duplication. They're named for Japanese-American evolutionary biologist Susumu Ohno,* who is responsible for the idea that our vertebrate ancestors (going back about 500 million years, so not just our ancestors but those of every vertebrate on Earth) experienced two rounds of whole genome duplication. Any gene duplication event creates opportunities for the new copies to diverge in sequence and function, so imagine what could happen when a full genome undergoes duplication.

Copies of copies. Image from Imartin6 on Wikimedia Commons, along with more of an explanation than I'm likely to provide here. Suffice to say that each square is a gene and that duplication is a source of change among gene sequences.

The Singh lab at the Institut Curie maintains a database of ohnologs called OHNOLOGS, perhaps as the result of a terminology duplication event. They now have a paper in PLOS Computational Biology about ohnologs and the lab's work on them (cited below). It's worth reading if you'd like to know why ohnologs are worth studying (spoiler: many of them have functions sensitive to cancer-causing disruption) or if you'd like to see an example of a six-way Venn diagram.

Citation: Singh PP, Arora J, Isambert H (2015) Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes. PLoS Comput Biol 11(7): e1004394. doi:10.1371/journal.pcbi.1004394.

*Ohno is also widely credited with coining the unfortunate and inaccurate term "junk DNA", though the term itself may have predated him and may have meant many different things.**

** Dan Graur is angry about many things and this is one of them.

Data in plates and innovations in breakfast

Here's a useful tool I found on reddit this morning - an R package called phenoScreen.

It's set up to make lab data from 96- or 384-well plates easier to work with and visualize. I'm not really sure how most people produced plate maps otherwise without spending hours reinventing wheels.
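
For a sense of what the wheel-reinventing looks like, here's a minimal Python sketch of the basic bookkeeping: turning well labels like "A1" into grid positions and arranging readings into a plate-shaped grid. To be clear, this is not phenoScreen's code or API (that's an R package). The function names `well_to_index` and `plate_map` are made up for illustration, and the sketch assumes the usual layouts of 8 x 12 wells for a 96-well plate and 16 x 24 for a 384-well plate.

```python
import string


def well_to_index(well, n_rows=8, n_cols=12):
    """Convert a well label such as 'A1' or 'H12' to zero-based (row, col)."""
    well = well.strip().upper()
    row_letter, col_text = well[0], well[1:]
    row = string.ascii_uppercase.index(row_letter)  # 'A' -> 0, 'B' -> 1, ...
    col = int(col_text) - 1                         # '1' -> 0, '12' -> 11
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError(f"{well} is outside a {n_rows}x{n_cols} plate")
    return row, col


def plate_map(values, n_rows=8, n_cols=12):
    """Arrange a {well label: value} dict into a list of rows.

    Wells with no reading are left as None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for well, value in values.items():
        row, col = well_to_index(well, n_rows, n_cols)
        grid[row][col] = value
    return grid


# A few made-up readings on a 96-well plate.
readings = {"A1": 0.12, "B3": 0.50, "H12": 1.04}
grid = plate_map(readings)
```

For a 384-well plate, pass `n_rows=16, n_cols=24`. A list of lists is about the simplest structure that keeps the plate's shape, which is most of what a plate map needs before any plotting happens.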

On that note, isn't there anything else we could start reinventing? The wheel has been around for at least six millennia. Perhaps we could start reinventing toasters.

A very brief guide to converting E. coli gene IDs

E. coli has been a model organism for long enough for many of its genes to go by several different names. This isn't a terrible problem as long as we keep some well-organised databases, but even those databases have their own unique identifiers. The issue is compounded by the fact that genetic loci in different strains of E. coli all get different identifiers as well.

Please avoid using common names when creating lists of E. coli genes! They're easy to remember but don't make for consistent, unique database identifiers, as this example shows - at least nine different common names could be used.

Many names for the same thing.

Here are a few easy, E. coli-specific ways to handle and convert nearly any gene ID you may find.

  • ecoli.txt - This is an actively-maintained list hosted by UniProt. It lists ordered locus IDs (b codes and JW codes) and their corresponding accession numbers for Swiss-Prot, UniProt, and EcoGene, along with a few common names.
  • UniProt's ID mapping tool - Useful for converting UniProt IDs to NCBI Gene IDs and vice-versa. The other conversions can be hit-or-miss, especially with databases like BioCyc.
  • EcoGene mapping tool - Useful for converting EcoGene IDs (they start with EG; don't confuse them with EchoBase IDs, which start with EB) to other identifiers.
  • PIR ID mapping tool - Yes, another ID mapping tool, including some common E. coli databases like EcoGene.
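
If you end up with a mapping table from one of these resources, the conversion itself is just a lookup. Here's a rough Python sketch of turning common names into locus IDs while flagging any name that points at more than one locus. Everything in the example table is made up: the column names (`locus_id`, `names`), the semicolon-separated layout, and the IDs and gene names are placeholders, not the actual format or contents of ecoli.txt or any of the tools above. Matching is case-insensitive here, which is a convenience I've assumed rather than a rule.

```python
import csv
import io

# Hypothetical tab-delimited table. The layout and every ID/name are invented
# for illustration; real files will need their own parsing.
EXAMPLE_TABLE = (
    "locus_id\tnames\n"
    "b9991\tabcA;oldA\n"
    "b9992\txyzB;oldA\n"
    "b9993\tqrsC\n"
)


def build_alias_index(table_text):
    """Return ({lowercased name: locus ID}, set of ambiguous names)."""
    index = {}
    ambiguous = set()
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        locus = row["locus_id"]
        for name in row["names"].split(";"):
            key = name.strip().lower()
            if not key:
                continue
            if key in index and index[key] != locus:
                ambiguous.add(key)  # same name, different loci
            index.setdefault(key, locus)
    # Drop ambiguous names so they can't silently resolve to the wrong gene.
    for key in ambiguous:
        del index[key]
    return index, ambiguous


def resolve(name, index):
    """Look up a common name; returns None if unknown or ambiguous."""
    return index.get(name.strip().lower())


alias_index, ambiguous_names = build_alias_index(EXAMPLE_TABLE)
```

Refusing to resolve an ambiguous name is the point of the exercise: it's the same reason to keep unique locus IDs in gene lists in the first place.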