Hopkins Biology Schleif

Foundations and Applications of Molecular Biology Homework

Assigned Sept. 6, due Sept. 13

1. Suppose a protein solution at 200 mg/ml consists entirely of one species of 50,000 molecular weight. What is the average distance between edges of adjacent protein molecules?

[1] Partial specific volume of protein = 0.73 mL/g (equivalent to a density of about 1.37 g/mL)

[2] 0.2 g × 0.73 mL/g ≈ 0.15 mL (total volume of protein in 1 mL of solution)

[3] 0.2 g/mL × 1 mol/(5×10^4 g) × 6.02×10^23 molecules/mol = 2.4×10^18 molecules/mL

[4] 0.15 mL/(2.4×10^18 molecules) = 6.23×10^-20 mL/molecule (volume occupied by each protein)

[5] 1 mL/(2.4×10^18 molecules) = 4.15×10^-19 mL/molecule (space "cleared out" by each protein)

[6] ∛(4.15×10^-19 cm^3) = 7.46×10^-7 cm = 7.46×10^-9 m (distance between centers of adjacent proteins, assuming cubic volumes for [5]; if you used e.g. spherical volumes, this could be slightly different)

[7] Treating each protein as a sphere of volume V = 6.23×10^-20 cm^3 from [4], r = ∛(3V/4π) ≈ 2.45×10^-7 cm, so 2r = 2 × 2.45×10^-7 cm = 4.9×10^-7 cm = 4.9×10^-9 m

[8] 7.46×10^-9 m - 4.9×10^-9 m = 2.56×10^-9 m (about 2.6 nm between the edges of adjacent proteins)
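The arithmetic in steps [1] through [8] can be checked with a short script. This is only a sketch under the same assumptions as the worked answer (a partial specific volume of 0.73 mL/g, spherical proteins, and one protein per cubic cell of solution); the function and variable names are made up for illustration. Because it does not round intermediate values, its result differs slightly from the hand calculation.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def edge_to_edge_distance_nm(conc_g_per_ml, mol_weight, partial_specific_volume):
    """Average gap between surfaces of neighboring protein molecules, in nm."""
    # Step [3]: number of molecules in 1 mL of solution.
    molecules_per_ml = conc_g_per_ml / mol_weight * AVOGADRO
    # Steps [2] and [4]: volume of protein per molecule (1 mL = 1 cm^3).
    protein_volume = conc_g_per_ml * partial_specific_volume / molecules_per_ml
    # Step [5]: volume of solution available to each molecule.
    cell_volume = 1.0 / molecules_per_ml
    # Step [6]: center-to-center spacing, assuming a cubic cell per molecule.
    center_to_center_cm = cell_volume ** (1.0 / 3.0)
    # Step [7]: radius of a sphere with the protein's volume.
    radius_cm = (3.0 * protein_volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    # Step [8]: subtract one diameter and convert cm to nm.
    return (center_to_center_cm - 2.0 * radius_cm) * 1e7


gap_nm = edge_to_edge_distance_nm(0.2, 5.0e4, 0.73)
print(f"edge-to-edge distance: {gap_nm:.2f} nm")
```

Changing the concentration argument shows how quickly the gap shrinks as the solution becomes more crowded.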

2. Consider a growing population in which the numbers in generation n+1 are related to those of the previous generation by N(n+1) = r × N(n) × (1 - N(n)). (Maximum N is 1.0.) Explore graphs of N from generation 0 to 500 and from generation 450 to 500 as a function of r. As a starting value for N(0), use 0.01. Show a couple of graphs at interesting values of r, one of which is r near 3.58 (not whatever value I mentioned in class), and comment on the behavior.

Between r=0 and r=1.0101010101 (yes, that is the actual number: it is 1/0.99, the value of r at which N(1) = N(0) when N(0) = 0.01), the trend follows an exponential decline. Above this value, until about r=2.2, there is a more sigmoidal curve, reaching a max of 0.5 around generation 9. Starting at this point, there is something of an overshoot, where the exponential increase hits a peak and then falls back to an equilibrium. This equilibrium becomes unstable at higher values, causing an oscillation between two values that results in what looks like two converging curves until a value of about r=3.06, at which point the oscillating values begin to diverge over time. Interestingly, this meta trend ("splitting" curves as r increases) repeats around r=3.41, and the two equilibria become four. This fractalizing results in what appears to be a chaotic pattern beginning around r=3.58. (No, your response did not have to be anywhere near this detailed, but I just thought this was all interesting.)
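The behavior described above can be explored numerically without a plotting package. The following is a minimal sketch, not the method used in class: it iterates the recurrence from N(0) = 0.01 and counts how many distinct values (rounded to three decimals, an arbitrary choice) N visits between generations 450 and 500. The function names are invented for this example.

```python
def logistic_trajectory(r, n0=0.01, generations=500):
    """Return [N(0), N(1), ..., N(generations)] for N(n+1) = r*N(n)*(1 - N(n))."""
    values = [n0]
    for _ in range(generations):
        n = values[-1]
        values.append(r * n * (1.0 - n))
    return values


def distinct_late_values(r, start=450, stop=500, digits=3):
    """Distinct rounded values visited between generations start and stop."""
    tail = logistic_trajectory(r, generations=stop)[start:stop + 1]
    return sorted({round(v, digits) for v in tail})


for r in (0.9, 2.0, 3.2, 3.5, 3.58):
    late = distinct_late_values(r)
    print(f"r = {r}: {len(late)} distinct value(s) in generations 450-500")
```

One late value indicates a stable equilibrium, two or four indicate the oscillations described above, and a large number indicates the irregular behavior near r = 3.58. The list returned by logistic_trajectory can be passed to any plotting tool to produce the requested graphs.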


Assigned Sept. 11, due Sept. 18

1. Problem 2.5 from the text.

If the RNA transcript cannot wind around the DNA, then the polymerase cannot rotate as it progresses, and the DNA ahead of the polymerase retains the same number of twists in a shorter and shorter length of DNA. Thus, the DNA becomes overwound, which is the state of positively supercoiled DNA. The opposite happens behind the polymerase.

2. Problem 2.18 from the text.

By reference to a figure of DNA showing the carbon atoms of the ribose, it can be seen that the 5' end of the bottom strand of a major groove (with the axis of the helix vertical) is on the bottom of the figure. Therefore, the image on the left shows the 5'-AAG-3' sequence.


1. One method for determining the approximate number of superhelical turns in a proposed structure is to build it without twists from a ribbon, to pull the ends taut, and to count the twists. Why does this work?

Lk = Tw + Wr. Without twists Tw=0, so Lk = Wr. i.e. all the superhelical turns in the structure are due to writhe (Wr). Pulling the ends taut does not change the linking number (Lk), but reduces Wr to zero, so now Lk = Tw, and it is easy to count twists.

2. With next generation sequencing, it is reasonable in an experiment to obtain the sequences of 500,000 fragments of length about 100 base pairs. How, using only DNA sequencing, could you determine the location of the origin of replication of a bacterium like E. coli? (Note that recovery of sequences in next gen sequencing is, itself, sequence-dependent.)

In principle, counting the numbers of reads for each gene would identify the origin because it would have the highest copy number (more young individuals in a growing population than old individuals). However, because the efficiency of generating reads is somewhat sequence dependent, naive counting cannot be used. Therefore, use the fact that the slower a population is growing, the smaller the difference between the numbers of youngsters and oldsters. You could simply sequence DNA from cells with a very slow growth rate and from cells with a very fast growth rate, using the slow rate data to normalize the fast rate data. (A sequence at the origin will have the highest ratio of (No. of reads of the sequence from fast cells)/(No. of reads of the sequence from slow cells).)
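The normalization step can be sketched in a few lines. The read counts below are invented purely for illustration (six genomic bins, with a strong sequence bias in bin 1 and the true origin placed in bin 2); real data would come from mapping reads to the genome. The function name and the choice of bins are assumptions, not part of the original answer.

```python
def normalized_ratios(fast_counts, slow_counts):
    """Per-bin ratio of fast-culture to slow-culture read fractions."""
    if len(fast_counts) != len(slow_counts):
        raise ValueError("both cultures must use the same genomic bins")
    fast_total = sum(fast_counts)
    slow_total = sum(slow_counts)
    ratios = []
    for fast, slow in zip(fast_counts, slow_counts):
        if slow == 0:
            # A bin with no slow-culture reads cannot be normalized; score it 0.
            ratios.append(0.0)
        else:
            ratios.append((fast / fast_total) / (slow / slow_total))
    return ratios


# Hypothetical counts: the same sequence bias affects both cultures.
fast = [1200, 4800, 1000, 1600, 1200, 1000]
slow = [1000, 3000, 500, 1000, 1000, 1000]

naive_peak = max(range(len(fast)), key=lambda i: fast[i])
ratios = normalized_ratios(fast, slow)
origin_bin = max(range(len(ratios)), key=lambda i: ratios[i])
print("bin with most reads in fast culture:", naive_peak)
print("bin with highest fast/slow ratio:", origin_bin)
```

In this toy data set the naive count points at the over-represented bin, while the fast/slow ratio cancels the bias and points at the bin with the highest copy number.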


Assigned Sept. 18, due Sept. 25

1. What is the mechanism by which bacterial RNA polymerase melts the -10 region of the promoter?

Upon binding to a double-stranded promoter region, RNA polymerase grabs bases of the non-template strand as they melt out of the double-stranded structure. These are held by aromatic ring stacking with three tryptophan residues of the sigma subunit.

2. RNA polymerases do not proceed at a uniform rate down any particular gene. In addition to random effects that vary from one polymerase to the next but average out, there are sequence-specific effects that affect all the polymerases as they transcribe across specific sequences. Describe (devise or invent on your own in preference to scouring the literature to find an answer) a method for identifying, in vivo, those regions in a gene at which elongation is fast or slow. If the same regions do not show up when doing in vitro transcription, what might the difference between the in vivo and in vitro results mean?

One way would be to halt transcription using a drug like actinomycin D and determine the locations of the polymerases. After isolating the polymerase-studded DNA, the non-protected DNA could be digested away, leaving only the polymerase footprints. After sequencing, these can be aligned to the genome to determine the locations of the polymerases; closely spaced series of polymerases will correspond to areas being transcribed more slowly. Differences between in vivo and in vitro results would suggest the actions of additional factors or proteins present in the cell modulating transcription speed.


Assigned Sept. 20, due Sept. 27

1. Predict the physiological and biochemical consequences of covalently attaching sigma-70 to the core of RNAP in such a way that the sigma and the core of RNAP can both function normally.

Sigma-70 is a common sigma factor used primarily for the expression of "housekeeping" genes. Covalently attaching this sigma factor would ensure that it was the only one used, because its local concentration near the core RNAP would be much higher than that of the other sigma factors that diffuse through the cytoplasm, and thus only the "standard" genes would be expressed. Most notably, this would prohibit the expression of stress-response genes that use the various other sigma factors, so the cells would be able to grow under normal conditions, but would abruptly die in any other (e.g. heat shock, high/low pH, high/low salt, etc.).

2. How could you determine the average physical half-life of mRNA in bacterial cells from data obtained by simultaneously adding radioactive uridine and rifamycin?

Rifamycin halts transcription initiation, but not elongation, so after its addition, only RNA already elongating will incorporate any radioactivity. Ribosomal RNA will be completed and will be stable, while radioactivity in messenger RNA will at first increase and then decay. Hence, the physical half-life of mRNA can be determined from the decaying portion of the curve of radioactivity in RNA. A time course of the radioactivity present in RNA will at first climb as elongating transcripts are finished; during this period, there is a tug-of-war between the (in this experiment) inseparable rates of elongation and degradation. However, after the radioactivity level peaks, we can measure the mRNA half-life directly from the corresponding decline.
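Extracting a half-life from the declining part of the curve amounts to fitting a straight line to the logarithm of the radioactivity. The sketch below assumes simple first-order decay after the peak and assumes that the plateau contributed by stable RNA is known and can be subtracted; both are simplifications. The time points and counts are synthetic, generated with a 2-minute half-life so the fit can be checked.

```python
import math


def half_life_from_decay(times, counts, stable_background=0.0):
    """Fit ln(counts - background) vs. time after the peak; return the half-life."""
    peak = max(range(len(counts)), key=lambda i: counts[i])
    t = list(times[peak:])
    y = [math.log(c - stable_background) for c in counts[peak:]]
    if len(t) < 2:
        raise ValueError("need at least two time points at or after the peak")
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    # Ordinary least-squares slope of ln(counts) against time.
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    return math.log(2) / -slope


# Synthetic time course (minutes, cpm): rise to a peak at 2 min, then decay
# with a 2 min half-life on top of 100 cpm of stable RNA.
times = [0, 1, 2, 3, 4, 6, 8, 10]
counts = [300.0, 700.0] + [100.0 + 800.0 * 2 ** (-(t - 2) / 2.0) for t in times[2:]]

estimate = half_life_from_decay(times, counts, stable_background=100.0)
print(f"estimated mRNA half-life: {estimate:.2f} min")
```

With real data the background would have to be estimated from the late plateau, and the points just after the peak may still include some completion of elongating transcripts.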

Assigned Sept. 25, due Oct. 2

1. How can a nucleotide in natural RNA possess three phosphodiester bonds?

In addition to the two backbone bonds (at 3' and 5'), RNA can be phosphorylated on its 2' hydroxyl to establish a phosphodiester bond. This in fact occurs at the branch point adenosine during lariat formation in RNA splicing.

2. Why is self-cutting and splicing important to the viability of a virusoid?

A virusoid is composed of circular RNA, but encodes no proteins. In order to replicate, it must undergo rolling circle replication, which requires the RNA to be cut, and this must be accomplished without the aid of proteins.

Assigned Sept. 27, due Oct. 9

1. What is the context of Cech's discovery of self-splicing RNA? (What was he trying to do, what did he find, how did he proceed?)

Cech was trying to investigate splicing of Tetrahymena ribosomal RNA in vitro. He noticed that his control, which was purified away from any protein, was already spliced. Thinking this was the result of contamination, he tried more careful purification, though the splicing still occurred. Ultimately, he expressed the RNA off of a plasmid in E. coli, which should contain absolutely no splicing machinery, and still saw this occurring, demonstrating that this was not an artifact, but actually self-splicing RNA.

2. Either devise on your own, or find in the literature, a mechanism for obligatory alternative splicing (taking only one from a group of introns, as in Dscam) and describe the basic idea.

One possible answer: consider a protein that binds to/near exons at a slow rate, and recruits a second protein to adjacent exons at a much higher rate, targeting them for being spliced out, leaving only the sequence protected by the first protein.

Assigned Oct. 2, due Oct. 11

1. See "Strong Intranucleoid Interactions Organize the Escherichia coli Chromosome into a Nucleoid Filament", Proc. Natl. Acad. Sci. USA 107, 4991-5 (2010). What would a DNA contact map of the E. coli chromosome look like?

Areas near the origin would contact only other areas near the origin; the middle areas, only other middle areas. Thus, it would look like a (comparatively) thin diagonal line, with sharper cutoffs, and no long-range contacts.

2. OK, so the gross structure of DNA in the nucleus is known, and 10,000 loops associated with enhancers and chromosomal domains have been documented. What are some other important questions that can be answered with Hi-C approaches?

See the recent publications from Erez Lieberman's lab.

Assigned Oct. 9, due Oct. 16

1 and 2. Use the program PyMOL and the appropriate Protein Data Bank files to examine the structural changes to the dimerization domain of AraC caused by the binding of arabinose. Examine both changes within a subunit and the disposition of one subunit with respect to the other. Briefly describe the steps you used, what you found, and your conclusions as to the effect of arabinose binding on the protein.

Perhaps the most obvious change is that of the N-terminal 20 or so residues. In the apo form, this arm is arranged away from the core (the first few residues are disordered and do not appear in the crystal structures); in the holo form, however, this region is folded over the sugar-binding pocket. There are some changes to the beta barrel as well, in particular at residues 53-55, which transition slightly out of the beta-sheet conformation. Additionally, the coiled-coil of the dimerization interface changes slightly in orientation, leading to an overall rotation of the two subunits of the dimer relative to one another. To address this problem satisfactorily, it was necessary to use the RCSB (PDB) file containing more than a single subunit.

Assigned Oct. 11, due Oct. 18

1. How do you expect ssrA RNA to enter the ribosome A site? How would you definitively show this?

Since ordinary charged tRNAs are carried into the ribosome bound to elongation factor EF1, it seems likely that ssrA is also carried by EF1. Test by constructing an in vitro translation system with a truncated mRNA. Does ssrA bind to the ribosome only in the presence of EF1?

2. Which aminoacyl-tRNA synthetases would you expect to possess editing capabilities to reduce misacylation?

Glycine-alanine, valine-isoleucine, cysteine-serine, serine-threonine, phenylalanine-tyrosine. On shape alone, one might expect glutamic-glutamine and aspartic-asparagine, but the charged carboxyl group probably allows enough discrimination.

Assigned Oct. 16, due Oct. 23

1. and 2. Continuation of the Poisson simulation from class.

At this point you should have a column of 21 numbers, 0-20, and a column alongside this giving the number of the instances, n, (cultures) where the sum of the 1's equalled the number alongside. Now make a column that gives the probability that the number showed up. From this make a column which you can then sum to obtain the average for your set of 100 numbers (of resistant mutants). Now make a column whose sum will give you the variance of your set of 100 numbers. As you vary the probability of being mutant from 0.05 to 0.2 or so, look at the relationship of the mean to the variance. This is a very important property of Poisson distributions.

Do the following calculations in clean rows and/or columns so you retain the ability to see the mean and variance of the numbers of mutants in the 100 cultures. Now, multiply the numbers of mutants in each of the 100 cultures by four (two doublings). With this set of numbers calculate the mean and variance. As you play with the probability of making a mutant and observe the mean and variance of the cells and of the "grown" cells, you will see the fundamental basis of the Luria-Delbruck fluctuation test.

You should see that the peak of the curve of the numbers of resistant colonies rather closely tracks the expected number, which is mutation frequency x 100. The variance should be close to the mean. When the "cultures" are allowed to double a few times, say 10 x growth, the mean will be increased by a factor of ten, but the variance by a factor of 100. Thus, if mutations occur before exposure to phage, the variance will be greater than the mean.
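For those who prefer a script to a spreadsheet, the same experiment can be sketched as follows. It assumes, as the 0-20 column suggests, 20 cells per culture and 100 cultures, each cell independently mutant with probability p; the seed, function names, and the population (rather than sample) variance are choices made only for this illustration.

```python
import random


def simulate_cultures(p_mutant, n_cultures=100, cells_per_culture=20, seed=0):
    """Number of mutant cells in each culture (sum of 0/1 outcomes per cell)."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(cells_per_culture) if rng.random() < p_mutant)
            for _ in range(n_cultures)]


def mean_and_variance(values):
    """Mean and population variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


mutants = simulate_cultures(p_mutant=0.1)
grown = [4 * m for m in mutants]  # two doublings after the mutations arose

mean, variance = mean_and_variance(mutants)
grown_mean, grown_variance = mean_and_variance(grown)
print(f"before growth: mean = {mean:.2f}, variance = {variance:.2f}")
print(f"after growth:  mean = {grown_mean:.2f}, variance = {grown_variance:.2f}")
```

Multiplying every count by 4 multiplies the mean by 4 and the variance by 16, which is the source of the excess variance in the fluctuation test. With only 20 cells per culture the counts are strictly binomial, so the variance before growth tends to fall slightly below the mean; the agreement improves as p gets smaller.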

Assigned Oct. 18, due Oct. 25

1. If, in a large population, the frequency of A/A and a/A individuals is each 0.5, and if the selective pressure killing off the a/a individuals is removed, how many generations will it take for the distribution of genotypes to reach equilibrium percentages of A/A, a/A, and a/a individuals, and what are the percentages of the three genotypes?

As shown in class, application of the classical chromosome mixing matrix shows that equilibrium is reached in the next generation. The allele frequencies are A = 0.75 and a = 0.25, so the genotypes appear in the ratio A/A : A/a : a/a = 9 : 6 : 1 out of 16, i.e. 56.25%, 37.5%, and 6.25%.
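The one-generation approach to equilibrium can be verified directly. This sketch assumes random mating in a large population with no selection, which is what the question describes once the pressure on a/a is removed; the function name is made up for illustration.

```python
def next_generation(freq_AA, freq_Aa, freq_aa):
    """Genotype frequencies after one round of random mating."""
    total = freq_AA + freq_Aa + freq_aa
    p = (freq_AA + 0.5 * freq_Aa) / total  # frequency of allele A
    q = 1.0 - p                            # frequency of allele a
    return p * p, 2.0 * p * q, q * q


first = next_generation(0.5, 0.5, 0.0)
second = next_generation(*first)
print("generation 1:", first)
print("generation 2:", second)
```

Starting from 0.5 A/A and 0.5 a/A, the first generation already has the equilibrium genotype frequencies, and applying the function again leaves them unchanged.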

2. a. Show how 24 oligonucleotides could be annealed together to form a three dimensional cube. b. What is the most complicated structure to date that has been formed by annealing pieces of DNA? When you have found the most recent and most complex structure formed, note the rather surprising annealing conditions.

For each of the eight corners of the cube, generate three oligonucleotides, each spanning about half of two edges. Make each end also possess homology to the end of the corresponding oligo for the other corner on that edge and face. Some amazing structures are now being built. Equally amazing are the very long times, sometimes weeks, for which the DNA must be annealed.

Assigned Oct. 23, due Oct. 30

1. a. Drawing of cube

b. Bridge PCR is a method for generating localized areas containing perhaps 10^6 molecules of one sequence on a glass plate. Most of the images of bridge PCR that you get from a Google search are incorrect. Pick one such depiction and show why it is wrong, and then generate a correct drawing(s).

Many illustrations show only one type of primer attached to the surface of the flow cell, or clusters of sequences all in the same orientation. For bridge amplification to occur properly, both the (different) 5' and 3' primers must be present.

2. From the offerings of a company that synthesizes DNA oligonucleotides pick a special nucleotide that can be introduced during DNA synthesis that will subsequently allow crosslinking to a protein bound very near the modification, and perhaps allow subsequent identification of the protein.

Bromodeoxyuridine (BrdU) is a thymidine analog that can be crosslinked to proteins through exposure to UV light.

Assigned Oct. 25, due Nov. 6

1. The construction of a gene for a TALEN protein requires addressing a technical difficulty, and the binding of a TALEN protein to DNA raises an interesting question. What are they?

The multiple repeats of the 34 amino acid unit make it hard to assemble the corresponding gene. Given the length of the target DNA sequence (around 36 base pairs), TALENs must physically wrap around the DNA molecule multiple times, raising the question of just how this can be accomplished efficiently.

2. How the heck can CRISPR find its target so quickly considering the fact that only very rarely is a stretch of double-stranded DNA spontaneously melted to expose a single strand to the CRISPR?

Cas9 first scans for the PAM sequence in double-stranded DNA, then waits for melting of the recognition region, but aborts when the unzipping encounters an incorrect base. Nonetheless, it is still mysterious how Cas9 can find target sequences so quickly.

Assigned Oct. 30, due Nov. 8

1. The namesake problem. From the spelling of your name, birthplace, and date of birth, remove the letters not used as single-letter abbreviations of the amino acids. Then use BLAST at the NLM to find your namesake protein. Comment on the degree of homology.

2. Does the Needleman-Wunsch global alignment method give the same alignment if you apply the same principle starting from the N-terminus?

Yes, try it and see. This is not a surprise because the Needleman-Wunsch method was advertised as providing the best global alignment, and thus it should not matter which end you start from.
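"Try it and see" can be done with a few lines of code. The sketch below computes only the optimal global alignment score by dynamic programming, using a simple scoring scheme (match +1, mismatch -1, gap -1) chosen for illustration rather than the scheme of the original paper, and compares the result for two sequences with the result for the same sequences reversed. The example sequences are arbitrary.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal Needleman-Wunsch global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the current prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


seq1 = "MKTAYIAKQR"
seq2 = "MKAYIGKQQR"
forward = nw_score(seq1, seq2)
backward = nw_score(seq1[::-1], seq2[::-1])
print("score from one end:", forward)
print("score from the other end:", backward)
```

The two scores are equal because reversing both sequences simply reverses every possible alignment without changing its score. When several alignments tie for the best score, which one a traceback reports can depend on the direction and on the tie-breaking rule, but all of them are equally good.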

Assigned Nov. 6, due Nov. 13

1. Problem 11.7 from text.

2. What would you conclude if a small fraction of AraC minus mutations could be suppressed by compensating mutations in RNA polymerase, but no mutations exist in AraC that can be corrected by mutations in CRP protein?

Assigned Nov. 8, due Nov. 15

1. and 2. What mechanism does the structure of ToxT, a relative of AraC, at first suggest for the mechanism by which AraC shifts from DNA looping in the absence of arabinose to binding to adjacent DNA half-sites in the presence of arabinose? Is this mechanism compatible or incompatible with the structure of AraC or with any other information you know or have read about AraC? Tell why.

Assigned Nov. 13, due Nov. 27

1. How would you isolate nonsense mutations in the S gene of phage lambda?

2. Messenger initiating from prm begins with the translation initiation codon AUG of repressor. Why does this absence of a ribosome binding site minimize fluctuations in the level of repressor in the cell?