
Abstract

Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA direct repeats present in many prokaryotic genomes.1-4 They are composed of 24–47 bp direct repeats exhibiting weak dyad symmetry, separated by 26–72 bp nonrepetitive spacer sequences. Interestingly, some of the spacer sequences are found in foreign replicons such as phages, plasmids, or chromosomes.5-7 CRISPRs are flanked by CRISPR-associated (cas) genes, most of which encode uncharacterized proteins. The CRISPR systems (CRISPRs and Cas proteins) are classified into several subtypes, and each subtype contains several different subtype-specific cas genes.8 The amino acid sequences of several Cas family members are similar to those of nucleases, helicases, RNA- and DNA-binding proteins, or transcription factors.

On the basis of these properties, CRISPR systems have been hypothesized to be either DNA repair systems9 or prokaryotic host defense systems against invading foreign replicons.5-7, 10 CRISPR systems have been shown experimentally to be involved in resistance against phages.11-14 Recently, one of the Cas proteins, Cas2, was shown to represent a novel family of endoribonucleases with a double-split β-α-β fold that cleave single-stranded RNAs preferentially within U-rich regions.15 Furthermore, a complex of Cas proteins comprising repeat-associated mysterious protein (RAMP) superfamily proteins, Cse1, Cse2, Cse3, Cse4, and Cas5e, cleaves a CRISPR RNA precursor in each repeat and retains the cleavage products containing the virus-derived sequence.14 Cse3, which is essential for this cleavage, adopts a ferredoxin-like fold, and its overall structure is similar to those of many RNA-binding proteins.10, 16 Cse2 adopts a novel fold with large continuous basic patches on one side of its surface.17

An extremely thermophilic bacterium, Thermus thermophilus HB8,18 has 'Ecoli subtype'- and 'Mtube subtype'-like CRISPR systems8 on megaplasmid pTT27, and expression of the cas genes is positively regulated by
cyclic AMP receptor protein (CRP).19 This strain has another operon on pTT27, TTHB160-161-162-163-164-165, which encodes RAMP superfamily proteins and which, unlike other cas genes, is not located in the vicinity of CRISPRs (NCBI accession no. NC_006462). Transcription of this operon was significantly decreased in a crp-deficient strain, although no CRP-dependent promoter was found upstream of the operon.19 In the present study, we determined the crystal structure of one of the RAMP superfamily proteins, T. thermophilus Cmr5 (TTHB164), which forms a novel homotrimer.

Abbreviations: CRISPR, clustered regularly interspaced short palindromic repeats; CRP, cyclic AMP receptor protein; RAMP, repeat-associated mysterious protein.

The open reading frame of T. thermophilus Cmr5 was cloned into the pET-11a expression vector (NdeI-BamHI site) (Novagen). A selenomethionine-substituted protein was produced in the E. coli methionine auxotroph B834(DE3) strain (Novagen). The cell lysate was heated at 70°C for 10 min. The soluble fraction was then applied to a SuperQ Toyopearl column (Tosoh Corp.) that had been preequilibrated with 20 mM Tris-HCl (pH 8.0), and the flowthrough fraction was collected. This fraction was applied to a Resource S column (GE Healthcare Biosciences) that had been preequilibrated with 20 mM MES buffer (pH 6.0), and the flowthrough fraction was collected. This fraction was applied to a HiTrap heparin column (GE Healthcare Biosciences) that had been preequilibrated with 20 mM MES buffer (pH 6.0), and the protein was eluted with a linear gradient of 0–2 M NaCl. The target fractions were collected and applied to a HiLoad 16/60 Superdex 75pg column (GE Healthcare Biosciences) that had been preequilibrated with 20 mM MES buffer (pH 6.0) containing 0.5 M NaCl. The purified protein was concentrated to 9.3 mg/mL using a Centriprep concentrator (3 kDa molecular-weight cutoff, Millipore), and dithiothreitol was added to the sample to a final concentration of 1 mM.
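For orientation, the mass concentration of the final sample can be converted to a molar concentration. The short Python sketch below is only an illustration: it assumes the calculated monomer mass of 13 kDa reported for Cmr5 in this paper, and the function name is hypothetical rather than part of any program used in the study.

```python
def molar_concentration_mM(mg_per_mL: float, mass_kDa: float) -> float:
    """Convert a mass concentration to millimolar.

    mg/mL equals g/L, and 1 kDa corresponds to 1000 g/mol,
    so (mg/mL) / (kDa) gives mmol/L directly.
    """
    return mg_per_mL / mass_kDa


# Values taken from the text: 9.3 mg/mL sample, 13 kDa calculated monomer mass.
monomer_mM = molar_concentration_mM(9.3, 13.0)

# Assumption: if all protein is trimeric, the trimer concentration is one third.
trimer_mM = monomer_mM / 3

print(f"monomer: {monomer_mM:.2f} mM, trimer: {trimer_mM:.2f} mM")
```

Because the 13 kDa figure is a rounded value, the result should be read as an approximate concentration only.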
The molecular mass of the purified protein was 36 kDa as determined by gel filtration chromatography on a HiLoad 16/60 Superdex 75pg column (GE Healthcare Biosciences) and 40 kDa as determined by light scattering photometry (DynaPro-801 detector, Protein Solutions), suggesting that the protein exists as a trimer in solution.

Crystallization of the Cmr5 protein was performed by the sitting drop vapor diffusion method at 20°C, by mixing 0.6 μL of the protein solution with an equal volume of a reservoir solution comprising 0.1 M sodium acetate (pH 4.6), 1.7 M ammonium sulfate, and 30 mM dithiothreitol. Single-wavelength anomalous dispersion data were collected with the RIKEN Structural Genomics Beamline II (BL26B2) at SPring-8 (Hyogo, Japan) using the anomalous scattering from Se atoms. A data set was collected at 2.05 Å resolution using a Mar Mosaic225 CCD detector (Rayonix, LLC). The collected data were processed with the HKL2000 program suite.20

The positions of 10 of the 12 possible Se atoms in the asymmetric unit of the crystal were determined with the program SOLVE,21 and density modification was then performed with the program RESOLVE.22 The automatic-tracing procedure in the program ARP/wARP23 was used to build the initial model. Model refinement and the initial picking and manual verification of water molecules were performed using the programs CNS and XtalView/Xfit.24, 25 Seven electron densities that probably correspond to sulfate ions were detected in the asymmetric unit. Electron densities corresponding to multiple conformers could not be detected. According to PROCHECK in the CCP4 suite,26 99.0% of the residues in the final model are in the most favored region of a Ramachandran plot, with no residues in disallowed regions. Data collection statistics and processed data statistics are presented in Table I. The coordinates are available in the Protein Data Bank under accession code 2ZOP.

The T.
thermophilus Cmr5 monomer is composed of 114 amino acid residues, with a calculated molecular mass of 13 kDa. A BLAST search revealed that the most closely related homologous proteins are TT_P0119 from T. thermophilus HB27 (E-value, 5e-56), which differs by only one amino acid substitution (D65H), TaqDRAFT_0013 from Thermus aquaticus Y51MC23 (3e-36), Haur_0575 from Herpetosiphon aurantiacus ATCC 23779 (9e-19), PTH_1924 from Pelotomaculum thermopropionicum SI (4e-18), and Daud_1819 from Candidatus Desulforudis audaxviator MP104C (4e-16) [Fig. 1(A)].

Figure 1. (A) Sequence alignment of T. thermophilus Cmr5 (ttCmr5) with representative homologous proteins: TaqDRAFT_0013 from T. aquaticus Y51MC23; Haur_0575, Cmr5 family protein from H. aurantiacus ATCC 23779; PTH_1924, hypothetical protein from P. thermopropionicum SI; and Daud_1819, Cmr5 family protein from C. Desulforudis audaxviator MP104C. Strictly conserved residues are boxed in red, and similar residues are shown in red letters. Basic residues are indicated by asterisks. The sequences were aligned using ClustalW2.28 The secondary structure was generated with ESPript 2.2.29 (B) Ribbon diagram of the T. thermophilus Cmr5 trimer; chains A (red), B (blue), and C (green). Sulfate ions identified in the structure are not shown. (C) Molecular surface representation of the T. thermophilus Cmr5 trimer (chains ABC). Red and blue surfaces represent negative and positive electrostatic potentials (−5 kBT, +5 kBT), respectively. The electrostatic potentials were calculated using the Adaptive Poisson-Boltzmann Solver (APBS)30 with PyMOL APBS tools. (D) Superposition of the main chain structures of T. thermophilus Cmr5 (red) and AF1862 from A. fulgidus DSM 4304 (PDB code: 2OEB) (blue). Panels B–D were generated with the program PyMOL, which is available at http://pymol.sourceforge.net/.

The three-dimensional crystal structure of T.
thermophilus Cmr5 was determined at a resolution of 2.1 Å, with crystallographic Rwork and Rfree factors of 23.5% and 24.9%, respectively (Table I). The asymmetric unit of the crystal comprised two trimers of Cmr5 (trimer 1, chains ABC [Fig. 1(B)]; trimer 2, chains DEF), which are related through a noncrystallographic pseudo-symmetric two-fold axis. The structures of the six monomers are similar, as shown by root mean square deviation (r.m.s.d.) values of 0.30–0.45 Å for the corresponding Cα atoms. The Cα r.m.s.d. values for the two trimers are 0.75–1.22 Å. The two trimers exhibit the highest correlation when they are related through a pseudo-two-fold axis. The following disordered residues are not included in the model: residue 114 in chain A; residues 113 and 114 in chain B; residues 1–3, 54, 113, and 114 in chain C; residues 1–5, 55, and 114 in chain D; residues 1, 56, and 114 in chain E; and residues 1, 55, and 114 in chain F.

The Cmr5 monomer consists of six α-helices (chain A: α1, residues 4–19; α2, residues 25–42; α3, residues 44–52; α4, residues 57–69; α5, residues 75–84; and α6, residues 87–111) [Fig. 1(A,B)]. Hydrophobic cores are formed at the trimer interfaces, which are composed of L38, A42, A47, and F50 of one chain, and L87, L88, and L91 of the other chain. The interaction surface areas at the interfaces of chains AB and AC, calculated with the AreaIMol program,26 are 224 and 280 Å², respectively, which account for 3.4% and 4.2% of the total surface area of chain A.

The T. thermophilus Cmr5 structure was compared with previously determined structures in the PDB database using the secondary-structure matching server.27 The closest structure was that of hypothetical protein AF1862, a Cmr5 family protein from Archaeoglobus fulgidus DSM 4304 (PDB code: 2OEB), with an r.m.s.d. of 1.59 Å and Q-, P-, and Z-scores of 0.53, 3.4, and 5.3, respectively (sequence identity = 28%) [Fig. 1(D)]; however, unlike T. thermophilus Cmr5, the AF1862 protein is a monomer.

T.
thermophilus Cmr5 has a high theoretical isoelectric point of approximately 9.8, with large continuous basic patches on one side of its surface and acidic patches on the other side [Fig. 1(C)]. The amino acid residues comprising the basic and acidic patches are conserved in the four closest homologs of T. thermophilus Cmr5 [Fig. 1(A)]. Furthermore, this structural feature is conserved in AF1862, although its basic patches are smaller than those of T. thermophilus Cmr5. RAMP has been predicted to be an RNA-binding protein in a putative host defense system,10 and it was experimentally shown that a complex of RAMP superfamily proteins cleaves RNA.14 Cmr5 may therefore interact with RNA through its basic patches.

We thank Masami Nishida for the protein purification, and Yoshiaki Kitamura and Yuka Nonaka for the data collection at SPring-8.
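The trimer assignment and the interface statistics reported in the text can be cross-checked with simple arithmetic. The Python sketch below is an illustrative consistency check only, not part of the original analysis: the function names are hypothetical, and the inputs are the rounded values quoted in the paper (13 kDa monomer; 36 and 40 kDa in solution; 224 and 280 Å² interfaces covering 3.4% and 4.2% of chain A).

```python
def estimate_subunits(observed_kDa: float, monomer_kDa: float) -> tuple[float, int]:
    """Return the observed/monomer mass ratio and the nearest whole subunit count."""
    ratio = observed_kDa / monomer_kDa
    return ratio, round(ratio)


def implied_total_area(interface_A2: float, percent_of_total: float) -> float:
    """Back-calculate the total surface area implied by an interface area and its share."""
    return interface_A2 / (percent_of_total / 100.0)


MONOMER_KDA = 13.0  # calculated monomer mass quoted in the text

# Solution measurements quoted in the text.
measurements = {"gel filtration": 36.0, "light scattering": 40.0}
estimates = {
    method: estimate_subunits(mass, MONOMER_KDA)
    for method, mass in measurements.items()
}

# Both interfaces should imply roughly the same total surface area for chain A.
total_from_ab = implied_total_area(224.0, 3.4)
total_from_ac = implied_total_area(280.0, 4.2)

for method, (ratio, n) in estimates.items():
    print(f"{method}: ratio {ratio:.2f}, nearest subunit count {n}")
print(f"implied chain A surface: {total_from_ab:.0f} and {total_from_ac:.0f} square angstroms")
```

Both solution measurements round to three subunits, and the two interfaces imply nearly the same total surface area for chain A, so the reported figures are mutually consistent within rounding.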
