Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

TLDR

RepeatMasker is a widely used computational genomics tool for identifying, classifying, and masking repetitive elements, including low‑complexity sequences and interspersed repeats. It locates repeats by aligning the input genome sequence against a library of known repeats, such as Repbase. The paper presents two Basic Protocols with detailed instructions for running RepeatMasker, either through the web interface or on a command‑line Unix/Linux system. Because cross_match, the alignment program usually used, requires significant processing time for larger sequences, an Alternate Protocol describes how to reduce run time with an alternative alignment program such as WU‑BLAST. The authors also discuss the software's advantages, limitations, and known bugs, and provide guidelines for interpreting the results.

Abstract

RepeatMasker is a popular software tool widely used in computational genomics to identify, classify, and mask repetitive elements, including low-complexity sequences and interspersed repeats. RepeatMasker searches for repetitive sequences by aligning the input genome sequence against a library of known repeats, such as Repbase. Here, we describe two Basic Protocols that provide detailed guidelines on how to use RepeatMasker, either via the Web interface or a command-line Unix/Linux system, to analyze repetitive elements in genomic sequences. Sequence comparisons in RepeatMasker are usually performed by the alignment program cross_match, which requires significant processing time for larger sequences. An Alternate Protocol describes how to reduce the processing time using an alternative alignment program, such as WU-BLAST. Further, the advantages, limitations, and known bugs of the software are discussed. Finally, guidelines for understanding the results are provided.
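
As a rough illustration of what "masking" means once repeats have been located, the sketch below replaces repeat intervals in a sequence with N (hard masking) or with lowercase letters (soft masking). It is not RepeatMasker's own code and performs no alignment against a repeat library. The function name `mask_repeats`, the example sequence, and the 0-based half-open coordinates are all assumptions made for illustration.

```python
def mask_repeats(sequence, repeats, soft=False):
    """Return `sequence` with each repeat interval masked.

    `repeats` is a list of hypothetical (start, end) pairs, 0-based and
    half-open, standing in for hits reported by a repeat-finding tool.
    Hard masking writes 'N'; soft masking lowercases the bases instead.
    """
    chars = list(sequence)
    for start, end in repeats:
        # Clamp each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


# Toy input: a short (AC)n simple repeat at positions 4-13.
seq = "ACGTACACACACACGGTTAGC"
hits = [(4, 14)]

hard_masked = mask_repeats(seq, hits)
soft_masked = mask_repeats(seq, hits, soft=True)
print(hard_masked)
print(soft_masked)
```

Hard masking hides the repeat from downstream tools entirely, while soft masking keeps the original bases recoverable; which form is appropriate depends on the downstream analysis.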
