Publication | Open Access
The Pfam protein families database: towards a more sustainable future
Year: 2015 | Citations: 6.3K | References: 20
Structural Bioinformatics, Genetics, Molecular Biology, Sustainable Future, Pfam Database, Genomics, Bioinformatics Database, Sequence Motif, Pfam Annotations, Molecular Ecology, Proteomics, Biological Database, Sequence Analysis, Omics, Protein Structure Prediction, Functional Genomics, Bioinformatics, Protein Bioinformatics, Biology, Pfam Entries, Natural Sciences, Omics Datasets, Computational Biology, Systems Biology, Medicine
Pfam now builds families on UniProtKB reference proteomes and restricts the sequence and species counts displayed on the website to this set. This reduces manual curation and allows more frequent releases while retaining coverage of important model organisms. Pfam-B has been removed, and a new tool for viewing relationships between families within a clan has been introduced. Release 29.0 contains 16,295 entries and 559 clans. The authors are working with UniProt to see whether sequences from the remaining entries that lack reference proteome matches can be incorporated into the reference proteomes.
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available, and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) that have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
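To give a sense of scale for the 1.6% figure, the short Python sketch below estimates how many entries currently lack reference proteome matches. It assumes the percentage applies to the 16 295 entries of Pfam 29.0, which the abstract implies but does not state outright. The function and variable names are illustrative only.

```python
# Figures quoted in the abstract for Pfam 29.0.
TOTAL_ENTRIES = 16_295
# Assumption: the "1.6%" refers to the release 29.0 entry total.
UNMATCHED_FRACTION = 0.016


def estimate_unmatched(total_entries: int, fraction: float) -> int:
    """Return the approximate number of entries with no reference proteome match."""
    return round(total_entries * fraction)


approx_unmatched = estimate_unmatched(TOTAL_ENTRIES, UNMATCHED_FRACTION)
print(f"Approximate entries without reference proteome matches: {approx_unmatched}")
```

Because the published percentage is rounded to one decimal place, the result should be read as an estimate on the order of 260 entries, not an exact count.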