Strategies for the effective identification of remotely related sequences in multiple PSSM search approach
Year: 2007 | Citations: 12 | References: 18
Search Optimization, Engineering, Machine Learning, Genetics, Genomics, Sequence Alignment, Gene Recognition, Multiple Profiles, Bioinformatics Database, Reference Sequence, Data Science, Data Mining, Pattern Recognition, Molecular Ecology, Computational Genomics, Biostatistics, Related Sequences, PSSM, Effective Identification, Sequence Analysis, Knowledge Discovery, Statistical Genetics, Omics, Computer Science, Functional Genomics, Bioinformatics, Signal Processing, Computational Biology, Combinatorial Pattern Matching, Systems Biology, Medicine
Searches using position-specific scoring matrices (PSSMs) are commonly used in remote homology detection procedures such as PSI-BLAST and RPS-BLAST. A PSSM is typically generated using one of the sequences of a family as the reference sequence; in PSI-BLAST searches, the reference sequence is the query itself. We have recently shown that searches against a database of multiple family profiles, in which each member of a family is used in turn as the reference sequence, are more effective than searches against the classical database of single family profiles. Despite a relatively better overall performance compared with common sequence-profile matching procedures, searches against the multiple family-profiles database still yield a few false positives and false negatives. Here we show that the profile length and the divergence of the sequences used to construct a PSSM have a major influence on the performance of the multiple-profile search approach. We also find that a simple parameter, defined as the number of a family's PSSMs hit by a query divided by the total number of PSSMs in that family, effectively distinguishes true positives from false positives in the multiple-profile search approach.
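The discriminating parameter mentioned in the abstract is a simple ratio: for a query and a family, the number of that family's PSSMs hit by the query divided by the total number of PSSMs in the family. The Python sketch below shows how such a ratio could be computed from a list of search hits. It is an illustration under stated assumptions, not the authors' implementation: a "hit" is assumed to be a (family, PSSM) pair that has already passed whatever significance cutoff the search uses, each family is assumed to contain one PSSM per member, and the 0.5 cutoff as well as the family and PSSM identifiers are hypothetical placeholders (the abstract gives no threshold value).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def family_hit_fractions(
    hits: Iterable[Tuple[str, str]],
    family_sizes: Dict[str, int],
) -> Dict[str, float]:
    """Return, per family, (#distinct PSSMs hit by the query) / (#PSSMs in family).

    hits: (family_id, pssm_id) pairs for ONE query, assumed to have already
          passed the search's significance cutoff.
    family_sizes: total PSSMs per family (assumed: one per member used as reference).
    Families with no hits are omitted (their fraction is implicitly 0).
    """
    hit_pssms: Dict[str, Set[str]] = defaultdict(set)
    for family_id, pssm_id in hits:
        # A set ensures a PSSM reported twice (e.g. two local alignments) counts once.
        hit_pssms[family_id].add(pssm_id)
    return {fam: len(ids) / family_sizes[fam] for fam, ids in hit_pssms.items()}


def classify_families(
    fractions: Dict[str, float], threshold: float = 0.5
) -> Dict[str, bool]:
    """Flag a family as a predicted true positive if enough of its PSSMs are hit.

    The 0.5 default is a HYPOTHETICAL cutoff for illustration only.
    """
    return {fam: frac >= threshold for fam, frac in fractions.items()}


# Hypothetical example: family "fam_A" has 4 PSSMs, "fam_B" has 10.
hits = [("fam_A", "a1"), ("fam_A", "a2"), ("fam_A", "a3"),
        ("fam_A", "a2"), ("fam_B", "b7")]
sizes = {"fam_A": 4, "fam_B": 10}
fractions = family_hit_fractions(hits, sizes)
print(fractions)                     # {'fam_A': 0.75, 'fam_B': 0.1}
print(classify_families(fractions))  # {'fam_A': True, 'fam_B': False}
```

Because the ratio is normalized by family size, a single chance hit against a large family scores low, while a query matched by the profiles of most family members scores high. Choosing an actual cutoff would require benchmarking, which the abstract does not describe.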