Publication | Closed Access
A multi-FPGA 10x-real-time high-speed search engine for a 5000-word vocabulary speech recognizer
30 Citations | 13 References | Year: 2009
Unknown Venue
Engineering, Hardware Algorithm, Computer Architecture, Speech Recognition, Natural Language Processing, Hardware Security, Data Science, High-performance Architecture, Computational Linguistics, Video Indexing, Robust Speech Recognition, Parallel Computing, Real-time Language, Health Sciences, Computer Engineering, Computer Science, Backend Search Stage, Real-time Speed, Speech Communication, Speech Technology, Hardware Acceleration, Speech Processing, Parallel Programming, Speech Input, Speech Perception
Today's best-quality speech recognition systems are implemented in software. These systems fully occupy the resources of a high-end server to deliver results at real-time speed: each hour of audio requires a significant fraction of an hour of computation for recognition. This is profoundly limiting for applications that require extreme recognition speed, for example high-volume tasks such as video indexing (e.g., YouTube) or high-speed tasks such as triage of homeland security intelligence. We describe the architecture and implementation of one critical component of a high-speed, large-vocabulary recognizer: the backend search stage. Implemented on a multi-FPGA Berkeley Emulation Engine 2 (BEE2) platform, our design handles a standard 5000-word Wall Street Journal speech benchmark. Running at 100 MHz, a clock rate roughly 30x slower than that of a conventional server, the backend search engine decodes on average 10 times faster than real time with negligible degradation in accuracy. To the best of our knowledge, this is both the most complex and the fastest recognizer ever realized in hardware.
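The speed claims above can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the figures stated in the abstract (100 MHz clock, 10x faster than real time, a clock roughly 30x slower than a server). The ~3 GHz server clock is inferred from that ratio, not stated directly. The 10 ms frame shift is an assumption (a common speech front-end setting) that the abstract does not give. The helper names are hypothetical.

```python
# Worked arithmetic for the abstract's speed figures (illustrative sketch).
# Real-time factor (RTF) = decoding time / audio duration, so
# "10x faster than real time" corresponds to RTF = 0.1.

FPGA_CLOCK_HZ = 100e6    # 100 MHz, stated in the abstract
CLOCK_RATIO = 30         # "~30x slower than a conventional server", stated
SPEEDUP = 10             # "10 times faster than real time", stated
FRAME_SHIFT_S = 0.010    # ASSUMPTION: 10 ms frames, not given in the abstract


def real_time_factor(speedup_over_real_time: float) -> float:
    """Hypothetical helper: RTF implied by a given speedup over real time."""
    return 1.0 / speedup_over_real_time


def decode_time_seconds(audio_seconds: float, speedup_over_real_time: float) -> float:
    """Hypothetical helper: wall-clock decoding time for a stretch of audio."""
    return audio_seconds / speedup_over_real_time


one_hour = 3600.0
print(f"RTF: {real_time_factor(SPEEDUP):.2f}")  # RTF: 0.10
print(f"1 h of audio decodes in {decode_time_seconds(one_hour, SPEEDUP) / 60:.0f} min")  # 6 min

# Inferred from the stated ratio only; the server clock is not given directly.
implied_server_hz = FPGA_CLOCK_HZ * CLOCK_RATIO
print(f"Implied server clock: {implied_server_hz / 1e9:.1f} GHz")  # 3.0 GHz

# FPGA clock cycles available per second of audio when running 10x real time,
# and per frame under the assumed 10 ms frame shift.
cycles_per_audio_second = FPGA_CLOCK_HZ / SPEEDUP
cycles_per_frame = cycles_per_audio_second * FRAME_SHIFT_S
print(f"Cycles per second of audio: {cycles_per_audio_second:.0e}")  # 1e+07
print(f"Cycles per frame (assumed 10 ms): {cycles_per_frame:.0e}")   # 1e+05
```

Under these assumptions, the hardware has on the order of 10^5 clock cycles to finish all search work for each frame. This budget is what the backend search engine must meet while still beating real time by 10x.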