Low-latency pipelined crossbar arbitration

Abstract

Heuristic, parallel, iterative matching algorithms for input-queued cell switches with virtual output queuing require O(log N) iterations to achieve good performance. If the hardware implementation of the number of iterations required is not feasible within the cell duration, the matching process can be pipelined to obtain a matching in every cell time slot. However, existing approaches incur a substantial latency penalty due to the way the pipelining is performed, which renders them unattractive in latency-sensitive applications such as parallel computer interconnects. We introduce a new class of pipelined matching algorithms that can be based on any existing iterative matching algorithm, makes the minimum latency independent of the pipeline depth, and is highly amenable to distributed implementation. Our simulation results confirm that specific instances of this class achieve significantly lower average latency throughout the load range than existing schemes do. We also propose an instantiation of the scheme that, in addition, significantly improves the performance with nonuniform traffic.

References

Page 1

	Year	Citations

Page 1