Publication | Closed Access

Session state: beyond soft state

Citations: 46
References: 28
Year: 2004

Abstract

The cost and complexity of administration of large systems has come to dominate their total cost of ownership. Stateless and soft-state components, such as Web servers or network routers, are relatively easy to manage: capacity can be scaled incrementally by adding more nodes, rebalancing of load after failover is easy, and reactive or proactive (“rolling”) reboots can be used to handle transient failures. We show that it is possible to achieve the same ease of management for the state-storage subsystem by subdividing persistent state according to the specific guarantees needed by each type. While other systems [21, 19] have addressed persistent-until-deleted state, we describe SSM, an implemented store for a previously unaddressed class of state – user-session state – that exhibits the same manageability properties as stateless or soft-state nodes while providing firm storage guarantees. In particular, any node can be proactively or reactively rebooted at any time to recover from transient faults, without impacting online performance or losing data. We then exploit this simplified manageability by pairing SSM with an application-generic, statistical-anomaly-based framework that detects crashes, hangs, and performance failures, and automatically attempts to recover from them by rebooting faulty nodes as needed. Although the detection techniques generate some false positives, the cost of recovery is so low that the false positives have limited impact. We provide microbenchmarks to demonstrate SSM’s built-in overload protection, failure management and self-tuning. Finally, we benchmark SSM integrated into a production enterprise-scale interactive service to demonstrate that these benefits need not come at the cost of significantly decreased throughput or response time.
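
The abstract does not describe SSM's internals. As a rough illustration of how a node holding session state could be rebooted at any time without losing data, the Python sketch below assumes a replicated in-memory design: each write goes to W randomly chosen nodes ("bricks") and counts as successful once WQ of them acknowledge, while a read needs only one surviving copy. The names `Brick`, `SessionStore`, `w`, `wq`, and the returned "cookie" of replica names are hypothetical choices for this example; it is a single-process simulation under those assumptions, not SSM's actual implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Hypothetical in-memory storage node; a reboot wipes its contents."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def reboot(self) -> None:
        # Restarting loses everything held in memory, like a soft-state node.
        self.data.clear()
        self.up = True


class SessionStore:
    """Hypothetical store: write each session to W bricks, succeed once WQ acknowledge."""

    def __init__(self, bricks, w=3, wq=2, rng=None):
        if not 1 <= wq <= w <= len(bricks):
            raise ValueError("need 1 <= wq <= w <= number of bricks")
        self.bricks = {b.name: b for b in bricks}
        self.w, self.wq = w, wq
        self.rng = rng or random.Random()

    def write(self, key, value):
        # Pick W distinct bricks at random; any brick can hold any session.
        targets = self.rng.sample(list(self.bricks.values()), self.w)
        acked = []
        for brick in targets:
            if brick.up:
                brick.data[key] = value
                acked.append(brick.name)
        if len(acked) < self.wq:
            raise RuntimeError("write failed: fewer than WQ acknowledgements")
        # The "cookie" records which bricks hold a copy; the client keeps it.
        return tuple(acked)

    def read(self, key, cookie):
        # Any single surviving replica is enough to answer a read.
        for name in cookie:
            brick = self.bricks[name]
            if brick.up and key in brick.data:
                return brick.data[key]
        raise KeyError(key)


bricks = [Brick(f"b{i}") for i in range(5)]
store = SessionStore(bricks, w=3, wq=2, rng=random.Random(0))
cookie = store.write("user42", {"cart": ["book"]})
store.bricks[cookie[0]].reboot()  # proactive reboot of one replica
print(store.read("user42", cookie))  # → {'cart': ['book']}
```

Because no single brick is the sole holder of a session, rebooting one replica only reduces redundancy until the next write; the session stays readable as long as one copy listed in the cookie survives.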

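The abstract likewise does not specify how the statistical detector works. The sketch below swaps in a deliberately simple technique, a robust peer comparison based on the median absolute deviation (MAD), to show the general shape of a detect-and-reboot loop: nodes whose metric sits far from that of their peers are flagged and rebooted. The function names, the 3.5 threshold, and the per-node latency metric (in milliseconds) are assumptions for illustration only. When rebooting a node is cheap because its state is replicated elsewhere, an occasional false positive costs little, which is the trade-off the abstract describes.

```python
import statistics


def find_anomalous_nodes(metrics: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Hypothetical detector: flag nodes far from the peer median (robust z-score via MAD)."""
    values = list(metrics.values())
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # Most peers report exactly the median: flag any node that differs at all.
        return sorted(n for n, v in metrics.items() if v != median)
    # 0.6745 scales MAD so the score is comparable to a z-score for normal data.
    return sorted(
        n for n, v in metrics.items()
        if abs(0.6745 * (v - median) / mad) > threshold
    )


def recovery_pass(metrics, reboot, threshold=3.5):
    """Reboot every flagged node; a false positive costs only one cheap reboot."""
    flagged = find_anomalous_nodes(metrics, threshold)
    for node in flagged:
        reboot(node)
    return flagged


latency_ms = {"b0": 12.0, "b1": 11.5, "b2": 12.4, "b3": 95.0, "b4": 11.9}
rebooted = []
flagged = recovery_pass(latency_ms, rebooted.append)
print(flagged)  # → ['b3']
```

Comparing each node against its peers, rather than against a fixed limit, keeps the check application-generic: it needs no per-service tuning of what "slow" means, only the assumption that most nodes are healthy at any moment.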