Reliability for Network Swapping Systems that Support Migration of Remotely Swapped Pages

T. Newhall, B. Mitchell, and J. Rosse (USA)


Keywords: Cluster Computing, Network RAM, Reliability.


Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store, and to swap their pages over the network. As the number of nodes in a cluster increases, it becomes more likely that a node will fail or become unreachable, making it important that such a system provide reliability support. Without reliability, a single node crash can affect programs running on other cluster nodes by losing remotely swapped page data that was stored on the crashed node. Our network swapping system, Nswap, has design features that complicate reliability: swapped pages can migrate from one node to another in response to changes in a node's local memory needs. As a result, reliability schemes that rely on fixed placement of page and reliability data are not applicable to our system. Our reliability solutions solve the unique challenge of providing reliability to network swapping systems that both support dynamic changes to the size of remote RAM swap space and support migration of remotely swapped page data. Results show that even though our Mirroring reliability scheme adds time and space overhead to Nswap, it still outperforms swapping to disk by a factor of up to 8.2. Our dynamic Parity scheme will provide reliability with minimal time and space overhead.
