Using Active Disks for Failure Detection: Two Phase Commit WITHOUT Blocking

E. Borowsky and R. Golding (USA)


Distributed shared memory; replication protocol; two phase commit; 2PC; network storage


Recent advances in network-attached disk technology have inspired a host of research on distributed storage systems [1, 2, 3, 4]. Part of the appeal of such systems is the opportunity they afford for widely replicated data; however, wide data redundancy brings a host of consistency issues. This paper addresses the problem of writing concurrently to multiple network-attached devices with a two-phase commit write protocol. Most work in this area proposes using three-phase commit protocols to avoid blocking [5, 6, 2]. We introduce a novel reconciliation protocol, managed by the storage devices themselves, to resolve a blocked transaction should one occur. In our system, the set of shared disks implementing a replicated object coordinates access to that object. This approach allows shorter access times in the common case where clients and storage devices do not fail, reverting to a separate procedure to resolve blocking and maintain data consistency only when failures occur.
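The abstract does not spell out the reconciliation protocol, so the following is only a minimal sketch of the general idea, assuming a textbook cooperative-termination rule rather than the authors' actual method. The names `Disk`, `State`, and `reconcile` are hypothetical. It shows a client-driven two-phase write and a fallback that the disks run among themselves if the client goes silent after the first phase.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Disk:
    """Hypothetical replica holding one copy of the replicated object."""

    def __init__(self, name):
        self.name = name
        self.state = State.IDLE
        self.value = None     # last committed value
        self.pending = None   # value staged during phase 1

    def prepare(self, value):
        # Phase 1: stage the write and vote yes.
        self.pending = value
        self.state = State.PREPARED
        return True

    def commit(self):
        # Phase 2: make the staged write visible.
        self.value = self.pending
        self.pending = None
        self.state = State.COMMITTED

    def abort(self):
        # Discard the staged write.
        self.pending = None
        self.state = State.ABORTED


def reconcile(disks):
    """Assumed fallback run by the disks when the client stops responding.

    Textbook cooperative termination, not the paper's protocol:
    commit if any peer committed, abort if any peer aborted or never
    prepared, and otherwise leave the outcome undecided.
    """
    states = {d.state for d in disks}
    if State.COMMITTED in states:
        decision = State.COMMITTED
    elif State.ABORTED in states or State.IDLE in states:
        decision = State.ABORTED
    else:
        return None  # every disk is prepared; this sketch cannot decide
    for d in disks:
        if d.state is State.PREPARED:
            if decision is State.COMMITTED:
                d.commit()
            else:
                d.abort()
    return decision
```

In the failure-free case the client calls `prepare` on every disk and then `commit` on every disk, and `reconcile` is never invoked. The all-prepared case, which this sketch leaves undecided, is exactly the blocking scenario; how the disks settle it is the subject of the paper and is not described in the abstract.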
