Service recovery in a loosely coupled distributed computing environment

Authors:
James Garvin, Michael Baldwin, Willie Chang
Date:
24 May 2001

We propose a new method to strengthen the fault tolerance and recovery capabilities of critical network services in a distributed computing environment. With our approach, a service can be dynamically dispatched onto any available host. At any time, each service is not only viable but also consumes only its normal amount of resources, without duplication. In the event or indication of a system failure, services reestablish themselves on other hosts via a non-preemptive remote execution process. In the basic simulation, a vital service resides on a primary host, with a secondary host designated as standing by. The primary host performs the service until a fatal fault occurs or is indicated in the system. The secondary host then resumes the service and becomes the primary host, and yet another host is designated as the new secondary host for that service.
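The primary/secondary rotation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class `ServicePlacement`, its field names, the `spares` pool, and the host names are all hypothetical, and fault detection and the remote execution step are left out entirely. It only shows the bookkeeping, in which the secondary is promoted and a new secondary is drawn from the remaining hosts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServicePlacement:
    """Hypothetical record of where one service runs and who stands by."""
    service: str
    primary: str
    secondary: Optional[str]
    spares: List[str] = field(default_factory=list)  # assumed pool of idle hosts

    def fail_over(self) -> None:
        """Promote the secondary to primary and designate a new secondary."""
        if self.secondary is None:
            raise RuntimeError("no standby host left for " + self.service)
        # The standby host resumes the service and becomes the primary.
        self.primary = self.secondary
        # Another available host, if any, becomes the new standby.
        self.secondary = self.spares.pop(0) if self.spares else None


# Usage: hostA fails, hostB takes over, hostC becomes the new standby.
placement = ServicePlacement("name-service", "hostA", "hostB", ["hostC", "hostD"])
placement.fail_over()
```

Only one host runs the service at any time in this sketch, which mirrors the abstract's point that the service consumes its normal amount of resources without duplication.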
