International Journal of Advances in Computer Science and Its Applications
Author(s): P. LATCHOUMY, P. SHEIK ABDUL KHADER
Grid computing provides the ability to access, utilize and control a variety of underutilized heterogeneous resources distributed across multiple administrative domains, but it is an error-prone environment. Resource failures disrupt job execution at runtime. We propose a new strategy, Fault Tolerant Job Scheduler with Efficient Job Execution using Improved Fault Tolerant Algorithm in Grid computing, which schedules grid jobs so that faults are tolerated gracefully and more jobs complete successfully within their specified deadlines. The system maintains a history of fault occurrences for each resource with respect to processor, memory and bandwidth. Whenever a resource broker has jobs to schedule, the system identifies fault-tolerant resources based on their failure rates. Resources with the lowest failure rate receive the highest scheduling priority. The job manager monitors job execution and returns the results to the user after successful completion. If a failure occurs, it reschedules the job on the next-best resource, resuming from the last saved state.
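The scheduling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define how the failure rate is computed, so the sketch assumes it is the total number of recorded faults divided by the number of jobs run on the resource, and it treats a resource with no history as having a rate of zero. The names `Resource`, `rank_resources`, `run_job` and the `execute` callback are hypothetical, and deadline handling is omitted.

```python
from dataclasses import dataclass, field


def _empty_faults():
    # Fault categories named in the abstract.
    return {"processor": 0, "memory": 0, "bandwidth": 0}


@dataclass
class Resource:
    name: str
    jobs_run: int = 0
    faults: dict = field(default_factory=_empty_faults)

    def failure_rate(self) -> float:
        # Assumption: rate = recorded faults / jobs run; 0.0 with no history.
        if self.jobs_run == 0:
            return 0.0
        return sum(self.faults.values()) / self.jobs_run


def rank_resources(resources):
    # Lowest failure rate first, i.e. highest scheduling priority.
    return sorted(resources, key=lambda r: r.failure_rate())


def run_job(total_steps, resources, execute):
    """Try resources in priority order, resuming from the last saved state.

    `execute(resource, start_step, total_steps)` is a hypothetical callback
    that returns (last_step_reached, fault_kind), with fault_kind None on
    success or one of "processor", "memory", "bandwidth" on failure.
    """
    checkpoint = 0  # last saved state of the job
    for resource in rank_resources(resources):
        resource.jobs_run += 1
        checkpoint, fault = execute(resource, checkpoint, total_steps)
        if fault is None:
            return resource.name, checkpoint
        # Record the fault so later rankings reflect it, then fall through
        # to the next-best resource.
        resource.faults[fault] += 1
    return None, checkpoint  # every resource failed
```

In this sketch the ranking is computed once per job, so a fault recorded during the job only influences the ordering used for later jobs; re-ranking after each failure would be an equally plausible design.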