Master-checker

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Fault tolerance method for multiprocessor systems

Master-checker or master/checker is a hardware-supported fault tolerance architecture for multiprocessor systems in which two processors, referred to as the master and the checker, calculate the same functions in parallel in order to increase the probability that the result is correct. The checker CPU is synchronised at clock level with the master CPU and executes the same programs as the master. Whenever the master CPU generates an output, the checker CPU compares this output with its own result and, if the two differ, raises a warning.
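
The comparison step can be illustrated with a small software model. This is only a sketch: a real master-checker pair performs the comparison in hardware at clock level, and the names MasterCheckerPair, MismatchWarning, square and faulty_square are hypothetical, chosen for this illustration and not taken from the cited source.

```python
from dataclasses import dataclass
from typing import Callable


class MismatchWarning(Exception):
    """Raised when the checker's result differs from the master's output."""


@dataclass
class MasterCheckerPair:
    master: Callable[[int], int]   # the program as executed on the master CPU
    checker: Callable[[int], int]  # the same program, executed on the checker CPU

    def run(self, x: int) -> int:
        master_out = self.master(x)    # the master generates an output
        checker_out = self.checker(x)  # the checker computes the same function
        if master_out != checker_out:
            # The pair can detect the difference, but this model does not
            # decide which of the two results is the wrong one.
            raise MismatchWarning(
                f"input {x}: master={master_out}, checker={checker_out}"
            )
        return master_out


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Assumed fault for illustration: the lowest bit flips for one input.
    return (x * x) ^ 1 if x == 3 else x * x
```

With two fault-free processors, MasterCheckerPair(square, square).run(4) returns the agreed result. If the master is modelled with faulty_square, calling run(3) raises MismatchWarning instead of passing on the wrong value.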

A master-checker system generally gives more reliable answers, because each result is verified before it is passed on to the application that requested the computation. It also allows for error handling when the results are inconsistent. Recurring discrepancies between the two processors could indicate a flaw in the software, a hardware problem, or timing issues between the clock, the CPUs, and/or system memory. However, such redundant processing costs time and energy. If the master CPU is correct 95% or more of the time, most of the power and time that the checker CPU spends verifying answers only confirms results that were already correct. Whether a checker CPU is warranted therefore depends on how valuable a correct answer is. To reduce the cost in such situations, the checker CPU may instead be used to calculate another part of the same algorithm, increasing the speed and throughput of the system.
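
One way to tell an isolated mismatch from a recurring one is to count mismatches over a sliding window of recent comparisons. The sketch below is a hypothetical illustration: the class name DiscrepancyMonitor, the window size, and the threshold are assumptions made here, not values or mechanisms described in the cited source.

```python
from collections import deque


class DiscrepancyMonitor:
    """Tracks recent comparison outcomes and flags recurring mismatches."""

    def __init__(self, window: int = 100, threshold: int = 3):
        # Assumed policy: remember the last `window` comparisons and treat
        # `threshold` or more mismatches among them as a recurring problem.
        self.outcomes = deque(maxlen=window)  # True marks a mismatch
        self.threshold = threshold

    def record(self, master_out, checker_out) -> str:
        mismatch = master_out != checker_out
        self.outcomes.append(mismatch)
        if not mismatch:
            return "ok"
        if sum(self.outcomes) >= self.threshold:
            return "recurring"  # may point to software, hardware or timing issues
        return "warning"        # isolated discrepancy
```

A single mismatch yields "warning". Once the number of mismatches within the window reaches the threshold, record returns "recurring", which a system could use as a trigger for further diagnosis.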

References

  1. Cin, M. Dal; Grygier, A.; Hessenauer, H.; Hildebrand, U.; Hönig, J.; Hohl, W.; Michel, E.; Pataricza, A. (September 21, 1993). "Fault Tolerance in Distributed Shared Memory Multiprocessors". In Bode, Arndt; Cin, Mario (eds.). Parallel Computer Architectures. Lecture Notes in Computer Science. Vol. 732. Springer Publishing. pp. 33–35. doi:10.1007/978-3-662-21577-7_3. ISBN 978-3-540-57307-4.