Summary
Centralized system management usually requires the managed systems to access
some network-based naming, directory or authentication service such as NIS,
RADIUS or Active Directory. Although these management services are designed
to allow redundancy, they are a critical component of the system
architecture: when they fail, or even when merely the network connectivity
between a managed system and the management service fails, the managed
systems operate in a degraded mode or fail completely.
Thus, even with redundancy implemented, the management services constitute a single point of failure.
Node Director Solution
While the Node Director uses an LDAP directory service for storing management
information, managed systems can be set up so that they do not need access to
this directory service: instead of binding them to the LDAP directory, the
Node Director provides functionality to update the native local databases
on every managed system with the centrally managed information.
Thus, for example, user accounts are held in /etc/passwd, authentication information in /etc/shadow, and so on, which makes the systems self-contained without losing the advantages of a centrally managed environment.
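To make the idea of "updating the native local databases" more concrete, here is a minimal sketch of rendering centrally stored user accounts into /etc/passwd format. It assumes the directory holds RFC 2307 posixAccount-style attributes (uid, uidNumber, gidNumber, gecos, homeDirectory, loginShell) and that locally defined accounts such as root are kept alongside the managed ones. The record layout, the merge rule and the function names `passwd_line`, `render_passwd` and `write_atomically` are hypothetical illustrations, not the Node Director's actual implementation.

```python
import os
import tempfile
from typing import Iterable, Mapping


def passwd_line(entry: Mapping[str, object]) -> str:
    """Format one posixAccount-style entry (hypothetical layout) as an
    /etc/passwd line. The password field is "x", meaning the hash is
    kept in /etc/shadow."""
    fields = [
        str(entry["uid"]),
        "x",
        str(entry["uidNumber"]),
        str(entry["gidNumber"]),
        str(entry.get("gecos", "")),
        str(entry["homeDirectory"]),
        str(entry.get("loginShell", "/bin/sh")),
    ]
    # ':' separates fields and '\n' separates records, so neither may appear.
    for field in fields:
        if ":" in field or "\n" in field:
            raise ValueError(f"illegal character in passwd field: {field!r}")
    return ":".join(fields)


def render_passwd(local: Iterable[str],
                  managed: Iterable[Mapping[str, object]]) -> str:
    """Merge local lines with centrally managed accounts (assumed rule:
    a managed account replaces a local line with the same user name)."""
    managed_lines = {str(e["uid"]): passwd_line(e) for e in managed}
    out = [line for line in local
           if line.split(":", 1)[0] not in managed_lines]
    out.extend(managed_lines[name] for name in sorted(managed_lines))
    return "\n".join(out) + "\n"


def write_atomically(path: str, content: str, mode: int = 0o644) -> None:
    """Write to a temporary file in the same directory and rename it over
    `path`, so local readers never see a half-written database."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".nd-tmp.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic rename on POSIX file systems
    except BaseException:
        os.unlink(tmp)
        raise
```

An update run would read the current local lines, fetch the entries from the directory, and pass the result of `render_passwd` to `write_atomically("/etc/passwd", ...)`. /etc/shadow would be handled analogously, with the password hashes and a more restrictive file mode. Writing through a temporary file and a rename is a design choice so that local lookups keep working even while an update is in progress.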
Pros and Cons
- + Network failures
- While network failures usually cause degradation of systems that rely on a network-based naming or directory service, with secondary effects such as services going down because users are temporarily unknown or services blocking until the name service is available again, the systems in the Node Director scenario above keep operating unimpaired. Of course, they are inaccessible as long as their network connectivity is cut, but they need no recovery once connectivity has been restored.
- + Management system failures
- The loss of the management system, consisting of the Node Director and its management database, has no effect on the production environment. Only the ability to manage the systems is degraded; the systems themselves keep operating.
- + Manageability of degraded system environments
- Since all the centrally managed information is kept locally, urgent system management tasks can still be performed locally if the management system fails.
- + System independence
- Holding a local copy of the management information makes the systems more independent and self-contained. It is even possible to disconnect them temporarily or permanently from the management system (Node Director) without any consequences: they keep the last state they had while still managed, and from then on they can be administered locally.
- - Replication overhead
- The need to replicate the management information causes some overhead. Changes in the management database do not take effect until they have been replicated to all managed systems.
- - Inconsistencies
- The extensive use of replication increases the risk of temporary inconsistencies, e.g. when a network failure prevents some managed systems from receiving an update.