Distributed System - ANSWER A collection of autonomous
computers/devices which perform a task together
Host - ANSWER Computer or device that contains one or more CPUs,
memory, and possibly disk storage
Network - ANSWER Links hosts together
Middleware - ANSWER Layer of software whose purpose is to mask the
heterogeneity of the distributed system, and to provide a convenient
programming model for programmers
Characteristics of a DS - ANSWER 1. Each host operates independently, so
there is no central authority, although hosts may be allocated as authorities on
particular tasks
2. Each host keeps time separately, so every host has its own clock;
synchronizing those clocks requires an algorithm
3. Hosts fail independently of one another
4. Hosts exchange messages to share state, use each other's resources, and notify
one another of remote changes. Message exchange is both the function of a DS and
the means of controlling it
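As an illustration of point 2, here is a minimal sketch of one clock synchronization approach, in the style of Cristian's algorithm. The flashcards do not name an algorithm, so this choice is an assumption, as are the function and parameter names and the assumption that the network delay is the same in both directions.

```python
def estimate_offset(t_send, server_time, t_receive):
    """Estimate how far the local clock is behind a time server.

    t_send      -- local clock reading when the request was sent
    server_time -- timestamp returned by the (hypothetical) time server
    t_receive   -- local clock reading when the reply arrived
    """
    round_trip = t_receive - t_send
    # Assume the reply took half the round trip to arrive.
    estimated_now = server_time + round_trip / 2
    # Positive result: the local clock is behind and should be advanced.
    return estimated_now - t_receive


offset = estimate_offset(t_send=100.0, server_time=205.0, t_receive=110.0)
```

The estimate is only as good as the symmetric-delay assumption, which is one reason synchronization between hosts is never exact.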
Design Considerations - ANSWER Openness, scalability, transparency,
management, heterogeneity, concurrency, fault tolerance, security, performance,
other issues
Openness (design consideration) - ANSWER Interfaces to the OS, network,
and services must be based on published specifications (such as standards)
Allows for new hosts to be added to the network
1. Makes it possible for one vendor to build systems which interact with those
of another vendor
2. Without such openness, the user has to buy everything from the same vendor
Concurrency (design consideration) - ANSWER Concurrency is important
because:
1. Even if the DS is composed only of hosts that each run a single process, the
distributed system as a whole will be multi-process
2. Multiple clients may try to access a service
3. Duplicated resources may provide equivalent services
So issues similar to those in multi-process operating systems must be addressed:
locking, synchronization
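A minimal sketch of the locking issue, using threads in one process as stand-ins for multiple clients accessing a service. The `Counter` class and `run_clients` helper are hypothetical names chosen for illustration.

```python
import threading


class Counter:
    """A shared resource that several clients update concurrently."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write sequence must not interleave with another
        # client's update, so it runs while holding the lock.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_clients(num_clients, requests_per_client):
    """Start num_clients threads that each send requests_per_client updates."""
    counter = Counter()

    def client():
        for _ in range(requests_per_client):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the lock in place every update is counted; without it, concurrent updates could overwrite one another.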
Scalability (design consideration) - ANSWER Scalability pros of DS:
1. Relatively cheap components may be used to build a small system at first
2. Additional components can be easily added as demand on the system
increases
For this to work we need duplicated resources, allowing us to avoid: the notion
of single master services, short fixed names, performance bottlenecks
Scalability and state:
Some resources have state information, and the duplication of state information
leads to the problem of propagating changes of state to the various copies
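The propagation problem can be sketched with a toy replicated store. This is an illustration only: the `Replica` and `ReplicatedStore` classes are hypothetical, and the write-to-every-copy strategy shown here ignores failures, message ordering, and network delay.

```python
class Replica:
    """One copy of a stateful resource."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedStore:
    """Propagates every change of state to all copies."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        # Each copy must see the change, otherwise later reads disagree.
        for replica in self.replicas:
            replica.apply(key, value)

    def read(self, key, replica_index=0):
        # Any copy can answer a read once writes have been propagated.
        return self.replicas[replica_index].data.get(key)


store = ReplicatedStore([Replica("a"), Replica("b")])
store.write("x", 1)
```

If `write` updated only one replica, a read served by the other copy would return stale state, which is exactly the problem the card describes.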
State Information - ANSWER Data that alters the resource's future responses
to requests (e.g. the state of a file server is the data held on disk; alterations to
that state change the response given to a read request)
Fault Tolerance (design consideration) - ANSWER Increased number of
components leads to an increased likelihood that at least one component fails at
any given time, so design the DS to continue to work even if any component
fails.
Use fault isolation and fault masking
Need hardware redundancy and software recovery
There are different measures of failure rate
Fault tolerance is costly
There are different failure modes
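A short worked calculation of why more components mean a higher chance of failure. It assumes, for illustration only, that components fail independently and that each has the same probability p of being failed at a given moment; under those assumptions the chance that at least one of n components has failed is 1 - (1 - p)^n.

```python
def prob_any_failure(num_components, p_fail):
    """Probability that at least one of num_components has failed.

    Assumes independent failures with the same probability p_fail each.
    """
    return 1 - (1 - p_fail) ** num_components


small_system = prob_any_failure(10, 0.01)   # roughly 0.10
large_system = prob_any_failure(100, 0.01)  # roughly 0.63
```

Even with components that are each 99% reliable, a 100-component system is more likely than not to have something broken, which is why the DS must keep working despite individual failures.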
Fault Isolation - ANSWER Fault occurring in one resource does not affect
operation of other resources (need hardware redundancy)
Fault Masking - ANSWER Take action to restore the service when a particular
resource of the service fails without the user noticing (need software recovery)
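Fault masking can be sketched as failover between redundant replicas. All names here (`ReplicaDown`, `call_with_failover`, `broken`, `healthy`) are hypothetical, and the replicas are plain functions standing in for remote servers.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful answer."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as err:
            # Hide this failure from the caller and try the next copy.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


def broken(request):
    raise ReplicaDown("primary is down")


def healthy(request):
    return request.upper()


result = call_with_failover([broken, healthy], "read")  # "READ"
```

The caller gets its answer without learning that the first replica failed; the failure only becomes visible when every replica is down.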
Types of Measures of Failure Rate - ANSWER 1. Mean Time Before Failure
(MTBF)
2. t-Fault Tolerance
Mean Time Before Failure (MTBF) - ANSWER Average period of time
between failures. Failures can be assumed to be randomly distributed, so the
failure rate can also be expressed as the probability of a failure occurring in
any given period of time
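A minimal sketch relating MTBF to the probability of failure in a period. The exponential (constant failure rate) model used here is an assumption, one common way of making "random distribution of failures" concrete; the function names are hypothetical.

```python
import math


def mtbf(total_operating_hours, num_failures):
    """Average operating time between failures."""
    return total_operating_hours / num_failures


def failure_probability(period_hours, mtbf_hours):
    """Probability of at least one failure within period_hours.

    Assumes a constant failure rate of 1 / mtbf_hours (exponential model).
    """
    return 1 - math.exp(-period_hours / mtbf_hours)


observed = mtbf(total_operating_hours=10_000, num_failures=4)  # 2500.0 hours
p_week = failure_probability(period_hours=168, mtbf_hours=observed)
```

Under this model a component is more likely than not to fail within a period equal to its MTBF, so a long MTBF is not a guarantee of failure-free operation for that long.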
t-fault Tolerance - ANSWER Must be more than t failed components before a
service fails
Can be a more intuitive measurement
Tells us that we can suffer t failures before losing a service
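A small sketch of t-fault tolerance for the simplest case: a service made of interchangeable replicas that stays up as long as one replica is alive. That replica model is an assumption for illustration (other failure modes generally need more replicas), and the function names are hypothetical.

```python
def tolerated_failures(num_replicas):
    """Return t for a service that needs at least one live replica."""
    if num_replicas < 1:
        raise ValueError("a service needs at least one replica")
    return num_replicas - 1


def service_up(num_replicas, num_failed):
    """The service survives while no more than t replicas have failed."""
    return num_failed <= tolerated_failures(num_replicas)


t = tolerated_failures(3)          # 2
still_up = service_up(3, 2)        # True: exactly t failures
lost = not service_up(3, 3)        # True: more than t failures
```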
Failure Modes - ANSWER Different ways a system can fail. We can evaluate
systems based on how well they perform in the presence of different kinds of
failures
1. Failstop
2. Crash
3. Crash+link
4. Receive omission
5. Send omission
6. General omission
7. Byzantine failure