DSE
Supervised learning - answer:
1. Input data is labeled.
2. Uses a training dataset.
3. Used for prediction.
4. Enables classification and regression.
Unsupervised learning - answer:
1. Input data is unlabeled.
2. Uses the input dataset.
3. Used for analysis.
4. Enables clustering, density estimation, and dimension reduction.
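The contrast between the two cards above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the data, the nearest-centroid classifier, and the two-cluster routine (`fit_centroids`, `predict`, `two_means`) are all hypothetical names chosen for this example.

```python
from statistics import mean

# Hypothetical 1-D measurements; the labels exist only for the supervised case.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]


def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two clusters without using any labels."""
    lo, hi = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        groups = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in xs]
        lo = mean(x for x, g in zip(xs, groups) if g == 0)
        hi = mean(x for x, g in zip(xs, groups) if g == 1)
    return groups


centroids = fit_centroids(values, labels)
prediction = predict(centroids, 4.5)  # label predicted for an unseen input
clusters = two_means(values)          # cluster ids found without labels
```

The supervised half needs `labels` to train and returns a label for a new input; the unsupervised half sees only `values` and returns anonymous group ids.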
Selection bias - answer: Bias that arises when the individuals, groups, or data selected
for analysis are not chosen by proper randomization, so the sample does not represent the
population. There are 4 types: attrition, data, time interval, and sampling bias.
Attrition - answer: A kind of selection bias caused by loss of participants, for example
by discounting trial subjects or tests that did not run to completion.
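A small simulation can show why attrition distorts results. This is a sketch under stated assumptions: the scores, the cutoff of 45, and the dropout probabilities are invented for illustration only.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical trial: each participant has an outcome score centered on 50.
outcomes = [random.gauss(50, 10) for _ in range(10_000)]


def completes(score):
    """Assumption: participants with low scores are far more likely to drop out."""
    drop_probability = 0.8 if score < 45 else 0.1
    return random.random() > drop_probability


completers = [s for s in outcomes if completes(s)]

all_mean = mean(outcomes)          # what the full cohort would have shown
completer_mean = mean(completers)  # what an analysis of completers shows
```

Because dropout depends on the outcome, analyzing only the completers overstates the average score.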
Data - answer: When specific subsets of data are chosen to support a conclusion, or data
is rejected as bad on arbitrary grounds, instead of according to previously stated or
generally agreed criteria.
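As a rough illustration of the card above, compare cherry-picking with a filter fixed in advance. The measurements and the valid range of -3 to 3 are hypothetical.

```python
from statistics import mean

# Hypothetical effect measurements from ten runs; their average is zero.
effects = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]

# Cherry-picking: keep only the runs that support "the effect is positive".
cherry_picked = [e for e in effects if e > 0]

# Pre-stated criterion (an assumption for this sketch): discard only readings
# outside the valid range of -3 to 3, decided before looking at the results.
valid = [e for e in effects if -3.0 <= e <= 3.0]

cherry_mean = mean(cherry_picked)
valid_mean = mean(valid)
```

Here `cherry_mean` is positive while `valid_mean` stays at zero, even though both start from the same runs.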
Time interval - answer: A trial may be terminated early when a variable reaches an extreme
value, but the extreme value is most likely to be reached by the variable with the largest
variance, even if all variables have a similar mean.
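The claim about variance can be checked with a simulation. This is a minimal sketch: two random walks with the same mean step (zero) but different standard deviations, and a hypothetical stopping threshold of 10.

```python
import random

random.seed(1)

THRESHOLD = 10.0  # hypothetical "extreme value" that ends a trial early


def first_to_extreme(sd_low=1.0, sd_high=3.0, max_steps=1_000):
    """Run one trial and report which variable reached the threshold first."""
    low = high = 0.0
    for _ in range(max_steps):
        low += random.gauss(0, sd_low)    # mean 0, small variance
        high += random.gauss(0, sd_high)  # mean 0, large variance
        low_hit = abs(low) >= THRESHOLD
        high_hit = abs(high) >= THRESHOLD
        if low_hit and high_hit:
            return "tie"
        if high_hit:
            return "high"
        if low_hit:
            return "low"
    return "none"


results = [first_to_extreme() for _ in range(1_000)]
high_first = results.count("high")
low_first = results.count("low")
```

Although neither variable has any real trend, the high-variance one ends the trial far more often, so stopping at an extreme value favors it.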
Sampling bias - answer: A systematic error due to a non-random sample of a population,
which makes some members of the population less likely to be included than others and
results in a biased sample.
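One way to see the effect is to compare a random sample with a sample that can only reach part of the population. The population, the ownership rates, and the app-based survey below are assumptions made up for this sketch.

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical population of (age, owns_smartphone); ownership drops with age.
population = []
for _ in range(20_000):
    age = random.randint(18, 90)
    owns = random.random() < (0.95 if age < 50 else 0.5)
    population.append((age, owns))

true_mean_age = mean(age for age, _ in population)

# Random sample: every member is equally likely to be included.
random_sample = random.sample(population, 1_000)

# Biased sample: an app-based survey can only reach smartphone owners.
reachable = [person for person in population if person[1]]
biased_sample = random.sample(reachable, 1_000)

random_mean_age = mean(age for age, _ in random_sample)
biased_mean_age = mean(age for age, _ in biased_sample)
```

The random sample lands close to the true mean age, while the biased sample underestimates it because older people were less likely to be included.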
What is the difference between "long" and "wide" format data? - answer: In the wide
format, a subject's repeated responses are in a single row, and each response is in a
separate column.
In the long format, each row is one time point per subject, so a subject with several
responses spans several rows.
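The two layouts can be converted into each other. The following sketch uses plain dictionaries; the column names (`subject`, `t1`, `time`, `response`) and the helper functions are hypothetical.

```python
# Hypothetical wide-format data: one row per subject, one column per time point.
wide = [
    {"subject": "A", "t1": 5.1, "t2": 5.4, "t3": 5.9},
    {"subject": "B", "t1": 4.8, "t2": 5.0, "t3": 5.2},
]


def wide_to_long(rows, id_column="subject"):
    """Emit one row per (subject, time point) pair."""
    return [
        {id_column: row[id_column], "time": column, "response": value}
        for row in rows
        for column, value in row.items()
        if column != id_column
    ]


def long_to_wide(rows, id_column="subject"):
    """Collect each subject's time points back into a single row."""
    by_subject = {}
    for row in rows:
        wide_row = by_subject.setdefault(row[id_column], {id_column: row[id_column]})
        wide_row[row["time"]] = row["response"]
    return list(by_subject.values())


long_rows = wide_to_long(wide)
round_trip = long_to_wide(long_rows)
```

Two subjects with three time points each give two wide rows or six long rows, and converting back recovers the original table.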
What do you understand by the term Normal Distribution? - answer: Data can be distributed
in many ways: skewed to the left, skewed to the right, or with no clear pattern at all.
However, data is often distributed around a central value with no bias to the left or
right, forming a symmetric, bell-shaped curve. This is the normal distribution, in which
the mean, median, and mode coincide.
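The symmetry of the bell curve, and the familiar rule that about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, can be checked with Python's standard library. The choice of mean 0 and standard deviation 1 is only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the density is equal at the same distance either side of the mean.
left_density = dist.pdf(-1.5)
right_density = dist.pdf(1.5)

# Share of values within 1, 2, and 3 standard deviations of the mean.
within = {k: dist.cdf(k) - dist.cdf(-k) for k in (1, 2, 3)}
```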