DATA MINING TEST STUDY GUIDE
QUESTIONS AND ANSWERS
consistency - Answer-whether a value agrees with the other values in the data set or
conflicts with them
believability - Answer-whether the values are trustworthy, i.e. whether we can trust the
data and the data source
timeliness - Answer-whether the data are up to date and updated in time to be useful
(updatability)
interpretability - Answer-whether the data are easy to understand
why do we pre-process data - Answer-to reduce redundancy, such as the same data
appearing more than once in the data set
to save time during the data analysis phase of data mining
to handle incomplete data by finding plausible values to replace the missing data
to clean noisy data from the data set
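As an illustration of cleaning noisy data, the sketch below uses smoothing by bin means, one common technique among several: the values are sorted, split into equal-size bins, and each value is replaced by the mean of its bin. The function name `smooth_by_bin_means` and the sample prices are hypothetical, chosen only for this example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the same mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Hypothetical noisy attribute (prices), smoothed in bins of three.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
```

Larger bins smooth more aggressively but also blur more of the real variation in the data.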
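what are some procedures for handling missing values - Answer-ignoring the tuple,
which risks getting rid of valuable data and leaves a smaller, incomplete data set
filling in the missing value manually
filling in the missing value with a global constant, the attribute mean or median, or
the most probable value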
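As a minimal sketch of one common way to handle missing values, the snippet below fills each missing entry with the mean of the observed values for that attribute. The function name `fill_with_mean` and the sample ages are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # No observed values to average; return the data unchanged.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical attribute with two missing entries.
ages = [20, None, 30, 40, None]
print(fill_with_mean(ages))
```

Unlike dropping the incomplete tuples, this keeps every record, at the cost of pulling the filled values toward the average.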
What is data mining - Answer-the process of discovering interesting patterns and
knowledge from large amounts of data
What are the steps of the knowledge discovery (KDD) process? - Answer-data
cleaning
data integration
data selection
data transformation
data mining
pattern evaluation
knowledge presentation
data cleaning - Answer-to remove noise and inconsistent data
data integration - Answer-where multiple data sources may be combined
data selection - Answer-where data relevant to the analysis task are retrieved from
the database
data transformation - Answer-where data are transformed and consolidated into
forms appropriate for mining by performing summary or aggregation operations
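As a rough illustration of transformation by aggregation, the sketch below consolidates daily sales records into one total per month. The function name `aggregate_monthly`, the record layout (`'YYYY-MM-DD'` date string, amount), and the sample figures are all assumptions made for this example.

```python
from collections import defaultdict

def aggregate_monthly(daily_sales):
    """Sum (date, amount) records into one total per 'YYYY-MM' month."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        month = date[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[month] += amount
    return dict(totals)

# Hypothetical daily sales records.
daily_sales = [
    ("2024-01-03", 100.0),
    ("2024-01-17", 50.0),
    ("2024-02-02", 75.0),
]
print(aggregate_monthly(daily_sales))
```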
regression - Answer-used to predict missing or unavailable numerical data values
rather than class labels
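To make the idea of predicting a numerical value concrete, here is a small sketch of simple linear regression fitted by least squares. It is only one form of regression; the function name `fit_line` and the years-of-experience and salary figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for paired numeric data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4]
salary = [30.0, 35.0, 40.0, 45.0]
slope, intercept = fit_line(years, salary)

# Predict the unavailable salary value for 5 years of experience.
predicted = slope * 5 + intercept
print(predicted)
```

The output is a number on a continuous scale, which is what separates regression from classification, where the output is a class label.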
clustering - Answer-used to generate class labels for a group of data
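The sketch below shows how clustering can produce labels for unlabeled data, using a tiny one-dimensional k-means as an example technique. The function name `kmeans_1d`, the starting centers, and the sample points are assumptions for illustration, not a prescribed method.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: returns a cluster label per point and the final centers."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

# Hypothetical unlabeled values that fall into two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
labels, centers = kmeans_1d(points, centers=[0.0, 5.0])
print(labels, centers)
```

No class labels are supplied as input; the labels come out of the grouping itself.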