Data Science for Business Chapters 1&2 Questions and Answers 2024
Clustering Techniques - Correct Answer: Group entities by their shared features without a focused objective.
Lift - Correct Answer: Helps determine whether a co-occurrence (association) in the data is interesting, as opposed to simply being a natural consequence of the items' popularity.
2 types of decisions - Correct Answer: 1) Decisions for which "discoveries" need to be made within the data. 2) Decisions that repeat, especially at massive scale (even minor everyday decisions can have huge effects at scale).
Predictive modeling... - Correct Answer: ...abstracts away most of the complexity of the world, focusing on a particular set of indicators that correlate in some way with a quantity of interest.
Overfitting a dataset - Correct Answer: When you look too hard at a set of data, you may find something in it that does not generalize beyond the data you examined.
-) Algorithms attempt to prevent this.
Classification - Correct Answer: Attempts to predict which subgroup (class) an individual belongs to. SUPERVISED
Scoring / class probability estimation - Correct Answer: Gives each individual a score representing how likely it is to belong to a subgroup.
Regression (value estimation) - Correct Answer: Attempts to estimate, for each individual, the numerical value of some variable for that individual. SUPERVISED
-) "How much will a given customer use the service?"
Classification normally predicts ____(a)____, whereas regression predicts ____(b)____. - Correct Answer: a) whether something will happen; b) how much of something will happen.
Similarity Matching - Correct Answer: Attempts to identify similar individuals based on data about them. SUPERVISED/UNSUPERVISED
-) Find companies similar to your best business customers.
-) Used as a building block in other data mining tasks.
Clustering - Correct Answer: Attempts to group individuals by their similarity, but without any specific purpose. UNSUPERVISED
-) Useful for preliminary domain exploration.
-) Who links to our websites the most? (Job title, gender, gender of sender, school?)
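The Clustering card describes grouping individuals by similarity without any target. One common way to do this, not named in these notes, is k-means. The sketch below is a minimal pure-Python version, assuming each individual is a point with two numeric features (the values are hypothetical) and, for simplicity, using the first k points as the starting centroids. Library implementations usually use random or k-means++ initialisation instead, so results there do not depend on input order.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # two hypothetical numeric features per individual


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[int]:
    """Assign each point to one of k clusters with plain k-means (Lloyd's algorithm).

    Assumption for this sketch: centroids start at the first k points.
    No target variable is used anywhere, which is what makes this unsupervised.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels
```

With two well-separated groups of hypothetical customers, `kmeans(points, 2)` should put each group in its own cluster; what the clusters "mean" is left for the analyst to interpret, which is why clustering suits preliminary exploration.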
Co-occurrence grouping (frequent itemset mining, association rule discovery, market-basket analysis) - Correct Answer: Attempts to find associations between entities based on transactions involving them. UNSUPERVISED
-) Used in recommendation systems.
Profiling (behavior description) - Correct Answer: Attempts to characterize the typical behavior of an individual, group, or population. UNSUPERVISED
-) What is the typical cell phone usage of this customer segment?
-) Used to establish behavioral norms for anomaly detection applications (fraud, viruses).
Link Prediction - Correct Answer: Attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link. SUPERVISED/UNSUPERVISED
Data Reduction - Correct Answer: Attempts to take a large set of data and replace it with a smaller set that contains much of the important information in the larger set. SUPERVISED/UNSUPERVISED
-) E.g., replace individual movies with movie genres.
-) Some information is lost, but insight often improves.
Causal Modeling - Correct Answer: Attempts to help us understand which events or actions actually influence others.
-) Targeted ads were followed by improved sales, but did the ads influence customers to buy, or did they simply reach customers who were going to buy the product anyway?
-) Used in A/B testing.
A careful data scientist should always include... - Correct Answer: ...the exact assumptions that were made in reaching a causal conclusion.
-) Businesses need to weigh the trade-off between investing more to reduce the assumptions made and deciding that the conclusions are good enough given those assumptions.
Unsupervised Methods - Correct Answer: There is no specific target; groupings are based solely on patterns found in the data.
Supervised Methods - Correct Answer: A specific target is defined (e.g., will a customer leave after their contract expires?). In this case, segmentation is done for a specific reason: to take action.
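For co-occurrence grouping, lift compares how often two items appear together with how often they would co-occur if they were independent: lift(A, B) = p(A, B) / (p(A) p(B)). A value near 1 means the pair shows up together about as often as the items' individual popularity already predicts. The sketch below computes lift for every pair that co-occurs in at least one basket; the function name `pair_lift` and the toy baskets are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def pair_lift(transactions: List[Set[str]]) -> Dict[FrozenSet[str], float]:
    """Compute lift(A, B) = p(A, B) / (p(A) * p(B)) for each co-occurring pair."""
    n = len(transactions)
    item_counts: Dict[str, int] = {}
    pair_counts: Dict[FrozenSet[str], int] = {}
    for basket in transactions:
        # Count how many baskets contain each item.
        for item in basket:
            item_counts[item] = item_counts.get(item, 0) + 1
        # Count how many baskets contain each unordered pair of items.
        for a, b in combinations(sorted(basket), 2):
            key = frozenset((a, b))
            pair_counts[key] = pair_counts.get(key, 0) + 1

    lifts: Dict[FrozenSet[str], float] = {}
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        p_ab = count / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        lifts[pair] = p_ab / (p_a * p_b)
    return lifts
```

On baskets such as `[{"beer", "chips"}, {"beer", "chips"}, {"milk", "bread"}, {"milk", "chips"}]`, bread and milk get a lift of 2 (they co-occur twice as often as independence predicts), while chips and milk score below 1 because chips is simply popular.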
A vital part of the early stages of the data mining process is (i) ... and (ii) ... - Correct Answer: (i) Deciding whether the line of attack will be supervised or unsupervised, and (ii) if supervised, producing a precise definition of a target variable. This variable must be a specific quantity that will be the focus of the data mining (and for which we can obtain values for some example data).
Data Mining Flow Chart - Correct Answer: (Chart saved separately as a snip; not reproduced here.) Remember that the process is built around exploration: it iterates on approaches and strategy rather than on software designs. Outcomes are far less certain, and the results of a given step may change the fundamental understanding of the problem.
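Step (ii) asks for a precise target variable. As an illustration, the contract-expiry question from the Supervised Methods card could be pinned down as "did the customer cancel within 30 days after the contract end date?" The record layout, field names, and 30-day window below are all hypothetical choices for this sketch; the point is that the rule yields a definite 0/1 value for every historical example, so labeled training data can actually be built.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout chosen for this sketch.
    customer_id: str
    contract_end: date
    cancelled_on: Optional[date]  # None if the customer never cancelled


def churn_target(c: Customer, window_days: int = 30) -> int:
    """Return 1 if the customer cancelled on or after the contract end date and
    no later than `window_days` after it, else 0.

    The 30-day default is an illustrative assumption; cancellations before the
    contract ends are deliberately not counted as churn under this definition.
    """
    if c.cancelled_on is None:
        return 0
    deadline = c.contract_end + timedelta(days=window_days)
    return int(c.contract_end <= c.cancelled_on <= deadline)
```

Changing the window or counting early cancellations would define a different target, which is why the definition has to be settled before any modeling starts.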
Written for
- Institution: Data Science for Business
- Course: Data Science for Business
Document information
- Uploaded on: January 23, 2024
- Number of pages: 6
- Written in: 2023/2024
- Type: Exam (elaborations)
- Contains: Questions & answers