Data science.
How would you create a taxonomy to identify key customer trends in unstructured data?
Correct answer: Start by checking with the business owner to understand their objectives before categorizing the data. Then follow an iterative approach: pull new data samples, refine the model, and validate its accuracy by soliciting feedback from the business stakeholders. This helps ensure that the model produces actionable results and improves over time.

Python or R - which one would you prefer for text analytics?
Correct answer: The best answer is Python, because its pandas library provides easy-to-use data structures and high-performance data analysis tools.

Which technique is used to predict categorical responses?
Correct answer: Classification. Classification techniques are widely used in data mining to assign records in a data set to categories.

What is logistic regression? Or: state an example of when you have used logistic regression recently.
Correct answer: Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election. The outcome is binary, i.e. 1 or 0 (win or lose). The predictor variables would be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, etc.

What are recommender systems?
Correct answer: A subclass of information filtering systems that predict the preference or rating a user would give to a product. Recommender systems are widely used for movies, news, research articles, products, social tags, music, etc.

Why does data cleaning play a vital role in analysis?
Correct answer: Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process: as the number of data sources increases, the time taken to clean the data rises steeply, driven by both the number of sources and the volume of data they generate. Cleaning alone might take up to 80% of the total time, which makes it a critical part of the analysis task.

Differentiate between univariate, bivariate and multivariate analysis.
Correct answer: These are descriptive statistical analysis techniques that are differentiated by the number of variables involved at a given point in time. For example, a pie chart of sales by territory involves only one variable and can be referred to as univariate analysis.
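The logistic regression answer above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the campaign figures are made-up toy data, the function names (`fit_logistic`, `predict_proba`) are hypothetical, and plain batch gradient descent stands in for whatever fitting routine a statistics library would actually use.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Probability of a win: sigmoid of the linear combination of predictors.
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Predicted probability that the outcome is 1 (win)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data (assumption): [campaign spend, campaign time] per candidate.
X = [[1.0, 2.0], [2.0, 1.0], [1.5, 3.0], [4.0, 6.0], [5.0, 5.0], [6.0, 7.0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = win, 0 = lose

w, b = fit_logistic(X, y)
low = predict_proba([1.0, 1.0], w, b)   # little money and time spent
high = predict_proba([6.0, 6.0], w, b)  # a lot of money and time spent
print(f"P(win | low effort)  = {low:.2f}")
print(f"P(win | high effort) = {high:.2f}")
```

A predicted probability above 0.5 would typically be read as "win" and below 0.5 as "lose", which is what makes the outcome binary.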
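The univariate example above (a pie chart of sales by territory) comes down to summarizing the distribution of a single variable. A minimal sketch, assuming a made-up list holding the territory recorded for each sale:

```python
from collections import Counter

# Made-up data (assumption): the territory recorded for each sale, one variable only.
sale_territories = ["North", "South", "North", "East",
                    "North", "South", "East", "North"]

# Univariate summary: frequency and share of each value of that single variable.
counts = Counter(sale_territories)
total = sum(counts.values())
shares = {territory: count / total for territory, count in counts.items()}

for territory, share in sorted(shares.items()):
    print(f"{territory}: {share:.0%}")
```

These shares are the slice sizes a pie chart would draw.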
Written for
- Institution: Data science.
- Course: Data science.
Document information
- Uploaded on: May 15, 2023
- Number of pages: 14
- Written in: 2022/2023
- Type: Exam (elaborations)
- Contains: Questions & answers