D204 - MW Data Analytics Tools and Techniques with 100% correct answers
What are the defining characteristics of open data?
Data that is free to use with no cost and no restrictions.

What is the principle of informed consent in research?
Potential research participants have to be given enough information about the goals, methods, and applications of the research project so they can decide whether they want to participate.

Self-generated data refers to what practice?
Programming computers to engage with themselves to create their own training data.

While it is possible to gather vast amounts of data through passive collection, researchers still need to be concerned about representativeness. Why does this matter?
Without representative data from a wide range of respondents in diverse situations, the results will not generalize well.

What practice does "data scraping" refer to?
Data scraping refers to the process of extracting data from formats that were not specifically designed for data sharing.

What kind of data can be accessed with APIs?
Both proprietary and open data.

One common method of gathering new data is A/B testing. What does this refer to?
A/B testing is a randomized experiment in which people visiting a website see one of two different versions. The response rates are then compared to choose the most effective version of the website.

What is potentially a major advantage of using in-house data?
You may be able to talk with the people who created the datasets.

How do expert systems mimic the decision-making of experts?
By explicitly listing decisions and outcomes in a logical chain, like a flow chart.

What does it mean that algorithms like neural networks develop "implicit" rules?
It means that the rules, or the basis by which the algorithm reaches its decision, may not be easy to describe to humans.

Why have expert systems not progressed as much as machine learning in the development of decision-making systems?
Expert systems quickly encounter the "combinatorial explosion," in which there are simply too many possibilities to enumerate them all.

How do decision trees provide rules for decision-making?
Decision trees explicitly define binary decisions at each step in the data to reach the outcome; these decisions can then be used in other contexts.

Statistical applications like SPSS or jamovi are useful to data projects in what way?
Their point-and-click interfaces make common analyses easier for non-specialists to conduct.

According to the video, why are spreadsheets so important to data science?
They are the "universal data container."

What is the purpose of a "package" in a programming language like Python or R?
Packages are collections of code that give additional functionality to programming languages and simplify many common tasks.

What is "machine-learning-as-a-service" or "MLaaS"?
MLaaS is a way of making machine learning easier and more accessible by hosting the software on the same cloud servers that store the data.

How does the "combinatorial explosion" make optimization difficult?
The number of possibilities increases so fast that it is often not possible to test all possible arrangements.

Computers frequently work with data in matrices that are arranged in rows and columns. What is the name for the version of algebra that works best with matrices?
Linear algebra.

What is a major advantage of understanding the algebra behind data science procedures?
You will better understand how to diagnose problems and respond when things don't work as expected.

According to the example calculation in the video, what information do you have to have in order to use calculus?
A function that describes the relationship between price and sales.

What is a "posterior probability" in Bayes' Theorem?
A posterior probability is the probability of the cause, such as a disease, given the effect, such as a positive medical test for the disease.
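
The posterior probability described in the last answer can be worked out with a few lines of arithmetic. The sketch below is only an illustration: the prevalence, sensitivity, and specificity values are hypothetical numbers chosen for the example, not figures from the course, and the function name `posterior` is made up here.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem.

    prior        -- P(disease), the base rate (hypothetical value below)
    sensitivity  -- P(positive | disease)
    specificity  -- P(negative | no disease)
    """
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    # P(positive) is the sum of both ways to get a positive result.
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs: 1% prevalence, 95% sensitivity, 90% specificity.
p = posterior(prior=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(disease | positive test) = {p:.3f}")
```

With these assumed inputs the posterior comes out well below the test's sensitivity, because the disease is rare and most positive results are false positives.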
Which two techniques are the most common choices for dimensionality reduction?
Principal component analysis and factor analysis.

What is meant by "autocorrelation" in time-series data?
Each point in time is influenced by the points that came before it.

Which are examples of models used in predictive analytics?
Regression and neural networks.

Which characteristic makes fraud detection particularly difficult?
Because fraud is rare, it leads to imbalanced distributions, which can make modeling more difficult.

If you can only choose one number to describe a distribution, then you should choose a measure of center. But what should you choose if you can have a second number?
A measure of variability.

While anomaly detection is normally associated with negative outcomes like fraud or machine failure, it is more flexible than that. Which of the following is a positive outcome in anomaly detection?
Identifying new markets with potential value.

Why is it productive to aggregate models?
Different models tend to overestimate and underestimate their predictions, so the differences frequently cancel out.

What is meant by a "feature" in the context of feature reduction?
A variable or dimension in the data.

The technique of separating time-series data into an overall trend, a seasonal or cyclical trend, and random variations or noise is known by which term?
Decomposition.

Which are methods used for validating models?
Holdout testing data and cross-validation testing data.

Which two methods are common algorithms for classifying new cases into existing categories?
k-means and k-nearest neighbors.

What is the purpose of model validation in data science?
To determine how well the statistical model works with data other than the data used for modeling.

What is the name for a chart that shows "branches" or cases splitting from one giant cluster to individual clusters?
A dendrogram.

What does the saying "data is for doing" mean?
It means that data is typically gathered and analyzed to help direct what a person or company does.

When people make decisions based on data science analyses, what kind of factors should they focus on?
Factors that are controllable and practical.

What is the purpose of interpretability in data science projects?
Results that can be interpreted by humans can be used to form general principles for decision making in new situations.

What are methods for self-generated data?
1) External reinforcement learning 2) Generative adversarial networks 3) Internal reinforcement learning

What is the best use for the rules that are developed in neural networks?
For automating decision-making processes such as classification.

What is the purpose of aggregating the predictions of multiple models in data science?
The combined predictions tend to be more accurate and more stable than the individual predictions.

What is potentially a major disadvantage to in-house data?
The data that you need for your project may not already exist in your organization.

APIs, or Application Programming Interfaces, generally serve what function in a data science project?
APIs allow you to access data and include it in your data science programming.

Which methods can be used for feature selection?
1) Correlation 2) Stepwise regression 3) Lasso regression 4) Variable importance

What is another name for an optimization formula?
Mathematical programming.

What does it mean when a machine learning algorithm is referred to as a "black box"?
It is a process that is hidden from view or difficult to understand.
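
The answers about aggregating models say that overestimates and underestimates tend to cancel out. A minimal sketch of that idea follows, using made-up predictions from three hypothetical models (`model_a`, `model_b`, `model_c`) for a quantity whose true value is assumed to be 100. The cancellation shown here depends on the errors pointing in different directions; it is not guaranteed for every set of models.

```python
from statistics import mean

# Assumed ground truth and hypothetical model outputs (illustrative only).
true_value = 100.0
predictions = {"model_a": 92.0, "model_b": 105.0, "model_c": 104.0}

# Simple aggregation: average the individual predictions.
ensemble = mean(list(predictions.values()))

# Absolute error of each model and of the averaged prediction.
errors = {name: abs(pred - true_value) for name, pred in predictions.items()}
ensemble_error = abs(ensemble - true_value)

for name, err in errors.items():
    print(f"{name}: error = {err:.2f}")
print(f"ensemble: prediction = {ensemble:.2f}, error = {ensemble_error:.2f}")
```

In this example one model underestimates and two overestimate, so the averaged prediction lands closer to the assumed true value than any single model does.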
School, study and subject
- Institution
- Havard School
- Course
- D204 WGU:DATA ANALYTICS
Document information
- Uploaded on
- October 13, 2023
- Number of pages
- 7
- Written in
- 2023/2024
- Type
- Exam
- Contains
- Questions and answers
Topics
- d204 mw data analytics tools