Exam (elaborations)

WGU D204 The Data Analytics Journey Questions & Answers Graded A+

Pages
110
Grade
A+
Uploaded on
22-07-2023
Written in
2022/2023

Business Understanding (order) - Answer 1st in Data Analytics Lifecycle Data Acquisition (order) - Answer 2nd in Data Analytics Lifecycle Data Cleaning (order) - Answer 3rd in Data Analytics Lifecycle Data Exploration (order) - Answer 4th in Data Analytics Lifecycle Predictive Modeling (order) - Answer 5th in Data Analytics Lifecycle Data Mining/Machine Learning (order) - Answer 6th in Data Analytics Lifecycle Reporting and Visualization (order) - Answer 7th in Data Analytics Lifecycle Business Understanding - Answer - Also known as the discovery phase or planning phase - Analyst defines the major questions of interest - Assesses the resource constraints of the project - Determines the needs of stakeholders Data Acquisition - Answer - Collecting Data - Data retrieved from DB - Use SQL to obtain from Data Warehouse - If data is not available, web scraping and surveys are used to acquire it Data Cleaning - Answer - Referred to as Data cleansing, data wrangling, data munging, and feature engineering - When this phase is ignored or skipped, the results from the analysis may become irrelevant. - There is no one common tool supporting this phase. An analyst will use SQL, Python, R, or Excel to perform various data modifications and transformations. - Data quality is measured in terms of uniqueness and relevance. Data Exploration - Answer - the analyst begins to understand the basic nature of data and the relationships within it. - This phase often relies on the use of data visualization tools and numerical summaries, such as measures of central tendency and variability. Predictive Modeling - Answer - These tools allow an analyst to move beyond describing the data to creating models that enable predictions of outcomes of interest. - Tools such as Python and R play an important role in automating the training and use of models. Data Mining - Answer - These tools became popular with the ability of computers to look for patterns in large amounts of data. 
Tools such as Python and R play an important role in this phase. - At times you may find that "machine learning" is used as a synonym for "data mining." However, some in the industry might refer to "machine learning" as a specialized segment of data mining techniques that continually update (i.e., "learn") to improve its modeling over time. Reporting and Visualization - Answer - an analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses. - Interactive dashboard tools, such as Tableau, give even the novice user the ability to interact with the data and spot trends and patterns. - Often, the goal of this phase is to provide actionable insights for various stakeholders. Business Understanding Problems - Answer Lack of clear focus on stakeholders, timeline, limitations and budget could potentially derail an analysis Data Acquisition Problems - Answer Quality and type of data may make access more difficult Data Cleaning Problems - Answer Some cleaning techniques could dramatically change data/outcomes Outliers not dealt with can cause problems with statistical models due to excessive variability. Data Exploration Problems - Answer Skipping this step could enable faulty perceptions of the data which hurt advanced analytics. Predictive Modeling Problems - Answer - Too many input variables (predictors) can cause problems - Correlation does not imply causation. - Time series models often need sufficient time data to offer precise trending. - Predictive model accuracy should be assessed using cross-validation. Data Mining Problems - Answer Running on entire data is problematic; need to subset data into training and testing datasets to build models. 
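The last two problem cards above (assessing accuracy with cross-validation, and subsetting data into training and testing datasets) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `train_test_split` and `k_fold_indices`, the 80/20 ratio, and the 100-record stand-in dataset are assumptions made for the example, not something the course material prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, test_idx) pairs; each row is held out exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


data = list(range(100))  # hypothetical stand-in for 100 records
train, test = train_test_split(data)
folds = list(k_fold_indices(len(data), k=5))
print(len(train), len(test))
print(len(folds))
```

In practice a model would be fit on each `train_idx` subset and scored on the matching `test_idx` subset, and the scores averaged; libraries such as scikit-learn provide ready-made versions of both steps.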
Reporting and visualization Problems - Answer - Due to potential large audience consumption, mistakes can cause bad business decisions and loss of revenue - Improper scales used in graphs could push for interpretations of the story that is inaccurate Descriptive - Answer Key focus: Observation Main question: What happened? Diagnostics - Answer Key focus: Explained reason Main question: Why did it happen? Descriptive Example - Answer In a healthcare setting, an unusually high number of people are admitted to the emergency room in a short period of time. ____ analytics tells you that this is happening and provides real-time data with all the corresponding statistics (date of occurrence, volume, patient details, etc.). Diagnostic Example - Answer In the healthcare example mentioned earlier, ___ analytics would explore the data and make correlations. For instance, it may help you determine that all of the patients' symptoms — high fever, dry cough, and fatigue — point to the same infectious agent. You now have an explanation for the sudden spike in volume at the ER. Predictive Example - Answer Back in our hospital example, ___ analytics may forecast a surge in patients admitted to the ER in the next several weeks. Based on patterns in the data, the illness is spreading at a rapid rate. Prescriptive Example - Answer Back to our hospital example: now that you know the illness is spreading, the ___ analytics tool may suggest that you increase the number of staff on hand to adequately treat the influx of patients. Causation - Answer there is a real-world explanation for why this is logically happening; it implies a cause and effect Show Causality - Answer A/B testing or experiments A/B testing example - Answer you own a website with a red login button. You're thinking that you should change it to blue, since it looks too ugly. To determine which color you should use, you ask your users. 
You randomly sample 100 users and show 50 users the red button and 50 users the blue button. You measure the ratio of people who login to your website for each group, and see if there's a big difference. This approach randomly assigns subjects to two groups: an A and B group. Planning - Answer 1. Define the goals 2. Organize resources 3. Coordinate people 4. Schedule project Wrangling - Answer 5. Get data 6. Clean data 7. Explore data 8. Refine data Modeling - Answer 9. Create model 10. Validate model 11. Evaluate model 12. Refine model Applying - Answer 13. Present model 14. Deploy model 15. Revisit model 16. Archive assets Data Science Process - Answer 1. Find a question 2. Collect the Data 3. Prepare the Data 4. Create a model 5. Evaluate the model 6. Deploy the model Stakeholders - Answer an individual or group that has an interest in any decision or activity of an organization. For an analyst, this could simply be the manager, or even higher executives. Stakeholder register - Answer project document that has information about the project stakeholders. It identifies the people, groups, and organizations that have any interest in the work and the outcome project sponsor - Answer responsible for identifying a project topic/scope that is well aligned with the strengths of the data analytics project and for providing access to necessary data and key support staff for continued project progress monitoring and timely feedback Data analytics program managers - Answer - provide direction for a team of data analysts. They also build that team, making hiring decisions and deciding where each analyst's skills will prove most productive for the organization. They oversee the work of an analytics department, ensuring its functionality. 
- their main goal is in collaborating with the team and make sure there is forward momentum - solidify the day-to-day activities needed for successful completion of the project iron triangle - Answer the ability of the project manager to deliver in time, cost, and quality Data privacy - Answer is responsibly collecting, using and storing data about people, in line with the expectations of those people, your customers, regulations and laws Data ethics - Answer doing the right thing with data, considering the human impact from all sides, and making decisions based on your brand values What regulators must review - Answer - What harms are they trying to protect people from? - What rights do they want to guarantee? - What problems are they trying to solve? - What are the privacy outcomes they hope to achieve for their citizens? American Statistical Association Ethical Guidelines 1 - Answer The ethical statistician uses methodology and data that are relevant and appropriate; without favoritism or prejudice; and in a manner intended to produce valid, interpretable, and reproducible results. The ethical statistician does not knowingly accept work for which he/she is not sufficiently qualified, is honest with the client about any limitation of expertise, and consults other statisticians when necessary or in doubt. It is essential that statisticians treat others with respect. American Statistical Association Ethical Guidelines 2 - Answer The ethical statistician is candid about any known or suspected limitations, defects, or biases in the data that may affect the integrity or reliability of the statistical analysis. Objective and valid interpretation of the results requires that the underlying analysis recognizes and acknowledges the degree of reliability and integrity of the data. 
American Statistical Association Ethical Guidelines 3 - Answer The ethical statistician supports valid inferences, transparency, and good science in general, keeping the interests of the public, funder, client, or customer in mind (as well as professional colleagues, patients, the public, and the scientific community). American Statistical Association Ethical Guidelines 4 - Answer The ethical statistician protects and respects the rights and interests of human and animal subjects at all stages of their involvement in a project. This includes respondents to the census or to surveys, those whose data are contained in administrative records, and subjects of physically or psychologically invasive research. American Statistical Association Ethical Guidelines 5 - Answer Science and statistical practice are often conducted in teams made up of professionals with different professional standards. The statistician must know how to work ethically in this environment. American Statistical Association Ethical Guidelines 6 - Answer The practice of statistics requires consideration of the entire range of possible explanations for observed phenomena, and distinct observers drawing on their own unique sets of experiences can arrive at different and potentially diverging judgments about the plausibility of different explanations. Even in adversarial settings, discourse tends to be most successful when statisticians treat one another with mutual respect and focus on scientific principles, methodology, and the substance of data interpretations. American Statistical Association Ethical Guidelines 7 - Answer The ethical statistician understands the differences between questionable statistical, scientific, or professional practices and practices that constitute misconduct. The ethical statistician avoids all of the above and knows how each should be handled. 
American Statistical Association Ethical Guidelines 8 - Answer Those employing any person to analyze data are implicitly relying on the profession's reputation for objectivity. However, this creates an obligation on the part of the employer to understand and respect statisticians' obligation of objectivity. Issue - Answer State the legal issue(s) to be discussed. Rule - Answer State the relevant statutes and case law. Application - Answer Apply the relevant rules to the facts that created the issue. Conclusion - Answer State the most likely conclusions using the logic of the application section and whether there has been a violation. Always Continue Learning About Your Business - Answer Take some time each week to read up on industry news so that you continuously learn about your organization's business, its challenges, and your competition. As little as an hour of reading a week can have major impacts on your ability to understand your domain, understand the business question behind your analytics, and communicate more effectively. Go Back to the Business Question - Answer Any time that you're presenting the results of your work, it's important that you're able to tie the numbers directly back to a business question that's important to your stakeholders. What do the numbers mean in relation to your organization's strategic initiatives? What action can your insights inform? In other words: Why should your stakeholders care? Don't Be Too Granular - Answer As an analyst, it can be easy to get lost in the specifics about the data. After all, that's what's so fascinating about your work! But unless specific data points are critical to the argument you're trying to make, it's better to summarize so as to not lose the attention of your key stakeholders and sponsors. If you're passionate about the detail, you can always put it in an appendix at the end of your report. 
Make the Data Easier to Consume - Answer If you're tasked with creating a report or giving a presentation for stakeholders, it's important to make the data easy to digest. Wherever possible, use graphs, charts, and other visual representations so that your audience will have an easier time understanding your data. In addition to making it easier to hold their attention, this will also make it easier for them to put the data to use moving forward. Debrief After Your Presentation - Answer After you've presented the data to your stakeholders, you should ask them if there was anything they would rather you do differently next time. Were there any numbers they wish you had focused less time on? Any that they thought deserved more time? These takeaways can be invaluable in informing how you present the information in your next presentation or report, and show your willingness to continuously improve while keeping the concerns of your organization front and center. Just Because You Can Discuss Technical Information - Doesn't Mean You Should - Answer If you perform something complex like a logistic regression, you may be tempted to discuss all of the intricate details when communicating results. But fight this temptation. Most stakeholders will not be interested in how much technical knowledge you know. They are interested in how you've answered their question Soft Skills - Answer - Persuasion - Communication - Emotional intelligence - Active listening - Logic and reasoning - Interpersonal skills - Negotiation histogram - Answer graph numerical data in "groups" or bins that allow bars to represent frequencies. Histograms are convenient graphs to show outliers in data and skewness (i.e., asymmetry of the data). For example, this is a histogram on "Years of Experience" which shows a right-sided (i.e., positive) skewness. boxplot - Answer concise summary of the quartiles of numerical data (i.e., cutpoints that divide the data into 25% percentile segments). 
This graph is also convenient for detecting outliers and skewness. For example, a boxplot of the lifespan of animals would show a positive skew with particular outliers (e.g., elephants). The middle half of the data, the interquartile range (IQR), is represented by the box in the graph.
Heatmap - Answer A colorful graph that visually shows frequency or interaction using a range of colors. Typically, red is used for the most frequent/popular outcomes and blue for the least frequent/popular outcomes. For example, a heatmap can be applied to a webpage to show where customers typically hover their mouse or click/interact with the website.
Scatterplot - Answer A two-dimensional graph that is well suited to visualizing correlation or relationships. Each dot on the scatterplot represents an outcome for two numerical variables of interest.
Regression - Answer A technique that allows us to predict an outcome (either numerical or categorical) based on a set of predictor variables.
Regression Example - Answer One might think of this process as providing an output given a set of input variables. For example, an analyst might predict the churn of customers based upon various customer demographic data.
Classification - Answer A technique in which the analyst wants to assign an item to a specific category based on various conditions.
Classification approach - Answer Locate: find the location of the item needing classification among the measurements of interest. Compare: compare this item to items in close proximity among the variables of interest. Assign: assign the item to the group it most resembles.
Classification Example - Answer If a person found an anonymous paper thought to be written by Shakespeare, they might look at various data for writings: reading level, word count, average word length, etc.
One would then place this anonymous paper among other writings on those measures (locate), see how similar it is to Shakespeare's writings vs. non-Shakespeare writings (compare), and then decide whether the paper appears to be written by Shakespeare (assign).
Clustering - Answer Attempts to discover unknown groups among a set of objects (the opposite of classification, which assigns an unknown object to one of several known groups).
Clustering Example - Answer The groupings are unknown and the analyst wishes to determine if the objects belong to any groups, and if so, how many groups exist. For example, an analyst might take search queries on the Apple company to determine if they group in any particular ways. After analyzing the data and applying clustering techniques, the analyst finds that people search on either Apple history or current Apple news.
Time Series - Answer A technique that looks for trends in data over time.
Time Series Approach - Answer When one analyzes data over time, there could be a variety of reasons why the data varies. As such, this technique is focused on breaking apart the different reasons for the variation (decomposition).
Time Series Example - Answer Although sales seem to be going up, an analyst might realize that sales always go up during that particular part of the year (i.e., seasonality). So what the analyst might have initially thought of as an actual trend was explained away by seasonality. This decomposition of time series data is essential in knowing whether any patterns in the data are true trends that should be recognized.
Principal Component Analysis (PCA) - Answer This technique attempts to group variables into meaningful groups.
Principal Component Analysis Approach - Answer An analyst attempts to find out if the variables themselves group in any meaningful way. Often, PCA is used in data reduction.
Principal Component Analysis Example - Answer Suppose a business gives a survey with 20 questions on customer service.
The analyst might not want to analyze each individual question but rather consider the overall sentiment of the survey. The analyst might perform a PCA and see that the 20 questions fall into 3 groupings (e.g., ease of effort, issue resolution, and respect). The analyst could then collapse the 20 data outcomes to just 3 and analyze these instead.
Structured Data - Answer Key Characteristic: Often numbers or labels, stored in a structured framework of columns and rows relating to pre-set parameters. Typical File Types: Databases
Unstructured - Answer Key Characteristic: Text-heavy information that's not organized in a clearly defined framework or model. Typical File Types: Audio, Video, Image data, Natural Language, Documents
Semi-structured - Answer Key Characteristic: Loosely organized into categories using meta tags. Typical File Types: JSON, XML, Email, Web pages
JSON - Answer JavaScript Object Notation. This file type offers a format that humans can easily read and write, and that machines can easily parse and generate. This type can be very beneficial when pulling data from an external source such as a web server. Since the JSON format is text only, it can easily be sent to and from a server, and used as a data format by any programming language. Since it is somewhat structured (semi-structured), it can easily be ingested into a structured database. For example, a business might use a Twitter API to pull in Twitter data about their company in JSON format, to be stored in a database for later analysis.
SQL - Answer A language that allows an analyst to interface with relational databases. A relational database is a collection of data items with pre-defined relationships between them. These items are organized as a set of tables with columns and rows. Tables are used to hold information about the objects to be represented in the database. The tables are related to one another via keys. Keys are a very important part of the relational database model.
They are used to establish and identify relationships between tables and also to uniquely identify any record or row of data inside a table. A key can be a single attribute or a group of attributes, where the combination may act as a key. In the course example, the database has three relational tables dealing with the classes college students take: student information (Student), course information (Courses), and which classes students take (Takes_Course). For these tables, the student ID (ID#) and class ID (ClassID) are the keys that can connect the tables in a query.
Tableau - Answer Helps anyone see and understand their data via visual dashboards. An analyst can connect to almost any database, drag and drop to create visualizations, and share their creations with other stakeholders easily. Tableau dashboards give stakeholders an opportunity to interact with the data via filters, parameters, and so on. An example is a Tableau dashboard showing sales data across the states in the USA, along with a time series graph.
R Objective - Answer Data analysis and statistical modeling
R Workability - Answer Consists of many easy-to-use packages
R Integration - Answer Locally run programs
R Database Handling Capacity - Answer Poses problems handling very large datasets
R IDE - Answer RStudio, R GUI
R Essential Packages and Libraries - Answer ggplot2, tidyverse, caret
Python Objective - Answer Data Science (Deep Learning/AI), Web Development, Embedded Systems
Python Workability - Answer Can easily perform matrix computation as well as optimization
Python Integration - Answer Programs integrated with web apps for easy deployment
Python Database Handling Capacity - Answer Can handle very large datasets more easily
Python IDE - Answer Spyder, IPython, Jupyter Notebook
Python Essential Packages and Libraries - Answer NumPy, pandas, SciPy, scikit-learn, TensorFlow
Data scientists are able to find ___, ____, and ___ in unstructured data.
- Answer order, meaning, and value
What is involved in the planning phase? - Answer 1. Define goals 2. Organize resources 3. Coordinate people 4. Schedule project
What is involved in the wrangling phase? - Answer 5. Get data 6. Clean data 7. Explore data 8. Refine data
What is involved in the Modeling phase? - Answer 9. Create model 10. Validate model 11. Evaluate model 12. Refine model
What is involved in the Applying phase? - Answer 13. Present model 14. Deploy model 15. Revisit model 16. Archive assets
_____ are programming languages that are very frequently used for data manipulation and modeling. - Answer Python or R
____ are general-purpose languages that are used for the back end, the foundational elements of data science, and they provide maximum speed. - Answer C, C++, and Java
____ is a language for working with relational databases to do queries and data manipulation. - Answer SQL
What does SQL stand for? - Answer Structured Query Language
This is where you actually create the statistical model: you do the linear regression, the decision tree, or the deep learning neural network. - Answer Modeling
These are the developers and the system architects, the people who focus on the hardware and the software that make data science possible. - Answer Data engineers
This is the phase of collecting data. - Answer Data acquisition
Which phase? Working with stakeholders to help them ask better questions so that both they and you understand the outcome. - Answer Discovery
What are the 4 parts of the data analytics cycle? - Answer Planning, Wrangling, Modeling, and Applying
This phase is also known as the discovery phase. During this phase, an analyst defines the major questions of interest that need to be answered, understands the needs of the stakeholders, and assesses the resource constraints of the project. - Answer Business understanding
____ is the person who champions the vision of the project and has the authority to allocate resources.
- Answer The project sponsor
___ is responsible for making sure things get done on time and within budget, and removes roadblocks. - Answer Project manager
____ is when new requirements are added to the project that increase the time/resources needed to complete it. - Answer Scope creep
What are the 3 types of analysis? - Answer Descriptive, Predictive, Prescriptive
_____ describes the data that is present: mean, median, mode, counting things. How many of each size and color of shirt were sold in the last month? Do we sell more shirts in the summer vs. winter? - Answer Descriptive analysis
_____ makes predictions about the future state of the business, for example forecasting volumes. Based on last summer and winter, what will we sell next year? - Answer Predictive analytics
_____ is analysis with an end goal of making a recommendation. What colors and sizes of shirts should we sell to maximize profits? - Answer Prescriptive analytics
____ is just looking at any variable over time. - Answer Time series analysis
______ is a programming language that is specific to statistics. It also has capabilities to visualize data. - Answer R
_____ is a multipurpose programming language that has libraries that extend its capabilities to do statistical analysis. - Answer Python
_____ are platforms that specialize in visualization. This is where you can make graphs and charts for presentations and data storytelling to executive leaders. - Answer Tableau and Power BI
____ are instant messaging platforms that facilitate communication in a faster, but less formal, way than email. - Answer Teams, Slack
A European Union law requiring that its citizens give informed consent and be able to request or delete their own data that you collect. - Answer GDPR
When the researching organization consciously ignores data that calls their results into question or only presents one side of the results that puts them in a positive light.
- Answer Conflict of interest
Sometimes data might not be available and the analyst will use tools such as web scraping or surveys to acquire it during which phase? - Answer Data acquisition
The ____ states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger (for example, if you took 50 people out of a population and computed their mean age, then took another 50 random people and computed their mean age, and so forth, all of those means would follow a normal distribution, i.e., a bell curve). - Answer Central Limit Theorem
In this phase, the analyst begins to understand the basic nature of data and the relationships within it. This phase often relies on the use of data visualization tools and numerical summaries, such as measures of central tendency and variability. - Answer Data Exploration
____ enables an analyst to move beyond describing the data to creating models that enable predicting outcomes of interest. - Answer Predictive Modeling
Tools such as _____ play an important role in automating the training and use of models. - Answer Python and R
In this phase, an analyst tells the story of the data and uses graphs or interactive dashboards to inform others of the findings from the analyses. - Answer Reporting and Visualization
Even if you have a wide spread of a variable, let's say, age in a population, and you take lots of sample groups, the mean age of those sample groups would tend to have a normal distribution. - Answer Central Limit Theorem
This is the phase of collecting data. Frequently, data will be retrieved from a database, perhaps a component of a data warehouse, by using a language like SQL. - Answer Data Acquisition
"Collect the data" is synonymous with ______ - Answer data acquisition
Exploring the data could be seen either in "____" or "____" - Answer "Prepare the data" or "Create a model"
Predictive or data mining models could be considered in the "____" grouping. - Answer Create a model
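The Central Limit Theorem cards above describe taking repeated samples of 50 people and looking at the distribution of their mean ages. A small simulation can make that concrete. The sketch below is only an illustration under stated assumptions: the skewed "population" is synthetic (exponentially distributed values with a mean near 30, not real age data), and the helper name `sample_means` and the sample counts are invented for the example.

```python
import random
import statistics

rng = random.Random(42)

# A deliberately skewed, non-normal synthetic "population" (hypothetical data).
population = [rng.expovariate(1 / 30) for _ in range(100_000)]


def sample_means(pop, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.sample(pop, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Share of sample means within one standard deviation of their own mean;
# for a roughly normal distribution this is close to 68%.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(round(pop_mean, 2), round(mean_of_means, 2))
print(round(pop_sd / 50 ** 0.5, 2), round(sd_of_means, 2))
print(round(within_one_sd, 3))
```

Even though the individual values are strongly right-skewed, the sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.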

Institution
WGU D204
Course
WGU D204












Get to know the seller

Exampool NURSING