Exam (elaborations)

WGU D205 - Data Acquisition Exam 2023 Solved 100% Correct

Pages: 21
Grade: A+
Uploaded on: 01-03-2023
Written in: 2022/2023

Documenting ETL Requirements - ANSWER-Has three sections: (1) Requirements Summary: what the process does, the systems involved, and its type and frequency; (2) Standards: the architectural framework, strategy, and design conventions, plus a high-level description of the operation flow and its dependencies; (3) Job-Level Details: the details of each individual job.

Course Chapters - High-Level Objectives - ANSWER-1) Understanding and Describing Data 2) The Basics of SQL for Analytics 3) SQL for Data Preparation 4) Aggregate Functions for Data Analysis 5) Window Functions for Data Analysis 6) Importing and Exporting Data 7) Analytics Using Complex Data Types 8) Performant SQL 9) Using SQL to Uncover the Truth: A Case Study

Lesson 1: Understanding and Describing Data - Objectives - ANSWER-By the end of this lesson, you will be able to: a) explain data and its types; b) classify data based on its characteristics; c) calculate basic univariate statistics about data; d) identify outliers; e) use bivariate analysis to understand the relationship between two variables.

1. Define Data - ANSWER-Recorded measurements of something in the real world; a unit of observation. Two types: qualitative and quantitative.

1. Quantitative Data - ANSWER-Data that can be described as a number: height, weight, quantity, dates (attributes and properties of a thing). Results are based on objective measures, and the data can be assigned a monetary value. There are two types: discrete and continuous.

1. Quantitative Data: Discrete - ANSWER-Values that take a fixed level of precision, usually integers. Example: number of cats: 6.

1. Quantitative Data: Continuous - ANSWER-Any value that can be divided with an arbitrary amount of precision. Using cats again: "I have 6 cats that vary in weight from 6 lbs to 16 lbs." If a value can be described with higher precision, it is generally considered continuous.

1. Qualitative Data - ANSWER-Data based on subjective measures, ranked according to experience, intuition, or a specific scenario. It can also be data organized by attributes.
Continuing with the cats: you can group their weights by attribute: small (6-9 lbs), medium (9.1-12 lbs), or large (12.1-16 lbs). ("Use your words.")

1. What is Information? - ANSWER-Simply: patterns in data. You can use these patterns to make predictions and identify changes. In other words, it is the start of data analytics.

1. What is Knowledge? - ANSWER-A large, organized collection of persistent and extensive information and experience that is used to describe and predict phenomena. Combined with information, it forms the data analytics process.

1. Data Analytics Process - ANSWER-The process of gathering information from data and creating knowledge, which is then used to make predictions. We use statistics to make sense of the data.

1. Statistics - ANSWER-The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. Two types: descriptive and inferential.

1. Statistics: Descriptive - ANSWER-Used to describe data. Analysis of a single variable is univariate; analysis of two or more variables is multivariate.

1. Statistics: Inferential - ANSWER-Treats a dataset as a sample (a small portion of measurements from a larger group or population) and draws conclusions about that population. Example: election pollsters use the responses of a small group of people to infer election results.

Unit of Observation - ANSWER-The kind of object from which evidence is collected.

Can you convert quantitative data to qualitative and vice versa? - ANSWER-Yes. Qualitative data may be makes of cars; you can convert (map) each make to a number to make it quantitative.

1. Bivariate Analysis - ANSWER-Analysis that includes only two variables. The purpose is to determine the empirical relationship between them.

1. Distribution - ANSWER-A mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

1. Univariate Analysis - ANSWER-One of the main branches of statistics. Helps us understand a single variable in a dataset.
Analysis techniques may include:
* Data Frequency Distribution
* Central Tendency
* Dispersion

1. Data Frequency Distribution - ANSWER-A count of how many times each value (or each range of values) occurs in a dataset. Method: histogram. Analysis Technique for Univariate Analysis.

1. Normal Distribution - ANSWER-A symmetric, bell-shaped curve, common in many datasets. Analysis Technique for Univariate Analysis.

1. Quantiles - ANSWER-In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. N-quantiles are expressed as n-1 points used to divide a variable into n groups. The 4-quantiles (quartiles) are 3 points that divide the variable into 4 equal-sized groups (n-1 = 4-1 = 3). Analysis Technique for Univariate Analysis.

1. Central Tendency - ANSWER-Helps answer what the typical value of a variable is. Calculations include the mean, median, and mode. Analysis Technique for Univariate Analysis.

1. Central Tendency: Mode - ANSWER-The value that comes up most frequently. Issues: when multiple values are tied for most common, the variable is multimodal; if no value is repeated, there is no mode. Useful when the variable takes on a small, fixed number of values. Not useful for a continuous quantitative variable (use another measure). Analysis Technique for Univariate Analysis.

1. Central Tendency: Mean - ANSWER-Simply the average. Good for describing "typical" values. Very sensitive to outliers, which skew it (sometimes to the point that the "average" is no longer representative). Analysis Technique for Univariate Analysis.

1. Central Tendency: Median - ANSWER-Also called the second quartile or the 50th percentile. Sort the values in the dataset and take the middle value (if the dataset has an even number of values, take the average of the two middle values).
When you have outliers, the median works better than the mean. Good practice: calculate both the mean and the median; if they are significantly different, you probably have an outlier. Analysis Technique for Univariate Analysis.

1. Dispersion - ANSWER-Describes how spread out the data is. Methods: range, standard deviation/variance, and interquartile range (IQR). Analysis Technique for Univariate Analysis.

1. Dispersion: Range - ANSWER-The difference between the highest and lowest values. Susceptible to outliers, and provides no information about the spread of the values in the middle of the dataset. Analysis Technique for Univariate Analysis.

1. Dispersion: Standard Deviation/Variance - ANSWER-The variance is the average squared difference between each data point and the mean; the standard deviation is its square root. The standard deviation ranges from 0 to positive infinity. The closer it is to 0, the less the values in the dataset vary. A standard deviation of 0 means that all values of the variable are the same. There are two formulas: one for a population and one for a sample (the sample formula divides by n-1 instead of n). The standard deviation is the quantity most often used to describe dispersion. It is affected by outliers, though not as severely as the range. Analysis Technique for Univariate Analysis.

1. Dispersion: Interquartile Range (IQR) - ANSWER-The difference between the third quartile (upper quartile) and the first quartile (lower quartile), that is, the spread of the middle 50% of the data. Often used to define outliers: a common rule flags values more than 1.5 × IQR below Q1 or above Q3. Analysis Technique for Univariate Analysis.

1. Bivariate Analysis: Scatterplots - ANSWER-Useful when there is a small number of points (roughly 30-500); if the dataset is large, take a random sample of about 200 points. Useful for detecting patterns and seeing changes in the data. How do the variables interact? When one goes up, does the other go up or down? Is there periodicity (cyclical behavior)? Are there outliers (a dot far removed from the rest)? Is the trend strong or weak? Methods: Pearson correlation coefficient (linear trend) and ordinary least squares (a fitted line used to see the trend; out of scope for this class). See my Excel example. This will be covered in Lesson 3 using SQL.

1. Bivariate Analysis: Scatterplots: Pearson Correlation Coefficient - ANSWER-Ranges from -1 to +1 and can be positive or negative. The closer to zero, the weaker the linear correlation. Watch for non-linear relationships: Pearson may indicate a weak correlation even when a "U"-shaped graph shows a strong relationship. You also need enough data points: any two points lie on a perfectly straight line, but that is not enough to indicate correlation. With small datasets, remove the outliers. Another issue is the fallacy that correlation implies causation: a strong correlation does not mean that X causes Y.

1. Bivariate Analysis: Time Series - ANSWER-One of the most important types. The x coordinate is time (a date or a time). Helps a business understand how things change over time and provides business context.

Working with Missing Data - ANSWER-1) Delete rows (when roughly 5% or less of the data is missing). 2) Mean/median/mode imputation (when 5-25% is missing): impute the value. It may introduce a small bias, but at least you are not deleting data. 3) Regression imputation: hard to do. 4) Delete variables: if a lot of the data is missing, delete the variable.

1. Statistical Significance Testing - ANSWER-Used to compare the statistical properties of two groups, or of one group before and after a change. Think A/B testing: it helps determine whether one version is genuinely better or whether the difference could be due to chance.
Parts: 1) the test statistic being examined; 2) the null hypothesis (the results are due to chance); 3) the alternative hypothesis (the results cannot be explained by chance alone); 4) the significance level (the threshold, commonly 0.05, that the probability of the observed result under the null hypothesis must fall below before you decide the null hypothesis cannot explain the difference).

1. Common Statistical Significance Tests - ANSWER-1) Two-sample Z-test: determines whether the averages of two samples are different. This test assumes that both samples are drawn from normal distributions with known population standard deviations. 2) Two-sample T-test: determines whether the averages of two samples are different when the samples are small (fewer than about 30 data points per sample) or the population standard deviation is unknown. The two samples are also generally assumed to come from normal distributions. 3) Pearson's Chi-Squared Test: determines whether the distribution of data points across categories differs from what would be expected by chance. This is the primary test for determining whether proportions in a test, such as those in an A/B test, differ more than chance would explain.

Entity Relationship (ER) Modeling - ANSWER-Describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types and specifies the relationships that can exist between entities. Benefits: relatively easy to understand; can hide or expose details (drill down); maps to relational database design. Risks: wrong responsibility (disconnect between SMEs and IT); incompleteness (you can't capture everything); different notations (census vs. census!).

(ER) Entity - ANSWER-Objects, persons, events, or abstractions that are relevant in the context of the data application. An individual occurrence is called an entity instance (or instance). Entity type: a class of objects with the same characteristics. Entity instances: Britt, Eric, Sharon.
Entity Type: SET group under CFO.

(ER) Attributes - ANSWER-Instance level: a fact about an entity occurrence. Abstract level: a class of facts about instances of an entity type. Key attributes: a primary key (a unique identifier). A key can be one attribute or many (a composite key; think HelperPK).

(ER) Relationships - ANSWER-How one table relates to another: the class of facts that associates an instance of one entity type with an instance of another entity type. Cardinality is a restriction on how often an entity occurrence may participate in the collection of facts represented by the relationship. One-to-many relationships: a parent with multiple children. In the video: the dot marks the parent, a dashed line connects the two, and the child has no dot, although it may have a diamond. Foreign key: a set of attributes in a table that refers to the primary key of another table. Minimum cardinality: zero or one (optional or mandatory). Relationship readings: verb phrases for the facts, such as contains, is part of, plays in, works for.

(ER) Identifying Relationships: Foreign Key Attribute - ANSWER-The foreign key attribute is part of the child entity type's primary key. a) Cardinality for the parent: same as in a normal relationship. b) Cardinality for the child: never optional. Entity types: a) the child in an identifying relationship is weak (rounded corners); b) all others are strong (square corners).

(ER) Other Special Relationships - ANSWER-There can be more than one relationship between the same entities: Sharon works for BA and also manages the BA department (a one-to-one relationship). You can have many-to-many relationships: many employees work on a project, and each employee can work on many projects (solid line with dots at both ends). Recursive relationships relate instances of the same entity type. Up to here we have seen binary (arity 2) relationships between 2 entity types. Higher arity means associations among more entity types; a ternary (arity 3) relationship relates 3 entity types.

(ER) Arity - ANSWER-The "order" of a relationship. Generally 2; sometimes 3; rarely more.
Some models do not support arity 3, so you have to use nominalization (objectification): transform the relationship into an entity type.

(ER) Subtypes (Category/Specialization) - ANSWER-A well-defined subset of the occurrences of another entity type. The other entity type is called the supertype, generic entity, or generalization. Example: every member is a volunteer, but not all volunteers are members. The symbol is a circle with one line below it ("may be") or a double line (a complete set of categories: "must be").

Fundamentals of RDBMS (Relational Database Management Systems) - ANSWER-Client/server system: the client is the user's computer; the server stores the files. When a server stores the database and its files, it is a database server. Network: the communication links between client and server. LAN (local area network): client and server in the same location. WAN (wide area network): client and server in different geographical locations.
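
The cards on quantitative versus qualitative data, and on converting between them, can be illustrated with a small Python sketch. It assumes the cat-weight bins from the notes (small 6-9, medium 9.1-12, large 12.1-16 lbs) and treats any value falling between those ranges as belonging to the next bin up. The function names and the car makes are hypothetical.

```python
# Quantitative -> qualitative: bin each cat's weight (lbs) into the
# small/medium/large labels from the notes. Values between the notes'
# ranges (e.g. 9.05) are assumed to fall into the next bin up.
def weight_class(pounds: float) -> str:
    if pounds <= 9:
        return "small"
    if pounds <= 12:
        return "medium"
    return "large"


# Qualitative -> quantitative: map each distinct category (car make) to an
# integer code, assigned in order of first appearance.
def encode(categories):
    codes = {}
    for value in categories:
        codes.setdefault(value, len(codes))
    return codes


cat_weights = [6.0, 8.5, 9.5, 11.2, 12.0, 15.8]
labels = [weight_class(w) for w in cat_weights]

makes = ["Ford", "Toyota", "Ford", "Honda"]  # hypothetical example values
make_codes = encode(makes)
encoded_makes = [make_codes[m] for m in makes]
```

The integer codes only identify categories. Their order and spacing carry no meaning, so averaging them would not make sense.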
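
For the frequency distribution and central tendency cards, here is a minimal sketch using Python's standard `statistics` module on a hypothetical dataset with one extreme value. It shows why the notes recommend computing both the mean and the median.

```python
from collections import Counter
import statistics

# Hypothetical sample: cats per household, including one extreme value (40).
cats_per_household = [1, 2, 2, 3, 2, 1, 4, 40]

# Frequency distribution: how often each value occurs (the bar heights of a
# histogram for discrete data).
frequency = Counter(cats_per_household)

mean = statistics.mean(cats_per_household)        # pulled upward by the outlier
median = statistics.median(cats_per_household)    # middle of the sorted values
modes = statistics.multimode(cats_per_household)  # every value tied for most common

# Rule of thumb from the notes: a large mean/median gap hints at an outlier.
gap = mean - median
```

Here the mean is 6.875 while the median is 2, so the gap points straight at the value 40. One caveat: `statistics.multimode` returns every value when nothing repeats. The notes treat that case as having no mode, so a check such as `max(frequency.values()) > 1` may be needed first.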
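
The quartile and IQR cards can be sketched as follows. This assumes the common 1.5 × IQR rule (Tukey's fences) for flagging outliers. The helper name `iqr_outliers` and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return quartiles, IQR, and values outside Q1 - k*IQR .. Q3 + k*IQR."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # 3 cut points -> 4 groups
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return (q1, q2, q3), iqr, outliers


sample = [7, 2, 10, 4, 12, 5, 30, 4, 8, 6, 9]  # hypothetical data
quartiles, iqr, outliers = iqr_outliers(sample)
```

`statistics.quantiles` uses the "exclusive" interpolation method by default. `method="inclusive"` and other tools (for example SQL's `PERCENTILE_CONT`) may interpolate differently, so quartiles on small datasets can differ slightly between tools.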
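
Here is a short sketch of the dispersion measures: range, population standard deviation, and sample standard deviation. It computes the population standard deviation once by hand from its definition and once with the standard library. The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

value_range = max(data) - min(data)

# Population standard deviation written out from the definition: square root
# of the average squared difference between each point and the mean.
mean = sum(data) / len(data)
pop_sd_manual = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

pop_sd = statistics.pstdev(data)      # population formula: divides by N
sample_sd = statistics.stdev(data)    # sample formula: divides by N - 1
constant_sd = statistics.pstdev([5, 5, 5, 5])  # identical values -> 0
```

Dividing by N - 1 makes the sample standard deviation slightly larger than the population value for the same numbers.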
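
The Pearson correlation card warns about non-linear relationships and about having too few points. The hand-written Python sketch below makes both warnings concrete. The function `pearson_r` is hypothetical; Python 3.10+ also offers `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # perfectly linear relationship
u_shape = [x * x for x in xs]     # perfect but non-linear ("U" shape)

r_linear = pearson_r(xs, linear)            # +1: strong linear trend
r_u = pearson_r(xs, u_shape)                # 0: Pearson misses the U shape
r_two_points = pearson_r([1, 2], [5, 3])    # -1: any two points form a line
```

Plotting the data before trusting r is what catches the "U" case.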
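
The missing-data card can be sketched as a choice between deleting rows and mean imputation. The 5% and 25% thresholds come from the notes and are rules of thumb. The column values and the helper `choose_strategy` are hypothetical, and regression imputation is not shown.

```python
import statistics


def choose_strategy(missing_share: float) -> str:
    """Rough thresholds taken from the notes; rules of thumb, not laws."""
    if missing_share <= 0.05:
        return "delete rows"
    if missing_share <= 0.25:
        return "impute mean/median/mode"
    return "consider deleting the variable"


# Hypothetical column; None marks a missing value.
ages = [34, None, 29, 41, None, 38, 30, 36, 33, 39]

missing_share = sum(v is None for v in ages) / len(ages)
observed = [v for v in ages if v is not None]   # what row deletion keeps

fill = statistics.mean(observed)                # mean imputation value
imputed = [fill if v is None else v for v in ages]
strategy = choose_strategy(missing_share)
```

The "small bias" the notes mention shows up here: imputed values sit exactly at the mean, so `statistics.pstdev(imputed)` is smaller than `statistics.pstdev(observed)`.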
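
The Z-test and T-test described above can be sketched with the standard library. The Z-test uses known population standard deviations and gets a two-sided p-value from the normal distribution. The T-test part computes Welch's t statistic, a variant that does not assume equal variances; turning t into a p-value needs the t distribution, which a library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would normally supply. All numbers and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # significance level chosen before looking at the data


def two_sample_z(mean_a, mean_b, sigma_a, sigma_b, n_a, n_b):
    """Two-sample Z-test when the population standard deviations are known."""
    standard_error = math.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value: probability of a |Z| at least this large under the null.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom.

    Uses sample variances, so no known population standard deviation is needed.
    """
    var_a = variance(a) / len(a)  # squared standard error of each sample mean
    var_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a**2 / (len(a) - 1) + var_b**2 / (len(b) - 1))
    return t, df


# Hypothetical summary numbers for a Z-test.
z, p = two_sample_z(mean_a=52, mean_b=50, sigma_a=5, sigma_b=5, n_a=100, n_b=100)
reject_null = p < ALPHA

# Hypothetical small samples (fewer than 30 points each) for a t-test.
t, df = welch_t([5.1, 4.9, 5.6, 5.8, 6.0], [4.2, 4.5, 4.8, 4.1, 4.4])
```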
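
Pearson's chi-squared test for an A/B test can be sketched on a 2×2 table of (converted, not converted) counts. With one degree of freedom, the chi-squared statistic is the square of a standard normal variable, so the p-value can be taken from `NormalDist` without other libraries. The counts are hypothetical. This sketch applies no continuity correction, whereas `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default.

```python
import math
from statistics import NormalDist


def chi_squared_2x2(table):
    """Pearson's chi-squared statistic and p-value for a 2x2 table (1 df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the variant had no effect on conversion.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 >= x) = P(|Z| >= sqrt(x)).
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value


# Hypothetical A/B test: rows are variants, columns are (converted, not converted).
ab_table = [[200, 800],
            [250, 750]]
chi2, p = chi_squared_2x2(ab_table)
```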
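
The relationship and foreign-key cards can be sketched with Python's built-in `sqlite3` module. This example covers a one-to-many relationship (department to employee) and a recursive relationship (employee managed by employee). The table and column names, and the employee "Pat", are hypothetical; the department BA and the names Sharon, Britt, and Eric come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One-to-many: each employee row points at exactly one parent department.
-- manager_id is a recursive relationship: an employee managed by an employee.
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id),
    manager_id INTEGER REFERENCES employee(emp_id)  -- NULL allowed: optional
);
""")

conn.execute("INSERT INTO department VALUES (1, 'BA')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Sharon", 1, None), (2, "Britt", 1, 1), (3, "Eric", 1, 1)],
)

# The foreign key rejects a child whose parent does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Pat', 99, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Follow the recursive relationship with a self-join.
reports = conn.execute("""
    SELECT e.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    WHERE m.name = 'Sharon'
    ORDER BY e.name
""").fetchall()
```

`dept_id` is `NOT NULL` (mandatory, minimum cardinality one) while `manager_id` is nullable (optional, minimum cardinality zero). That is one way the optional/mandatory distinction can surface in a schema.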
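
The many-to-many relationship and nominalization cards can be sketched the same way. The relationship between employees and projects becomes its own table (an associative entity) whose composite primary key is made of two foreign keys. This is a common relational mapping, shown here under assumption; the table names, project titles, and `role` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The many-to-many relationship turned into a table. Its composite primary
-- key is two foreign keys, and it can carry attributes of its own (role).
CREATE TABLE assignment (
    emp_id     INTEGER NOT NULL REFERENCES employee(emp_id),
    project_id INTEGER NOT NULL REFERENCES project(project_id),
    role       TEXT,
    PRIMARY KEY (emp_id, project_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, "Sharon"), (2, "Britt"), (3, "Eric")])
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(10, "Data Warehouse"), (20, "Dashboard")])
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)",
                 [(2, 10, "analyst"), (3, 10, "developer"), (3, 20, "developer")])

# The composite key stops the same employee/project pair being stored twice.
try:
    conn.execute("INSERT INTO assignment VALUES (2, 10, 'tester')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Many employees per project...
staff_on_10 = conn.execute("""
    SELECT e.name FROM employee AS e
    JOIN assignment AS a ON a.emp_id = e.emp_id
    WHERE a.project_id = 10 ORDER BY e.name
""").fetchall()

# ...and many projects per employee.
projects_for_eric = conn.execute("""
    SELECT p.title FROM project AS p
    JOIN assignment AS a ON a.project_id = p.project_id
    WHERE a.emp_id = 3 ORDER BY p.title
""").fetchall()
```

A ternary relationship can be handled the same way by adding a third foreign key to the relationship table.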
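
The subtype card ("every member is a volunteer, but not all volunteers are members") can be sketched with a shared primary key. The subtype table's key is also a foreign key to the supertype. This is one common mapping, chosen here as an assumption; the table names and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE volunteer (            -- supertype
    volunteer_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE member (               -- subtype: a subset of volunteers
    volunteer_id INTEGER PRIMARY KEY REFERENCES volunteer(volunteer_id),
    joined_on    TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO volunteer VALUES (?, ?)",
                 [(1, "Sharon"), (2, "Britt"), (3, "Eric")])
conn.execute("INSERT INTO member VALUES (1, '2023-01-15')")

# Every member is a volunteer: a member row with no volunteer row is rejected.
try:
    conn.execute("INSERT INTO member VALUES (99, '2023-02-01')")
    member_without_volunteer = True
except sqlite3.IntegrityError:
    member_without_volunteer = False

# ...but not every volunteer is a member ("may be", not "must be").
non_members = conn.execute("""
    SELECT v.name FROM volunteer AS v
    LEFT JOIN member AS m ON m.volunteer_id = v.volunteer_id
    WHERE m.volunteer_id IS NULL
    ORDER BY v.name
""").fetchall()
```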

Institution: WGU D205
Course: WGU D205
Seller: LUCKYSTAR2022 (West Virginia University)