Exam (elaborations)

BUSN 5000 Midterm Exam | Verified Answers, Complete Solutions

Pages
17
Grade
A
Uploaded on
September 11, 2025
Written in
2025/2026

The term data is (singular/plural) _____.
Plural

A data set is made up of _____ that contain information on a specific entity.
Records

Each record is made of _____ that contain measurements of known types.
Fields

A data table is made up of rows containing _____ and columns containing _____.
Observations, variables

We say that data are tidy if each variable corresponds to a _____, each row an _____, and each cell a _____.
column, observation, single value

A quick-serve restaurant chain records sales, staffing and customer traffic every day for each store. You recognize this as a _____ data set where the unit of observation is the store-day.
Panel

We distinguish 4 stages of data analysis and refer to them compactly as _____ (in all caps).
ATAC

Name the stages of ATAC
acquisition, transformation, analysis, communication

The second stage involves, among other things, making sure the data are _____ (as the Posit folks would say).
Tidy

In the third stage, the workhorse will be the _____.
CEF

A variable will not have _____ if it does not measure what it is supposed to.
Validity

How to handle missing data depends on whether they are missing _____.
Endogenously

A national company has developed a new product and is offering it for sale at a discount to introduce it to the market. Randomly surveying customers who purchased the product in the initial discount period (would/would not) ______ generate a sample representing the population of typical customers.
Would not

It is advisable to _____ the acquisition, transformation and analysis tasks.
Separate

One reason reproducibility matters is to protect and support your _____ self.
Future

Another reason reproducibility matters is to guard against _____ and _____.
error, fraud

One important component is describing the exact _____ of your raw input data.
Source

You should view a reproducible analysis as a _____ that you should be able to produce again and again.
Product

A _____ is a representation of the data structure comprising all of the attributes of the data and their types.
data schema

This representation of the data structure identifies the _____ to which each observation pertains.
unit of record

This representation of the data also makes clear what are the _____ that identify an observation.
key variables

A terabyte is equal to a _____ bytes.
1 trillion

R stores real numbers as a _____ data type and allocates _____ bytes of data to each number.
numeric, 8

A megabyte can store _____ num values, while a terabyte can store roughly a _____ times that.
131072, 1 million

First, use the library() and data() functions to load the wooldridge package and card data set.
library(wooldridge)
data(card)

Card obtained the data from the _____.
NLSYM

The source of Card's data is a survey that began in _____ with _____ young men age 14-24.
1966, 5525

The same young men were surveyed again in selected years through _____, effectively creating a _____ data set where the unit of observation is the person-_____.
1981, panel, year

The survey was not a random sample of the US population because men from neighborhoods with a high concentration of _____ residents were over-sampled.
Black

Card's analysis is based on the 1976 survey when the youngest respondents are _____. By 1976, attrition had reduced the sample size to _____ observations. After filtering the sample on observations with valid education and wage data, Card is left with an analysis sample of _____ young men.
24, 3694, 3010

The key variable in the data set is _____.
id

The wage variable is measured in _____. The lwage variable is the _____ transformation of wage.
cents, log

The variable exper measures labor-market experience as ______.
age - educ - 6

The str() function provides an overview of the data type, size, and content in a data set. Apply it to determine the structure of the card data set and answer the questions that follow.
str(card)

The card data set contains _____ observations and _____ variables.
3010, 34

What data type is lwage? _____. How about wage? ______. (Use the full-name description of the data type in your answers.)
numeric, integer

The third person in the data set is _____ years old, has _____ years of education, has _____ years of experience, and reported a wage of $______.
34, 12, 16, 7.21

The skim() function provided by the skimr package is another useful tool for data documentation. Load skimr via a library() command and then "skim" the card data. Answer a few more questions based on the skim() output.
library(skimr)
skim(card)

How many variables have missing data? _____.
6

What percentage of young men in the sample are missing IQ test scores? _____ %. (Answer to 1 decimal place, for example: "99.9" percent)
31.5

What percentage of the sample are Black? _____ %. Is that representative of the US population in 1976? (Yes/No) _____.
23, No

Finally, use the function to estimate the amount of memory allocated to store the Card data.
(card)

Based on the Card data take up _____ MB in memory. (Round to 3 digits)
0.438

The key idea behind _____ is that one draw from a population does not depend on another.
random sampling

Because we generally do not know the underlying data-generating process, we try to ______ it from the data we observe.
Infer

The frequentist approach to probability defines the probability of some event A as the number of times it occurs out of an _____ number of random trials.
Infinite

This idea of relative frequency converging to the true probability is an example of the _____.
law of large numbers

Because earnings distributions tend to be _____ right, the _______ distribution is often a good model for earnings data.
skewed, lognormal

The _____ is the thing we want to learn about. An _____ is the thing we compute to learn about it, which for a given set of data, gives us an _____.
estimand, estimator, estimate

If E(estimator) equals the thing we want to learn about, we say that it is _____.
Unbiased

Sample selection may be a source of _____ if the data we have does not represent the population we want to learn about.
Bias

The natural log function is the inverse of the _____ function.
Exponential

The log of earnings is undefined if earnings equal _____.
0

Log transformations help us talk about _____ differences or changes.
Percentage

Comparing the earnings of women and men involves estimating the _____ expectation of _____ given _____.
conditional, earnings, gender

The concept of a random variable's expected value is a _____ average of all the random variable's possible _____.
weighted, outcomes

Because we rarely know a random variable's distribution, we typically _____ its expected value using its _____ average.
estimate, sample

The expected value of an indicator variable that takes on the values 1 and 0 is equivalent to the _____ the random variable equals _____.
probability, 1

The _____ says that the expected value of the CEF of, say, Y given X, is the expected value of Y.
law of iterated expectations

The extract was filtered to include individuals who had worked at least _____ hours per week and _____ weeks during the past year.
36, 48

The filtered extract contains _____ observations on _____ variables.
50742, 12

The variable age is top-coded at ______.
85

There are _____ categories in the variable race with individuals identifying as Asian only assigned the value _____.
21, 4

Using the 2009 CPS data, we find men are almost _____ percentage points more likely to earn at least $100,000.
9

On average, we found that men earned roughly $_____ (round to the nearest thousand) more than women, which translates into about a ______ percent (round to the nearest integer) earnings gap.
19000, 43

First, we need to load the data.
Because the March 2009 extract is contained in an .xlsx file, we will use the read_xlsx function provided by the readxl package. Load the package and read the file, note the file name is .
library(readxl)
cps09mar <- read_xlsx("./data/")

Now, use this sample code to complete the filtering operation to create a new data set called cps09mar_2534 containing only the younger workers:
cps09mar_2534 <- _____ ______ ______(age <= ______, age >= ______)
cps09mar %>%, filter, 34, 25

Use this sample code to complete the mutating operation that will create a new gender variable that takes on the values Female and Male and adds it to cps09mar_2534:
_____ <- _____ %>% mutate(_____ = case_when(_____ == 1 ~ "Female", _____ == 0 ~ "Male"))
cps09mar_2534, gender, female, female

First, let's replicate the earnings distributions shown in Figure 4 of the deck for 25-34 year-olds. Here is all the code you need. Just fill in the blank with the earnings distribution object name and see what you get:
earnings_dist_fvm <- ggplot(cps09mar_2534, aes(x=earnings, group = gender, fill = gender)) +
  geom_density(adjust=1.5, alpha = 0.4) +
  labs(title="Distribution of earnings by gender")
__________
earnings_dist_fvm

Compared with the earnings distributions in Figure 4, these show (more/less) _____ overlap.
More

Next, let's compute the gap in average earnings in dollar and percentage terms. Use code on slide 49 to carry out this calculation:
earnings_bar <- cps09mar_2534 %>% _____(_____) %>% _____ (_____ = _____(_____))
earnings_bar
group_by(gender), summarise(avg_earnings= mean(earnings))

The average gender earnings gap among 25-34 year-olds is $______ (round to the nearest hundred), which is about $_____ smaller than the gap for all workers. (Use your first answer to Part B Q6.)
10100, 8900

The dollar gap among 25-34 year-olds translates into a ______ percent (round to the nearest integer), which is about _____ percentage points smaller than the gap for all workers.
26, 17

We'll finish this exercise by calculating the gender gap in the likelihood of earning six figures among 25-34 year-olds. Use the code on slide 38 to complete this code chunk:
six_figs_fvm <- _____ %>% _____(_____) %>% _____(six_figs_shrs = mean(earnings >= _____))
print(_____)
cps09mar_2534, group_by(gender), summarise, 100000, six_figs_fvm

For 25-34 year-olds, the gender gap in the likelihood of earning at least $100,000 is only _____ percentage points. Answer to 2 decimal places, an example answer would read "1.12" percentage points.
3.54

Average earnings for men were $______ (round to the nearest integer), while average earnings for women were roughly $______ (round to the nearest 1,000) less.
64190, 19000

This average dollar difference translates into roughly a _____ (round to the nearest integer) percent earnings gap.
43

Based on Table 2, you would say male earnings increases are (more/less) ______ variable than female earnings.
More

Based on Figures 11 and 12, you would say male earnings increase (more/less) ______ rapidly than female earnings early in a career.
More

Based on Figure 12, in the first year of a career, male earnings increase _____ % on average while female increase by only _____ % (round to the nearest integer for both answers).
5, 3

Now we are ready to replicate the estimated CEFs for women and men using actual earnings.
Remember, to talk in terms of career years, we "center" age on 23:
cef_fvm <- cps_mar_2362 %>%
  mutate(age = age - ______) %>% # Center on age=23
  group_by(______, _____) %>%
  summarise ( earnings = mean(earnings) )
23, gender, age

Now, plot the estimated CEFs just like in Figure 11, except the vertical axis should show actual dollar values:
options(scipen=999)
ggplot(cef_fvm, aes(_____, _____, color=_____)) +
  geom_point() +
  geom_line() +
  ylab("Average _____ by age") +
  labs(title="CEFs of _____ by gender")
x=age, y=earnings, gender, earnings, earnings

The CEF plots indicate that the gender earnings gap (grows/shrinks) _____ over a typical career.
Grows

We'll use filter to separate the male and female CEF estimates. Then, we'll compute the ratio, put it in a new data frame with age, and list its values:
males <- _____(cef_fvm, _____ == "_____")
females <- _____(cef_fvm, _____ == "_____")
df_ratio <- (age = males$age, ratio = males$_____/females$_____)
df_ratio
filter, gender, "Male", filter, gender, "Female", earnings, earnings
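
The final exercise (split the CEF estimates by gender, then take the male-to-female ratio at each age) can be tried on made-up numbers. The sketch below is illustrative only: cef_toy and its earnings values are hypothetical stand-ins for cef_fvm, and data.frame() is an assumed constructor, since the exam's df_ratio line does not show which function builds the data frame.

```r
library(dplyr)

# Hypothetical toy stand-in for cef_fvm: average earnings by gender and career year
cef_toy <- data.frame(
  gender   = rep(c("Female", "Male"), each = 3),
  age      = rep(0:2, times = 2),
  earnings = c(30000, 32000, 34000, 33000, 36000, 40000)
)

# Separate the male and female CEF estimates
males   <- filter(cef_toy, gender == "Male")
females <- filter(cef_toy, gender == "Female")

# Assumption: data.frame() is used to collect age and the earnings ratio
df_ratio <- data.frame(age   = males$age,
                       ratio = males$earnings / females$earnings)
df_ratio
```

The element-wise division lines up correctly only because both subsets list the same ages in the same order; joining the two subsets on age would be a more defensive alternative.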

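
The storage answers earlier in the exam (8 bytes per numeric value, 131072 values per megabyte, roughly 1 million times that per terabyte) follow from simple arithmetic. The sketch below assumes the binary convention of 2^20 bytes per megabyte, which is what the 131072 answer implies; the variable names are illustrative, and object.size() is shown only as a base R utility for inspecting a vector's footprint.

```r
bytes_per_num <- 8              # R stores each numeric (double) value in 8 bytes
mb <- 2^20                      # assumed binary megabyte: 1,048,576 bytes
tb <- 2^40                      # assumed binary terabyte

nums_per_mb <- mb / bytes_per_num
nums_per_mb                     # numeric values that fit in one megabyte

tb_over_mb <- tb / mb
tb_over_mb                      # megabytes per terabyte, roughly 1 million

# A vector of that many numerics occupies about 1 MB plus a small header
x <- numeric(nums_per_mb)
object.size(x)
```

Under the decimal convention (10^6 and 10^12 bytes) the terabyte-to-megabyte factor is exactly 1 million, which matches the exam's "1 trillion bytes" answer.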
Institution
BUSN 5000
Course
BUSN 5000










Contains
Questions & answers


Get to know the seller

Jumuja Liberty University