UPDATED Exam Questions and CORECT
Answers
String Variable - CORRECT ANSWER - A variable that stores a string (text) value. It is
the most complex type of variable, and can be transformed for various purposes such as
sentiment analysis.
Dates-Times Variables - CORRECT ANSWER - Seem simple but can be complex.
Important for almost every analysis and come in different formats.
Binary Variable - CORRECT ANSWER - A variable having only two possible states (e.g.,
1 or 0, yes or no, male or female, domestic or foreign)
Factors Variable - CORRECT ANSWER - Categorical variables that have a fixed and
unknown set of possible values. Useful for displaying character vectors in a non-alphabetical
order. Much easier to work with than strings.
Filtering Observations - CORRECT ANSWER - Allows you to subset observations based
on their values. It is useful for looking at specific types of observations. Often involved a
comparison operator, especially if the filter is based on a numeric type of variable (>=, <, >, etc.)
Reordering Rows - CORRECT ANSWER - Equivalent to ORDER BY in SQL and sorting
in Excel. It is helpful for quickly looking at extremes or organizing the data in alphabetical order.
Can be used together with filtering observations.
Selecting Variables - CORRECT ANSWER - It is not uncommon to get datasets with
hundreds or even thousands of variables. If that is the case, the first challenge will be narrowing
in on the variables you are truly interested in, which you can do by selecting them. You can use
helper functions to make selecting variables easier (e.g. starts_with, ends_with, contains, etc.)
, Creating New Variables - CORRECT ANSWER - Besides selecting existing variables, it
is useful to add new variables that are functions of existing ones. Often used with other
manipulation functions.
E.g.: A store manager may want to see the average price of products sold, only if these products
belong to a specific product type.
Summarizing Values - CORRECT ANSWER - Often used for numeric variables and with
a GROUP BY, which changes the unit of analysis from the complete data to specific groups.
Can involve different computations: mean, median, minimum, maximum, etc.
What package is used to import CSV files? - CORRECT ANSWER - readr package
What package is used to import Excel files? - CORRECT ANSWER - readxl package
Tibbles - CORRECT ANSWER - data frames with some changed aspects that make it
easier for them to work more effectively in the tidyverse.
What package from Tidyverse enables the transformation of data? - CORRECT
ANSWER - Dplyr
Which function allows a user to look at a subset of the rows? - CORRECT ANSWER -
filter()
Which function allows a user to reorder rows? - CORRECT ANSWER - arrange()
Which function allows a user to rename variables? - CORRECT ANSWER - rename()
Which function creates new variables? - CORRECT ANSWER - mutate()