Questions and CORRECT Answers
Which Pandas function from the options given below can read a dataset from a large text file? -
CORRECT ANSWER - read_csv
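A minimal sketch of read_csv() on text data. To keep it self-contained, an in-memory io.StringIO buffer stands in for the large file, and the rows are made up. With a real file you would pass its path instead, and sep="\t" handles tab-delimited text. The chunksize argument is one common way to process a file that does not fit in memory.

```python
import io

import pandas as pd

# Made-up contents standing in for a large comma-separated text file.
raw_text = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

# Read everything at once into a single DataFrame.
df = pd.read_csv(io.StringIO(raw_text))
print(df.shape)  # (5, 2)

# Read the same text two rows at a time and aggregate as we go.
total = 0
for chunk in pd.read_csv(io.StringIO(raw_text), chunksize=2):
    total += chunk["value"].sum()
print(total)  # 150
```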
Which among the following options can be used to create a DataFrame in Pandas? -
CORRECT ANSWER - NumPy array, dictionary, list
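Each of the three sources in the answer can build the same small DataFrame. The column names a and b and the values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array, with column names supplied explicitly.
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From a dictionary: the keys become the column names.
from_dict = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# From a list of lists: each inner list becomes one row.
from_list = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

print(from_array.values.tolist())  # [[1, 2], [3, 4]]
```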
A Pandas Series is a one-dimensional array which is labeled and can hold any data type. -
CORRECT ANSWER - True
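A short sketch of a one-dimensional, labeled Series holding mixed types. The labels w to z and the values are made up for illustration.

```python
import pandas as pd

# Every value carries a label from the index; the values can be of any type.
s = pd.Series([10, "text", 3.5, None], index=["w", "x", "y", "z"])

print(s.ndim)    # 1
print(s["x"])    # text
print(s.dtype)   # object (mixed types fall back to the object dtype)
```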
If custom row indexes and custom column titles are not provided, the index and column titles
will default to "blanks" or Null values. - CORRECT ANSWER - False (pandas assigns integer labels starting at 0)
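When no index or column names are passed, pandas numbers both from 0. A quick check with made-up values:

```python
import pandas as pd

# No index= or columns= arguments are given here.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

print(list(df.index))    # [0, 1]
print(list(df.columns))  # [0, 1, 2]
```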
To retrieve values from a Pandas Data Frame we can reference rows and columns by name or by
index value. - CORRECT ANSWER - True
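The same value can be reached by name with loc[] or by integer position with iloc[]. The row labels r1 to r3 and the data are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [28, 35, 41]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "age"]   # row label and column name
by_position = df.iloc[1, 1]      # row position 1, column position 1
print(by_label, by_position)     # 35 35
```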
Pandas does not have its own built-in statistical functions, so we have to use other libraries or
modules to perform statistical analysis. - CORRECT ANSWER - False
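A few of the statistical methods pandas provides directly, applied to made-up scores. Note that std() computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

scores = pd.Series([70, 80, 90, 100])

print(scores.mean())    # 85.0
print(scores.median())  # 85.0
print(scores.std())     # sample standard deviation, ddof=1 by default

# describe() bundles count, mean, std, min, quartiles and max.
summary = scores.describe()
print(summary)
```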
In pandas functions or methods such as drop(), sort_values(), mean(), etc., you can use the keyword
argument axis to specify whether the operation applies to rows or columns. Which of the following are true
relating to the axis argument? Select ALL CORRECT ones. - CORRECT ANSWER -
axis=0 in drop() drops rows; if axis is not specified, it is equivalent to axis=0
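A small sketch of how axis changes the behaviour of drop() and mean(). The DataFrame and its labels are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# axis=0 (the default) targets row labels, so drop() removes a row.
no_row_x = df.drop("x")          # same as df.drop("x", axis=0)

# axis=1 targets column labels, so drop() removes a column.
no_col_b = df.drop("b", axis=1)

# For aggregations, axis=0 collapses the rows (one result per column)
# and axis=1 collapses the columns (one result per row).
col_means = df.mean()            # a: 1.5, b: 3.5
row_means = df.mean(axis=1)      # x: 2.0, y: 3.0
```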
Unlike a Pandas DataFrame, a Pandas Series cannot use strings as the row index. -
CORRECT ANSWER - False
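A Series with string labels, using made-up prices:

```python
import pandas as pd

prices = pd.Series([1.25, 0.5, 3.0], index=["apple", "banana", "cherry"])
print(prices["banana"])  # 0.5
```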
How are missing values represented in Pandas DataFrames and Series? -
CORRECT ANSWER - np.nan or nan (sometimes printed out as NaN)
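A sketch showing that missing entries are stored as NaN and can be detected with isna(). The values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s)                   # the missing entry is printed as NaN
print(s.isna().tolist())   # [False, True, False]

# None inside a numeric list is also stored as NaN.
t = pd.Series([1, None, 3])
print(t.dtype)             # float64
print(t.isna().sum())      # 1
```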
Which of the following is true about value_counts()? - CORRECT ANSWER - It is a
method for Series; it returns the frequency of unique values
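Counting made-up embarkation ports with value_counts(). The result is sorted by frequency, most common first. Recent pandas versions also offer DataFrame.value_counts(), but the Series method is the one the question refers to.

```python
import pandas as pd

ports = pd.Series(["S", "C", "S", "Q", "S", "C"])

counts = ports.value_counts()
print(counts.index[0])                        # S
print(counts["S"], counts["C"], counts["Q"])  # 3 2 1
```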
Which of the following expressions will return only the Age and Fare columns from the titanic_df
DataFrame? Select ALL correct answers. - CORRECT ANSWER - titanic_df[['Age', 'Fare']];
titanic_df.iloc[:, [5, 9]]
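A tiny stand-in for titanic_df shows that both expressions select the same columns. The column order is assumed to match the Kaggle Titanic train.csv (Age at position 5, Fare at position 9); the single row is made up.

```python
import pandas as pd

# Assumed column order of the Kaggle Titanic training file.
columns = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]

# One made-up passenger row, for illustration only.
titanic_df = pd.DataFrame(
    [[1, 0, 3, "Passenger A", "male", 22.0, 1, 0, "T1", 7.25, None, "S"]],
    columns=columns,
)

by_name = titanic_df[["Age", "Fare"]]      # a list of column names
by_position = titanic_df.iloc[:, [5, 9]]   # all rows, columns 5 and 9
print(by_name.equals(by_position))         # True
```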
Statistics such as mean and std do not make sense for categorical columns. Which of the following
columns contain categorical data? - CORRECT ANSWER - Survived, Pclass
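Survived and Pclass store category codes as integers, so pandas treats them as numbers unless told otherwise. A sketch with made-up values, showing a meaningless mean and one way (astype("category")) to mark the columns as categorical:

```python
import pandas as pd

# Made-up values: both columns hold integer codes for categories.
df = pd.DataFrame({"Survived": [0, 1, 1, 0], "Pclass": [3, 1, 2, 3]})

# pandas will average the codes, but "class 2.25" has no real meaning.
print(df["Pclass"].mean())  # 2.25

# Mark the columns as categorical so pandas treats them as labels.
df = df.astype({"Survived": "category", "Pclass": "category"})
print(df.dtypes)
print(df["Pclass"].value_counts())  # frequency of each class
```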
To remove the "Name" and "Sex" columns completely from titanic_df (meaning that the
DataFrame itself is changed), what should be done to the arguments of the drop() method? -
CORRECT ANSWER - "Name" and "Sex" must be passed in as a list; axis must be set to 1;
inplace must be set to True
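A made-up three-column stand-in for titanic_df shows the three argument changes together. With inplace=True, drop() changes the DataFrame itself and returns None.

```python
import pandas as pd

# Small illustrative stand-in, not the real Titanic data.
titanic_df = pd.DataFrame({
    "Name": ["A", "B"],
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
})

result = titanic_df.drop(["Name", "Sex"], axis=1, inplace=True)
print(result)                    # None
print(list(titanic_df.columns))  # ['Age']
```

An equivalent form without inplace is reassignment: titanic_df = titanic_df.drop(columns=["Name", "Sex"]).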
To access elements or subsets of a DataFrame, we learned to use the index operator [], the loc[] operator,
and the iloc[] operator. Which of the following can be used to retrieve a row from a DataFrame? Be
careful answering this question; it is best to test it out in a Jupyter notebook. Select ALL
CORRECT answers. - CORRECT ANSWER - loc[] with the row index label; iloc[] with
the integer position starting at 0; [] with an integer position slice starting at 0, such as df[0:1]
(a single integer such as df[0] is looked up as a column label instead)
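Testing the three row-access options on a small made-up DataFrame with string row labels. A single integer inside [] is treated as a column label, which is why the slice form is needed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

row_by_label = df.loc["x"]     # Series for the row labelled "x"
row_by_position = df.iloc[0]   # Series for the first row
row_by_slice = df[0:1]         # [] with an integer slice returns a one-row DataFrame

# df[0] looks for a column named 0, which does not exist here.
try:
    df[0]
    single_int_selects_row = True
except KeyError:
    single_int_selects_row = False
```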
In pandas functions or methods such as dropna(), fillna(), replace(), etc., you can use the keyword
argument inplace. Which of the following statements about this argument is INCORRECT? -
CORRECT ANSWER - The default value is True (the actual default is False)
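A sketch contrasting the default inplace=False with inplace=True, using fillna() on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Default inplace=False: a new Series is returned and s is left alone.
filled = s.fillna(0)
missing_before = s.isna().sum()
print(missing_before)    # 1
print(filled.tolist())   # [1.0, 0.0, 3.0]

# inplace=True: s itself is modified and the method returns None.
returned = s.fillna(0, inplace=True)
print(returned)          # None
print(s.isna().sum())    # 0
```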
You have a dataset with employee compensation information. One column has employee
department data. Employees are from 4 departments: Sales, Marketing, Accounting, and HR.
These 4 values are associated with employees. You know that the data is clean except that
department values are text. What is the most appropriate step of data cleaning related to this
column? - CORRECT ANSWER - Use pd.get_dummies() to convert this column into 4
dummy variables.
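A sketch of pd.get_dummies() on made-up employee data. The Salary column and its values are hypothetical; only the four department names come from the question. Each employee ends up with exactly one True (or 1) across the four new columns.

```python
import pandas as pd

# Hypothetical employee records with one text column.
employees = pd.DataFrame({
    "Salary": [50000, 62000, 58000, 71000],
    "Department": ["Sales", "Marketing", "Accounting", "HR"],
})

encoded = pd.get_dummies(employees, columns=["Department"])
dummy_cols = [c for c in encoded.columns if c.startswith("Department_")]
print(dummy_cols)
# ['Department_Accounting', 'Department_HR', 'Department_Marketing', 'Department_Sales']
```

Some modelling workflows pass drop_first=True, which keeps only 3 of the 4 columns because the fourth can be inferred from the others.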