SOLUTIONS!!
Header rows of a CSV should represent what type of data? Answer - Data that
changes over time.
What is PEAK data? Answer - Real stock data (formerly known as HCP) from
Yahoo Finance.
Define "Close" for stock data. Answer - The actual price that was recorded at
the exchange for that day.
Define "Adjusted Close" for stock data. Answer - A number provided by the
data provider, adjusted for stock splits and dividend payments.
When will "Close" and "Adjusted Close" data be the same? Answer - On the
most recent day of data. The two start to differ as we go back in time.
Who (and where) created the Pandas library? Answer - Wes McKinney at AQR
(hedge fund).
What is a key component of Pandas? Answer - The dataframe.
What Pandas function is used to read a CSV? Answer -
pd.read_csv("file.csv")
How do you print the first five rows of a Pandas Dataframe? Answer - By using
the df.head() function.
How do you print the last five rows of a Pandas Dataframe? Answer - By using
the df.tail() function.
How would you get the rows between 10 and 20 for a Pandas Dataframe?
Answer - df[10:21]
What is the operation called for returning a subset of the rows from a Pandas
Dataframe? Answer - Slicing.
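The three row-selection answers above can be tried on a small frame. This is a minimal sketch: the 30-row DataFrame and its values are made up for illustration and are not real stock data.

```python
import pandas as pd

# Hypothetical 30-row frame with made-up values, used only to show row selection.
df = pd.DataFrame({"Close": range(100, 130)})

first_five = df.head()   # rows at positions 0 through 4
last_five = df.tail()    # rows at positions 25 through 29
middle = df[10:21]       # rows at positions 10 through 20; the end bound 21 is excluded

print(first_five)
print(last_five)
print(len(middle))
```

Because the end of a slice is exclusive, df[10:21] is what returns row 20 as well; df[10:20] would stop at row 19.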
What function would you use to get the maximum value of a column in a
Pandas Dataframe? Answer - df['Column Name'].max()
What function would you use to get the mean value of a column in a Pandas
Dataframe? Answer - df['Column Name'].mean()
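As a quick check of the two column statistics above, here is a small sketch. The column name and the four prices are invented for the example.

```python
import pandas as pd

# Made-up prices, chosen so the results are easy to verify by hand.
df = pd.DataFrame({"Close": [10.0, 12.0, 11.0, 15.0]})

highest = df["Close"].max()    # largest value in the column
average = df["Close"].mean()   # (10 + 12 + 11 + 15) / 4

print(highest, average)
```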
What library is used to plot data from a Pandas Dataframe? Answer -
matplotlib.pyplot
How would you use pyplot to plot data from a Pandas Dataframe? Answer -
Use .plot() on the columns you are interested in plotting (like df['Adj
Close'].plot()) and then call plt.show()
What is one method you would use to plot two columns simultaneously from a
Pandas Dataframe? (Assume we're plotting 'Close' and 'Adj Close') Answer -
df[['Close', 'Adj Close']].plot(), followed by plt.show()
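The plotting answers above might look like this in a full script. It is only a sketch: the prices are invented, and the "Agg" backend is an assumption made so the code also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices on four consecutive dates.
df = pd.DataFrame(
    {"Close": [10.0, 10.5, 11.0, 10.8], "Adj Close": [9.5, 10.0, 10.6, 10.8]},
    index=pd.date_range("2020-01-01", periods=4),
)

# Double brackets pass a list of column names, so both columns share one axes.
ax = df[["Close", "Adj Close"]].plot()
ax.set_xlabel("Date")
ax.set_ylabel("Price")
plt.show()  # opens a window on an interactive backend; under Agg nothing is displayed
```

With an interactive backend (drop the matplotlib.use line), plt.show() opens the chart window.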
How many trading days are there in a typical year at the NYSE? Answer - 252 days.
What reference symbol is good to use to determine if there was trading on a
given day? Answer - SPY.
How would you create an empty dataframe indexed on a start and end date?
Answer - First dates = pd.date_range(start_date, end_date), then df =
pd.DataFrame(index=dates)
What type of join is performed by default when calling
df.join(some_dataframe)? Answer - A left join.
When using a join function between dataframe a and dataframe b in the
following format df_a.join(df_b), what happens to the rows not present in b
when joined with a? Answer - Pandas will fill these absent values with nan.
How do you join two DataFrames (df_1 and df_2)? Answer - df_1.join(df_2)
How would you drop the rows in a DataFrame that contain nan? Answer -
df.dropna()
How could you join two DataFrames and drop nan values in a single line?
Answer - df_1.join(df_2, how='inner')
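The join answers above can be compared side by side. In this sketch the dates and the SPY prices are made up; df_1 is the empty date-indexed frame and df_2 has data on only some of those dates.

```python
import pandas as pd

# Empty frame indexed on a date range (five calendar days).
dates = pd.date_range("2020-01-01", "2020-01-05")
df_1 = pd.DataFrame(index=dates)

# Hypothetical prices: two dates fall inside the range, one falls outside it.
df_2 = pd.DataFrame(
    {"SPY": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

left = df_1.join(df_2)                # default left join: all 5 dates kept, NaN where df_2 has no row
left_clean = left.dropna()            # second step: drop the rows that contain NaN
inner = df_1.join(df_2, how="inner")  # one step: keep only dates present in both frames

print(left)
print(inner)
```

In this example left_clean and inner hold the same two rows. Note that how='inner' only removes rows whose date is missing from one frame; a NaN already stored inside df_2 would still need dropna().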
How would you tell Pandas to index a DataFrame on dates when reading in a
CSV? Answer - pd.read_csv(csv, index_col='Date', parse_dates=True)
How would you rename a column in a DataFrame? Answer - df =
df.rename(columns={'old name' : 'new name'})
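The last two answers can be combined in one short sketch. The CSV text is invented and read from memory with io.StringIO instead of a file on disk. parse_dates=True is added here so that the index holds real dates rather than strings, and the new column name "SPY" is just an example.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """Date,Close,Adj Close
2020-01-02,10.0,9.5
2020-01-03,10.5,10.0
"""

# index_col picks the Date column as the index; parse_dates=True converts it to dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# rename returns a new frame, so the result is assigned back to df.
df = df.rename(columns={"Adj Close": "SPY"})

print(df)
```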