✔✔Imagine we have a dictionary like the one below.
data = {"Ones" : [1, 2, 3, 4],
"Tens" : [10, 20, 30, 40],
"Hundreds" : [100, 200, 300, 400],
"Thousands" : [1000, 2000, 3000, 4000],
"TenThousands" : [10000, 20000, 30000, 40000]}
import pandas as pd
data = {'name' : ["Jim", "Joel", "John", "Jack"],
'age' : ["35", "90", "27", "16"]}
df = pd.DataFrame.from_dict(data)
df
If we converted the first dictionary (Ones through TenThousands) into a dataframe using
that same procedure, how many rows and columns would you have? - ✔✔4 rows, 5 columns
(if you don't count the index column)
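One way to check the count is to build the dataframe and look at its shape. A minimal sketch, reusing the dictionary from the question:

```python
import pandas as pd

data = {
    "Ones": [1, 2, 3, 4],
    "Tens": [10, 20, 30, 40],
    "Hundreds": [100, 200, 300, 400],
    "Thousands": [1000, 2000, 3000, 4000],
    "TenThousands": [10000, 20000, 30000, 40000],
}

# Each dictionary key becomes a column; each list position becomes a row.
df = pd.DataFrame.from_dict(data)

# .shape is (rows, columns) and does not count the index.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```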
✔✔Sometimes data needs to be combined. It could be for reasons such as similarities
in the data, 2 separate correlated datasets, or numerous other reasons.
How would we perform a left merge (also called a left join) on the following dataframes?
df1 = pd.DataFrame({'key' : ['A', 'B', 'C', 'D'], 'value1' : ['1', '2', '3', '4']})
df2 = pd.DataFrame({'key' : ['B', 'D', 'E', 'F'], 'value1' : ['10', '20', '30', '40']}) - ✔✔result =
pd.merge(df1, df2, how = 'left')
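A short sketch of what the left merge returns. When on is omitted, pandas joins on every column name the two frames share, which here is both key and value1. The example below passes on="key" explicitly; that is an assumption about the intent of the question, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value1": ["1", "2", "3", "4"]})
df2 = pd.DataFrame({"key": ["B", "D", "E", "F"], "value1": ["10", "20", "30", "40"]})

# Assumption: join on "key" only. A left merge keeps every row of df1
# and fills in df2's values where the key matches, NaN otherwise.
result = pd.merge(df1, df2, how="left", on="key")

# The clashing "value1" columns get the default suffixes _x (left) and _y (right).
print(result)
```

Keys E and F exist only in df2, so they do not appear in the result.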
✔✔First create a simple dataframe using the code below:
import pandas as pd
#create a list of lists
data = [['A1', 2, 4, 8], ['A2', 3, 7, 17], ['A3', 1, None, 7], ['A4', 989, 186, 3698], ['A5', 0, 0,
None]]
#Create the pandas dataframe
df = pd.DataFrame(data, columns = ['id', 'value1', 'value2', 'value3'])
df
If you ran
df = df.dropna()
df
How many rows would remain in the dataframe? - ✔✔3
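To see why three rows remain, it can help to count the missing values in each row before dropping. A minimal sketch using the same data; the names missing_per_row and cleaned are only for illustration:

```python
import pandas as pd

data = [["A1", 2, 4, 8], ["A2", 3, 7, 17], ["A3", 1, None, 7],
        ["A4", 989, 186, 3698], ["A5", 0, 0, None]]
df = pd.DataFrame(data, columns=["id", "value1", "value2", "value3"])

# Number of missing values in each row (rows A3 and A5 each have one).
missing_per_row = df.isna().sum(axis=1)

# dropna() with default arguments removes any row containing a missing value.
cleaned = df.dropna()
print(len(cleaned))
```

Note that a value of 0 is not missing, so only the None entries cause a row to be dropped.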
✔✔What package have we been using to import our data, and what is the abbreviation
we've been using?
import PACKAGE as ABBREVIATION - ✔✔pandas, pd
✔✔Assume the data file is in the same folder and location as your code. In other words,
you do not need to create a relative path or any path for that matter. You just need the
name of the file. We will not use a path variable, just the name of the file. Note that in
Week 3 we did this with the "segments.csv" dataset, so you could refer to that example.
homes = - ✔✔homes = pd.read_csv("homes.csv")
✔✔What are 3 of the ways we have explored a dataset in the course videos? -
✔✔.info(), .head(), .describe()
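A quick sketch of the three calls on a small frame. The data reuses the names from the earlier example, with ages stored as integers (an assumption, so that describe() has a numeric column to summarize):

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({"name": ["Jim", "Joel", "John", "Jack"],
                   "age": [35, 90, 27, 16]})

df.info()                 # prints column names, non-null counts, and dtypes
first_rows = df.head(2)   # first two rows
summary = df.describe()   # count, mean, std, min, quartiles, max for numeric columns

print(first_rows)
print(summary)
```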
✔✔Missing values can be imputed/replaced with other values. If my dataset has 1000
rows and 200 missing values for the category age, what could I impute for age?
(THIS QUESTION IS NOT ASKING WHAT YOU SHOULD USE, BUT WHAT YOU
COULD USE) - ✔✔Impute the most common value;
Impute the mean;
Impute the median
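The three options can be sketched with fillna(). The ages below are made up for illustration and are not from any course dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
ages = pd.Series([20.0, 30.0, 30.0, np.nan, 50.0], name="age")

# Most common value: mode() returns a Series, so take its first entry.
filled_mode = ages.fillna(ages.mode()[0])

# Mean and median both ignore missing values by default.
filled_mean = ages.fillna(ages.mean())
filled_median = ages.fillna(ages.median())

print(filled_mode[3], filled_mean[3], filled_median[3])
```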
✔✔Sometimes when working with (struggling with!) missing values, you find that a value is
not missing at all! Sometimes someone has been helpful (!) and entered some placeholder
value like 'THISISMISSING'. When that happens, what could you do? (Choose the best
answer, it won't be the only possible answer) - ✔✔Replace 'THISISMISSING' with a
missing value (np.nan)
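A minimal sketch of the replacement, on a made-up column. Once the placeholder is a real missing value, tools such as isna(), dropna(), and fillna() can see it:

```python
import numpy as np
import pandas as pd

# Hypothetical column where a placeholder string stands in for a missing age.
df = pd.DataFrame({"age": ["35", "THISISMISSING", "27"]})

# Swap the placeholder for a true missing value.
df["age"] = df["age"].replace("THISISMISSING", np.nan)

print(df["age"].isna().sum())
```

When the data comes from a file, pd.read_csv also accepts an na_values argument that treats listed strings as missing at load time.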
✔✔Assume the test data ('test.csv') is loaded into your environment and named 'df'.
df = pd.read_csv('/data/test.csv')
What would produce the below result?
Rows 0-6 - ✔✔df.head(6)
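When counting rows, keep in mind that head(n) returns the first n rows, so with a default index the labels run from 0 through n-1. A small sketch on a made-up frame (the real test.csv is not reproduced here):

```python
import pandas as pd

# Hypothetical stand-in for the test data: ten rows with a default 0..9 index.
df = pd.DataFrame({"value": range(10)})

first_six = df.head(6)
print(list(first_six.index))

# With no argument, head() returns the first five rows.
default_head = df.head()
```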
✔✔Let's take a look at the homes dataset.
What is the mean Acreage of the homes?
How many records (count of rows) does the dataset have? - ✔✔0.267 (use
homes.mean())
7478 (use homes.describe())
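A sketch of both lookups on a tiny made-up frame. Only the column name Acreage comes from the question; the rows are hypothetical and will not reproduce the 0.267 or 7478 figures. Recent pandas versions may raise an error when mean() is called on a frame that contains text columns, so the example passes numeric_only=True.

```python
import pandas as pd

# Hypothetical stand-in for homes.csv.
homes = pd.DataFrame({
    "Address": ["1 Elm St", "2 Oak Ave", "3 Pine Rd"],
    "Acreage": [0.25, 0.5, 0.75],
})

# Mean of a single column.
mean_acreage = homes["Acreage"].mean()

# Means of all numeric columns at once.
all_means = homes.mean(numeric_only=True)

# Row count: len() counts every row, while the "count" row of describe()
# counts only non-missing values in each numeric column.
row_count = len(homes)
count_from_describe = homes.describe().loc["count", "Acreage"]

print(mean_acreage, row_count)
```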
✔✔I think we are mostly interested in building out a model to predict home sales price.
Without getting into the model, I know I will need some values related to home price.