Summary Week 10
Follow the instructions in Chapter 4 of the book to read in the okc data. Explain what each element in the code is doing. Run glimpse() on the data. Answer these questions:
1. What is the data type of the essay fields?
2. What do they contain?
3. How do you know the data is in Spark?
Continue to use the code in Chapter 4 to add a response variable. Answer these questions:
4. What is the response variable added?
5. Explain how the response variable is aggregated.
6. What is the tally?
Continue to use the code in Chapter 4 to create training and testing sets. Answer these questions:
7. What percentage of the data is in the training set?
8. What is the purpose of the 'seed'?
9. What is the distribution of the response variable in the training set?
10. What does the 'mutate' function do in this line of code: mutate(frac = n / sum(n))?
11. What is the mean income in each set? Create a histogram of income for each set.
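As a rough illustration of the tally and mutate(frac = n / sum(n)) steps asked about above, here is a minimal sketch in plain dplyr on a small made-up data frame, not the real okc data and not a Spark table. The column name not_working and the job categories are assumptions chosen for illustration; the Chapter 4 code may define the response differently.

```r
library(dplyr)

# Toy stand-in for the okc table (hypothetical values, not the real data).
toy <- tibble(
  job = c("student", "retired", "sales", "unemployed",
          "tech", "tech", "sales", "student"),
  income = c(NA, 20000, 50000, NA, 80000, 100000, 40000, NA)
)

# Assumed response variable: 1 if the person is not working, 0 otherwise.
toy <- toy %>%
  mutate(not_working = ifelse(job %in% c("student", "unemployed", "retired"), 1, 0))

# tally() counts the rows in each group and stores the count in column n.
# mutate() then adds a new column frac: each group's count divided by the
# total count, i.e. the group's share of all rows.
dist <- toy %>%
  group_by(not_working) %>%
  tally() %>%
  mutate(frac = n / sum(n))

print(dist)

# Mean income per group, ignoring missing values.
income_by_group <- toy %>%
  group_by(not_working) %>%
  summarise(mean_income = mean(income, na.rm = TRUE))

print(income_by_group)
```

On a Spark table the same dplyr verbs are translated to Spark SQL by sparklyr, so the pipeline shape should carry over, though the result stays in Spark until it is collected.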
Written for
- Institution: Big Data Tools & Architecture
- Course: Big Data Tools & Architecture
Document information
- Summarized whole book? No
- Which chapters are summarized? Unknown
- Uploaded on: July 15, 2023
- File latest updated on: February 22, 2024
- Number of pages: 3
- Written in: 2022/2023
- Type: Summary