Summary Week 10
Follow the instructions in Chapter 4 of the book to read in the okc data. Explain what each element in the code is doing. Call glimpse() on the data.

Answer these questions:
1. What is the data type of the essay fields?
2. What do they contain?
3. How do you know the data is in Spark?

Continue to use the code in Chapter 4 to add a response variable.

Answer these questions:
4. What is the response variable added?
5. Explain how the response variable is aggregated.
6. What is the tally?

Continue to use the code in Chapter 4 to create training and testing sets.

Answer these questions:
7. What percentage of the data is in the training set?
8. What is the purpose of the 'seed'?
9. What is the distribution of the response variable in the training set?
10. What does the 'mutate' function do in this line of code: mutate(frac = n/sum(n))?
11. What is the mean income in each set?

Create a histogram of income for each set.
Connected book
- 2019
- 9781492046325
- Unknown
Institution, study and subject
- Institution: Big Data Tools & Architecture
- Course: Big Data Tools & Architecture
Document information
- Whole book? No
- Which chapters are summarized? Unknown
- Published on: July 15, 2023
- File updated on: February 22, 2024
- Number of pages: 3
- Written in: 2022/2023
- Type: Summary
Subjects
- week 10