Summary Week 15
Using the code provided in sections 4.5.1 and 4.5.2, create an LDA model of the essays in the okc dataset. Create charts of the most common terms per topic.

Answer the following questions:
1. Explain how the HTML tags and newline characters were removed from the text.
2. How were the individual words combined into complete essays?
3. Explain what an LDA model does.
4. What are stop words?
5. In which 2 topics is the word 'want' not in the top ten?
6. What code snippet causes the topics to be represented by different colors?
7. What code snippet causes there to be a chart for each topic?

Disconnect from your Spark cluster. Submit a Word doc with your charts and a timestamp, and explain what your charts are showing. Include each answer together with its question and number.
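The preprocessing steps the questions refer to (removing HTML tags and newlines, dropping stop words, counting terms) can be illustrated with a minimal Python sketch. This is not the code from sections 4.5.1 and 4.5.2; the sample essays, the stop-word list, and the helper names `clean` and `tokenize` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample essays; the real okc essays are not reproduced here.
essays = [
    "I want to travel.<br />\nI love <b>music</b> and hiking.",
    "Looking for someone who loves music<br />\nand good food.",
]

# A tiny illustrative stop-word list (an assumption, not the list used in the book).
STOP_WORDS = {"i", "to", "and", "for", "who", "the", "a"}

def clean(text):
    """Replace HTML tags and newline characters with spaces, then lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop tags such as <br /> or <b>
    text = re.sub(r"\s+", " ", text)      # collapse newlines and repeated spaces
    return text.strip().lower()

def tokenize(text):
    """Split cleaned text into words and drop stop words."""
    words = re.findall(r"[a-z']+", text)
    return [w for w in words if w not in STOP_WORDS]

# One token list per essay, then term frequencies across all essays.
tokens = [tokenize(clean(e)) for e in essays]
term_counts = Counter(w for doc in tokens for w in doc)
```

A topic model such as LDA would take per-essay term counts like these as input; the sketch stops at the counting step and does not fit a model.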
Written for
- Institution: Big Data Tools & Architecture
- Course: Big Data Tools & Architecture
Document information
- Entire book summarized? No
- Which part of the book is summarized? Unknown
- Uploaded on: July 15, 2023
- File last updated on: February 22, 2024
- Number of pages: 3
- Written in: 2022/2023
- Type: Summary