Summary Week 15
Using the code provided in sections 4.5.1 and 4.5.2, create an LDA model of the essays in the okc dataset. Create charts of the most common terms per topic.
Answer the following questions:
1. Explain how the HTML tags and newline characters were removed from the text.
2. How were the individual words combined into complete essays?
3. Explain what an LDA model does.
4. What are stop words?
5. In which 2 topics is the word 'want' not in the top ten?
6. What code snippet causes the topics to be represented by different colors?
7. What code snippet causes there to be a chart for each topic?
Disconnect from your Spark cluster. Please submit a Word doc with your charts and a timestamp. Explain what your charts are showing. Include the answers to the questions, together with the questions and their numbers.
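Questions 1 and 4 concern text cleaning. The sketch below is not the code from sections 4.5.1 and 4.5.2, which runs on Spark. It is a minimal plain-Python illustration of the same idea, assuming regular expressions are used to strip tags and newlines. The function names, the sample string, and the tiny stop-word list are all hypothetical.

```python
import re

# Hypothetical stop-word list, for illustration only; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "i", "the", "to"}

def clean_essay(raw):
    """Replace HTML tags and newline characters with spaces, then tidy whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)          # anything that looks like <...>
    no_newlines = re.sub(r"[\r\n]+", " ", no_tags)  # newline characters
    return re.sub(r"\s+", " ", no_newlines).strip()

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

# Hypothetical essay fragment, not taken from the okc dataset.
sample = "I want<br />\nto travel and <b>cook</b>"
cleaned = clean_essay(sample)
tokens = tokenize(cleaned)
print(cleaned)
print(tokens)
```

Stop words are very common words such as "the" or "and" that carry little topical meaning, so they are usually filtered out before a topic model is fitted.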
Written for
- Institution: Big Data Tools & Architecture
- Course: Big Data Tools & Architecture
Document information
- Summarized whole book? No
- Which chapters are summarized? Unknown
- Uploaded on: July 15, 2023
- File last updated on: February 22, 2024
- Number of pages: 3
- Written in: 2022/2023
- Type: Summary