Study Guides, Class Notes & Summaries

Looking for the best study guides, study notes, and summaries? On this page you will find 8 documents to help you review.

All 8 results

Hands-On Exercise 6-1: Outlier Detection with Titanic dataset
  • Exam • 7 pages • 2024
  • In this hands-on exercise, you will learn how to use quantiles to detect outliers in data (the Titanic training dataset). Related DM book chapters/sections: Section 2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range. Related hands-on exercises: Exercise 1-2 Apache Spark and Basic Statistics. Finish the assignments shown below. Submit a Word document (...
  • $10.49
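
The quantile-based outlier detection named in the Exercise 6-1 description can be sketched in plain Python. This is only an illustration, not the exercise's Spark solution: the 1.5 × IQR fence is a common convention assumed here, the function names `iqr_bounds` and `find_outliers` are hypothetical, and `sample_values` is made-up data rather than values from the Titanic dataset.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences computed from the quartiles of values."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample, not taken from the Titanic data.
sample_values = [5, 7, 8, 9, 10, 11, 12, 13, 100]
bounds = iqr_bounds(sample_values)
outliers = find_outliers(sample_values)
print(bounds, outliers)
```

In Spark the quartiles would typically come from an approximate quantile computation over a DataFrame column instead of an in-memory list, but the fence logic stays the same.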
Hands-On Experiment 5-2: Clustering with Spark - Part II
  • Exam • 5 pages • 2024
  • $10.49
Hands-On Experiment 5-1: Clustering with Spark
  • Exam • 4 pages • 2024
  • In this hands-on exercise, you will learn: how to use the k-means clustering algorithm in Apache Spark; how to handle data and features for clustering; training and prediction for clustering; evaluation for clustering. Related DM book chapters/sections: Section 10.1 Cluster Analysis; Section 10.2 Partitioning Methods; Section 10.2.1 k-Means: A Centroid-Based Technique. Submit a Word document (or PDF) with answers/expl...
  • $10.49
Hands-On Experiment 4-2: Classification with Titanic dataset
  • Exam • 4 pages • 2024
  • 2.2.1 (20 pts) Assignment 1: Index the Gender values. We have learned how to index values using StringIndexer in previous hands-on exercises. Write code to index the gender values: 1. Import the class. 2. Define an indexer (input column: Gender; output column: IndexedGender). 3. Train and transform. Take a screenshot of your code running and its output using the show(5) function. 3 Building a Model. 3.1 Training and T...
  • $10.49
Hands-On Experiment 4-1: Classification with Spark
  • Exam • 7 pages • 2024
  • In this hands-on exercise, you will learn: the decision tree classifier in Apache Spark; how to handle data, features, and training and testing data; training and testing; evaluation. Related DM book chapters/sections: Section 8.1 Basic Concepts; Section 8.2 Decision Tree. The DataFrame-based Spark ML API is newer and much easier to use, but some features are missing: its evaluator provides only limited metrics. Th...
  • $10.49
Hands-On Experiment 3-2: Frequent Pattern Mining with Spark - Part II
  • Exam • 4 pages • 2024
  • 1.3 Create DataFrames. You can create your DataFrames using Assignment 1 1. Write Spark code to read the following data. (a) Read only the following four tables, which will be used for this exercise: i. orders ii. products iii. departments iv. order_products_train. (b) Make sure that you read the headers as well: i. Each CSV file of the dataset has a header line. ii. You can achieve this behavior by Assignment ...
  • $10.49
Hands-On Experiment 3-1: Frequent Pattern Mining with Spark
  • Exam • 6 pages • 2024
  • 2.4 Practice answering some exercise questions. Q1: List the 3 most frequent itemsets of size 1. Q2: Given support >= 30%, show the itemsets and counts for the candidate itemsets of size 2. Q3: Which other product is Colby most frequently purchased with? Q4: What is the confidence of the rule American → Cheddar? 3 Submission: Find frequent patterns using FPGrowth on a real-world grocery store dataset. Please read the related news article “Kroger Knows Your Shopping Patterns B...
  • $10.49
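
The support and confidence questions in the Experiment 3-1 description rest on two ratios, which the following plain-Python sketch illustrates. It is not the exercise's FPGrowth code: the helper names `support` and `confidence` are hypothetical, and the `baskets` list is invented for illustration, so the numbers it produces are not answers to Q1 through Q4.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs in the transactions")
    joint_support = support(transactions, set(antecedent) | set(consequent))
    return joint_support / antecedent_support


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"American", "Cheddar"},
    {"American", "Cheddar", "Colby"},
    {"American", "Colby"},
    {"Cheddar"},
    {"Colby", "Cheddar"},
]

rule_confidence = confidence(baskets, {"American"}, {"Cheddar"})
print(support(baskets, {"American"}), rule_confidence)
```

With these invented baskets, American appears in three of five transactions and together with Cheddar in two, so the rule's confidence is two thirds.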
Hands-On Experiment 2-2: Data Warehousing with Hive
  • Exam • 78 pages • 2024
  • Objectives: in this hands-on exercise, you will 1. Practice PySpark SQL for data analytics. 2. Use enhanced aggregation to emulate SQL concepts like GROUPING SETS, ROLLUP, and CUBE in PySpark. 3. Analyze the driver risk factor. 4. Analyze data using data warehousing/OLAP functions in Hive. Q1. (35 pts) Modify/rewrite the grouping-set-query in the example with ROLLUP (let's call it rollup-query). Run it, check the results, and explain the differences. - Replace the GROUPING SETS ...
  • $10.49
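
As a rough illustration of the ROLLUP concept mentioned in the Experiment 2-2 description, the sketch below computes the same grouping levels in plain Python: ROLLUP over (a, b) aggregates by (a, b), by (a), and over everything. It is not the exercise's Hive or PySpark query; the `rollup` helper, the column names `city`, `driver`, and `miles`, and the rows are all hypothetical.

```python
from collections import defaultdict


def rollup(rows, group_cols, value_col):
    """Sum value_col for every prefix of group_cols, like SQL ROLLUP."""
    totals = defaultdict(int)
    n = len(group_cols)
    for row in rows:
        for depth in range(n, -1, -1):
            # Keep the first `depth` grouping columns; None marks a
            # rolled-up column, mirroring the NULLs SQL returns.
            key = tuple(row[c] for c in group_cols[:depth]) + (None,) * (n - depth)
            totals[key] += row[value_col]
    return dict(totals)


# Hypothetical rows, invented for illustration only.
rows = [
    {"city": "A", "driver": "d1", "miles": 10},
    {"city": "A", "driver": "d2", "miles": 20},
    {"city": "B", "driver": "d1", "miles": 5},
]

result = rollup(rows, ["city", "driver"], "miles")
for key, total in result.items():
    print(key, total)
```

A plain GROUPING SETS query lists its grouping levels explicitly, whereas ROLLUP derives them from the column order, which is why reordering the columns changes the subtotals produced.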