Study guides, Class notes & Summaries
Looking for the best study guides, study notes and summaries? On this page you'll find 9 study documents.
All 9 results
Hands-on Exercise Ex5-3: Detecting Fake News with Apache Spark and Spark NLP
- Exam (elaborations) • 13 pages • 2024
- $10.49
Assignments 1 – 4 (10pts each, 40pts in total) 
Do the exercises in Sections 1.4 – 1.7 
Assignment 5 (30pts) 
Rewrite the code for detecting fake/real news in the Trump and Biden tweet datasets. Note: do not combine the two datasets. 
• Read the article [21] 
• (10pts) Write the code to download the two files: 
o Use the two links in the article 
o Use the links from the raw data by clicking the raw button on th...
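The "raw" button on a GitHub file page typically leads to a raw.githubusercontent.com address, which a script can download directly. A minimal Python sketch, assuming the two dataset files are hosted on GitHub (the article's actual links are not reproduced here); the helper names and the usage URL are hypothetical:

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def to_raw_github_url(url: str) -> str:
    """Turn a GitHub file-page URL into its raw.githubusercontent.com form.

    https://github.com/<user>/<repo>/blob/<branch>/<path>
      -> https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
    """
    parts = urlparse(url)
    if parts.netloc == "raw.githubusercontent.com":
        return url  # already a raw link
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file-page URL: {url}")
    user, repo, _blob, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])


def download(url: str, dest: str) -> str:
    """Download one dataset file to `dest` (needs network access)."""
    urlretrieve(to_raw_github_url(url), dest)
    return dest


# Usage (hypothetical URL; substitute the links given in the article and
# download each dataset to its own file, since they must not be combined):
# download("https://github.com/<user>/<repo>/blob/<branch>/<file>.csv", "<file>.csv")
```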
Hands-on Exercise Ex5-2: Topic modeling with Apache Spark and Spark NLP
- Exam (elaborations) • 16 pages • 2024
- $10.49
Assignments 1 – 4 (10pts each) 
Do the exercises in Sections 3.6 – 3.9 
Assignment 5 (20pts) 
In Section 3.8, try different values of k and maxIter to see which combination best suits your data. Show at least five combinations with their results, and explain why the one you choose is best. 
Assignment 6 (40pts) 
(30pts) Rewrite the code for finding topics in the coronavirus tweets dataset. (10pts) Also, try 
different values of k an...
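One way to organize the k/maxIter experiments is to enumerate a grid and compare one score per combination. A minimal Python sketch, assuming Spark ML's LDA and its logPerplexity metric (lower is better) as the score; the helper names and the numbers in the usage line are illustrative, not results from the exercise:

```python
from itertools import product


def param_grid(ks, max_iters):
    """Every (k, maxIter) combination to try."""
    return list(product(ks, max_iters))


def best_combination(log_perplexity):
    """Return the (k, maxIter) key with the lowest log perplexity."""
    return min(log_perplexity, key=log_perplexity.get)


# Filling the scores with Spark ML (sketch; needs a SparkSession and a
# `dataset` DataFrame with a "features" vector column):
#   from pyspark.ml.clustering import LDA
#   scores = {}
#   for k, max_iter in param_grid([5, 10, 15], [10, 20]):
#       model = LDA(k=k, maxIter=max_iter).fit(dataset)
#       scores[(k, max_iter)] = model.logPerplexity(dataset)
#   print(best_combination(scores))

print(len(param_grid([5, 10, 15], [10, 20])))  # 6 combinations, at least five as required
```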
Hands-on Exercise Ex5-1: Natural Language Processing (NLP) with Named Entity Recognition (NER)
- Exam (elaborations) • 8 pages • 2024
- $10.49
Assignment 10 (10pts) 
Annotate a text with named entities (NER) using the PretrainedPipeline recognize_entities_dl in Spark NLP [12][13] 
• Input text from Wikipedia: 
The University of Illinois Springfield (UIS) is a public university in Springfield, Illinois, United 
States. The university was established in 1969 as Sangamon State University by the Illinois 
General Assembly and became a part of the University of Ill...
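The recognize_entities_dl pipeline tags each token with an IOB label (for example B-ORG, I-ORG, O) and already returns grouped entities, so the sketch below only illustrates how such token tags become entity chunks. It is plain Python; the tags in the usage example are hand-written for illustration, not actual pipeline output:

```python
def iob_to_chunks(tokens, tags):
    """Group IOB-tagged tokens into (entity_text, label) chunks."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        # An I- tag with a different label than the open entity also starts a new one.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if current:  # close the entity in progress
                chunks.append((" ".join(current), label))
            current, label = ([token], tag[2:]) if starts_new else ([], None)
        else:  # I- tag continuing the same entity
            current.append(token)
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Getting real tokens/tags from Spark NLP (requires Spark NLP installed):
#   import sparknlp
#   from sparknlp.pretrained import PretrainedPipeline
#   spark = sparknlp.start()
#   pipeline = PretrainedPipeline("recognize_entities_dl", lang="en")
#   result = pipeline.annotate(text)  # dict with keys such as "token" and "ner"
#   print(iob_to_chunks(result["token"], result["ner"]))

tokens = ["The", "University", "of", "Illinois", "Springfield", "(", "UIS", ")",
          "is", "a", "public", "university", "in", "Springfield", ",", "Illinois"]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-ORG", "O",
        "O", "O", "O", "O", "O", "B-LOC", "O", "B-LOC"]  # illustrative tags
print(iob_to_chunks(tokens, tags))
# [('University of Illinois Springfield', 'ORG'), ('UIS', 'ORG'), ('Springfield', 'LOC'), ('Illinois', 'LOC')]
```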
Learn Models using ML Pipeline in Spark.
- Exam (elaborations) • 3 pages • 2024
- $10.49
2.2.1.2 Specify parameters 
The next step is to set the parameters of the ML algorithm, LogisticRegression. We set maxIter (maximum number of iterations) to 10 and regParam (regularization parameter) to 0.01. 
For details, see reference [7]. 
After running the above code in the Spark shell, you will see the parameters you specified, e.g., maxIter and regParam, along with others you can set or change, such as aggregationDepth. 
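In PySpark the same settings read LogisticRegression(maxIter=10, regParam=0.01), and explainParams() lists every parameter with its current value. To show what the two parameters control, here is a plain-Python sketch of L2-regularized logistic regression trained by batch gradient descent. This is an illustrative assumption: Spark itself uses L-BFGS/OWL-QN and standardizes features by default, so its coefficients will differ. The step size and toy data are hypothetical:

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, max_iter=10, reg_param=0.01, step=0.5):
    """Minimize (1/n) * sum(log-loss) + (reg_param / 2) * ||w||^2.

    max_iter bounds the number of gradient steps; the intercept b is
    not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 penalty adds reg_param * w to the gradient of the weights.
        w = [wj - step * (gj + reg_param * wj) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def objective(X, y, w, b, reg_param=0.01):
    """Regularized average log-loss that training tries to minimize."""
    loss = 0.0
    for xi, yi in zip(X, y):
        p = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return loss / len(X) + reg_param / 2 * sum(wj * wj for wj in w)


X = [[0.0], [1.0], [2.0], [3.0]]  # toy data, for illustration only
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y, max_iter=10, reg_param=0.01)
print(objective(X, y, w, b) < objective(X, y, [0.0], 0.0))  # True: the loss went down
```

A larger regParam pulls the weights toward zero more strongly; a larger maxIter lets the optimizer get closer to the minimum.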
2.2.1.3 Learn model 
Now it’s time to learn the model wi...
Data Analytics using Spark SQL
- Exam (elaborations) • 2 pages • 2024
- $10.49
Assignment 1 (20pts), related to Section 3 
Using the filter function, write and run a Spark command (not an SQL query) that shows the dates on which the number of deaths was severe (more than 800), along with the number of confirmed cases, the number of deaths, and the country. The output should look like the one below. 
+--------+-----+------+-----------------------+ 
| dateRep|cases|deaths|countriesAndTerritories| 
+--------+-----+------+-----------------------+ 
Note: Write commands/queries for all ...
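For comparison with the Spark command, the same filter-then-project logic can be sketched in plain Python over CSV text. The column names come from the expected output above; the threshold is strict (more than 800), and the PySpark line in the comment is one possible form of the command, not the official solution:

```python
import csv
import io

COLUMNS = ["dateRep", "cases", "deaths", "countriesAndTerritories"]


def severe_days(csv_text, threshold=800):
    """Rows with more than `threshold` deaths, keeping only COLUMNS."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {c: row[c] for c in COLUMNS}
        for row in rows
        # skip blank death counts, then apply the strict threshold
        if row["deaths"].strip() and int(row["deaths"]) > threshold
    ]


# A PySpark command with the same effect (df is the loaded DataFrame):
#   df.filter(df.deaths > 800) \
#     .select("dateRep", "cases", "deaths", "countriesAndTerritories") \
#     .show()
```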
Data Analytics with DW/OLAP using Hive
- Exam (elaborations) • 6 pages • 2024
- $10.49
Create Hive Tables 
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing 
data queries and analysis. This exercise will use Hive as a data warehouse/OLAP tool for 
analyzing data. 
2.1.3 Create Hive Tables 
2.1.3.1 Check Schema 
To check the schema of the tables, look at the first 5 rows of the data files with the Linux ‘head’ command. The first line shows the schema (at least the field/column names): driverId, name, ssn, location, certified, and wage-plan....
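Reading just the first line (what head -1 shows) is enough to draft a table definition. A minimal Python sketch, assuming a comma-delimited file with a header row and a hypothetical table name drivers; every column is typed STRING, and wage-plan becomes wage_plan because an unquoted Hive identifier cannot contain a hyphen:

```python
def read_header(path):
    """Return the first line of a file, like `head -1`."""
    with open(path, encoding="utf-8") as f:
        return f.readline()


def hive_ddl_from_header(header_line, table_name, delimiter=","):
    """Build a Hive CREATE TABLE statement from a header line (all STRING)."""
    names = [h.strip().replace("-", "_") for h in header_line.strip().split(delimiter)]
    cols = ",\n  ".join(f"{n} STRING" for n in names)
    return (
        f"CREATE TABLE {table_name} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        # skip the header row when Hive reads the file
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


print(hive_ddl_from_header("driverId,name,ssn,location,certified,wage-plan", "drivers"))
```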
NoSQL Database HBase
- Exam (elaborations) • 2 pages • 2024
- $10.49
Assignments 
1. Write and run 11 HBase commands to insert a new row into the table. 
a. Table name: <your-namespace>:truck_event 
b. Rowkey: 20000 
c. Column family name: events 
d. Columns: values 
i. driverId: <your-login or UIS NetID> 
ii. truckId: 999 
iii. eventTime: 01:01.1 
iv. eventType: <Pick one from Normal, Overspeed, and Lane Departure> 
v. longitude: -94.58 
vi. latitude: 37.03 
vii. eventKey (this is the row key) 
viii. CorrelationId: 1000 
ix. ...
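Each column of the new row needs its own HBase shell put command of the form put 'table', 'rowkey', 'family:column', 'value'. A small Python sketch that generates them, using only the columns whose values appear above; eventKey and the remaining (elided) columns are left out, the angle-bracket placeholders must still be replaced, and the eventType choice is arbitrary:

```python
def hbase_put_commands(table, row_key, family, columns):
    """One HBase shell `put` command per column of a single row."""
    return [
        f"put '{table}', '{row_key}', '{family}:{col}', '{value}'"
        for col, value in columns.items()
    ]


# Values listed in the assignment; replace the <...> placeholders.
columns = {
    "driverId": "<your-login>",
    "truckId": "999",
    "eventTime": "01:01.1",
    "eventType": "Normal",  # or Overspeed / Lane Departure
    "longitude": "-94.58",
    "latitude": "37.03",
    "CorrelationId": "1000",
}
for cmd in hbase_put_commands("<your-namespace>:truck_event", "20000", "events", columns):
    print(cmd)
```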
Building a Hadoop Cluster with three VMs
- Exam (elaborations) • 6 pages • 2024
- $10.49
Assignments 
1. (55pts in total) Check whether your Hadoop cluster is running correctly. Explain your steps and include screenshots. 
a. (15pts) Before doing so, change (or show) the hostnames of your three nodes. 
i. Hostnames 
1. Master: <your-NetID>-HM 
2. Worker1: <your-NetID>-W1 
3. Worker2: <your-NetID>-W2 
4. For example, sslee777-HM, sslee777-W1, and sslee777-W2 
ii. 
b. (15pts) Check the cluster by creating your user directory. 
See the example below. 
(...
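The naming scheme for the three nodes, and the conventional HDFS home directory /user/<name>, can be generated from your NetID. A minimal Python sketch with hypothetical helper names; actually setting each hostname (for example with hostnamectl set-hostname on systemd-based Linux) and the assignment's own example are outside this sketch:

```python
ROLES = ["HM", "W1", "W2"]  # master, worker 1, worker 2


def cluster_hostnames(net_id):
    """Hostnames for the three nodes, following the <NetID>-<role> pattern."""
    return [f"{net_id}-{role}" for role in ROLES]


def hdfs_user_dir_command(user):
    """Shell command that creates the user's home directory in HDFS."""
    return f"hdfs dfs -mkdir -p /user/{user}"


print(cluster_hostnames("sslee777"))  # ['sslee777-HM', 'sslee777-W1', 'sslee777-W2']
print(hdfs_user_dir_command("sslee777"))
```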
Big_Data_Analytics_Ex2-1
- Exam (elaborations) • 3 pages • 2024
- $10.49
Write Pig scripts to find truck drivers who exceeded the speed limit (‘Overspeed’). 
a. Dataset: Truck IoT dataset 
i. Dataset location (Linux filesystem): 
/home/data/CSC534BDA/datasets/Truck-IoT/ 
ii. Filenames: truck_event_text_ 
b. Write and run your Pig scripts 
i. (20pts) Find all truck drivers who exceeded the speed limit, ‘Overspeed’ 
ii. (10pts) Define the schema when you load the data (don’t use $0, $1, etc.) 
iii. (10pts) Show the driver’s events grouped, if the drivers exc...
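The Pig logic (LOAD with a schema, FILTER on eventType, GROUP by driver) can be mirrored in plain Python to check results on a small sample. A sketch under assumptions: the field list below is hypothetical because the layout of the truck_event_text_ files is not shown, and the Pig lines in the comments use a placeholder path:

```python
import csv
import io
from collections import defaultdict

# Hypothetical column order; adjust to the actual file.
FIELDS = ["driverId", "truckId", "eventTime", "eventType",
          "longitude", "latitude", "eventKey", "correlationId"]

# Pig Latin counterpart (placeholder path, same hypothetical schema):
#   events = LOAD '<path>' USING PigStorage(',')
#            AS (driverId:chararray, truckId:chararray, eventTime:chararray,
#                eventType:chararray, longitude:double, latitude:double,
#                eventKey:chararray, correlationId:chararray);
#   overspeed = FILTER events BY eventType == 'Overspeed';
#   by_driver = GROUP overspeed BY driverId;
#   DUMP by_driver;


def overspeed_by_driver(csv_text):
    """Group 'Overspeed' events by driverId (like Pig FILTER + GROUP)."""
    events = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    grouped = defaultdict(list)
    for e in events:
        if e["eventType"] == "Overspeed":
            grouped[e["driverId"]].append(e)
    return dict(grouped)
```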