Revature Week 4 Review Questions | 50 Questions with 100% Correct Answers | Updated & Verified
What is Hive? - Hive is a tool that allows SQL-like querying on big data. It was originally built as a way to run MapReduce jobs by writing SQL, but it now supports other execution engines as well (we're still using Hive on MapReduce, though).

Where is the default location of Hive's data in HDFS? - By default, all database and table data files are stored under /user/hive/warehouse. This location is set by the hive.metastore.warehouse.dir property; it is not the $HIVE_HOME directory, which is where Hive itself is installed.

What is an External table? - Data kept outside of Hive's warehouse that we query using Hive. Hive tracks only the table's metadata, so dropping the table leaves the data files in place.

What is a Managed table? - Data kept inside Hive's internal data warehouse. Hive controls the files, which gives safety and efficiency on the data; dropping the table also deletes the data.

What is a Hive partition? - A Hive partition is a subset of a table's rows that share the same value of a partition column, stored as its own smaller dataset (a subdirectory of the table's directory).

Provide an example of a good column or set of columns to partition on? - Time. We can select an appropriate resolution to get reasonably sized partitions, it is easy to add new data, and many queries filter on time.

What's the benefit of partitioning? - Queries that filter on the partition columns can run faster, because Hive reads only the matching partitions instead of scanning the whole table.
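
To make the partitioning answers concrete, here is a minimal sketch in Python of how partition values map to directories under the default warehouse path, and why a filter on partition columns touches fewer directories. This is a toy model, not Hive's implementation: the table name page_views, the year/month/day partition columns, and the helper functions are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Default warehouse root (the hive.metastore.warehouse.dir setting).
WAREHOUSE = "/user/hive/warehouse"


def partition_path(table, day):
    """Directory for one day of a table partitioned by year/month/day.

    Hive names each partition directory column=value; the table and
    column names used here are hypothetical.
    """
    return (
        f"{WAREHOUSE}/{table}/"
        f"year={day.year}/month={day.month}/day={day.day}"
    )


def prune(paths, year, month):
    """Keep only the partition directories matching a year/month filter."""
    wanted = f"/year={year}/month={month}/"
    return [p for p in paths if wanted in p]


# Three days of data for the hypothetical table "page_views".
days = [date(2022, 10, 31), date(2022, 11, 1), date(2022, 11, 2)]
paths = [partition_path("page_views", d) for d in days]

# A filter like "year = 2022 AND month = 11" needs only two of the
# three directories; the October directory is never read.
november = prune(paths, 2022, 11)
print(november)
```

The time-based layout also shows why adding new data is easy: a new day simply becomes a new directory, and existing partitions are left alone.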
Written for
- Institution: Revature
- Course: Revature
Document information
- Uploaded on: November 12, 2022
- Number of pages: 7
- Written in: 2022/2023
- Type: Exam (elaborations)
- Contains: Questions & answers