Big Data Infrastructure (100% Correct Solutions)
What are the 4 V's of Big Data? correct answers
Volume - lots of data (lots!)
Variety - structured and unstructured data
Velocity - data arriving faster than transactional data (like credit card transactions); think sensor data, so fast that some RDBMSs have trouble writing it all
Value - making sense of the data so that action can be taken (in time)

Define Big Data correct answers Big Data is data that exceeds the processing capacity of conventional database systems. As consumers and business users, the size and scale of the data is not what we care about. What Big Data is really all about is the ability to capture and analyze data and gain actionable insights from it at a much lower cost than was historically possible. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. We no longer need complex software that takes months or years to set up and use. Nearly all the analytics power we need is available through simple software downloads or in the cloud.

What is the trend for storage of data? What are some sources of increased data? correct answers From floppy disks, CDs, and flash drives to the cloud, people need to store more and more data, and they also want to be able to access that data relatively easily. Once data becomes easy to store, people create more of it. Some of the sources of increased data are social media, the Internet of Things (IoT), and search engine queries.

What is a common business use of data mining? (I.e., What is cross-selling?) correct answers Cross-selling is a common business use of data mining. Based on searches and product purchasing behavior, a business can figure out what products to recommend next. Using sales history, current and social events, customer buying behavior, and social media, an organization can recommend to its customers items or services that are frequently bought together.
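The "frequently bought together" idea can be sketched with simple pair counting over past baskets. This is only an illustrative sketch, not a production recommender: the `baskets` data and the `pair_counts` and `recommend` helpers are hypothetical names invented for this example, and real systems would also weigh the other signals mentioned above (searches, events, social media).

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales history: each inner list is one customer's basket.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "butter", "milk"],
]

def pair_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def recommend(item, baskets, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

print(recommend("bread", baskets))  # → ['butter', 'jam']
```

In this toy data, bread appears with butter three times and with jam twice, so those two items are suggested first to a customer who buys bread.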
More importantly, the recommendations can be customer-specific or general, or be based on what is trending.

How do commodity hardware, open source, and the cloud enable Big Data? correct answers Data storage infrastructure such as commodity hardware, open source software, and the cloud is primarily responsible for storing, and to some extent processing, the immense amounts of data that companies are capturing. As people created more and more data, companies at first stored it on their own commodity hardware, such as data servers. Then open source software such as Linux, an operating system that runs on low-cost hardware, helped companies store data at lower cost. Now, instead of buying hardware and software, installing it in their own data centers, and maintaining that infrastructure, companies store data in the cloud, for example on Amazon Web Services (AWS), to get the capabilities they want on demand over the Internet. All of these storage infrastructures let companies store massive amounts of data easily, which in turn enables them to process and analyze data more efficiently.

In a few sentences, describe a Big Data application correct answers Dirty data is a record that contains mistakes, errors or incomplete values. The challenge of dirty data is to find ways to clean the data. Data cleaning requires going through the data meticulously, noting where
Written for
- Institution: Big Data Infrastructure
- Course: Big Data Infrastructure
Document information
- Uploaded on: July 2, 2023
- Number of pages: 9
- Written in: 2022/2023
- Type: Exam (elaborations)
- Contains: Questions & answers
Subjects
- what are the 4 vs of big data