Big Data refers to data sets that are too large or complex to be handled by
traditional data-processing software. It is characterised by:
1. Volume: Data is too large to fit on a single server.
2. Velocity: Data is generated and processed at high speeds, often in
real-time.
3. Variety: Data comes in various forms, including structured,
unstructured, text, and multimedia.
When data is too large to fit on a single server, processing must be distributed
across multiple machines, and functional programming is well suited to this
because it makes it easier to write correct and efficient distributed code.
Functional programming supports immutable data structures and statelessness,
which reduce side effects and make code more predictable; higher-order
functions and immutability also make it easier to distribute code across
multiple servers, as in the sketch below.
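A minimal Scala sketch of this idea, assuming a simple word count split across
in-memory "partitions" that stand in for separate servers; the object and
function names are illustrative, and a real system would use a framework such
as MapReduce or Spark:

// Minimal sketch: a word count expressed with immutable data and
// higher-order functions, so each partition could be processed on a
// different machine and the partial results merged afterwards.
object DistributedStyleWordCount {

  // Pure function: depends only on its input and returns a new immutable Map.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Merges two partial results without mutating either; passed to the
  // higher-order function reduce below.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  def main(args: Array[String]): Unit = {
    // Each "partition" stands in for the slice of data held by one server.
    val partitions: Seq[Seq[String]] = Seq(
      Seq("big data needs distributed processing"),
      Seq("functional programming helps distributed processing")
    )

    // map + reduce: the same pattern distributed frameworks are built on.
    val total = partitions.map(countWords).reduce(merge)
    println(total)
  }
}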
Data Representation Models
Fact-Based Model: Represents data as individual facts, each
capturing a single piece of information.
Graph Schema: Captures the structure of the dataset using nodes
(entities), edges (relationships), and properties (attributes).
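Both models can be sketched with ordinary immutable case classes; the type and
field names below are illustrative and not taken from any particular library:

// Fact-based model: each record captures exactly one piece of information.
final case class Fact(subject: String, attribute: String, value: String)

// Graph schema: nodes (entities), edges (relationships), properties (attributes).
final case class Node(id: String, properties: Map[String, String])
final case class Edge(from: String, to: String, label: String)
final case class Graph(nodes: Seq[Node], edges: Seq[Edge])

object RepresentationModels {
  def main(args: Array[String]): Unit = {
    val facts = Seq(
      Fact("user:1", "name", "Alice"),
      Fact("user:1", "follows", "user:2")
    )

    val graph = Graph(
      nodes = Seq(
        Node("user:1", Map("name" -> "Alice")),
        Node("user:2", Map("name" -> "Bob"))
      ),
      edges = Seq(Edge("user:1", "user:2", "follows"))
    )

    println(facts)
    println(graph.edges.map(e => s"${e.from} -${e.label}-> ${e.to}"))
  }
}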
Challenges of Big Data
Big Data often lacks a clear structure, making analysis more difficult.
Traditional relational databases require data to fit into a row-and-column
format, which is not suitable for unstructured data. Techniques are
needed to discern patterns and extract useful information from
unstructured data.
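One simple illustration of such a technique, assuming hashtag extraction from
free text with a regular expression; a real pipeline would chain many such
extraction and cleaning steps:

// Minimal sketch: pulling a simple pattern (hashtags) out of unstructured
// text with a regular expression. The pattern and input are illustrative.
object UnstructuredExtraction {
  private val hashtag = "#([A-Za-z0-9_]+)".r

  def extractHashtags(text: String): Seq[String] =
    hashtag.findAllMatchIn(text).map(_.group(1)).toSeq

  def main(args: Array[String]): Unit = {
    val posts = Seq(
      "Loving the new release! #bigdata #scala",
      "No structure here, just free text"
    )
    println(posts.flatMap(extractHashtags)) // List(bigdata, scala)
  }
}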
Streaming Data
Data from sources like networked sensors, smartphones, video
surveillance, and mouse clicks are continuously streamed, contributing to
the velocity aspect of Big Data.
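A small sketch of stream-style processing, using a lazily evaluated LazyList of
simulated sensor readings in place of a real streaming source; a production
system would use a dedicated streaming framework instead:

// Minimal sketch: treating sensor readings as a lazily evaluated stream and
// computing a moving average over a sliding window.
object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Simulated unbounded stream of temperature readings.
    val readings: LazyList[Double] =
      LazyList.iterate(20.0)(t => t + (math.random() - 0.5))

    // Higher-order, lazy pipeline: nothing is computed until it is consumed.
    val movingAverages =
      readings.sliding(5).map(window => window.sum / window.size)

    movingAverages.take(3).foreach(avg => println(f"avg: $avg%.2f"))
  }
}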
Functional Programming Support
Immutable data: data that cannot be changed once created.
Stateless (pure) functions: functions that do not depend on external state,
making them easier to test and debug.
Higher-order functions: functions that can take other functions as arguments
or return them as results, enabling more flexible and reusable code.
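A short Scala sketch of these three properties, with illustrative names:

object FunctionalFeatures {
  // Immutable data: the List cannot be modified after creation;
  // "adding" an element produces a new list instead.
  val readings: List[Double] = List(3.0, 7.0, 5.0)
  val extended: List[Double] = 9.0 :: readings

  // Stateless (pure) function: output depends only on the input,
  // so it is trivial to test and safe to run on any machine.
  def celsiusToFahrenheit(c: Double): Double = c * 9.0 / 5.0 + 32.0

  // Higher-order function: takes a function as an argument and returns one,
  // enabling reusable building blocks.
  def twice(f: Double => Double): Double => Double = x => f(f(x))

  def main(args: Array[String]): Unit = {
    println(extended)                          // List(9.0, 3.0, 7.0, 5.0)
    println(readings.map(celsiusToFahrenheit)) // List(37.4, 44.6, 41.0)
    println(twice(_ + 1.0)(10.0))              // 12.0
  }
}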