Computer Science: Data Science Study Notes
Introduction to Data Science
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and
systems to extract knowledge and insights from structured and unstructured data. It involves
techniques from statistics, machine learning, and data mining to analyze data for decision-making.
Data Collection & Cleaning
Data collection is the process of gathering information from sources such as databases and APIs, or
by web scraping. Data cleaning involves removing duplicates, handling missing values, and
transforming data into a usable format.
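The three cleaning steps can be sketched in plain Python. This is a minimal illustration, not a standard recipe: the records, the field names (id, name, age), and the choice to fill missing ages with the median are all assumptions made for the example.

```python
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": ""},        # missing age
    {"id": 1, "name": " Alice ", "age": "34"},  # duplicate of the first row
    {"id": 3, "name": "carol", "age": "29"},
]


def clean(records):
    """Drop rows with a repeated id, normalize names, convert ages to int or None."""
    seen_ids = set()
    cleaned_rows = []
    for row in records:
        if row["id"] in seen_ids:
            continue  # remove duplicates, keeping the first occurrence
        seen_ids.add(row["id"])
        age_text = row["age"].strip()
        cleaned_rows.append({
            "id": row["id"],
            "name": row["name"].strip().title(),          # consistent format
            "age": int(age_text) if age_text else None,   # mark missing values
        })
    return cleaned_rows


def fill_missing_age(records):
    """Replace missing ages with the median of the known ages (one possible policy)."""
    known = [r["age"] for r in records if r["age"] is not None]
    fallback = median(known)
    return [{**r, "age": fallback if r["age"] is None else r["age"]} for r in records]


cleaned = clean(raw_records)
filled = fill_missing_age(cleaned)
for row in filled:
    print(row)
```

Whether to fill, drop, or flag missing values depends on the dataset; median imputation is used here only because it is simple and robust to outliers.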
Exploratory Data Analysis (EDA)
EDA helps you understand the structure of a dataset by summarizing key statistics and visualizing
distributions and trends. Common techniques include histograms, scatter plots, and box plots.
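As a rough sketch of what a histogram and a box plot compute underneath, the example below derives bin counts and a quartile summary with the standard library only. The values are hypothetical, the bin width of 5 is arbitrary, and the 1.5 x IQR outlier rule is the usual box-plot convention rather than anything specific to these notes.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical measurements; 21 is included as a deliberate outlier.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9, 21]


def summarize(data):
    """Return the numbers a box plot is drawn from, plus the mean."""
    q1, q2, q3 = quantiles(data, n=4)  # quartiles (default "exclusive" method)
    return {"min": min(data), "q1": q1, "median": q2, "q3": q3,
            "max": max(data), "mean": mean(data)}


def find_outliers(data):
    """Flag points beyond 1.5 * IQR from the quartiles (box-plot whisker rule)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def histogram_counts(data, bin_width=5):
    """Count values per bin; each key is the lower edge of its bin."""
    return dict(Counter((x // bin_width) * bin_width for x in data))


summary = summarize(values)
outliers = find_outliers(values)
counts = histogram_counts(values)

print(summary)
print("outliers:", outliers)
for edge in sorted(counts):
    print(f"{edge:>2}-{edge + 4:<2} {'#' * counts[edge]}")  # text histogram
```

The gap between the mean and the median in the summary is a quick hint that the data is skewed by the outlier.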
Statistics & Probability for Data Science
Key concepts include descriptive statistics (mean, median, standard deviation), probability
distributions (normal, binomial), hypothesis testing, and confidence intervals.
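These concepts can be tried out with Python's statistics module. The sketch below is illustrative only: the sample is hypothetical, and both the confidence interval and the test use a normal approximation. For a sample this small a t-distribution would normally be more appropriate.

```python
from math import comb, sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]


def mean_confidence_interval(data, confidence=0.95):
    """Confidence interval for the mean, assuming a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(data)
    margin = z * stdev(data) / sqrt(len(data))      # z times the standard error
    return center - margin, center + margin


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def z_test_p_value(data, mu0):
    """Two-sided p-value for H0: the population mean equals mu0 (normal approximation)."""
    z = (mean(data) - mu0) / (stdev(data) / sqrt(len(data)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


low, high = mean_confidence_interval(sample)
print(f"mean = {mean(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print("P(2 heads in 4 fair flips) =", binomial_pmf(2, 4, 0.5))
print("p-value for H0: mu = 5.5:", z_test_p_value(sample, 5.5))
```

A small p-value means the sample would be unlikely if the hypothesized mean were true; it does not give the probability that the hypothesis is true.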
Machine Learning Basics
Machine learning is a subset of AI that enables systems to learn from data without being explicitly
programmed. It includes supervised learning (classification, regression), unsupervised learning
(clustering, dimensionality reduction), and reinforcement learning.
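To make supervised classification concrete, here is a minimal k-nearest-neighbors sketch in plain Python. k-NN is chosen only because it is short to write; the training points and the labels "small" and "large" are invented for the example, and real projects would typically use a library rather than hand-rolled code.

```python
from collections import Counter
from math import dist

# Hypothetical labeled training data: (features, label).
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"),
    ((7.0, 6.0), "large"),
    ((6.5, 7.5), "large"),
]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k closest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(training, (1.2, 1.4)))
print(knn_predict(training, (6.8, 6.9)))
```

The model is "learned" from labeled examples rather than from hand-written rules, which is the sense in which the system is not explicitly programmed.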
Data Visualization
Visualization tools like Matplotlib, Seaborn, and Tableau help represent data graphically. Common
techniques include line graphs, bar charts, heatmaps, and pie charts.
Big Data & Cloud Computing
Big Data involves handling large datasets using technologies like Hadoop and Spark. Cloud
computing services like AWS, Google Cloud, and Azure provide scalable computing power for data