Exam (elaborations)

Certified Machine Learning (Python) Practice Exam

Rating
-
Sold
-
Pages
55
Grade
A+
Uploaded on
26-03-2025
Written in
2024/2025

1. Introduction to Machine Learning
• Definition and history of machine learning
• Types of machine learning (Supervised, Unsupervised, Semi-supervised, Reinforcement)
• Key applications of machine learning in different industries
• Overview of the machine learning lifecycle
• Difference between AI, ML, and Deep Learning
• Understanding the concept of training, testing, and validation

2. Python Basics for Machine Learning
• Python programming essentials for data analysis and machine learning
• Python libraries (NumPy, pandas, scikit-learn, TensorFlow, Keras, Matplotlib)
• Working with data structures (lists, tuples, dictionaries, sets) in Python
• Control flow (if, else, for, while loops) and functions in Python
• Writing Python scripts and functions for ML workflows
• Understanding Python's object-oriented features
• Data manipulation using pandas: DataFrames, series, indexing, and slicing

3. Data Preprocessing and Cleaning
• Data collection and loading data from different formats (CSV, Excel, SQL, etc.)
• Handling missing data (imputation, removal, interpolation)
• Encoding categorical variables (One-hot encoding, Label encoding)
• Feature scaling techniques (Standardization, Normalization)
• Data transformation and feature engineering
• Handling imbalanced data (SMOTE, undersampling, oversampling)
• Feature selection techniques (Correlation, Recursive Feature Elimination)
• Handling outliers and data smoothing

4. Exploratory Data Analysis (EDA)
• Descriptive statistics (mean, median, variance, skewness, kurtosis)
• Visualizing data distributions (histograms, box plots, density plots)
• Bivariate and multivariate analysis
• Data visualization techniques (Matplotlib, Seaborn, Plotly)
• Identifying patterns and relationships between features
• Correlation matrices and heatmaps
• Using scatter plots and pair plots for analysis

5. Supervised Learning Algorithms
• Linear Regression
o Concept of linear regression
o Cost function, Gradient Descent
o Model evaluation (MSE, RMSE, MAE, R-squared)
• Logistic Regression
o Understanding the logistic function and its application
o Cost function for logistic regression
o Model evaluation (Accuracy, Precision, Recall, F1-Score, ROC Curve)
• Decision Trees
o Splitting criteria (Gini Impurity, Entropy, Information Gain)
o Overfitting and pruning techniques
o Hyperparameter tuning (max depth, min samples split)
• Support Vector Machines (SVM)
o Hyperplane, margin, and kernels (linear, polynomial, RBF)
o SVM for classification and regression
o Regularization in SVM
• k-Nearest Neighbors (k-NN)
o Distance metrics (Euclidean, Manhattan)
o Model evaluation (Confusion Matrix, K-Fold Cross Validation)
o Choosing the right value for k
• Naive Bayes
o Probabilistic model and assumptions (Independence of features)
o Types of Naive Bayes classifiers (Gaussian, Multinomial, Bernoulli)
• Ensemble Methods
o Bagging (Bootstrap Aggregating)
o Random Forests and their advantages
o Boosting (AdaBoost, Gradient Boosting, XGBoost)
o Stacking and Blending for improved accuracy

6. Unsupervised Learning Algorithms
• Clustering Techniques
o K-Means clustering (Initialization, Elbow Method)
o Hierarchical Clustering (Agglomerative, Divisive)
o DBSCAN (Density-Based Spatial Clustering)
o Evaluation of clustering models (Silhouette Score, Davies-Bouldin Index)
• Dimensionality Reduction
o Principal Component Analysis (PCA)
o t-Distributed Stochastic Neighbor Embedding (t-SNE)
o Linear Discriminant Analysis (LDA)
o Feature extraction vs. feature selection

7. Deep Learning (Neural Networks)
• Introduction to neural networks and how they mimic the human brain
• Layers in neural networks (Input, Hidden, Output)
• Activation functions (Sigmoid, ReLU, Tanh, Softmax)
• Backpropagation and Gradient Descent
• Loss functions (Mean Squared Error, Cross-Entropy)
• Overfitting and Regularization (Dropout, L2 regularization)
• Introduction to Convolutional Neural Networks (CNN)
o Layers (Convolutional, Pooling, Fully Connected)
o Applications of CNN (Image classification, Object detection)
• Introduction to Recurrent Neural Networks (RNN)
o Understanding time-series data and sequence models
o Long Short-Term Memory (LSTM) networks
• Autoencoders and their applications (Anomaly detection, Data compression)

8. Model Evaluation and Optimization
• Cross-validation techniques (K-Fold, Stratified K-Fold, Leave-One-Out)
• Performance metrics for classification (Accuracy, Precision, Recall, F1-Score, ROC Curve, AUC)
• Performance metrics for regression (MSE, RMSE, MAE)
• Hyperparameter tuning (Grid Search, Random Search, Bayesian Optimization)
• Model selection criteria and trade-offs
• Bias-Variance trade-off and how to achieve optimal model performance
• Feature importance and model interpretability

9. Working with Big Data in Machine Learning
• Overview of Big Data frameworks (Hadoop, Spark)
• Distributed machine learning and parallel processing
• Handling large datasets using Dask and Spark MLlib
• Introduction to GPUs and their role in accelerating machine learning

10. Deployment and Model Monitoring
• Model deployment strategies (On-premises, Cloud-based solutions such as AWS, Azure)
• Model versioning and rollback
• Continuous integration/continuous deployment (CI/CD) for ML models
• Using Docker for containerization of ML models
• Model monitoring and drift detection
• Updating models in production
• Ethics in AI and machine learning (Fairness, Transparency, Accountability)

11. Machine Learning in Real-World Applications
• Natural Language Processing (NLP)
o Text Preprocessing (Tokenization, Lemmatization, Stemming)
o Bag-of-Words and TF-IDF
o Sentiment analysis, Named Entity Recognition (NER)
o Word Embeddings (Word2Vec, GloVe)

Institution
Computers
Course
Computers

Preview of the content

Certified Machine Learning (Python) Practice Exam
Question 1: What is the primary goal of machine learning?
Options:
A. To design explicit algorithms
B. To learn patterns from data
C. To compute statistics manually
D. To implement database queries
Answer: B
Explanation: Machine learning focuses on automatically learning patterns from data to make predictions
or decisions without explicit programming.

Question 2: Which of the following best describes supervised learning?
Options:
A. Learning from unlabeled data
B. Learning from labeled data
C. Learning without any feedback
D. Learning by trial and error
Answer: B
Explanation: Supervised learning uses labeled data to train models so that they can predict outcomes for
new, unseen data.

Question 3: In unsupervised learning, what is the main goal of clustering?
Options:
A. To predict target values
B. To reduce data dimensionality
C. To group similar data points
D. To enhance image resolution
Answer: C
Explanation: Clustering aims to group similar data points together based on features and similarities
without prior labeling.

Question 4: Which Python library is most commonly used for numerical computations in machine
learning?
Options:
A. pandas
B. NumPy
C. matplotlib
D. TensorFlow
Answer: B
Explanation: NumPy provides support for large, multi-dimensional arrays and matrices, making it
essential for numerical computations.
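To make this concrete, a minimal sketch of NumPy's vectorized arithmetic, assuming NumPy is installed; the array values are illustrative:

```python
import numpy as np

# Vectorized arithmetic on arrays, the core of NumPy's role in ML pipelines.
features = np.array([[1.0, 2.0], [3.0, 4.0]])
weights = np.array([0.5, 0.25])

# Matrix-vector product: one value per row, with no explicit Python loop.
predictions = features @ weights
print(predictions)  # [1.  2.5]
```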

Question 5: What does the term “feature scaling” refer to?
Options:
A. Increasing the number of features

B. Reducing the number of observations
C. Normalizing data values to a common scale
D. Encoding categorical variables
Answer: C
Explanation: Feature scaling normalizes data values so that features contribute equally to the model’s
performance.
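A minimal pure-Python sketch of min-max normalization, one common scaling technique (in practice scikit-learn's `MinMaxScaler` or `StandardScaler` would typically be used; the values are illustrative):

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 40]))  # [0.0, 0.3333..., 1.0]
```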

Question 6: Which activation function is most commonly used in deep learning hidden layers?
Options:
A. Softmax
B. Sigmoid
C. ReLU
D. Linear
Answer: C
Explanation: ReLU (Rectified Linear Unit) is popular because it helps mitigate the vanishing gradient
problem while being computationally efficient.
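A minimal sketch of ReLU in plain Python; the input values are illustrative:

```python
def relu(x):
    """ReLU: pass positive inputs through unchanged, clip negatives to zero."""
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 3.0]])  # [0.0, 0.0, 0.0, 3.0]
```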

Question 7: What is overfitting in machine learning models?
Options:
A. Underestimating the model’s complexity
B. When a model learns noise in the training data
C. Having too few training samples
D. When a model performs equally on training and test data
Answer: B
Explanation: Overfitting occurs when a model learns the training data—including its noise—instead of
the underlying pattern, resulting in poor generalization.

Question 8: Which technique is used for reducing overfitting in neural networks?
Options:
A. Increasing learning rate
B. Dropout
C. Using more layers
D. Removing bias
Answer: B
Explanation: Dropout randomly disables neurons during training, which helps prevent the network from
overfitting.
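A minimal NumPy sketch of inverted dropout, assuming NumPy is available; the keep probability and input array are illustrative:

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale survivors by 1/keep_prob so the expected activation is unchanged."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
out = dropout(np.ones(8), keep_prob=0.75, rng=rng)
# Each entry is either 0.0 (dropped) or 1/0.75 (kept and rescaled).
```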

Question 9: In a confusion matrix, what does the term “True Positive” (TP) represent?
Options:
A. Incorrectly predicted positive cases
B. Correctly predicted negative cases
C. Correctly predicted positive cases
D. Incorrectly predicted negative cases
Answer: C
Explanation: True Positives are cases where the model correctly predicts the positive class.
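A minimal pure-Python sketch counting all four confusion-matrix cells; the example labels are illustrative:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

print(confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))
# {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 1}
```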

Question 10: Which method is used for hyperparameter tuning by exhaustively searching over
specified parameter values?
Options:
A. Random Search
B. Grid Search
C. Bayesian Optimization
D. Cross-Validation
Answer: B
Explanation: Grid Search systematically tests all parameter combinations to find the best model
configuration.
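A minimal pure-Python sketch of exhaustive grid search; the `score` function here is a hypothetical stand-in for cross-validated model performance (in practice scikit-learn's `GridSearchCV` would typically be used):

```python
from itertools import product

# Hypothetical scoring function standing in for cross-validated accuracy;
# it peaks at max_depth=4, min_samples=2.
def score(max_depth, min_samples):
    return -abs(max_depth - 4) - abs(min_samples - 2)

param_grid = {"max_depth": [2, 4, 8], "min_samples": [2, 5]}

# Grid search: evaluate every parameter combination and keep the best one.
best = max(
    (dict(zip(param_grid, combo)) for combo in product(*param_grid.values())),
    key=lambda params: score(**params),
)
print(best)  # {'max_depth': 4, 'min_samples': 2}
```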

Question 11: What does the acronym “PCA” stand for in machine learning?
Options:
A. Principal Cluster Analysis
B. Partial Component Analysis
C. Principal Component Analysis
D. Probabilistic Clustering Algorithm
Answer: C
Explanation: PCA stands for Principal Component Analysis, a technique used for dimensionality
reduction.
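A minimal NumPy sketch of PCA via eigendecomposition of the covariance matrix, assuming NumPy is available; the synthetic data is illustrative:

```python
import numpy as np

# Toy 2-D data stretched along the y = x direction, plus small noise.
rng = np.random.default_rng(42)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t]) + rng.normal(scale=0.1, size=(200, 2))

# PCA: center the data, decompose the covariance, sort by variance explained.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# For this data the first principal component carries almost all the variance.
explained = eigvals / eigvals.sum()
```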

Question 12: Which of the following is a common cost function for linear regression?
Options:
A. Cross-entropy loss
B. Mean Squared Error (MSE)
C. Hinge loss
D. Log loss
Answer: B
Explanation: Mean Squared Error (MSE) measures the average squared difference between predicted
and actual values in linear regression.
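A minimal pure-Python sketch of MSE; the values are illustrative:

```python
def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # (1 + 0 + 4) / 3 ≈ 1.667
```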

Question 13: What distinguishes reinforcement learning from other types of machine learning?
Options:
A. Use of labeled data
B. Learning based on rewards and penalties
C. Clustering data points
D. Dimensionality reduction
Answer: B
Explanation: Reinforcement learning involves an agent that learns to make decisions by receiving
rewards or penalties.

Question 14: Which library is primarily used for data manipulation and analysis in Python?
Options:
A. pandas
B. scikit-learn
C. TensorFlow

D. Matplotlib
Answer: A
Explanation: pandas is a powerful library used for data manipulation and analysis, offering data
structures like DataFrames.
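A minimal sketch of a typical pandas manipulation, assuming pandas is installed; the data is illustrative:

```python
import pandas as pd

# A small DataFrame: labeled, column-oriented data.
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "label": ["a", "b", "a", "b"],
})

# Typical manipulation: aggregate a numeric column per group.
means = df.groupby("label")["feature"].mean()
print(means["a"], means["b"])  # 2.0 3.0
```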

Question 15: In the context of decision trees, what is “pruning”?
Options:
A. Adding more branches to the tree
B. Reducing the depth of the tree to prevent overfitting
C. Increasing the number of leaves
D. Scaling features
Answer: B
Explanation: Pruning is the process of reducing the size of a decision tree to improve its generalization
by removing branches that have little importance.

Question 16: What is the purpose of one-hot encoding in data preprocessing?
Options:
A. To scale numeric features
B. To convert categorical variables into binary vectors
C. To impute missing values
D. To reduce dimensionality
Answer: B
Explanation: One-hot encoding transforms categorical variables into a binary matrix representation,
which is more suitable for ML algorithms.
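A minimal pure-Python sketch of one-hot encoding (in practice pandas' `get_dummies` or scikit-learn's `OneHotEncoder` would typically be used; the categories are illustrative):

```python
def one_hot(values):
    """Map each categorical value to a binary vector over the sorted categories."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]

print(one_hot(["red", "green", "red"]))
# [[0, 1], [1, 0], [0, 1]]  (columns: green, red)
```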

Question 17: Which metric is most appropriate for evaluating a regression model?
Options:
A. Accuracy
B. Precision
C. Mean Absolute Error (MAE)
D. F1-Score
Answer: C
Explanation: Mean Absolute Error (MAE) is commonly used to evaluate regression models by measuring
the average absolute differences between predicted and actual values.
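A minimal pure-Python sketch of MAE; the values are illustrative:

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: the average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mae([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # (1 + 0 + 2) / 3 = 1.0
```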

Question 18: Which of the following is an ensemble learning method?
Options:
A. Logistic Regression
B. k-Nearest Neighbors
C. Random Forest
D. Support Vector Machine
Answer: C
Explanation: Random Forest is an ensemble learning method that combines multiple decision trees to
improve model accuracy and reduce overfitting.
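Ensemble methods aggregate the predictions of many base models; a minimal pure-Python sketch of the majority vote that classification forests use (the base-learner outputs are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate one prediction per base model into a single ensemble prediction."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base learners vote on the same sample.
print(majority_vote(["cat", "dog", "cat"]))  # cat
```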

Question 19: In support vector machines, what does the “kernel trick” enable?
Options:


Document information

Contains
Questions and answers
