Comprehensive Questions and Solutions Graded A+
Dataset - Answer: A large collection of data used to train and test the model. The data is often
normalised and sometimes labelled.
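As a minimal illustration of the 'normalised' part of this definition, the Python sketch below applies min-max scaling to a small made-up array (all values and variable names are hypothetical):

    import numpy as np

    # Hypothetical raw dataset: each row is a sample, each column a feature.
    data = np.array([[2.0, 150.0],
                     [4.0, 300.0],
                     [6.0, 450.0]])

    # Min-max normalisation: rescale every column into the range [0, 1]
    # so features on different scales contribute comparably during training.
    normalised = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
    print(normalised)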
Deep learning - Answer: A type of machine learning where the model is made up of many
layers, each of which contains artificial 'neurons' that emulate human brain functions, allowing
it to learn patterns from large amounts of data.
Graphical processing unit (GPU) - Answer: A hardware component designed to process many
tasks simultaneously/concurrently, originally for image processing. This parallelism also makes
it ideal for machine learning model training tasks.
Hyperparameter tuning - Answer: The process of adjusting and optimising a neural network's
hyperparameters, or settings (e.g. learning rate, number of layers), to improve its performance.
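As a minimal sketch of what tuning can look like in practice, the Python grid search below tries every combination of two hyperparameters; train_and_evaluate is a hypothetical placeholder standing in for real training plus validation:

    from itertools import product

    def train_and_evaluate(learning_rate, hidden_size):
        # Placeholder for training a network and returning a validation score;
        # the formula below is made up so the sketch runs end to end.
        return -abs(learning_rate - 0.01) - abs(hidden_size - 64) / 1000

    learning_rates = [0.001, 0.01, 0.1]   # candidate settings (assumed values)
    hidden_sizes = [32, 64, 128]

    best_score, best_settings = float("-inf"), None
    for lr, size in product(learning_rates, hidden_sizes):
        score = train_and_evaluate(lr, size)
        if score > best_score:
            best_score, best_settings = score, (lr, size)

    print("Best hyperparameters:", best_settings)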
Large language model (LLM) - Answer: A model specifically designed to understand and
generate human language. Trained on vast amounts of text data.
Latency - Answer: The time, or delay, for a model to return an output (prediction, result,
classification) after receiving an input or question.
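Latency can be measured by timing a single call; below is a minimal Python sketch in which model_predict is a hypothetical stand-in for a real model's forward pass:

    import time

    def model_predict(x):
        # Stand-in for a real model; the sleep simulates processing time.
        time.sleep(0.05)
        return x * 2

    start = time.perf_counter()
    result = model_predict(21)
    latency_ms = (time.perf_counter() - start) * 1000
    print(f"output={result}, latency={latency_ms:.1f} ms")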
Loss function - Answer: A function, or mathematical formula, that quantifies how far a model's
predictions are from the values they should actually be. This function helps guide improvements
to the model during training.
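One common example is mean squared error; the short Python sketch below (with made-up predictions and targets) shows how it quantifies the gap between predictions and true values:

    import numpy as np

    def mse(predictions, targets):
        # Mean squared error: average of the squared differences between
        # the model's predictions and the values they should actually be.
        return np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)

    print(mse([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0]))  # 0.375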
Memory cell state - Answer: The part of an LSTM (long short-term memory) network that stores
important information over time.
Bias - Answer: Systematically prejudiced error in a model's predictions that occurs due to
erroneous assumptions made during the learning process. High bias can cause the model to
oversimplify and miss important patterns, leading to underfitting.
Confirmation - Answer: The tendency in model training to focus on data or outcomes that
support existing assumptions, potentially ignoring contradictory evidence.
Historical - Answer: Bias that originates from historical data that reflects past prejudices or
inequities, leading to biased predictions in machine learning models.
Labelling - Answer: Occurs when the labels assigned to training data are inconsistent or
subjective, which can mislead model learning and performance.
Linguistic - Answer: Arises from the language used in training data, affecting how models
interpret and generate language, potentially reinforcing stereotypes (e.g. training data dominated
by slang, or by very formal language, leaves the model familiar only with certain linguistic styles).
Sampling - Answer: When the training data is not representative of the overall population,
leading to models that perform poorly on unseen data.
Selection - Answer: Bias introduced when the data selection process favours certain outcomes
or characteristics, impacting the validity of the model's conclusions.