COS4861 Assignment 3 (COMPLETE ANSWERS) 2025 - DUE 10 September 2025; 100% correct solutions and explanations.
COS Assignment 3 - [65 points]
## Working towards encoding systems in NLP.
------------------------------------------------------------
Due date: 10 September 2025
Year: 2025
Author: Dr Dongmo Cyrille, Mr Thapelo Sindane
Contact: or
------------------------------------------------------------
You will learn how to:
- define various encoding techniques (N-grams, ), and smoothing algorithms
- build tokenizers and N-gram models

Note 1: This assignment is designed to help you understand the fundamentals of corpus-based Natural Language Processing (NLP) and the various techniques applied for preprocessing, analysing, and generating insights from text, such as word clouds, tokenization, and creating encoding systems. This is in no way a definitive list of examples, but it covers the basic components you need to get started.

Introduction
Use the corpus below to answer the questions that follow. The assignment is divided into three sections:
- Theory-based questions
- Critical thinking and understanding
- Application / code [should be in Python]

The corpus
“”
When data are noisy, it’s our job as data scientists to listen for signals so we can relay it to someone who can decide how to act. To amp up how loudly hidden signals speak over the noise of big and/or volatile data, we can deploy smoothing algorithms, which though traditionally used in time-series analyses, also come into their own when applied on other sequential data. Smoothing algorithms are either global or local because they take data and filter out noise across the entire, global series, or over a smaller, local series by summarizing a local or global domain of Y, resulting in an estimation of the underlying data called a smooth.

The specific smoother you use depends on your analysis’ goal and data quirks, because as we’ll see below, there are trade-offs to consider. Below are a few options, along with their intuition, limitations, and formula so you can rapidly evaluate when and why to use one over the other.
“”

Question 1 [12 points] - Theory
1) What is a corpus, and how does it differ from other data types? (2)
2) What is the technical term for splitting a corpus into different linguistic units such as paragraphs, sentences, and words in NLP? (1)
3) Define N-grams and provide references from
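
As an illustration of the tokenization and N-gram ideas that Question 1 asks about, here is a minimal sketch in Python. It is not the assignment's model answer: the function names `tokenize` and `ngrams`, the regular-expression rule, and the lowercasing step are all assumptions chosen for brevity, and the sample sentence is one clause taken from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is a run of letters or digits, optionally followed
    by a straight apostrophe and more letters. Real tokenizers are richer.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# One clause from the corpus, used only as sample input.
sample = "Smoothing algorithms are either global or local"
tokens = tokenize(sample)
bigrams = ngrams(tokens, 2)

# Counting N-grams is the usual first step towards an N-gram model.
bigram_counts = Counter(bigrams)
```

With this rule the sample yields seven tokens and six bigrams, starting with `('smoothing', 'algorithms')`. Asking for an N-gram longer than the token list returns an empty list, which is one reason short corpora give sparse counts and why smoothing is applied to N-gram models.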
Related book
- 2013
- 9789380578774
- Unknown
School, study and subject
- Institution
- University of South Africa
- Course
- Natural Language Processing
Document information
- Uploaded on
- 4 September 2025
- Number of pages
- 19
- Written in
- 2025/2026
- Type
- Exam
- Contains
- Questions and answers
Topics
- cos4861 assignment 3 complete answers 2025 due