Lecture notes

Lecture notes: Web Data Processing Systems (XM_40020), VU Master AI

Pages: 51
Uploaded on: 23-02-2022
Written in: 2020/2021

All lectures for the WDPS course are included in this document; studying it carefully and practicing well guarantees a good grade.

Professor(s): Jacopo Urbani
Contains: all lectures



Web Data Processing Systems
Created @October 21, 2020 3:02 PM

Class S2

Type S2

Materials



Lecture 1 (27 Oct 2020)
Introduction to course

Goals
This course is system-oriented. We familiarize ourselves with some of the topics that drive research on the Web, with a focus on two main themes:

1. Knowledge acquisition from the Web

2. Knowledge consumption



Lecture 2 (27 Oct)
Introduction to Knowledge Bases

limits of text

Information retrieval is a field of computer science about how to retrieve a relevant subset of documents from a large database.

Traditionally, the retrieval was keyword-based (search engines).

What are good string similarities?

What are good criteria for ranking?

How can we diversify the results?

Recently, retrieval has shifted from keyword-based to entity-based.
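
To make the questions about string similarity and ranking concrete, here is a minimal sketch. It assumes Jaccard similarity over character bigrams, which is only one of many possible string similarities and is not necessarily the one used in the lecture. The function names and the example documents are made up for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two strings without n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank(query, documents):
    """Sort documents by decreasing similarity to the query (keyword-style ranking)."""
    return sorted(documents, key=lambda doc: jaccard(query, doc), reverse=True)


# Hypothetical document titles, used only for this example.
docs = ["Rotterdam", "Amsterdam Zuid", "Utrecht", "Amsterdam"]
ranked = rank("amsterdam", docs)
```

A purely string-based ranking like this has no notion of which entity a string refers to, which is the limitation that entity-based retrieval tries to address.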

Text documents contain data (from the Latin word for something that is given). Knowledge, however, is familiarity with, awareness of, or understanding of someone or something, such as facts, information, descriptions or skills.

In essence, to implement the vision of entity-based search we want to build knowledge repositories. We want to move away from collections of data/text and build those instead.

How are knowledge repositories built? There are two methods:

Manifest knowledge

The meaning is accessible to humans. Such repositories are called knowledge bases or knowledge graphs, because their content can be described using a graph. They are typically constructed manually or from unstructured sources.

For this we need a language that we understand, and the less ambiguous it is the better. Logic is the language that humans designed to express knowledge.

Opinion: knowledge is something we can interpret without ambiguity.

Knowledge base: a crystallization of factual knowledge in the form of associations between entities and relations.

It can be expressed in first-order logic.

Recently Google re-branded knowledge bases as knowledge graphs (the same as a knowledge base, but represented as a graph).




Latent knowledge

Other people suggest that we do not need to understand the repository, as long as the knowledge can be processed by machines and used by tasks that need intelligence. The meaning is hidden from us. Such repositories are called latent models or latent feature models.

typically learned using machine learning techniques

recently, latent models became very popular due to the rise of deep learning.

see more detail by statistical inference

Which is better? Opinions differ: for some, everything can be learned; for others, knowledge is only what can be understood.



knowledge bases available on the web

WordNet (NLP-oriented): the most popular lexical database for English.

Idea: create a knowledge base of the meanings of words. Words are grouped into sets of synonyms (synsets).
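
The synset idea can be sketched with a toy data structure. The example below does not use the real WordNet database: the synset identifiers and their contents are invented for illustration and only mimic the general idea that one word can belong to several synsets, one per meaning.

```python
# Toy synsets (hypothetical data): identifier -> set of synonymous words.
SYNSETS = {
    "car.n.01": {"car", "auto", "automobile", "motorcar"},
    "bank.n.01": {"bank", "riverbank"},
    "bank.n.02": {"bank", "depository"},
}


def synsets_of(word):
    """Return the identifiers of all synsets that contain the word."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def synonyms(word):
    """Return all words that share at least one synset with the word."""
    result = set()
    for words in SYNSETS.values():
        if word in words:
            result |= words
    result.discard(word)  # a word is not listed as its own synonym
    return result
```

Here `synsets_of("bank")` returns two identifiers because the word is ambiguous, which is exactly the kind of information a plain keyword index does not capture.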

two important languages related to knowledge bases

RDF

standardized language to exchange knowledge on the Web

RDF is a standard used to report statements that describe properties of resources

Properties are represented by IRIs, while resources are either IRIs, literals (labels) or special placeholders called blank nodes (used when you do not know the resource).

The statements can be represented as triples of the form (subject, predicate, object) and serialized in different formats: RDF/XML, N3, Turtle.

An RDF dataset can be represented as a directed graph.
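
As a rough illustration of how triples form a directed graph, the sketch below stores a few statements as Python tuples and builds an adjacency list in which every triple becomes a labeled edge from subject to object. It does not use an RDF library; the `http://example.org/` namespace, the property names and the facts are assumptions made up for this example.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, used only in this sketch

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "Amsterdam", EX + "capitalOf", EX + "Netherlands"),  # IRI object
    (EX + "Amsterdam", EX + "name", '"Amsterdam"'),            # literal object
    (EX + "Netherlands", EX + "memberOf", "_:b0"),             # blank node object
]


def to_graph(triples):
    """Build a directed graph: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = to_graph(triples)
```

The predicate plays the role of the edge label, so the same pair of nodes can be connected by several differently labeled edges.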




SPARQL

Another language standardized by the W3C.

SPARQL is a query language with an SQL-inspired syntax. Finding answers to a SPARQL query corresponds to finding all possible graph homomorphisms between the query and the graph (= the knowledge base).
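
The homomorphism view of query answering can be sketched in a few lines. The code below is not a SPARQL engine: it only evaluates a list of triple patterns (a basic graph pattern) against a small in-memory set of triples, and the knowledge base, relation names and helper functions are hypothetical. Terms starting with `?` are treated as variables.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend the binding so that the pattern maps onto the triple, or return None."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must map to the same node everywhere it occurs.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None  # constants must match exactly
    return new_binding


def evaluate(patterns, triples):
    """Return every variable binding that maps all patterns into the graph."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical knowledge base.
kb = [
    ("Amsterdam", "locatedIn", "Netherlands"),
    ("Rotterdam", "locatedIn", "Netherlands"),
    ("Netherlands", "partOf", "Europe"),
    ("Paris", "locatedIn", "France"),
]

# Roughly: SELECT ?city ?country WHERE { ?city :locatedIn ?country . ?country :partOf :Europe }
query = [("?city", "locatedIn", "?country"), ("?country", "partOf", "Europe")]
answers = evaluate(query, kb)
```

Each answer is one mapping from the query's variables to nodes of the graph that preserves all edges of the query, which is what makes it a homomorphism. Paris is not returned because the toy data contains no fact linking France to Europe.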




The most important knowledge base right now is Wikidata. It was created to provide structured data for Wikipedia, because Wikipedia was mainly text, which makes verification and keeping consistency difficult.

Data is validated by the community

Keeps the provenance (origin) of the data

Multilingual by design

Supports plurality

high quality knowledge

DBpedia:

Project to convert Wikipedia pages into RDF

Contains links to other KBs

Fairly large ontology but not rich in terms of expressiveness

YAGO

Unifies Wikipedia and WordNet

Exploits Wikipedia infoboxes to extract clean facts

Checks the plausibility of facts via type checking

Freebase: discontinued
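
The type checking mentioned for YAGO can be illustrated with a small sketch. The notes do not describe how YAGO implements it, so the following is only one plausible reading: each relation has an expected subject type and object type, and a candidate fact is rejected when its entities do not have those types. All entity types, relation signatures and names below are hypothetical.

```python
# Hypothetical entity types and relation signatures.
ENTITY_TYPES = {
    "Einstein": {"person"},
    "Ulm": {"city"},
    "Physics": {"field"},
}
SIGNATURES = {
    "bornIn": ("person", "city"),  # (expected subject type, expected object type)
}


def is_plausible(subject, relation, obj):
    """Accept a candidate fact only if both entities have the expected types."""
    signature = SIGNATURES.get(relation)
    if signature is None:
        return False  # unknown relations are rejected in this sketch
    subject_type, object_type = signature
    return (subject_type in ENTITY_TYPES.get(subject, set())
            and object_type in ENTITY_TYPES.get(obj, set()))
```

For example, `is_plausible("Einstein", "bornIn", "Ulm")` is accepted, while `is_plausible("Einstein", "bornIn", "Physics")` is rejected because a field is not a city.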



Lecture 3 (1 Nov)
Knowledge Acquisition


Knowledge acquisition is the process of extracting knowledge (to be integrated into knowledge bases) from unstructured text or other data.




In this course we only look at extracting entities and relations between them from unstructured data.

Another important form of extraction consists of detecting events and other temporal expressions. We are not going to cover it.

NLP Preprocessing

NLP : natural language processing

Typical distinction:

structured data: "databases"

unstructured data: "information retrieval", typically refers to "text"

semi-structured data: in reality there is always some structure, such as titles and bullets



Before we can use a text, we must pre-process it. Standard tasks are:

tokenization

Goal

given a character sequence, split it into subsequences called tokens

tokens are often loosely referred to as terms/words

Type vs Token

Token: an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit ⇒ multiset

Type: class of all tokens containing the same character sequence ⇒ set
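
The difference between tokens and types can be shown with a short sketch. It assumes a very simple tokenizer that lowercases the text and splits on non-word characters. Real tokenizers handle many more cases, and the lowercasing step is a normalization choice made here, not something prescribed by the notes.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a character sequence into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


text = "To be or not to be"
tokens = tokenize(text)          # every occurrence counts
token_counts = Counter(tokens)   # tokens viewed as a multiset
types = set(tokens)              # types: distinct character sequences
```

In this sentence there are 6 tokens but only 4 types, because "to" and "be" each occur twice.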


Seller: MeldaMalkoc, Vrije Universiteit Amsterdam