Summary

Data mining summary (table form) - Advanced data analysis

Pages: 164
Uploaded on: 13-09-2024
Written in: 2024/2025

This summary is very handy because it is in table form and clearly divided into chapters. A term is on the left and the corresponding explanation on the right. It is very useful during the open-book exam for finding information quickly. A few example questions are included at the end of the document.

Written for

Institution: Universiteit Antwerpen (UA)
Programme: Biomedische Wetenschappen (Biomedical Sciences)
Course: Data mining (2052FBDBMW)

Document information

Type: Summary

Content preview

SUMMARY DATA ANALYSIS

CHAPTER 1 ~ INTRODUCTION

BIG DATA
Bit of context: You get a more comprehensive picture when you integrate information (transcriptomics,
genomics, ...)
Large-scale molecular data opens up new perspectives.
This is a fundamentally new perspective that requires new tools --> they all fit into big data

Big data: Often equated with AI and deep learning (big data = AI = deep learning)
--> evolution towards larger scale

Definition: big data is data for which conventional computing techniques are no longer sufficient
due to size, complexity, ...
... is a disruptive trend in computer science (it makes all our conventional ideas fall apart)

(the complexity of the data increases faster than computers evolve)

Big data is characterized by:
1. Volume
2. Velocity
3. Variety
4. Veracity

--> is big data a reality in the life sciences?

1. Volume = the amount of data we are dealing with, which is far larger than it used to be

Sequencing has become affordable: around 2007 its cost suddenly dropped
drastically. This drop no longer follows Moore's law: sequencing gets cheaper
faster than computers improve, so the computers cannot keep up.

Moore's Law is an observation that the number of transistors in a computer chip
doubles every two years or so. As the number of transistors increases, so does
processing power. The law also states that, as the number of transistors increases,
the cost per transistor falls.
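
The doubling rule described above can be written as a small formula. The sketch below is only an illustration: the function name, the starting count of 1,000 and the fixed two-year doubling period are assumptions chosen for the example, not figures from the course.

```python
def transistor_count(initial_count: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Projected count after `years` if the count doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point: a chip with 1,000 transistors.
# Ten years correspond to five doublings.
after_ten_years = transistor_count(1_000, 10)
print(after_ten_years)
```

Growth of this kind is exponential, which is why a cost curve that falls even faster than this (as sequencing did) quickly outpaces the available computing power.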

2. Velocity = the speed at which data is produced. Data is produced all the time and at
enormous speed (e.g. a smartphone continuously collects information from its
environment)

Data management gap: data is produced faster than the number of people who can
deal with it grows. It is impossible to hire enough IT people, so we need to be
smarter about how we deal with this data

How do you bring data from the sequencing facility to the data servers in the hospital? --> the
most effective way is by bike (you have to be creative with big data)
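
Why a bike can beat a network link becomes clear with a back-of-the-envelope calculation. The sketch below is a rough illustration only: every number in it (data size, link speed, distance, cycling speed) is an assumption made up for the example, and the function names are hypothetical.

```python
def network_transfer_hours(data_terabytes: float, link_megabits_per_second: float) -> float:
    """Hours needed to send the data over a link running at full, sustained speed."""
    data_megabits = data_terabytes * 1_000_000 * 8  # 1 TB = 1,000,000 MB; 1 byte = 8 bits
    seconds = data_megabits / link_megabits_per_second
    return seconds / 3600


def courier_transfer_hours(distance_km: float, speed_km_per_hour: float) -> float:
    """Hours needed to carry the disks; this does not depend on how much data they hold."""
    return distance_km / speed_km_per_hour


# Illustrative numbers only (assumptions, not from the course).
data_tb = 10.0      # one batch of sequencing output
link_mbps = 100.0   # sustained network throughput
distance_km = 5.0   # sequencing facility to hospital
bike_kmh = 15.0     # cycling speed

network_hours = network_transfer_hours(data_tb, link_mbps)
bike_hours = courier_transfer_hours(distance_km, bike_kmh)
print(f"network: {network_hours:.1f} h, bike: {bike_hours:.2f} h")
```

The network time grows with the amount of data, while the bike time stays the same, so the larger the dataset, the better physical transport looks.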

The next step is special sequencing techniques (single-cell sequencing) --> these produce massive amounts of data
Velocity is an important aspect and another trend

3. Variety = life science has a lot of different data types; you need to understand what you are looking at

Heterogeneous and lots of unstructured data (e.g. sensor signals)

4. Veracity = data is never perfect: there might be noise, biases, missing information --> many aspects
that make you doubt the quality (how truthful is your data? Can you rely on it?)

This is problematic in life science, because living systems are stochastic and noisy -->
you have to deal with the messiness
Techniques also have limitations

Consequences of big data --> Large-scale data and AI brought a new data-intensive research paradigm.

(A paradigm is a standard, perspective, or set of ideas. A paradigm is a way of looking
at something.)

Large-scale data brings a new paradigm --> shift in paradigm

Deep learning tries to make sense of data by finding patterns in it

Data science

There is much more to AI than data-science-based AI
- Deep learning is a subset of machine learning that uses neural networks
with many layers (hence "deep") to model complex patterns in large
datasets.

- Data mining is the process of discovering patterns, correlations, and
anomalies within large sets of data using statistical and computational
techniques (e.g. clustering, regression, decision trees, hierarchical clustering).

- Machine learning is a subset of artificial intelligence that focuses on
developing algorithms that enable computers to learn from and make
predictions or decisions based on data. It includes supervised learning (e.g.,
linear regression, decision trees, neural networks) and unsupervised learning
(e.g., k-means clustering, principal component analysis).
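
The difference between supervised and unsupervised learning can be sketched with two of the methods named above. This is a minimal illustration in plain Python, not the course's implementation: the function names, the toy numbers and the hand-picked starting centers are all assumptions. The first function learns from labelled pairs (x, y); the second only receives unlabelled values.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept from labelled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def k_means_1d(values, centers, iterations=10):
    """Unsupervised: group unlabelled numbers around the nearest center (1-D k-means)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put every value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Toy data, made up for illustration.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centers, clusters = k_means_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centers=[0.0, 6.0])
print(slope, intercept)
print(centers, clusters)
```

In the regression the known y values steer the fit; in the clustering no answer is given and the structure has to be found in the values themselves.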

WHAT IS DATA
What is data? A collection of data objects and their attributes
- Attribute = a property or characteristic of an object
• Examples: eye color of a person, temperature, etc.
• An attribute is also known as a variable, field, characteristic, or feature
• Object = a sample in the lab, attribute = a measurement you make on that sample in the
lab
- A collection of attributes describes an object
• An object is also known as a record, point, case, sample, entity, or
instance
- One object can have many attributes --> the more attributes you have, the better you
know and can identify the object
- Attributes can be numerical, binary, or words

Example:
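
One way to picture objects and attributes is as a small data table in which each row is an object and each column is an attribute. The sketch below assumes a plain list of dictionaries; the sample names and measurements are made up for illustration.

```python
# Each dict is one data object (a sample); each key is an attribute.
# All values below are invented for the example.
samples = [
    {"sample_id": "S1", "eye_color": "brown", "temperature_c": 36.8, "smoker": False},
    {"sample_id": "S2", "eye_color": "blue", "temperature_c": 37.4, "smoker": True},
    {"sample_id": "S3", "eye_color": "green", "temperature_c": 36.5, "smoker": False},
]

# The attributes are the columns of the table.
attributes = list(samples[0].keys())

# One attribute read across all objects (one column).
temperatures = [sample["temperature_c"] for sample in samples]

print(attributes)
print(temperatures)
```

The example mixes the three kinds of attribute mentioned above: words (eye color), numbers (temperature) and binary values (smoker).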

Attribute values: numbers or symbols assigned to an attribute
- Distinction between attributes and attribute values

• The same attribute can be mapped to different attribute values
Example: height can be measured in centimetres or metres (160 cm or 1.6 m)

• Different attributes can be mapped to the same set of values
Example: attribute values for ID and age are both integers (whole numbers,
without any fractional or decimal parts)

• However, the properties of the attribute values can still be different
Example: ID has no limit, but age has a maximum and a minimum value

- Each attribute type may have its own collection of allowed values, e.g. post code
- Be aware of what an attribute value actually is
- Sometimes you choose which attribute values you allow (miles or feet)
- There can be a maximum and a minimum, e.g. age
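
The points above can be made concrete with a short sketch. It is an illustration under assumptions: the function names are hypothetical and the age bounds of 0 and 130 are chosen for the example only.

```python
def height_in_metres(height_cm: float) -> float:
    """Same attribute (height), different attribute value: centimetres to metres."""
    return height_cm / 100


def is_valid_age(age: int, minimum: int = 0, maximum: int = 130) -> bool:
    """Age is an integer with a minimum and a maximum (bounds assumed for this sketch)."""
    return isinstance(age, int) and minimum <= age <= maximum


def is_valid_id(identifier: int) -> bool:
    """An ID is also an integer, but no limit is placed on its value."""
    return isinstance(identifier, int)


print(height_in_metres(160))
print(is_valid_age(25), is_valid_age(500))
print(is_valid_id(500))
```

Age and ID share the same set of values (integers), yet only age is checked against a range, which is the sense in which their properties differ.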

Attribute types (based on properties)
• Nominal = categories without a specific order
Examples: ID numbers, eye color, zip codes

• Ordinal = categories with a meaningful order
Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in
{tall, medium, short}

• Interval = numerical data without a true zero point (a true zero would represent a
complete absence of the attribute being measured)
Examples: calendar dates, temperatures in Celsius or Fahrenheit.
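
Which operations make sense depends on the attribute type. The sketch below illustrates this for the three types listed above; the variable names and values are made up for the example.

```python
# Nominal: only equality is meaningful.
eye_colors = ["brown", "blue", "brown"]
same_color = eye_colors[0] == eye_colors[2]

# Ordinal: order is meaningful, so values can be ranked once the order is stated.
height_order = {"short": 0, "medium": 1, "tall": 2}
heights = ["tall", "short", "medium"]
ranked = sorted(heights, key=height_order.get)


# Interval: differences are meaningful, ratios are not (there is no true zero).
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


difference_c = 20 - 10   # "10 degrees warmer" is a meaningful statement
ratio_c = 20 / 10        # looks like "twice as warm" in Celsius ...
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # ... but not in Fahrenheit

print(same_color, ranked, difference_c, ratio_c, ratio_f)
```

Because the ratio changes when the same two temperatures are expressed in another unit, statements such as "twice as warm" are not valid for interval data.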

Seller: paulienmeulemeester
