Data has always been crucial; science is supposed to be evidence based. People did experiments,
spent time on data collection and then tried to find regularities, which led to the physical laws
of science and the other things we operate on. This is changing, because of the massive amount of
computer power and data storage now available. The driver of the first change is Moore’s law:
computer power doubles every 18 months, or, equivalently, the same amount of computing power costs
half the price after 18 months. This comes from packing more transistors onto a chip. Kryder’s law
is the counterpart for data storage. Here you see that 1 GB in 1980 cost about one million dollars,
and now you get it practically for free. Nobody throws anything away; you keep it in digital form,
as it’s more expensive to clean out the mess than to buy a bigger hard drive.
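The 18-month doubling described above can be turned into a quick calculation. The sketch below is only illustrative: it assumes the doubling period from these notes holds exactly, and the function names are made up for the example.

```python
# Doubling/halving arithmetic behind Moore's law as stated in these notes.
# Assumption: the 18-month period holds exactly; names are illustrative only.

DOUBLING_PERIOD_MONTHS = 18.0


def growth_factor(years, period_months=DOUBLING_PERIOD_MONTHS):
    """How many times more computing power you get after `years`."""
    doublings = years * 12.0 / period_months
    return 2.0 ** doublings


def cost_factor(years, period_months=DOUBLING_PERIOD_MONTHS):
    """Fraction of the original price for the same computing power."""
    return 1.0 / growth_factor(years, period_months)


if __name__ == "__main__":
    for years in (1.5, 3.0, 15.0):
        print(years, growth_factor(years), cost_factor(years))
```

Fifteen years at this rate is ten doublings, so roughly a thousand times more computing power for the same price.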
Two big drivers in the field propel this push towards faster, more powerful computers and cheaper
storage:
I. Commerce: marketing and getting people to click on ads. We think it’s about buying and selling
stuff, but it lives on the data collected in the process of selling stuff.
II. The risk society: we don’t tolerate any risk anymore. We have seatbelts in cars, helmets on bikes.
We invest an enormous amount in preventing crime and terrorism. On the data side that means
national security; these organisations are main drivers in the data world.
The age of Big Data
Big data is a marketing term, but it is based on something tangible. Big data consists of:
I. (Large) volume: the data does not fit on your laptop, although this is not completely true.
II. Variety of data.
III. Velocity: it changes all the time. A medical practitioner used to have a paper-based file; more
progressive practitioners had a database with patient data, but this was very static. This is
not what big data is about. It’s about batch processing and real-time processing of data. Look at
what happens on Facebook or Twitter: it changes all the time, on a massive scale, and is
composed of all sorts of data.
Tweets are totally unstructured, as they could be about anything. They contain hashtags, normal text
and everything in between. Images are even more complicated. It all happens on a large scale.
All this data ends up in the big data mess that people work with.
In the old days, people would put much effort into collecting data to establish regularities in the
world, but this is changing now; empirical research is changing. CBS interviewed people about how
they feel about the world today; they’re trying to measure consumer sentiment. If you plot this over
time, you see that it changes. It turns out that there was a peak first and later people became more
pessimistic (during the crisis in 2008).
Today, there are various ways to use data:
I. Data can be used alongside the traditional way, and if you know something about events taking
place at a certain location then you can filter out these outliers. For instance, you take a
bunch of tweets, pick out the words expressing negative and positive emotions, then you take a
large sample of tweets and count how many times these words occur.
II. A/B testing: this is the biggest experiment that all of us are in, all the time. When you go to a
website, it divides visitors into two groups. One gets to see version A and the other version B
of the same website. Content, colours, font, anything can be different. Then you observe
what happens and how many people respond to a certain thing. If more people sign up when
the form is in the middle of the page instead of at the bottom, you know this is a better