In this overview, I want to tell you what you're going to see and what some of my
goals are for you and for the sequence. I'll start with the goals. One of
my primary goals is for this to be an overview. I'm not going to get into all of the nuts and bolts
and nitty-gritty of every type of machine learning or visualization.
Instead, I'm going for the big picture so that you know where to look. That's important. I also want you
to know that data science is not a panacea or a silver bullet. It's not as if everything is now easy
because of data science; there are major challenges that come with these opportunities. So
those are a lot of goals, but really the aim is to give you a high-level overview, make you literate in the
different techniques in data science, and get you excited about both the opportunities and the
challenges.
Data science is an emerging scientific discipline motivated by data-intensive science, but it's
really the science of how you handle data: how you collect, clean, store, visualize, and model with data.
The term "data science" means different things to different people, but humans have been doing
data science for hundreds, even thousands, of years, collecting data and modeling the world.
Data-intensive inquiry laid the foundation for what Newton would go on to do. Kepler described
the elliptical motion of the planets, and Newton explained why the planets move in these ellipses.
We should aspire to make our algorithms go from Kepler to Newton, from describing patterns in data
to explaining them. That is a worthwhile goal, and it is also very challenging. So data science has been around for a long
time. There's a really interesting modern book called The Fourth Paradigm: Data-Intensive
Scientific Discovery, which describes this progression from theory and analytical
mathematics to experiments that collect data. We need a science that ties these together,
just as simulations didn't displace experiments: they complement each other. So that's
just a very high-level overview.
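To make the Kepler-to-Newton distinction concrete, here is a minimal Python sketch of the "Kepler" step: fitting an empirical law to data. It assumes rounded textbook values for six planets (semi-major axis a in astronomical units, orbital period T in years); these numbers are not from the lecture, and the function name fit_power_law is hypothetical. A least-squares fit of log T against log a recovers the power law T ∝ a^(3/2), which is Kepler's third law.

```python
import math

# Assumed, rounded textbook values (not from the lecture):
# planet -> (semi-major axis a in AU, orbital period T in years)
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}


def fit_power_law(points):
    """Hypothetical helper: ordinary least squares on log T = k * log a + c.

    Returns (k, c), so the fitted law is T = exp(c) * a**k.
    """
    xs = [math.log(a) for a, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    k = sxy / sxx
    c = y_mean - k * x_mean
    return k, c


k, c = fit_power_law(PLANETS.values())
print(f"fitted law: T = {math.exp(c):.3f} * a^{k:.3f}")
```

The fitted exponent should land very close to 1.5. Note what the fit does not give you: it describes the pattern but says nothing about why it holds. That explanation came from Newton's law of gravitation, which is the harder step we'd like our algorithms to reach.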
What is Data Science?
A lot of data science goes into collecting, storing, cleaning, and
processing the data. Data science is a dynamic process, again in the hands of expert humans: teams of
humans asking questions that can be answered with data. There may be feedback loops at
all levels of this process; it is not a static architecture. Data engineering is an
inherently collaborative art, so data science involves teams of
experts collaborating. Data science is not about the data, and it's not about the algorithms. It's about the people,
the teams, the questions you're asking, and the problems you want to
solve. Data science is evolving into a core functionality of your whole team,
so you have to integrate it into your team. Another aspect that is really important is
reproducibility, which is critical in data science. How do you
collect enough data that your processes are reproducible, and how do you make sure you're maintaining
standards in some sense?