Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Neural Networks – Data Science can be an overwhelming topic for beginners. Its purpose is to make sense of massive amounts of data collected from user behaviour, machine sensors, stock market movements, weather, financial metrics, and other sources.
Processing large amounts of data has long since become part of our everyday lives. From personalized ads and content curation in social networks to route planning for our commutes, all of these are applications of data science in practice. We also see promising applications emerging, such as personalized medicine.
As a Python developer, you will find data science an interesting topic to dive into, but also a challenging one. This article is a curated list of beginner-friendly resources to start your journey to becoming a data scientist.
What is Data Science, and how does it affect our lives?
A good way to start with AI/ML is to get an overall picture of its different applications in our daily lives. You can skip this section if you’re only interested in the technical details.
- Life 3.0 (Max Tegmark) – The book discusses a variety of societal implications, what can be done to maximize the chances of a positive outcome, and potential futures for humanity, technology and combinations thereof.
Mathematical Foundations
Running complex algorithms on huge data sets requires at least a basic understanding of the math behind them. Here are some beginner-friendly resources that help you grasp these concepts:
- Programming Collective Intelligence – This book was published in 2007, at a time when the ecosystem of open source data science libraries was much smaller. Still, it covers commonly used algorithms with examples built from scratch, which makes it a good starting point.
- Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow 2.0
- Doing Data Science – Published in 2013, this book presents methods and models for case studies at companies like Google, Microsoft, and eBay.
- Practical Statistics for Data Scientists – This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not
- Introduction to Machine Learning with Python – You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library
Video Courses / MOOCs
- Machine Learning Recipes – A series of free introductory videos on Machine Learning by Google Developers. It includes short videos about TensorFlow, scikit-learn, and TFLearn as well as an introduction to neural networks.
- Elements of AI – A free beginner-level course made by the University of Helsinki to demystify the basics of Artificial Intelligence
- Machine Learning Crash Course with TensorFlow APIs – A free 25-lesson course by Google
- Introduction to Deep Learning – MIT’s free introductory course on deep learning methods
Books
- Grokking Deep Learning – Grokking Deep Learning teaches you to build deep learning neural networks from scratch
- Deep Learning (MIT Press) – The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular
- Machine Learning Simplified – A Gentle Introduction to Supervised Learning
Popular Libraries for Data Science in Python
Here is a list of popular Python libraries commonly used for data science applications:
- numpy – The fundamental package for scientific computing with Python
- scipy – Fundamental algorithms for scientific computing in Python
- pandas – A fast, powerful, flexible and easy to use open source data analysis and manipulation tool
- tensorflow – An end-to-end open source platform for machine learning, initially developed by Google
- scikit-learn – Simple and efficient tools for predictive data analysis, built on NumPy, SciPy, and matplotlib
- TFLearn – Deep learning library featuring a higher-level API for TensorFlow
- PyTorch – An open source machine learning framework
- Keras – An open-source software library that provides a Python interface to TensorFlow for artificial neural networks
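To give a feel for how some of these libraries fit together, here is a minimal sketch that uses numpy to generate data, pandas to hold it, and scikit-learn to fit a simple model. The data set is synthetic, and the function name `fit_line` as well as the chosen slope, intercept, and noise level are assumptions made up for this illustration, not part of any of the libraries.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_line(n_samples: int = 50, seed: int = 0):
    """Fit a linear model to synthetic data (hypothetical helper)."""
    # numpy: generate noisy points around the line y = 3x + 2
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=n_samples)
    y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=n_samples)

    # pandas: store the data in a labelled table
    df = pd.DataFrame({"x": x, "y": y})

    # scikit-learn: fit a linear regression on the feature column "x"
    model = LinearRegression().fit(df[["x"]], df["y"])
    return df, model


df, model = fit_line()
print(df.describe())                     # summary statistics of the table
print(model.coef_[0], model.intercept_)  # estimated slope and intercept
```

The estimated slope and intercept should land close to the values used to generate the data. Real projects follow the same pattern, only with data loaded from files or databases instead of a random generator.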
Cloud Computing Resources
Running Data Science applications at scale is a great use case for Cloud Computing. Microsoft Azure offers some resources for getting started:
- Getting Started with Azure – If you haven’t already, get your free Azure account here and enjoy 12 months of free services.
- Azure ML Studio – Start right away with Microsoft Azure Machine Learning Services from your Visual Studio Code
Further Resources
If you want to dig deeper into the rabbit hole of Data Science, here are some more resources:
- Awesome Data Science – The awesome-list repositories often provide a good collection of resources around a specific topic, and the awesome-datascience repository is no exception. It contains a very comprehensive list of books, MOOCs, tutorials, and other content for learners of all levels of experience.
- TensorFlow Resources