Ever since Sweep Sports Analytics launched in late May, our inbox has been getting the same question: “how can I do sports analytics?” So here it is: a 10-step guide to get started with sports analytics.
A few months ago I would not have expected to have started a blog and social media accounts with my university friend Chris. Exposing our work to thousands of people across the world (we’ve actually surpassed 1 million impressions on Instagram) gives us an excitement we hadn’t expected.
At the same time, putting yourself out in public comes with the risk of harsh criticism and disappointment. What if you make an error that throws off your analysis completely? What if your post barely gets any readers? The data don’t lie, but garbage in, garbage out; use the wrong data and you’ve put yourself in a pickle.
I have been playing around with sports datasets for the past 7 years, ever since I started using R for data analysis. In the beginning I mainly used sports analytics for betting purposes, with little success unfortunately. I then used sports data to make learning data science more fun. Over the last couple of years I’ve been using sports analytics for fantasy sports purposes. Since I was doing analyses about various subjects in different sports on my own time, I decided to start a page.
I found the below quotes here after randomly searching for quotes on the importance of starting.
“A year from now you may wish you had started today.” – Karen Lamb
“Don’t worry about failures, worry about the chances you miss when you don’t even try.” – Jack Canfield
“You don’t have to be good to start … you just have to start to be good!” – Joe Sabah
“You may be disappointed if you fail, but you are doomed if you don’t try.” – Beverly Sills
“You can’t build a reputation on what you are going to do.” – Henry Ford
“The way to get started is to quit talking and begin doing.” – Walt Disney
“The hardest thing about getting started, is getting started.” – Guy Kawasaki
“The secret of getting ahead is getting started.” – Mark Twain
Sports Analytics as a Part of Data Science
Essentially, sports analytics is the practice of applying mathematical and statistical principles to sports and related peripheral activities. It is a field that applies data analysis techniques to various components of sports.
Data analysis and data science have seen a great increase in interest from companies and organizations across all industries over the past years. Harvard Business Review called data scientist the sexiest job of the 21st century in 2012. Today, the hype is increasing at a lower rate, but is still increasing.
The Venn Diagram above, created by Stephan Kolassa, shows the 4 main pillars for any data scientist. The same applies to a sports data scientist.
The four pillars are Communication, Statistics, Programming, and domain knowledge: Sports.
Throughout the article below, I go through my approach of how to become great in the four pillars. Keep in mind: practice makes perfect, and no amount of theory will ever be enough if you don’t put your knowledge to the test.
I do want to state that, from personal experience, I strongly believe the best way to become a sports data scientist is to pursue a degree relevant to Data Science. That could be a degree in Statistics, Computer Science, Analytics, Mathematics, Engineering, and more. I am not an expert in academic programs and, since our subscribers are located all around the globe, I will not be recommending any university studies.
Disclosure: Some of the links below are affiliate links. This means that, at zero cost to you, we will earn an affiliate commission if you click through the link and finalize a purchase. This post is not sponsored in any way.
1. Love Sports and Be Curious
It all starts with passion. I remember waking up early every morning in elementary school since I was 7 years old to watch ESPN and their sports reviews of the previous night. As a Knicks fan, I couldn’t wait to see how Patrick Ewing and the gang performed. Michael Jordan and the Bulls were building their legacy and watching the highlights gave me the biggest energy boost. My school journal is all about sports, from baseball, American football, soccer, and basketball to my own sports performance.
Do you challenge the sports commentators when they give their opinion? Do you wonder why your favorite team can’t seem to win? Do you think about what would happen if your favorite player joined a different team? Good.
2. Read, listen, and follow the experts
As a relatively fresh field, sports analytics sees new trends pop up frequently. Some sports have had years of research and applied analytics. Others have not fully felt the impact of sports analytics.
- MIT Sloan Sports Analytics Conference – Check out their podcasts and YouTube videos
- Towards Data Science
- The Wharton Moneyball Post Game Podcast
Most importantly, try reading academic articles. The depth of knowledge that comes from the experts in an academic field cannot be met (easily) elsewhere.
I also want to make a statement about who the “real” experts are. So, here’s a pop quiz: who’s more knowledgeable about basketball: Shaquille O’Neal or a PhD in Statistics with a focus on Basketball Analytics from a top university?
That being said, do your own research and make sure to follow the leading experts in the sports you love. Analytics give you the extra 5-10% edge. It’s the other 90-95% that you should be looking to get from experienced sports analysts, former or current players, journalists, coaches, etc.
3. Learn to Code
Whether you know your way around the terminal or not, making sure you can use code to retrieve and manipulate data is very important. The two most popular languages for data science are Python and R. I personally have been using R. I do occasionally use Python, however, since some Python packages are better equipped to tackle specific problems.
As a beginner, it really doesn’t matter what you start off with as long as you start doing something. Here’s an article I’ve been sharing lately with anyone asking me what to choose: Python vs. R for Data Science.
Recommended e-Learning Courses
Below is a list of Coursera specializations I personally recommend. I majored in Economics and gained an interest in data science during my first Master’s degree in Business. Online courses played an important part in my academic and professional career and gave me the skills and confidence to apply for a second Master’s, this time in Analytics. That kicked off my career and professionally everything’s gone uphill since then.
Note that you can enroll for free to read and view the material, or pay the enrollment fee to access the assignments and get a certificate after completing the course. The first week is free. After that, it is $40/month for access to all courses. You can of course finish all courses as fast as you can. The faster, the more you save. Coursera Specializations are groups of interrelated courses and you can take individual courses if you like.
The list of good courses out there is long so do not limit yourself to what is presented below.
- Data Science: Foundations using R Specialization – Beginners
- This Coursera specialization is in my opinion one of the best series of courses on learning R available.
- The total effort is approximately 8 hours a week for 5 months, or ~175 hours in total.
- Data Science: Statistics and Machine Learning Specialization – Intermediate
- I totally recommend this for those who already feel comfortable with R, especially the first three courses.
- The total effort is approximately 6 hours a week for 6 months, or ~150 hours in total.
- Python for Everybody Specialization – Beginners
- I’ve done this course myself and can say for sure that it’s really good.
- The total effort is approximately 3 hours a week for 8 months, or ~100 hours in total.
- Applied Data Science with Python Specialization – Intermediate
- From what I’ve been told, this Coursera specialization is awesome.
- The total effort is approximately 7 hours a week for 5 months, or ~155 hours in total.
4. Find Data Sources
Before you start an analysis, make sure you have adequate data. Interested in making NBA shot charts? Great – there’s plenty of data available, for free too! Are you interested in predicting the outcome of soccer matches using team stats? Basic and advanced stats are available, for free, but you may need to pay to get a larger and more detailed data set. Looking to model boxing bouts with Markov chains? You might be out of luck.
Keep in mind that it’s common to spend more than 50% of your time retrieving, cleaning, and preparing data. It’s probably the most challenging and frustrating part of any analysis.
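To give a feel for what that cleaning work looks like, here is a minimal Python sketch using only the standard library. The rows, player names, and the clean_rows helper are all made up for illustration; real sources will have their own quirks, so treat this as a starting point rather than a recipe.

```python
import csv
import io

# Made-up box-score rows with typical problems: stray whitespace,
# inconsistent capitalisation, a missing value, and a duplicate row.
RAW = """player,team,points,minutes
 Alice Example ,NYK,24,36
Bob Sample,nyk,,31
Cara Demo,CHI,18,29
Cara Demo,CHI,18,29
"""


def clean_rows(text):
    """Return de-duplicated rows with tidy strings and numeric fields.

    Hypothetical helper written for this example only.
    """
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        player = row["player"].strip()
        team = row["team"].strip().upper()
        # Keep missing points as None instead of silently treating them as 0.
        points = int(row["points"]) if row["points"].strip() else None
        minutes = int(row["minutes"])
        key = (player, team, points, minutes)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append(
            {"player": player, "team": team, "points": points, "minutes": minutes}
        )
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Keeping the missing value as None, rather than filling in a zero, is a deliberate choice here: a zero would quietly drag down any average you compute later.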
We have started providing our own data sources to the public. Check them out here. Make sure to subscribe to our blog to get updates when we post more datasets.
For more data sources, see below.
Free data sources
Paid data sources
- Opta Data from Stats Perform – World Leaders in Sport Data.
- BigDataBall – Historical sports data for NBA, MLB, NFL, NHL and WNBA
- SportMonks – Fast and Reliable Sports Data and APIs.
Using Code to Pull Data for Free
Check out the below R packages and tutorials to scrape data off the web. Also, stay tuned and follow us for tutorials.
- nbastatR package for NBA data.
- Eurolig package for EuroLeague data.
- engsoccerdata package for three English leagues (League data, FA Cup data, Playoff data), several European leagues (Spain, Germany, Italy, Holland, France, Belgium, Portugal, Turkey, Scotland, Greece) as well as South Africa and MLS.
- worldfootballR to extract data from FBref, Transfermarkt, and Understat.
- Get ATP tennis data here.
- ufc.stats R package.
5. Understand the Analytics Techniques that Work for Sports
After getting a general idea about what data science and machine learning can offer, it’s time to apply them to sports. Unless you’re an academic researcher, I recommend going with the flow and trusting others on which methods and models to use.
I have personally found books to be more useful than courses for this part. Some of my favorite ones so far are:
- SprawlBall: A Visual Tour of the New Era of the NBA: Kirk Goldsberry is a pioneer in the field of basketball analytics. Every single inch of this book is enjoyable. It’s inspiring to see how images can say so much.
- Thinking Basketball: this is a great book for beginners in basketball analytics. It gives a good view of the approaches.
- Basketball Data Science: With Applications in R: this book from two professors of Statistics comes with ready R code. The explanations are excellent and easy to understand. I keep saying in almost all of our basketball posts that this book is a MUST.
- Dean Oliver’s Basketball On Paper: Rules and Tools for Performance Analysis: an all-time classic. As one review puts it, “At the risk of succumbing to hyperbole, BASKETBALL ON PAPER is a revolutionary strike for statistical analysis of the game of basketball. . . . There has never been a basketball book quite like [it].”
- Football Hackers: The Science and Art of a Data Revolution: I only recently bought this. Half-way through and every page keeps me asking for more.
- Soccernomics (2018 World Cup Edition) is a really interesting and fun book! I love the way it manages to give answers to questions that all soccer fans have asked.
- Moneyball: you’ve probably watched the movie, but Michael Lewis’ writing style cannot be transferred to the screen. The book about the sports analytics success story that started it all.
This list can actually go on but I’ll cut it here. I am greatly interested in hearing about your favorite books so feel free to comment!
Some good quality courses on sports analytics can be found below in the specialization series. As expected, there is some overlap with the content of the courses mentioned in the previous part.
- Sports Analytics Specialization – Intermediate
- Offered by the University of Michigan, this specialization uses Python
- The total effort is approximately 6 hours a week for 7 months, or ~180 hours in total.
Check out the below tutorials. They’re fairly simple to understand and the code can be applied easily to other datasets.
- How to make NBA shots charts in R
- How to Calculate Expected Goals (xG): A Simple Tutorial Using R
- An Introduction to Modelling Soccer Matches in R (part 1)
- Brief introduction to the new version of the eurolig package for analyzing play-by-play and shot location data from the Euroleague
- Hybrid Machine Learning Forecasts for the 2019 FIFA Women’s World Cup
- NBA Shot Charts Part 1: Getting the Data (Python)
- Predicting football player positions using k-Nearest Neighbors
- Introduction to K-Means with Python – Clustering Shot Creators in the Premier League
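To show how little code a first model can take, below is a rough Python sketch of one common simplification for soccer scores: treating each team's goals as an independent Poisson variable. This is not taken from any of the tutorials above, the function names are hypothetical, and the scoring rates are illustrative numbers rather than estimates from real data.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """Home win, draw, and away win probabilities.

    Assumes the two teams score independently, and ignores scorelines
    above max_goals (their probability is tiny for typical rates).
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Illustrative expected goals per match, not estimated from real data.
home_win, draw, away_win = match_probabilities(1.6, 1.1)
print(f"home {home_win:.3f}  draw {draw:.3f}  away {away_win:.3f}")
```

In practice the two rates would be estimated from historical results, and the independence assumption is something you would want to check against your own data.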
6. Become a Storyteller
Yuval Noah Harari wrote in his book Sapiens: A Brief History of Humankind that one of the reasons we as Homo Sapiens are the dominant species of humans is our ability to communicate. Use this ancient skill. As with all aspects of social life, whether in a personal or a business environment, you need to communicate in a clear and engaging way.
To quote a Harvard Business Review article,
“Telling stories is one of the most powerful means that leaders have to influence, teach, and inspire. What makes storytelling so effective for learning? For starters, storytelling forges connections among people, and between people and ideas. Stories convey the culture, history, and values that unite people. When it comes to our countries, our communities, and our families, we understand intuitively that the stories we hold in common are an important part of the ties that bind.” – Vanessa Boris
Below are links to some good courses and videos on presentation skills and storytelling.
For access to the Skillshare courses, you can sign up here for a full month of Premium for free. Skillshare is a Netflix-like online learning community with a lot of great short videos for a whole bunch of subjects. I’m using Skillshare to learn how to take better pictures of my 4-month-old baby girl 🙂
- The Art of Storytelling in Communication and Presentation: Stories that Inspire Actions
- 85 minutes of the science behind storytelling
- Introduction to Data Visualization: From Data to Design
- In 90 minutes, you’ll learn how to build a data set, transform information into graphics, and unlock the storytelling power of data.
- Storytelling for Leaders: How to Craft Stories That Matter
- 20 minutes that teach you how to craft a compelling story that matters to your audience.
- Data Visualization & Dashboarding with R Specialization – Beginners
- This Coursera Specialization, offered by Johns Hopkins University, is simply amazing!
- I recommend going through at least the first two courses, preferably the 3rd as well.
- The total effort is approximately 5 hours a week for 4 months, or ~85 hours in total.
- Python Data Visualization – Beginners
- Offered by Rice University, this is the last course in the Introduction to Python Scripting Specialization.
- The overall effort is around 9 hours for this course.
7. Analyze Everything in Your Path
Tired of hearing people compare LeBron to the GOAT Michael Jordan? Are your friends criticizing you for believing that Giroud is a good striker? Do you want to top your fantasy league?
Hic Rhodus, hic salta. Just do it. Walk the talk.
Get your hands dirty. Get used to different datasets. Use different methods, try out different algorithms. I really don’t think there’s any better way to go about it than learning by doing.
8. Build your Personal Brand – Showcase Your Analyses
Did you create something cool? Did you make an interesting finding? Does your analysis look good?
Show the world!
Starting a social media page to display your findings is simple. Start talking on Twitter, start posting graphs on Instagram. Start a blog, post your analyses on LinkedIn or Facebook. Make a video explaining your analysis results on YouTube or TikTok. Build your personal brand using any combination of media you like.
Become the person your friends, classmates, and colleagues would go to for anything related to sports and analytics.
If you need an extra platform, reach out to us and we will happily consider posting your analysis on our blog.
Make sure your LinkedIn profile is updated. Try reaching out to people working in roles that you aspire to pursue later in your career. Connect with them and ask them a question or two about how they made it.
Become active on Twitter. Follow the experts and good content creators and comment on their posts. Make a bold statement and even message
Keep an eye out for conferences and meet-ups in your area. Join groups on Facebook or Reddit, ask questions, meet people.
Share your ideas and ask others for their opinions. Listen to others and question what you already know.
Whatever your goal is, breathe, think, and do sports analytics. Listen to a relevant podcast when you’re on the bus or the train. Watch a YouTube video as you’re waiting for the pizza to be delivered. Challenge what the sportscasters and radio presenters are saying. Think about what data you would need and what analysis you should do to prove – or disprove – your point.
Spend time every day doing some sort of analysis.
The Magic Recipe…
…does not exist. Use your own judgment. Find what you enjoy and follow it. One course might not suit you, so try the next.
Do make sure you have a good enough grasp of the four pillars I mentioned at the start of the article. Trust your natural tendency – if you like the communication part, focus on that. If you get excited about statistical analyses, do your thing.
At the end of the day, having followed the 10 steps above, you’ll have gained skills and tools that can be applied not only to sports but also to any other field of analytics.