Welcome to this course
This is an equal opportunity course that offers you a supportive and inclusive space to learn. Everyone, no matter their age, gender, race, or nationality, can be successful in this course. People like you are joining from all over the world and we value this diversity. We hope you enjoy learning about topics that are important to you.
StatLearning now self-paced!
The active course run for Statistical Learning has ended, but the course is now available in a self-paced mode. You are welcome to join the course and work through the material and exercises at your own pace. When you have completed the exercises with a score of 50% or higher, you can generate your Statement of Accomplishment from within the course.
The course will remain available for an extended period of time. We anticipate the content will be available until at least December 31, 2020. You will be notified by email of any changes to content availability beforehand.
About This Course
This is an introductory-level course in supervised learning, with a
focus on regression and classification methods. The syllabus
includes: linear and polynomial regression, logistic regression and
linear discriminant analysis; cross-validation and the bootstrap,
model selection and regularization methods (ridge and lasso);
nonlinear models, splines and generalized additive models; tree-based
methods, random forests and boosting; support-vector machines. Some
unsupervised learning methods are discussed: principal components and
clustering (k-means and hierarchical).
This is not a math-heavy class, so we try to describe the methods
without heavy reliance on formulas and complex mathematics. We focus
on what we consider to be the important elements of modern data
analysis. Computing is done in R. There are lectures devoted to R,
giving tutorials from the ground up, and progressing with more
detailed sessions that implement the techniques in each chapter.
The lectures cover all the material in
An Introduction to Statistical
Learning, with Applications in R by James, Witten, Hastie and
Tibshirani (Springer, 2013). The PDF of this book is available for free on the book website.
Prerequisites
First courses in statistics, linear algebra, and computing.
Course Staff
Trevor Hastie
Trevor Hastie is the John A. Overdeck Professor of Statistics at
Stanford University. Hastie is known for his research in applied
statistics, particularly in the fields of data mining, bioinformatics
and machine learning. He has published four books and over 180
research articles in these areas. Prior to joining Stanford
University in 1994, Hastie worked at AT&T Bell Laboratories for 9
years, where he helped develop the statistical modeling environment
popular in the R computing system. He received his B.S. in statistics
from Rhodes University in 1976, his M.S. from the University of Cape
Town in 1979, and his Ph.D. from Stanford in 1984. Professor Hastie is
an elected fellow of the Institute of Mathematical Statistics, the American Statistical Association, the International Statistical Institute, the South African Statistical Association and the Royal Statistical Society. He has received a number of awards and honors, including the Myrto Lefkopoulou award from Harvard in 1994, the Parzen Prize for Innovation in 2014, and the Distinguished Rhodes University Alumni award in 2015, and was elected to the National Academy of Sciences in 2018.
Rob Tibshirani
Robert Tibshirani is a Professor in the Departments of
Health Research and Policy and Statistics at Stanford University.
In his work he has made important contributions to the analysis of
complex datasets, most recently in genomics and proteomics. His
best-known contribution is the Lasso, which uses L1 penalization
in regression and related problems.
He has co-authored over 200 papers and three books.
Professor Tibshirani co-authored the first study that linked cell phone
usage with car accidents, a widely cited article that has played a role
in the introduction of legislation that restricts the use of phones
while driving. He is one of the most widely cited authors in the entire
mathematical sciences field.
Professor Tibshirani is a Fellow of the American Statistical Association,
the Institute of Mathematical Statistics and the Royal Society of Canada.
He won the
prestigious COPSS Presidents' Award in 1996
and the NSERC Steacie award in 1997,
and was elected to the National Academy of Sciences in 2012.
Course Production Team
Will Fithian and Sam Gross produced and formatted the quiz questions and review questions. Daniela Witten helped present some of the material in Chapter 5. Wes Choy managed the video production. Greg Maximov filmed and edited most of the course videos, as well as the interviews and group recordings. Greg Bruhns, Monica Diaz and Marc Sanders assisted with Open edX.
Frequently Asked Questions
Do I need to buy a textbook?
No. A free online version of An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and
Tibshirani (Springer, 2013) is available from the book website. Springer has agreed to this, so there is no need to worry about copyright. Of course, you may not distribute printed versions of this PDF file.
Are R and RStudio available for free?
Yes. You can get R for free from
http://cran.us.r-project.org/. Typically it installs with a click. You can get RStudio from http://www.rstudio.com/, also for free, with a similarly easy install.
How many hours of effort are expected per week?
We anticipate it will take approximately 3-5 hours per week to go through the materials and exercises in each section.
Will I receive a statement of accomplishment?
Yes. If you complete the course and achieve a passing grade of at least 50% on the quizzes, you can generate a Statement of Accomplishment from within the course. If you get 90% or higher, your statement will be "with distinction".