Mining Massive Datasets

This course ends at 12 noon Pacific Time on March 26, 2020. If a Statement of Accomplishment is available for this course, it will be available for download until March 31, 2020.

About This Course

Welcome to the self-paced version of Mining of Massive Datasets!

The course is based on the text Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, who by coincidence are also the instructors for the course. The book is published by Cambridge University Press, but by arrangement with the publisher, you can download a free copy Here. The material in this online course closely matches the content of the Stanford course CS246.

The major topics covered include: MapReduce systems and algorithms, Locality-sensitive hashing, Algorithms for data streams, PageRank and Web-link analysis, Frequent itemset analysis, Clustering, Computational advertising, Recommendation systems, Social-network graphs, Dimensionality reduction, and Machine-learning algorithms.


The course is intended for graduate students and advanced undergraduates in Computer Science. At a minimum, you should have had courses in Data structures, Algorithms, Database systems, Linear algebra, Multivariable calculus, and Statistics.

Course Staff

Jure Leskovec

Jure is an associate professor of computer science at Stanford. His research area is mining of large social and information networks. He is the author of the Stanford Network Analysis Platform, a general-purpose network analysis and graph mining library. For more information, see his Home Page.

Anand Rajaraman

Anand is a serial entrepreneur, venture capitalist, and academic, based in Silicon Valley. He founded two successful startups, Junglee (acquired by Amazon) and Kosmix (acquired by Walmart). At Amazon, he was co-inventor of Mechanical Turk. Currently, he is a founding partner of Milliways Labs, an early-stage venture-capital firm. For more information, see his Blog, called "Datawocky".

Jeff Ullman

Jeffrey D. Ullman

Jeff Ullman is a retired professor of Computer Science at Stanford. For more information, see his Home Page.

Frequently Asked Questions

Do I need to buy a textbook?

No. The course follows the text Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman. It is published by Cambridge University Press, but by permission of the publishers, you can download a free copy Here.

How much work is expected?

The amount of work will vary, depending on your background and the ease with which you follow mathematical and algorithmic ideas. However, 10 hours per week is a good guess.

Will statements of accomplishment be offered?

Yes. To earn a Statement of Accomplishment (SoA), you need at least 50% of the total marks; half of the marks come from homework and half from the final. An SoA with Distinction requires at least 80% of the marks.

Estimated Effort: 5 hours per week