Mining Massive Datasets

Enrollment is Closed


This is an archived course. It ended in 2016, and enrollment is no longer possible. If a Statement of Accomplishment was made available in this course to enrolled learners who earned a passing score before the course ended, it will be available for download until March 31, 2020.

About This Course

The course is based on the text Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, who by coincidence are also the instructors for the course. The book is published by Cambridge University Press, but by arrangement with the publisher, you can download a free copy Here. The material in this online course closely matches the content of the Stanford course CS246.

The major topics covered include: MapReduce systems and algorithms, Locality-sensitive hashing, Algorithms for data streams, PageRank and Web-link analysis, Frequent itemset analysis, Clustering, Computational advertising, Recommendation systems, Social-network graphs, Dimensionality reduction, and Machine-learning algorithms.


The course is intended for graduate students and advanced undergraduates in Computer Science. At a minimum, you should have had courses in Data structures, Algorithms, Database systems, Linear algebra, Multivariable calculus, and Statistics.

Course Staff

Jure Leskovec

Jure is an associate professor of computer science at Stanford. His research area is mining of large social and information networks. He is the author of the Stanford Network Analysis Platform, a general-purpose network analysis and graph mining library. For more information, see his Home Page.

Anand Rajaraman

Anand is a serial entrepreneur, venture capitalist, and academic, based in Silicon Valley. He founded two successful startups, Junglee (acquired by Amazon) and Kosmix (acquired by Walmart). At Amazon, he was co-inventor of Mechanical Turk. Currently, he is a founding partner of Milliways Labs, an early-stage venture-capital firm. For more information, see his Blog, called "Datawocky".

Jeff Ullman

Jeffrey D. Ullman

Jeff Ullman is a retired professor of Computer Science at Stanford. For more information, see his Home Page.

Frequently Asked Questions

Do I need to buy a textbook?

No. The course follows the text Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman. It is published by Cambridge University Press, but by permission of the publishers, you can download a free copy Here.

How much work is expected?

The amount of work will vary, depending on your background and the ease with which you follow mathematical and algorithmic ideas. However, 10 hours per week is a good guess.

Will statements of accomplishment be offered?

Yes. To earn a Statement of Accomplishment (SoA), you need at least 50% of the total marks, half of which come from homework and half from the final. An SoA with Distinction requires 80% of the marks.
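As a rough illustration of the arithmetic, the sketch below combines a homework score and a final score into a total and checks it against the two thresholds. It assumes that both scores are percentages from 0 to 100, that each counts for exactly half of the total, and that the thresholds are inclusive. The function name `soa_status` is hypothetical and is not part of the course platform.

```python
def soa_status(homework_pct, final_pct):
    """Return the statement earned for homework and final scores (each 0-100)."""
    # Assumption: homework and the final each contribute half of the total marks.
    total = 0.5 * homework_pct + 0.5 * final_pct
    # Assumption: a total exactly at a threshold qualifies.
    if total >= 80:
        return "SoA with Distinction"
    if total >= 50:
        return "SoA"
    return "No SoA"


if __name__ == "__main__":
    # 70% on homework and 40% on the final give a total of 55%.
    print(soa_status(70, 40))
```

Under these assumptions, a strong homework score can offset a weaker final, and vice versa, since only the combined total is compared against the thresholds.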
