CSL7030: Algorithms for Big Data (winter 2020)

Syllabus

  • Introduction: Randomized algorithms, Universal Hash Family, Probabilistic Algorithm Analysis, Approximation Algorithms, $\epsilon$-Approximation Schemes.
  • Sketching and Streaming: Extremely Small-Space Data Structures, CountMin Sketch, Count Sketch, Turnstile Streaming, AMS Sketch, Graph Sketching, Graph Connectivity
  • Map-Reduce: Map-Reduce Algorithms in Constrains Settings such as small memory, few machines, few rounds, and small total work, Efficient Parallel Algorithms
  • External memory and cache-obliviousness: Minimizing I/O for large datasets, Algorithms and data structures such as B-trees, Buffer trees, Multiway merge sort

Learning Materials

  1. Scribing of Lectures Algorithms for Big Data By Prof. Jelani Nelson, Harvard University

  2. Algorithmic Techniques for Big Data Analysis By Prof. Barna Saha, University of Minnesota Twin Cities

  3. https://www.sketchingbigdata.org/

Grading Policy

Quiz

  • There will be 3 to 4 Quizzes
  • Dates and Syllabus will be announced in Classroom

Review Project

  • Goal is to write a review on a specific topic of “Algorithms for Big Data”
  • The review should be comprehensive between 3000-4000 words. Do not write review only on one/two papers.
  • Review should be submitted with Single Column, double spacing pdf generated with LaTeX (use any standard template e.g., ACM, IEEE, Elsevier etc.)
You may work in a group of 2/3 person. Expectation of larger group will be higher.

Important Dates:

  • December 10, 2020: Groups Identified
  • December 25, 2020: Abstract Submission
  • January 20, 2021: Submission
  • January 30, 2021: First Report
  • February 5, 2021: Final Submission
Plagiarism tolerance is 7% from single source and 15% cumulative, anything more leads to lower or no marks at all.

Presentation + Viva

  • Between Jan 18-30, 2021
  • Exact dates will be announced latter