CSL7630: Algorithms for Big Data

Offered for Spring 2022

  • Credits L-T-P [C]: 3-0-0 [3]
  • Where: Google Meet (Link available in Moodle)
  • LMS: https://classroom.sumankundu.info/moodle/
  • Slot: R
    • Thursday 6:00 PM to 7:30 PM
    • Saturday 4:00 PM to 5:00 PM
  • Please white list the email
    • noreply[at]classroom[dot]sumankundu[dot]info

Syllabus

  • Introduction: Randomized algorithms, Universal Hash Family, Probabilistic Algorithm Analysis, Approximation Algorithms, $\epsilon$-Approximation Schemes, Sublinear time complexity, Sublinear Algorithms.
  • Property Testing: Testing list’s sortedness or monotonicity, Distribution testing, Testing properties of bounded degree graphs, Dense graphs and General graphs.
  • Sketching and Streaming: Extremely Small-Space Data Structures, CountMin Sketch, Count Sketch, Turnstile Streaming, AMS Sketch, Graph Sketching, Graph Connectivity
  • Map-Reduce: Map-Reduce Algorithms in Constrains Settings such as small memory, few machines, few rounds, and small total work, Efficient Parallel Algorithms
  • External memory and cache-obliviousness: Minimizing I/O for large datasets, Algorithms and data structures such as B-trees, Buffer trees, Multiway merge sort

Learning Materials

  1. Scribing of Lectures Algorithms for Big Data By Prof. Jelani Nelson, Harvard University

  2. Algorithmic Techniques for Big Data Analysis By Prof. Barna Saha, University of Minnesota Twin Cities

  3. https://www.sketchingbigdata.org/

  4. Introduction to Property Testing by Oded Goldreich

Grading Policy

  • Quiz: 12% (https://www.classmarker.com/)
  • Assignments: 10%
  • Review Project:
    • Research Work: 12%
    • Peer Review: 8%
    • Presentation: 8% (dates to be announced)
    • Report: 10%
  • Major: 40%

Quiz

  • There will be 3 to 4 Quizzes
  • One lowest will be removed from the grade
  • Dates and Syllabus will be announced in Moodle

Assignments:

  • There will be 3 to 4 programming assignments
  • Best 2 will be considered for grading
  • Programming or Analytical Assignment

Review Project

  • Goal is to write a review on a specific topic of “Algorithms for Big Data”
  • The review report should be between 3000-4000 words.
  • Each of you need to read at least 3 papers on the topic.
  • Review should be submitted with Single Column, double spacing pdf generated with LaTeX (Template will be shared through Moodle.)

Important Dates:

  • January 15, 2022: Topics Will be Floated
  • January 20, 2022: Topic Allocation
  • February 28, 2022: Mid-term Progress Report (3 min presentation)
  • March 31, 2022: Report Submission
  • April 7, 2022: Peer-review Allocation
  • April 16, 2022: Peer-review Submission
  • April 22, 2022: Final Report Submission Incorporating Review

Plagiarism tolerance is 7% from single source and 15% cumulative, anything more will reduce your marks as follows:

  • Any logical/conceptual/formulation plagiarism: zero marks
  • Other form of plagiarism (above 50%): zero marks
  • Otherwise: Percentage of plagiarism will be deducted from the obtained mark