CSL7630: Algorithms for Big Data
Offered for Spring 2022
- Credits L-T-P [C]: 3-0-0 [3]
- Where: Google Meet (Link available in Moodle)
- LMS: https://classroom.sumankundu.info/moodle/
- Slot: R
- Thursday 6:00 PM to 7:30 PM
- Saturday 4:00 PM to 5:00 PM
- Please white list the email
- noreply[at]classroom[dot]sumankundu[dot]info
Syllabus
- Introduction: Randomized algorithms, Universal Hash Family, Probabilistic Algorithm Analysis, Approximation Algorithms, $\epsilon$-Approximation Schemes, Sublinear time complexity, Sublinear Algorithms.
- Property Testing: Testing list’s sortedness or monotonicity, Distribution testing, Testing properties of bounded degree graphs, Dense graphs and General graphs.
- Sketching and Streaming: Extremely Small-Space Data Structures, CountMin Sketch, Count Sketch, Turnstile Streaming, AMS Sketch, Graph Sketching, Graph Connectivity
- Map-Reduce: Map-Reduce Algorithms in Constrains Settings such as small memory, few machines, few rounds, and small total work, Efficient Parallel Algorithms
- External memory and cache-obliviousness: Minimizing I/O for large datasets, Algorithms and data structures such as B-trees, Buffer trees, Multiway merge sort
Learning Materials
-
Scribing of Lectures Algorithms for Big Data By Prof. Jelani Nelson, Harvard University
-
Algorithmic Techniques for Big Data Analysis By Prof. Barna Saha, University of Minnesota Twin Cities
-
Introduction to Property Testing by Oded Goldreich
Grading Policy
- Quiz: 12% (https://www.classmarker.com/)
- Assignments: 10%
- Review Project:
- Research Work: 12%
- Peer Review: 8%
- Presentation: 8% (dates to be announced)
- Report: 10%
- Major: 40%
Quiz
- There will be 3 to 4 Quizzes
- One lowest will be removed from the grade
- Dates and Syllabus will be announced in Moodle
Assignments:
- There will be 3 to 4 programming assignments
- Best 2 will be considered for grading
- Programming or Analytical Assignment
Review Project
- Goal is to write a review on a specific topic of “Algorithms for Big Data”
- The review report should be between 3000-4000 words.
- Each of you need to read at least 3 papers on the topic.
- Review should be submitted with Single Column, double spacing pdf generated with LaTeX (Template will be shared through Moodle.)
Important Dates:
- January 15, 2022: Topics Will be Floated
- January 20, 2022: Topic Allocation
- February 28, 2022: Mid-term Progress Report (3 min presentation)
- March 31, 2022: Report Submission
- April 7, 2022: Peer-review Allocation
- April 16, 2022: Peer-review Submission
- April 22, 2022: Final Report Submission Incorporating Review
Plagiarism tolerance is 7% from single source and 15% cumulative, anything more will reduce your marks as follows:
- Any logical/conceptual/formulation plagiarism: zero marks
- Other form of plagiarism (above 50%): zero marks
- Otherwise: Percentage of plagiarism will be deducted from the obtained mark