CSL7630: Algorithms for Big Data
Offered for Spring 2022
Credits L-T-P [C]: 3-0-0 [3]
Where: Google Meet (Link available in Moodle)
LMS: https://classroom.sumankundu.info/moodle/
Slot: R
Thursday 6:00 PM to 7:30 PM
Saturday 4:00 PM to 5:00 PM
Please white list the email
noreply[at]classroom[dot]sumankundu[dot]info
Syllabus
Introduction: Randomized algorithms, Universal Hash Family, Probabilistic Algorithm Analysis, Approximation Algorithms, $\epsilon$-Approximation Schemes, Sublinear time complexity, Sublinear Algorithms.
Property Testing: Testing list’s sortedness or monotonicity, Distribution testing, Testing properties of bounded degree graphs, Dense graphs and General graphs.
Sketching and Streaming: Extremely Small-Space Data Structures, CountMin Sketch, Count Sketch, Turnstile Streaming, AMS Sketch, Graph Sketching, Graph Connectivity
Map-Reduce: Map-Reduce Algorithms in Constrains Settings such as small memory, few machines, few rounds, and small total work, Efficient Parallel Algorithms
External memory and cache-obliviousness: Minimizing I/O for large datasets, Algorithms and data structures such as B-trees, Buffer trees, Multiway merge sort
Learning Materials
Scribing of Lectures Algorithms for Big Data By Prof. Jelani Nelson, Harvard University
Algorithmic Techniques for Big Data Analysis By Prof. Barna Saha, University of Minnesota Twin Cities
https://www.sketchingbigdata.org/
Introduction to Property Testing by Oded Goldreich
Grading Policy
Quiz: 12% (https://www.classmarker.com/ )
Assignments: 10%
Review Project:
Research Work: 12%
Peer Review: 8%
Presentation: 8% (dates to be announced)
Report: 10%
Major: 40%
Quiz
There will be 3 to 4 Quizzes
One lowest will be removed from the grade
Dates and Syllabus will be announced in Moodle
Assignments:
There will be 3 to 4 programming assignments
Best 2 will be considered for grading
Programming or Analytical Assignment
Review Project
Goal is to write a review on a specific topic of “Algorithms for Big Data”
The review report should be between 3000-4000 words.
Each of you need to read at least 3 papers on the topic.
Review should be submitted with Single Column, double spacing pdf generated with LaTeX (Template will be shared through Moodle.)
Important Dates:
January 15, 2022: Topics Will be Floated
January 20, 2022: Topic Allocation
February 28, 2022: Mid-term Progress Report (3 min presentation)
March 31, 2022: Report Submission
April 7, 2022: Peer-review Allocation
April 16, 2022: Peer-review Submission
April 22, 2022: Final Report Submission Incorporating Review
Plagiarism tolerance is 7% from single source and 15% cumulative, anything more will reduce your marks as follows:
Any logical/conceptual/formulation plagiarism: zero marks
Other form of plagiarism (above 50%): zero marks
Otherwise: Percentage of plagiarism will be deducted from the obtained mark