Big Data Analytics

CMPS 499-003 / CSCE 598-003
Tuesday and Thursday - 3:30 pm to 4:45 pm
OLVR 116
Spring 2018

Instructor: Dr. Vijay Raghavan
Office: OLVR 305
Office Hours: Tue: 1:30 pm - 3:00 pm
Wed: 4:00 pm - 5:30 pm
Phone: (337) 482-6603
Grader: TBA
Office Hours:  



Click here to check the class roster

Please check the roster and let me (the TA) know if your name is not on it.


Prerequisites: Math 362 and CMPS 460(G), or permission of the instructor.


Essentials of Big Data analytics. Topics include: challenges and opportunities posed by Big Data in a variety of domains; predictive analytics and other advanced methods to extract value from data; innovative statistical techniques to glean insights from data; frameworks for parallelizing data pre-processing and data analytics, such as Hadoop and Spark; and distributed algorithms to accelerate knowledge discovery.

Policies on Cheating

Cheating: Cheating of any kind will NOT be tolerated. All work you submit must be entirely your own. Any student found cheating on an assignment (programming or non-programming) will receive a 0 for that assignment. This applies both to the person who shares their work and to the person who copies it. Any student found cheating on a test will receive a grade of 'C' or 'F' and, in some cases, will also be reported to the Dean. Again, this applies both to the person who shares their work and to the person who copies it.



Grading Policy

*Typically, a term project involves the design and implementation of analytic applications using Big Data tools or platforms, or of user interfaces or other system components for Big Data Search (Reduction) systems.

Class Notes

Index Lecture   PDF PPT
1 Introduction   [pdf] [ppt]
2 Introduction II   [pdf]
3 Visual Analytics Sandbox   [pdf] [ppt]
4 Hadoop Basics   [pdf] [ppt]
5 Introduction to Spark   [pdf] [ppt]
6 Data Preprocessing in Spark   [pdf] [ppt]
7 Big Data Project Failures   [pdf] [ppt]
8 Matrix-Vector Multiplication by MapReduce   [pdf] [ppt]
9 Types of Data   [pdf]
10 Clustering Tips and Tricks   [pdf]
11 Sentiment Analysis with PySpark   [pdf] [ppt]


  1. Assignment 1
  2. Assignment 2


  1. All non-programming assignments should be written legibly (please see Policies on Cheating).
  2. Before submission, make a photocopy of the assignment for your own reference.
  3. Only the original should be submitted.
  4. Retain the photocopy. DO NOT submit it.
  5. Please staple the question paper on top of the answer sheet.
  6. Answer sheets that are not stapled properly will not be graded.
  7. All assignments should be done individually unless otherwise stated.
  8. Academic dishonesty will be prosecuted in accordance with the rules and regulations specified by the university.
  9. All answer sheets should be numbered.
  10. Begin the answer to each question on a separate page.
  11. Please provide an index, stating each question number and the corresponding page number where its answer can be found.

Final Project

The class project proposal and the final report should include the following details:

Last updated: January 27, 2018