Information Storage and Retrieval
CSCE 561
Monday and Wednesday - 2:30 pm to 3:45 pm
OLVR 118
Fall 2020
Instructor: | Dr. Vijay Raghavan |
Office: | OLVR 305 |
Office Hours: | Tue: 3:00 pm - 4:30 pm; Wed: 4:00 pm - 5:30 pm |
Phone: | (337) 482-6603 |
Email: | raghavan@louisiana.edu |
Grader: | Titli Sarkar |
Office: | LINC Lab |
Office Hours: | Appointment Only |
Phone: | (337) 522-8307 |
Email: | titli010203@gmail.com C00222141@louisiana.edu |
Page Content
- Roster
- Prerequisites
- Course Outline
- Policies on Cheating
- References
- Grading Policy
- Class Notes
- Assignments
- Final Project
- Useful Links
- Popular Information Retrieval Systems
- Sample Exam Papers
Roster
Click here to check the class roster. Please check it and let me (the TA) know if your name is not on the roster.
Prerequisites
CMPS 460 or consent of the instructor.
Some background knowledge on WWW protocols for database access from web browsers is assumed.
Course Outline
Modern retrieval systems that operate on text databases can provide interactive, user-customizable techniques for retrieval. In contrast, many tools available for accessing text databases on the Web use techniques that are quite primitive. Thus, there is a need to make state-of-the-art search algorithms available over the Internet. In this context, we will explore intelligent information retrieval techniques and protocols associated with the implementation of Web-browser based interfaces to document database servers. It is also important to extend the search algorithms to heterogeneous (multimedia) data. This aspect requires the development of appropriate indexing schemes in order that the search algorithms applicable to text databases can be extended to other types (e.g., pictures, video, sound) of data. The course will consider research issues in this context. We will also look at some aspects of database and text mining.
Policies on Cheating
Cheating: Cheating of any kind will NOT be tolerated. All work you submit must be entirely your own. Any student found cheating on an assignment (programming or non-programming) will be given a 0 for that assignment. This applies both to the person who shares their work and to the person who copies it. Any student found cheating on a test will be given a grade of 'C' or 'F', and in some cases the matter will also be brought to the attention of the Dean; again, this applies both to the person who shares their work and to the person who copies it.
References
- Salton, "Automatic Text Processing", Addison-Wesley, 1989.
- Salton and McGill, "Introduction to Modern Information Retrieval", McGraw Hill, 1983
- C. J. van Rijsbergen, "Information Retrieval", Second Edition, Butterworths, 1979. [Full_Text]
- R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", Textbook Paperback, May 1999, ISBN: 020139829X.
- Pawan Lingras, Rajendra Akerkar, "Building an Intelligent Web: Theory and Practice", Jones and Bartlett Learning, 2010: Chapter 2 [PDF]
- Alistair Moffat, Justin Zobel and David Hawking, "Recommended Reading for IR Research Students", SIGIR Forum 39, 2005 [PDF]
- Ed Greengrass, "Information Retrieval: A Survey", 30 November 2000 [PDF]
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008. [Link]
Note: The books in bold are available for overnight use from the Reserve Section of the Dupré Library.
Note: Many relevant materials are available on the Internet. Also, visit my webpage and click on "Some URLs of Interest to my Students" and any other links that interest you. [Link]
Grading Policy
- Term Project: 30-40%*
- Homework Assignments & Short Quizzes: 25-30%
- Term Test: 10-15%
- Final Exam: 20-30%
*Typically, a term project involves the design and implementation of search and indexing algorithms, interface requirements, or other information retrieval system components.
Class Notes
Index | Lecture | Link | Link |
1 | Introduction | [pdf] | |
2 | IR Models (part 1) - Boolean Retrieval | [pdf] | [pdf] |
3 | IR Models (part 2) | [pdf] | |
4 | Retrieval System Evaluation (part 1) | [pdf] | |
5 | Retrieval System Evaluation (part 2) | [pdf] | |
6 | RUBRIC (part 1) | [pdf] | |
7 | RUBRIC (part 2) | [ppt] | |
8 | Vector Space Model | [pdf] | |
9 | GVSM | [pdf] | |
10 | Learning (Part 1) | [pdf] | |
11 | Learning (Part 2) | [pdf] | |
12 | Notes for Probabilistic Retrieval Model | [pdf] | |
Assignments
- Coming soon...
Note:
- All non-programming assignments should be written legibly (Please check Policies on Cheating).
- Before submission, make a photocopy of the assignment for your own reference.
- Only the original should be submitted.
- Retain the photocopy. DO NOT submit it.
- Please staple the question paper on top of the answer sheet.
- Answer sheets that are not stapled properly will not be graded.
- All assignments should be done individually unless otherwise stated.
- Academic dishonesty will be prosecuted in accordance with the rules and regulations specified by the university.
- All answer sheets should be numbered.
- Please begin the answer to each question on a separate page.
- Please provide an index, stating each question number and the corresponding page number where its answer can be found.
Final Project
- Coming soon...
Useful Links
- Chapter 1 from Salton's Book [Scanned_PDF]
- What do you say after you say, 'I work in IR'? [PDF]
- A General Mathematical Model For Information Retrieval Systems [PDF]
- Manning, Raghavan, Schuetze [MRS], Chapter 2.1
- MRS: Chapter 1
- Boolean Retrieval Model [Link]
- Fuzzy Set Theory to Document Retrieval [Scanned_PDF]
- MRS: Chapter 2.2; 2.4
- A Critical Investigation of Recall and Precision as Measures of Retrieval System Performance [PDF]
- MRS: Chapter 8.1; 8.2
- Text Retrieval Quality: A Primer [Link]
- MRS: Chapter 8.3; 8.4
- RUBRIC: A Rule System for Information Retrieval [PDF]
- Enhancing Internet Search Engines to Achieve Concept-based Retrieval [PPT]
- Personalized Search [PDF]
- Concept Map for Beginners [Link]
- A Critical Analysis of Vector Space Model for Information Retrieval [Scanned_PDF]
- MRS: Chapter 6.2; 6.3
- On Modeling of Information Retrieval Concept in Vector Spaces [PDF]
- Pattern Recognition: Statistical, Structural, and Neural Approaches [Scanned_PDF]
- Linear Structure for Information Retrieval [Scanned_PDF]
- MRS: Chapter 9; 9.1.1
- MRS: Chapter 11.1; 11.2; 11.3; 11.3.2 (skip 11.3.1)
- Content and Link Structure Analysis for Searching the Web [PDF]
- The Shape of the Web and Its Implications for Searching the Web [PDF]
General IR resources
Introduction
IR Models (part 1) - Boolean Retrieval
IR Models (part 2)
Retrieval System Evaluation (part 1)
Retrieval System Evaluation (part 2)
RUBRIC (part 1)
RUBRIC (part 2)
Vector Space Model
GVSM
Learning (Part 1)
Learning (Part 2)
Probabilistic Retrieval Model
Web Search and Page Ranking
Metasearch engines and Deep Web Crawling
Popular Information Retrieval Systems
- MG: a public-domain indexing and retrieval system (source code) for text, images, and textual images, from the book "Managing Gigabytes"
- PRISE: an indexing and search engine developed by NIST
- Lemur: an information retrieval system supporting language modeling
- SMART: a well-known IR test bed (source code) by Prof. Salton
- Bow: a library of source code useful for writing information retrieval programs
Last updated: August 25, 2017