-->
Course Title: Information Retrieval and Search Engine
Course no: CSC-414 Full Marks: 70+10+20
Credit hours: 3 Pass Marks: 28+4+8
Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.)
Course Synopsis: Advanced aspects of Information Retrieval and Search Engine
Goals: To study advanced aspects of information retrieval and search engines, encompassing the principles, research results, and commercial applications of current technologies.
Course Detail:
Unit 1. Introduction: 2 Hrs.
Goals and history of IR. The impact of the web on IR. The role of artificial intelligence (AI) in IR.
Unit 2. Basic IR Models: 4 Hrs.
Boolean and vector-space retrieval models; ranked retrieval; text-similarity metrics; TF-IDF (term frequency/inverse document frequency) weighting; cosine similarity.
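As an illustration of the TF-IDF weighting and cosine similarity named in this unit, here is a minimal Java sketch. It is not part of the official course material. The class name `TfIdfSketch`, its method names, and the choice of idf = ln(N / df) with raw term counts are all assumptions made for illustration; textbooks use several variants of the weighting.

```java
import java.util.*;

public class TfIdfSketch {
    // Raw term frequency: how often each term occurs in one tokenized document.
    static Map<String, Integer> termFrequency(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Sparse TF-IDF vector. Assumed variant: weight = tf * ln(N / df).
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequency(doc).entrySet()) {
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            if (df == 0) continue; // term never seen in the corpus: skip it
            weights.put(e.getKey(), e.getValue() * Math.log((double) corpus.size() / df));
        }
        return weights;
    }

    // Cosine of the angle between two sparse vectors; 0.0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
            List.of("web", "search", "engine"),
            List.of("web", "retrieval", "model"),
            List.of("cooking", "recipe"));
        Map<String, Double> query = tfIdf(List.of("search", "engine"), corpus);
        // Rank documents by similarity to the query (higher means more similar).
        for (List<String> doc : corpus) {
            System.out.println(doc + " -> " + cosine(query, tfIdf(doc, corpus)));
        }
    }
}
```

Storing each vector as a map from term to weight keeps only the non-zero entries, which is the sparse representation that Unit 3 refers to.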
Unit 3. Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval: 4 Hrs.
Simple tokenizing, stop-word removal, and stemming; inverted indices; efficient processing with sparse vectors; Java implementation.
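The tokenizing, stop-word removal, and inverted-index steps listed for this unit can be sketched in a few lines of Java. This is only an illustrative sketch under stated assumptions: the class `InvertedIndexSketch`, the tiny stop-word list, and the regular-expression tokenizer are hypothetical, and stemming is left out for brevity.

```java
import java.util.*;

public class InvertedIndexSketch {
    // A deliberately tiny, assumed stop-word list; real systems use longer ones.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "and", "in");

    // Lowercase the text, split on non-alphanumeric characters, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    // Map each term to the sorted set of document ids (list positions) containing it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String t : tokenize(docs.get(id))) {
                index.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Boolean AND query: documents that contain both terms.
    static SortedSet<Integer> and(Map<String, SortedSet<Integer>> index, String t1, String t2) {
        SortedSet<Integer> result = new TreeSet<>(index.getOrDefault(t1, new TreeSet<>()));
        result.retainAll(index.getOrDefault(t2, new TreeSet<>()));
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "The history of information retrieval",
            "Web search and retrieval",
            "Cooking in the kitchen");
        Map<String, SortedSet<Integer>> index = buildIndex(docs);
        System.out.println(index.get("retrieval"));
        System.out.println(and(index, "web", "retrieval"));
    }
}
```

Because each posting list is kept sorted, a Boolean AND reduces to a set intersection, and only documents that share at least one query term ever need to be scored.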
Unit 4. Experimental Evaluation of IR: 4 Hrs.
Performance metrics: recall, precision, and F-measure; Evaluations on benchmark text collections.
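For reference, the three metrics named in this unit can be computed from a set of retrieved document ids and a set of relevant document ids. The following Java sketch is illustrative only. The class `EvalSketch` and its method names are hypothetical, and the F-measure shown is the balanced F1 (the harmonic mean of precision and recall), which is one common choice.

```java
import java.util.*;

public class EvalSketch {
    // Number of retrieved documents that are actually relevant.
    static int hits(Set<String> retrieved, Set<String> relevant) {
        Set<String> common = new HashSet<>(retrieved);
        common.retainAll(relevant);
        return common.size();
    }

    // Precision: fraction of retrieved documents that are relevant.
    static double precision(Set<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / retrieved.size();
    }

    // Recall: fraction of relevant documents that were retrieved.
    static double recall(Set<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / relevant.size();
    }

    // Balanced F-measure (F1): harmonic mean of precision and recall.
    static double fMeasure(double p, double r) {
        return (p + r == 0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> retrieved = Set.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3", "d5");
        double p = precision(retrieved, relevant); // 2 of 4 retrieved are relevant: 0.5
        double r = recall(retrieved, relevant);    // 2 of 3 relevant were retrieved: about 0.667
        System.out.printf("P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}
```

In the example, two of the four retrieved documents are relevant and two of the three relevant documents are retrieved, so the F-measure works out to 4/7, roughly 0.571.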
Unit 5. Query Operations and Languages: 3 Hrs.
Relevance feedback; Query expansion; Query languages.
Unit 6. Text Representation: 5 Hrs.
Word statistics; Zipf's law; Porter stemmer; morphology; index term selection; using thesauri. Metadata and markup languages (SGML, HTML, XML).
Unit 7. Search Engine: 5 Hrs.
Search engines; spidering; metacrawlers; directed spidering; link analysis (e.g. hubs and authorities, Google PageRank); shopping agents.
Unit 8. Text Categorization and Clustering: 7 Hrs.
Categorization algorithms: Rocchio, naive Bayes, decision trees, and nearest neighbor. Clustering algorithms: agglomerative clustering, k-means, and expectation maximization (EM). Applications to information filtering, organization, and relevance feedback.
Unit 9. Recommender Systems: 3 Hrs.
Collaborative filtering and content-based recommendation of documents and products.
Unit 10. Information Extraction and Integration: 3 Hrs.
Extracting data from text; XML; semantic web; collecting and integrating specialized information on the web.
Unit 11. Advanced IR Models: 3 Hrs.
Probabilistic models; Generalized Vector Space Model; Latent Semantic Indexing (LSI).
Unit 12. Advanced Indexing and Searching Text: 5 Hrs.
Efficient string searching and pattern matching.
Laboratory works: Design and development of search engine.
Text Books:
- Modern Information Retrieval, Ricardo Baeza-Yates, Berthier Ribeiro-Neto.
- Information Retrieval: Data Structures & Algorithms, William B. Frakes, Ricardo Baeza-Yates.
Homework Assignment: Assignments should be given throughout the semester.
Computer Usage: No specific requirement
Prerequisite: Server-side programming language
Category Content: Science Aspect: 25%
Design Aspect: 75%