
Thursday, June 16, 2011

CSC-414: Information Retrieval and Search Engine


Course Title: Information Retrieval and Search Engine
Course no: CSC-414
Full Marks: 70+10+20
Credit hours: 3
Pass Marks: 28+4+8

Nature of course: Theory (3 Hrs.) + Lab (3 Hrs.)

Course Synopsis: Advanced aspects of Information Retrieval and Search Engine

Goals:   To study advanced aspects of information retrieval and search engines, encompassing the principles, research results, and commercial applications of current technologies.

Course Detail:

Unit 1. Introduction:                                                                                                                      2 Hrs.

Goals and history of IR. The impact of the web on IR. The role of artificial intelligence (AI) in IR.

Unit 2. Basic IR Models:                                                                                                                                4 Hrs.

Boolean and vector-space retrieval models; ranked retrieval; text-similarity metrics; TF-IDF (term frequency/inverse document frequency) weighting; cosine similarity.
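As an illustration of the TF-IDF weighting and cosine similarity named in this unit, here is a minimal sketch. It assumes one common variant (raw term frequency multiplied by log(N/df)); the function names `tf_idf_vectors` and `cosine` and the sample documents are hypothetical and not part of the syllabus.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Assumed weighting: raw term frequency * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection, already tokenized on whitespace.
docs = [
    "information retrieval and search".split(),
    "search engine design".split(),
    "cooking with fresh herbs".split(),
]
vecs = tf_idf_vectors(docs)
```

In this toy collection the first two documents share the term "search", so their similarity is positive, while the third shares no terms with the first and scores zero.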

Unit 3. Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval:                    4 Hrs.

Simple tokenizing, stop-word removal, and stemming; inverted indices; efficient processing with sparse vectors; Java implementation.

Unit 4. Experimental Evaluation of IR:                                                                                   4 Hrs.

Performance metrics: recall, precision, and F-measure; evaluation on benchmark text collections.
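The three metrics listed in this unit can be sketched for a single query as follows. This is a minimal example that assumes set-based (unranked) evaluation and the balanced F-measure (F1); the function name and the document IDs are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    F1        = harmonic mean of precision and recall
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical result list and relevance judgments.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Here two of the four retrieved documents are relevant (precision 1/2), and two of the three relevant documents were found (recall 2/3), giving F1 = 4/7.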

Unit 5. Query Operations and Languages:                                                                            3 Hrs.

Relevance feedback; query expansion; query languages.

Unit 6. Text Representation:                                                                                                      5 Hrs.

Word statistics; Zipf's law; Porter stemmer; morphology; index term selection; using thesauri. Metadata and markup languages (SGML, HTML, XML).

Unit 7. Search Engine:                                                                                                                   5 Hrs.

Search engines; spidering; metacrawlers; directed spidering; link analysis (e.g. hubs and authorities, Google PageRank); shopping agents.
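To give a flavor of the link analysis mentioned in this unit, the following is a simplified PageRank sketch using power iteration. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outlinks, and the sample graph are all assumptions made for illustration; it also assumes every link target appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every link target is assumed to also be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this graph page C is linked from A, B, and D and ends up with the highest score, while D, which has no incoming links, keeps only the random-jump share.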


Unit 8. Text Categorization and Clustering:                                                                         7 Hrs.

Categorization algorithms: Rocchio, naive Bayes, decision trees, and nearest neighbor. Clustering algorithms: agglomerative clustering, k-means, and expectation maximization (EM). Applications to information filtering, organization, and relevance feedback.

Unit 9. Recommender Systems:                                                                                                                3 Hrs.

Collaborative filtering and content-based recommendation of documents and products.

Unit 10. Information Extraction and Integration:                                                               3 Hrs.

Extracting data from text; XML; semantic web; collecting and integrating specialized information on the web.

Unit 11. Advanced IR Models:                                                                                                    3 Hrs.

Probabilistic models; Generalized Vector Space Model; Latent Semantic Indexing (LSI).

Unit 12. Advanced Indexing and Searching Text:                                                               5 Hrs.

Efficient string searching and pattern matching.

Laboratory work:   Design and development of a search engine.

Text Books:

  1. Modern Information Retrieval, Ricardo Baeza-Yates, Berthier Ribeiro-Neto.
  2. Information Retrieval: Data Structures & Algorithms, Bill Frakes.

Homework Assignment:      Assignments should be given throughout the semester.

Computer Usage:            No specific requirement

Prerequisite:                     A server-side programming language

Category Content:           Science Aspect: 25%; Design Aspect: 75%
