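Author: jevix (B.Pt.Ga.)
Board: NTU-Exam
Title: [Exam] 97 first semester, 唐牧群, Information Retrieval, midterm
Time: Tue Dec 2 22:28:53 2008
Course name: Information Retrieval
Course type: Required course in the department
Instructor: 唐牧群
College: College of Liberal Arts
Department: Department of Library and Information Science
Exam date (Y/M/D): 97/12/02
Time limit (minutes): 3 hours
Reward requested: Yes
(If not explicitly stated, no reward will be issued)
Exam questions: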
1. Here is an imaginary database that contains the following 5 documents:
D1: "
a dog barks
at a cat
and it fell
from a tree"
D2: "
a dog watches ants
on the bark
of a tree"
D3: "
a dog watches
another dog watches
a cat"
D4: "
a dog barks
at a cat watches
another cat"
D5: "
the bark fell
from the tree
as a cat watches"
(Terms in the stop word list have been marked with grey).
Please
1.Calculate the document frequency (DF) and IDF weight for each index term
(simply use N/n without logarithm).
2.Create an inverted file for the database where each cell contains the TF*IDF
weight of each term in the documents.
3.Give the ranking after the user submits the query "cat watch dog bark ant".
4.After the first iteration, the user marks D1, D3, D4 as relevant, and D2 and
D5 as non-relevant. What would be the new ranking using Rocchio's method,
where α=1.0, β=1.0, γ=1.0?
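
The four steps above can be sketched in Python. This is only a rough illustration under several assumptions that the exam does not state: the grey stop word marking is not visible in this copy, so the stop list (a, at, and, it, from, on, the, of, another, as) is guessed; terms are stemmed by hand (barks to bark, watches to watch, ants to ant); query terms are weighted by TF*IDF just like document terms; similarity is a plain inner product; and negative Rocchio weights are kept rather than clipped to zero. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed index terms per document: stop words removed and word endings
# stemmed by hand (the original grey marking is lost in this copy).
DOCS = {
    "D1": ["dog", "bark", "cat", "fell", "tree"],
    "D2": ["dog", "watch", "ant", "bark", "tree"],
    "D3": ["dog", "watch", "dog", "watch", "cat"],
    "D4": ["dog", "bark", "cat", "watch", "cat"],
    "D5": ["bark", "fell", "tree", "cat", "watch"],
}


def idf_weights(docs):
    """IDF = N / n without logarithm, where n is the document frequency."""
    n_docs = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {term: Fraction(n_docs, n) for term, n in df.items()}


def tfidf_vector(terms, idf):
    """Sparse vector mapping each term to raw TF * IDF."""
    return {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}


def rank(query_vec, doc_vecs):
    """Rank documents by inner product (assumed); ties are broken by name."""
    scores = {
        name: sum(w * vec.get(t, 0) for t, w in query_vec.items())
        for name, vec in doc_vecs.items()
    }
    order = sorted(scores, key=lambda name: (-scores[name], name))
    return order, scores


def rocchio(query_vec, relevant, non_relevant, alpha=1, beta=1, gamma=1):
    """q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).

    Negative weights are kept as they are (an assumption; some
    formulations clip them to zero).
    """
    terms = set(query_vec)
    for vec in relevant + non_relevant:
        terms |= set(vec)
    new_query = {}
    for t in terms:
        rel = sum(v.get(t, 0) for v in relevant) / Fraction(len(relevant))
        non = sum(v.get(t, 0) for v in non_relevant) / Fraction(len(non_relevant))
        new_query[t] = alpha * query_vec.get(t, 0) + beta * rel - gamma * non
    return new_query


idf = idf_weights(DOCS)
doc_vecs = {name: tfidf_vector(terms, idf) for name, terms in DOCS.items()}
query = tfidf_vector(["cat", "watch", "dog", "bark", "ant"], idf)

first_ranking, first_scores = rank(query, doc_vecs)
new_query = rocchio(
    query,
    relevant=[doc_vecs[d] for d in ("D1", "D3", "D4")],
    non_relevant=[doc_vecs[d] for d in ("D2", "D5")],
)
second_ranking, second_scores = rank(new_query, doc_vecs)

print("IDF:", {t: str(w) for t, w in sorted(idf.items())})
print("First ranking:", first_ranking)
print("Ranking after feedback:", second_ranking)
```

Exact fractions are used so that tied scores compare as equal instead of differing by rounding error. A different stop list, a cosine-normalized similarity, or clipping negative weights could change the scores and possibly the order.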
Answer 4 out of the following 5 questions.
2.Unlike data retrieval, where perfect precision and recall are guaranteed,
information retrieval is more of a probabilistic process in which the
information conveyed in the retrieved documents might or might not answer the
user's information needs. What are the possible causes behind the uncertainty
of IR?
3.Define the following concepts and explain how they are related to one
another: "specificity", "precision" and "IDF" (Inverse Document Frequency); and
"exhaustivity", "recall" and "TF" (Term Frequency).
4.Explain the three basic models in information retrieval: Boolean, Vector
space and Probabilistic.
5.Explain the rationale behind eliciting users' relevance feedback and how it
can improve search results. What are the two mechanisms with which relevant
terms can be identified and extracted (hint: IQE and AQE)?
6.How does the interactive view of IR differ from the traditional view of IR?
How does it propose to improve retrieval performance (i.e. what is the most
crucial component of the IR process and how can it be improved)? Can you think
of a few (1 or 2) techniques or approaches we have gone through in the class
that aim at improving this particular component of IR?
--
※ Posted from: 批踢踢实业坊 (ptt.cc)
◆ From: 140.112.245.126