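Author: xavier13540 (柊 四千)
Board: NTU-Exam
Title: [Exam] 110-1 Hsin-Hsi Chen (陈信希), Information Retrieval and Extraction, Midterm
Time: Tue Dec 30 09:34:28 2025
Course name: Information Retrieval and Extraction
Course type: CSIE department elective
Instructor: Hsin-Hsi Chen (陈信希)
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (YYYY/MM/DD): 2021/11/11
Time limit (minutes): 180
Exam questions: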
1. Term frequency and inverse document frequency are commonly used to measure
the importance of a term in a document and in a query. We aim to represent a
document by selecting terms with discriminative power both within a document
and between documents. How do term frequency and inverse document frequency
achieve this goal? (10 points)
2. A long document is usually composed of passages describing several topics.
On the one hand, it is relatively easier to retrieve long documents than
short documents with a keyword-based approach. On the other hand, the
representation of a long document tends to be vague when an average word
(term) embedding approach is used for aggregation. Do you have any ideas for
dealing with these issues in the keyword-based approach and in the term
embedding-based approach? (10 points)
3. In language modeling, each individual document can be considered as a
document model for retrieval. In addition, a document collection can also be
used to learn a collection model for smoothing in retrieval. Please describe
the idea of integrating the document model and the collection model for IR.
(10 points)
4. Modeling term-term relationships is important in information retrieval.
Various methods, from conventional counting-based approaches to current
prediction-based approaches, have been proposed. Please show one method from
each approach for computing inter-term relationships. (10 points)
5. (a) What are typical similarities and topical similarities? (5 points)
(b) Term representations learned from models based on different sizes of
context (e.g., document, short window size, or short context) may capture
different similarities (typical similarities or topical similarities).
Please explain this statement. (5 points)
(c) Exact matching and embedding-space-based matching have different effects
on retrieval. Please discuss this point. (5 points)
6. An IR model is a quadruple $[D, Q, F, R(q_i, d_j)]$ where
$D$ is a set of logical views for the documents in the collection,
$Q$ is a set of logical views for the user queries,
$F$ is a framework for modeling documents and queries, and
$R(q_i, d_j)$ is a ranking function.
Please specify the framework $F$ and the ranking function $R$ for each of
the following models. (15 points)
(a) BM25 Model
(b) Translation Model
(c) Term Embedding Model
7. Query expansion aims to add new query terms to the original query.
Please specify how query expansion is introduced into each of the following
models. (15 points)
(a) Vector Space Model
(b) Language Model
(c) Term Embedding Model
8. At SIGIR 2016, two tutorial speakers classified "Question Answering from
Documents" as an "easy" problem in IR. In contrast, they regarded "Question
Answering from Knowledge Base" as a "hard" problem in IR. Do you agree with
such a classification? Please explain your thoughts. (10 points)
9. Neural information retrieval systems typically use a chaining pipeline. Are
there any practical considerations? Please suggest a cascade pipeline to
explain your idea. (10 points)
10. We often encounter mis-conception, mis-translation, and mis-formulation
problems when transforming an information need into a query in ad hoc
retrieval. You have learned the fundamentals of information retrieval during
the first half of the semester. Please describe the lessons that help deal
with these problems. (10 points)
--
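Episode 01: I feel like I heard this in class    Episode 02: That really is too despairing
Episode 03: There's nothing left to hope for    Episode 04: Failing and 21 both exist
Episode 05: There's no way I'd all pass    Episode 06: There's definitely something wrong with this exam
Episode 07: Can you face your true score?    Episode 08: I really am a fool
Episode 09: With grades like these, the professor will never let me pass    Episode 10: I'll never rely on past exams again
Episode 11: The last make-up exam remaining    Episode 12: My dearest credits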
--
※ Origin: PTT (批踢踢实业坊, ptt.cc), From: 111.249.65.236 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/NTU-Exam/M.1767058471.A.939.html
1F:→ rod24574575 : Archived under the CSIE department! 12/30 22:55