Author: wang19980531 (Neutral Commentator)
Board: DataScience
Title: [Discussion] Teacher-Student Model Semi-supervised
Time: Mon Jan 25 19:47:43 2021
In "Billion-scale semi-supervised learning for image classification" (Facebook AI Research), the authors give the following reason for not training the student model on D and D̂ combined:
Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved.
I'm not sure what the first reason, "searching for mixing parameters", refers to.
As for the second reason: isn't D + D̂ already prepared before the student model is trained? Why would mixing add complexity?
Thanks, everyone.
--
※ Posted via PTT (ptt.cc), from: 140.115.59.247 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/DataScience/M.1611575265.A.D9C.html
1F 推 yiefaung: Probably the ratio at which D and D̂ are sampled; the paper they reference fixes it at 6:4 01/26 20:21
2F → yiefaung: They didn't want to bother tuning that hyperparameter 01/26 20:21
3F → yiefaung: As for the second point: training on everything takes very long, so they simply don't mix 01/26 20:24
4F → wang19980531: Got it, thanks~ 01/26 21:20