Author: wtchen (没有存在感的人)
Board: LinuxDev
Title: [Question] Does mlockall not completely prevent page faults?
Date: Thu Oct 29 01:49:40 2015
I'm trying to evaluate whether locking memory with mlockall improves latency on the RPi.
Using the well-known cyclictest (v0.92) together with perf, I got the following results:
sudo perf stat ./cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 10000
# Total: 000009985
# Min Latencies: 00038
# Avg Latencies: 00082
# Max Latencies: 00386
# Histogram Overflows: 00015
Performance counter stats for
'./cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 10000':
818.925000 task-clock (msec) # 0.027 CPUs utilized
13,362 context-switches # 0.016 M/sec
0 cpu-migrations # 0.000 K/sec
56 page-faults # 0.068 K/sec
471,078,551 cycles # 0.575 GHz (50.34%)
282,495,112 stalled-cycles-frontend # 59.97% frontend cycles idle (51.67%)
13,419,172 stalled-cycles-backend # 2.85% backend cycles idle (52.93%)
68,489,877 instructions # 0.15 insns per cycle
# 4.12 stalled cycles per insn (38.41%)
7,553,254 branches # 9.223 M/sec (30.02%)
1,627,813 branch-misses # 21.55% of all branches (34.01%)
30.232651000 seconds time elapsed
Without the -m option (no mlockall):
sudo perf stat ./cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 10000
# Total: 000009988
# Min Latencies: 00038
# Avg Latencies: 00080
# Max Latencies: 00407
# Histogram Overflows: 00012
Performance counter stats for
'./cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 10000':
772.978000 task-clock (msec) # 0.026 CPUs utilized
13,363 context-switches # 0.017 M/sec
0 cpu-migrations # 0.000 K/sec
66 page-faults # 0.085 K/sec
444,135,743 cycles # 0.575 GHz (41.26%)
271,762,254 stalled-cycles-frontend # 61.19% frontend cycles idle (48.87%)
8,522,179 stalled-cycles-backend # 1.92% backend cycles idle (56.53%)
65,640,536 instructions # 0.15 insns per cycle
# 4.14 stalled cycles per insn (37.62%)
7,453,674 branches # 9.643 M/sec (34.44%)
1,584,489 branch-misses # 21.26% of all branches (25.24%)
30.197211000 seconds time elapsed
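It looks like Max latencies gets slightly smaller with -m (386 vs. 407).
My question is that page-faults only drops slightly with -m (56 vs. 66); the faults are not eliminated.
Is this normal? I thought that once memory is locked with mlockall there would be no more PFs.
Thanks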
--
※ Origin: 批踢踢实业坊 (ptt.cc), From: 90.41.67.118
※ Article URL: https://webptt.com/cn.aspx?n=bbs/LinuxDev/M.1446054583.A.EB6.html
1F:push yvb: Just loading the program's own text and libs already triggers quite a few page-faults. 10/29 16:24
2F:→ wtchen: So apart from the initialization at the start, there won't be any more PFs? 10/29 19:28
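The difference between start-up faults and steady-state faults can be observed from inside a process. Below is a minimal Python sketch (not cyclictest's code; `minor_faults`, `touch`, and `count_faults` are hypothetical helper names) that uses `resource.getrusage` to count minor faults while an anonymous mapping is touched for the first time and then a second time. On Linux the first pass typically faults about once per page and the second pass barely at all, though exact counts depend on the kernel and its configuration.

```python
import mmap
import resource


def minor_faults():
    """Minor (no disk I/O) page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch(buf, size, step):
    """Write one byte per page so that every page gets mapped in."""
    for offset in range(0, size, step):
        buf[offset] = 1


def count_faults(size=16 * 1024 * 1024):
    """Return (cold, warm): faults on the first and second pass over a buffer."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)  # anonymous mapping, normally populated lazily
    try:
        start = minor_faults()
        touch(buf, size, page)   # first touch: the kernel maps each page
        middle = minor_faults()
        touch(buf, size, page)   # second touch: pages are already resident
        end = minor_faults()
    finally:
        buf.close()
    return middle - start, end - middle


if __name__ == "__main__":
    cold, warm = count_faults()
    print(f"cold pass: {cold} minor faults, warm pass: {warm} minor faults")
```

The same pattern applies to a program's text and libraries: each page faults in once when first used, then stays mapped unless it is evicted.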
I ran an experiment: raise the loop count tenfold and see whether the number of PFs goes up.
With mlockall: page-faults stays at 55-56, no increase
Performance counter stats for
'./cyclictest -p 90 -m -c 0 -i 3000 -n -h 250 -q -l 100000':
7202.248000 task-clock (msec) # 0.024 CPUs utilized
130,818 context-switches # 0.018 M/sec
0 cpu-migrations # 0.000 K/sec
55 page-faults # 0.008 K/sec
4,079,431,733 cycles # 0.566 GHz (48.12%)
2,569,771,515 stalled-cycles-frontend # 62.99% frontend cycles idle (49.99%)
69,883,756 stalled-cycles-backend # 1.71% backend cycles idle (51.78%)
643,633,565 instructions # 0.16 insns per cycle
# 3.99 stalled cycles per insn (34.40%)
72,253,517 branches # 10.032 M/sec (32.91%)
15,166,468 branch-misses # 20.99% of all branches (31.47%)
300.240982143 seconds time elapsed
Without mlockall: page-faults stays at 66-67
Performance counter stats for
'./cyclictest -p 90 -c 0 -i 3000 -n -h 250 -q -l 100000':
7181.634000 task-clock (msec) # 0.024 CPUs utilized
130,892 context-switches # 0.018 M/sec
0 cpu-migrations # 0.000 K/sec
67 page-faults # 0.009 K/sec
4,072,629,665 cycles # 0.567 GHz (49.76%)
2,537,027,318 stalled-cycles-frontend # 62.29% frontend cycles idle (49.79%)
70,191,503 stalled-cycles-backend # 1.72% backend cycles idle (50.05%)
627,997,620 instructions # 0.15 insns per cycle
# 4.04 stalled cycles per insn (34.31%)
71,914,012 branches # 10.014 M/sec (33.07%)
15,190,645 branch-misses # 21.12% of all branches (33.44%)
300.195795144 seconds time elapsed
It looks like increasing the loop count does not increase page-faults... (with or without mlockall)
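The four runs can be put side by side with a little arithmetic. The sketch below only reuses the page-fault and loop counts posted above (`RUNS`, `faults_per_loop`, and `growth` are hypothetical names). If the faults came from the measurement loop, ten times the loops would give roughly ten times the faults; a flat count is what a fixed start-up cost looks like.

```python
# (mode, loops) -> page-faults, copied from the perf stat output above
RUNS = {
    ("mlockall", 10_000): 56,
    ("mlockall", 100_000): 55,
    ("no mlockall", 10_000): 66,
    ("no mlockall", 100_000): 67,
}


def faults_per_loop(faults, loops):
    """Average number of page faults per cyclictest loop."""
    return faults / loops


def growth(mode):
    """Ratio of faults at 100000 loops to faults at 10000 loops."""
    return RUNS[(mode, 100_000)] / RUNS[(mode, 10_000)]


if __name__ == "__main__":
    for (mode, loops), faults in RUNS.items():
        print(f"{mode:12s} loops={loops:6d} faults={faults:3d} "
              f"per loop={faults_per_loop(faults, loops):.5f}")
    for mode in ("mlockall", "no mlockall"):
        print(f"{mode:12s} 10x loops -> {growth(mode):.2f}x faults")
```

Both growth ratios come out close to 1 rather than 10, so the faults do not scale with the number of loops.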
※ Edited: wtchen (90.41.214.241), 10/29/2015 19:46:17
※ Edited: wtchen (90.41.214.241), 10/29/2015 19:51:13
3F:push yvb: ...... When do you think a page fault occurs? 10/29 21:58
4F:→ wtchen: I thought that after a process sleeps or uses up its time slice 10/30 04:00
5F:→ wtchen: it gets swapped out, and a page fault only happens when it is brought back into memory 10/30 04:01
6F:→ wtchen: I read the man page for mlockall; what it does is 10/30 04:02
7F:→ wtchen: preventing that memory from being paged to the swap 10/30 04:02
8F:→ wtchen: so I assumed mlockall = no swap 10/30 04:04
9F:→ yvb: You may be confusing swapping (paging) with context switching... 10/30 16:51
10F:→ yvb: How about reading the wikipedia material, or checking the difference with google? 10/30 16:51
11F:→ final01: page faults should indeed decrease, but cold page faults cannot be avoided 10/31 00:00
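12F:→ wtchen: I was indeed a bit confused, but my intent is to avoid, halfway through the loop, 10/31 00:06
13F:→ wtchen: a variable being pushed out to swap during the sleep, so that after the sleep 10/31 00:07
14F:→ wtchen: the variable can't be found and a page fault occurs 10/31 00:07
15F:→ wtchen: Loading it from swap -> RAM then wastes time and makes the timing inaccurate 10/31 00:08
16F:push yvb: Unless main memory is running short, the kernel won't swap for no reason... 11/07 05:35
17F:→ yvb: As for whether the timing is accurate, that depends on how much precision you need... 11/07 05:36
18F:→ yvb: Context-switching overhead also differs from CPU to CPU. 11/07 05:37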
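The swap worry in this thread can be checked directly rather than inferred from page-fault counts. The following is a rough Python sketch, assuming Linux: it calls `mlockall` through `ctypes` and reads the `VmRSS`, `VmLck`, and `VmSwap` fields of `/proc/self/status`. The numeric values of `MCL_CURRENT` and `MCL_FUTURE` are assumptions taken from the generic Linux headers (x86 and ARM; a few architectures use different values), and `parse_status`, `read_status`, and `probe_mlockall` are hypothetical helper names. Without root or a large enough `RLIMIT_MEMLOCK` the call is expected to fail, which the sketch reports instead of raising.

```python
import ctypes
import os

# Assumed values from the generic Linux <asm-generic/mman.h>; not portable everywhere.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later (not used by default below)

FIELDS = ("VmRSS", "VmLck", "VmSwap")


def parse_status(text):
    """Extract the selected memory fields (in kB) from /proc/<pid>/status text."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS and value.split():
            found[key] = int(value.split()[0])
    return found


def read_status(path="/proc/self/status"):
    """Return the parsed fields, or an empty dict where /proc is unavailable."""
    try:
        with open(path) as handle:
            return parse_status(handle.read())
    except OSError:
        return {}


def probe_mlockall(flags=MCL_CURRENT):
    """Try mlockall, snapshot the status fields, then always unlock again."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(flags) != 0:
        return False, os.strerror(ctypes.get_errno()), read_status()
    try:
        return True, "locked", read_status()
    finally:
        libc.munlockall()


if __name__ == "__main__":
    print("before:", read_status())
    ok, message, status = probe_mlockall()
    print("mlockall:", message, status)
```

A `VmSwap` of 0 kB suggests that nothing belonging to the process has been swapped out, with or without locking. The sketch passes only `MCL_CURRENT` by default so that later allocations in the interpreter are unaffected; a real-time program would more likely pass `MCL_CURRENT | MCL_FUTURE`.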