ethtool only works on physical NICs,
so I first pinned the speed on the system's p3p1 and p3p2 (Intel X540 10G NIC):
DEVICE=p3p1
#HWADDR=A0:36:9F:7C:63:68
TYPE=Ethernet
UUID=24819cec-d318-49bf-854f-a321be47013e
MASTER=bond0
SLAVE=yes
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
ETHTOOL_OPTS="speed 10000 duplex full autoneg off"
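I did this for both ports, p3p1 and p3p2.
Pinning the speed on a single port works without any problem, but once the ports are bonded the setting no longer seems to take effect
(autoneg stays on, and turning it off directly with the ethtool command does not help either).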
Settings for p3p1:
Supported ports: [ TP ]
Supported link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Speed: 10000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
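To make the "is the speed actually pinned?" check less of an eyeball job, here is a minimal sketch that pulls the relevant fields out of ethtool output like the dump above. The function name link_summary is made up for illustration, and it only picks up the first line of "Advertised link modes", so a multi-line list is truncated. Note also that, as far as I know, 10GBASE-T links always need auto-negotiation, so "Auto-negotiation: on" with only 10000baseT/Full advertised may already be the pinned state rather than a failure.

```shell
# Summarise the speed-related fields of `ethtool <dev>` output.
# Reads the ethtool text on stdin, so it also works on a saved copy.
# link_summary is a hypothetical helper name, not part of ethtool.
link_summary() {
    awk -F': *' '
        /^[ \t]*Speed:/                 { speed = $2 }
        /^[ \t]*Duplex:/                { duplex = $2 }
        /^[ \t]*Auto-negotiation:/      { autoneg = $2 }
        /^[ \t]*Advertised link modes:/ { adv = $2 }
        END {
            printf "speed=%s duplex=%s autoneg=%s advertised=%s\n",
                   speed, duplex, autoneg, adv
        }
    '
}
```

Typical use would be `ethtool p3p1 | link_summary`, run once before and once after the port is enslaved to bond0, then comparing the two lines.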
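The switch side has already been set to a fixed speed, and the system has no connectivity problems,
but I cannot tell at all whether the speed is actually pinned.
What started this is that the following messages show up on the system quite often (p3p1 and p3p2 take turns):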
kernel: ixgbe 0000:04:00.0: p3p1: NIC Link is Down
kernel: bond0: link status definitely down for interface p3p1, disabling it
kernel: bond0: first active interface up!
ixgbe 0000:04:00.0: p3p1: NIC Link is Up 10 Gbps, Flow Control: None
kernel: bond0: link status definitely up for interface p3p1, 10000 Mbps full duplex
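Since the flaps alternate between the two ports, it can help to count them per interface instead of scrolling the log. The sketch below assumes the ixgbe message format shown above ("ixgbe <pci-address>: <dev>: NIC Link is Down"); count_link_down is a hypothetical helper name.

```shell
# Count "NIC Link is Down" events per interface from kernel log text on stdin.
# Assumes the ixgbe line layout shown in the post:
#   ... ixgbe <pci-address>: <dev>: NIC Link is Down
count_link_down() {
    awk '/NIC Link is Down/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "ixgbe") {
                    dev = $(i + 2)        # field after the PCI address
                    sub(/:$/, "", dev)    # drop the trailing colon
                    n[dev]++
                    break
                }
            }
         }
         END { for (d in n) print d, n[d] }' | sort
}
```

For example `dmesg | count_link_down`, or `count_link_down < /var/log/messages`. Roughly equal counts on both ports would point away from a single bad cable or port.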
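I have swapped the NIC, swapped the cable (Cat.6), rebooted, updated the firmware, updated the driver, and moved to a different switch port; none of it helped,
so I wanted to try a fixed-speed test.
Other servers on the same switch also have an Intel X540 10G installed:
one runs Oracle Linux 6.7 (no problem),
one runs Oracle Linux 6.8 (similar problem),
and this one runs RHEL 6.8 (the machine described in this post).
After all this testing I am starting to wonder whether the 6.8 release itself has a problem...
The server is a Dell R630.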
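2F:→ galic: Is it this issue? 11/24 13:53
3F:→ galic: No, going by your version it should already be fixed. I'll look into it more 11/24 13:54
4F:→ galic: Describe your environment more clearly, e.g. the version numbers after your updates... 11/24 14:09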
I did look into that issue, but the system shows no rx miss.
Update:
ethtool -i p3p1
driver: ixgbe
version: 5.2.4
firmware-version: 0x800005f6, 18.0.17
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Bonding driver version 3.7.1
OS version: (uname -a with the date removed)
2.6.32-642.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
Current ifconfig state on the system: (only the bonding-related entries, IP removed)
bond0 Link encap:Ethernet HWaddr A0:36:9F:E1:5F:C0
inet addr:xxxxxxxxx Bcast:172.21.103.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:1401076722 errors:0 dropped:0 overruns:0 frame:0
TX packets:1634636464 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:803655790323 (748.4 GiB) TX bytes:1140369457904 (1.0 TiB)
p3p1 Link encap:Ethernet HWaddr A0:36:9F:E1:5F:C0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:306679435 errors:0 dropped:0 overruns:0 frame:0
TX packets:720537271 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:165806032881 (154.4 GiB) TX bytes:527301564338 (491.0 GiB)
p3p2 Link encap:Ethernet HWaddr A0:36:9F:E1:5F:C0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:701875754 errors:0 dropped:0 overruns:0 frame:0
TX packets:464806464 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:415406736029 (386.8 GiB) TX bytes:311857294678 (290.4 GiB)
bonding setup (IP-related settings removed)
DEVICE="bond0"
ONBOOT="yes"
USERCTL="no"
BOOTPROTO="none"
NM_CONTROLLED="no"
BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=layer3+4 lacp_rate=1"
I changed miimon to 1000, and tried removing the layer setting or changing it to 2+3, 3, 4; none of it helped.
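One way to see what the bonding driver itself thinks of each slave is /proc/net/bonding/bond0. The sketch below is only an illustration: bond_slaves is a made-up helper name, and it assumes the per-slave "Slave Interface:", "Speed:" and "Link Failure Count:" lines are present in this bonding driver version.

```shell
# Print "<slave> <speed> failures=<n>" for each slave listed in
# /proc/net/bonding/<bond> text read from stdin.
# Assumes the per-slave Speed and Link Failure Count lines exist.
bond_slaves() {
    awk -F': *' '
        /^Slave Interface:/    { dev = $2 }
        /^Speed:/              { if (dev != "") speed[dev] = $2 }
        /^Link Failure Count:/ { if (dev != "") print dev, speed[dev], "failures=" $2 }
    '
}
```

Usage would be `bond_slaves < /proc/net/bonding/bond0`. A Link Failure Count that keeps climbing on both slaves matches the alternating Link is Down messages above.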
※ Edited: GoldDeath (1.163.235.91), 11/24/2017 22:26:54
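5F:→ galic: That's a rather odd situation... I'd suggest updating everything that can be updated to the latest version 11/25 00:31
6F:→ galic: Then see if you can raise the debug msg. level a bit and check for any useful information 11/25 00:31
7F:→ galic: The problem is probably not in the bond driver 11/25 00:32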
9F:→ galic: Try ethtool -s p3p1 msglvl 0xffff 11/25 00:35
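10F:→ GoldDeath: Got it, I'll try it at work next week; the switch was only set to a fixed speed a few days ago 11/25 19:11
11F:→ GoldDeath: If that still doesn't work, all I can do is discuss an OS upgrade with the foreign colleagues 11/25 19:12
12F:→ GoldDeath: Thanks for the help, I'll edit the post with updates later 11/25 19:13
13F:→ GoldDeath: Enabled it, Current message level: 0x0000ffff (65535) 11/27 13:46
14F:→ GoldDeath: But /var/log/messages has nothing special, still the same as before 11/27 13:46
15F:→ galic: You should look at kern.log or syslog; messages won't contain the debug output 11/27 23:12
16F:→ galic: ↑dmesg 11/27 23:13
17F:→ GoldDeath: Ah, silly me, I forgot to check dmesg 11/28 20:09
18F:→ GoldDeath: Update: still no additional messages, same as before 12/30 18:46
19F:→ GoldDeath: The problem is finally solved: a firmware update on the 10G switch fixed it 02/14 17:50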
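For the message-level step in the comments above, a small sketch that decodes the bitmask ethtool reports, so it is easier to see which message classes a value like 0x7 or 0xffff turns on. The bit names follow the msglvl list in the ethtool man page as I remember it (drv = 0x1 up to wol = 0x4000); decode_msglvl is a hypothetical helper name.

```shell
# Decode an ethtool message-level bitmask into its flag names.
# Bit order assumed from the ethtool man page: drv=0x1 ... wol=0x4000.
decode_msglvl() {
    level=$(( $1 ))
    bit=1
    out=""
    for name in drv probe link timer ifdown ifup rx_err tx_err \
                tx_queued intr tx_done rx_status pktdata hw wol; do
        if [ $(( level & bit )) -ne 0 ]; then
            out="$out $name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "${out# }"
}
```

With this, `decode_msglvl 0x7` gives the same "drv probe link" that ethtool printed for p3p1 before the level was raised.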