Question: Linux system crashes after modifying the low-level packet receive path

2012-10-18

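A question for the experts: in int netif_receive_skb(struct sk_buff *skb) I installed a custom function pointer (a hook) and loaded a custom module. The goal is to intercept packets arriving on a specific interface, process them (mainly by prepending an encapsulation header), and send them out through another interface. Before sending, I look up the routing table, find the route, and then transmit with the following code: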

if (dst->hh) {
    printk("cached\n");
    ret = neigh_hh_output(dst->hh, skb);
    if (ret < 0)
        printk("send fail!\n");
    return ret;
} else if (dst->neighbour) {
    printk("no cached\n");
    ret = dst->neighbour->output(skb);
    if (ret < 0)
        printk("send fail!\n");
    return ret;
}

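During debugging, if the cache hits, the packet is delivered to the destination correctly. If the cache misses, the packet is still sent normally as long as no ARP resolution is needed. But as soon as ARP resolution is required, the system crashes, either while sending the ARP request or on receiving the ARP reply: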

45,00,00,6c,00,00,00,00,
40,11,f6,95,c0,a8,01,02,
c0,a8,01,99,00,00,14,7f,
00,58,6c,03,00,20,00,50,
00,00,00,00,06,00,03,7f,
46,e0,29,00,45,00,00,3c,
89,21,00,00,80,01,2c,d8,
81,00,00,00,c0,a8,01,de,
c0,a8,01,99,08,00,d6,5b,
05,00,72,00,61,62,63,64,
65,66,67,68,69,6a,6b,6c,
6d,6e,6f,70,71,72,73,74,
75,76,77,61,62,63,64,65,
66,67,68,69,
no cached
CPU 0 Unable to handle kernel paging request at virtual address 0000000c, epc == 80195128, ra == 80195130
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000000 00000000
$ 4 : 810612c0 810612c0 00000000 00000000
$ 8 : 00000000 00000000 00000000 00000000
$12 : 00000000 3b9aca00 00200200 00000000
$16 : 8792a200 878bc8a0 878bc8a4 00000002
$20 : 8792f000 c0a80199 87995820 00000003
$24 : 00000000 8000e2a0
$28 : 80214000 80215cf0 00200200 80195130
Hi : 00000082
Lo : 00000000
epc : 80195128 arp_solicit+0xa0/0x1ec
Tainted: P
ra : 80195130 arp_solicit+0xa8/0x1ec
Status: 1100ff03 KERNEL EXL IE
Cause : 80800008
BadVA : 0000000c
PrId : 0001974c (MIPS 74Kc)
Modules linked in: quick_trans ath_pktlog(P) umac ath_dev(P) ath_dfs(P) ath_rate_atheros(P) ath_hal(P) asf(P) adf(P) private_message usb_storage ehci_hcd usbcore athrs_gmac
Process swapper (pid: 0, threadinfo=80214000, task=80216000, tls=00000000)
Stack : 8792a200 8014ac10 8792a200 808815e0 80257568 802574e8 00000020 8792a200
878bc8a0 8014e0e0 00000004 8792a200 87995820 00000000 80215d70 80257568
802574e8 80257468 802573e8 8015f770 ffffaa1d 80221a58 80221a5c 80221a74
802571e0 00000100 8015f4d0 8002dbc8 00000000 00000040 00000000 00000000
80215d70 80215d70 00000001 00000100 802570b0 00000000 00000004 0000000a
...
Call Trace:
[<80195128>] arp_solicit+0xa0/0x1ec
[<8015f770>] neigh_timer_handler+0x2a0/0x440
[<8002dbc8>] run_timer_softirq+0x14c/0x1d8
[<800296e8>] __do_softirq+0xb0/0x148
[<800297c8>] do_softirq+0x48/0x6c
[<80003ad8>] plat_irq_dispatch+0x58/0x480
[<8000670c>] ret_from_irq+0x0/0x4
[<80006284>] r4k_wait_irqoff+0x20/0x24
[<8000853c>] cpu_idle+0x24/0x44
[<80227a5c>] start_kernel+0x340/0x35c

find route success

45,00,00,38,00,00,00,00,
40,11,f8,db,c0,a8,00,6f,
c0,a8,00,1a,00,00,14,7f,
00,24,26,d4,00,20,00,50,
00,00,00,00,06,00,03,7f,
42,07,81,00,00,00,f5,81,
80,00,00,00,00,00,00,00,
no cached
CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 80161ac0, ra == 80161964
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000061 00000000 00000000
$ 4 : 802274a0 00000000 00000000 00014733
$ 8 : 0000dc45 8795684a 00000010 879312c0
$12 : 0000006b fff7ffff 00200200 00100100
$16 : 8702b540 87956870 87956820 00000002
$20 : 00000001 00000001 87956844 00000000
$24 : 00000010 801984e4
$28 : 80218000 80219bf8 87931000 80161964
Hi : 00000093
Lo : db927100

epc : 80161ac0 neigh_update+0x374/0x428
Tainted: P
ra : 80161964 neigh_update+0x218/0x428
Status: 1100ff03 KERNEL EXL IE
Cause : 0080000c
BadVA : 00000000
PrId : 0001974c (MIPS 74Kc)
Modules linked in: quick_trans ath_pktlog(P) umac ath_dev(P) ath_dfs(P) ath_rate_atheros(P) ath_hal(P) asf(P) adf(P) private_message cdc_ether usbnet usb_storage ehci_hcd usbcore athrs_gmac
Process swapper (pid: 0, threadinfo=80218000, task=8021a000, tls=00000000)
Stack : 61746520 41435449 878bd9ec 878bd9a0 00000000 00000000 87956820 80220000
803c4670 87931000 878bd8a0 80219c58 803c4678 8792fb20 0000000c 8019843c
00000000 8702b6c0 879312c0 80409e02 87931000 801ad410 879312c0 80409e02
c0a8001a c0a8006f 878bd8a0 8792fb20 879583c0 879312c0 8785c000 8792fbc0
00000000 8106550c 0000000c 80157f34 8021c030 87816d40 87816d70 8001fea0
...
Call Trace:
[<80161ac0>] neigh_update+0x374/0x428
[<8019843c>] arp_process+0x6ac/0x74c
[<801ae04c>] br_handle_frame_finish+0xfc/0x158
[<801ae264>] br_handle_frame+0x1bc/0x1f8
[<80158184>] netif_receive_skb+0x3c0/0x4f0
[<8794218c>] athr_gmac_recv_packets+0x2e4/0x47c [athrs_gmac]
[<800290a0>] tasklet_action+0x88/0xdc
[<80029808>] __do_softirq+0xb0/0x148
[<800298e8>] do_softirq+0x48/0x6c
[<800067ec>] ret_from_irq+0x0/0x4
[<80006364>] r4k_wait_irqoff+0x20/0x24
[<8000861c>] cpu_idle+0x24/0x44
[<8022ba5c>] start_kernel+0x340/0x35c

Could someone help analyze what the problem is, and whether this approach is simply not allowed?

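[Solution]
Most crashes are caused by incorrect pointer operations; in the kernel you have to be especially careful.
[Solution]
This is a very common problem:
a NULL pointer error.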
CPU 0 Unable to handle kernel paging request at virtual address 0000000c, epc == 80195128, ra == 80195130
This too is an access to a member at offset 0xc through a struct pointer, except that the struct pointer is NULL.
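To make the "offset 0xc through a NULL pointer" reading concrete, here is a small user-space sketch in C. It uses an illustrative stand-in for the IPv4 header (the struct name and helper functions are hypothetical, not kernel code) to show that reading the source address through a NULL header pointer would touch address 0xc. Whether arp_solicit is really dereferencing a NULL IP header in this crash is an assumption; the trace alone does not confirm it, and other structs also have a member at that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative user-space copy of the IPv4 header layout.
 * Hypothetical stand-in, not the kernel's own definition. */
struct ipv4_header {
    uint8_t  version_ihl;  /* version and header length share one byte */
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;        /* source address */
    uint32_t daddr;        /* destination address */
};

/* Address the CPU would try to load when reading a member at `offset`
 * through a struct pointer whose value is `base`. */
uintptr_t member_address(uintptr_t base, size_t offset)
{
    assert(offset <= UINTPTR_MAX - base);  /* no wrap-around */
    return base + offset;
}

/* Value the faulting-address register (BadVA) would hold if saddr were
 * read through a NULL header pointer. */
uintptr_t bad_va_for_null_saddr(void)
{
    return member_address(0, offsetof(struct ipv4_header, saddr));
}
```

With this layout, bad_va_for_null_saddr() evaluates to 0xc, which matches the BadVA in the first oops. If that reading is right, one thing worth checking is whether the skb's network header is valid after the encapsulation header is added.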

You can use the Call Trace: to locate the specific kernel function that failed.
For example
arp_solicit+0xa0/0x1ec

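is where the fault finally occurred. Build the kernel with debug information (selectable in make menuconfig), then disassemble it.
The instruction at offset 0xa0 in arp_solicit is the faulting location; once you map the disassembly back to the C source, it is easy to see which struct pointer is NULL.
0x1ec is the total length of arp_solicit.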
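The symbol+offset/size notation can be decoded mechanically. The following is a minimal C sketch (the struct and function names are hypothetical helpers for illustration) that splits an entry such as arp_solicit+0xa0/0x1ec into its parts and derives the function's start address from the faulting epc value.

```c
#include <assert.h>
#include <stdio.h>

struct trace_location {
    char symbol[64];        /* function name */
    unsigned long offset;   /* bytes from function start to the instruction */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE"; returns 0 on success, -1 on malformed input. */
int parse_trace_location(const char *text, struct trace_location *out)
{
    assert(text != NULL && out != NULL);
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    if (out->offset >= out->size)   /* offset must lie inside the function */
        return -1;
    return 0;
}

/* Function start = faulting program counter (epc) minus the offset. */
unsigned long function_start(unsigned long epc,
                             const struct trace_location *loc)
{
    return epc - loc->offset;
}
```

For the first oops, epc is 0x80195128 and the offset is 0xa0, so the function starts at 0x80195088. Assuming a vmlinux built with debug information and a matching cross toolchain, that range or the epc value itself can then be given to tools such as objdump or addr2line to find the corresponding source line.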

That is all I will say; search Baidu for how to disassemble the kernel.
