Question: a regular expression to extract the following web page content
Page source:
<tr>
<td valign="top"><table width="743" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td width="743" height="38" background="pic/z10.jpg"><table width="96%" border="0" align="center" cellpadding="0" cellspacing="0" class="unnamed1">
<tr>
<td height="30" align="left"><font color="555555" size="2"><strong>游戏区域</strong></font></td>
<td width="134" align="right"><font color="555555" size="2"><strong>目前价格</strong></font></td>
<td width="132" align="right"><font color="555555" size="2"><strong>收购量</strong></font></td>
<td width="80" align="center"><font color="555555" size="2"><strong>状态</strong></font></td>
<td width="58" align="center"><font color="555555" size="2"><strong>操作</strong></font></td>
</tr>
</table>
</td>
</tr>
<tr>
<td height="500" valign="top" background="pic/z12.jpg" align="center">
<table cellpadding="0" cellspacing="0" border="0" width="95%" align="center">
<tr>
<td align="left">
<table width="700" border="0" align="center" cellpadding="0" cellspacing="0" class="unnamed1">
<tr align="left">
<td >
Aegwynn US-Alliance   // extract this
</td>
<td width="134" align="right">
0.88
元/   // extract this
Gold
</td>
<td width="132" align="right"><font color="535353">
0   // extract this
</font></td>
<td width="80" align="center"><font color="535353">
满仓   // extract this
</font></td>
<td width="58" height="25" align="center">
</td>
</tr>
</table>
</td></tr>
<tr><td>
<img src="pic/line1.jpg" width="100%" height="1" alt="" /></td>
</tr>
<tr>
<td align="left">
<table width="700" border="0" align="center" cellpadding="0" cellspacing="0" class="unnamed1">
<tr align="left">
<td >
Aegwynn US-Horde
</td>
<td width="134" align="right">
0.94
元/
Gold
</td>
<td width="132" align="right"><font color="535353">
0</font></td>
<td width="80" align="center"><font color="535353">
满仓
</font></td>
<td width="58" height="25" align="center">
</td>
</tr>
</table>
</td></tr>
<tr><td>
<img src="pic/line1.jpg" width="100%" height="1" alt="" /></td>
</tr>
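How the page renders:
Game realm   Current price   Purchase quantity   Status   Action
Aegwynn US-Alliance   0.88 yuan/Gold   0   Full
Aegwynn US-Horde   0.94 yuan/Gold   0   Full
Aerie 'peak US-Alliance   0.94 yuan/Gold   0   Full
Aerie 'peak US-Horde   0.99 yuan/Gold   0   Full
Agamaggan US-Alliance   0.82 yuan/Gold   0   Full
What regular expression would extract the game realm (游戏区域), current price (目前价格), purchase quantity (收购量) and status (状态) from each row?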
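[Solution]
The source is long, so I took a shortcut and assumed every row is formatted exactly like this. Give it a try; if any rows don't match, let me know and I'll take another look.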
// Requires: using System.Text.RegularExpressions;
string yourStr = richTextBox1.Text;
// Each match is one data row: area, price, quantity and status cells, in that order
MatchCollection mc = Regex.Matches(yourStr, @"<table[^>]*?>\s+<tr[^>]*?>\s+<td[^>]*?>(?<area>[^<]*?)</td>\s+<td[^>]*?>(?<price>[^<]*?)</td>\s+<td[^>]*?><font[^>]*?>(?<num>[^<]*?)</font></td>\s+<td[^>]*?><font[^>]*?>(?<state>[^<]*?)</font></td>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    // Trim() removes the line breaks around each cell value in the source
    richTextBox2.Text += m.Groups["area"].Value.Trim() + "\n";   // game realm
    richTextBox2.Text += m.Groups["price"].Value.Trim() + "\n";  // current price
    richTextBox2.Text += m.Groups["num"].Value.Trim() + "\n";    // purchase quantity
    richTextBox2.Text += m.Groups["state"].Value.Trim() + "\n";  // status
}
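A possible variation on the answer, offered only as a sketch: it assumes every data row has the same four-cell layout as the sample, that the price cell starts with a plain number such as 0.88, and that the page may or may not put whitespace between tags. The names RealmRow, RealmPriceParser and Parse are hypothetical, made up for this example. Compared with the pattern above, it allows optional whitespace (\s*) between tags, trims each captured value, and captures only the numeric part of the price.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// One data row of the price table (hypothetical type for this sketch).
public record RealmRow(string Area, decimal Price, string Quantity, string State);

public static class RealmPriceParser
{
    // Same four-cell layout as the answer's pattern, but every gap between
    // tags is \s*, so the match does not depend on how the page breaks lines.
    // Assumes the price cell starts with a plain number such as "0.88".
    private static readonly Regex RowPattern = new Regex(
        @"<tr[^>]*>\s*" +
        @"<td[^>]*>(?<area>[^<]*)</td>\s*" +
        @"<td[^>]*>\s*(?<price>[0-9]+(?:\.[0-9]+)?)[^<]*</td>\s*" +
        @"<td[^>]*>\s*<font[^>]*>(?<num>[^<]*)</font>\s*</td>\s*" +
        @"<td[^>]*>\s*<font[^>]*>(?<state>[^<]*)</font>\s*</td>",
        RegexOptions.IgnoreCase);

    // Returns one RealmRow per matching <tr>. The header row and the layout
    // rows have a nested tag inside their first <td>, so they do not match.
    public static List<RealmRow> Parse(string html)
    {
        var rows = new List<RealmRow>();
        foreach (Match m in RowPattern.Matches(html))
        {
            rows.Add(new RealmRow(
                m.Groups["area"].Value.Trim(),   // e.g. "Aegwynn US-Alliance"
                decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture),
                m.Groups["num"].Value.Trim(),    // kept as text; format not guaranteed
                m.Groups["state"].Value.Trim()));
        }
        return rows;
    }
}
```

In the form, this could replace the loop above, for example by calling richTextBox2.AppendText with row.Area, row.Price, row.Quantity and row.State for each returned row. Like the original answer, it is still tied to this exact table layout; if the site changes its markup, the pattern has to change too.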