Author: gemini392 (铭)
Board: RegExp
Title: Re: [Question] Extracting information from a web page
Time: Fri Aug 21 20:44:35 2009
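※ Quoting gemini392 (铭):
: I have only recently started learning regular expressions
: and would like to ask everyone:
: <td class="defaultstyle" align="left">the information I want to capture</td>
: what would be a good way to write a pattern for this?
: The most intuitive thing I have come up with so far is
: '/<td class="defaultstyle" align="left">(.+?)<\/td>/'
: but no matter how many times I change it, it still matches nothing /_\
: I would really appreciate any help clearing this up.
: Thanks, everyone :)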
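As a quick sanity check on the pattern itself, here is a minimal console sketch, assuming the cell sits on a single line exactly as quoted above. `PatternCheck`, `Capture`, and the sample string are hypothetical names used only for illustration; the `/` delimiters are dropped because .NET's `Regex` takes the bare pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class PatternCheck
{
    // Same pattern as quoted above, without the '/' delimiters.
    public const string Pattern =
        "<td class=\"defaultstyle\" align=\"left\">(.+?)</td>";

    // Returns the text captured by group 1, or null when nothing matches.
    public static string Capture(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample line modeled on the one in the question.
        string sample =
            "<td class=\"defaultstyle\" align=\"left\">the information I want to capture</td>";
        Console.WriteLine(Capture(sample) ?? "(no match)");
    }
}
```

If this matches the sample but the real page still yields nothing, the difference is probably in the page itself (extra whitespace, line breaks inside the tag, a different attribute order, or the encoding) rather than in the pattern.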
Below is my code; I am writing it in C#.
I would appreciate any help figuring this out @@"
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Collections.Specialized;
using System.Net;
using System.Text.RegularExpressions;
using System.IO;
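namespace Capture
{
    public partial class Form1 : Form
    {
        private const string _borrowlist =
            "http://140.115.156.76/borrowlist.htm";
        // Browser-style User-Agent sent with the request.
        private const string _userAgent =
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; " +
            ".NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; " +
            ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)";

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            WebClient webClient = new WebClient();
            webClient.Headers[HttpRequestHeader.UserAgent] = _userAgent;
            webClient.Encoding = Encoding.UTF8;
            // Download the raw bytes and decode them, assuming the page is UTF-8.
            byte[] firstResponse = webClient.DownloadData(_borrowlist);
            string firstRes = Encoding.UTF8.GetString(firstResponse);

            // Singleline lets '.' match newlines, in case a cell spans several lines.
            Regex regex = new Regex(
                "class=\"defaultstyle\" align=\"left\">(.+?)</td>",
                RegexOptions.IgnoreCase | RegexOptions.Singleline);
            MatchCollection matches = regex.Matches(firstRes);

            StringBuilder captured = new StringBuilder();
            foreach (Match match in matches)
            {
                // Groups[1] is the captured cell text; match.Value is the whole match.
                captured.AppendLine(match.Groups[1].Value);
            }
            // Show every captured cell, one per line, instead of only the last one.
            textBox1.Text = captured.ToString();
        }
    }
}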
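To look at the matching logic apart from the form and the download, here is a small console sketch. It is only an illustration under stated assumptions: `CellExtractor`, `ExtractCells`, and the HTML string are made up, and the `\s+` between the two attributes is a guess that the real page may contain a line break or extra spaces there. It shows the difference between `match.Value` and `match.Groups[1].Value` and what `RegexOptions.Singleline` changes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical console sketch; the HTML below is made up for illustration.
public static class CellExtractor
{
    public static List<string> ExtractCells(string html)
    {
        // Singleline makes '.' match '\n', so a cell split across lines still matches.
        // \s+ tolerates spaces or a line break between the two attributes.
        Regex regex = new Regex(
            "class=\"defaultstyle\"\\s+align=\"left\">(.+?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        List<string> cells = new List<string>();
        foreach (Match match in regex.Matches(html))
        {
            // Groups[1] is only the captured text; match.Value would also
            // include the attributes and the closing tag.
            cells.Add(match.Groups[1].Value.Trim());
        }
        return cells;
    }

    public static void Main()
    {
        string html =
            "<td class=\"defaultstyle\" align=\"left\">first</td>\n" +
            "<td class=\"defaultstyle\"\n align=\"left\">second\nline</td>";
        foreach (string cell in ExtractCells(html))
        {
            Console.WriteLine("[" + cell + "]");
        }
    }
}
```

Without `RegexOptions.Singleline`, the second cell in this made-up input would not match, because `.` stops at the newline inside it. Collecting the results in a list also avoids keeping only the last match.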
--
※ Posted from: PTT (批踢踢实业坊, ptt.cc)
◆ From: 140.115.156.76