C#.NET Example: Scraping the Baidu Baijia Article List with Regular Expressions

Date: 2022-06-25 07:48:45  Editor: 袖梨  Source: 一聚教程网

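In my spare time I have been learning regular expressions. Since practice is the sole criterion for testing truth, I wrote an example that uses a regular expression to scrape articles from Baidu Baijia. The source code below walks through the process: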

1. Fetch the Baidu Baijia page content

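// Requires: using System; using System.Collections.Generic; using System.IO; using System.Net;
public List<string[]> GetUrl()
{
  try
  {
    string url = "http://baijia.baidu.com/";
    WebRequest webRequest = WebRequest.Create(url);
    WebResponse webResponse = webRequest.GetResponse();
    // Read the whole response body as text
    StreamReader reader = new StreamReader(webResponse.GetResponseStream());
    string result = reader.ReadToEnd();
    reader.Close();
    webResponse.Close();
    // Hand the raw HTML to the regex-based parser
    return AnalysisHtml(result);
  }
  catch (Exception)
  {
    // Rethrow with "throw;" so the original stack trace is preserved
    throw;
  }
}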
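
The reading step above can also be written with using blocks, so the reader and stream are disposed even if ReadToEnd throws. The following is only a minimal sketch: PageReader and ReadAllText are hypothetical names that do not appear in the original, and a MemoryStream stands in for webResponse.GetResponseStream() so the snippet runs without network access. It also passes the encoding explicitly, on the assumption that the page is served as UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class PageReader
{
    // Hypothetical helper: reads an entire stream as text with an explicit encoding.
    public static string ReadAllText(Stream stream, Encoding encoding)
    {
        // Disposing the StreamReader also closes the underlying stream,
        // even when ReadToEnd throws.
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for webResponse.GetResponseStream(): an in-memory UTF-8 page.
        byte[] bytes = Encoding.UTF8.GetBytes("<html><body>百度百家</body></html>");
        using (var stream = new MemoryStream(bytes))
        {
            string html = ReadAllText(stream, Encoding.UTF8);
            Console.WriteLine(html);
        }
    }
}
```

In GetUrl, the same pattern would wrap both the WebResponse and the StreamReader, which makes the explicit Close calls unnecessary.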

2. Filter with a regular expression

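// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public List<string[]> AnalysisHtml(string htmlContent)
{
  List<string[]> list = new List<string[]>();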
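  // NOTE: the HTML tags and group names of this pattern did not survive in the published listing.
  // The complete pattern must define three capture groups (title, content, link) that match the page markup.
  string strPattern = @".*\s*(?[^.*)""\s*target=""_blank""\s*class=""feeds-item-more""\s*mon="".*\s*"">.*\s*";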
  Regex regex = new Regex(strPattern, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant);
  if (regex.IsMatch(htmlContent))
  {
    MatchCollection matchCollection = regex.Matches(htmlContent);
    foreach (Match match in matchCollection)
    {
      string[] str = new string[3];
      str[0] = match.Groups[1].Value; // title of the list item
      str[1] = match.Groups[2].Value; // content
      str[2] = match.Groups[3].Value; // address the item links to
      list.Add(str);
    }
  }
  return list;
}
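
To make the three-group extraction concrete, here is a small self-contained sketch that runs the same kind of match against a sample string. The markup, the feeds-item-text class name, the example.com link, and the FeedParser and Parse names are all assumptions made for illustration; they are not taken from the real Baijia page or from the listing above. It uses named groups instead of numeric indexes, which tends to be easier to maintain when the pattern changes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FeedParser
{
    // Assumed markup (hypothetical, not the real page):
    // <h3><a href="LINK" target="_blank">TITLE</a></h3>
    // <p class="feeds-item-text">SUMMARY</p>
    private static readonly Regex ItemRegex = new Regex(
        @"<h3>\s*<a\s+href=""(?<link>[^""]*)""\s+target=""_blank"">(?<title>[^<]*)</a>\s*</h3>\s*" +
        @"<p\s+class=""feeds-item-text"">(?<summary>[^<]*)</p>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns one string[3] per item: title, summary, link.
    public static List<string[]> Parse(string html)
    {
        var list = new List<string[]>();
        foreach (Match match in ItemRegex.Matches(html))
        {
            list.Add(new[]
            {
                match.Groups["title"].Value.Trim(),
                match.Groups["summary"].Value.Trim(),
                match.Groups["link"].Value
            });
        }
        return list;
    }

    public static void Main()
    {
        string sample =
            "<h3><a href=\"http://example.com/a1\" target=\"_blank\">First title</a></h3>\n" +
            "<p class=\"feeds-item-text\">First summary</p>";
        foreach (string[] item in Parse(sample))
        {
            Console.WriteLine(item[0] + " | " + item[1] + " | " + item[2]);
        }
    }
}
```

Regex.Matches returns an empty collection when nothing matches, so a separate IsMatch check before the loop is optional.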
