Regex snippets that often come in handy when scraping a web page's document in ASP.NET
Edited by: 曹永思 (博客园)
1. Get the tags inside a div with a given class
Get the markup inside <div class="imgList2">****</div>
Method 1:
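// The match is lazy, so it ends at the first </div>; a nested div would cut the result short.
string g = "<div.*?class=\"imgList2\">(?<html>[\\s\\S]*?)</div>";
Regex reg = new Regex(g, RegexOptions.None);
MatchCollection mc = reg.Matches(strResult);
string v = "";
foreach (Match m in mc)
{
    // The "html" group holds only the content between the tags;
    // m.Value would also include the surrounding <div ...> and </div>.
    v += m.Groups["html"].Value + "\r\n";
}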
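As a rough illustration of Method 1, the sketch below wraps the same idea in a small helper and runs it on a made-up page fragment. The names DivExtractDemo and InnerHtmlOfDivs and the sample HTML are assumptions for this example, not part of the original post, and the pattern is a slightly stricter variant that keeps the attribute scan inside the opening tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DivExtractDemo
{
    // Hypothetical helper: returns the inner markup of every <div> whose
    // class attribute is exactly cssClass. Nested <div> elements are not handled.
    public static List<string> InnerHtmlOfDivs(string html, string cssClass)
    {
        string pattern = "<div[^>]*?class=\"" + Regex.Escape(cssClass)
                       + "\"[^>]*>(?<html>[\\s\\S]*?)</div>";
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            // Keep only the named group, i.e. the content between the tags.
            result.Add(m.Groups["html"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample input.
        string strResult =
            "<div class=\"imgList2\"><a href=\"/a.html\">A</a></div>" +
            "<p>skip</p>" +
            "<div class=\"imgList2\"><img src=\"b.png\"></div>";

        foreach (string inner in InnerHtmlOfDivs(strResult, "imgList2"))
        {
            Console.WriteLine(inner);
        }
        // Prints:
        // <a href="/a.html">A</a>
        // <img src="b.png">
    }
}
```

For pages with nested divs or inconsistent attribute order, a real HTML parser is likely a safer choice than a regex.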
Method 2 (general-purpose: get the content between a given start string and end string):
string list_a_group_str = GetValue(strResult.Trim(), "<div class=\"imgList2\">", "</div>");
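public static string GetValue(string str, string start, string end)
{
    // Escape the delimiters so that characters such as '.', '(' or '?' in them are matched literally.
    // The lookbehind and lookahead keep the delimiters themselves out of the result.
    string pattern = "(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")";
    // Only the first match is returned; the result is an empty string when nothing matches.
    return Regex.Match(str, pattern).Value;
}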
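A short, self-contained run of the GetValue idea may make its behaviour easier to see. The sample strings are invented for this sketch, and the Regex.Escape calls are an assumption added here so that delimiters containing regex metacharacters are treated as plain text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GetValueDemo
{
    // Lookbehind for the start delimiter, lazy body, lookahead for the end delimiter.
    public static string GetValue(string str, string start, string end)
    {
        string pattern = "(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")";
        return Regex.Match(str, pattern).Value;
    }

    public static void Main()
    {
        // Made-up sample input spanning several lines.
        string page = "<div class=\"imgList2\">\n  <img src=\"a.png\">\n</div><div>other</div>";

        // The content between the delimiters, newlines included.
        string inner = GetValue(page, "<div class=\"imgList2\">", "</div>");
        Console.WriteLine(inner.Trim());                       // <img src="a.png">

        // Parentheses in the start delimiter are matched literally.
        Console.WriteLine(GetValue("price (USD): 42; qty: 3", "price (USD): ", ";"));   // 42

        // No match yields an empty string rather than null.
        Console.WriteLine(GetValue(page, "<span>", "</span>").Length);                  // 0
    }
}
```

Because the body is lazy, the result ends at the first occurrence of the end string after the start string.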
2. Get the href and text of every a tag
Get the href and text of every a tag inside <div class="page both"></div>
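string list_page_group_str = GetValue(strResult.Trim(), "<div class=\"page both\">", "</div>");
// (?is): ignore case, and let '.' match newlines. Group 1 captures the optional quote
// (' or ") so that \1 requires the same closing quote.
Regex reg = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
MatchCollection mc = reg.Matches(list_page_group_str);
foreach (Match m in mc)
{
    string url = m.Groups["url"].Value + "\n";   // href value without its quotes
    string text = m.Groups["text"].Value + "\n"; // content between <a ...> and </a>
}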
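The following sketch runs the link pattern on a made-up pager fragment and collects the results instead of discarding them. AnchorDemo, ExtractLinks, and the sample HTML are hypothetical names and data for illustration; the pattern uses a plain ASCII single quote in the quote group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: returns (href, text) pairs for every <a> element found.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var reg = new Regex(
            @"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in reg.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample: one double-quoted and one single-quoted href.
        string pager = "<a href=\"/page/1\">1</a> <a class='n' href='/page/2'>Next</a>";

        foreach (var link in ExtractLinks(pager))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
        // Prints:
        // /page/1 -> 1
        // /page/2 -> Next
    }
}
```

The text group stops at the next opening or closing a tag, so any other markup inside a link (for example a span) is returned as part of the text.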
Time: 2024-11-05 22:33:36