http://www.umetrip.com/mskyweb/fs/fc.do?flightNo=ZH9987&date=2014-11-29&channel=
http://www.umetrip.com/mskyweb/main/index.html
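Could some experts please help me write a regular expression that matches the content underlined in red? Thanks a lot, I need it urgently.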
—-
Experts, please help. I would be very grateful.
—-
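Forget it. Several of those spots are images, so you would have to use image recognition. The images are simple, but a line or two of code will not handle it.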
—-
For the images, I only need to grab the src address.
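A minimal sketch of what grabbing only the src address could look like with a regular expression. The class and method names (`ImgSrcExtractor`, `GetImageSources`) are made up for illustration, and the pattern assumes the src value is quoted; it will miss unquoted values and any images added by script.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class ImgSrcExtractor
{
    // Finds <img ...> tags and captures the quoted value of the src attribute.
    // "\s" before "src" keeps attributes such as data-src from matching.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""(?<src>[^""]*)""|'(?<src>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns every src value found in the HTML, in document order.
    public static List<string> GetImageSources(string html)
    {
        var result = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            result.Add(m.Groups["src"].Value);
        }
        return result;
    }
}
```

Calling `ImgSrcExtractor.GetImageSources(html)` on the downloaded page returns all image addresses; picking out the ones that were underlined still needs some knowledge of where they sit in the page.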
—-
Could an expert please help? I am asking for a regular expression.
—- 10 points
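I don't know why so many people always want to use regex to extract page content. Regex is powerful, but a web page is full of repeated or similar elements, and a small change to the page can easily break the pattern. You also want to extract many fields, so writing those patterns and then maintaining them will wear you out.
For this kind of requirement you can use HTMLAgilityPack + Fizzler (both on NuGet), which lets you query the fetched HTML with LINQ plus CSS selectors.
For example, the following code gets the first two red boxes on that page, and the others are similar. It is like querying the DOM with JS on the front end, and it is easier to understand and maintain than regex.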
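// Requires: using HtmlAgilityPack; using Fizzler.Systems.HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(html); // LoadHtml takes only the HTML string; the encoding is handled when the page is downloaded
var a = doc.DocumentNode.QuerySelector(".del_com .tit");
Console.WriteLine(a.Element("span").InnerText);
Console.WriteLine(a.Element("h1").InnerText);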
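The same approach can be extended to the image addresses. The sketch below assumes the HtmlAgilityPack and Fizzler NuGet packages are installed; `FlightPageQueries` is a made-up name, and the CSS selector passed in is the caller's guess at the page layout, not something confirmed for that site.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // adds QuerySelector / QuerySelectorAll

// Hypothetical helper, not part of the original thread.
public static class FlightPageQueries
{
    // Returns the src attribute of every element matched by the CSS selector.
    public static string[] GetImageSources(string html, string cssSelector)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
            .QuerySelectorAll(cssSelector)
            .Select(img => img.GetAttributeValue("src", string.Empty))
            .Where(src => src.Length > 0) // skip elements without a src
            .ToArray();
    }
}
```

For example, `FlightPageQueries.GetImageSources(html, "dl dt img")` would collect images nested under `dl` and `dt` elements, if that is indeed how the page is laid out.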
—-
I haven't used this before. Can the other few fields be fetched with it too?
—-
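Thank you. I will try this approach and see whether it gets the data I want. I know it can be done with regex, but I am not very familiar with regex.
—- 10 points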
// Requires: using System; using System.IO; using System.Net; using System.Text; using System.Text.RegularExpressions;
public static string GetValueFromWeb(string url)
{
    // Download the page and return its HTML as a string.
    WebRequest req = WebRequest.Create(url);
    using (WebResponse res = req.GetResponse())
    using (Stream stream = res.GetResponseStream())
    using (StreamReader sr = new StreamReader(stream, Encoding.UTF8))
    {
        return sr.ReadToEnd();
    }
}

public static string NoHTML(string Htmlstring) // strip HTML markup
{
    // Remove scripts (Singleline lets "." match across line breaks)
    Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Remove HTML tags
    Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);
    // Decode common entities
    Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"<img[^>]*>", "", RegexOptions.IgnoreCase);
    // string.Replace returns a new string, so the result must be assigned back
    Htmlstring = Htmlstring.Replace("<", "");
    Htmlstring = Htmlstring.Replace(">", "");
    Htmlstring = Htmlstring.Replace("\r\n", "");
    //Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();
    return Htmlstring;
}

private void button10_Click(object sender, EventArgs e)
{
    string s = GetValueFromWeb("http://www.umetrip.com/mskyweb/fs/fc.do?flightNo=ZH9987&date=2014-11-29&channel=");
    string ok = NoHTML(s); // extract the text content of the page
}
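As a side note, the long chain of entity replacements above can probably be shortened with `WebUtility.HtmlDecode` from `System.Net`. The following is only a sketch under that assumption; `HtmlText.Strip` is a made-up name, and like the code above it returns all text on the page rather than selected fields.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class HtmlText
{
    // Removes scripts, styles, comments and tags, then decodes entities.
    public static string Strip(string html)
    {
        const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        // Drop <script> and <style> blocks together with their content.
        string s = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "", Opts);
        // Drop HTML comments.
        s = Regex.Replace(s, @"<!--.*?-->", "", Opts);
        // Replace every remaining tag with a space so adjacent words stay apart.
        s = Regex.Replace(s, @"<[^>]+>", " ");
        // Decode entities such as &amp; and &lt; in one call.
        s = WebUtility.HtmlDecode(s);
        // Collapse runs of whitespace.
        return Regex.Replace(s, @"\s+", " ").Trim();
    }
}
```

This keeps the regex part limited to removing markup and leaves entity handling to the framework.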
—-
Written this way it fetches all the data on the page. I only want the few values I underlined.
—-
Experts, please help.
http://www.umetrip.com/mskyweb/fs/fc.do?flightNo=ZH9791&date=2014-12-01&channel=
—-
// Get flight status data
public void GetDate()
{
    string FlightNo = "ZH9987";
    string TakeOffDate = "2014-12-01";
    string URL = "http://www.umetrip.com/mskyweb/fs/fc.do?flightNo=" + FlightNo + "&date=" + TakeOffDate + "&channel=";
    Html html = new Html();
    string htmlCode = html.GetHTML(URL, "UTF-8"); // get the HTML of the given page; the first parameter is the URL, the second is the target page's character encoding
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    web.OverrideEncoding = Encoding.GetEncoding("UTF-8");
    HtmlAgilityPack.HtmlDocument htmlDoc = web.Load(URL);
    //HtmlNode navNode = htmlDoc.GetElementbyId(".del_com .tit");
    //HtmlNode navNode = htmlDoc.DocumentNode.QuerySelector(".del_com .tit");
    // div[2] means the article link a is in the 3rd div node inside post_list
    HtmlNodeCollection list = htmlDoc.DocumentNode.SelectNodes("/html/body/div[2]/div[2]/div[2]");
    foreach (HtmlNode node in list)
    {
        try
        {
            //string s = node.Element("h1").InnerText;
            //string s2 = node.Element("h1").InnerText;
            //HtmlAgilityPack.HtmlNodeCollection anchors = htmlDoc.DocumentNode.SelectNodes("/html/body/div[2]/div[2]/div[2]/div[1]/h1"); // get the origin and destination
            //HtmlAgilityPack.HtmlNodeCollection anchorss = htmlDoc.DocumentNode.SelectNodes("/html/body/div[2]/div[2]/div[2]/div[8]/div[2]/div[1]/div[1]/dl[1]/dt/img");
            if (node.InnerHtml != "" && node.InnerHtml != null)
            {
            }
            else
            {
            }
        }
        catch (Exception e)
        {
            rt_box.AppendText(e.Message.ToString() + "\r\n");
        }
    }
}
This is how I wrote my program. Why doesn't it work???
—- 50 points
foreach (HtmlNode node in list)
{
    if (node.Attributes != null)
    {
        sb.AppendLine(node.InnerText);
    }
}
if (sb != null)
{
    string txt = sb.ToString().Trim();
    textBox1.Text = txt;
}
—- 2 points
StringBuilder sb = new StringBuilder();
—- 8 points
foreach (HtmlNode node in list)
{
    string s1 = node.SelectSingleNode("div[1]").InnerText;
    string s2 = node.SelectSingleNode("div[2]").InnerText;
}
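One thing to watch with this pattern: in HtmlAgilityPack, `SelectSingleNode` and `SelectNodes` return null when nothing matches, so calling `.InnerText` directly can throw a `NullReferenceException`. A small null-safe helper might look like the sketch below; `NodeText.TextOrDefault` is a made-up name, and the HtmlAgilityPack NuGet package is assumed to be installed.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of the original thread.
public static class NodeText
{
    // Returns the trimmed, entity-decoded text of the first node matching
    // the relative XPath, or the fallback value when nothing matches.
    public static string TextOrDefault(HtmlNode parent, string relativeXPath, string fallback = "")
    {
        HtmlNode child = parent.SelectSingleNode(relativeXPath); // null when there is no match
        return child == null
            ? fallback
            : HtmlEntity.DeEntitize(child.InnerText).Trim();
    }
}
```

Inside the loop this would be used as `string s1 = NodeText.TextOrDefault(node, "div[1]");`, which leaves `s1` empty instead of throwing when the page layout differs from what the XPath expects.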