This is a small code snippet I’m going to show you for getting the title of a
remote web page. First, we want to get the complete HTML response from a web
request. Then we parse the HTML text, search for the text between the <title>
and </title> tags, and return that text to the caller.
Here is how I solved this problem.
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public class TitleScraper {
    private string url;

    public TitleScraper(string url) {
        this.url = url;
    }

    // Holds the page title once Scrape() has run.
    public string Title { get; private set; }

    public void Scrape() {
        // Download the complete HTML; the using blocks close the response and its stream.
        WebRequest request = WebRequest.Create(this.url);
        using (WebResponse response = request.GetResponse())
        using (StreamReader sr = new StreamReader(response.GetResponseStream())) {
            string html = sr.ReadToEnd();
            // Capture the text between <title ...> and the FIRST </title>, across line breaks.
            Regex ex = new Regex(@"<title\b[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);
            Title = ex.Match(html).Groups[1].Value.Trim();
        }
    }
}
This class contains one private string field, one public string property, a constructor,
and a public method called Scrape.
The constructor takes a single string parameter: the complete URL of the web page whose
title you want.
TitleScraper titleObject = new TitleScraper("http://example.com");
After constructing a TitleScraper object, call its Scrape method, then read the Title property.
titleObject.Scrape();
string title = titleObject.Title;
You now have the page title as a string in the title variable.
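On .NET 6 and later, WebRequest.Create is marked obsolete (compiler warning SYSLIB0014) and Microsoft recommends HttpClient instead. Below is a minimal sketch of the same two steps (download the HTML, then extract the title) using HttpClient, assuming a .NET project where async/await is available. The names AsyncTitleScraper, GetTitleAsync and ExtractTitle are hypothetical, chosen only for this example. The sketch also moves the parsing into its own method so it can be used without a network call, decodes HTML entities such as &amp; (something TitleScraper does not do), and stops at the first </title>, so a later <title> element, for example inside an inline SVG, is ignored.

```csharp
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper class for this example, not part of any library.
public static class AsyncTitleScraper {
    // One HttpClient shared by all calls, instead of creating a new client per request.
    private static readonly HttpClient Client = new HttpClient();

    // <title ...> (attributes allowed, any case), then a lazy capture up to the FIRST </title>.
    private static readonly Regex TitleRegex =
        new Regex(@"<title\b[^>]*>([\s\S]*?)</title>", RegexOptions.IgnoreCase);

    // Downloads the complete HTML of the page and returns its title.
    public static async Task<string> GetTitleAsync(string url) {
        string html = await Client.GetStringAsync(url);
        return ExtractTitle(html);
    }

    // The parsing step on its own; returns "" when the page has no <title> element.
    public static string ExtractTitle(string html) {
        Match m = TitleRegex.Match(html);
        if (!m.Success) {
            return string.Empty;
        }
        // Decode entities such as &amp; (an addition; TitleScraper does not do this).
        return WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
    }
}
```

From inside an async method, usage would look like this:
string title = await AsyncTitleScraper.GetTitleAsync("http://example.com");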
Hope this helps.