C# - Get Source of a HTML Page

A simple C#-based program: the user enters a website URL, and the program retrieves the page's source code and puts it into a text box. You could use this to scrape information from Google and then extract the specific items/information you want.



Code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
// System.Net namespace, needed for WebClient
using System.Net;
namespace getsauce
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            // Prepend 'http://' if the URL does not start with a scheme yet
            string url = textBox1.Text.Trim();
            if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                url = "http://" + url;
                textBox1.Text = url;
            }
            // Create a new WebClient; 'using' disposes it when we are done
            using (WebClient newClient = new WebClient())
            {
                newClient.Proxy = null;
                // Download the page source and put it into the text box
                textBox2.Text = newClient.DownloadString(url);
            }
        }
    }
}
Original Source:
http://www.coderzspot.com/index.php?...f-a-html-page/
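
For the "extract specific items" part mentioned at the top, here is a minimal sketch of one way it could look. It pulls the text of the first <title> tag out of a downloaded source string. The SourceScraper class and ExtractTitle helper are made-up names for illustration, not part of the form code above, and a regex like this is only reasonable for simple cases (a real HTML parser is safer for anything more involved).

```csharp
using System;
using System.Text.RegularExpressions;

static class SourceScraper
{
    // Hypothetical helper (not in the original post): returns the text inside
    // the first <title> element, or null if the source has no title.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        // Stand-in for the string that DownloadString would return
        string html = "<html><head><title> Example Page </title></head><body></body></html>";
        Console.WriteLine(ExtractTitle(html));
    }
}
```

In the form, you could call something like SourceScraper.ExtractTitle(textBox2.Text) after the download finishes.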
  • vick2011
    Updated the source code a little bit.
  • liamweb
    Read URL Content in VB.NET

    Imports System.Net
    Imports System.IO

    Public Class Form1
        Private Sub Button1_Click(ByVal sender As System.Object, _
                ByVal e As System.EventArgs) Handles Button1.Click
            ' Request the URL typed into TextBox1
            Dim request As WebRequest = WebRequest.Create(TextBox1.Text)
            ' Using blocks close the response and the reader when done
            Using response As WebResponse = request.GetResponse()
                Using reader As New StreamReader(response.GetResponseStream())
                    ' Read the whole response body into TextBox2
                    TextBox2.Text = reader.ReadToEnd()
                End Using
            End Using
        End Sub
    End Class

    Liam
    • andrejvasso
      I don't know C#, but I am curious:

      Does this software display the "real", generated source (the kind you would get from a web developer plugin, or by selecting everything and viewing the source of the selection)?

      Or does it simply display what you would see with "Right click -> View Source"?

      If it works like "Right click -> View Source", I'd suggest changing it to use the first method, so that you see the generated, "real" source code. For instance, when the page uses AJAX, you would see the dynamically loaded part too.