[C#] Could someone tell me what's wrong with this?

by Rubik
13 replies
Code:
using (WebClient wc = new WebClient())
{
    string html = wc.DownloadString("URL");
    var regex = new Regex("(?<=<p>)(.*\n?)(?=</p>)");
    MatchCollection col = regex.Matches(html.ToString());
    foreach (Match match in col)
    {
        listBox2.Items.Add(match.Value.ToString());
    }
}
I'm guessing it's because everything is within "using (WebClient wc = new WebClient())", but I don't know how to use my string "html" outside of it.

Any help would be appreciated. Thank you.
  • IdeaBox
    Code:
    string html = "";
    using (WebClient wc = new WebClient())
    {
        html = wc.DownloadString("URL");
        var regex = new Regex("(?<=<p>)(.*\n?)(?=</p>)");
        MatchCollection col = regex.Matches(html.ToString());
        foreach (Match match in col)
        {
            listBox2.Items.Add(match.Value.ToString());
        }
    }
    If you want to use html outside of that scope, declare it before the using block as shown here, or at the beginning of your class if other methods need it too.
    • Rubik
      Thank you, but what I meant is: will the regex code I have after html = wc.DownloadString("URL"); work where it is?

      With the code I had, html was the page's source.
  • IdeaBox
    It will work where it is, as long as the code is NOT running on a separate thread. If it is, you'll get an error about an illegal cross-thread operation, because UI controls such as listBox2 can only be updated from the thread that created them.

    If you want to use your 'col' variable outside your function, simply declare the variable outside the function (at class level) and replace

    Code:
    MatchCollection col = regex.Matches(html.ToString());
    with

    Code:
    col = regex.Matches(html.ToString());
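A minimal sketch of that suggestion, assuming the parsing lives in its own class; PageScraper, FindParagraphs and CountParagraphs are hypothetical names. Because col is a field rather than a local variable, a second method can still read it after the first one has filled it in.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class: col is declared once at class level instead of inside a method.
public class PageScraper
{
    private MatchCollection col;

    public void FindParagraphs(string html)
    {
        var regex = new Regex("(?<=<p>)(.*\n?)(?=</p>)");
        // Assign the field. Writing "MatchCollection col = ..." here instead would
        // declare a new local variable that hides the field.
        col = regex.Matches(html);
    }

    public int CountParagraphs()
    {
        // The matches are still available here; 0 if FindParagraphs hasn't run yet.
        return col == null ? 0 : col.Count;
    }
}
```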
    • Rubik
      Thank you again.
      There must be something wrong with the code I have, then. It's supposed to get the page source (which it definitely does), then search the source for values between <p> and </p> and add them to a listBox (which it isn't doing).
      • Brandon Tanner
        Regex can be weird sometimes... I never use it unless I absolutely have to. For what you want to do, I would use String.Split instead. For the first split, use <p> as the delimiter and discard the first value in the resulting array (it's everything before the first <p>). Then split each of the remaining values using </p> as the delimiter and keep the part before it.

        BTW- if you don't know how to split a string using a multiple-character delimiter, see this.
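A minimal sketch of the Split approach described above, assuming the tags appear exactly as "<p>" and "</p>" (no attributes, same letter case); SplitExtractor and Between are hypothetical names. It uses the String.Split overload that takes a string array, which is one way to split on a multi-character delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class SplitExtractor
{
    // Returns the text between each opening and closing marker, using
    // String.Split with multi-character delimiters instead of a regex.
    public static List<string> Between(string html, string open, string close)
    {
        var results = new List<string>();

        // First split on the opening marker; parts[0] is everything before
        // the first marker, so it is skipped.
        string[] parts = html.Split(new[] { open }, StringSplitOptions.None);

        for (int i = 1; i < parts.Length; i++)
        {
            // Split each remaining piece on the closing marker and keep the part before it.
            string[] inner = parts[i].Split(new[] { close }, StringSplitOptions.None);
            if (inner.Length > 1)   // skip pieces that have no closing marker
            {
                results.Add(inner[0]);
            }
        }
        return results;
    }
}
```

Usage would then be something like foreach (var text in SplitExtractor.Between(html, "<p>", "</p>")) listBox2.Items.Add(text);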
        • IdeaBox
          I don't think that would work too well. I'd use IndexOf and Substring instead, if you want to change your code around.

          Here is a working example that I actually use all the time.

          Code:
          // Needs: using System; using System.Collections.Generic;
          public static List<string> GetDataInBetween(string content, string start, string end)
          {
              var rtn = new List<string>();
              var index = 0;

              while (true)
              {
                  // Find the next start marker, searching from the end of the previous match.
                  index = content.IndexOf(start, index, StringComparison.Ordinal);
                  if (index == -1) break;
                  index += start.Length;

                  // Find the end marker that follows it; stop if there isn't one.
                  var finish = content.IndexOf(end, index, StringComparison.Ordinal);
                  if (finish == -1) break;

                  rtn.Add(content.Substring(index, finish - index));
                  index = finish + end.Length;
              }
              return rtn;
          }
          Just pass in your html string and it'll return a List<string> with all the values.

          Code:
          var p = GetDataInBetween(html, "<b>", "</b>");
          foreach (var i in p)
          {
              listBox2.Items.Add(i);
          }
          • Brandon Tanner
            Originally Posted by IdeaBox

            I don't think that would work too well.
            It works great. I've done it many, many times. Not to mention, String.Split is almost always more efficient than Regex (i.e., less overhead).

            Substring would work fine too.
  • IdeaBox
    Try:

    Code:
    <p[^>](.+?)</p>
    • Rubik
      Thank you, but that didn't seem to work either.
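For reference, a likely reason the pattern above fails: [^>] has no quantifier, so it must match exactly one character that isn't >. A match can therefore never start at a bare <p>, and on a tag like <p class="x"> the captured group would include the attributes. A sketch of a corrected pattern, assuming paragraphs may carry attributes and may span several lines; ParagraphRegex and Extract are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphRegex
{
    // <p(?:\s[^>]*)?>  "<p", optionally whitespace plus attributes, then ">"
    //                  (so <pre> and <param> are not mistaken for paragraphs)
    // (.*?)            lazily capture the text up to the first </p>
    // Singleline lets '.' cross line breaks; IgnoreCase also accepts <P>...</P>.
    private static readonly Regex Paragraph = new Regex(
        @"<p(?:\s[^>]*)?>(.*?)</p>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var texts = new List<string>();
        foreach (Match m in Paragraph.Matches(html))
        {
            texts.Add(m.Groups[1].Value);  // group 1 = the text without the tags
        }
        return texts;
    }
}
```

Group 1 is used instead of match.Value so the tags themselves are left out. For messy real-world markup, an HTML parser is more robust than any single regex.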
  • IdeaBox
    Give me a moment to open VS and test.
  • IdeaBox
    Code:
    private void button1_Click(object sender, EventArgs e)
    {
        using (WebClient wc = new WebClient())
        {
            string html = wc.DownloadString("http://phonemonkey.us/");
            var regex = new Regex("(?<=<p>)(.*\n?)(?=</p>)");
            MatchCollection col = regex.Matches(html.ToString());
            foreach (Match match in col)
            {
                listBox2.Items.Add(match.Value.ToString());
            }
        }
    }
    Works well for me
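To check the lookbehind/lookahead pattern without downloading anything, it can be run against a hard-coded string. A minimal sketch with a made-up sample; ParagraphDemo, Lookaround and Lazy are hypothetical names. It also shows one limitation: because .* is greedy, two paragraphs on the same line come back as a single match, while a lazy group such as <p>(.*?)</p> returns them separately.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphDemo
{
    // The pattern from the thread: lookbehind for <p>, greedy .*, an optional
    // newline character ("\n" in a normal C# string), then lookahead for </p>.
    public static List<string> Lookaround(string html)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(html, "(?<=<p>)(.*\n?)(?=</p>)"))
        {
            found.Add(m.Value);
        }
        return found;
    }

    // Lazy alternative: .*? stops at the first </p>; group 1 holds the text.
    public static List<string> Lazy(string html)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(html, "<p>(.*?)</p>"))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```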
    • Rubik
      The problem must be something else on my side. Thank you for the help.
  • Rubik
    Thank you guys. I actually got it working, but what if I wanted to use something other than <p></p>?

    For example, I need to get the value 123456 from this...
    id":"123456"},"user

    I tried ("(?<=id\":\")(.*\n?)(?=\"},\"user)"); even though that's probably way, way off.
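For a value like this, a capture group is usually simpler than lookarounds. A sketch, assuming the id is always written as id":"..." with double quotes; IdExtractor and Find are hypothetical names. The lookaround attempt above should also match this exact snippet, but its greedy .* can run past the value if the same line contains another "},"user further on, whereas [^"]* stops at the next quote. If the response is actually JSON, a JSON parser (for example System.Text.Json or Json.NET) would be more robust than either regex.

```csharp
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Regex: id":"([^"]*)"
    // Written as a verbatim string (@"..."), where each "" stands for one quote character.
    // [^"]* captures everything up to the next quote, so it cannot run past the value.
    private static readonly Regex IdPattern = new Regex(@"id"":""([^""]*)""");

    // Returns the first id value found, or null if there is none.
    public static string Find(string text)
    {
        Match m = IdPattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```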