html - PHP Regex to match everything between <body style=...> and </body> tag -


I found a curl function that fetches everything on a specific page, but I only want the body elements. To match everything between <body> and </body>, I found a nifty regex that worked, but then I realized that the pages I need to use curl on actually have a body tag with style information inside it. So what I really want is to match everything between <body style=...> and </body>. Does anyone know the regex for that match? Here is all my code:

    <?php
    error_reporting(E_ALL);
    ini_set("display_errors", "1");

    $pageToLoad = $_POST['Load'];

    // Fetch the page at $url with curl and return the response body as a string.
    function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        // Note: this turns off SSL certificate verification.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $html = get_data($pageToLoad);
    $newHtml = preg_match("~

When you include attributes as part of your search pattern, matching gets harder. An attribute value can be either single or double quoted, and most parsers will cope even if someone forgot the quotes or the quotes don't match. Since you are only looking for a specific attribute name, it's easier, but it can still fail, for example if the attribute name you are searching for appears as a value inside another attribute.

(Heck, your original simple regex will incorrectly match some invalid strings, such as a tag that merely starts with <body....)

Since a style attribute is almost always followed by an equals sign, I will use that fact to find it. I will also make sure that I match the body element, and not some impossible mutant, as with the example above.

    <body\s[^>]*style\s*=[^>]*>(.*?)</body>

It is essentially your original regex, but with \s[^>]*style\s*= in between.

  1. \s ensures that there is whitespace after the body tag name, so that only a real body element matches.
  2. [^>]* matches any character except > 0 or more times.
  3. style matches the string "style".
  4. \s* allows whitespace between "style" and "=".
  5. = matches the string "=".
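As a minimal sketch of how the pattern above might be applied, the snippet below runs it through preg_match on a small sample string. The "~" delimiters, the "i" and "s" modifiers, and the sample HTML are assumptions made for illustration; they are not taken from the original code.

```php
<?php
// Pattern from the answer. The "~" delimiters and the "i" (case-insensitive)
// and "s" (dot also matches newlines) modifiers are assumptions for this sketch.
$pattern = '~<body\s[^>]*style\s*=[^>]*>(.*?)</body>~is';

// Hypothetical sample page standing in for the curl response.
$html = "<html><body class=\"main\" style=\"margin:0\">\n<p>Hello</p>\n</body></html>";

// preg_match returns 1 on a match and fills $m; $m[1] is the captured body content.
$body = preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;

// A tag that merely begins with "body" is rejected, because \s must follow the name.
$mutant = preg_match($pattern, '<bodyfoo style="x">text</body>');

var_dump($body, $mutant);
```

Note that preg_match returns the number of matches (0 or 1), so the matched content has to be read from the third argument and not from the return value.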

I'm hard pressed to think of an example that would trip up the regex but would not cause any problems for a parser. I suppose someone could add whitespace between the opening < and body, or have a space or some other character at the end of the closing body tag. Plus, someone could leave off the closing body tag altogether.

You could extend the regex to handle those cases, but for probably any case you will encounter in the wild, what I have given should work fine.
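The edge cases mentioned in this answer can be checked directly. The sketch below is illustrative only: the input strings are hypothetical, and the delimiters and modifiers are assumptions. It shows three inputs the pattern misses and one false positive where "style=" appears inside another attribute's value.

```php
<?php
// Same pattern as in the answer; delimiters and the "is" modifiers are assumptions.
$pattern = '~<body\s[^>]*style\s*=[^>]*>(.*?)</body>~is';

// Hypothetical inputs illustrating the cases described in the answer.
$cases = [
    'space after <'        => '< body style="a">x</body>',
    'space in closing tag' => '<body style="a">x</body >',
    'missing closing tag'  => '<body style="a">x',
    'style= inside value'  => '<body title="style=a">x</body>',
];

// Record 1 (matched) or 0 (not matched) for each input.
$results = [];
foreach ($cases as $label => $input) {
    $results[$label] = preg_match($pattern, $input);
}

print_r($results);
```

The first three inputs do not match, while the last one does even though the element has no style attribute. If such pages matter, an HTML parser is likely the more robust choice.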

