quagmire02 All American 44225 Posts
okay, i'm new to parsing/evaluating XML using PHP and this is serving as one of my tests for myself...basically, it's a small site map, outlined in an XML file called sitemap.xml:
<?xml version="1.0" encoding="utf-8"?>
<toc>
  <links>
    <link section="about products clients"><a href="/about/">About</a></link>
    <link section="about contact products clients">Products
      <subsection>
        <link section="about contact products clients"><a href="/prod1/">Product 1</a></link>
        <link section="about contact products clients"><a href="/prod2/">Product 2</a></link>
        <link section="about contact products clients"><a href="/prod3/">Product 3</a></link>
      </subsection></link>
    <link section="products clients"><a href="/clients/">Client Login</a></link>
  </links>
</toc>
that's not the real site map (because the real one's much longer and has nothing to do with products or clients or anything), but you get the idea...anyway, it's loaded in index.php:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Site Map Test</title>
</head>
<body>

<!-- begin navigation -->
<ul>
  <li><a href="index.php?p=about">About Us</a></li>
  <li><a href="index.php?p=clients">Information for Clients</a></li>
  <li><a href="index.php?p=contact">Contact Information</a></li>
  <li><a href="index.php?p=products">Product Information</a></li>
  <li><a href="index.php">Home</a></li>
</ul>
<!-- end navigation --->

<!-- begin content -->
<?php

// initializes new XML DOM document
$xmldoc = new DOMDocument();

// if the XML fails to load, displays an error
if(!$xmldoc->load("sitemap.xml")) {
  die("Failed to load XML file.");
}

// sets different section display criteria
if (isset($_GET['p'])) $PAGE = $_GET['p'];
switch ($PAGE) {
  case "about":
    $title = "About Us";
    $section = "about";
    break;
  case "clients":
    $title = "Information for Clients";
    $section = "clients";
    break;
  case "contact":
    $title = "Contact Information";
    $section = "contact";
    break;
  case "products":
    $title = "Product Information";
    $section = "products";
    break;
  default:
    $title = "Full Site Content";
    $section = "";
    break;
}

// if the section isn't blank (all but full contents)
// searches XML document for all occurrences of section attribute
if ($section != "") {
  foreach($xmldoc->getElementsByTagName('link') as $element) {
    if(!$element->hasAttribute('section') || strpos($element->getAttribute('section'),$section) == false) {
      $element->parentNode->removeChild($element);
    }
  }
}

// converts the filtered XML into a string
$filtered = $xmldoc->saveXML();

// replaces XML tags with HTML tags for display purposes
$filtered = str_replace("links","ul class=\"toc\"",$filtered);
$filtered = str_replace("link","li",$filtered);
$filtered = str_replace("subsection","ul",$filtered);
$filtered = str_replace("/link","/li",$filtered);
$filtered = str_replace("/subsection","/ul",$filtered);
$filtered = preg_replace("/ section=\"[^\"]*\"/","",$filtered);

// displays the title of and the filtered results
echo "<h1>".$title."</h1>".$filtered;

?>
<!-- end content -->

</body>
</html>
i don't get any errors when i open the index.php page...in fact, it shows the full site content just fine...but if i start selecting the links, while i don't get any errors, i don't get the correct displays...for example, if i select "about us", everything EXCEPT the "client login" link should show up (because they're all tagged that way), but that's not what happens...i get "products," "product 2," and "client login"
if you copy and paste those two snippets of code into their own files (sitemap.xml and index.php) and put them in their own folder, you can try it out for yourself...i'm sure i'm missing something crucial in my XML parsing, and it may be something VERY stupid on my part, but i don't see it
[Edited on January 28, 2009 at 9:47 AM. Reason : formatting]1/28/2009 9:46:35 AM |
evan All American 27701 Posts
first of all instead of using strpos i would explode the section attribute by whitespace, then use in_array() in your logic gate
i'm looking at the rest now.] 1/28/2009 10:22:23 AM |
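A minimal sketch of the explode/in_array idea suggested above, assuming PHP 7+ for the type declarations. The helper name `sectionMatches` is made up for illustration, and `preg_split` on whitespace runs is used as a slightly more forgiving stand-in for a plain `explode(' ', ...)`:

```php
<?php
// Hypothetical helper (not from the thread): true when $section is one of the
// whitespace-separated tokens in a section="..." attribute value.
function sectionMatches(string $attributeValue, string $section): bool
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
    $tokens = preg_split('/\s+/', trim($attributeValue), -1, PREG_SPLIT_NO_EMPTY);

    // Strict in_array so only an exact token counts as a match.
    return in_array($section, $tokens, true);
}

var_dump(sectionMatches('about products clients', 'about')); // bool(true)
var_dump(sectionMatches('products clients', 'about'));       // bool(false)
var_dump(sectionMatches('roundabout products', 'about'));    // bool(false)
```

Matching whole tokens also avoids the substring problem in the last call, where a strpos() check would report a hit for "about" inside "roundabout".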
quagmire02 All American 44225 Posts
^ that's a good point...i probably should explode it, instead
but no, it's still not working...where you posted it, if you click on "about us", you get:
About Us - Products - Product 2 - Client Login
but you SHOULD get (based on the tagging):
About Us - About - Products - Product 1 - Product 2 - Product 3
"client login" isn't tagged as "about", so it shouldn't show up...conversely, both "about" and "product 1" and "product 3" SHOULD show up, but they don't
it might be because i'm using strpos(), though...i'm thinking that's the issue, since it seems to skip, but i don't immediately understand WHY...hmmm...
[Edited on January 28, 2009 at 10:29 AM. Reason : ah well, i'll keep this up just for reference ]1/28/2009 10:27:47 AM |
evan All American 27701 Posts
yeah
i think i see it now. the DOM parser parses EVERY tag it sees, not just your XML. it's picking up your <a> tags inside the link tags and removing them (via the removeChild in the if/then because they don't pass the !$element->hasAttribute('section') test).
also, sanitize your superglobals before you use them, son. exec() can do fun things.] 1/28/2009 10:38:48 AM |
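One common way to handle the superglobal point is a whitelist lookup, so that only known page keys ever reach the rest of the script. This is only a sketch: `PAGE_TITLES` and `resolvePage` are hypothetical names, the keys and titles are copied from the switch in index.php, and it assumes PHP 7.1+:

```php
<?php
// Hypothetical whitelist; the keys mirror the ?p= values used in index.php.
const PAGE_TITLES = [
    'about'    => 'About Us',
    'clients'  => 'Information for Clients',
    'contact'  => 'Contact Information',
    'products' => 'Product Information',
];

// Returns [$section, $title]. Anything not on the whitelist (missing, unknown,
// or not even a string) falls back to the full site map.
function resolvePage(array $query): array
{
    $page = $query['p'] ?? '';
    if (is_string($page) && array_key_exists($page, PAGE_TITLES)) {
        return [$page, PAGE_TITLES[$page]];
    }
    return ['', 'Full Site Content'];
}

[$section, $title] = resolvePage($_GET);
```

Passing the query array in as a parameter, rather than reading `$_GET` inside the function, keeps the lookup easy to try out with hand-written input.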
quagmire02 All American 44225 Posts
Quote : | "it's picking up your <a> tags inside the link tags and removing them (via the removeChild in the if/then because they don't pass the !$element->hasAttribute('section') test)." |
i don't think so...the output is showing the <a> tags intact (because $element is set as 'link' only), yes?
Quote : | "also, sanitize your superglobals before you use them, son. exec() can do fun things." |
QFT...i have a sanitizing function that's run on the superglobal inputs...i just didn't include it in here
[Edited on January 28, 2009 at 10:55 AM. Reason : .]1/28/2009 10:53:40 AM |
scud All American 10804 Posts
you have markup inside of markup and that's just a no-no. If you use a CDATA block you can tell the parser not to treat what's inside as parsed data
Instead of: <link section="about contact products clients"><a href="/prod1/">Product 1</a></link>
Consider: <link section="about contact products clients"><[CDATA[<a href="/prod1/">Product 1</a>]]></link> 1/28/2009 12:04:04 PM |
BigMan157 no u 103354 Posts
for
strpos($element->getAttribute('section'),$section) == false
use === instead of ==
you're getting a return position of 0, which the double equal doesn't differentiate from false but the triple equal does
this should get you partway there
// if the section isn't blank (all but full contents)
// searches XML document for all occurrences of section attribute
if ($section != "") {
  foreach($xmldoc->getElementsByTagName('link') as $element) {
    if(strpos($element->getAttribute('section'),$section)===false) {
      $element->parentNode->removeChild($element);
    }
  }
}
[Edited on January 28, 2009 at 12:21 PM. Reason : might as well just post it]
[Edited on January 28, 2009 at 12:21 PM. Reason : damn user tags ]1/28/2009 12:19:41 PM |
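A related point that may be worth checking: as far as I know, getElementsByTagName() returns a live node list, so removing nodes inside the foreach can make the loop skip the element that follows each removal. That would be consistent with the every-other-link pattern in the "about us" output reported earlier in the thread. The sketch below copies the list into a plain array before removing anything; the inline XML is a trimmed-down stand-in for sitemap.xml with section tags changed for the demo, not the real file:

```php
<?php
// Trimmed-down stand-in for sitemap.xml (assumption: same shape, fewer links).
$xml = <<<XML
<toc><links>
  <link section="about products"><a href="/about/">About</a></link>
  <link section="about contact">Products
    <subsection>
      <link section="about contact"><a href="/prod1/">Product 1</a></link>
      <link section="contact"><a href="/prod2/">Product 2</a></link>
    </subsection></link>
  <link section="products clients"><a href="/clients/">Client Login</a></link>
</links></toc>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$section = 'about';

// Copy the live node list into a plain array so removals cannot shift the iteration.
$links = iterator_to_array($xmldoc->getElementsByTagName('link'));

foreach ($links as $element) {
    // Strict comparison: position 0 is a hit, only false means "not found".
    if (strpos($element->getAttribute('section'), $section) === false) {
        $element->parentNode->removeChild($element);
    }
}

$filtered = $xmldoc->saveXML();
```

With `$section = 'about'`, the "Product 2" and "Client Login" links are dropped and the rest are kept.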
quagmire02 All American 44225 Posts
^^ doing that gives me errors:
Quote : | "Warning: DOMDocument::load() [domdocument.load]: StartTag: invalid element name" |
^ why do you say "partway"? that seems to do the trick 1/28/2009 12:25:22 PM |
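For reference, a CDATA section opens with `<![CDATA[` and closes with `]]>`. The snippet suggested above is missing the `!` after the first `<`, which would be consistent with the "invalid element name" warning. A small sketch using an inline string instead of sitemap.xml:

```php
<?php
// A CDATA section opens with "<![CDATA[" and closes with "]]>".
$xml = '<link section="about contact products clients">'
     . '<![CDATA[<a href="/prod1/">Product 1</a>]]>'
     . '</link>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($xml);

// The anchor is now plain character data rather than a child element.
$link = $doc->documentElement;
$text = $link->textContent;
$anchorCount = $link->getElementsByTagName('a')->length;

var_dump($loaded);      // bool(true)
var_dump($anchorCount); // int(0)
echo $text, "\n";       // <a href="/prod1/">Product 1</a>
```

Note that saveXML() writes CDATA sections back out as-is, so the str_replace step in index.php would carry the CDATA markers into the page. The display code would probably need to read textContent instead.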
BigMan157 no u 103354 Posts
leftover from various edits 1/28/2009 12:36:17 PM |
quagmire02 All American 44225 Posts
ah, in that case...many thanks to everyone's help...it's working splendidly, now...i don't think i'd ever have caught the necessity of the third =
1/28/2009 12:37:18 PM |
evan All American 27701 Posts
oh, hah, yeah, that's why i hate strpos.
=== is strict (identical) equal, so 0 !== false. it matches on both value and type, so bools can't equal ints. == is just loose equal, so 0 == false. it doesn't give a crap about types.
strpos returns false if the string wasn't found anywhere, but 0 if the match starts at the first character, which is the case for your "about" stuff.] 1/28/2009 12:58:54 PM |
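The difference is easy to see in isolation. A short sketch using the same attribute value as the sitemap:

```php
<?php
$attr = 'about products clients';

$hit  = strpos($attr, 'about');   // int(0): found at the very start
$miss = strpos($attr, 'contact'); // bool(false): not found

var_dump($hit == false);   // bool(true): loose comparison treats 0 like false
var_dump($hit === false);  // bool(false): strict comparison, int 0 is not bool false
var_dump($miss === false); // bool(true)
```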
Noen All American 31346 Posts
I know you are just learning this, but this is an incredibly bad way to be parsing XML.
I would very very highly recommend learning to use the XML parser built into PHP5+ http://www.php.net/xml
It's a royal pain in the ass to learn and set up for small parsing activities like you are doing here, but if you plan on doing anything real with XML it will save you a ton of time and headaches in the long run. 1/28/2009 2:00:03 PM |
quagmire02 All American 44225 Posts
^ i don't mind suggestions as to better ways to do things...i'm just curious as to the reasons behind the suggestion...what's bad about the way i'm parsing it? a lot of overhead? messy?
is simplexml just a subset of the xml parser, or are they separate?
i don't plan on doing much with xml (at least, i don't have much cause to, right now)...really, i was bored at work and thought that a flat xml file would serve the purpose of a basic sitemap pretty easily and so i figured i'd screw around
[Edited on January 28, 2009 at 3:48 PM. Reason : .] 1/28/2009 3:46:36 PM |
evan All American 27701 Posts
traversing the DOM tree gets ugly in a hurry when you have even a moderately complex document. it's not fun at all.
simplexml is just another extension, like libxml or the xml parser or any of the other XML extensions. http://us3.php.net/manual/en/refs.xml.php 1/28/2009 4:17:33 PM |
Noen All American 31346 Posts
^hit the nail on the head. Handling simple structures is pretty easy to code-your-own, but it gets very unpleasant quickly when you start flexing your xml muscles.
And like so many things in PHP, it's worth learning how the parser works to understand the basics, and then go find an extension library to obfuscate the calls and make life easy on you. I learned this lesson the hard way back when php5 first hit, trying to write my own full parser implementation. The deeper I got into it, the more I kept having to refactor the code to get its functionality expanded.
I ended up using libxml + a few modifications and it made life a lot more fun 1/28/2009 5:52:31 PM |
quagmire02 All American 44225 Posts
so, in the collective opinion of those who know more than me...simplexml or xml parser?
also, the suggested code for this:
Quote : | "you have markup inside of markup and that's just a no-no. If you use a CDATA block you can tell the parser not to treat what's inside as parsed data" |
didn't work...anyone have suggestions as to what i should put there? i mean, i wasn't aware that you couldn't have markup within markup, but if that's the case, how should i structure it?
i'm not denying that i should learn the xml parser, for future reference, but are there any suggestions as to how i should restructure/recode the xml document itself? this is purely for my own edification...if i'm not following the standards (in that, technically, my coding/structure is wrong or breaks the rules), please let me know what i should change
thxu, all
[Edited on January 29, 2009 at 9:13 AM. Reason : .]1/29/2009 8:59:31 AM |
evan All American 27701 Posts
he was mainly talking about how you have the 'a' tags within 'link' tags without explicitly stating that those are not part of your xml markup but are content.
and also how you have text data and xml markup within the same tag (the 'link' tag with Products at the end of it, immediately followed by the 'subsection' tag)
neither of those is good xml practice (the parser will accept it, but it's messy). basically, only put one type of data within a tag. if it's more xml markup, fine. if it's html, stick it in a cdata block.
and personally i like the xml parser, it's more flexible
] 1/29/2009 9:22:56 AM |
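One possible way to follow the "one type of data per tag" advice is to move the label and URL into attributes, so each link holds nothing but child links. The sketch below is only an illustration: the `title` and `href` attribute names, the `renderLinks` helper and the choice of SimpleXML are all assumptions rather than anything settled in the thread, and it assumes PHP 7+:

```php
<?php
// Hypothetical restructuring: label and URL become attributes (names are assumptions),
// so each <link> holds only child <link> elements, never loose text or HTML.
$xml = <<<XML
<toc>
  <link section="about products clients" title="About" href="/about/"/>
  <link section="about contact products clients" title="Products">
    <link section="about contact products clients" title="Product 1" href="/prod1/"/>
    <link section="contact products clients" title="Product 2" href="/prod2/"/>
  </link>
  <link section="products clients" title="Client Login" href="/clients/"/>
</toc>
XML;

// Builds a nested <ul> for the links tagged with $section ('' keeps everything).
function renderLinks(SimpleXMLElement $parent, string $section): string
{
    $items = '';
    foreach ($parent->link as $link) {
        $tags = explode(' ', (string) $link['section']);
        if ($section !== '' && !in_array($section, $tags, true)) {
            continue; // skips this link and everything nested under it
        }
        $label = htmlspecialchars((string) $link['title']);
        if (isset($link['href'])) {
            $label = '<a href="' . htmlspecialchars((string) $link['href']) . '">' . $label . '</a>';
        }
        $items .= '<li>' . $label . renderLinks($link, $section) . '</li>';
    }
    return $items === '' ? '' : '<ul>' . $items . '</ul>';
}

$html = renderLinks(simplexml_load_string($xml), 'about');
echo $html, "\n";
```

Because the HTML is built from the parsed tree, there is no need for the str_replace pass over the serialized XML, and nothing is removed from the document while it is being walked.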
qntmfred retired 40726 Posts
1/29/2009 9:56:10 AM |
A Tanzarian drip drip boom 10995 Posts
When you guys (who do this professionally) are outlining an XML document, what do you take into consideration when deciding if information should be included as an attribute or as element content? 1/31/2009 1:11:19 PM |