Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions

Parsing Attributes with Ease

August 9, 1999

Parsing the <TITLE> tag is particularly easy because it is a simple tag with no attributes. Many tags, though, carry attributes that are crucial to parsing. Consider, for example, the <META> tag, which commonly carries two attributes: NAME and CONTENT. In the wild, a <META> tag may look like this:

<META 
	 NAME="Keywords" 
	 CONTENT="food,cuisine,cooking,recipes">

Imagine that we're writing code to parse a document for its meta keywords. Since the <META> tag can contain information other than keywords (description, author, copyright, and so on), we can't simply grab the first <META> tag we find and call it a day. Rather, we need to analyze the NAME attribute of each <META> tag until we find the "Keywords"-specific tag, and then we can harvest the information contained in the CONTENT attribute.

Parse only for the "Keywords" META tag.
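sub parse_meta_keywords {
    # Parse and output meta keywords. Assumes HTML::TokeParser is already
    # loaded and that $webPage holds the HTML of the page.
    my $parser = HTML::TokeParser->new(\$webPage);
    while (my $token = $parser->get_tag("meta")) {
        # $token->[1] is a hash reference holding the tag's attributes.
        # Skip <META> tags that have no NAME attribute at all.
        my $name = $token->[1]{name};
        next unless defined $name && $name =~ /keywords/i;
        print "<h2>Meta Keywords</h2>\n<p>" . $token->[1]{content} . "</p>\n";
    }
}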

Once again, we create a new TokeParser positioned at the start of the document contained in $webPage. The while loop ensures that TokeParser will find each <META> tag in the document, not merely the first. We want to analyze each tag to see if it's a "Keywords" tag. Although there shouldn't be more than one such tag, and we could justifiably exit the loop once the tag has been found, it is possible that somebody thoughtlessly placed multiple <META> Keywords tags in a single document.
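To make the trade-off between stopping early and scanning the whole document concrete, here is a minimal sketch. The function name meta_keywords, its $first_only flag, and the sample page are assumptions made up for illustration; they are not part of the original summarizer. It collects the CONTENT of every Keywords tag and, when asked, leaves the loop after the first match.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the CONTENT of every Keywords <META> tag.
# Pass a true second argument to stop after the first match.
sub meta_keywords {
    my ($html, $first_only) = @_;
    my $parser = HTML::TokeParser->new(\$html);
    my @found;
    while (my $token = $parser->get_tag("meta")) {
        my $attr = $token->[1];
        # Ignore <META> tags without a NAME, or with a different NAME.
        next unless defined $attr->{name} && $attr->{name} =~ /keywords/i;
        push @found, $attr->{content};
        last if $first_only;    # reasonable when only one tag is expected
    }
    return @found;
}

# Sample document (invented) with a duplicated Keywords tag.
my $page = <<'HTML';
<html><head>
<META NAME="Description" CONTENT="A site about food">
<META NAME="Keywords" CONTENT="food,cuisine">
<META NAME="Keywords" CONTENT="cooking,recipes">
</head></html>
HTML

print join(" | ", meta_keywords($page)), "\n";       # every Keywords tag
print join(" | ", meta_keywords($page, 1)), "\n";    # first Keywords tag only
```

Scanning the whole document costs little on a typical page, so collecting every match is usually the safer default.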

TokeParser's get_tag() method grabs the next <META> tag it sees and returns its various components as an array reference, which we assign to $token. Array references can be confusing, and some of the syntax that follows looks weird because we're dealing with one. It may be best not to worry about why in this case and to focus on how: you can simply replicate this syntax in your own code without too much heartache over why it looks as it does.
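For readers who do want a peek inside the array reference, the following sketch unpacks what get_tag() returns for a start tag: the tag name, a hash reference of attributes, an array reference listing the attribute names in their original order, and the original text of the tag. The variable names and the one-line sample document are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# A one-tag document, borrowed from the <META> example above.
my $html   = '<META NAME="Keywords" CONTENT="food,cuisine,cooking,recipes">';
my $parser = HTML::TokeParser->new(\$html);
my $token  = $parser->get_tag("meta");

# Dereference the array reference into its four components.
my ($tag, $attr, $attrseq, $text) = @$token;

print "tag name:        $tag\n";          # tag names arrive lowercased
print "attribute order: @$attrseq\n";     # attribute names are lowercased too
print "  $_ = $attr->{$_}\n" for @$attrseq;
print "original text:   $text\n";
```

Seen this way, $token->[1] is simply the second element of that list: the attribute hash.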

Suffice it to say, we can access the attributes of the tag through the hash reference stored in $token->[1]. Thus, $token->[1]{name} returns the value of the NAME attribute in this tag. (TokeParser lowercases attribute names, which is why the key is written name rather than NAME.) Similarly, $token->[1]{content} will return the CONTENT attribute, and you can extend this syntax to any attribute for whatever tag you are parsing.
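As an illustration of extending the same syntax to another tag, this sketch gathers the HREF attribute of every <A> tag. The function name link_targets and the sample markup are hypothetical; only the get_tag() call and the $token->[1]{...} lookup come from the technique described here.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the HREF of every <A> tag that has one.
sub link_targets {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html);
    my @hrefs;
    while (my $token = $parser->get_tag("a")) {
        my $href = $token->[1]{href};          # same syntax, different attribute
        push @hrefs, $href if defined $href;   # named anchors carry no HREF
    }
    return @hrefs;
}

# Sample markup (invented): two links and one named anchor.
my $page = '<a href="/recipes.html">Recipes</a> '
         . '<a name="top">Top</a> '
         . '<A HREF="/about.html">About</A>';

print "$_\n" for link_targets($page);
```

The defined check matters for the same reason as with <META>: not every tag carries the attribute you are after.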

In our example, we check to see if the NAME attribute for the snagged tag contains "Keywords" (case insensitive). If yes, the CONTENT attribute of this tag is output to the screen; otherwise, this is not a Keywords <META> tag and we move on to find the next <META> tag, until parsing has completed.



