Thursday, January 22, 2009

Using Perl and Regular Expressions to Process HTML Files - Part 1

Like many web content authors, over the past few years I've had many occasions when I've needed to clean up a bunch of HTML files that have been generated by a word processor or publishing package. Initially, I cleaned up the files manually, opening each one in turn and making the same set of updates to each one. This works fine when you only have a few files to fix, but when you have hundreds or even thousands to do, you can very quickly be looking at weeks or even months of work. A few years ago someone put me on to the idea of using Perl and regular expressions to perform this 'cleaning up' process.

"Why write an article about Perl and regular expressions?" I hear you say. Well, that's a good point. After all, the web is full of tutorials on Perl and regular expressions. What I found, though, was that when I was trying to find out how to process HTML files, it was difficult to find tutorials that met my criteria. I'm not saying they don't exist; I just couldn't find them. Sure, I could find tutorials that explained everything I needed to know about regular expressions, and I could find plenty of tutorials about how to program in Perl, and even how to use regular expressions within Perl scripts. What I couldn't find, though, was a tutorial that explained how to open one or more HTML or text files, make updates to those files using regular expressions, and then save and close the files.

The Goal

When converting documents into HTML, the goal is always to achieve a seamless conversion from the source document (for example, a word processor document) to HTML. The last thing you need is for your content authors to be spending hours, or even days, fixing untidy HTML code after it has been converted.

Many applications offer excellent tools for converting documents to HTML and, in combination with a well-designed cascading style sheet (CSS), can often produce perfect results. Sometimes, though, there are little bits of HTML code that are a bit messy, normally because authors have not applied paragraph tags or styles correctly in the source document.

Why Perl?

Perl is such a good language for this task because it is excellent at processing text files which, let's face it, is all HTML files are. Perl is also the de facto standard for the use of regular expressions, which you can use to search for, and replace or change, bits of text or code in a file.
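As a rough illustration of that search-and-replace idea, here is a minimal sketch that opens each HTML file named on the command line, removes empty paragraphs, and saves the result. The sub names (clean_html, clean_file) and the choice of empty paragraphs as the thing to clean up are assumptions made for this example, not something taken from a particular conversion tool.

```perl
use strict;
use warnings;

# Hypothetical helper: strips empty paragraphs such as <p></p> or
# <p class="x">&nbsp;</p> from a string of HTML.
sub clean_html {
    my ($html) = @_;
    $html =~ s{<p\b[^>]*>(?:\s|&nbsp;)*</p>\s*}{}gi;
    return $html;
}

# Hypothetical helper: reads a file, cleans it, and writes it back
# over the original.
sub clean_file {
    my ($path) = @_;

    # Read the whole file into one string.
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $html = do { local $/; <$in> };
    close $in;

    # Write the cleaned text back to the same file.
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} clean_html($html);
    close $out or die "Cannot close $path: $!";
}

# Process every file named on the command line.
clean_file($_) for @ARGV;
```

Saved as, say, cleanup.pl, this could be run as perl cleanup.pl page1.html page2.html. Because it overwrites the files in place, it is worth working on copies until you trust the pattern.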

What is Perl?

Perl (often expanded as Practical Extraction and Report Language) is a general-purpose programming language, which means it can be used for most of the tasks that other programming languages can handle. Having said that, Perl is very good at doing certain things, and not so good at others. Although you could do it, you wouldn't normally develop a user interface in Perl, as it would be much easier to use a language like Visual Basic to do this. What Perl is really good at is processing text. This makes it a great choice for manipulating HTML files.

What is a Regular Expression?

A regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are not unique to Perl - many languages, including JavaScript and PHP, can use them - but Perl arguably handles them better than any other language.
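To make "a string that matches a set of strings" concrete, here is a small sketch. The pattern and the sample tags are made up for illustration: one expression describes every opening heading tag from h1 to h6, with or without attributes, in upper or lower case.

```perl
use strict;
use warnings;

# Illustrative pattern: "<h", a digit from 1 to 6 (captured), then any
# attributes up to the closing ">". The /i flag ignores case.
my $heading = qr{<h([1-6])\b[^>]*>}i;

my @samples = ('<h1>', '<H2 class="title">', '<p>', '<hr>');

for my $tag (@samples) {
    if ($tag =~ $heading) {
        # $1 holds the digit captured by the parentheses.
        print "$tag is a level $1 heading\n";
    }
    else {
        print "$tag is not a heading\n";
    }
}
```

The first two samples match and the last two do not, even though the hr tag also starts with an h.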

In part 2, we'll look at our first example Perl script.

About the Author: John Dixon is a web developer working through his own company, John Dixon Technology Limited (dixondevelopment.co.uk). The company also develops and supplies a free accounting and bookkeeping software tool called Earnings Tracker (dixondevelopment.co.uk/earningstracker.htm). The company's web site contains various articles, tutorials, news feeds, and a finance and business blog.

Guild Wars - Eye of the North Gameplay Features

Dungeons and the Guild Wars Eye of the North Maps

Most of your time in Eye of the North will be spent in dungeons, each going as deep as 5 levels. Some dungeons add puzzle aspects to the game to a slight degree and require you to collect keys to advance to the next level of that particular dungeon. Each dungeon has its own boss on the final level, and you must defeat this boss to clear the dungeon. Completing a dungeon rewards everyone in the party with some decent loot.

The maps in Eye of the North vary slightly from previous games. In Eye of the North, mission maps are typically displayed overlapping the region map. On the region maps, undiscovered regions will be transparent, revealing just a little. Mission maps, on the other hand, are completely blurred. Only after you completely explore the dungeon will you have a full view of the map. If you leave a dungeon unexplored, its map will have reverted to black when you return.

GW: Eye of the North does provide you with a little help when exploring dungeons. Usually somewhere near the entrance to the dungeon you can find a dungeon area map that will highlight important areas of the dungeon. Since dungeons usually require keys and the defeat of the dungeon boss, the area maps in Guild Wars Eye of the North highlight both of these with icons.

Each time you complete a dungeon you will be rewarded with a new completed page in your quest logbook. You can then exchange completed quest logs with one of the Title Track factions for massive amounts of reputation and experience points.

The maps in Eye of the North, and really the whole expansion, seem to resemble Blizzard's Diablo series. Dungeons have always held a major role in Guild Wars games and have become a staple of the game in the expansion, similar to Diablo's dungeon crawling.

Guild Wars: Eye of the North Mini games:

Eye of the North brings the addition of mini games to the Guild Wars universe. Different factions have their own mini games that are used to earn bonus reputation points in that particular faction.

Dwarven Boxing is, as the name suggests, similar to boxing and is played against a computer-controlled opponent. A "K.O." is required to win a round.

Polymock is a game that is strangely similar to Pokémon. Each player is allowed 3 creatures to do their battling.

The Norn Fighting Tournament is my favorite mini game of all. This mini game is very much like a regular fighting game such as Street Fighter. You must win a majority of the six total matches to win the rewards.

Tom Kranz is an avid player of Eye of the North (eyeofthenorth.info), the expansion to the award-winning Guild Wars (aboutguildwars). Follow the links for more information on Eye of the North maps (hubpageshub/Eye-of-the-North) and Guild Wars gold.