Wednesday, May 28, 2008

C++ Regular Expressions for Perl Programmers

The Boost libraries are probably the best thing to happen to C++ since the STL. With their strong emphasis on generic programming and heavy use of templates, they let C++ be used as a higher-level language than before. Although the libraries are complex internally, using them is actually quite easy.

One particularly useful library is Boost.Regex, which provides Perl-like regular expression support. The following Perl script extracts the host name from a URL:
my $url = "http://www.mywebsite.com/index.html";
if ($url =~ /http:\/\/((\w+\.)+\w+)/) {
    print "$1\n";
}
Here is a C++ program that does the same job using the Boost regular expression library:
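#include <iostream>
#include <string>
// Boost regular expression header file. Depending on the Boost version,
// you may need to link against the compiled library (e.g. -lboost_regex)
#include <boost/regex.hpp>

// Import Boost's names only; also importing std could make regex,
// smatch and regex_search ambiguous if <regex> is ever included
using namespace boost;

int main()
{
    std::string url = "http://www.mywebsite.com/index.html";
    // Same pattern as the Perl script: '/' needs no escape here,
    // but every backslash is doubled inside a C++ string literal
    regex re("http://((\\w+\\.)+\\w+)");
    smatch what;  // receives the match results, like Perl's $1, $2, ...
    if (regex_search(url, what, re)) {
        std::cout << what[1] << std::endl;  // prints www.mywebsite.com
    }
    return 0;
}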
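The regular expression syntax used by Boost is very similar to Perl's, but characters are escaped differently. In Perl, '/' delimits the match operator, so a literal '/' inside the pattern has to be written as "\/". In C++ the pattern is an ordinary string literal, so '/' needs no escape; writing "\/" there is an unknown escape sequence, which compilers typically warn about or reject. Conversely, the backslash in notations such as "\w" and "\s" must itself be escaped, so the pattern is written as "\\w" and the regex engine receives \w. This is a rule of C++ string literals rather than of Boost itself.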

The line:
smatch what;
declares a variable, what, that stores the results of the match. what[1] corresponds to Perl's $1, what[2] to $2, and so on, while what[0] holds the entire matched text (Perl's $&).

As you can see, using regular expressions in Boost is quite straightforward for a Perl programmer.
