23 September, 2010

Regular Expressions using JavaScript

Regular expressions can be confusing at first, but they are a great tool for matching a string against a character pattern. They are commonly used to validate user input or to change document content. A 40-line if/else block can sometimes be replaced with a single regular expression. To beginners they can look intimidating, but they get more and more interesting as you go along.
Many languages support regular expressions, and they are not as tough as they seem at first sight. Most of these languages offer "find", "replace" and "search" features built on regular expressions.
In this article I will write about regular expressions in JavaScript, as far as my own knowledge goes.
Basic Syntax
Let's say you want to search for the string "rain" in a text. You can write the pattern in two different ways.
The first is literal notation:


var srchFor = /rain/; //slashes delimit the pattern; do not use quotation marks
//The second is the RegExp object constructor:
var srchFor = new RegExp('rain');
//You can check either form as follows:
var str = "When will it rain";
alert(srchFor.test(str)); //alerts true
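
One practical reason to pick the constructor form is that the pattern can come from a variable. The snippet below is only a small sketch of that idea; the variable names (word, fromLiteral, fromConstructor) are made up for illustration, and console.log is used instead of alert so it also runs outside a browser.

```javascript
// "word" stands in for a search term that is only known at run time.
var word = "rain";

var fromLiteral = /rain/;               // pattern fixed in the source code
var fromConstructor = new RegExp(word); // pattern built from a string

console.log(fromLiteral.test("When will it rain"));          // true
console.log(fromConstructor.test("When will it rain"));      // true
console.log(fromConstructor.source === fromLiteral.source);  // true, both are "rain"
```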


A pattern can be applied with the match(), test(), search() or exec() methods, all of which are listed later.
Now let's say you want to match a string that starts with some text, in this case 'rain'. You can check this with:


var srchFor = /^rain/;
var srchFor = new RegExp('^rain');
//Or you want to match a string that contains only that text from start to end, in this case only 'rain':
var srchFor = /^rain$/;
var srchFor = new RegExp('^rain$');


Here, ^ and $ anchor the pattern to the start and the end of the string respectively.
And if you want to match a string that ends with some text, in this case 'rain', then:


var srchFor = /rain$/;
//Matching can also ignore case.
//If you want to find a string that ends with 'rain' regardless of case,
//then:
var srchFor = /rain$/i; //Here i means case-insensitive
var srchFor = new RegExp('rain$', 'i');
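
To see the difference between these anchors, here is a small sketch that runs each pattern over the same list of sample strings. The samples array and the matchingSamples helper are made up for this illustration.

```javascript
// Sample strings chosen to show where each pattern succeeds or fails.
var samples = ["rain", "rainbow", "When will it rain", "RAIN"];

// Hypothetical helper: returns the samples that a pattern matches.
function matchingSamples(pattern) {
  return samples.filter(function (s) { return pattern.test(s); });
}

console.log(matchingSamples(/^rain/));   // ["rain", "rainbow"]
console.log(matchingSamples(/rain$/));   // ["rain", "When will it rain"]
console.log(matchingSamples(/^rain$/));  // ["rain"]
console.log(matchingSamples(/rain$/i));  // ["rain", "When will it rain", "RAIN"]
```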


Let's say we have a string that contains the word 'rain' several times. To find every occurrence instead of only the first one, add the global flag 'g'.
Used with string.match(), this returns all the matches as an array. For example:

var srchFor = /rain/g; //g refers to global
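
As a quick sketch of what the 'g' flag changes, the snippet below calls match() on the same text with and without it. The sample sentence and variable names are made up for illustration.

```javascript
var text = "Rain, rain, go away. The rain in Spain.";

var first = text.match(/rain/);        // no g: only the first match, plus its position
var all = text.match(/rain/g);         // g: every match, as a plain array
var allAnyCase = text.match(/rain/gi); // g and i together

console.log(first[0], first.index);    // "rain" 6
console.log(all);                      // ["rain", "rain"]
console.log(allAnyCase);               // ["Rain", "rain", "rain"]
```

Note that 'Spain' is not counted, because it contains 'pain' and not 'rain'.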


By default, ^ and $ match only at the very start and the very end of the whole string. With the multiline flag 'm', they also match at the start and end of each line, which is useful for strings that contain line breaks. For example,

var srchFor = /^rain/m; //This matches 'rain' at the start of any line
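
The 'm' flag only changes how ^ and $ behave, so it is easiest to see with an anchored pattern. Here is a minimal sketch using a made-up two-line string:

```javascript
// A string with a line break in the middle.
var forecast = "sun today\nrain tomorrow";

console.log(/^rain/.test(forecast));    // false: ^ only matches at the start of the string
console.log(/^rain/m.test(forecast));   // true: with m, ^ also matches after the line break
console.log(/today$/.test(forecast));   // false: $ only matches at the end of the string
console.log(/today$/m.test(forecast));  // true: with m, $ also matches before the line break
console.log(/rain/.test(forecast));     // true: an unanchored pattern does not need m
```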

The flags can also be combined, like,

var srchFor = /rain/igm; //or /rain/gim, /rain/mgi, etc. in any order

So the above pattern matches 'rain' regardless of case and finds every occurrence. The 'm' flag only makes a difference for patterns that use ^ or $.

Period Character (.)
The dot character matches any single character except a line break. For example, to match an 'r' followed by any one character and then an 'n':

var srchFor = /r.n/;

The above pattern matches 'ran', 'rin', 'ron', or even r#n or r n, etc.
However, you can limit the choices by using square brackets, like

var srchFor = /r[au]n/;

The above pattern matches 'ran' or 'run'.
You can also exclude characters. For example, if you want to exclude 'ran' from the search, do the following,

var srchFor = /r[^a]n/;
//Square brackets match only one character at a time,
//so if you want alternatives that are longer than one character, use pipes inside parentheses,
var srchFor = /r(u|i|eig|e)n/;

This matches 'run', 'rin', 'reign' and 'ren' but does not match 'ran'.
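
The four kinds of pattern in this section can be compared side by side. This is just a sketch: the word list and the matchingWords helper are made up, and ^ and $ are added so that each word has to match as a whole.

```javascript
var words = ["ran", "run", "rin", "r#n", "reign", "rn"];

// Hypothetical helper: returns the words that a pattern matches.
function matchingWords(pattern) {
  return words.filter(function (w) { return pattern.test(w); });
}

console.log(matchingWords(/^r.n$/));            // ["ran", "run", "rin", "r#n"]
console.log(matchingWords(/^r[au]n$/));         // ["ran", "run"]
console.log(matchingWords(/^r[^a]n$/));         // ["run", "rin", "r#n"]
console.log(matchingWords(/^r(u|i|eig|e)n$/));  // ["run", "rin", "reign"]
```

'rn' never matches, because the dot and the square brackets each need exactly one character between the 'r' and the 'n'.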

Escaping Characters
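Certain characters have a special meaning in a pattern and need to be escaped with a backslash when you want to match them literally, such as: ., +, *, ?, ^, $, |, \, /, (, ), [ and {. The hyphen (-) only has a special meaning inside square brackets.
For example, /r.n/ matches 'ran', 'run' and so on, but /r\.n/ matches only "r.n".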
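
Here is a short sketch of escaping in practice. Two things in it are not from the text above: the backslash has to be doubled when the pattern is written as a string for the RegExp constructor, and escapeRegExp is a hypothetical helper (not a built-in) for escaping text that should be matched literally.

```javascript
var literalDot = /r\.n/;
var fromString = new RegExp("r\\.n"); // inside a string, the backslash itself is escaped

console.log(literalDot.test("r.n"));  // true
console.log(literalDot.test("ran"));  // false
console.log(fromString.test("r.n"));  // true

// Hypothetical helper: puts a backslash in front of every special character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var price = new RegExp("^" + escapeRegExp("$5.00 (approx)") + "$");
console.log(price.test("$5.00 (approx)")); // true
console.log(price.test("$5x00 (approx)")); // false: the dot is now literal
```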
Let us say you want to validate an email address using a regular expression. You can do the following:


var srchFor = /^[\w]+(\.[\w]+)*@([\w]+\.)+[a-z]{2,7}$/i;


In the above case,
\w is a shortcut for [a-zA-Z0-9_]; it matches the letters a to z and A to Z, the digits 0 to 9 and the underscore.
+ means 1 or more times
* means 0 or more times
? means 0 or 1 times
{n} means exactly n times
{n,m} means n to m times
So,

var srchFor = /^[\w]+(\.[\w]+)*@([\w]+\.)+[a-z]{2,7}$/i;

^[\w]+(\.[\w]+)* matches sudhi, or sudhi.test
then comes the @ symbol, then
([\w]+\.)+ matches oncemore. or oncemore.co. (each part ends with its own dot),
and finally
[a-z]{2,7} matches 2 to 7 of the characters a-z, such as com.
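
The quantifiers and the email pattern can be tried out as in the sketch below. It uses the lowercase \w (word characters) form of the pattern, and the isValidEmail helper and the sample addresses are made up for illustration. Treat it as a simplified check rather than a complete email validator.

```javascript
// Quantifiers on their own (anchored so the whole string has to match).
console.log(/^colou?r$/.test("color"));       // true: "u" appears 0 times
console.log(/^colou?r$/.test("colour"));      // true: "u" appears once
console.log(/^go+gle$/.test("ggle"));         // false: + needs at least one "o"
console.log(/^ab*c$/.test("ac"));             // true: * allows zero "b"s
console.log(/^\d{4}$/.test("2010"));          // true: exactly four digits
console.log(/^[a-z]{2,7}$/.test("abcdefgh")); // false: eight letters is too many

// The email pattern, wrapped in a hypothetical helper.
var emailPattern = /^[\w]+(\.[\w]+)*@([\w]+\.)+[a-z]{2,7}$/i;

function isValidEmail(address) {
  return emailPattern.test(address);
}

console.log(isValidEmail("sudhi@oncemore.com"));        // true
console.log(isValidEmail("sudhi.test@oncemore.co.uk")); // true
console.log(isValidEmail("sudhi@oncemore"));            // false: no dot after the domain
console.log(isValidEmail("sudhi..test@oncemore.com"));  // false: two dots in a row
console.log(isValidEmail("sudhi@once-more.com"));       // false: \w does not include the hyphen
```

The last line shows a limitation of this pattern: real addresses with a hyphen in the domain are rejected, so the character classes would need widening before using it on real data.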
Other shortcuts are
\d means [0-9] Digits only
\D means [^0-9] All characters except digits
\w means [a-zA-Z0-9_] All alphanumeric characters and the underscore
\W means [^a-zA-Z0-9_] All characters except alphanumerics and the underscore
\b means N/A Word boundary
\B means N/A Not a word boundary
\s means [ \t\n\r\f\v] Whitespace, including the space character
\S means [^ \t\n\r\f\v] All characters except whitespace
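
A few of these shortcuts in action, as a small sketch on a made-up sample string:

```javascript
var sample = "Room 42, floor_3";

console.log(sample.match(/\d+/g));        // ["42", "3"]: runs of digits
console.log(sample.match(/\w+/g));        // ["Room", "42", "floor_3"]: runs of word characters
console.log(sample.match(/\s/g).length);  // 2: two whitespace characters
console.log(sample.replace(/\D/g, ""));   // "423": every non-digit removed
console.log(/\b42\b/.test(sample));       // true: 42 stands on its own
console.log(/\boom\b/.test(sample));      // false: "oom" is in the middle of "Room"
```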

Methods Using Regular Expressions
There are several methods that take regular expressions as parameters. The expression itself—
the things inside the slashes or the RegExp constructor—is called a pattern, as it matches what
you want to retrieve or test for.
• pattern.test(string): Returns true or false depending on whether the pattern matches the string
• pattern.exec(string): Returns an array describing the match, or null if there is no match
• string.match(pattern): Returns an array of matched strings, or null if there is no match
• string.search(pattern): Returns the position of the first match, or -1 if not found
• string.replace(pattern, replaceString): Returns a new string with the first match replaced by replaceString, or every match if the pattern has the 'g' flag
• string.split(pattern, limit): Splits the string wherever the pattern matches and returns an array with at most limit items
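
The methods in the list above can be sketched on one sample string as follows. The string and the variable names are made up for illustration.

```javascript
var str = "When will it rain? Rain, rain, go away.";
var pattern = /rain/;

var tested = pattern.test(str);                    // true
var found = pattern.exec(str);                     // array with the match and its position
var everyMatch = str.match(/rain/gi);              // all matches, ignoring case
var position = str.search(/rain/);                 // index of the first match
var missing = str.search(/snow/);                  // -1 when nothing matches
var firstReplaced = str.replace(/rain/, "snow");   // only the first match is replaced
var allReplaced = str.replace(/rain/gi, "snow");   // g replaces every match
var parts = "a, b,c ,d".split(/\s*,\s*/, 3);       // split on commas, keep at most 3 items

console.log(tested);               // true
console.log(found[0], found.index); // "rain" 13
console.log(everyMatch);           // ["rain", "Rain", "rain"]
console.log(position, missing);    // 13 -1
console.log(firstReplaced);        // "When will it snow? Rain, rain, go away."
console.log(allReplaced);          // "When will it snow? snow, snow, go away."
console.log(parts);                // ["a", "b", "c"]
```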



So, regular expressions only match characters; you cannot do calculations with them. The core syntax is shared by most languages, although the details vary from one language to another.
