Appendix: Regular Expressions

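Regular expressions are a powerful way of matching strings in your text. This appendix describes the special characters used in regular expressions, a pseudo-BNF for regular expressions, and the use of regular expressions with filenames.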

Special characters in regular expressions

Different characters in a regular expression match different things. A list of all special (or "magical") characters is as follows:


Note: In the list below, the phrase "an occurrence of a regular expression" means an occurrence of a string that matches the regular expression.

If a regular expression could match two different parts of the line, it matches the earlier one. If both begin in the same place, but match different lengths, or the same length in different ways, then the rules are more complicated.

In general:

The match chosen is the one that uses the earliest possibility in the first choice that has to be made. If there's more than one choice, the next is made in the same manner (earliest possibility), subject to the decision on the first choice, and so on.

For example, (ab|a)b*c could match the string abc in one of two ways. The first choice is between ab and a. Since ab is specified earlier in the expression and does lead to a successful overall match, it's chosen. Since the b is already spoken for, the b* must match its last possibility (the empty string), since it must respect the earlier choice.
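The choice order described above can be observed in any backtracking matcher that tries alternatives from left to right. The sketch below uses Python's re module as a stand-in. That is an assumption: it is not the debugger's own matcher, although it resolves this particular case the same way. The extra parentheses around b* are added only to expose what that piece matched.

```python
import re

# The second group is not part of the original expression; it is added
# here only so that the text matched by b* can be inspected.
m = re.match(r"(ab|a)(b*)c", "abc")

first_choice, star_part = m.groups()
print(repr(first_choice))  # 'ab'  (the earlier alternative wins)
print(repr(star_part))     # ''    (b* is left with the empty string)
```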

If there are no OR bars (|) present and only one *, +, or ?, the net effect is that the longest possible match is chosen. So the regular expression ab*, presented with xabbbby, matches abbbb. Note that if ab* is tried against xabyabbbz, it matches ab just after x, because that occurrence starts earlier in the string than abbb.
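Both cases can be checked with a short script. As an assumption, Python's re module is used here in place of the debugger's matcher; for a single * with no OR bars it follows the same earliest-start, longest-match behavior.

```python
import re

# One "*", no OR bars: the longest match at the earliest start is chosen.
m1 = re.search(r"ab*", "xabbbby")
print(m1.group(), m1.span())  # abbbb (1, 6)

# An earlier, shorter occurrence beats a later, longer one.
m2 = re.search(r"ab*", "xabyabbbz")
print(m2.group(), m2.span())  # ab (1, 3)
```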

Regular expression BNF

A pseudo-BNF for regular expressions is:

reg-exp
    {branch}|{branch}|...
branch
    {piece}{piece}...
piece
    {atom{* or + or ?}}{atom{* or + or ?}}...

The following table lists what each of the special characters matches:

Character:  Matches:
*           0 or more occurrences of atom
+           1 or more occurrences of atom
?           an occurrence of atom, or the null string

atom
    (reg-exp) or range or @ or ^ or $ or \char or char
range
    [{^} char and/or char_lo-char_hi]

where ^ causes negation of range.

. (dot)
    Match any character.
^
    Match the start of the line.
$
    Match the end of the line.
@
    Search with case sensitivity.
~
    Search without case sensitivity.
!
    If an exclamation mark (!) occurs as the first character in a regular expression, all magic characters are treated as special characters. An exclamation mark is treated as a regular character if it occurs anywhere but at the very start of the regular expression.
char
    Any character.
\char
    Forces \char to be accepted as char (that is, ignoring any special meaning of char).
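Several of these characters have close counterparts in other regular expression engines, which makes them easy to try out. The sketch below uses Python's re module as an illustration only. Python has no @, ~, or leading ! atoms, so the IGNORECASE flag is used as an assumed stand-in for ~, and the other two are not shown.

```python
import re

# "*", "+" and "?" apply to the atom just before them.
star = re.fullmatch(r"ab*", "a") is not None   # zero b's are allowed
plus = re.fullmatch(r"ab+", "a") is not None   # at least one b is required
opt = re.fullmatch(r"ab?", "ab") is not None   # one optional b

# "." matches any single character.
dot = re.fullmatch(r"a.c", "abc") is not None

# "^" and "$" anchor the match to the start and end of the line.
at_start = re.search(r"^abc", "xabc") is not None
at_end = re.search(r"abc$", "xabc") is not None

# "\char" removes the special meaning of char: "\." is a literal dot.
escaped_hit = re.search(r"a\.c", "a.c") is not None
escaped_miss = re.search(r"a\.c", "abc") is not None

# Stand-in for "~": case-insensitive search through a flag.
case_blind = re.search(r"abc", "xABCy", re.IGNORECASE) is not None
case_exact = re.search(r"abc", "xABCy") is not None

print(star, plus, opt)            # True False True
print(dot, at_start, at_end)      # True False True
print(escaped_hit, escaped_miss)  # True False
print(case_blind, case_exact)     # True False
```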

Regular expressions and filenames

When specifying a filename in the Watcom Debugger, it's possible to use a file-matching regular expression. This expression is similar to a regular expression, with a few differences:

  1. A dot (.) specifies an actual dot in the filename.
  2. An asterisk (*) is the same as .* (matches 0 or more characters).
  3. A question mark (?) is the same as a regular expression dot (.); that is, a question mark matches exactly one character.
  4. A caret (^) has no meaning.
  5. A dollar sign ($) has no meaning.
  6. The backslash (\) has no meaning. It and the slash (/) are used to separate directories in a path name.
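One way to picture these differences is as a character-by-character rewrite of a file-matching expression into an ordinary regular expression. The function below is a hypothetical helper written for illustration, not part of the debugger. It assumes that "has no meaning" means the character is taken literally, and that every other character (such as parentheses and the OR bar) keeps its usual meaning.

```python
import re


def filename_pattern_to_regex(pattern: str) -> str:
    """Hypothetical helper: rewrite a file-matching expression as a regex."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")           # rule 2: 0 or more characters
        elif ch == "?":
            out.append(".")            # rule 3: exactly one character
        elif ch in ".^$\\":
            out.append(re.escape(ch))  # rules 1, 4, 5, 6: assumed literal
        else:
            out.append(ch)             # everything else is left unchanged
    return "".join(out)


print(filename_pattern_to_regex("a*.c"))   # a.*\.c
print(filename_pattern_to_regex("ab?.c"))  # ab.\.c
```

The result can then be handed to an ordinary matcher, for example re.fullmatch, on the further assumption that a filename pattern must cover the whole name.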

Suppose we have the following list of files:

  • a.c
  • abc.c
  • abc
  • bcd.c
  • bad
  • xyz.c

The following examples show how the files from the above list are matched by various filename regular expressions.

a*.c
    All files that start with a and end in .c. Therefore, it matches a.c and abc.c.
(a|b)*.c
    All files that start with an a or a b, and end in .c. Therefore, it matches a.c, abc.c, and bcd.c.
*d.c
    All files that end in d.c. Therefore, it matches bcd.c.
*
    All files.
*.*
    All files that have a dot in them. Therefore, it matches a.c, abc.c, bcd.c, and xyz.c.
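These results can be reproduced by writing each filename expression as an ordinary regular expression by hand (a literal dot becomes \. and an asterisk becomes .*) and testing it against the list of files. The sketch below uses Python's re module and assumes that a pattern must match the whole filename; the variable names are illustrative.

```python
import re

files = ["a.c", "abc.c", "abc", "bcd.c", "bad", "xyz.c"]

# Filename expression -> hand-translated ordinary regular expression.
translated = {
    "a*.c": r"a.*\.c",
    "(a|b)*.c": r"(a|b).*\.c",
    "*d.c": r".*d\.c",
    "*": r".*",
    "*.*": r".*\..*",
}

# For each expression, collect the files it matches in full.
results = {
    pattern: [name for name in files if re.fullmatch(regex, name)]
    for pattern, regex in translated.items()
}

for pattern, matched in results.items():
    print(f"{pattern:10} {matched}")
```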