Introduction to Regular Expressions (RegEx) in AutoHotkey
By Jack Dunning
Regular Expressions (commonly called RegEx or RegExp) in AutoHotkey are not a beginning-level script writing topic, and there is certainly nothing regular about them. I've spent a number of months exploring this programming tool and have developed a healthy respect for its flexibility and power. Many (including myself) have avoided using RegEx because of its enigmatic code, which at times appears almost incomprehensible. It's not like normal program code with If-Then-Else statements and Loops.
Writing a RegEx is not merely a matter of following a logical sequence. It often requires a non-linear look at the problem. What helps me most is the analogy I picture in my brain pan. That image gives me a basis for what a RegEx is trying to do. ("Try" is a good word for describing RegExes. Whereas ordinary program code either works or doesn't, a RegEx "tries" to find pattern matches. If none are found, it moves on.)
Wikipedia describes a Regular Expression as "a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. 'find and replace'-like operations." I would describe RegEx as a data mining machine. RegEx is like a train rolling down a track of computer characters looking for patterns that match a specific set of given parameters. If it finds characters which match the pattern set, it grabs them and puts them on board the train.
As the RegEx train runs down the line, it continues picking up characters, as long as they fit the written instruction set. Some groups of characters may be saved for later reuse (backreferences). At times RegEx may look back at previous characters for validation or forward to upcoming data for confirmation (backward and forward assertions; see Chapter Twelve of A Beginner's Guide to Using Regular Expressions in AutoHotkey).
While a particular RegEx may be forgiving in what it will accept on board, if the pattern does not completely match the given set of criteria, the entire group (including all previously collected characters) is kicked off the train and RegEx continues rolling along looking for another possible set of matching characters. This continues until it either hits the end of the line or finds a complete solution to its data schedule. Then RegEx stops. The RegEx data mining machine can be started up again by placing it in a Loop which restarts the same search from a point just beyond its current match.
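Here is a minimal sketch of that restart loop. It assumes AutoHotkey v1.1 syntax, where the third parameter of RegExMatch() receives the matched text and the fourth is the starting position; the sample Haystack and the variable names are made up for illustration.

```autohotkey
; Sketch (AutoHotkey v1.1): collect every run of digits by restarting
; RegExMatch() just past the end of the previous match.
Haystack := "Order 12 shipped 345 units on day 6"
Found := ""
Pos := 1
while (Pos := RegExMatch(Haystack, "\d+", Match, Pos))
{
    Found .= Match "|"      ; keep the digits that were matched
    Pos += StrLen(Match)    ; resume the search after this match
}
MsgBox, %Found%   ; shows 12|345|6|
```

When no further match exists, RegExMatch() returns 0, which ends the while loop.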
This data mining train is the image I visualize when working with a RegEx. The key to understanding RegEx is knowing what the conductor is trying to do when it interprets the special symbols in a RegEx set of instructions to bring the right character passengers on board. It took me a while to comprehend that the AutoHotkey RegEx functions serve two primary purposes: data extraction with RegExMatch() and data correction with RegExReplace(), as discussed in Chapter Three of A Beginner's Guide to Using Regular Expressions in AutoHotkey.
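To make the two roles concrete, here is a short sketch assuming AutoHotkey v1.1 syntax; the sample strings and variable names are invented for this example.

```autohotkey
; Extraction: RegExMatch() returns the position and stores the match in Number
Pos := RegExMatch("Invoice 4711 is overdue", "\d+", Number)
; Pos is 9 and Number is "4711"

; Correction: RegExReplace() returns a new string with every match replaced
Fixed := RegExReplace("teh cat and teh dog", "teh", "the")
; Fixed is "the cat and the dog"
MsgBox, Position: %Pos%`nNumber: %Number%`nFixed: %Fixed%
```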
Practical Uses for RegEx in AutoHotkey
Maybe the most important question is, "If Regular Expressions can be so confusing, why bother?" Often, when doing simple text searches or replacements, it's quicker and easier to use the functions built into a scripting language. RegEx may add needless complication. However, a RegEx might do in one expression what takes several lines of code with those other functions. It may take slightly longer to run (a few more microseconds), but the added flexibility can make the seemingly impossible a reality. RegEx has more power and flexibility than a standard search and/or replace.
For example, IP addresses are many and varied, although they all conform to the same pattern. Each IP address consists of four numbers (one to three digits long and between 0 and 255) separated by dots. With the proper RegEx, the engine can search through a document pulling out only the IP addresses. Those extracted addresses can then be used to find where each IP is located. Combining AutoHotkey with RegEx creates a Web IP lookup app that finds the locations of extracted IP addresses throughout the world. (This example and many of the following examples using RegEx in AutoHotkey are discussed in the e-book A Beginner's Guide to Using Regular Expressions in AutoHotkey.)
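One possible way to write such a needle is sketched below, assuming AutoHotkey v1.1 syntax. This is not necessarily the RegEx used in the book's lookup app; the helper variable Octet, the sample text and the other names are hypothetical. Each octet alternative accepts only 250 to 255, 200 to 249, 100 to 199, or 0 to 99, and \b keeps the match from starting or ending inside a longer number.

```autohotkey
; Sketch (AutoHotkey v1.1): pull out dotted IPv4 addresses whose four parts
; are each 0-255. Octet is a hypothetical helper pattern for this example.
Octet := "(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
Needle := "\b(?:" Octet "\.){3}" Octet "\b"
Text := "Server 192.168.1.10 rejected 999.1.1.1 but accepted 10.0.0.255"
IPs := ""
Pos := 1
while (Pos := RegExMatch(Text, Needle, IP, Pos))
{
    IPs .= IP "`n"          ; one address per line
    Pos += StrLen(IP)       ; continue after this address
}
MsgBox, %IPs%   ; lists 192.168.1.10 and 10.0.0.255
```

The invalid 999.1.1.1 is skipped because 999 fails every octet alternative.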
Another use for a RegEx may be to find duplicate words in a document. This can be done with other functions, but it would take a few lines of code with conditionals (If-Then), whereas only one RegEx line is needed. How about swapping the first and last words in selected text?
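Both ideas can be sketched in a line each, assuming AutoHotkey v1.1 syntax (where captured subpatterns land in OutputVar1, OutputVar2, and so on). The sample strings are invented, and these needles are simple illustrations rather than the book's versions; note the duplicate-word check is case sensitive.

```autohotkey
; \1 is a backreference to the word captured by (\w+), so this needle
; matches a word that is immediately repeated after whitespace.
Text := "This is is a test"
Pos := RegExMatch(Text, "\b(\w+)\s+\1\b", Dup)
; Pos is 6, Dup is "is is" and Dup1 (the first subpattern) is "is"

; Swap the first and last words: $1, $2 and $3 insert the captured groups
Swapped := RegExReplace("one two three", "^(\w+)(.*?)(\w+)$", "$3$2$1")
; Swapped is "three two one"
MsgBox, Duplicate "%Dup1%" at position %Pos%`n%Swapped%
```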
Maybe you want to strip all of the HTML code out of a Web page, leaving only the text? Or possibly you need to extract a list of all of the Web links found in a Web page? Regular Expressions are also one of the best ways to check that an e-mail address entered into a data field is properly formatted.
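A crude sketch of the HTML-stripping idea, assuming AutoHotkey v1.1 syntax and a made-up snippet of HTML. It treats anything between "<" and the next ">" as a tag, so it will not cope with every real Web page (for example, script blocks or a ">" inside an attribute value).

```autohotkey
; [^>] is a negated range: any single character except ">".
; Omitting the Replacement parameter replaces each match with nothing.
Html := "<p>Hello <b>world</b>!</p>"
PlainText := RegExReplace(Html, "<[^>]+>")
MsgBox, %PlainText%   ; shows Hello world!
```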
Pulling the numbers out of alphanumeric data is relatively simple with RegEx. Or maybe a key symbol (escape character) needs to be inserted in front of (or behind) each character in a group of special characters.
If it's a pattern you need to locate (and possibly manipulate) in your haystack of data, then RegEx may be your best bet for finding that needle. This may be all the incentive you need to explore the mysteries of Regular Expressions.
The Mechanics of RegEx in AutoHotkey
Critical to using a RegEx is understanding how it works. RegEx is a system for finding matches within strings (text), which may be file names, variables, or the contents of a file. (The string being searched is called the "Haystack" in the online documentation for the AutoHotkey RegEx functions, while the search expression is called the "NeedleRegEx", as in "needle in a haystack.")
Knowing what a RegEx will actually do depends on how well you understand what it is trying to do. A RegEx starts at the beginning of a string and looks at each character one by one until it finds a match for the entire expression. If it finds a match for the NeedleRegEx, it stops looking; otherwise it continues until it reaches the end of the input string (Haystack). In AutoHotkey the function used for matching a RegEx is RegExMatch(), which returns the numeric position of the first character of the first match. (The position is found by counting characters from the beginning of the Haystack, starting with 1, up to the first character of the match.)
For example, in its simplest form the NeedleRegEx might be a lowercase a (or any other letter, number, or character). The RegEx engine will search the Haystack looking for an a. If one is found, it stops and returns the position of the letter:
FoundPos := RegExMatch(Haystack, "a")
FoundPos is the position of the first occurrence and Haystack is the input string. Note that the RegEx itself (a, the needle we want to find) appears within double quotes. If Haystack is "the quick brown fox jumped over the lazy dog", the a in "lazy" is found at position 38 (FoundPos), the 38th character in the string counting spaces. If there is no a in Haystack, the needle is not matched, RegExMatch() returns 0 (zero), and FoundPos is 0.
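The same example as a complete script, assuming AutoHotkey v1.1 syntax; the second Haystack is invented to show the no-match case.

```autohotkey
; The single-character search described above
Haystack := "the quick brown fox jumped over the lazy dog"
FoundPos := RegExMatch(Haystack, "a")
MsgBox, %FoundPos%   ; 38, the "a" in "lazy"

; No "z" anywhere in this string, so RegExMatch() returns 0
NoMatchPos := RegExMatch("no match here", "z")
MsgBox, %NoMatchPos%   ; 0
```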
To make the RegEx slightly more complicated, we add another character to our RegEx:
FoundPos := RegExMatch(Haystack, "ab")
Now our needle in the Haystack is the ab letter combination. RegEx will again look for the letter "a" until it finds a match. Only then will it check whether the next character is a "b". If there is no following "b", it drops everything and continues looking for the next "a". For example, if Haystack is "Abby has always been absent from the abbey", then FoundPos is 22.
What? That FoundPos coincides with the "ab" in "absent", not the first "ab" in "Abby". This brings us to an important concept: RegEx is case sensitive. If you want to find a capital letter, it had better be capitalized in the RegEx. The word "Abby" in the Haystack is skipped as a match because its "A" is uppercase while the needle's "a" is lowercase.
Note: There is an option to make the RegEx case insensitive, but that will be left for another chapter. That's the problem with RegEx: there are so many possibilities and options that it's easy to get confused.
As RegEx moves through the Haystack, it stops at each letter "a", then checks for a letter "b" immediately following it, but none is found until it reaches the word "absent" starting at position 22. Having found a complete match, RegExMatch() stops.
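A quick way to see the case sensitivity for yourself, assuming AutoHotkey v1.1 syntax: search the same Haystack once for "ab" and once for "Ab".

```autohotkey
Haystack := "Abby has always been absent from the abbey"
LowerPos := RegExMatch(Haystack, "ab")   ; 22, the "ab" in "absent"
UpperPos := RegExMatch(Haystack, "Ab")   ; 1, the capitalized "Ab" in "Abby"
MsgBox, ab: %LowerPos%`nAb: %UpperPos%
```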
This is the essence of how RegEx works. The more characters added to the NeedleRegEx, the more is required to find a match. However, literal characters alone can't solve a problem such as validating numbers (for example, in the Calorie Count app originally discussed in the book AutoHotkey Applications), where the digits can be any numbers, but no letters are allowed.
Using Ranges in RegEx Matches
The simple way to match any digit in the RegEx is to give a range of options. This is done by enclosing all the optional characters within square brackets […]. For example, placing all the vowels within square brackets makes each one a possible match:
FoundPos := RegExMatch(Haystack, "c[aeiou]t")
This function would return FoundPos for "cat", "cot", or "cut" (or "cet" or "cit"), whichever one is found first. Proceeding through the Haystack, the RegEx engine stops at each occurrence of the letter "c", then tries to match the next character with either "a", "e", "i", "o", or "u", but no other character. If none of those options is found, the search continues looking for another "c" character. If one is found, the vowels are checked again. If there is a match, the third character is checked to see if it is the "t" character. If yes, the RegEx engine stops searching and returns the position of the "c" character. If no, it continues moving down the Haystack until it either finds a complete match or reaches the end of the line.
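A small sketch of that walk-through, assuming AutoHotkey v1.1 syntax and an invented Haystack. The word "coat" starts a possible match ("c" followed by the vowel "o") but fails at the third character, so the engine moves on to "cat".

```autohotkey
Haystack := "The coat is on the cat"
FoundPos := RegExMatch(Haystack, "c[aeiou]t", Word)
MsgBox, %FoundPos% %Word%   ; 20 cat
```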
In our situation we want to use the numeric digits [0123456789]. (The order of the digits inside the square brackets doesn't matter.) If we wanted to match two digits in a row, then [0123456789][0123456789] would do the job. The problem is that we don't know how many digits in a row we need to match. It could be one, two, three or more, at least theoretically. When you don't know how many characters will occur in a row, rather than repeating the range for each matching character, add the plus sign + after the range (or character):
FoundPos := RegExMatch(Haystack, "[0123456789]+")
This RegEx search function will match one or more digits in a row until a non-digit is encountered, returning the position of the first digit in FoundPos.
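To compare the two approaches, here is a sketch assuming AutoHotkey v1.1 syntax with a made-up Haystack. The two-range needle skips the lone "7", while the + version accepts it.

```autohotkey
Haystack := "Room 7 and 42"
Pos2 := RegExMatch(Haystack, "[0123456789][0123456789]", Two)
; Pos2 is 12 and Two is "42" (a single "7" cannot satisfy two ranges)
PosPlus := RegExMatch(Haystack, "[0123456789]+", Digits)
; PosPlus is 6 and Digits is "7" (the + accepts one digit or more)
MsgBox, %Pos2% %Two%`n%PosPlus% %Digits%
```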
Tip: Ranges of numbers or letters can be shortened by using a hyphen. For example, [0-9] is the same as [0123456789]. [A-Z] is the same as all capital letters (the unaccented letters A through Z), while [a-z] is all lowercase letters. All letters and digits can be represented by [a-zA-Z0-9]. To shorten the expression for the numeric digit range even more, use \d in place of [0-9]. Our shortened function becomes:
FoundPos := RegExMatch(Haystack, "\d+")
This will match one or more numeric digits in a row.
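The shortcuts can be checked side by side, assuming AutoHotkey v1.1 syntax; the Haystack and variable names are invented for illustration.

```autohotkey
Haystack := "Calorie Count 2013"
RangePos := RegExMatch(Haystack, "[0-9]+", RangeMatch)
ShortPos := RegExMatch(Haystack, "\d+", ShortMatch)
; both positions are 15 and both matches are "2013"
WordPos := RegExMatch(Haystack, "[a-zA-Z0-9]+", WordMatch)
; WordPos is 1 and WordMatch is "Calorie" (the space is not in the range)
MsgBox, %RangePos% %RangeMatch%`n%ShortPos% %ShortMatch%`n%WordPos% %WordMatch%
```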
There are many more symbols and operators used in Regular Expressions. For an overview, see this AutoHotkey RegEx Quick Reference. This is a simplified introduction. To truly understand how to use Regular Expressions there are numerous online tutorials, but there is no substitute for doing it yourself.
For a more detailed example of how AutoHotkey Regular Expressions can solve difficult search-and-replace problems, see "A Perfect Place to Use an AutoHotkey Regular Expression (RegEx in Text Replacement)."
A Beginner's Guide to Using Regular Expressions in AutoHotkey: Exploring the Mysteries of RegEx
This Beginner's Guide to Using Regular Expressions in AutoHotkey is not a beginning-level AutoHotkey book, but an introduction to using Regular Expressions in AutoHotkey (or most other programming languages). To get the most from this book you should already have a basic understanding of AutoHotkey (or another programming language). Regular Expressions (RegEx) are a powerful way to search and alter documents without the limitations of most standard matching functions. At first, RegEx can seem confusing and mysterious. This book clears up the confusion with easy analogies for understanding how RegEx works and examples of practical AutoHotkey applications. "Regular Expressions in AutoHotkey" will take you to the next level in AutoHotkey scripting while adding more flexibility and power to your Windows apps. (This book is also available at Amazon.com.)
For More Information
If you're interested in testing AutoHotkey to see if it might be right for you, then go to "Installing AutoHotkey and Writing Your First Script." This page shows you how to get up and running with AutoHotkey, plus it offers links to other articles on how to use AutoHotkey. To see more of the many possible applications for AutoHotkey, check out "Free AutoHotkey Scripts and Apps for Learning."
If you want more information in either the Amazon Kindle format, EPUB format for use on the iPad and other tablet computers (or on your PC), or PDF for printing on notebook-size paper, then check out the following e-books by Jack Dunning:
See how to get this e-book FREE: AutoHotkey Tricks You Ought to Do with Windows!
Now available in e-book format: Jack's A Beginner's Guide to AutoHotkey, Absolutely the Best Free Windows Utility Software Ever!: Create Power Tools for Windows XP, Windows Vista, Windows 7 and Windows 8.
Building Power Tools for Windows XP, Windows Vista, Windows 7 and Windows 8, AutoHotkey is the most powerful, flexible, free Windows utility software available. Anyone can instantly add more of the functions that they want in all of their Windows programs, whether installed on their computer or while working on the Web. AutoHotkey has a universality not found in any other Windows utility, free or paid.
Now in its second edition (October 2013), the book follows Jack's learning experience as he explores writing simple AutoHotkey scripts for adding repetitive text in any program or on the Web, running programs with special hotkeys or gadgets, manipulating the size and screen location of windows, making any window always-on-top, copying and moving files, and much more. Each chapter builds on the previous chapters. (The second edition now includes a chapter index of the AutoHotkey commands used in the book, plus Internet links from each command directly to the official AutoHotkey Web site.)
Also available at Amazon.com for the Kindle and Kindle software.
* * *
Jack's latest AutoHotkey book, which is composed of updated, reorganized and indexed chapters from many of his sample applications, is now available at Amazon for Kindle hardware (or free software) users. The book is organized by topic into parts. It is not for the complete beginner, since it builds on the information in A Beginner's Guide to AutoHotkey. However, if a person is reasonably computer literate, they could go directly to this book for ideas and techniques without the first book. Jack shows how to build real-world AutoHotkey applications. The AutoHotkey commands used are included in a special index to the chapters in which they appear. Even I can't remember everything I wrote.
Also available at Amazon.com for the Kindle and Kindle software.
To get more detailed information about AutoHotkey and see a List of AutoHotkey commands, visit the AutoHotkey Web site.
Some More AutoHotkey Uses
AutoHotkey is a scripting language that can make almost everything easier on Windows computers. It can be a simple one-line script in a text file which enters your e-mail address after you type only a couple of characters (e.g. "m@" when typed becomes "myemailaddress@mymailserver.com"). There are also some power apps that can make your computer life much easier. For example:
• Autocorrect over 5,000 commonly misspelled words in any Windows program or on the Web.
• Set a reminder for a later meeting.
• Use QuickLinks to replace the missing Windows 8 Start Menu (or just to make life easier in any version of Windows).