pybot
This page will INTRODUCE you to a basic topic.
This page describes a bit of REGEX.

A regular expression, or regex, is a sequence of characters with special meaning that pywikipedia uses to find and match patterns anywhere in the text of your wiki.

Most pywikipedia scripts let you supply a regex through an option of the form -regex:"expression".

Case study

It's perhaps better to leave it to Wikipedia and other sources to define regex in detail. Here at Pybot, we'll just "define by doing", and present this practical example that most Wikians can recognise.

Imagine that you've discovered that your editors have repeated a mistake many times on your site. For whatever reason, in every infobox about a book, they've put double square brackets around the value of the pagecount variable, so now you've got a lot of redlinks to simple, three-digit numbers. You'll want to get rid of those links, but preserve the numbers themselves. So how do you do this, given that:

Your only option in such a case is regex.

Here's what you'd do:

python replace.py -regex -summary:"de-linking page numbers in infobox" "pagecount( *?)=( *?)\[\[(.*?)\]\]" "pagecount\1=\2\3" -catr:"Books"
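
Before letting a bot loose on a whole category, it can help to try the pattern on a scrap of text first. The sketch below runs the same pattern and replacement through Python's standard re module (pywikipedia is itself written in Python). The function name delink_pagecount and the sample infobox are made up for illustration and are not part of pywikipedia.

```python
import re

# Same pattern and replacement as in the replace.py command above.
PATTERN = r"pagecount( *?)=( *?)\[\[(.*?)\]\]"
REPLACEMENT = r"pagecount\1=\2\3"


def delink_pagecount(wikitext):
    """Strip the [[ ]] around the pagecount value, keeping the spacing and the number."""
    return re.sub(PATTERN, REPLACEMENT, wikitext)


# Hypothetical infobox snippet, invented for this example.
sample = "{{Infobox book\n| title = Example\n| pagecount = [[256]]\n}}"
print(delink_pagecount(sample))
```

Only the pagecount line changes: the link brackets disappear, while the spaces around the equals sign and the number itself are kept exactly as the editor typed them.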

As you can see, the regex is a whole bunch of symbols that appear meaningless, but which actually have quite a bit of kick. Going character-by-character from left to right, here's what this expression means:

When you put all the characters so far together, you get: 'Look for every instance of "pagecount" followed by all the spaces between it and the equals sign, and include the equals sign, too.'
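
One way to see what each piece of the expression does is to write it out in Python's verbose regex mode, where every part can carry its own comment. This is only a sketch: the name ANNOTATED and the sample line are invented here, and a space is written as [ ] because verbose mode ignores bare spaces in the pattern.

```python
import re

# The case-study pattern, spelled out piece by piece.
ANNOTATED = re.compile(r"""
    pagecount    # the literal parameter name
    ([ ]*?)      # group 1: any spaces before the equals sign
    =            # the equals sign itself
    ([ ]*?)      # group 2: any spaces after the equals sign
    \[\[         # two literal opening brackets, escaped with backslashes
    (.*?)        # group 3: as few characters as possible (the page number)
    \]\]         # two literal closing brackets
""", re.VERBOSE)

# Hypothetical infobox line with two spaces before the equals sign.
match = ANNOTATED.search("| pagecount  = [[312]]")
print(match.groups())
```

The three groups are what \1, \2 and \3 refer to in the replacement string: the two runs of spaces are put back unchanged, and the third group puts the number back without its brackets.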

Regex on this wiki

What we aim to do on this wiki is to give you little snippets of regex to help you solve all manner of common — and sometimes obscure — problems of wiki maintenance. Our regex library is in the Regex namespace. If you want to see some of these ready-made solutions, please click through to the Regex Repository.