Egemen Savascioglu

Engineering 101: Regex

A beginner-friendly introduction to Regular Expressions (Regex), explaining what they are, why they are useful, and covering the fundamental syntax with practical examples.

Tags: Engineering, Regex, Programming, Tutorial

When a string of seemingly meaningless characters adds up to something meaningful, those characters become incredibly valuable. Take this one:

^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}$

It probably means nothing to you right now, but by the time you finish reading this post, you will know what it means, where it is used, and how you can use it yourself.

What is Regex?

A Regular Expression is a sequence of characters that defines a search pattern. It is primarily used for matching and manipulating strings. Think of it as “Ctrl+F” on steroids: instead of searching only for an exact word, Regex lets you search for patterns, such as “any four digits in a row”.

Why Learn Regex?

Regex is close to a universal language. It is supported by almost every modern programming language (Python, JavaScript, Java, Go, etc.) and by most text editors, although a few details of the syntax vary between engines. You can use it for:

  1. Data Validation: Ensuring user input (like emails, passwords, or phone numbers) follows a specific format.
  2. Text Search and Replace: Finding specific patterns in a massive log file or refactoring code.
  3. Data Extraction: Pulling out specific pieces of information (like URLs or dates) from a block of raw text.
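
To make these three use cases concrete, here is a minimal sketch using Python's built-in re module. The sample log line, the 5-digit code, and the variable names are made up for illustration, and the pattern syntax itself is explained in the sections that follow.

```python
import re

# Hypothetical sample text, invented for this example.
log = "2024-01-15 ERROR disk full; 2024-01-16 INFO backup done"

# 1. Data validation: does the whole input consist of exactly 5 digits?
is_valid_code = re.fullmatch(r"\d{5}", "90210") is not None

# 2. Search and replace: mask every date-like token in the log.
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", log)

# 3. Data extraction: collect every date-like token from the log.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)

print(is_valid_code)  # True
print(masked)         # <date> ERROR disk full; <date> INFO backup done
print(dates)          # ['2024-01-15', '2024-01-16']
```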

The Basic Syntax

Let’s break down the most common Regex building blocks.

1. Anchors

Anchors don’t match characters; they match positions.

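  • ^ : Matches the beginning of a string (or of each line, if the engine's multiline mode is switched on).
  • $ : Matches the end of a string (or of each line in multiline mode).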
  • Example: ^Hello will match “Hello World”, but not “Say Hello”.
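
The anchor example above can be checked with a short Python sketch. It assumes Python's re module and uses re.search, which scans the whole string, so any difference in the results comes from the anchors alone. The variable names are illustrative.

```python
import re

# ^ pins the match to the start of the string.
hello_at_start = re.search(r"^Hello", "Hello World") is not None
hello_in_middle = re.search(r"^Hello", "Say Hello") is not None

# $ pins the match to the end of the string.
world_at_end = re.search(r"World$", "Hello World") is not None

print(hello_at_start)   # True
print(hello_in_middle)  # False
print(world_at_end)     # True
```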

2. Character Classes

These are shortcuts for matching common types of characters.

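  • \d : Matches any digit (0-9; some engines, such as Python 3 on text strings, also accept digits from other scripts).
  • \w : Matches any word character (letters, digits, and the underscore).
  • \s : Matches any whitespace character (spaces, tabs, newlines).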
  • . : Matches any single character except a newline.
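
Here is a small sketch of the four shortcuts in action, assuming Python's re module. The sample text is invented for the example.

```python
import re

text = "Order 66 shipped_to: Bay 7"  # made-up sample text

digits = re.findall(r"\d", text)    # every single digit
words = re.findall(r"\w+", text)    # runs of word characters
spaces = re.findall(r"\s", text)    # every whitespace character
bay = re.search(r"B.y", text)       # 'B', any one character, 'y'

print(digits)       # ['6', '6', '7']
print(words)        # ['Order', '66', 'shipped_to', 'Bay', '7']
print(len(spaces))  # 4
print(bay.group())  # Bay

# By default the dot refuses to match a newline.
dot_matches_newline = re.search(r"a.b", "a\nb") is not None
print(dot_matches_newline)  # False
```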

3. Custom Sets

You can create your own character classes using square brackets.

  • [abc] : Matches ‘a’, ‘b’, or ‘c’.
  • [a-z] : Matches any lowercase letter from a to z.
  • [^0-9] : A ^ as the first character inside brackets means not. This matches any single character that is not a digit.
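
The three custom sets above behave like this in Python's re module. This is a quick sketch with made-up input strings.

```python
import re

# [abc] matches any single 'a', 'b', or 'c'.
abc_hits = re.findall(r"[abc]", "cabbage")

# [a-z] matches any single lowercase letter; 'G' and the digits are skipped.
lowercase_hits = re.findall(r"[a-z]", "Go 1.22")

# [^0-9] matches any single character that is NOT a digit.
non_digit_hits = re.findall(r"[^0-9]", "a1b2")

print(abc_hits)        # ['c', 'a', 'b', 'b', 'a']
print(lowercase_hits)  # ['o']
print(non_digit_hits)  # ['a', 'b']
```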

4. Quantifiers

Quantifiers tell Regex how many times the preceding element should occur.

  • * : Matches 0 or more times.
  • + : Matches 1 or more times.
  • ? : Matches 0 or 1 time (makes it optional).
  • {n} : Matches exactly n times.
  • {n,m} : Matches between n and m times (inclusive).
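
To see each quantifier at work, here is a minimal Python sketch. The helper name matches is hypothetical; it wraps re.fullmatch so that the whole input has to fit the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text fits the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab*", "a"))           # True:  * allows zero 'b's
print(matches(r"ab*", "abbb"))        # True:  ... or many
print(matches(r"ab+", "a"))           # False: + needs at least one 'b'
print(matches(r"ab+", "ab"))          # True
print(matches(r"colou?r", "color"))   # True:  ? makes the 'u' optional
print(matches(r"colou?r", "colour"))  # True
print(matches(r"\d{3}", "123"))       # True:  exactly three digits
print(matches(r"\d{3}", "1234"))      # False
print(matches(r"\d{2,4}", "1"))       # False: fewer than two digits
print(matches(r"\d{2,4}", "123"))     # True
print(matches(r"\d{2,4}", "12345"))   # False: more than four digits
```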

Putting It Together: Practical Examples

Let’s look at some real-world examples to see how these blocks fit together.

Example 1: Validating a Year

Let’s say we want to validate a 4-digit year. Regex: ^\d{4}$

  • ^ : Start of string.
  • \d{4} : Exactly 4 digits.
  • $ : End of string.
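
One way to wrap this pattern in a Python function is sketched below. The names YEAR_PATTERN and is_four_digit_year are made up for this example. Two choices here are assumptions on top of the pattern itself: the re.ASCII flag limits \d to 0-9, and fullmatch is used as an extra guard because in Python $ also matches just before a trailing newline.

```python
import re

# The pattern from the article; re.ASCII restricts \d to the digits 0-9.
YEAR_PATTERN = re.compile(r"^\d{4}$", re.ASCII)

def is_four_digit_year(text: str) -> bool:
    """Return True if text is exactly four digits (illustrative helper)."""
    # fullmatch rejects "2024\n", which re.search with $ would accept.
    return YEAR_PATTERN.fullmatch(text) is not None

print(is_four_digit_year("2024"))       # True
print(is_four_digit_year("24"))         # False
print(is_four_digit_year("20245"))      # False
print(is_four_digit_year("year 2024"))  # False
```

Note that this only checks the shape of the input: "0000" and "9999" both pass, so a range check would still be needed if the year has to be plausible.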

Example 2: The Email Validator (Simplified)

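Remember the gibberish at the beginning of the post? Let’s decipher a slightly simplified version. Regex: ^\w+@\w+\.\w{2,}$ The original uses the custom set [\w.-] in place of \w, so that dots and hyphens are also allowed in the username and domain, and [a-zA-Z]{2,6} to limit the top-level domain to 2 to 6 letters.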

  • ^ : Start of the string.
  • \w+ : One or more word characters (the username).
  • @ : The literal ’@’ symbol.
  • \w+ : One or more word characters (the domain name).
  • \. : The literal dot ’.’ (we use a backslash \ to escape it, because . normally means ‘any character’).
  • \w{2,} : Two or more word characters (the top-level domain, like ‘com’ or ‘io’). Leaving out the upper bound in {2,} means “at least 2 times”.
  • $ : End of the string.
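
The sketch below tries the simplified pattern next to the one from the top of the post, using Python's re module. The constant and function names and the sample addresses are invented for the example, and neither pattern should be taken as a complete email validator.

```python
import re

# Simplified pattern from this example.
SIMPLE_EMAIL = re.compile(r"^\w+@\w+\.\w{2,}$")
# Pattern from the introduction: also allows '.' and '-' around the '@'.
FULL_EMAIL = re.compile(r"^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}$")

def looks_like_email(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole text fits the given pattern (illustrative)."""
    return pattern.fullmatch(text) is not None

print(looks_like_email(SIMPLE_EMAIL, "jane@example.com"))        # True
print(looks_like_email(SIMPLE_EMAIL, "jane@example"))            # False: no dot and TLD
print(looks_like_email(SIMPLE_EMAIL, "jane.doe@example.com"))    # False: \w does not match '.'
print(looks_like_email(FULL_EMAIL, "jane.doe@example.co.uk"))    # True
print(looks_like_email(FULL_EMAIL, "jane@example.c"))            # False: TLD shorter than 2 letters
```

The third line shows the price of the simplification: a perfectly ordinary address with a dot in the username is rejected, which is why the original pattern uses the custom set [\w.-] instead of plain \w.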

Conclusion

Regex can feel like learning a cryptic new language, but it is an indispensable tool in an engineer’s toolkit. The best way to learn is by doing. Next time you need to validate input or search through text, try using a Regular Expression instead of writing custom logic.

Tip: Tools like Regex101 are fantastic for testing and debugging your Regex patterns in real time.

Happy pattern matching!
