Why Regular Expressions are a game changer!

Regular expressions, aka regex, are a great tool for finding a pattern within a string or a set of strings. The most common use cases for regex are:

  • To validate user-generated input.

  • To search for and match patterns in a large body of text, such as a find-and-replace operation in a document.

  • To split a string into tokens.
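
As a rough illustration of these three use cases, here is a minimal Python sketch using the standard `re` module. The username rule, the function names, and the delimiters are assumptions made up for this example, not rules from any particular application.

```python
import re

# 1. Validate user input: a hypothetical rule of 3 to 16 word characters.
USERNAME = re.compile(r"\w{3,16}")

def is_valid_username(text: str) -> bool:
    # fullmatch succeeds only if the whole string fits the pattern.
    return USERNAME.fullmatch(text) is not None

# 2. Find and replace: collapse every run of whitespace into one space.
def normalize_spaces(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# 3. Split into tokens on commas or semicolons, ignoring spaces around them.
def tokenize(text: str) -> list:
    return re.split(r"\s*[,;]\s*", text.strip())
```

For instance, `tokenize("a, b;c ,d")` splits the string into the four tokens `a`, `b`, `c`, and `d`.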

Regular expressions can be intimidating because their syntax looks like a pile of gibberish. But once you get the hang of it, you have a powerful and widely supported way to parse text or code in nearly any programming language.

Initially, when I started learning regex, it felt very complicated, and the syntax looked frustrating. Recently, after working through good resources and researching different types of malware on the defensive side of security (Blue Team), I use it every day, and it has become a powerful tool for speeding up my workflow.

To start with, here are a few regex basics:

Expressions || Purpose

. => Match any single character except a line break.

* => Match zero or more occurrences of the preceding expression.

.* => Match any character zero or more times.

+ => Match one or more occurrences of the preceding expression.

.+ => Match any character one or more times.

*? => Match zero or more occurrences of the preceding expression, as few as possible (lazy).

+? => Match one or more occurrences of the preceding expression, as few as possible (lazy).

^ => Anchor the match string to the beginning of the line or string.

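\r?$ => Anchor the match string to the end of a line, allowing for an optional carriage return (Windows line endings).

$ => Anchor the match string to the end of the string, or to the end of each line in multiline mode.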

[abc] => Match any single character in a set.

[a-f] => Match any single character in a range of characters.

\w => Match any word character.

\s => Match any whitespace character.

\d => Match any decimal digit character.

["'] => Match a double or single quotation mark.
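
To see a few of these basics in action, here is a small Python sketch. The sample strings are invented for illustration; the patterns use only constructs from the list above.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the last closing tag it can reach.
greedy = re.findall(r"<b>.*</b>", text)
# Lazy .*? stops at the first closing tag it can reach.
lazy = re.findall(r"<b>.*?</b>", text)

log = "error 404\nok 200\nerror 500"
# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
error_codes = re.findall(r"^error (\d+)$", log, flags=re.MULTILINE)

# A set combined with ranges: '#' followed by one or more hex digits.
is_hex = re.fullmatch(r"#[0-9a-f]+", "#1a2b3c") is not None

# A word wrapped in either quote style.
quoted = re.findall(r"[\"'](\w+)[\"']", "say 'hi' or \"bye\"")
```

Here `greedy` holds a single match spanning the whole string, while `lazy` holds the two tags separately, which is usually what you want when matching delimited chunks.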

Before using regular expressions in your code, you can test them using an online regex evaluator. https://regex101.com/ is a great tool.

Let’s look at a malware sample that is injected into WordPress core files. This injected malicious code is obfuscated using Base64 encoding and placed very cleverly in WordPress files.

[Image: the obfuscated malware sample injected into a WordPress core file]

This malicious piece of code changes permissions, creates spam directories, and reads file contents from a malicious redirection link.

The obfuscated code when decoded looks something like this:

[Image: the decoded malware code]

This is the regex that matches this malware sample:

[Image: the regex used to match the malware sample]

To test this regular expression, I used the regex101 tool to identify and match the malicious pattern in the file. After matching the pattern, we can comprehensively clean the malware out of the file so that it is no longer infected and the user’s website can function normally.

[Image: the regex tested against the sample in regex101]
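
The match-then-clean idea can be sketched in Python roughly as follows. This is not the regex from the screenshots: the pattern, the function names, and the sample file contents are all hypothetical, and the sketch assumes the injection has the common `eval(base64_decode('...'))` shape with a harmless stand-in payload.

```python
import base64
import re

# Hypothetical pattern: eval(base64_decode('...')) with optional whitespace
# and either quote style. Real samples may be shaped differently.
INJECTION = re.compile(
    r"eval\s*\(\s*base64_decode\s*\(\s*[\"']([A-Za-z0-9+/=]+)[\"']\s*\)\s*\)\s*;?"
)

def find_payloads(source: str) -> list:
    """Return the decoded text of every matched Base64 payload."""
    return [
        base64.b64decode(m.group(1)).decode("utf-8", errors="replace")
        for m in INJECTION.finditer(source)
    ]

def strip_injections(source: str) -> str:
    """Remove every matched injection and leave the rest of the file intact."""
    return INJECTION.sub("", source)

# Harmless stand-in for an infected file (the payload is just: echo "hi";).
payload = base64.b64encode(b'echo "hi";').decode("ascii")
infected = "<?php\n$a = 1;\neval(base64_decode('" + payload + "'));\n$b = 2;\n"

decoded = find_payloads(infected)
cleaned = strip_injections(infected)
```

Decoding the payload first lets you review what the injected code does before you remove it, and it is worth testing any such pattern against clean files too, so that legitimate code is not stripped by mistake.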

If you perform a task like the above manually, it may take a while. You would have to keep copying and pasting large chunks of code, and with a larger file it could take hours. Regex saves both time and effort here. Learning regular expressions is a long-term investment with endless use cases. That said, regex patterns are difficult to read, let alone debug, so the trade-off between complexity, maintainability, performance, and correctness should be a conscious decision.
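
One way to ease the readability problem, at least in Python, is the `re.VERBOSE` flag, which lets a pattern span several lines with comments. The date pattern below is only a hypothetical example chosen for illustration.

```python
import re

# Hypothetical example: a date written as year-month-day, e.g. 2021-04-05.
DATE = re.compile(
    r"""
    ^            # start of the string
    (\d\d\d\d)   # year: four digits
    -
    (\d\d)       # month: two digits
    -
    (\d\d)       # day: two digits
    $            # end of the string
    """,
    re.VERBOSE,
)

match = DATE.match("2021-04-05")
year, month, day = match.groups()
```

In verbose mode, whitespace and `#` comments inside the pattern are ignored, so each piece can be explained right where it appears.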
