Rust: Pattern Matching

12 February 2023

Pattern matching is a language feature popularized by functional programming languages such as Standard ML and Haskell. It’s a natural complement to algebraic data types, but it’s not limited to them.

Pattern matching with match

The match expression implements pattern matching: you give it an expression, and then a list of values that the expression could have, with a code block that defines what to do in each case. For an integer expression, that could look like this:

let n = 2;
match n {
    1 => println!("one"),
    2 => println!("two"),
    _ => println!("something else"),
}

As you can probably guess, this will print the word “two”. This match expression has three arms, each consisting of a pattern, an arrow =>, and an expression. The last pattern is an underscore, used as a placeholder that can match anything. Patterns in Rust are checked in the order they’re written, and they have to be exhaustive, meaning they must cover every possible value of the expression, so an underscore is often used as the last pattern to match anything the other patterns didn’t catch.

A match expression can be a statement by itself, as in the last example, but it can also be part of a larger statement such as a let binding:

let output = match n {
    1 => "one",
    2 => "two",
    _ => "something else",
};
println!("{}", output);

Now it becomes clear why the patterns must be exhaustive: if we had only the patterns 1 and 2, what would be the value of output when n = 3?
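Exhaustive doesn’t have to mean “ends with an underscore”. As a small sketch (the Light enum and the action function are made up for this illustration), a match that lists every variant of an enum needs no catch-all arm:

```rust
// Hypothetical enum, introduced only for this illustration.
enum Light {
    Red,
    Yellow,
    Green,
}

// Every variant is listed, so no `_` arm is needed. If one of the arms
// were removed, the compiler would reject the match with a
// "non-exhaustive patterns" error instead of guessing a value.
fn action(light: Light) -> &'static str {
    match light {
        Light::Red => "stop",
        Light::Yellow => "slow down",
        Light::Green => "go",
    }
}

fn main() {
    println!("{}", action(Light::Red));
    println!("{}", action(Light::Yellow));
    println!("{}", action(Light::Green));
}
```

Leaving out the underscore has a practical upside: if a variant is added to the enum later, every match that forgot to handle it stops compiling.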

We can combine multiple patterns with the | symbol:

let output = match n {
    1 | 2 | 3 => "small number",
    4 | 5 => "bigger number",
    _ => "something else",
};

You can think of | as “or”: “1 or 2 or 3 gives ‘small number’”. In this case we could also use range expressions:

let output = match n {
    1..=3 => "small number",
    4..=5 => "bigger number",
    _ => "something else",
};

Right now exclusive range expressions (for example, 1..4 instead of 1..=3) aren’t allowed in patterns. I’m not quite sure why -- it looks like that’ll change at some point.

Pattern matching for characters works just like for integers:

let c = 'Ä';
let letter_type = match c {
    'a'..='z' => "lower-case letter",
    'A'..='Z' => "upper-case letter",
    'Ä' | 'Ö' | 'Ü' => "umlaut",
    _ => "some other letter",
};

Character ranges follow the order in Unicode and the German umlauts are not grouped together in Unicode, so 'Ä'..='Ü' would not work correctly here. (Fortunately, EBCDIC didn’t become the universal standard, otherwise 'A'..='Z' also wouldn’t work right.)
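To see why the range would be wrong, here is a small sketch that compares the range against the explicit list. Both helper functions are hypothetical names I made up for the comparison; matches! is the standard library macro that tests a value against a pattern.

```rust
// Hypothetical helper: uses the too-broad range from 'Ä' to 'Ü'.
fn in_umlaut_range(c: char) -> bool {
    matches!(c, 'Ä'..='Ü')
}

// Hypothetical helper: lists the three upper-case umlauts explicitly.
fn is_umlaut(c: char) -> bool {
    matches!(c, 'Ä' | 'Ö' | 'Ü')
}

fn main() {
    // 'Å' and '×' sit between 'Ä' and 'Ü' in Unicode order.
    for c in "ÄÖÜÅ×".chars() {
        println!(
            "{}: range = {}, explicit = {}",
            c,
            in_umlaut_range(c),
            is_umlaut(c)
        );
    }
}
```

The range accepts 'Å' and even the multiplication sign '×', because both code points lie between 'Ä' and 'Ü'. The explicit list accepts only the three umlauts.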

Okay, integers and characters are nice, but what else can we pattern-match on? I already mentioned algebraic data types, so naturally it works for enums, tuples, and structs. I’ll continue my example from Rust: Algebraic Data Types:

enum Release {
    Alpha,
    Beta,
    Final,
}

struct Version(u32, u32, u32, Release);

fn is_stable(version: &Version) -> bool {
    match version {
        Version(_, _, _, Release::Alpha) => false,
        Version(_, _, _, Release::Beta) => false,
        Version(0, _, _, _) => false,
        _ => true,
    }
}

Notice that the patterns here overlap even without taking the underscore one into account: for Version(0, 1, 2, Release::Beta), the second and third pattern both match. That’s fine; the patterns are checked in the order they’re written and the first one that matches counts.
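To make the first-match rule visible, here is a sketch with the same patterns where each arm returns a label instead of a bool. The matched_arm function and its labels are my own additions for illustration.

```rust
enum Release {
    Alpha,
    Beta,
    Final,
}

struct Version(u32, u32, u32, Release);

// Hypothetical helper: same patterns as is_stable, but each arm
// reports which arm was taken.
fn matched_arm(version: &Version) -> &'static str {
    match version {
        Version(_, _, _, Release::Alpha) => "alpha arm",
        Version(_, _, _, Release::Beta) => "beta arm",
        Version(0, _, _, _) => "major-zero arm",
        _ => "catch-all arm",
    }
}

fn main() {
    // Both the beta arm and the major-zero arm would match this value;
    // the beta arm is written first, so it is the one that runs.
    println!("{}", matched_arm(&Version(0, 1, 2, Release::Beta)));
    println!("{}", matched_arm(&Version(0, 1, 2, Release::Alpha)));
    println!("{}", matched_arm(&Version(0, 1, 2, Release::Final)));
    println!("{}", matched_arm(&Version(1, 0, 0, Release::Final)));
}
```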

If Version was a struct rather than a tuple struct, the code would look something like this:

struct Version {
    major: u32,
    minor: u32,
    patch: u32,
    release: Release,
}
 
fn is_stable(version: &Version) -> bool {
    match version {
        Version { release: Release::Alpha, .. } => false,
        Version { release: Release::Beta, .. } => false,
        Version { major: 0, .. } => false,
        _ => true,
    }
}

For structs, .. tells the compiler to ignore the remaining fields. That also works when matching on an array or a slice:

let numbers: &[i32] = &[0, 1, 0];
match numbers {
    &[0, ..] => println!("starts with zero!"),
    &[.., 0] => println!("ends with zero!"),
    &[a, .., z] => println!("goes from {} to {}", a, z),
    _ => {}
}

This prints “starts with zero!”. The second pattern would match too, but only the first matching arm runs. The {} in the last arm is an empty block. We can use that for cases where we don’t want to do anything.

The third pattern (&[a, .., z]) shows something else interesting: we can use variables in a pattern, and when a value matches the pattern, those variables will be bound to whatever was at the corresponding position in the value.
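Slice patterns get more interesting when the length isn’t known up front. The following is a minimal sketch; the describe function is a hypothetical helper, not something from the standard library.

```rust
// Hypothetical helper: describes a slice of any length.
fn describe(numbers: &[i32]) -> String {
    match numbers {
        // Matches only the empty slice.
        [] => String::from("empty"),
        // Matches a slice with exactly one element and binds it.
        [only] => format!("just {}", only),
        // Matches two or more elements; `..` skips everything in between.
        [first, .., last] => format!("goes from {} to {}", first, last),
    }
}

fn main() {
    println!("{}", describe(&[]));
    println!("{}", describe(&[7]));
    println!("{}", describe(&[1, 2, 3]));
}
```

The three arms together cover every possible length, so the compiler accepts the match without an underscore arm.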

We can also pattern match on strings, or &str to be precise:

fn parse_boolean(s: &str) -> Option<bool> {
    match s {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}
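The “&str to be precise” part matters when the value is an owned String: string literals are &str patterns, so a String has to be borrowed as a &str first. Here is a sketch of one way to do that; parse_boolean_lenient is a hypothetical variant of the function above that also ignores ASCII case.

```rust
// Hypothetical variant of parse_boolean: takes an owned String and
// ignores ASCII case.
fn parse_boolean_lenient(s: String) -> Option<bool> {
    // to_ascii_lowercase() returns a new String; as_str() borrows it
    // as a &str so it can be compared with the literal patterns.
    match s.to_ascii_lowercase().as_str() {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}

fn main() {
    println!("{:?}", parse_boolean_lenient(String::from("TRUE")));
    println!("{:?}", parse_boolean_lenient(String::from("False")));
    println!("{:?}", parse_boolean_lenient(String::from("maybe")));
}
```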

If you were hoping to use regular expressions like with Ruby’s case expressions, I’ll have to disappoint you: range expressions for integers and characters are really the only “advanced” type of pattern allowed. I think the idea is that pattern matching should be a simple operation that can run in a fixed number of CPU instructions. That’s true of range expressions (they’re basically one >= comparison and one <= comparison), but certainly not of regular expression evaluation.

Pattern matching on floating-point numbers is possible at the moment, but it results in a compiler warning which is set to become an error in the future, so it’s probably smart to avoid it.

Guards

Let’s say we have a set of rows that make up a table, stored in this data structure:

struct Row {
    header: bool,
    number: u32,
    text: String,
}

We want to print the header row in bold text, and after that we want to show every other row with a grey background. A match expression with a guard lets us distinguish these cases in a nice readable way:

let row_format = match row {
    Row { header: true, .. }            => bold_text,
    Row { number: n, .. } if n % 2 == 0 => white_background,
    _                                   => grey_background,
};

When the second pattern (Row { number: n, .. }) matches, the expression n % 2 == 0 is evaluated. If it’s false, the pattern matching continues with the next arm as if the pattern had not matched.
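Here is a self-contained sketch of the same idea that you can compile and run. I’m assuming the formats are plain string labels and that the row is passed by reference; both are simplifications of mine, as are the function name and the sample rows.

```rust
struct Row {
    header: bool,
    number: u32,
    text: String,
}

// Assumption: formats are represented as plain string labels here.
fn row_format(row: &Row) -> &'static str {
    match row {
        Row { header: true, .. } => "bold text",
        // `n` is a reference because `row` is a reference, hence `*n`.
        Row { number: n, .. } if *n % 2 == 0 => "white background",
        // Reached when the row is not a header and the guard was false.
        _ => "grey background",
    }
}

fn main() {
    let rows = vec![
        Row { header: true, number: 0, text: String::from("Name") },
        Row { header: false, number: 1, text: String::from("Alice") },
        Row { header: false, number: 2, text: String::from("Bob") },
    ];
    for row in &rows {
        println!("{}: {}", row.text, row_format(row));
    }
}
```

Row number 1 matches the second pattern, but its guard is false, so it falls through to the underscore arm and gets the grey background.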

Binding variables

Speaking of bindings, so far we’ve only looked at pretty simple cases where a variable is bound to an integer. Let’s look at an example with nested structures:

#[derive(Debug)]
struct Name {
    given: String,
    family: String,
}

#[derive(Debug)]
struct Patient {
    name: Name,
    age: u8,
}

Let’s look at some simple code to print the data:

match p {
    Patient { name: n, age: a } if a < 18 => println!("child: {:?}", n),
    Patient { name: n, .. }               => println!("adult: {:?}", n),
}
println!("Patient info: {:?}", p);   // not allowed

That doesn’t quite work: the compiler is complaining about a “partially moved” value. We didn’t take into account the issue of ownership. The binding n in the patterns involves a transfer of ownership -- a move -- just like a binding with let would, so it’s moving the Name value out of the Patient and we end up with a partially moved value. This wasn’t an issue in the previous examples because those used integers, which have copy semantics, so the values were copied, not moved.

The solution, of course, is to use references and borrowing. The ref keyword is used in patterns to indicate binding by reference:

match p {
    Patient { name: ref n, age: a }
        if a < 18                   => println!("child: {:?}", n),
    Patient { name: ref n, .. }     => println!("adult: {:?}", n),
}
println!("Patient info: {:?}", p);
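Another option is to match on a reference to the value instead of the value itself. When the matched expression is a reference and the pattern is not, the variables in the pattern are bound by reference automatically, so nothing is moved. The sketch below assumes a helper function called describe and some sample data, both made up for this example.

```rust
#[derive(Debug)]
struct Name {
    given: String,
    family: String,
}

#[derive(Debug)]
struct Patient {
    name: Name,
    age: u8,
}

// Hypothetical helper: matches on a &Patient, so `n` is a &Name and
// `a` is a &u8. Nothing is moved out of the patient.
fn describe(p: &Patient) -> String {
    match p {
        // `a` is a reference here, so the guard dereferences it.
        Patient { name: n, age: a } if *a < 18 => {
            format!("child: {} {}", n.given, n.family)
        }
        Patient { name: n, .. } => format!("adult: {} {}", n.given, n.family),
    }
}

fn main() {
    let p = Patient {
        name: Name {
            given: String::from("Ada"),
            family: String::from("Lovelace"),
        },
        age: 36,
    };
    println!("{}", describe(&p));
    // Still allowed: the match only borrowed from `p`.
    println!("Patient info: {:?}", p);
}
```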

We can also use mutable references to modify fields:

match p {
    Patient { name: ref mut n, age: a } if a < 18 => {
        n.given = String::from("--");
        n.family = String::from("--");
        println!("child, name redacted for privacy");
    },
    Patient { name: ref n, .. } => println!("adult: {:?}", n),
}

There’s one more special syntax for bindings in patterns. Sometimes you want to bind a name to some part of the pattern, for example if you’re using an integer range and you also want to have the actual integer in a variable:

match p {
    Patient { age: a, .. } if a < 18 => println!("child age {}", a),
    Patient { age: a @ 18..=24, .. } => println!("young adult, {}", a),
    Patient { age: a, .. }           => println!("adult age {}", a),
}

This example is a bit artificial, but I think you can see how the @ syntax would be useful with more complex nested patterns.
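As a sketch of the nested case, here the @ binding sits inside an Option pattern, so one arm checks the variant and the range and captures the number all at once. The describe function and its categories are invented for this illustration.

```rust
// Hypothetical helper: classifies an optional reading.
fn describe(reading: Option<u32>) -> String {
    match reading {
        // Matches Some(...) only if the inner value is in the range,
        // and binds that inner value to `n`.
        Some(n @ 0..=9) => format!("single digit {}", n),
        Some(n @ 10..=99) => format!("two digits {}", n),
        Some(n) => format!("large value {}", n),
        None => String::from("no reading"),
    }
}

fn main() {
    println!("{}", describe(Some(7)));
    println!("{}", describe(Some(42)));
    println!("{}", describe(Some(1000)));
    println!("{}", describe(None));
}
```

Without @ we would have to choose between the range check (and lose the value) or a plain binding plus a guard.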

Pattern matching without match

The match expression isn’t the only way to use pattern matching. For one, you can use simple patterns in let bindings:

let coordinate = (0, 120, 30);
let (_, y, _) = coordinate;
println!("Y: {}", y);   // prints 120

Another way is the if let construct, which combines pattern matching with an if:

if let (0, y, _) = coordinate {
    println!("x is zero, y is {}", y);
}

There’s also a while let statement that does the same thing for while loops.
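A typical use is a loop that keeps going for as long as a pattern matches. In this sketch, Vec::pop returns Some(value) until the vector is empty and None after that, which ends the loop. The drain_stack function is just a hypothetical wrapper to make the example testable.

```rust
// Hypothetical helper: pops every element off the stack and returns
// them in the order they were popped.
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // The loop body runs while pop() returns Some(...); the first None
    // fails to match the pattern and ends the loop.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

fn main() {
    println!("{:?}", drain_stack(vec![1, 2, 3]));
}
```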