12 February 2023
Pattern matching is a language feature popularized by functional programming languages such as Standard ML and Haskell. It’s a natural complement to algebraic data types, but it’s not limited to them.
The match expression implements pattern matching: you give it an expression, and then a
list of values that the expression could have, with a code block that defines what to do in each
case. For an integer expression, that could look like this:
    let n = 2;
    match n {
        1 => println!("one"),
        2 => println!("two"),
        _ => println!("something else"),
    }
As you can probably guess, this will print the word “two”. This match expression has three
arms, each consisting of a pattern, an arrow =>, and an expression.
The last pattern is an underscore, used as a placeholder that can match anything. Patterns in Rust
are checked in the order they’re written, and they have to be exhaustive, meaning they must cover
every possible value of the expression, so an underscore is often used as the last pattern to match
anything the other patterns didn’t catch.
A match expression can be a statement by itself, as in the last example, but it can also be part of a larger statement such as a let binding:
    let output = match n {
        1 => "one",
        2 => "two",
        _ => "something else",
    };
    println!("{}", output);
Now it becomes clear why the patterns must be exhaustive: if we had only the patterns
1 and 2, what would be the value of output when n = 3?
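To make the exhaustiveness rule a bit more concrete, here is a small sketch. The function names yes_no and half are made up for illustration. Neither match needs an underscore arm, because the listed patterns already cover every possible value of the type:

```rust
// A bool has exactly two values, so two arms are exhaustive.
fn yes_no(flag: bool) -> &'static str {
    match flag {
        true => "yes",
        false => "no",
    }
}

// Two ranges that together cover all of u8 are exhaustive as well.
fn half(n: u8) -> &'static str {
    match n {
        0..=127 => "lower half",
        128..=255 => "upper half",
    }
}
```

If you removed one of the arms in either function, the compiler would reject the match as non-exhaustive.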
We can combine multiple patterns with the | symbol:
    let output = match n {
        1 | 2 | 3 => "small number",
        4 | 5 => "bigger number",
        _ => "something else",
    };
You can think of | as “or”: “1 or 2 or 3: the result is ‘small number’”. In this case we could
also use range expressions:
    let output = match n {
        1..=3 => "small number",
        4..=5 => "bigger number",
        _ => "something else",
    };
At the time of writing, exclusive range patterns (for example, 1..4 instead of 1..=3)
aren’t allowed on stable Rust. I’m not quite sure why, but it looks like that’ll change at some point.
Pattern matching for characters works just like for integers:
    let c = 'Ä';
    let letter_type = match c {
        'a'..='z' => "lower-case letter",
        'A'..='Z' => "upper-case letter",
        'Ä' | 'Ö' | 'Ü' => "umlaut",
        _ => "some other letter",
    };
Character ranges follow the order in Unicode and the German umlauts are not grouped together in
Unicode, so 'Ä'..='Ü' would not work correctly here. (Fortunately, EBCDIC didn’t become the
universal standard, otherwise 'A'..='Z' also wouldn’t work right.)
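Here is a quick sketch of what goes wrong with the range. The helper in_umlaut_range is hypothetical; it only exists to show which characters the range actually covers:

```rust
// Hypothetical helper: uses a range pattern instead of listing
// the three upper-case umlauts one by one.
fn in_umlaut_range(c: char) -> bool {
    match c {
        // Covers every code point from U+00C4 (Ä) to U+00DC (Ü).
        'Ä'..='Ü' => true,
        _ => false,
    }
}
```

The range does contain 'Ö', but it also contains characters such as 'Å' (U+00C5) and the multiplication sign '×' (U+00D7), which sit between 'Ä' and 'Ü' in Unicode and are clearly not German umlauts.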
Okay, integers and characters are nice, but what else can we pattern-match on? I already mentioned algebraic data types, so naturally it works for enums, tuples, and structs. I’ll continue my example from Rust: Algebraic Data Types:
    enum Release {
        Alpha,
        Beta,
        Final,
    }
    struct Version(u32, u32, u32, Release);
    fn is_stable(version: &Version) -> bool {
        match version {
            Version(_, _, _, Release::Alpha) => false,
            Version(_, _, _, Release::Beta) => false,
            Version(0, _, _, _) => false,
            _ => true,
        }
    }
Notice that the patterns here overlap even without taking the underscore one into account: for
Version(0, 1, 2, Release::Beta), the second and third patterns both match. That’s fine;
the patterns are checked in the order they’re written and the first one that matches counts.
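To see the “first match wins” rule in action, here is a sketch of a variant of is_stable that reports which arm was taken. The function matched_arm and its return strings are made up for this illustration:

```rust
enum Release {
    Alpha,
    Beta,
    Final,
}

struct Version(u32, u32, u32, Release);

// Hypothetical variant of is_stable: same patterns in the same order,
// but each arm returns a label instead of a bool.
fn matched_arm(version: &Version) -> &'static str {
    match version {
        Version(_, _, _, Release::Alpha) => "alpha arm",
        Version(_, _, _, Release::Beta) => "beta arm",
        Version(0, _, _, _) => "major-zero arm",
        _ => "catch-all arm",
    }
}
```

For Version(0, 1, 2, Release::Beta) this returns "beta arm": the major-zero pattern would match too, but it is never tried because an earlier arm already matched.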
If Version were a struct with named fields rather than a tuple struct, the code would look
something like this:
    struct Version {
        major: u32,
        minor: u32,
        patch: u32,
        release: Release,
    }
    fn is_stable(version: &Version) -> bool {
        match version {
            Version { release: Release::Alpha, .. } => false,
            Version { release: Release::Beta, .. } => false,
            Version { major: 0, .. } => false,
            _ => true,
        }
    }
For structs, .. tells the compiler to ignore the remaining fields. That also works
when matching on an array or a slice:
    let numbers = &[0, 1, 0];
    match numbers {
        &[0, ..] => println!("starts with zero!"),
        &[.., 0] => println!("ends with zero!"),
        &[a, .., z] => println!("goes from {} to {}", a, z),
        _ => {}
    }
This prints “starts with zero!”, because the first arm matches and the later arms are never tried.
The {} in the last arm is an empty block. We can use
that for cases where we don’t want to do anything.
The third pattern (&[a, .., z]) shows something else interesting: we can use variables
in a pattern, and when a value matches the pattern, those variables will be bound to whatever was at
the corresponding position in the value.
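As a small sketch of bindings in slice patterns, here is a function that takes a slice of any length. The name describe and the messages are made up for illustration:

```rust
// Hypothetical helper: describes a slice by its first and last element.
fn describe(numbers: &[i32]) -> String {
    match numbers {
        // An empty slice has nothing to bind.
        [] => String::from("empty"),
        // Exactly one element, bound to `only`.
        [only] => format!("just {}", only),
        // Two or more elements: bind the first and the last,
        // and ignore everything in between.
        [first, .., last] => format!("goes from {} to {}", first, last),
    }
}
```

These three patterns cover every possible slice length, so no underscore arm is needed.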
We can also pattern match on strings, or &str to be precise:
    fn parse_boolean(s: &str) -> Option<bool> {
        match s {
            "true" => Some(true),
            "false" => Some(false),
            _ => None,
        }
    }
If you were hoping to use regular expressions like with Ruby’s case expressions, I’ll
have to disappoint you: range expressions for integers and characters are really the only “advanced”
type of pattern allowed. I think the idea is that pattern matching should be a simple operation that
can run in a fixed number of CPU instructions. That’s true of range expressions (they’re basically
one >= comparison and one <= comparison), but certainly not of
regular expression evaluation.
At the time of writing, pattern matching on floating-point numbers is possible, but it results in a compiler warning that is set to become an error in the future, so it’s probably smart to avoid it.
Let’s say we have a set of rows that make up a table, stored in this data structure:
    struct Row {
        header: bool,
        number: u32,
        text: String,
    }
We want to print the header row in bold text, and after that we want to show every other row with a grey background. A match expression with a guard lets us distinguish these cases in a nice readable way:
    let row_format = match row {
        Row { header: true, .. } => bold_text,
        Row { number: n, .. } if n % 2 == 0 => white_background,
        _ => grey_background,
    };
When the second pattern (Row { number: n, .. }) matches, the guard expression n % 2 == 0
is evaluated. If it’s false, pattern matching continues with the next arm as if the
pattern had not matched.
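Here is a self-contained sketch of the same idea. The enum RowFormat is a hypothetical stand-in for the formatting values (bold_text and friends), and the function takes the row by reference, so the guard dereferences n:

```rust
struct Row {
    header: bool,
    number: u32,
    text: String,
}

// Hypothetical stand-in for the formatting values used in the text.
#[derive(Debug, PartialEq)]
enum RowFormat {
    BoldText,
    WhiteBackground,
    GreyBackground,
}

fn row_format(row: &Row) -> RowFormat {
    match row {
        Row { header: true, .. } => RowFormat::BoldText,
        // The guard is only evaluated if the pattern itself matched.
        // Matching through a reference binds n as &u32, hence *n.
        Row { number: n, .. } if *n % 2 == 0 => RowFormat::WhiteBackground,
        // Reached for non-header rows whose guard evaluated to false.
        _ => RowFormat::GreyBackground,
    }
}
```

A header row gets BoldText regardless of its number, because its arm comes first. A non-header row with an odd number matches the second pattern, fails the guard, and falls through to the last arm.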
Speaking of bindings, so far we’ve only looked at pretty simple cases where a variable is bound to an integer. Let’s look at an example with nested structures:
    #[derive(Debug)]
    struct Name {
        given: String,
        family: String,
    }
    #[derive(Debug)]
    struct Patient {
        name: Name,
        age: u8,
    }
Let’s look at some simple code to print the data:
    match p {
        Patient { name: n, age: a } if a < 18 => println!("child: {:?}", n),
        Patient { name: n, .. } => println!("adult: {:?}", n),
    }
    println!("Patient info: {:?}", p); // not allowed
That doesn’t quite work: the compiler is complaining about a “partially moved” value. We didn’t take
into account the issue of ownership. The binding n in the patterns involves a transfer of
ownership (a move), just like a binding with let would, so it’s moving the Name value out of the
Patient and we end up with a partially moved value. This wasn’t an issue in the
previous examples because those used integers, which have copy semantics, so the values were copied,
not moved.
The solution, of course, is to use references and
borrowing. The ref keyword is used in patterns to indicate binding by reference:
    match p {
        Patient { name: ref n, age: a } if a < 18 => println!("child: {:?}", n),
        Patient { name: ref n, .. } => println!("adult: {:?}", n),
    }
    println!("Patient info: {:?}", p);
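As an alternative sketch, in current Rust you can also match on a reference to the value instead of the value itself. The bindings in the pattern then become references automatically, without writing ref. The function describe is made up for illustration:

```rust
#[derive(Debug)]
struct Name {
    given: String,
    family: String,
}

#[derive(Debug)]
struct Patient {
    name: Name,
    age: u8,
}

// Hypothetical helper: takes the patient by reference, so the caller
// keeps ownership of the whole value.
fn describe(p: &Patient) -> String {
    match p {
        // Here n is a &Name and a is a &u8; nothing is moved out of p.
        Patient { name: n, age: a } if *a < 18 => format!("child: {:?}", n),
        Patient { name: n, .. } => format!("adult: {:?}", n),
    }
}
```

After calling describe(&p), the patient p is still fully usable, just as with the ref version.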
We can also use mutable references to modify fields:
    match p {
        Patient { name: ref mut n, age: a } if a < 18 => {
            n.given = String::from("--");
            n.family = String::from("--");
            println!("child, name redacted for privacy");
        },
        Patient { name: ref n, .. } => println!("adult: {:?}", n),
    }
There’s one more special syntax for bindings in patterns. Sometimes you want to bind a name to some part of the pattern, for example if you’re using an integer range and you also want to have the actual integer in a variable:
    match p {
        Patient { age: a, .. } if a < 18 => println!("child age {}", a),
        Patient { age: a @ 18..=24, .. } => println!("young adult, {}", a),
        Patient { age: a, .. } => println!("adult age {}", a),
    }
This example is a bit artificial, but I think you can see how the @ syntax would be
useful with more complex nested patterns.
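For instance, here is a sketch where the range sits inside another pattern. It assumes the age is an Option<u8> rather than a field of Patient, and the function describe is made up for illustration:

```rust
// Hypothetical helper: the @ binding captures the number while the
// range pattern checks it, all nested inside Some(...).
fn describe(age: Option<u8>) -> String {
    match age {
        Some(a @ 0..=17) => format!("child age {}", a),
        Some(a @ 18..=24) => format!("young adult, {}", a),
        Some(a) => format!("adult age {}", a),
        None => String::from("age unknown"),
    }
}
```

Without @ you would have to choose between checking the range and getting hold of the value, or fall back to a guard.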
The match expression isn’t the only way to use pattern matching. For one, you can use simple patterns in let bindings:
    let coordinate = (0, 120, 30);
    let (_, y, _) = coordinate;
    println!("Y: {}", y); // prints 120
Another way is the if let construct, which combines pattern matching with an if:
    if let (0, y, _) = coordinate {
        println!("x is zero, y is {}", y);
    }
There’s also a while let statement that does the same thing for while loops.
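A typical use of while let, sketched here with a made-up function drain_stack, is to keep taking items from a collection until there are none left:

```rust
// Hypothetical helper: pops every element off the stack and collects
// them in the order they were popped.
fn drain_stack(mut stack: Vec<u32>) -> Vec<u32> {
    let mut seen = Vec::new();
    // pop() returns Some(value) while the vector has elements and
    // None once it is empty; the loop ends at the first non-match.
    while let Some(top) = stack.pop() {
        seen.push(top);
    }
    seen
}
```

Calling drain_stack(vec![1, 2, 3]) returns the elements in reverse order, because pop() takes them from the end of the vector.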