12 February 2023
Pattern matching is a language feature popularized by functional programming languages such as Standard ML and Haskell. It’s a natural complement to algebraic data types, but it’s not limited to them.
The match expression implements pattern matching: you give it an expression, and then a list of values that the expression could have, with a code block that defines what to do in each case. For an integer expression, that could look like this:
let n = 2;
match n {
    1 => println!("one"),
    2 => println!("two"),
    _ => println!("something else"),
}
As you can probably guess, this will print the word “two”. This match expression has three arms, each consisting of a pattern, an arrow =>, and an expression. The last pattern is an underscore, used as a placeholder that can match anything. Patterns in Rust are checked in the order they’re written, and they have to be exhaustive, meaning they must cover every possible value of the expression, so an underscore is often used as the last pattern to match anything the other patterns didn’t catch.
A match expression can be a statement by itself, as in the last example, but it can also be part of a larger statement such as a let binding:
let output = match n {
    1 => "one",
    2 => "two",
    _ => "something else",
};
println!("{}", output);
Now it becomes clear why the patterns must be exhaustive: if we had only the patterns 1 and 2, what would be the value of output when n = 3?
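Exhaustive doesn’t have to mean “ends with an underscore”. As a minimal sketch (the function name yes_no is made up for illustration), a match on a bool is already exhaustive once both values are listed:

```rust
// Hypothetical helper, not part of the original example.
// `true` and `false` are the only possible values of a bool,
// so the compiler accepts this match without a `_` arm.
fn yes_no(flag: bool) -> &'static str {
    match flag {
        true => "yes",
        false => "no",
    }
}
```

Removing either arm makes the compiler reject the code with a “non-exhaustive patterns” error, because the match could then fail to produce a value.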
We can combine multiple patterns with the | symbol:
let output = match n {
    1 | 2 | 3 => "small number",
    4 | 5 | 6 => "bigger number",
    _ => "something else",
};
You can think of | as “or”: “if n is 1 or 2 or 3, the result is ‘small number’”. In this case we could also use range expressions:
let output = match n {
    1..=3 => "small number",
    4..=6 => "bigger number",
    _ => "something else",
};
Right now exclusive range patterns (for example, 1..4 instead of 1..=3) aren’t allowed on stable Rust. I’m not quite sure why – it looks like that’ll change at some point.
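Ranges can also make an integer match exhaustive without an underscore arm. The following is a small sketch under that assumption; sign_name is a hypothetical function name, and i32::MIN and i32::MAX are the standard library’s constants for the smallest and largest i32:

```rust
// Hypothetical helper for illustration.
// The three arms together cover every possible i32 value,
// so no catch-all `_` arm is required.
fn sign_name(n: i32) -> &'static str {
    match n {
        i32::MIN..=-1 => "negative",
        0 => "zero",
        1..=i32::MAX => "positive",
    }
}
```

The compiler checks the ranges for gaps, so leaving out the 0 arm would be reported as a non-exhaustive match.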
Pattern matching for characters works just like for integers:
let c = 'Ä';
let letter_type = match c {
    'a'..='z' => "lower-case letter",
    'A'..='Z' => "upper-case letter",
    'Ä' | 'Ö' | 'Ü' => "umlaut",
    _ => "some other letter",
};
Character ranges follow Unicode code point order, and the German umlauts are not adjacent in Unicode, so 'Ä'..='Ü' would not work correctly here: it would also match the unrelated characters that sit between them. (Fortunately, EBCDIC didn’t become the universal standard, otherwise 'A'..='Z' also wouldn’t work right.)
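To make the umlaut problem concrete, here is a small sketch with two made-up helper functions. The first uses the range, the second lists the three letters explicitly:

```rust
// Hypothetical helpers for illustration.

// A range pattern matches every code point between its endpoints,
// here U+00C4 ('Ä') through U+00DC ('Ü').
fn in_umlaut_range(c: char) -> bool {
    match c {
        'Ä'..='Ü' => true,
        _ => false,
    }
}

// Listing the letters with `|` matches exactly these three characters.
fn is_upper_umlaut(c: char) -> bool {
    match c {
        'Ä' | 'Ö' | 'Ü' => true,
        _ => false,
    }
}
```

With these definitions, in_umlaut_range('Å') and in_umlaut_range('×') are both true, because U+00C5 and U+00D7 lie inside the range, while is_upper_umlaut rejects both.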
Okay, integers and characters are nice, but what else can we pattern-match on? I already mentioned algebraic data types, so naturally it works for enums, tuples, and structs. I’ll continue my example from Rust: Algebraic Data Types:
enum Release {
    Alpha,
    Beta,
    Final,
}
struct Version(u32, u32, u32, Release);
fn is_stable(version: &Version) -> bool {
    match version {
        Version(_, _, _, Release::Alpha) => false,
        Version(_, _, _, Release::Beta) => false,
        Version(0, _, _, _) => false,
        _ => true,
    }
}
Notice that the patterns here overlap even without taking the underscore one into account: for Version(0, 1, 2, Release::Beta), the second and third pattern both match. That’s fine; the patterns are checked in the order they’re written and the first one that matches counts.
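The “first match wins” rule can be observed directly. In this sketch, matching_arm is a hypothetical variant of is_stable that returns the number of the arm that was taken instead of a bool:

```rust
enum Release {
    Alpha,
    Beta,
    Final,
}

struct Version(u32, u32, u32, Release);

// Hypothetical helper: same patterns as is_stable, but each arm
// returns its own position so we can see which one was chosen.
fn matching_arm(version: &Version) -> u8 {
    match version {
        Version(_, _, _, Release::Alpha) => 1,
        Version(_, _, _, Release::Beta) => 2,
        Version(0, _, _, _) => 3,
        _ => 4,
    }
}
```

For Version(0, 1, 2, Release::Beta) this returns 2: the Beta arm and the major-version-zero arm both fit, and the earlier one is taken.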
If Version were a struct with named fields rather than a tuple struct, the code would look something like this:
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
    release: Release,
}
fn is_stable(version: &Version) -> bool {
    match version {
        Version { release: Release::Alpha, .. } => false,
        Version { release: Release::Beta, .. } => false,
        Version { major: 0, .. } => false,
        _ => true,
    }
}
For structs, .. tells the compiler to ignore the remaining fields. That also works when matching on an array or a slice:
let numbers = &[0, 1, 0];
match numbers {
    &[0, ..] => println!("starts with zero!"),
    &[.., 0] => println!("ends with zero!"),
    &[a, .., z] => println!("goes from {} to {}", a, z),
    _ => {}
}
This prints “starts with zero!”. The {} in the last arm is an empty block. We can use that for cases where we don’t want to do anything.
The third pattern (&[a, .., z]) shows something else interesting: we can use variables in a pattern, and when a value matches the pattern, those variables will be bound to whatever was at the corresponding position in the value.
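Bindings are most useful with slices whose length isn’t known in advance. A minimal sketch, assuming a made-up function describe_slice that takes a slice instead of a fixed-size array:

```rust
// Hypothetical helper for illustration.
// The three patterns cover slices of length 0, 1, and 2 or more,
// so the match is exhaustive without a `_` arm.
fn describe_slice(numbers: &[i32]) -> String {
    match numbers {
        [] => String::from("empty"),
        // `only` is bound to the single element.
        [only] => format!("just {}", only),
        // `first` and `last` are bound to the outer elements;
        // `..` skips whatever lies in between (possibly nothing).
        [first, .., last] => format!("from {} to {}", first, last),
    }
}
```

Calling describe_slice(&[1, 2, 3]) binds first to 1 and last to 3; for a two-element slice the .. simply matches zero elements.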
We can also pattern match on strings, or &str to be precise:
fn parse_boolean(s: &str) -> Option<bool> {
    match s {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}
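The “&str to be precise” part matters when the value at hand is an owned String: string literal patterns have type &str, so the String has to be borrowed as a string slice first. A small sketch, with parse_boolean_owned as a hypothetical variant of the function above:

```rust
// Hypothetical variant of parse_boolean that takes an owned String.
fn parse_boolean_owned(s: String) -> Option<bool> {
    // `as_str()` borrows the String as a &str, which is what the
    // string literal patterns below can be compared against.
    match s.as_str() {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}
```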
If you were hoping to use regular expressions like with Ruby’s case expressions, I’ll have to disappoint you: range expressions for integers and characters are really the only “advanced” type of pattern allowed. I think the idea is that pattern matching should be a simple operation that can run in a fixed number of CPU instructions. That’s true of range expressions (they’re basically one >= comparison and one <= comparison), but certainly not of regular expression evaluation.
Pattern matching on floating-point numbers is possible at the moment, but it results in a compiler warning which is set to become an error in the future, so it’s probably smart to avoid it.
Let’s say we have a set of rows that make up a table, stored in this data structure:
struct Row {
    header: bool,
    number: u32,
    text: String,
}
We want to print the header row in bold text, and after that we want to show every other row with a grey background. A match expression with a guard lets us distinguish these cases in a nice readable way:
let row_format = match row {
    Row { header: true, .. } => bold_text,
    Row { number: n, .. } if n % 2 == 0 => white_background,
    _ => grey_background,
};
When the second pattern (Row { number: n, .. }) matches, the expression n % 2 == 0 is evaluated. If it’s false, the pattern matching continues with the next arm as if the pattern had not matched.
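Here is a self-contained sketch of the same idea. Since bold_text and the background values aren’t defined in this post, the hypothetical function row_format below returns plain string labels instead:

```rust
struct Row {
    header: bool,
    number: u32,
    text: String,
}

// Hypothetical helper: returns a label instead of a real format value.
fn row_format(row: &Row) -> &'static str {
    match row {
        Row { header: true, .. } => "bold",
        // The pattern matches every row; the guard then decides
        // whether this arm is taken or matching continues below.
        Row { number, .. } if *number % 2 == 0 => "white",
        _ => "grey",
    }
}
```

A header row with an even number still gets “bold”, because the first arm is checked before the guarded one.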
Speaking of bindings, so far we’ve only looked at pretty simple cases where a variable is bound to an integer. Let’s look at an example with nested structures:
#[derive(Debug)]
struct Name {
    given: String,
    family: String,
}
#[derive(Debug)]
struct Patient {
    name: Name,
    age: u8,
}
Let’s look at some simple code to print the data:
match p {
    Patient { name: n, age: a } if a < 18 => println!("child: {:?}", n),
    Patient { name: n, .. } => println!("adult: {:?}", n),
}
println!("Patient info: {:?}", p); // not allowed
That doesn’t quite work: the compiler complains about a “partially moved” value. We didn’t take into account the issue of ownership. The binding n in the patterns involves a transfer of ownership – a move – just like a binding with let would, so it’s moving the Name value out of the Patient and we end up with a partially moved value. This wasn’t an issue in the previous examples because those used integers, which have copy semantics, so the values were copied, not moved.
The solution, of course, is to use references and borrowing. The ref keyword is used in patterns to indicate binding by reference:
match p {
    Patient { name: ref n, age: a } if a < 18 => println!("child: {:?}", n),
    Patient { name: ref n, .. } => println!("adult: {:?}", n),
}
println!("Patient info: {:?}", p);
We can also use mutable references to modify fields:
match p {
    Patient { name: ref mut n, age: a } if a < 18 => {
        n.given = String::from("--");
        n.family = String::from("--");
        println!("child, name redacted for privacy");
    },
    Patient { name: ref n, .. } => println!("adult: {:?}", n),
}
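An alternative to writing ref and ref mut by hand is to match on a reference to the value. When the matched expression is a &Patient or &mut Patient and the pattern itself contains no &, the compiler binds the variables by reference automatically. The sketch below assumes two made-up functions, describe_patient and redact_if_child, to show both cases:

```rust
#[derive(Debug)]
struct Name {
    given: String,
    family: String,
}

#[derive(Debug)]
struct Patient {
    name: Name,
    age: u8,
}

// Hypothetical helper. `p` is a shared reference, so `name` is bound
// as `&Name` and `age` as `&u8`; nothing is moved out of the Patient.
fn describe_patient(p: &Patient) -> String {
    match p {
        Patient { name, age } if *age < 18 => format!("child: {}", name.given),
        Patient { name, .. } => format!("adult: {} {}", name.given, name.family),
    }
}

// Hypothetical helper. `p` is a mutable reference, so `name` is bound
// as `&mut Name` and its fields can be overwritten in place.
fn redact_if_child(p: &mut Patient) {
    match p {
        Patient { name, age } if *age < 18 => {
            name.given = String::from("--");
            name.family = String::from("--");
        }
        _ => {}
    }
}
```

Because only references are bound, the Patient stays fully usable after either function returns.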
There’s one more special syntax for bindings in patterns. Sometimes you want to bind a name to some part of the pattern, for example if you’re using an integer range and you also want to have the actual integer in a variable:
match p {
    Patient { age: a, .. } if a < 18 => println!("child age {}", a),
    Patient { age: a @ 18..=24, .. } => println!("young adult, {}", a),
    Patient { age: a, .. } => println!("adult age {}", a),
}
This example is a bit artificial but I think you can see how the @ syntax would be useful with more complex nested patterns.
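As a small sketch of @ inside a nested pattern, the hypothetical function describe_number below matches on an Option and binds the inner value while also checking it against a range:

```rust
// Hypothetical helper for illustration.
fn describe_number(value: Option<u32>) -> String {
    match value {
        // `d @ 0..=9` checks the range and binds the matched value to `d`.
        Some(d @ 0..=9) => format!("single digit: {}", d),
        Some(n) => format!("larger number: {}", n),
        None => String::from("nothing"),
    }
}
```

Without @, the first arm could test the range or bind the value, but not both in the same pattern; a guard would be needed instead.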
The match expression isn’t the only way to use pattern matching. For one, you can use simple patterns in let bindings:
let coordinate = (0, 120, 30);
let (_, y, _) = coordinate;
println!("Y: {}", y); // prints 120
Another way is the if let construct, which combines pattern matching with an if:
if let (0, y, _) = coordinate {
    println!("x is zero, y is {}", y);
}
There’s also a while let statement that does the same thing for while loops.
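A typical use of while let is draining a collection. In this sketch, pop_all is a made-up function; Vec::pop is the standard library method that returns Some(last element) or None once the vector is empty:

```rust
// Hypothetical helper for illustration.
// Pops every element off the stack and returns them in the order
// they were popped (last in, first out).
fn pop_all(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // The loop body runs as long as `pop()` returns `Some(..)`
    // and stops at the first `None`.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}
```

Called with vec![1, 2, 3], the loop runs three times and the result is the elements in reverse order.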