Rust: Algebraic Data Types


8 January 2023

Algebraic data types are one of those computer-science concepts that sound scary but are actually really useful, and not very hard to understand once you see them in practice. In fact, once you’ve gotten used to algebraic data types you’ll wonder why every programming language doesn’t have them.

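The idea is that you can define your own data types in one of two ways: as a product type, which combines several values into one (this and that), or as a sum type, which holds exactly one of several alternatives (this or that).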

What’s important is that you can combine both approaches. For example, you could have a Color type with two possible values: either an RGB value, which combines three integers for red, green, and blue, or a CMYK value, which combines four integers for cyan, magenta, yellow, and black. You can nest product and sum types any way you like: a product type containing a sum type that contains both a product type and a sum type, which contains another sum type, …

Algebraic data types in Rust

In Rust, you use enums as sum types, and structs as product types. (Tuples and tuple structs can also work as product types.) I talked about these in Rust: Tuples and Structs and Rust: Enums, but one thing I didn’t mention is that enum variants can contain values:

#[derive(Debug)]
enum Color {
    RGB(u8, u8, u8),
    CMYK(u8, u8, u8, u8),
}

let c = Color::RGB(0x10, 0x10, 0x80);
let d = Color::CMYK(80, 60, 40, 100);
println!("{:?}", c); // prints RGB(16, 16, 128)
println!("{:?}", d); // prints CMYK(80, 60, 40, 100)

RGB and CMYK look a lot like tuple structs now… could we make them more like structs? Why, of course we can!

#[derive(Debug)]
enum Color {
    RGB { red: u8, green: u8, blue: u8 },
    CMYK { cyan: u8, magenta: u8, yellow: u8, black: u8 },
}

let c = Color::RGB { red: 0x10, green: 0x10, blue: 0x80 };
println!("{:?}", c); // prints RGB { red: 16, green: 16, blue: 128 }

The old trick of using variable names that match field names also works:

let red = 0x10;
let green = 0x10;
let blue = 0x80;
let c = Color::RGB { red, green, blue };
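Once values are stored inside a variant, you need a way to get them back out, and that is what match is for. Here is a minimal sketch; the describe function is a hypothetical helper I made up for illustration, not something from the standard library.

```rust
#[derive(Debug)]
enum Color {
    RGB { red: u8, green: u8, blue: u8 },
    CMYK { cyan: u8, magenta: u8, yellow: u8, black: u8 },
}

// Hypothetical helper: turns a color into a short description.
fn describe(color: &Color) -> String {
    match color {
        // Each field name is bound to a variable of the same name.
        Color::RGB { red, green, blue } => format!("rgb({}, {}, {})", red, green, blue),
        Color::CMYK { cyan, magenta, yellow, black } => {
            format!("cmyk({}, {}, {}, {})", cyan, magenta, yellow, black)
        }
    }
}

fn main() {
    let c = Color::RGB { red: 0x10, green: 0x10, blue: 0x80 };
    let d = Color::CMYK { cyan: 80, magenta: 60, yellow: 40, black: 100 };
    println!("{}", describe(&c)); // prints rgb(16, 16, 128)
    println!("{}", describe(&d)); // prints cmyk(80, 60, 40, 100)
}
```

The compiler checks that the match covers every variant, so adding a third kind of color later would flag this function until it handles the new case.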

Another option is to define a separate struct for each variant:

#[derive(Debug)]
struct RGBColor(u8, u8, u8);

#[derive(Debug)]
struct CMYKColor(u8, u8, u8, u8);

#[derive(Debug)]
enum Color {
    RGB(RGBColor),
    CMYK(CMYKColor),
}

let c = Color::RGB(RGBColor(0x50, 0xd0, 0xd0));
println!("{:?}", c); // prints RGB(RGBColor(80, 208, 208))

This is more verbose but it’s useful if there are places where you’d use the RGBColor and CMYKColor types by themselves. As usual, Rust gives you more options than some languages, and it’s up to you as the programmer to choose the one that fits best.
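To make the "by themselves" case concrete, here is a small sketch of a function that only makes sense for RGB colors. The to_hex helper is hypothetical, just one way this could look.

```rust
#[derive(Debug)]
struct RGBColor(u8, u8, u8);

#[derive(Debug)]
struct CMYKColor(u8, u8, u8, u8);

#[derive(Debug)]
enum Color {
    RGB(RGBColor),
    CMYK(CMYKColor),
}

// Hypothetical helper: it takes an RGBColor rather than a Color,
// so it can never be handed a CMYK value by mistake.
fn to_hex(rgb: &RGBColor) -> String {
    format!("#{:02x}{:02x}{:02x}", rgb.0, rgb.1, rgb.2)
}

fn main() {
    let rgb = RGBColor(0x50, 0xd0, 0xd0);
    println!("{}", to_hex(&rgb)); // prints #50d0d0

    // The same struct can be wrapped in the enum wherever any color is accepted.
    let c = Color::RGB(rgb);
    if let Color::RGB(inner) = &c {
        println!("{}", to_hex(inner)); // prints #50d0d0
    }

    let d = Color::CMYK(CMYKColor(80, 60, 40, 100));
    println!("{:?}", d); // prints CMYK(CMYKColor(80, 60, 40, 100))
}
```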

Let’s look at an example with an enum nested within a struct. We want to represent version numbers like “1.0.0”, with an optional “alpha” or “beta” suffix:

#[derive(Debug, PartialEq, PartialOrd)]
enum Release {
    Alpha,
    Beta,
    Final,
}

#[derive(Debug, PartialEq, PartialOrd)]
struct Version(u32, u32, u32, Release);

let for_testing = Version(1, 0, 0, Release::Alpha);
let almost_done = Version(1, 0, 0, Release::Beta);
let big_release = Version(1, 0, 0, Release::Final);
println!("{:?}", big_release); // prints Version(1, 0, 0, Final)
println!("{}", for_testing < almost_done); // prints true
println!("{}", big_release > almost_done); // prints true
println!("{}", almost_done == big_release); // prints false

The way < and > are derived automatically is rather neat, but it only works because I’ve carefully chosen the order of the variants and fields: a derived comparison treats enum variants declared earlier as smaller, and compares struct fields one by one from first to last. If this were real code I’d probably add a comment to tell other programmers to take that into account if they need to make changes here.
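Here is a short sketch of how much the declaration order matters. The Misordered enum is a hypothetical counter-example I added for illustration: same variants, different order.

```rust
#[derive(Debug, PartialEq, PartialOrd)]
enum Release {
    Alpha,
    Beta,
    Final,
}

#[derive(Debug, PartialEq, PartialOrd)]
struct Version(u32, u32, u32, Release);

// Hypothetical counter-example: the same variants, declared in a different order.
#[derive(Debug, PartialEq, PartialOrd)]
enum Misordered {
    Final,
    Alpha,
    Beta,
}

fn main() {
    // Fields are compared left to right, so the numbers win over the release stage.
    println!("{}", Version(1, 0, 1, Release::Alpha) > Version(1, 0, 0, Release::Final)); // prints true
    // Only when all three numbers are equal does the Release field decide.
    println!("{}", Version(1, 0, 0, Release::Alpha) < Version(1, 0, 0, Release::Beta)); // prints true
    // With the variants declared in another order, Final now sorts below Beta.
    println!("{}", Misordered::Final > Misordered::Beta); // prints false
}
```

If Release came first in the Version struct, every alpha would sort below every final release regardless of the numbers, which is almost certainly not what you want.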