09 January 2019

Keeping a list of functions in Rust

What if you wanted to create your own Rust attribute that’s sort of like #[test], but not #[test]?

That is, you’ve got an attribute macro—let’s call it #[subcommand]—and at run time, you want a list of all functions tagged with that attribute. Maybe your program is a command-line build tool with several subcommands, and you want to use this list to process the command line, or even just to print out a nice help message.

You can, of course, manually write out the list you want, and just accept that you’re going to have to maintain it:

pub fn subcommands() -> Vec<SubcommandFn> {
    vec![
        build_command,
        run_command,
        help_command,
        // ...
    ]
}

That’s how cargo does it. Can we get rid of that boilerplate?

It's not easy to see how! The list would be a global variable. Rust strongly discourages those. Worse, your #[subcommand] attribute would have to somehow add values to the list, and Rust has no support for initialization code (code that runs “on startup”, before main)—much less sprinkling that kind of code throughout a crate.
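To make the problem concrete, here is a minimal sketch of what a global list could look like using only the standard library. The names SUBCOMMANDS, register, registered_names, and register_all, along with the SubcommandFn signature, are assumptions made up for this illustration. The global itself turns out to be the easy part; the hard part is that nothing calls register for you.

```rust
use std::sync::Mutex;

/// Hypothetical signature for a subcommand: takes its arguments, returns an exit code.
pub type SubcommandFn = fn(&[String]) -> i32;

/// The global list. This works as a plain `static` because `Mutex::new`
/// and `Vec::new` are both `const fn`.
static SUBCOMMANDS: Mutex<Vec<(&'static str, SubcommandFn)>> = Mutex::new(Vec::new());

/// Adds one entry to the global list.
pub fn register(name: &'static str, f: SubcommandFn) {
    SUBCOMMANDS.lock().unwrap().push((name, f));
}

/// Returns the names registered so far, in registration order.
pub fn registered_names() -> Vec<&'static str> {
    SUBCOMMANDS
        .lock()
        .unwrap()
        .iter()
        .map(|(name, _)| *name)
        .collect()
}

// Placeholder subcommands for the sketch.
fn build_command(_args: &[String]) -> i32 {
    0
}

fn run_command(_args: &[String]) -> i32 {
    0
}

/// The catch: nothing runs before `main`, so some function reachable from
/// `main` has to make these calls. That is the hand-maintained list again.
pub fn register_all() {
    register("build", build_command);
    register("run", run_command);
}
```

Until something calls register_all, the list is empty. An attribute on build_command has no way, within the language itself, to make that call happen.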

You could try writing a build.rs script that looks at your code, finds all the #[subcommand] functions, and builds the list. But this would be tricky; it’s easy to imagine it failing to find functions that are defined inside macros, for example.
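Here is a rough sketch of just the scanning step such a script might perform, written as a plain function over source text. A real build.rs would also have to read the source files and write out generated code, which is left out here. The function names and the line-by-line matching rule are assumptions chosen to keep the example short, not a recommended design.

```rust
/// Returns the names of functions declared on the line directly after
/// a `#[subcommand]` attribute line. Deliberately naive: it matches text,
/// it does not parse Rust.
pub fn find_subcommands(source: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut tagged = false;
    for line in source.lines() {
        let line = line.trim();
        if line == "#[subcommand]" {
            tagged = true;
            continue;
        }
        if tagged {
            if let Some(name) = fn_name(line) {
                names.push(name);
            }
            // Only the line immediately after the attribute is considered.
            tagged = false;
        }
    }
    names
}

/// Extracts `name` from a line such as `pub fn name(args: &[String]) {`.
fn fn_name(line: &str) -> Option<String> {
    let rest = line.strip_prefix("pub ").unwrap_or(line);
    let rest = rest.strip_prefix("fn ")?;
    let end = rest.find('(')?;
    Some(rest[..end].trim().to_string())
}
```

The weakness shows up right away. If a function is produced by a macro invocation (say, a hypothetical make_subcommand!(help_command)), the text #[subcommand] never appears in the file, so the scanner never sees it. The same goes for attributes written with different spacing or followed by other attributes.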

Today I asked a few friends how they might approach this, and got two different answers, each astonishing in its way.

  • You can use a custom linker section.

    Amazingly, you can put the #[link_section = "your_section_name"] attribute on the statics generated by your #[subcommand] procedural macro, and the compiler and linker will put those statics together, in a special place in the resulting binary. Accessing this at run time is then some platform-specific magic.

    I didn’t ask for details. Sometimes it’s better not to know.

  • Maybe it’s better not to do this.

    Put yourself in the shoes of a programmer who’s unfamiliar with this codebase. Is it helpful to you that the list is autogenerated? How do you find the code for the build subcommand? You'd have to know where that one function is defined, and it could be anywhere, because the #[subcommand] attribute works anywhere.

    This friend writes: “Since the user's experience is of a global table of subcommands, one could argue that the code should reflect that, that adding a subcommand entails adding an entry to a global list. Then it’d be vastly easier to find definitions, to find conflicts, to notice overlap.”
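For comparison, here is a sketch of the explicit global table that the second answer argues for. The Subcommand struct, its fields, the help strings, and the helper functions are all assumptions made up for this example; the point is only that one visible table can drive lookup, the help message, and a check for conflicting names.

```rust
use std::collections::HashSet;

/// Hypothetical signature for a subcommand: takes its arguments, returns an exit code.
pub type SubcommandFn = fn(&[String]) -> i32;

/// One row of the table.
pub struct Subcommand {
    pub name: &'static str,
    pub help: &'static str,
    pub run: SubcommandFn,
}

// Placeholder subcommands for the sketch.
fn build_command(_args: &[String]) -> i32 {
    0
}

fn run_command(_args: &[String]) -> i32 {
    0
}

fn help_command(_args: &[String]) -> i32 {
    print!("{}", help_text());
    0
}

/// The single place a reader has to look to find every subcommand.
pub static SUBCOMMANDS: &[Subcommand] = &[
    Subcommand { name: "build", help: "Compile the project", run: build_command },
    Subcommand { name: "run", help: "Build and run the project", run: run_command },
    Subcommand { name: "help", help: "Print this message", run: help_command },
];

/// Looks up a subcommand by the name given on the command line.
pub fn find(name: &str) -> Option<&'static Subcommand> {
    SUBCOMMANDS.iter().find(|cmd| cmd.name == name)
}

/// Builds the help message, one line per table entry.
pub fn help_text() -> String {
    SUBCOMMANDS
        .iter()
        .map(|cmd| format!("{:<8}{}\n", cmd.name, cmd.help))
        .collect()
}

/// Returns any name that appears more than once in the table.
pub fn duplicate_names() -> Vec<&'static str> {
    let mut seen = HashSet::new();
    SUBCOMMANDS
        .iter()
        .map(|cmd| cmd.name)
        .filter(|name| !seen.insert(*name))
        .collect()
}
```

Adding a subcommand means adding one line to SUBCOMMANDS, and a unit test asserting that duplicate_names() is empty would catch conflicts. It is boilerplate, but it is boilerplate you can search for.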
