A context-aware error library for Rust that supports arbitrary attached user data
June 10th, 2022
We are proud to announce a new error library for Rust: error-stack
- a context-aware error library with arbitrary attached user data.
This post outlines some of our experiences with error-handling in Rust, why we built the error-stack
library, the architectural choices we made along the way, and what you can expect from the crate.
At HASH, we've been building a simulation engine — written in Rust — called HASH Engine (or 'hEngine' for short). It's a large codebase, with a number of contributors, and its size and complexity has grown over time (check out the code on GitHub). Across some big refactors, as well as everyday feature development, we found error handling became unwieldy and ourselves struggling to keep up. In a bid to improve our debugging process and reduce maintenance overheads we decided to revisit our strategy for managing errors.
Up until recently our approach to error handling centered around using thiserror
, a fantastic crate which abstracts away a lot of Rust's verbosity when creating errors. Previously we would create enumerations (generally large, top-level ones, per crate and/or major module) to describe our different error types, and use thiserror
's macros to handle a lot of the trait implementations. We didn't want to be using Box<dyn Error>
everywhere, meaning we wanted typed errors, so we utilized thiserror
's ability to generate From
implementations from a macro to handle the transformation of our error types across the module and crate boundaries. (thiserror
lets you succinctly define a From
implementation like so: IoError(#[from] std::io::Error)
).
Our approach led us to adopt a pattern where we'd call functions across boundaries (for example, from the engine
module to our simulation
crate), necessitating transformation between error types. For ergonomic reasons, we'd then create a From
implementation for the whole Error
enumeration:
// mod engine
#[derive(ThisError)]
enum Error {
#[error("Simulation error: {0}")]
Simulation(#[from] crate::simulation::Error),
...
}
This would then often turn into error messages that looked like the following:
Error: Engine Error: Simulation Error: Experiment Error: Datastore Error: No such file or directory (os error 2)
These error messages would tell us very little about the context that led to the error and weren't very helpful in actually tracking down the causes of problems. And while we have logging infrastructure in place, which is helpful with identifying events that lead to errors, it relies on a few things:
the logging level is set to something detailed enough to point the person debugging to the correct location (not knowing how to reproduce the error can make it difficult to just rerun with RUST_LOG=trace
);
there are actually enough log statements in the correct places to give enough information;
the log statements have been written in a way that's helpful, and provide the correct information (which is a challenge by itself, as you'll often find you needed information you hadn't thought about before).
Perhaps most importantly though, hEngine is a multi-threaded, multi-language project. The consequence of this is that there will often be a mix of logs from various threads appearing before an error, resulting in it not always being straightforward to tell which piece of logic may be causing a problem.
It's worth mentioning that thiserror
is able to capture backtraces, which helps a fair bit. We started utilizing this by adding a backtrace
field for some of our error variants, however, in doing so we ran into two issues.
Firstly, we needed to explicitly add the field to all error variants expected to have a backtrace:
// mod engine
#[derive(ThisError)]
enum Error {
#[error("Simulation error: {0}")]
Simulation(#[from] crate::simulation::Error, Backtrace),
...
}
Secondly, this required rerunning the program with RUST_BACKTRACE=1
, or always running the program with it on.
Additionally, while backtraces help us locate the line of code which generated the error (which helps with disambiguating the example error message shown above), it's still missing a lot of other information that could prove helpful, such as which iteration of a loop caused the error, or what path was missing.
As mentioned, we've tried to keep our errors well-typed by defining them as variants within various enumerations. This introduced a lot of overhead that we struggled with during some periods of rapid development. A significant subset of errors seemed to happen once, and as such, we'd generally define a variant like so on our enums:
#[derive(ThisError)]
enum Error {
/// Used when errors need to propagate but are too unique to be typed
#[error("{0}")]
Unique(String),
...
}
impl From<&str> for Error {
fn from(s: &str) -> Self {
Error::Unique(s.to_string())
}
}
impl From<String> for Error {
fn from(s: String) -> Self {
Error::Unique(s)
}
}
This was ergonomic at the time of error creation because we could quickly create one:
Error::from("Failed to initialize Context Package Creators")
However, in practice, we found ourselves often misusing these due to their relative convenience. This would lead to errors such as the following…
OrchestratorError::from(format!("Not a valid initial state file: {path:?}"))
…where we ended converting the Path
to a (Debug) string rather than attaching the actual object in case needed at the point of handling.
This was through no fault of thiserror
, but our usage of it in practice, and the patterns we found ourselves settling into.
thiserror
supplies a macro to generate From
implementations for your error types, which is great when it works, helping avoid boilerplate becoming huge and unmaintainable. Unfortunately, it can't handle cyclic dependencies where you might have two-way conversions between your error types. While we have since majorly refactored our code-base to have a much better separation of concerns, we used to have great interdependency which caused this problem to occur more often than ideal. Due to this, we found ourselves employing an embarrassingly reproachable pattern in our code:
// mod experiment_controller
...
.map_err(|experiment_err| Error::from(experiment_err.to_string()))?;
We already had an experiment::Error
From
experiment_controller::Error
implementation, which meant that we were unable to define a From
implementation going the other way. To get around this (initially temporarily, but as with many things, this ended up sticking around for a while) we simply turned the experiment error into a String
and utilized the Error::Unique
variant again.
This was clearly not ideal, and we ended up with a weird mixture of well-defined error variants and ones that were just captured in strings. The conversions also resulted in information loss depending on the Display
implementation, including the loss of important pieces of information like the backtrace.
We really love the Rust ecosystem, and although there are many great error crates out there, we struggled to find one that aligned with our needs, and which worked well with some of our project's inherent complexity. The closest error crate we could find was anyhow
(or its fork eyre
), however, we had some additional wants:
encourage the user to provide a new error type if the scope is changed, usually by crossing module/crate boundaries (for example, a ConfigParseError
when parsing a configuration file vs. an IoError
when reading a file from disk)
be able to use those error types within return types, without needing to handle difficult From
logic
be able to attach any data to an error without a lot of configuration [1], not just string-like types, which then can be requested when handling the error.
Based on our experiences outlined above, we made a few additional design decisions that differentiated our approach to error handling from those existing crates designed to facilitate it. Generally speaking, these decisions were made to force developers to be more explicit when creating or chaining errors, but most importantly, we don't supply in-built support for creating errors from strings. While it is convenient, we want to encourage developers to think more about their errors by building concrete error types like std::io::Error
or std::num::ParseIntError
Note: It's still possible to create something like
Error::Unique
but we do not provide shortcut functionality for that within the library, hoping that a lot of its uses can instead be solved through “attachments”.
[1] Clarification: eyre
and other libraries are able to support the inclusion of arbitrary data, whether that be through a custom Handler implementation, or by creating Error types that wrap the data. We'd like to be able to easily attach anything to the Err
arm of a Result
without needing bespoke configurations.
error-stack
is centered around taking a base error and building up a full richer picture of it as it propagates. We call this a Report
(you might recognize the naming choice of Report
from the eyre
crate, which we thought was fitting).
This Report
on the error is made-up of a combination of two main concepts:
Contexts
Attachments
A Report
is organized as a stack, where you push contexts and attachments. We call this the Frame
stack as each context and attachment is contained in a Frame
alongside the code location of where it was created. The Report
is able to iterate through the Frame
stack from the most recent one to the root (on a 'Last In, First Out' basis).
These are views of the world, they help describe the current section of code's way of seeing the error — a high-level description of the error. The current context of a Report<T>
is captured in its generic argument.
When an error is created, this might often be the base error type, so for a given error E
, this will be turned into Report<E>
when creating a Report
from E
. We expect this parameter to change at major module boundaries and across crate boundaries, where the specificity of the error might not have much meaning anymore:
fn parse_config(path: impl AsRef<Path>) -> Result<Config, Report<ParseConfigError>> {
let path = path.as_ref();
// First, we have a base error:
let io_result = fs::read_to_string(path); // Result<File, std::io::Error>
// Convert the base error into a Report:
let io_report = io_result.report(); // Result<File, Report<std::io::Error>>
// Change the context, which will eventually return from the function:
let config_report = io_report
.change_context(ParseConfigError::new())?; // Result<File, Report<ParseConfigError>>
// ...
}
Having the concept of a current context has an important implication: when leaving the scope of a function it's likely that the return type from the caller function has a different type signature. This implies that a new Context has to be added. This encourages the developer to think more about their error types and forces reasonable good cause traces.
For a Context
we provide a trait, which is implemented for std::error::Error
by default (if the Error
is Send + Sync + 'static
):
pub trait Context: Display + Debug + Send + Sync + 'static {
fn provide<'a>(&'a self, demand: &mut Demand<'a>) {}
}
provide()
is an optional feature making use of the Provider
RFC. When providing data here, they later can be requested by calling Report::request_ref()
or Report::request_value()
.
Note: The implementation for the Provider
RFC is not available on nightly at the time of writing, so we have vendored in the functionality for now. We will switch to the upstream implementation in core::any
with a future release of error-stack
. Using the Provider
API currently requires the Rust nightly compiler.
Frequently you'll want to attach specific information to an error, requiring it to be encapsulated within the error type, similar to crates like anyhow
or failure
where you are able to attach string-like types. However, error-stack
is not only able to attach messages but also any other data to the Report
, for example, it's possible to attach a Suggestion
. Using the example from above, we add a contextual message and a Suggestion
(not offered by error-stack
) to the Report
:
fn parse_config(path: impl AsRef<Path>) -> Result<Config, Report<ParseConfigError>> {
let path = path.as_ref();
let content = fs::read_to_string(path)
.report()
.change_context(ParseConfigError::new())
.attach_printable_lazy(|| format!("Could not read file {path:?}"))
.attach(Suggestion("Use a file you can read next time!"))?;
// ...
}
It's then possible to request the Suggestion
s from the Report
, which returns an iterator (most recent attachment first) as you can attach the same type multiple times:
if let Err(report) = parse_config("config.json") {
eprintln!("{report:?}");
for suggestion in report.request_ref::<Suggestion>() {
eprintln!("Suggestion: {suggestion}");
}
}
Note: As mentioned before, calling request_ref()
on a Report
currently requires a nightly compiler as this is using the Provider API.
This will eventually output:
Could not parse configuration file
at main.rs:6:10
- Could not read file "config.json"
- 1 additional opaque attachment
Caused by:
0: No such file or directory (os error 2)
at main.rs:5:10
Stack backtrace:
0: error_stack::report::Report<T>::new
at error-stack/src/report.rs:187:18
1: error_stack::context::<impl core::convert::From<C> for error::report::Report<C>>::from
at error-stack/src/context.rs:87:9
2: <core::result::Result<T,E> as error::result::IntoReport>::report
at error-stack/src/result.rs:204:31
3: parse_config
at main.rs:4:19
4: main
at main.rs:13:26
5: ...
Suggestion: Use a file you can read next time!
You might think that this comes with a lot of performance overhead, but with speed being a key concern of hEngine, we've taken this into account. The impact on the ‘happy' success code path is minimal; a Report
is only one pointer in size and requesting from a Report
only happens at the time of error reporting. We use a similar data layout as anyhow
and thus expect similar performance.
However, we've purposefully added in some requirements that result in a little bit of extra upfront development overhead. We think having explicitly typed Result
values is worth requiring, and especially helpful in identifying different types across module/crate boundaries, so we still want users to create types (Context
s). This approach makes error-stack
usable not only for binaries but also for libraries. In most cases, the caller of a library only cares about the error directly returned from a library, not an underlying OS error. As the current context is always known, the user directly has all type information without optimistically trying to downcast to an error type (which remains possible). This also implies that more times than not a developer is forced to add a new context because the type system requires it. We've found from experience that this automatically greatly improves error reporting.
We haven't made it easy to create a new Report
from a String
because it requires the type to implement Context
. It's still possible to use a newtype UniqueError(String)
if a user wants to, but we discourage this usage because we've found that a lot of those places benefit from attaching a string message on a well-defined context instead. In any case, other libraries most probably still have to change the context from UniqueError
to their context because of the explicit type requirement.
By consciously imposing more restrictions in some areas we hope to reduce the overall cost of development, in keeping with the philosophy underpinning Rust itself.
We don't have space here to talk about all of the features of error-stack
, but here are a few more teasers:
The complete crate is written for no-std
environments. However, when using std
, a blanket implementation for Context
for any Error
is provided
The blanket implementation for Error
also makes the library compatible with almost all other libraries using the official trait
This means that if you pick this up, you should be able to migrate gradually (like we're doing with hEngine)
Just like eyre
, the crate is able to set its own Display
and Debug
hooks that are called when printing the Report
, which means you can write your own custom formatting and do other cool things
We have in-built support for backtraces on the nightly channel and feature-flagged support for tracing
to capture a SpanTrace
Check out the error-stack
repository for even more information about the library.
As previously noted, we are still in the process of migrating our own codebase to error-stack
and we intend to make further improvements as we apply the crate to more of hEngine and our internal code at HASH.
During development of the library we included a Provider
implementation while waiting for the implementation of the Provider
RFC to be merged. This is the reason why the API currently requires the nightly release channel, and we will be updating the library to swap to the core::any
implementation in version 0.2.0.
We cannot guarantee full backward compatibility until this crate hits v1.0.0
. However, we will aim to avoid breaking changes, and in the event of any will announce them beforehand and provide a migration guide.
Contributions are greatly welcome. View the project code on GitHub and, if you like it, please star the repository!
We would like to thank the Error Project Group for their work on the Error
trait and the Provider RFC. It's unlikely we would have written this library without the latter — or at least not in a way we would have been truly satisfied with.
This crate is published under the MIT License.
Subscribe to our mailing list to get our monthly newsletter – you’ll be first to hear about partnership opportunities, new releases, and product updates
We're hiring full-stack TypeScript/React developers to help build the Block Protocol and HASH.
Learn more