Four charts to understand causes of death across the lifespan: A dataviz walkthrough
The key question I ask myself when making a data visualization: What question am I trying to answer?
This is a new section within Scientific Discovery, called “Notes”, where I share thoughts on technical topics usually with a research-focused angle. You can opt out of this section while still keeping up with my regular posts.
Much of my work these days is about data visualization. It still feels a little new and I’d been so used to writing that I hadn’t fully grasped why visualization mattered until recently.
But the big reason I’ve learnt to visualize data is to explore and learn about a topic in a way that would be difficult with just a written scientific description.
I get the feeling that many people underestimate how valuable this can be, so in this post I’m hoping to give you a greater appreciation for how data visualization can help explore, probe, and understand a topic more deeply.
Each chart in this post can be recreated with code I’ve shared on my GitHub.
The key thing to remember with data visualization is to ask: what question am I trying to answer?
Iʼll do this with a topic I’ve often written about — causes of death. This time I’m interested in the question: How do causes of death vary with age?
It’s a fairly broad question, and it’s not obvious how to answer it. So Iʼll start by just exploring a few different chart types on the topic. To do this, I’ve created charts using national data from death certificates across the United States, between 2018 and 2021.
This first chart is also the simplest: a stacked bar chart showing the number of deaths from each cause across age groups.
What stands out is that most deaths tend to occur at older ages, and come from a range of causes, but especially cardiovascular diseases (pinkish brown) and cancers (blue).
I can also see that ‘external causes’ (in red) are prominent in youth. This is a broad category that includes violence, accidents, suicides, overdoses, and some other causes of death.1
And infancy also contributes to a large number of deaths overall. Most infant deaths are from perinatal conditions (purple).
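As a rough illustration of what sits behind a stacked bar chart like this, here is a minimal sketch that counts deaths by age group and cause. It is separate from the code on GitHub, and the records, age groups, and the helper `stacked_segments` are made up for the example.

```python
from collections import Counter

# Hypothetical records, one per death: (age_group, cause). Not real data.
records = [
    ("0-4", "Perinatal conditions"),
    ("15-19", "External causes"),
    ("15-19", "External causes"),
    ("80-84", "Cardiovascular diseases"),
    ("80-84", "Cancers"),
    ("80-84", "Cardiovascular diseases"),
]

# Number of deaths for every (age_group, cause) pair.
counts = Counter(records)


def stacked_segments(counts, age_group):
    """Return the bar segments (cause -> number of deaths) for one age group."""
    return {cause: n for (age, cause), n in counts.items() if age == age_group}


segments = stacked_segments(counts, "80-84")
bar_height = sum(segments.values())  # total deaths in that age group
print(segments, bar_height)
```

Each bar is one age group, each colored segment is one cause, and the full bar height is the total number of deaths at that age.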
At the same time, there’s something strange about this chart. It looks like many of these causes of death are leveling off with age. Is that really the case?
To understand whether the risks of death are leveling off with age, I can instead look at the death rate from each cause at different ages.
I’ve shown this with line charts below. The data again comes from death certificates in the United States between 2018 and 2021.
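For readers who want the arithmetic, a death rate is usually expressed as deaths per 100,000 people per year. The sketch below shows one way to compute it. The function name `death_rate_per_100k`, the `years` parameter, and the numbers are assumptions for illustration, and it treats the population as constant over the period.

```python
def death_rate_per_100k(deaths, population, years=1):
    """Deaths per 100,000 people per year.

    `years` is the number of calendar years the death count covers.
    Assumption of this sketch: the population is the same in every year.
    """
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    return deaths * 100_000 / (population * years)


# Made-up numbers, for illustration only.
one_year = death_rate_per_100k(deaths=500, population=2_000_000)
four_years = death_rate_per_100k(deaths=2_000, population=2_000_000, years=4)
print(one_year, four_years)
```

Dividing by the population is what removes the effect of age groups having very different sizes.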
As you can see, the risks don’t level off. They continue to rise exponentially with age for many causes of death. But some causes are also very deadly during infancy.
And you can see clearly that ‘external causes’ don’t fit the usual pattern: they rise very suddenly during adolescence. In fact, they are the only cause of death with mortality rates that high at that age.
I’ve written an article about this here: How do the risks of death change with age — and how have they changed over time?
But why did causes of death appear to level off with age in the previous chart, showing the number of deaths?
Well, let’s think about what the number of deaths depends on: both the risk of dying and the number of people who were alive at that age and so at risk of dying.
So let’s plot the population size at each age in the United States. I’ve shown this in a frequency chart below.2
Now I can see that the population is actually smaller at older ages, which is why the total number of deaths plateaus even though the risk of dying keeps rising.
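To make that concrete, here is a small sketch with made-up numbers (not the real US data). The number of deaths is the death rate multiplied by the population, so a rate that keeps climbing can still produce a flat or falling count when the population shrinks quickly enough.

```python
# Made-up rates and populations for three ages; not real data.
ages = [70, 80, 90]
rate_per_100k = {70: 2_000, 80: 5_000, 90: 14_000}
population = {70: 3_000_000, 80: 1_500_000, 90: 500_000}

# deaths = rate per 100,000 x population / 100,000
deaths = {age: rate_per_100k[age] * population[age] // 100_000 for age in ages}

for age in ages:
    print(age, rate_per_100k[age], population[age], deaths[age])
```

In this toy example the rate rises sevenfold between ages 70 and 90, yet the number of deaths at 90 is lower than at 80.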
Since weʼve now solved that little mystery, let’s go back to the chart showing death rates.
While I like this chart, I think it has drawbacks.
I can’t easily answer the question ‘How do causes of death vary with age?’ because this chart makes it hard to compare causes across age groups. Each cause of death is shown with a different y-axis scale, so to compare any pair of causes I’d have to keep flitting my eyes between the y-axes, the x-axes, and the cause-of-death labels, which is annoying.
And since some of them are so much smaller than others, if I put them all on the same y-axis scale, some causes would look so small that I wouldn’t be able to see precisely how their risks changed with age.
A friend suggested I try something else: if I stack death rates from different causes, then I can see them all on the same chart and also see the total death rate from different causes at different ages.
I’ve shown this in the chart below. Here, the total height of the curve shows the total annual death rate per 100,000 people at that age.
[Stacking rates is an approach that often makes me feel queasy, but it works here because causes of death are mutually exclusive: each person has a single ‘underlying cause of death’ coded from their death certificate.]
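Here is a minimal sketch of why stacking is valid in this case, using made-up numbers for a single age group. Because each death is counted under exactly one cause and every rate shares the same denominator, the cause-specific rates add up to the all-cause rate.

```python
from itertools import accumulate

# Made-up numbers for one age group; not real data.
population = 1_000_000
deaths_by_cause = {
    "Cardiovascular diseases": 300,
    "Cancers": 200,
    "External causes": 50,
}

# Cause-specific rates per 100,000, all with the same denominator.
rates = {cause: d * 100_000 / population for cause, d in deaths_by_cause.items()}

# Upper edge of each stacked layer, in plotting order.
stack_tops = list(accumulate(rates.values()))

# All-cause rate computed directly from the total number of deaths.
total_rate = sum(deaths_by_cause.values()) * 100_000 / population
print(stack_tops, total_rate)
```

If a death could be counted under several causes at once, the top of the stack would overshoot the true total rate, which is why stacking rates is often a bad idea.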
This chart helps clear up some of the previous confusions. It also shows that the total risk of death isn’t leveling off, but continues to rise with age. And it helps to see the relative risks of dying from different causes at each age.
On the other hand, the age-patterns within each cause of death aren’t very easy to see anymore, because of the stacking. It isn’t clear here that cancer and cardiovascular death rates continue to rise with age, for example.
As for the youngest ages, honestly I can barely tell what’s going on.
So let’s try a fourth way of exploring the data. This time, I’ll look at the relative share of deaths from each cause at different ages. This stretches the stacked chart so that every age fills the full height, from 0% to 100%.
Now I can see precisely what share of deaths at each age results from different causes.
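The calculation behind a chart like this is a simple normalization within each age group. The sketch below uses a hypothetical helper, `shares_percent`, and made-up counts for one age group.

```python
def shares_percent(values_by_cause):
    """Convert counts (or rates) by cause into percentage shares of the total."""
    total = sum(values_by_cause.values())
    if total == 0:
        raise ValueError("total must be non-zero")
    return {cause: 100 * v / total for cause, v in values_by_cause.items()}


# Made-up deaths in one young age group; not real data.
deaths = {"External causes": 30, "Cancers": 5, "Other causes": 15}
shares = shares_percent(deaths)
print(shares)
```

The shares always sum to 100 within an age group, so the chart discards how many deaths there were in total at that age.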
This helps me see that most deaths in childhood and adolescence are caused by ‘external causes’.
I can also see that at older ages, diseases have gained prominence. Cancers and cardiovascular diseases have become the most common causes of death at older ages, and causes of death have also become more varied.
But what’s not helpful is that, by focusing on the relative share, this chart obscures the underlying reasons for these trends.
For example, although external causes look very prominent here, deaths from other causes at older ages actually dominate overall, as we saw in the first chart.
Or take a look at the apparent decline in cancers after the age of around 65, for example. Unfortunately, this doesn’t reflect an underlying decline in the risks of dying from cancer.
Both cancer and cardiovascular disease death rates actually continue to rise exponentially with age, as we saw before. (I’ve repeated the chart below.) If you look at the y-axis scales, you’ll notice that death rates from cardiovascular diseases rise much more steeply than those from cancers. This is why cancer deaths appeared to decline: they were declining only in relative terms.
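A toy example shows how a rate can rise while its share falls. The rates below are made up and only meant to show the mechanism, not the real magnitudes.

```python
# Made-up death rates per 100,000 at two ages; not real data.
rates = {
    65: {"Cancers": 600, "Cardiovascular diseases": 400, "Other causes": 500},
    85: {"Cancers": 1_500, "Cardiovascular diseases": 4_500, "Other causes": 4_000},
}


def cancer_share(age):
    """Cancer's percentage share of the total death rate at a given age."""
    return 100 * rates[age]["Cancers"] / sum(rates[age].values())


# How many times higher each rate is at 85 than at 65.
growth = {cause: rates[85][cause] / rates[65][cause] for cause in rates[65]}

share_65, share_85 = cancer_share(65), cancer_share(85)
print(growth, share_65, share_85)
```

Here the cancer rate grows 2.5-fold, but the other rates grow faster, so cancer's share of deaths drops from 40% to 15%.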
With all these charts, I’ve now learnt a lot about different aspects of data on causes of death across the lifespan. I’ve also explained what I like about each chart and what I don’t.
But which one do I prefer?
Letʼs think about the question I had initially: How do causes of death vary by age?
Unfortunately, no single chart answered it satisfactorily, and I think the question was just too broad to answer with a single chart.
But each chart helped me gain a broader understanding of the topic, and told me something meaningful on its own.
So here are the four charts again, this time titled with the specific questions I think they answer.
This first chart below — the number of deaths — simply shows how many people are dying from each cause at each age. It also helps to see the total number of deaths at each age.
The second chart below — the death rates — shows how the risk of dying from each cause varies with age.
It’s not geared towards comparing different causes, but to seeing the age patterns within each cause separately.
The third chart below, where death rates are stacked, actually helps me compare them. It also helps me see the total risk of death across causes. It answers the question: what are the risks of dying from different causes at each age?
The fourth and final chart below, the relative share, was confusing to understand at first.
Many different factors affect this figure: not just the risks from each cause, but also how they compare and displace each other at older ages.
But if we step back, the question it answers is actually fairly simple. What are people dying from at different ages?
Each chart I’ve shown answers a valuable question on its own. Of course, not every chart needs to be titled with a question. I’ve placed these questions in the titles because they help remind me of what exactly I’m trying to explore and understand.
I think exploring different types of charts has helped me understand the topic overall, and also helped me understand what each type of chart is useful for.
If you want to recreate these charts, you’ll be able to find the data and code on my GitHub. There you’ll also find code to split the charts by sex. There’s also another version to split ‘external causes’ into its subcategories. (If you do decide to recreate or adapt any charts with my code, I’d love to know what you create!)
I hope you enjoyed this! As I shared earlier, this is a new section within Scientific Discovery called “Notes”, where I share thoughts on sometimes more technical topics.
Iʼll treat this as a separate blog, but you can opt out of these posts separately here.
As always, if you’ve spotted any errors in this post, I’d very much appreciate knowing. I offer rewards for it!
See you next time! :)
– Saloni
1. You can browse the full list of causes of death on the ICD website, which is fairly easy to navigate.
2. The chart showing the number of deaths from each cause looked at deaths from 2018–2021 combined, so it showed the total number of deaths across four years. But in the population chart, I’ve focused on the population size in a single year, 2018, so that the numbers are more intuitive to a reader.
I really love this, excellent visualizations and explanations of each. I would be thrilled if you did a similar set of visualizations that look at “risk factors“ for mortality. I love using good visualizations for risk factors to help people understand exactly how they should prioritize their efforts not to die, at various ages.
Very nice graphs
I wonder how one could calculate the "number of life years lost" due to a particular cause of death, if it's even a coherent concept
I feel like it would match the idea of the "seriousness" or "importance" of a particular danger relatively well