Your data vis “Spidey-sense” & the need for a robust “utility belt”

@theboysmithy did a great piece on coming up with an alternate view for a timeline for an FT piece.

Here’s an excerpt (read the whole piece, though, it’s worth it):

Here is an example from a story recently featured in the FT: emerging- market populations are expected to age more rapidly than those in developed countries. The figures alone are compelling: France is expected to take 157 years (from 1865 to 2022) to triple the proportion of its population aged over 65, from 7 per cent to 21 per cent; for China, the equivalent period is likely to be just 34 years (from 2001 to 2035).

You may think that visualising this story is as simple as creating a bar chart of the durations ordered by length. In fact, we came across just such a chart from a research agency.

But, to me, this approach generates “the feeling” — and further scrutiny reveals specific problems. A reader must work hard to memorise the date information next to the country labels to work out if there is a relationship between the start date and the length of time taken for the population to age. The chart is clearly not ideal, but how do we improve it?

Alan went on to talk about the process of improving the vis, eventually turning to Joseph Priestly for inspiration. Here’s their makeover:

Alan used D3 to make this, which had me head scratching for a bit. Bostock is genius & I :heart: D3 immensely, but I never really thought of it as a “canvas” for doing general data visualization creation for something like a print publication (it’s geared towards making incredibly data-rich interactive visualizations). It’s 100% cool to do so, though. It has fine-grained control over every aspect of a visualization and you can easily turn SVGs into PDFs or use them in programs like Illustrator to make the final enhancements. However, D3 is not the only tool that can make a chart like this.

I made the following in R (of course):

The annotations in Alan’s image were (99% most likely) made with something like Illustrator. I stopped short of fully reproducing the image (life is super-crazy, still), but could have done so (the entire image is one ggplot2 object).

This isn’t an “R > D3” post, though, since I use both. It’s about (a) reinforcing Alan’s posits that we should absolutely take inspiration from historical vis pioneers (so read more!) + need a diverse visualization “utility belt” (ref: Batman) to ensure you have the necessary tools to make a given visualization; (b) trusting your “Spidey-sense” when it comes to evaluating your creations/decisions; and, (c) showing that R is a great alternative to D3 for something like this :-)

Spider-man (you expected headier references from a dude with a shield avatar?) has this ability to sense danger right before it happens and if you’re making an effort to develop and share great visualizations, you definitely have this same sense in your DNA (though I would not recommend tossing pie charts at super-villains to stop them). When you’ve made something and it just doesn’t “feel right”, look to other sources of inspiration or reach out to your colleagues or the community for ideas or guidance. You can and do make awesome things, and you do have a “Spidey-sense”. You just need to listen to it more, add depth and breadth to your “utility belt” and keep improving with each creation you release into the wild.

R code for the ggplot vis reproduction is below, and it + the CSV file referenced are in this gist.

library(ggplot2)
library(dplyr)

ft <- read.csv("ftpop.csv", stringsAsFactors=FALSE)

arrange(ft, start_year) %>%
  mutate(country=factor(country, levels=c(" ", rev(country), "  "))) -> ft

ft_labs <- data_frame(
  x=c(1900, 1950, 2000, 2050, 1900, 1950, 2000, 2050),
  y=c(rep(" ", 4), rep("  ", 4)),
  hj=c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
  vj=c(1, 1, 1, 1, 0, 0, 0, 0)
)

ft_lines <- data_frame(x=c(1900, 1950, 2000, 2050))

ft_ticks <- data_frame(x=seq(1860, 2050, 10))

gg <- ggplot()

# tick marks & gridlines
gg <- gg + geom_segment(data=ft_lines, aes(x=x, xend=x, y=2, yend=16),
                        linetype="dotted", size=0.15)
gg <- gg + geom_segment(data=ft_ticks, aes(x=x, xend=x, y=16.9, yend=16.6),
                        linetype="dotted", size=0.15)
gg <- gg + geom_segment(data=ft_ticks, aes(x=x, xend=x, y=1.1, yend=1.4),
                        linetype="dotted", size=0.15)

# double & triple bars
gg <- gg + geom_segment(data=ft, size=5, color="#b0657b",
                        aes(x=start_year, xend=start_year+double, y=country, yend=country))
gg <- gg + geom_segment(data=ft, size=5, color="#eb9c9d",
                        aes(x=start_year+double, xend=start_year+double+triple, y=country, yend=country))

# tick labels
gg <- gg + geom_text(data=ft_labs, aes(x, y, label=x, hjust=hj, vjust=vj), size=3)

# annotations
gg <- gg + geom_label(data=data.frame(), hjust=0, label.size=0, size=3,
                      aes(x=1911, y=7.5, label="France is set to take\n157 years to triple the\nproportion ot its\npopulation aged 65+,\nChina only 34 years"))
gg <- gg + geom_curve(data=data.frame(), aes(x=1911, xend=1865, y=9, yend=15.5),
                      curvature=-0.5, arrow=arrow(length=unit(0.03, "npc")))
gg <- gg + geom_curve(data=data.frame(), aes(x=1915, xend=2000, y=5.65, yend=5),
                      curvature=0.25, arrow=arrow(length=unit(0.03, "npc")))

# pretty standard stuff here
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(1860, 2060))
gg <- gg + scale_y_discrete(drop=FALSE)
gg <- gg + labs(x=NULL, y=NULL, title="Emerging markets are ageing at a rapid rate",
                subtitle="Time taken for population aged 65 and over to double and triple in proportion (from 7% of total population)",
                caption="Source: http://on.ft.com/1Ys1W2H")
gg <- gg + theme_minimal()
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(plot.margin=margin(10,10,10,10))
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(size=9.5, margin=margin(b=10)))
gg <- gg + theme(plot.caption=element_text(size=7, margin=margin(t=-10)))
gg

转自:https://rud.is/b/2016/06/16/your-data-vis-spidey-sense-the-need-for-a-robust-utility-belt/

时间: 2024-08-29 06:57:39

Your data vis “Spidey-sense” & the need for a robust “utility belt”的相关文章

notes: the architecture of GDB

1. gdb structure at the largest scale,GDB can be said to have two sides to it:1. The "symbol side" is concerned with symbolic information about the program. Symbolic information includes function and variable names and types, linenumbers, machin

Multivariate Adaptive Regression Splines (MARSplines)

Introductory Overview Regression Problems Multivariate Adaptive Regression Splines Model Selection and Pruning Applications Technical Notes: The MARSplines Algorithm Technical Notes: The MARSplines Model Introductory Overview Multivariate Adaptive Re

PCI Express(四) - The transaction layer

原文出处:http://www.fpga4fun.com/PCI-Express4.html 感觉没什么好翻译的,都比较简单,主要讲了TLP的帧结构 In the transaction layer, we receive "packets". There is a 32-bits bus and the packets arrive on the bus (packet lengths are always multiples of 32-bits). Maybe one packe

使用神经网络来识别手写数字【译(三)- 用Python代码实现

实现我们分类数字的网络 好,让我们使用随机梯度下降和 MNIST训练数据来写一个程序来学习怎样失败手写数字. 我们也难怪Python (2.7) 来实现.只有 74 行代码!我们需要的第一个东西是 MNIST数据.如果有 github 账号,你可以将这些代码库克隆下来, git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git 或者你可以到这里 下载. Incidentally, 当我先前说到 MNIS

fio2.1.10--HOWTO

Table of contents ----------------- 1. Overview 2. How fio works 3. Running fio 4. Job file format 5. Detailed list of parameters 6. Normal output 7. Terse output 8. Trace file format 9. CPU idleness profiling 1.0 Overview and history ---------------

python excel 文件合并

Combining Data From Multiple Excel Files Introduction A common task for python and pandas is to automate the process of aggregating data from multiple files and spreadsheets. This article will walk through the basic flow required to parse multiple Ex

Swift2

Oct 7, 2014 创建第一个Swift 应用Building Your First Swift App Video 目前Swift 博客关注在先进的编程话题,包括Swift 语言的设计概念.我们认为这对新开发者和该接触Swift 的开发者是有帮助的.为了让每个人都能够使用,我们放了一段小的视频,演示如何从零开始用Swift创建一个应用,用少于10分钟的时间. So far the Swift blog has focused on advanced programming topics,

Oracle Management Packs

http://kerryosborne.oracle-guy.com/2008/10/oracle-management-packs/ There has been quite a bit of confusion about the licensing of Oracle Management Packs and about their functionality. Here are a couple of links to the Oracle Docs which describe wha

Git From the Inside Out

Ref: https://codewords.recurse.com/issues/two/git-from-the-inside-out Git from the inside out Mary Rose Cook This essay explains how Git works. It assumes you understand Git well enough to use it to version control your projects. The essay focuses on