Visualizing Data: When, Why, and How
Part 3: The Importance of Integrity: Maps – Potentials & Pitfalls
This article is part of a multi-part series on data visualization. Parts 1 and 2 focus on using data visualization throughout the data science workflow and determining when visualizing your data is an appropriate approach for communicating information. Part 3 focuses on factors that affect effective and honest communication of a data story. This is the final article of Part 3.
When data has an underlying geographical relationship, it often makes sense to plot it on a map. Maps are brilliantly intuitive, and often contribute substantially to our understanding of patterns in data. However, they also involve potential pitfalls that can affect the viewer’s interpretation of the data. Being aware of these issues can help you avoid them, whether you are an analyst creating a map or a viewer interpreting mapped data in the media.
Two potential issues are conflating territory size with the significance of the mapped data, and hidden correlations that affect data interpretation. Both issues ultimately come down to standardization or normalization.
Conflating region size with significance or count
When a map contains regions (countries, provinces, states, territories, etc.) that have a range of sizes, it is easy to conflate region size with the significance of the mapped data. That is to say, the data in larger regions can appear more substantial or more common than the data in smaller regions. There may be cases where this is appropriate, but others where it is not. Ultimately, this comes down to standardization: is the variable being mapped dependent on space or area, or is it dependent on the territory as a singular unit, i.e., count?
For example, what do you notice when you look at the plot below?
At first glance, it may appear that Group 3 (yellow) is the largest, and Group 2 (teal) might be comparable, but Group 1 (purple) looks somewhat less important.
In fact, the colors indicate the rank of each state (plus Washington, D.C.) by total state (or district) area. With 51 regions split into thirds, 17 must fall in the top third by area, 17 in the middle third, and 17 in the bottom third, so each group plotted above has exactly 17 members. Because the ranking counts each state as a single unit regardless of its area, coloring a map by rank produces a confusing graph: the groups are equal in membership but very unequal in the space they occupy on the map.
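To make the mismatch between group membership and visual area concrete, here is a minimal sketch. The areas are invented placeholder values (not real state areas), and `tercile_groups` is a hypothetical helper written for this illustration.

```python
# Hypothetical areas (arbitrary units) for 51 regions; NOT real state areas.
areas = [round(1000 * 0.9 ** i, 1) for i in range(51)]


def tercile_groups(values):
    """Rank values from largest to smallest and split the ranks into thirds.

    Assumes the number of values divides evenly by three, as 51 does.
    Returns three lists of indices into `values`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    size = len(values) // 3
    return [order[k * size:(k + 1) * size] for k in range(3)]


groups = tercile_groups(areas)
total_area = sum(areas)
labels = ("largest third", "middle third", "smallest third")
for label, members in zip(labels, groups):
    share = sum(areas[i] for i in members) / total_area
    print(f"{label}: {len(members)} regions, {share:.0%} of total area")
```

Every group has the same number of members, but the share of total area (and therefore the amount of color on the map) differs widely between groups.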
One of the best examples of this issue in the media is the plotting of election results on maps, where the “importance” or weight of each state is actually proportional to its electoral college votes rather than its area. Because of this, large states with fewer electoral college votes appear visually more important than small, dense states with more electoral college votes. This can skew a viewer’s impression of the degree of one candidate’s victory or loss.
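As a rough illustration of how far apart the two impressions can be, the sketch below compares a candidate's share of map area with their share of votes. The regions, areas, vote counts, and winners are all invented for this example, and `share_by` is a hypothetical helper.

```python
# Invented regions: (name, area, electoral_votes, winner). Not real data.
regions = [
    ("Large sparse region", 500, 3, "A"),
    ("Medium region", 200, 10, "B"),
    ("Small dense region", 50, 20, "B"),
]

AREA, VOTES, WINNER = 1, 2, 3  # positions within each tuple


def share_by(regions, weight_index, candidate):
    """Fraction of the total weight (area or votes) in regions won by candidate."""
    total = sum(r[weight_index] for r in regions)
    won = sum(r[weight_index] for r in regions if r[WINNER] == candidate)
    return won / total


area_share_a = share_by(regions, AREA, "A")
vote_share_a = share_by(regions, VOTES, "A")
print(f"Candidate A: {area_share_a:.0%} of map area, {vote_share_a:.0%} of votes")
```

In this made-up case, candidate A covers most of the map while holding a small minority of the votes, which is the impression a plain area-based map would give the viewer.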
See this interview with Thomas Powell for more discussion on this issue, as well as this NYTimes article on the many ways of mapping election results. In particular, the latter article notes that “the goal of the printed map is not to reveal who won the 2012 election — a simple bar chart would do that much better. The goal is to reveal intricately detailed geographic patterns…” Again, this brings us back to the key question: what is the purpose of your visualization, and what story are you trying to communicate?
Hidden correlations that affect data interpretation
When data plotted on a map is correlated with an underlying variable that changes across the map, the correlation affects the conclusions that you can draw from the map. For example, imagine we were interested in the number of pets in households across Canada, and we had access to data on the number of pet and pet supply stores and veterinarians in each province and territory (data sources are listed at the end of this article). Plotted on a map, the data looks like the following:
Looking at the maps above, one might think that Ontario is full of pet-lovers! It appears that there are far more pet stores and veterinarians in Ontario, and possibly Quebec, than in other provinces or territories.
However, this pattern looks strikingly like another pattern – that of population sizes. Mapped similarly, the distribution of the population across Canada looks like the following map.
If you normalize the data on pet stores and veterinarians for population size and plot it as a per capita (per 1000 people) value, it looks like this:
(Note that PEI has 2.1 veterinarians per 1000 people, but I’ve chosen a scale that goes up to 1 to highlight the differences at the lower end of the scale, and because PEI doesn’t show well on this map.)
Now a completely different pattern emerges! Normalized for population, pet stores are more common in British Columbia than in the rest of the country, and veterinarians are surprisingly abundant in PEI, Saskatchewan, and Alberta. With this normalization, the original data has a more reasonable context. On the other hand, trends may now be driven as much by population size as by your original variable. For example, the Yukon has only three pet stores compared to the Northwest Territories’ two, but its small population makes that extra pet store very noticeable. Also, the abundance of veterinarians per capita in PEI, SK, and AB is likely related to the fact that the five veterinary schools in Canada are located in these three provinces plus QC and ON, the latter two of which are more populous and thus do not stand out on this population-normalized map. (More on veterinarian abundances momentarily.)
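The per capita normalization and the capped color scale described above can be sketched as follows. The counts and populations are hypothetical placeholders rather than the data behind the maps, and `per_thousand` and `color_position` are names made up for this illustration.

```python
def per_thousand(count, population):
    """Return the count per 1000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 1000


def color_position(value, vmax=1.0):
    """Map a non-negative rate to a 0-1 position on a color scale capped at vmax."""
    return min(value, vmax) / vmax


# Hypothetical (count, population) pairs; these are not the article's data.
regions = {
    "small region": (3, 40_000),
    "large region": (750, 10_000_000),
}
rates = {name: per_thousand(c, p) for name, (c, p) in regions.items()}

# One extra store shifts the rate far more where the population is small.
shift_small = per_thousand(1, 40_000)
shift_large = per_thousand(1, 10_000_000)

# 2.1 per 1000 is the PEI veterinarian figure quoted above; with the scale
# capped at 1, it is drawn at the very top of the scale.
pei_position = color_position(2.1)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} per 1000 people")
print(f"effect of one extra store: {shift_small:.4f} vs {shift_large:.4f}")
```

Both hypothetical regions end up with the same rate, yet a single additional store moves the small region's rate hundreds of times more than the large region's. Capping the scale also means that any value at or above the cap gets the same color, so an explanatory note like the one above is still needed.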
Which is the more appropriate map to show? Again, it depends on your question. Are you interested in how many pet stores are in different areas, perhaps because you are making a decision about advertising that has to do with absolute number of pet stores? If so, the first maps of non-normalized values might make sense. Are you interested in whether overall interest level in pets varies across the country? Then it would make more sense to plot the data per capita, as in the second map. Are you more generally interested in variation across the country, and aren’t sure of your story or use case yet? Then perhaps you should look at both maps, to get a better sense of the dataset overall.
If you’re curious about all those vets in Saskatchewan and PEI…
(Note that PEI is at a whopping 160 veterinarians per pet store, but isn’t very visible on the map, and requires a color scale that makes other trends more difficult to see.)
Looked at this way, it is clear that veterinarians and pet stores are relatively evenly matched across the country, except in Saskatchewan and PEI, where there are substantially more veterinarians. Why might this be? Perhaps this trend is related to farm animals, which would be associated with veterinarians but not with pet stores.
One more point on maps: For maps just as for other visualizations, it is ideal to use a discrete or binned scale, because it makes it easier to compare between different colors. However, with so few regions, this leads to either very large bins or only one region per bin. Thus, I chose a continuous scale so that more of the variation in the data is visible.
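The trade-off with so few regions can be seen in a small sketch. The 13 values below are invented (one per province or territory, in arbitrary units), and `equal_width_bin` is a hypothetical helper using simple equal-width bins, which is only one of several possible binning schemes.

```python
from collections import Counter


def equal_width_bin(value, vmin, vmax, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    if value >= vmax:
        return n_bins - 1  # put the maximum in the last bin
    width = (vmax - vmin) / n_bins
    return int((value - vmin) // width)


# Invented values for 13 regions; not the article's data.
values = [4, 9, 13, 18, 22, 27, 31, 36, 42, 47, 55, 68, 90]

coarse = Counter(equal_width_bin(v, 0, 100, 4) for v in values)
fine = Counter(equal_width_bin(v, 0, 100, 20) for v in values)

print("4 bins: regions per bin =", dict(sorted(coarse.items())))
print("20 bins: regions per bin =", dict(sorted(fine.items())))
```

With four bins, several regions share a color even though their values differ noticeably. With twenty bins, each occupied bin holds a single region, so the binned scale offers little over a continuous one.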
- For a great discussion on the difference between data visualization for exploration and for explanation, see this O’Reilly article on the importance of data visualization
- For a truly wonderful book that is often considered the authoritative guide on visualizing data, see Edward Tufte’s The Visual Display of Quantitative Information
- I would be remiss if I did not mention Ivan Valiela’s Doing Science (Chapter 9: Presenting Data in Figures), in acknowledgment of the scientist who first fostered my now-strong opinions about scientific data visualization.
This is the last installment of the article series Visualizing Data: When, Why, and How. We hope that you have enjoyed reading it, and that you’ve picked up some new tools for incorporating visualization into your own work and for assessing how a visualization’s interpretation can be affected by its composition. And if you learned something new, please don’t hesitate to let us know on Twitter.
For more discussion of this content, you can also head over to YouTube for Lindsay Brin’s May 2018 ODSC East talk.
- Data sources: Approximate number of veterinarians in Canada: https://www.canadianveterinarians.net/about/statistics; Number of pet and pet supplies stores in Canada as of December 2016, by region: https://www.statista.com/statistics/452864/number-of-pet-stores-by-region-in-canada/; Canadian population by year, by province and territory: http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/demo02a-eng.htm