10 Thoughts About Scatterplots
I have been loving Eric Balash’s Back 2 Viz Basics challenge. Despite the fact that I do Tableau almost all day long at work, and then again some at night, I love the opportunity to flex my muscles on the basic but very necessary charts. The intro to this challenge was on bar charts (see my viz here). And I approached it by walking users through my thought process of building the chart as I manipulated it from one option to the next. Not only did this help me think through the pros and cons of each, but I hope it helped others realize some of those often overlooked differences as well.
I decided to participate in the next and current challenge as well: scatterplots. I love a good scatterplot and was excited to dig in. And while I had a little better idea of the design I was hoping for out of the gate, I still wanted to document the things I attended to and the decisions I made.
So here it is, 10 things I thought about while making my scatterplot (for the most part in the order I thought of them).
DISCLAIMER: Not all of these are ‘right’, or best practices. But I am being transparent about what my stream of consciousness is like while vizzing. The hope is less that you take away the same decisions that I made and more that it helps people become more intentional about their choices.
Things Covered:
Number format
Axes
Gridlines and borders
Reference lines
Color
Legend
Titles
Font
Tooltips
Wording
Number format
Do yourself a big favor and get in the habit of setting some things as soon as you open a new workbook, the default number format being one of them. I already knew that one of these values was a percentage, and rather than format it at the sheet level everywhere, I wanted to set it and forget it.
X and Y axes
A. What Goes Where: When I originally visualized this, I had the variables switched, with win percentage on the X and seasons coached on the Y. But I remember being told very early on that in a scatterplot the cause should go on the X and the effect on the Y. That isn’t so simple here, because I don’t know which is causing the other. Do coaches get brought back because they’re winning more, and therefore get to coach more seasons, or do they win more as they gain experience? That’s for the data (or maybe some experts) to tell you. In the end, though, this orientation made the most sense to me: first how long they’ve been doing it, then how well.
B. Display: I made a couple of adjustments here. The first was not starting the Y axis at 0. I try to tread lightly here, because I think it can skew the user’s perception of the trend, but I feel far more comfortable doing so when the axis is visible. That was part of my inclination to keep the axes showing. The other part was that the two measures are in different units. If they were seasons coached and seasons won, I might be less inclined to show both, but I try to eliminate room for confusion anywhere I can, and mixed units are one of those places. I’ll get more into the formatting of the axes in my font considerations.
Gridlines and borders
As the chart begins to take shape, a lot of clutter naturally comes into the view. Please note: you can turn off some of these things at the workbook formatting level if you’d like! I rarely find gridlines or borders helpful in scatterplots, and I didn’t feel either was necessary here, so I turned them off. I also turned off zero lines, which some people might not agree with. If both axes had started at 0 I would’ve kept them, but only one zero line would have shown, and it felt uneven.
Reference lines
Here’s where things started to get fun. I remembered seeing this viz by Ellen Blackburn (if you’re not following her, you should be), and I loved the idea of anchoring the analysis to a particular person, in this case a coach. So I added two reference lines (one for each measure on the scatterplot) centered on a selected coach, and let users pick a different coach with a parameter action.
Color
A. Value: The previous decision helped solve something that had been in the back of my head this entire time: what to put on color. I used school at first. I’m not sure why, but I thought there would be more overlap than there was (this is partially from my ignorance about the actual data). If there had been a field for conference (I believe that’s what the groupings of schools are called), that might have been a better initial play. But an important thing I want to call out here is not to put things on color just for the sake of it! Color should provide value in some way. Once I had decided on the reference lines, it made sense to put the quadrants they create on color.
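In Tableau this grouping would typically live in a calculated field driven by the parameter, but the logic itself is simple enough to sketch outside Tableau. Here is a minimal Python sketch, assuming each coach has a seasons-coached count (X) and a win percentage (Y), and that the selected coach is whoever the parameter action picked. The `Coach` class, `color_group` function, group labels, and sample coaches are all hypothetical, and sending ties to the ‘more’/‘higher’ side is my assumption rather than something from the viz.

```python
from dataclasses import dataclass


@dataclass
class Coach:
    name: str
    seasons: int      # seasons coached (x axis)
    win_pct: float    # win percentage (y axis), stored as 0.0-1.0


def color_group(coach: Coach, selected: Coach) -> str:
    """Place a coach in one of five color groups relative to the selected coach.

    The selected coach sits where the two reference lines cross; every other
    coach falls into one of the four quadrants those lines create.
    Assumption: ties go to the 'more'/'higher' side.
    """
    if coach.name == selected.name:
        return "Selected coach"
    more_seasons = coach.seasons >= selected.seasons
    higher_pct = coach.win_pct >= selected.win_pct
    if more_seasons and higher_pct:
        return "More seasons, higher win %"
    if more_seasons:
        return "More seasons, lower win %"
    if higher_pct:
        return "Fewer seasons, higher win %"
    return "Fewer seasons, lower win %"


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    selected = Coach("Coach A", seasons=20, win_pct=0.70)
    others = [Coach("Coach B", 25, 0.75), Coach("Coach C", 10, 0.60), Coach("Coach D", 30, 0.55)]
    for c in [selected, *others]:
        print(c.name, "->", color_group(c, selected))
```

That yields five groups: the selected coach plus one per quadrant.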
B. Choices: Now, what colors to choose? I get so much serotonin from this part of vizzing. I love color, and I regularly save color combinations that I like, so I went to my Pinterest board on color inspo and found this one. Some things to note about my choice: I knew I needed 5 distinct colors, but I wanted them to fall into roughly two color families, one signifying the selected coach and ‘higher’ performers and the other signifying ‘lower’ performers. (I’m using these terms loosely, because an argument can be made that someone with a higher percentage and less experience might actually be a better coach.) So I set turquoise tones for the selected coach and the upper-right quadrant, and warmer tones (pink, orange, yellow) for the rest. I’d also like to call out that for business use it is HIGHLY unlikely that I would use colors as flamboyant as these. I would go for a much more muted palette.
Legend
Colors: picked. Now to tell people what’s up. I often try to make my legends do a little extra work for me. If they’re taking up space on my dashboard, they might as well earn it! Don’t get me wrong, there’s plenty of value in just telling people what the colors in your viz mean. But why not provide additional context as well? So in addition to the colors and what they signify, my legend calls out the selected coach and the number of coaches in each bucket, and it doubles as a highlighter for the chart.
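The counts in a legend like this are just a tally of the color groups. A minimal Python sketch, assuming every coach has already been assigned one of five color-group labels; `LEGEND_ORDER`, `legend_labels`, and the label text are hypothetical, not taken from the viz.

```python
from collections import Counter

# Hypothetical legend order and labels, for illustration only.
LEGEND_ORDER = [
    "Selected coach",
    "More seasons, higher win %",
    "More seasons, lower win %",
    "Fewer seasons, higher win %",
    "Fewer seasons, lower win %",
]


def legend_labels(groups: list[str], selected_name: str) -> list[str]:
    """Build legend entries that carry each group's coach count.

    `groups` holds one color-group label per coach. The selected coach's
    entry names the coach instead of showing a count.
    """
    counts = Counter(groups)  # missing groups count as 0
    labels = []
    for group in LEGEND_ORDER:
        if group == "Selected coach":
            labels.append(f"Selected coach: {selected_name}")
        else:
            n = counts[group]
            noun = "coach" if n == 1 else "coaches"
            labels.append(f"{group} ({n} {noun})")
    return labels


if __name__ == "__main__":
    sample = ["Selected coach", "More seasons, higher win %",
              "Fewer seasons, lower win %", "Fewer seasons, lower win %"]
    for line in legend_labels(sample, "Coach A"):
        print(line)
```

Fixing the legend order keeps empty buckets visible (as "0 coaches") rather than letting them disappear when a different coach is selected.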
Titles
I had gone back and forth between labeling this “Titles” or “Direction”, because that’s what your titles should do: help direct the user to understand what they’re looking at and how to interact with your viz. There are three on this dashboard. The first is high level: what am I visualizing? NCAA D1 Men’s Basketball Coaches. A lot of people take this opportunity to ask a question or say something punny, but I went the simple and straightforward route. One misstep I often see here, especially with scatterplots, is making a claim that the data doesn’t support. Be careful here. Even a strong relationship between two variables doesn’t tell you which one drives the other, or whether either does. A couple of other important things to call out are the titles above the chart and above the legend, where I spelled out the actions the user can take. Including this in your viz can be important, especially if you’re unsure how familiar your audience is with Tableau. I didn’t want to count on them happening upon both of those pieces of interactivity, so I made them explicit. I also noted the variables in the viz and the coach serving as the point of comparison.
Font
I rarely stray from Tableau Book for my Public vizzes, but I did play around with two things: font size and color.
A. Axes: The default is a bold but smaller font for the axis titles and a larger font for the axis labels, both in a darker color. I’m going to have a hard time articulating this, but when the titles were darker, bolder, and smaller, they just felt a little smushy. I wanted them to be easier to read without calling as much attention. I had already put these variables in the chart title, so the axis titles are more of a reminder of which value is on which axis. So I increased the size, switched from Medium to Book, and chose a lighter color. As for the axis labels, I didn’t think they needed to be as prominent as they are by default. That’s because I think of them (for a scatterplot specifically) as more for reference. If I wanted the user to know the exact values, I would’ve made a table. This was more about the relationship. So I made them lighter and smaller.
B. Titles: I tried to create a hierarchy of importance by decreasing the size and lightening the color for the lower lines of the title. But I decided to bold the coach name, because if there was one thing I hoped the user picked up from the lowest line, it was that.
Tooltips
A. Do you need them? Yes and no. I have them on the chart, but not on the legend. I don’t think tooltips should be used just because they’re on by default. If they aren’t going to include any information beyond what’s already on the view, I turn them off. That’s what I decided for the legend: I didn’t have anything to call out that wasn’t already visible. But for the scatterplot, tooltips were going to be the primary way to find out who each coach was and what the values were.
B. What should they say? I actually really liked the defaults. Straightforward: coach, school, values. No muss, no fuss. I added and formatted the quadrant reasoning, and it felt like it said everything it needed to.
Wording
I’m really excited to be writing about this part, because it’s something I find both satisfying and frustrating. I’m constantly working out how to keep things both concise and informative. I went through the most iterations on the legend: how to describe what differentiates the colors. I wanted to be careful not to describe one group as better than another. I also mulled over using the appropriate terminology (e.g., I originally had ‘less’ when it should have been ‘fewer’, since seasons are countable). Another consideration was making the wording flexible. My original tooltip said, for example, ‘fewer seasons coaching, lower win percentage than the selected coach’. That’s great, until you hover over the selected coach and it says “Selected coach than the selected coach”. Making sure your wording makes sense in all scenarios takes some finessing.
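One way to make tooltip wording hold up in every scenario is to handle the edge cases before building the sentence. Here is a minimal Python sketch; the function names and exact phrasing are hypothetical, not the wording in the viz. It checks for the selected coach first, uses ‘fewer’ for countable seasons, and leads with ‘Compared with [coach]’ instead of ending on ‘than the selected coach’, so that a tie (‘same number of seasons’) still reads grammatically.

```python
def compare_phrase(value: float, selected_value: float,
                   above: str, below: str, same: str) -> str:
    """Pick the comparison wording for one measure, including ties."""
    if value > selected_value:
        return above
    if value < selected_value:
        return below
    return same


def tooltip_text(name: str, seasons: int, win_pct: float,
                 selected_name: str, selected_seasons: int,
                 selected_win_pct: float) -> str:
    """Describe a coach relative to the selected coach.

    The selected coach is handled first, so the tooltip never reads
    'Selected coach than the selected coach'.
    """
    if name == selected_name:
        return "Selected coach"
    seasons_part = compare_phrase(seasons, selected_seasons,
                                  "more seasons coaching",
                                  "fewer seasons coaching",  # countable, so 'fewer'
                                  "same number of seasons coaching")
    pct_part = compare_phrase(win_pct, selected_win_pct,
                              "higher win percentage",
                              "lower win percentage",
                              "same win percentage")
    return f"Compared with {selected_name}: {seasons_part}, {pct_part}"


if __name__ == "__main__":
    # Hypothetical coaches, for illustration only.
    print(tooltip_text("Coach B", 12, 0.61, "Coach A", 20, 0.70))
    print(tooltip_text("Coach A", 20, 0.70, "Coach A", 20, 0.70))
```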
And that’s all she wrote. To be transparent, this viz took me an hour from downloading the dataset to publishing. That’s right: one chart, one hour. There were a lot of considerations to work through, and some of my choices may be flawed, but I hope something in this blog helped you think more about the choices you make with your charts!