Colorizing points in a base \(R\) plot
By default, base \(R\) plot uses hollow circles for points. That is perfectly adequate for a single data set but less so for multivariate data, because the edges are too thin for color to stand out well. My go-to option: set the pch argument to 16 and the col argument to the color of my choice.
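The following is a minimal side-by-side sketch of that go-to option using the built-in mtcars data. The color "steelblue" is an arbitrary choice for illustration and does not come from the post.

```r
# Left: the default hollow circle (pch = 1); right: a colored solid circle
op <- par(mfrow = c(1, 2))
plot(mtcars$disp, mtcars$mpg, main = "default (pch = 1)")
plot(mtcars$disp, mtcars$mpg, pch = 16, col = "steelblue",
     main = "pch = 16, col = \"steelblue\"")
par(op)  # restore the previous one-panel layout
```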
Background
pch is the argument that specifies the shape of a point in a plot. The three basic choices for a circle are:
pch | colorizing options |
---|---|
1 (or the default when pch is not given) | the edge color can be changed but not the interior |
16 (a so-called “solid circle”) | the interior can be changed but not the edge |
21 (a so-called “filled circle”) | the interior and the edge can be different |
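One way to check the table is to draw the three circles with the same col and bg values and see which part of each circle the arguments reach. In this sketch, the colors "darkred" and "gold" and the enlarged cex and lwd are illustrative assumptions.

```r
# Same col and bg for all three circles:
#   pch 1  -> only the edge takes col, bg is ignored
#   pch 16 -> col fills the whole circle, bg is ignored
#   pch 21 -> col colors the edge, bg fills the interior
shapes <- c(1, 16, 21)
plot(seq_along(shapes), rep(1, length(shapes)),
     pch = shapes, col = "darkred", bg = "gold", cex = 5, lwd = 3,
     xlim = c(0.5, 3.5), axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_along(shapes), labels = paste("pch =", shapes))
```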
The pch = 16 and pch = 21 colorizing options also apply to the other shapes in their respective “pch groups”: 15-20 for solid shapes and 21-25 for filled shapes. To see which shape corresponds to which pch value, check out help("points") as well as the many posts on the web that chart them.
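You can also draw a quick chart of every numeric pch value yourself. In this sketch, each shape gets the same col and bg, so the solid group (15-20, all in col) and the filled group (21-25, edge in col, interior in bg) are easy to tell apart. The colors are arbitrary.

```r
# All numeric pch values with one col/bg pair; numbers underneath for reference
pchs <- 0:25
plot(pchs, rep(1, length(pchs)), pch = pchs, col = "navy", bg = "orange",
     cex = 2, axes = FALSE, xlab = "", ylab = "", ylim = c(0.5, 1.5))
text(pchs, rep(0.8, length(pchs)), labels = pchs, cex = 0.8)
```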
Circles and squares are illustrated on light and dark backgrounds below.[1] IMO, points “pop” from their interior, not from their edge.[2]
Notice how:
- For the default (pch == 1):
  - the interior color cannot be changed; it is always transparent, so the background always shows through
  - the default edge color is black, so the point virtually disappears on a dark background
- For solid shapes (pch in 15:20):
  - col specifies both the interior and edge colors, which are necessarily the same
  - bg, specified or not, has no impact
- For filled shapes (pch in 21:25):
  - col specifies the edge color
  - bg specifies the interior color, defaulting to “transparent” if unspecified[3]
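Here is a sketch that reproduces the light-versus-dark comparison. The dark background is simulated by filling each plot region with rect(), which is an assumption and not necessarily how the original figure was made. The default circle keeps its black edge, while the pch = 16 and pch = 21 points get a gold interior.

```r
# Hypothetical helper: one panel with pch 1, 16 and 21 on a given background
draw_panel <- function(background, title) {
  plot(NA, xlim = c(0.5, 3.5), ylim = c(0.5, 1.5),
       axes = FALSE, xlab = "", ylab = "", main = title)
  usr <- par("usr")  # plot-region limits: x1, x2, y1, y2
  rect(usr[1], usr[3], usr[2], usr[4], col = background, border = NA)
  # default black edge for pch 1; gold interior for pch 16 (col) and 21 (bg)
  points(1:3, rep(1, 3), pch = c(1, 16, 21),
         col = c("black", "gold", "black"), bg = "gold", cex = 4, lwd = 2)
  text(1:3, rep(0.7, 3), labels = paste("pch =", c(1, 16, 21)),
       col = if (background == "black") "white" else "black")
}
op <- par(mfrow = c(1, 2))
draw_panel("white", "light background")
draw_panel("black", "dark background")  # the pch = 1 circle all but vanishes
par(op)
```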
Example
Here is a bivariate example using the mtcars dataset in \(R\) and Paul Tol’s “bright” palette.[4][5]
green = "#228833"
magenta = "#AA3377"
# build plot title -- see stackoverflow citation in footnote
# (the "i" followed by n^3 renders as "in" with a superscript 3, i.e. cubic inches)
a = quote(paste("miles per gallon vs displacement (i"))
b = quote(n^3)
d = quote(")")  # named d rather than c to avoid shadowing base R's c()
e <- substitute(a * b * d, list(a = a, b = b, d = d))
with(mtcars,
  plot(disp, mpg
    , pch = 16
    # vs is 0 (v-shaped) or 1 (straight), so vs + 1 picks green or magenta
    , col = c(green, magenta)[as.numeric(vs)+1]
    , main = e
  )
)
legend("topright", c("v-engine", "straight-block"), col = c(green, magenta), pch = 16)
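The next block is a variant of the same plot. It rests on two choices of mine, not the author's. First, points use pch = 21 with a black edge and the palette color as the interior (bg). Second, the title is written as a single plotmath expression instead of being assembled with quote() and substitute(). In the legend, the interior color of a filled symbol is set with pt.bg.

```r
green   <- "#228833"
magenta <- "#AA3377"
# In plotmath, * juxtaposes pieces without a space; ^ superscripts the 3 after "in"
title_expr <- expression("miles per gallon vs displacement (" * "in"^3 * ")")
# vs is 0 for v-shaped and 1 for straight engines, so vs + 1 indexes the palette
fill <- c(green, magenta)[mtcars$vs + 1]
plot(mtcars$disp, mtcars$mpg, pch = 21, col = "black", bg = fill,
     main = title_expr)
legend("topright", c("v-engine", "straight-block"),
       pch = 21, col = "black", pt.bg = c(green, magenta))
```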
Not only do lower-displacement engines get better gas mileage, but this sample of 1973–74 models contains no high-displacement straight-block engines.
Bottom line
Use base \(R\)’s default black circles to quickly visualize sequential data.
For colored circles use pch = 16 and col = color_of_your_choice.
Use pch = 21 when it is useful to differentiate a point’s edge from its interior.
Use color-blind friendly colors whenever possible.
Try pch = "." for dots when you have many points but you don’t want lines.
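Here is a small sketch of the pch = "." advice on simulated data. The 10,000 correlated normal points and the seed are arbitrary assumptions chosen to produce a dense cloud.

```r
# Simulated dense cloud: pch = "." draws each observation as a tiny dot
set.seed(1)
x <- rnorm(10000)
y <- x + rnorm(10000)
plot(x, y, pch = ".")
```

With the default hollow circles, the same 10,000 points would merge into a solid blob. Single dots keep the density structure visible.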
Postscript
The complementary colors green (#228833) and magenta (#AA3377) used in this post come from Paul Tol’s color-blind friendly bright color palette.[6] Tol’s “Notes” page is worth visiting for other helpful colorizing advice. This non-color-blind author would be interested in reader feedback regarding the distinguishability of the colors used in this post.
The end of business today, 12/12/2020, marks 251.387 months since the end of the last millennium.
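The 251.387 figure can be reproduced in base R if you make two assumptions. First, the last millennium ended on 12/31/1999. Second, a partial month counts as day-of-month divided by the number of days in that month. The helper months_since_1999 is hypothetical. The session info lists the mondate package, and this sketch is not claimed to match mondate's own method.

```r
# Hypothetical helper: months elapsed since the end of 1999, counting whole
# months between month-ends plus (day / days in the current month)
months_since_1999 <- function(date) {
  y <- as.integer(format(date, "%Y"))
  m <- as.integer(format(date, "%m"))
  d <- as.integer(format(date, "%d"))
  first <- as.Date(sprintf("%d-%02d-01", y, m))
  next_first <- seq(first, by = "month", length.out = 2)[2]
  days_in_month <- as.integer(next_first - first)
  (y - 2000) * 12 + (m - 1) + d / days_in_month
}
round(months_since_1999(as.Date("2020-12-12")), 3)  # 251.387
```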
Generated with Rmarkdown in RStudio.
R Environment
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mondate_0.10.01.02
loaded via a namespace (and not attached):
[1] compiler_4.0.3 magrittr_1.5 tools_4.0.3 htmltools_0.5.0
[5] yaml_2.2.1 stringi_1.4.6 rmarkdown_2.3 knitr_1.28
[9] stringr_1.4.0 xfun_0.14 digest_0.6.25 rlang_0.4.6
[13] evaluate_0.14
[1] Regarding the color of the default background, the following phrases are from \(R\) help pages: normally “white” (from help("par")) and often transparent (from help("frame")).
[2] Called “border” in \(R\).
[3] Per help("par"): “For many devices the initial value [of the plot background] is set from the bg argument of the device, and for the rest it is normally "white".”
[4] Okabe and Ito have a wonderful site that discusses Color Universal Design: https://jfly.uni-koeln.de/color/. Okabe/Ito and Tol palettes can be displayed with \(R\) code downloadable from here: Goedhart, Joachim. (2019, August 29). Material related to the blog “Dataviz with Flying Colors”. Zenodo. http://doi.org/10.5281/zenodo.3381072
[5] Technique for superscript in title from https://stackoverflow.com/questions/34193276/concatenate-several-math-expressions-in-r
[6] For additional perspectives on color-impaired visualizations, see https://thenode.biologists.com/data-visualization-with-flying-colors/research/ and https://venngage.com/blog/color-blind-friendly-palette/ and https://jfly.uni-koeln.de/color/.
Nice color variance!
Thanks to Paul Tol.
What is the best option for when data points overlap?
Some R packages have sophisticated methods for dealing with overlapping data, but in base R the choices are limited. The best choice that does not depend on the device is via the function jitter. Try this site for that and other options: http://www.rensenieuwenhuis.nl/r-sessions-13-overlapping-data-points/
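Here is a sketch of the jitter suggestion. In mtcars, cyl and gear each take only three values, so many cars land on exactly the same point. jitter() adds a small random offset to each value so that stacked points become visible. The choice of variables and the seed are illustrative assumptions.

```r
# Left: raw values, where coincident cars hide behind one another
# Right: jitter() nudges each value slightly so stacks spread out
set.seed(42)
op <- par(mfrow = c(1, 2))
plot(mtcars$cyl, mtcars$gear, pch = 16, main = "raw")
plot(jitter(mtcars$cyl), jitter(mtcars$gear), pch = 16, main = "jitter()")
par(op)
```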