Introduction
A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps are used in various forms of analytics; this R package focuses specifically on providing an efficient way to create interactive heatmaps for categorical data, or for continuous data that can be grouped into categories.
This package was originally developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator in the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles at different times of the day. To do so, it groups utilization data (e.g. persons per m^2) into categories (e.g. low, medium, high utilization) and shows them for certain stops over time in a heatmap.
This package can easily be integrated into a Shiny dashboard, which supports additional interactions with other plots (e.g. boxplot, histogram, forecast) by using plotly events. A mini demo app is provided in a separate GitHub repository named catmaply_shiny.
This work is based on the plotly.js engine.
Please submit feature requests
This package is still under active development. If there are features you would like to see added, please submit your suggestions (and bug reports) at: https://github.com/VerkehrsbetriebeZuerich/catmaply/issues/
News
You can see the most recent changes of the package in NEWS.md.
Installation
To install the latest (“cutting-edge”) GitHub version run:
# make sure that you have the correct RTools installed,
# as you might need to build some packages from source
# if you don't have RTools installed, you can install it with:
# install.packages('installr'); install.Rtools() # not tested on windows
# or download it from here:
# https://cran.r-project.org/bin/windows/Rtools/
# in any case, make sure that you select the correct version,
# otherwise the installation will fail.
# then you'll need devtools
# if (!require('devtools'))
# install.packages('devtools')
# finally install the package
# devtools::install_github('VerkehrsbetriebeZuerich/catmaply')
To install the latest version from CRAN, run:
#install.packages("catmaply")
Thereafter, you can load the package as usual with library(catmaply) and start using it.
Usage
Catmaply ships with VBZ data so that you can easily start experimenting with the package. For demonstration purposes and simplicity, we will use it in this notebook. As usual, you can access the data as follows.
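library(catmaply)
library(dplyr) # provides %>% and filter()
data("vbz")
# drop rows with missing values and keep only vehicle "PO"
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")
str(df)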
#> tibble [1,707 × 13] (S3: tbl_df/tbl/data.frame)
#> $ trip_seq : int [1:1707] 6 24 22 4 14 42 44 50 28 30 ...
#> $ stop_seq : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#> $ stop_name : chr [1:1707] "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" ...
#> $ trip_id : int [1:1707] 44044 33329 22130 33325 33327 22134 33333 12702 82990 12698 ...
#> $ circulation_name : int [1:1707] 8 6 4 6 6 4 6 2 10 2 ...
#> $ line_name : int [1:1707] 4 4 4 4 4 4 4 4 4 4 ...
#> $ vehicle : Factor w/ 3 levels "","CO","PO": 3 3 3 3 3 3 3 3 3 3 ...
#> $ occupancy : num [1:1707] 18.38 67.77 86.8 5.98 21.04 ...
#> $ occ_category : int [1:1707] 1 2 3 1 1 1 1 1 2 1 ...
#> $ departure_time : Factor w/ 3276 levels "","04:58:12",..: 74 435 393 49 223 839 885 1023 519 563 ...
#> $ number_of_measurements: int [1:1707] 58 47 45 47 47 45 49 51 42 51 ...
#> $ occ_cat_name : Factor w/ 6 levels "","high","low",..: 3 4 5 3 3 3 3 3 4 3 ...
#> $ direction : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#> - attr(*, "na.action")= 'omit' Named int [1:281] 145 146 147 148 149 294 295 296 297 298 ...
#> ..- attr(*, "names")= chr [1:281] "145" "146" "147" "148" ...
The main columns of the vbz data.frame can be described as follows:
- trip_seq shows the order of the trips.
- stop_name shows the names of the stops, which need to be ordered by stop_seq.
- occ_category shows the category of the data point (e.g. 1 - very few people in the bus, 5 - bus is full).
- occupancy is e.g. the number of people per m^2.
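As an illustration of how a continuous measure such as occupancy could be grouped into categories, here is a minimal sketch using base R's cut(). The thresholds and labels are assumptions made for this example; they are not necessarily the ones used to build the vbz data.

```r
# A few example occupancy values.
occupancy <- c(5.98, 18.38, 21.04, 67.77, 86.80)

# Assumed category boundaries and labels; adjust them to your own data.
breaks <- c(0, 50, 80, Inf)
labels <- c("low", "medium", "high")

# cut() assigns each value to the interval it falls into:
# [0, 50), [50, 80) and [80, Inf) because right = FALSE.
occ_cat_name <- cut(occupancy, breaks = breaks, labels = labels, right = FALSE)

# An integer code per category, comparable to a column like occ_category.
occ_category <- as.integer(occ_cat_name)

data.frame(occupancy, occ_category, occ_cat_name)
```

A continuous column and its derived category column can then be used together, as the later sections on color ranges per category do.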
Default behaviour
Catmaply expects at least arguments for both axes (x, y) and the fields (z). To visualize the occupancy category for all stops and trips, we can put stop_seq on y, trip_seq on x and occ_category on z as follows:
catmaply(
df,
x = trip_seq,
y = stop_seq,
z = occ_category
)
By default, catmaply produces an interactive heatmap with a rangeslider, legend items that show or hide the data of a specific occupancy category when clicked, and a hover label that shows the values for x, y and z.
Also, please note that you can reference columns for e.g. x, y and z either with or without quotes. For example, if we want to put the stop names on the y axis, we can simply put stop_name on y and order it using stop_seq (the x axis has, of course, the same functionality), as shown in the following:
catmaply(
df,
x = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occ_category
)
What about more expressive labels for the legend? You can use any column for the legend entries, as long as it matches the category in the fields. So, if you would like to show occ_cat_name (“low”, “medium”, “high”) in the legend instead of occ_category (1, 2, 3, ...), you can set the parameter legend_col to overwrite the legend labels as follows:
catmaply(
df,
x = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occ_category,
legend_col = occ_cat_name
)
Not happy with the color palette? Then let’s change it :-). To change the color palette, you can either pass a vector of colors or a function that returns one.
Note: the color palette function needs to take n as its first argument, where n defines the number of colors to be produced.
catmaply(
df,
x = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occ_category,
color_palette = viridis::magma,
legend_col = occ_cat_name
)
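If none of the ready-made palettes fits, a custom palette function can be sketched with base R. The function name my_palette and the two colors are assumptions for this example; the only requirement taken from the note above is that n is the first argument.

```r
# A hypothetical palette function: takes n as first argument and returns
# n hex colors interpolated between a light and a dark shade.
my_palette <- function(n) {
  grDevices::colorRampPalette(c("#E5E3F5", "#6D65AB"))(n)
}

# Three colors, from light to dark.
my_palette(3)
```

Such a function could then be passed as color_palette = my_palette, in the same way as viridis::magma above.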
Don’t like all this interactivity in the legend, or need more performance for large plots? Let’s turn the interactive legend off by setting legend_interactive = FALSE.
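A minimal sketch of such a call, reusing the columns and parameters from the examples above. It assumes that catmaply and dplyr are installed; the data preparation is repeated so that the snippet runs on its own.

```r
library(catmaply)
library(dplyr)

# Same data preparation as in the Usage section.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")

p <- catmaply(
  df,
  x = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occ_category,
  legend_col = occ_cat_name,
  legend_interactive = FALSE # turn the interactive legend off
)
p
```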
Color ranges per category
How about illustrating differences in one category? You can illustrate values that are particularly low (high) within one category by specifying one color range per category.
To show one color range per category, we have to put a continuous number in the fields and categorize it with a categorical column. So, in our example:
- occupancy is in the fields.
- occ_category is the categorization over these fields.
Also, let’s add a more expressive x_label to the plot, essentially concatenating the columns trip_seq, vehicle and circulation_name.
df <- df %>%
  mutate(
    x_label = paste(
      formatC(trip_seq, width = 3, flag = "0"),
      vehicle,
      formatC(circulation_name, width = 2, flag = "0"),
      sep = "-"
    )
  )
catmaply(
df,
x = x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::magma,
legend_col = occ_cat_name
)
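To see what the x_label construction produces, the following base R sketch applies the same formatC() and paste() calls to a few made-up values (the values are assumptions for illustration).

```r
# Made-up example values for three trips.
trip_seq <- c(6L, 24L, 104L)
vehicle <- c("PO", "PO", "PO")
circulation_name <- c(8L, 6L, 10L)

# formatC() left-pads the numbers with zeros to a fixed width,
# paste() joins the three parts with "-".
x_label <- paste(
  formatC(trip_seq, width = 3, flag = "0"),
  vehicle,
  formatC(circulation_name, width = 2, flag = "0"),
  sep = "-"
)
x_label
# "006-PO-08" "024-PO-06" "104-PO-10"
```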
This also works for non-interactive legends.
Axis formatting
Now, let’s mess around with axis formatting. Let’s change:
- font_color to the “purplish” color used in the logo (#6D65AB),
- font_size to 10 pt,
- font_family to “verdana”,
- x_tickangle to 80 and, just for fun,
- y_tickangle to -10.
catmaply(
df,
x='x_label',
x_order = 'trip_seq',
x_tickangle = 80,
y = "stop_name",
y_order = "stop_seq",
y_tickangle = -10,
z = "occupancy",
categorical_color_range = TRUE,
categorical_col = 'occ_category',
color_palette = viridis::magma,
font_size = 10,
font_color = '#6D65AB',
font_family = "verdana",
legend_col = occ_cat_name
)
Hover
What about a custom hover label? Catmaply lets you define custom hover templates with the parameter hover_template, which can contain HTML tags to make it more appealing. Here is an example that puts the column names of the respective values in bold; it also rounds the occupancy to 2 decimal places.
catmaply(
df,
x=x_label,
x_order = trip_seq,
x_tickangle = 80,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
legend_col = occ_cat_name
)
Note: it is usually a good idea to add the <extra></extra> tag at the end to hide trace information.
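Because paste() is vectorized, a template built from columns produces one string per row. A small base R sketch with made-up values (assumptions for illustration) shows what such strings look like:

```r
# Made-up values for two data points.
trip_seq <- c(6L, 24L)
stop_name <- c("Zuerich, Bahnhof Altstetten N", "Zuerich, Bahnhof Altstetten N")
occupancy <- c(18.3777, 67.7712)

# Same pattern as in the template above: bold labels, <br> as line break.
hover <- paste(
  "<b>Trip</b>:", trip_seq,
  "<br><b>Stop Name</b>:", stop_name,
  "<br><b>Occupancy</b>:", round(occupancy, 2),
  "<extra></extra>"
)

length(hover) # one string per data point
hover[1]
```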
Are the legend and the fancy hover template too much information? If you want simplicity and would rather hide the legend altogether, go ahead... :-)
catmaply(
df,
x=x_label,
x_order = trip_seq,
x_tickangle = 80,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
legend_col = occ_cat_name,
legend = FALSE
)
Are you a minimalist or a visual person who likes neither hover nor legend…?
catmaply(
df,
x=x_label,
x_order = trip_seq,
x_tickangle = 80,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_hide = TRUE,
legend_col = occ_cat_name,
legend = FALSE
)
Ok, no hover but legend it is… (for completeness).
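A minimal sketch of that variant: the hover labels are hidden with hover_hide = TRUE, while the legend is simply left at its default. It assumes that catmaply and dplyr are installed and repeats the data preparation so that the snippet runs on its own.

```r
library(catmaply)
library(dplyr)

# Same data preparation as in the Usage section.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")

p <- catmaply(
  df,
  x = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  color_palette = viridis::inferno,
  hover_hide = TRUE, # no hover labels
  legend_col = occ_cat_name
)
p
```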
Time Axis
Hmm, didn’t we say that we want to show the development over time? Wouldn’t it make sense then, if we could illustrate time dynamically on the x axis?
A dynamic x axis is created if you put a column of type POSIXct or POSIXt on the x axis. Let’s check it out by calculating the departure datetime of each trip.
df <- df %>%
na.omit() %>%
dplyr::group_by(
trip_seq
) %>%
dplyr::mutate(
departure_date_time = min(na.omit(lubridate::ymd_hms(paste("2020-08-01", departure_time))))
) %>%
dplyr::ungroup()
catmaply(
df,
x=departure_date_time,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
)
)
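The conversion of times into date-times can also be reproduced with base R alone. This sketch turns time-of-day strings into POSIXct values by prepending an arbitrary date; the example times and the time zone are assumptions, and the date is as arbitrary as 2020-08-01 is in the example above.

```r
# Example time-of-day strings.
departure_time <- c("04:58:12", "05:13:40", "05:28:05")

# Prepend an arbitrary date and parse the result as a date-time.
departure_date_time <- as.POSIXct(
  paste("2020-08-01", departure_time),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

class(departure_date_time) # "POSIXct" "POSIXt"
range(departure_date_time) # earliest and latest departure
```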
Currently, the formatting of the time axis is optimised for analysing daily data; e.g. if you sample the utilization throughout the year and then summarise it to get the utilization of a typical day. Thus, the formatting of the maximum zoom level is still hours and not years. However, you can change the formatting of each zoom level by setting the tickformatstops parameter. So, if you want to e.g. remove the h, m, s and ms that indicate the unit of time above, you could achieve this as follows (more information can be found in the tick formatting example of plotly):
catmaply(
df,
x=departure_date_time,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
tickformatstops=list(
list(dtickrange = list(NULL, 1000), value = "%H:%M:%S.%L"),
list(dtickrange = list(1000, 60000), value = "%H:%M:%S"),
list(dtickrange = list(60000, 3600000), value = "%H:%M"),
list(dtickrange = list(3600000, 86400000), value = "%H:%M"),
list(dtickrange = list(86400000, 604800000), value = "%H:%M"),
list(dtickrange = list(604800000, "M1"), value = "%H:%M"),
list(dtickrange = list("M1", "M12"), value = "%H:%M"),
list(dtickrange = list("M12", NULL), value = "%H:%M")
)
)
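The numeric boundaries in dtickrange are durations in milliseconds, which is easier to see when they are derived from named constants. The sketch below rebuilds the same list in base R; the helper name stop_entry and the constant names are assumptions made for readability.

```r
# Durations in milliseconds.
second <- 1000
minute <- 60 * second
hour <- 60 * minute
day <- 24 * hour
week <- 7 * day

# Hypothetical helper that builds one entry of tickformatstops.
stop_entry <- function(from, to, format) {
  list(dtickrange = list(from, to), value = format)
}

tickformatstops <- list(
  stop_entry(NULL, second, "%H:%M:%S.%L"),
  stop_entry(second, minute, "%H:%M:%S"),
  stop_entry(minute, hour, "%H:%M"),
  stop_entry(hour, day, "%H:%M"),
  stop_entry(day, week, "%H:%M"),
  stop_entry(week, "M1", "%H:%M"),
  stop_entry("M1", "M12", "%H:%M"),
  stop_entry("M12", NULL, "%H:%M")
)
```

The resulting object has the same structure as the list written out above and could be passed as tickformatstops = tickformatstops.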
Slider
Besides the rangeslider, it is also possible to use a simple slider/scrollbar. This might be especially favourable for annotated heatmaps, which are shown in the next section.
You can switch to a slider as easily as follows:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE # activate slider
)
To add a prefix for the current value (the one above the slider), set slider_currentvalue_prefix as follows:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE, # activate slider,
slider_currentvalue_prefix = "Trip: "
)
Also, you can show or hide various elements of the slider, more specifically the steps, ticks and current value. This gives you almost a scrollbar feeling:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE, # activate slider,
slider_currentvalue_visible = FALSE,
slider_step_visible = FALSE,
slider_tick_visible = FALSE
)
The slider steps are created automatically for you; however, you can also define them yourself. Catmaply provides two modes to alter the way the slider steps are created: auto and custom list.
Mode auto alters the way the steps are automatically created for you. You need to pass the slider_steps parameter a list with the following elements:
- slider_start (numeric): the starting point of the slider on the x axis.
- slider_range (numeric): the size of the slider window.
- slider_shift (numeric): how many units the window should be moved to the right for each step.
- slider_step_name (column): the name of the step, which must match the occurrence of x.
For example, if you want to create a slider that starts at trip 1 with a window of 15 trips and a shift of 10 trips, you can do this as follows:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE, # activate slider,
slider_currentvalue_visible = FALSE,
slider_steps=list(
slider_start=1,
slider_range=15,
slider_shift=10,
slider_step_name="x" # same name as x axis (must be character)
)
)
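To get a feeling for how start, range and shift interact, the following base R sketch computes the windows such a slider would step through. It is an illustration only: the function slider_windows() and its argument names are hypothetical, and the exact boundaries catmaply computes internally may differ.

```r
# Hypothetical helper: windows of `window` positions, starting at `start`
# and moved `shift` positions to the right per step, for `n` x positions.
slider_windows <- function(n, start = 1, window = 15, shift = 10) {
  from <- seq(from = start, to = n, by = shift)
  to <- pmin(from + window - 1, n) # clip windows at the end of the axis
  data.frame(step = seq_along(from), from = from, to = to)
}

# 40 x positions, a window of 15 positions, shifted by 10 per step.
slider_windows(n = 40, start = 1, window = 15, shift = 10)
```

With these numbers, consecutive windows overlap by five positions because the shift is smaller than the window.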
Besides mode auto, you can also get full control of the slider steps by providing a list of steps, each with the following elements:
- name: name of the step
- range: range to be covered by the step
A simple example is shown below:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::inferno,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE, # activate slider,
slider_currentvalue_visible = FALSE,
slider_steps = list(
list(name="Very important step one", range=c(12, 37)),
list(name="Very important step two", range=c(87, 111))
)
)
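When there are many steps, the list can also be generated rather than typed out. A sketch, assuming a small table of step names and range bounds (the names and numbers here are made up):

```r
# Made-up step definitions: a name plus the two range bounds per step.
steps <- data.frame(
  name = c("Morning peak", "Evening peak"),
  from = c(12, 87),
  to = c(37, 111),
  stringsAsFactors = FALSE
)

# Turn every row into list(name = ..., range = c(from, to)).
slider_steps <- lapply(seq_len(nrow(steps)), function(i) {
  list(name = steps$name[i], range = c(steps$from[i], steps$to[i]))
})

str(slider_steps)
```

The result has the same shape as the hand-written list in the example above.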
Annotations
Sometimes it makes sense to add annotations to a heatmap. With catmaply, you can add and format annotations relatively easily with the following parameters:
- text: column name holding the values of the text to be displayed.
- text_color: the color of the text (similar to font_color).
- text_size: the size to be used for the text (similar to font_size).
- text_font_family: the font family to be used for the text (similar to font_family).
Adding annotations to the previous slider example can be achieved as follows:
catmaply(
df,
x=x_label,
x_order = trip_seq,
y = stop_name,
y_order = stop_seq,
z = occupancy,
text = occ_category,
text_color="#000",
text_size=12,
text_font_family="Open Sans",
categorical_color_range = TRUE,
categorical_col = occ_category,
color_palette = viridis::plasma,
hover_template = paste(
'<br><b>Trip</b>:', trip_seq,
'<br><b>Stop Name</b>:', stop_name,
'<br><b>Occupancy category</b>:', occ_cat_name,
'<br><b>Occupancy</b>:', round(occupancy, 2),
'<extra></extra>'
),
rangeslider = FALSE, # to prevent warning
legend_interactive = FALSE, # to prevent warning
slider = TRUE, # activate slider,
slider_currentvalue_visible = FALSE,
slider_steps=list(
slider_start=1,
slider_range=15,
slider_shift=10,
slider_step_name="x" # same name as x axis (must be character)
)
)
Note: annotations can also be used with rangeslider or no slider at all. However, lots of annotations might have a negative influence on the performance. Only text values that are not NA are used for annotations.
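Since only non-NA text values are used, annotations can be limited to the cells of interest by setting the rest to NA. A base R sketch with made-up data; the column name label and the threshold are assumptions:

```r
# Made-up data: one row per heatmap cell.
df_example <- data.frame(
  occupancy = c(5.98, 18.38, 67.77, 86.80),
  occ_category = c(1L, 1L, 2L, 3L)
)

# Label only cells in category 2 or higher; all other cells get NA
# and would therefore not be annotated.
df_example$label <- ifelse(
  df_example$occ_category >= 2,
  as.character(df_example$occ_category),
  NA_character_
)

df_example
```

The new column could then be passed as text = label, which also keeps the number of annotations, and with it the performance cost, down.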
Credits
This package only exists thanks to the amazing work done by many people in the open source community. Beyond the many people working on the pipeline of R, thanks should go to the plotly team, and especially to Carson Sievert and others working on the R package of plotly. Also, a special thanks to VBZ for providing advice on the functionality of the package and for providing the vbz dataset. I would also like to thank VBZ for investing time to test the package and for using it in their awesome Shiny VBZ dashboard.
Session info
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 catmaply_0.9.4
#>
#> loaded via a namespace (and not attached):
#> [1] viridis_0.6.4 plotly_4.10.4 sass_0.4.8 utf8_1.2.4
#> [5] generics_0.1.3 tidyr_1.3.0 stringi_1.8.3 digest_0.6.34
#> [9] magrittr_2.0.3 timechange_0.3.0 evaluate_0.23 grid_4.3.2
#> [13] fastmap_1.1.1 jsonlite_1.8.8 gridExtra_2.3 httr_1.4.7
#> [17] purrr_1.0.2 fansi_1.0.6 crosstalk_1.2.1 viridisLite_0.4.2
#> [21] scales_1.3.0 lazyeval_0.2.2 textshaping_0.3.7 jquerylib_0.1.4
#> [25] cli_3.6.2 rlang_1.1.3 ellipsis_0.3.2 munsell_0.5.0
#> [29] withr_3.0.0 cachem_1.0.8 yaml_2.3.8 tools_4.3.2
#> [33] memoise_2.0.1 colorspace_2.1-0 ggplot2_3.4.4 vctrs_0.6.5
#> [37] R6_2.5.1 lubridate_1.9.3 lifecycle_1.0.4 stringr_1.5.1
#> [41] fs_1.6.3 htmlwidgets_1.6.4 ragg_1.2.7 pkgconfig_2.0.3
#> [45] desc_1.4.3 pkgdown_2.0.7 pillar_1.9.0 bslib_0.6.1
#> [49] gtable_0.3.4 glue_1.7.0 data.table_1.14.10 systemfonts_1.0.5
#> [53] xfun_0.41 tibble_3.2.1 tidyselect_1.2.0 knitr_1.45
#> [57] farver_2.1.1 htmltools_0.5.7 rmarkdown_2.25 compiler_4.3.2