hvPlot examples¶
How to install¶
To be able to run the examples, [Snappy](https://github.com/google/snappy) must also be installed.
With Spack you can provide Snappy in your kernel, for example with:
$ spack env activate python-311
$ spack install snappy
Alternatively, you can install Snappy with other package managers, for example
for Debian/Ubuntu:
$ sudo apt install libsnappy-dev
for Windows:
Snappy requires Microsoft Visual C++ ≥ 14.0, which can be installed with the Microsoft C++ Build Tools.
for Mac OS:
$ brew install snappy
Afterwards, additional packages should be installed for your kernel, for example with:
$ pipenv install intake intake-parquet s3fs python-snappy pyviz-comms
…
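Before running the notebook, it can be useful to confirm that the packages are visible to the kernel. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for illustration, and the list uses the import names of the packages, which can differ from the names used for installation.

```python
from importlib.util import find_spec

# Import names of the packages installed above; python-snappy is imported
# as "snappy", intake-parquet as "intake_parquet", pyviz-comms as "pyviz_comms".
REQUIRED_MODULES = ["intake", "intake_parquet", "s3fs", "snappy", "pyviz_comms"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the list is not empty, the kernel is probably not using the environment in which the packages were installed.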
Introduction¶
First we import NumPy and pandas to create a small set of random data:
[1]:
import numpy as np
import pandas as pd
index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(
np.random.randn(1000, 4), index=index, columns=list("ABCD")
).cumsum()
df.head()
[1]:
 | A | B | C | D |
---|---|---|---|---|
2000-01-01 | 1.431985 | 1.378913 | 0.539567 | 0.257977 |
2000-01-02 | 1.266573 | 0.834050 | 1.176750 | 1.458363 |
2000-01-03 | 1.799283 | 0.945437 | 1.792629 | -0.220764 |
2000-01-04 | 2.131248 | 0.160377 | 1.186063 | -0.702810 |
2000-01-05 | 3.574972 | 2.274926 | 1.759324 | -1.297244 |
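Because `np.random.randn` is not seeded, the values shown above differ on every run. If reproducible numbers are wanted, a seeded generator can be used instead. The sketch below is one way to do this; the helper name `make_random_walk` and the seed `42` are chosen purely for illustration.

```python
import numpy as np
import pandas as pd


def make_random_walk(seed):
    """Build the same kind of cumulative random data, but reproducibly."""
    rng = np.random.default_rng(seed)  # seeded generator
    index = pd.date_range("2000-01-01", periods=1000)
    return pd.DataFrame(
        rng.standard_normal((1000, 4)), index=index, columns=list("ABCD")
    ).cumsum()


df_seeded = make_random_walk(42)  # arbitrary seed, for illustration only
print(df_seeded.shape)
```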
pandas .plot() API¶
pandas offers Matplotlib-based plotting with the .plot() method by default:
[2]:
%matplotlib inline
df.plot();

The result is a PNG image that can be easily displayed, but is otherwise static.
Note: In pandas 0.25.0 or later the plotting backend can be exchanged, for example with pd.options.plotting.backend = "holoviews". You can find more information on this at pandas-API.
.hvplot()¶
If we replace %matplotlib inline with import hvplot.pandas and df.plot() with df.hvplot(), an interactive Bokeh diagram is generated that supports panning, zooming in and out, and a clickable legend:
[3]:
import hvplot.pandas
df.hvplot()
[3]:
Such an interactive diagram makes it much easier to explore the data without having to write additional code.
Native hvPlot API¶
For the above diagram, hvPlot has dynamically added the .hvplot() method to pandas objects so that you can use the same syntax as for pandas plots. If you prefer a more explicit approach, you can work directly with hvPlot objects instead:
[4]:
import holoviews as hv
from hvplot import hvPlot
hv.extension("bokeh")
plot = hvPlot(df)
plot(y=["A", "B", "C", "D"])
[4]:
Help¶
If you are working in IPython or Jupyter notebooks, the hvplot methods automatically complete valid keywords. For example, if you press the Tab key after declaring the plot type, all valid keywords and the docstring will be displayed:
df.hvplot.line(TAB
Outside of an interactive environment, hvplot.help displays all information for a plot type, for example:
[ ]:
hvplot.help("line")
See also:
Further information on the available options can be found in Customization.
Plotting¶
In the following examples, the Dask hvPlot API is used in addition to the pandas API:
[6]:
import hvplot.dask
The hvplot.sample_data module provides sample data sets as Intake data catalogues, which we can load with pandas:
[7]:
from hvplot.sample_data import airline_flights, us_crime
crime = us_crime.read()
print(type(crime))
crime.head()
<class 'pandas.core.frame.DataFrame'>
[7]:
 | Year | Population | Violent crime total | Murder and nonnegligent Manslaughter | Legacy rape /1 | Revised rape /2 | Robbery | Aggravated assault | Property crime total | Burglary | ... | Violent Crime rate | Murder and nonnegligent manslaughter rate | Legacy rape rate /1 | Revised rape rate /2 | Robbery rate | Aggravated assault rate | Property crime rate | Burglary rate | Larceny-theft rate | Motor vehicle theft rate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1960 | 179323175 | 288460 | 9110 | 17190 | NaN | 107840 | 154320 | 3095700 | 912100 | ... | 160.9 | 5.1 | 9.6 | NaN | 60.1 | 86.1 | 1726.3 | 508.6 | 1034.7 | 183.0 |
1 | 1961 | 182992000 | 289390 | 8740 | 17220 | NaN | 106670 | 156760 | 3198600 | 949600 | ... | 158.1 | 4.8 | 9.4 | NaN | 58.3 | 85.7 | 1747.9 | 518.9 | 1045.4 | 183.6 |
2 | 1962 | 185771000 | 301510 | 8530 | 17550 | NaN | 110860 | 164570 | 3450700 | 994300 | ... | 162.3 | 4.6 | 9.4 | NaN | 59.7 | 88.6 | 1857.5 | 535.2 | 1124.8 | 197.4 |
3 | 1963 | 188483000 | 316970 | 8640 | 17650 | NaN | 116470 | 174210 | 3792500 | 1086400 | ... | 168.2 | 4.6 | 9.4 | NaN | 61.8 | 92.4 | 2012.1 | 576.4 | 1219.1 | 216.6 |
4 | 1964 | 191141000 | 364220 | 9360 | 21420 | NaN | 130390 | 203050 | 4200400 | 1213200 | ... | 190.6 | 4.9 | 11.2 | NaN | 68.2 | 106.2 | 2197.5 | 634.7 | 1315.5 | 247.4 |
5 rows × 22 columns
Alternatively, we can load the data as a dask.dataframe.DataFrame:
[8]:
flights = airline_flights.to_dask().persist()
print(type(flights))
flights.head()
<class 'dask.dataframe.core.DataFrame'>
[8]:
 | year | month | day | dayofweek | dep_time | crs_dep_time | arr_time | crs_arr_time | carrier | flight_num | ... | taxi_in | taxi_out | cancelled | cancellation_code | diverted | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2008.0 | 11.0 | 15.0 | 6.0 | 1411.0 | 1420.0 | 1535.0 | 1546.0 | b'OO' | 4391.0 | ... | 5.0 | 11.0 | 0.0 | None | 0.0 | NaN | NaN | NaN | NaN | NaN |
1 | 2008.0 | 11.0 | 28.0 | 5.0 | 1222.0 | 1230.0 | 1345.0 | 1356.0 | b'OO' | 4391.0 | ... | 5.0 | 15.0 | 0.0 | None | 0.0 | NaN | NaN | NaN | NaN | NaN |
2 | 2008.0 | 11.0 | 22.0 | 6.0 | 1414.0 | 1420.0 | 1540.0 | 1546.0 | b'OO' | 4391.0 | ... | 5.0 | 10.0 | 0.0 | None | 0.0 | NaN | NaN | NaN | NaN | NaN |
3 | 2008.0 | 11.0 | 15.0 | 6.0 | 1304.0 | 1305.0 | 1507.0 | 1519.0 | b'OO' | 4392.0 | ... | 10.0 | 9.0 | 0.0 | None | 0.0 | NaN | NaN | NaN | NaN | NaN |
4 | 2008.0 | 11.0 | 22.0 | 6.0 | 1323.0 | 1305.0 | 1536.0 | 1519.0 | b'OO' | 4392.0 | ... | 5.0 | 21.0 | 0.0 | None | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.0 |
5 rows × 29 columns
The plot API¶
The interfaces
dask.dataframe.DataFrame.hvplot
pandas.DataFrame.hvplot
intake.DataSource.plot
and their Series equivalents offer a powerful high-level API for generating even complex plots. The .hvplot API can be used either directly or as a namespace to generate specific plot types.
The most explicit method of using the plot API is to specify the names of the columns to be plotted on the x or y axis:
[9]:
crime.hvplot.line(x="Year", y="Violent Crime rate")
[9]:
The diagram type can also be specified with kind:
[10]:
crime.hvplot(x="Year", y="Violent Crime rate", kind="scatter")
[10]:
You can use the by option to group the data by one or more additional columns. As an example, the departure delay ("depdelay") is shown below as a function of "distance", with the data grouped by "carrier":
[11]:
flight_subset = flights[flights.carrier.isin([b"OH", b"F9"])]
flight_subset.hvplot(
x="distance",
y="depdelay",
by="carrier",
kind="scatter",
alpha=0.2,
persist=True,
)
[11]:
In the example above, we have explicitly specified the x and y axes.
Otherwise, the pandas index would be used for the x-axis and all non-index columns would be plotted on a shared y-axis with a default label. If you only want to set the y-axis label explicitly, you can use the value_label option.
[12]:
crime.hvplot(
x="Year",
y=["Violent Crime rate", "Robbery rate", "Burglary rate"],
value_label="Rate (per 100k people)",
)
[12]:
The hvplot namespace¶
Instead of the kind argument, we can also use the hvplot namespace for the plot call. The supported plot types can be easily determined using tab completion:
crime.hvplot.TAB
Available diagram types are:
area() draws an area chart similar to a line chart, except that the area under the curve is filled and optionally stacked
bar() draws a bar chart that can be stacked or grouped
bivariate() draws the 2D density of a set of points
box() draws a box-whisker diagram in which the distribution of one or more variables is compared
heatmap() draws a heat map that visualises a variable across two independent dimensions
hexbin() draws hex bins
hist() draws the distribution of one or more variables as a set of bins
kde() draws the kernel density estimate of one or more variables
line() draws a line chart (for example for a time series)
step() draws a step diagram that resembles a line diagram
scatter() draws a scatter diagram in which two variables are compared
table() creates a SlickGrid data table
violin() draws a violin diagram in which the distribution of one or more variables is compared using kernel density estimation
area()¶
Like most other chart types, the area chart supports the three ways of defining a chart described above. An area chart is most useful when plotting multiple variables in a stacked graph. This can be achieved by specifying the x, y and by columns, or by using columns and index/use_index as options for the x-axis.
[13]:
crime.hvplot.area(x="Year", y=["Robbery", "Aggravated assault"])
[13]:
We can also explicitly set stacked to False and define an alpha value so that the values can be compared directly:
[14]:
crime.hvplot.area(
x="Year", y=["Aggravated assault", "Robbery"], stacked=False, alpha=0.4
)
[14]:
Another use for an area chart is to visualise the dispersion of a value. With the flight dataset, for example, we may want to see the spread of mean delay values between airlines. To do this, we calculate the mean delay by day and carrier and then the minimum/maximum mean delay for all carriers. Since the output of hvplot is just a regular HoloViews object, we can use the overlay operator (*) to place the charts on top of each other.
[15]:
delay_min_max = (
    flights.groupby(["day", "carrier"])["carrier_delay"]
    .mean()
    .groupby("day")
    # String aggregators give stable column names ("min", "max");
    # np.min is labelled "amin" or "min" depending on the NumPy version
    .agg(["min", "max"])
)
delay_mean = flights.groupby("day")["carrier_delay"].mean()
delay_min_max.hvplot.area(
    x="day", y="min", y2="max", alpha=0.2
) * delay_mean.hvplot()
[15]:
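To make the aggregation behind this diagram easier to follow, the sketch below repeats the same three steps with plain pandas on a tiny made-up table. The values are hypothetical, and the band columns are called "min" and "max" here because string aggregators are used.

```python
import pandas as pd

# Hypothetical miniature of the flights table: two days, two carriers
toy = pd.DataFrame(
    {
        "day": [1, 1, 1, 1, 2, 2, 2, 2],
        "carrier": ["AA", "AA", "OH", "OH", "AA", "AA", "OH", "OH"],
        "carrier_delay": [10.0, 20.0, 0.0, 4.0, 30.0, 50.0, 6.0, 10.0],
    }
)

# Step 1: mean delay per day and carrier
mean_per_carrier = toy.groupby(["day", "carrier"])["carrier_delay"].mean()

# Step 2: per day, the smallest and largest of those carrier means (the band)
band = mean_per_carrier.groupby(level="day").agg(["min", "max"])

# Step 3: overall mean delay per day (the line drawn on top of the band)
overall_mean = toy.groupby("day")["carrier_delay"].mean()

print(band)
print(overall_mean)
```

On day 1, for instance, the carrier means are 15.0 and 2.0, so the band spans 2.0 to 15.0, while the overall mean of 8.5 lies inside it.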
bar()¶
In the simplest case, we can use .hvplot.bar to plot x against y. To rotate the x-axis labels by 90°, we specify rot=90.
[16]:
crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90)
[16]:
If we want to compare multiple columns instead, we can specify a list of columns. With the stacked option, we can then compare the column values more easily:
[17]:
crime.hvplot.bar(
x="Year",
y=["Violent crime total", "Property crime total"],
stacked=True,
rot=90,
width=800,
legend="top_left",
)
[17]:
scatter()¶
The scatter diagram supports many of the options of the diagram types above, and its points can additionally be coloured by a column using the c option.
[18]:
crime.hvplot.scatter(x="Violent Crime rate", y="Burglary rate", c="Year")
[18]:
When colour is used to display a dimension, the cmap option specifies the colour map to be used. In addition, the colour bar can be deactivated with colorbar=False.
step()¶
A step chart is very similar to a line chart, but instead of interpolating linearly between samples, the step chart visualises discrete steps. The position of the steps can be controlled with the where keyword, which accepts the values "pre", "mid" (default) and "post".
[19]:
crime.hvplot.step(x="Year", y=["Robbery", "Aggravated assault"])
[19]:
hexbin()¶
You can use the hexbin method to create hexagonal bin charts. They can be a useful alternative to scatter plots when the data is too dense to plot each point individually. Since our flight data is not evenly distributed, we use the logz option to apply a logarithmic colour scale.
[20]:
flights.hvplot.hexbin(
x="airtime", y="arrdelay", width=600, height=500, logz=True
)
[20]:
bivariate()¶
You can use the bivariate method to create a 2D density diagram. Like hexbin diagrams, bivariate diagrams are an alternative to scatter diagrams if the data is too dense to plot each point individually.
[21]:
crime.hvplot.bivariate(
x="Violent Crime rate", y="Burglary rate", width=600, height=500
)
[21]:
heatmap()¶
heatmap displays the relationship between three variables: in addition to the x and y variables, a third variable C is shown as colour. The value of each cell is calculated with reduce_function from the samples that fall into it.
[22]:
flights.compute().hvplot.heatmap(
x="day", y="carrier", C="depdelay", reduce_function=np.mean, colorbar=True
)
[22]:
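The role of reduce_function can be imitated with a pandas pivot table: one cell per combination of the two axis variables, filled with an aggregate of the third. The sketch below is an analogy on made-up values, not a description of how hvPlot computes the heat map internally.

```python
import pandas as pd

# Hypothetical miniature of the flights table
toy = pd.DataFrame(
    {
        "day": [1, 1, 1, 2, 2],
        "carrier": ["AA", "AA", "OH", "AA", "OH"],
        "depdelay": [10.0, 30.0, 5.0, 8.0, 12.0],
    }
)

# One cell per (carrier, day); samples sharing a cell are reduced with the mean
grid = toy.pivot_table(
    index="carrier", columns="day", values="depdelay", aggfunc="mean"
)
print(grid)
```

Carrier "AA" has two samples on day 1 (10.0 and 30.0), so that cell holds their mean, 20.0; a different reducing function would give a different cell value.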
table()¶
In contrast to all other plot types, a table offers only one choice: whether all columns are displayed or only the subset specified with columns.
[23]:
crime.hvplot.table(
columns=["Year", "Population", "Violent Crime rate"], width=400
)
[23]:
hist()¶
The drawing of distributions differs slightly from other plots, as in the simple case they only represent one variable. Therefore, no index or x-value needs to be specified for this plot type. Instead, you can
declare a single y variable, for example source.plot.hist(variable), or
declare a y variable and a by variable, for example source.plot.hist(variable, by="Group"), or
declare columns or plot all columns, for example source.plot.hist(columns=["A", "B", "C"])
[24]:
crime.hvplot.hist(y="Violent Crime rate")
[24]:
Alternatively, we can also display the distribution of several columns:
[25]:
columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.hist(y=columns, bins=50, alpha=0.5, legend="top", height=400)
[25]:
We can also group the data by another variable, here the carrier, and use subplots=True to give each group its own subplot:
[26]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.hist(
"depdelay",
by="carrier",
bins=20,
bin_range=(-20, 100),
width=300,
subplots=True,
)
[26]:
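The effect of bins and bin_range can be illustrated with numpy.histogram, whose bins and range arguments play the same role. The sketch below is an analogy with made-up delay values and does not describe hvPlot's internals.

```python
import numpy as np

# Hypothetical departure delays in minutes; two values lie outside the range
delays = np.array([-25.0, -5.0, 0.0, 3.0, 15.0, 42.0, 99.0, 250.0])

# 20 equally wide bins between -20 and 100; values outside are not counted
counts, edges = np.histogram(delays, bins=20, range=(-20, 100))

print("bin width:", edges[1] - edges[0])
print("values counted:", counts.sum(), "of", delays.size)
```

Restricting the range in this way keeps a few extreme delays from stretching the x-axis and squeezing most of the distribution into one or two bins.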
kde(), density()¶
You can also create density plots with hvplot.kde() or hvplot.density():
[27]:
crime.hvplot.kde(y="Violent Crime rate")
[27]:
It is also possible to compare the distribution of several columns:
[28]:
columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.kde(y=columns, alpha=0.5, value_label="Rate", legend="top_right")
[28]:
hvplot.kde also supports the by keyword:
[29]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.kde(
"depdelay", by="carrier", xlim=(-20, 70), width=300, subplots=True
)
[29]:
box()¶
Just like the other distribution-based diagram types, the box-whisker diagram supports the drawing of a single column:
[30]:
crime.hvplot.box(y="Violent Crime rate")
[30]:
It also supports multiple columns as well as options such as legend, invert and value_label:
[31]:
columns = [
"Burglary rate",
"Larceny-theft rate",
"Motor vehicle theft rate",
"Property crime rate",
"Violent Crime rate",
]
crime.hvplot.box(
y=columns,
group_label="Crime",
legend=False,
value_label="Rate (per 100k)",
invert=True,
)
[31]:
The use of the by keyword to split the data into several subsets is also supported:
[32]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.box("depdelay", by="carrier", ylim=(-10, 70))
[32]:
Composite diagrams¶
One of the main strengths of HoloViews is how easily diagrams can be composed. Individual diagrams can be superimposed with the * operator or arranged next to each other with the + operator.
[33]:
crime.hvplot(x="Year", y="Violent Crime rate") * crime.hvplot.scatter(
x="Year", y="Violent Crime rate", c="k"
)
[33]:
We can also place diagrams and tables next to each other:
[34]:
(
crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90, width=550)
+ crime.hvplot.table(
["Year", "Population", "Violent Crime rate"], width=420
)
)
[34]:
Big Data¶
In the previous examples, we summarised the relatively large airline dataset by creating subsets for display. However, we can also aggregate the data using Datashader instead, rendering the entire available raw dataset (if the resolution of the screen allows it).
[35]:
flights.hvplot.scatter(x="distance", y="airtime", datashade=True)
[35]:
groupby¶
Thanks to HoloViews’ ability to explore a parameter space with a series of widgets, we can also group along a specific column or dimension. For example, we can display the distribution of departure delays by carrier, grouped by day of the week, so that users can choose which day to display:
[36]:
flights.hvplot.violin(
y="depdelay", by="carrier", groupby="dayofweek", ylim=(-20, 60), height=500
)
[36]: