hvPlot examples

How to install

To be able to run the examples, Snappy (https://github.com/google/snappy) must also be installed.

With Spack you can provide Snappy in your kernel, for example with:

$ spack env activate python-311
$ spack install snappy

Alternatively, you can install Snappy with other package managers, for example

  • for Debian/Ubuntu:

    $ sudo apt install libsnappy-dev
    
  • for Windows:

    Snappy requires Microsoft Visual C++ ≥ 14.0, which can be installed with the Microsoft C++ Build Tools.

  • for macOS:

    $ brew install snappy
    

Afterwards, additional packages should be installed for your kernel, for example with:

$ pipenv install intake intake-parquet s3fs python-snappy pyviz-comms
…
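
Before running the examples, it can help to check that the packages are importable from the kernel. The following is a minimal sketch that only uses the standard library. The import names (snappy for python-snappy, pyviz_comms for pyviz-comms, intake_parquet for intake-parquet) are assumptions based on the usual naming of these distributions.

```python
from importlib.util import find_spec

# Import names assumed for the distributions installed above.
REQUIRED_MODULES = ["intake", "intake_parquet", "s3fs", "snappy", "pyviz_comms"]


def check_modules(names):
    """Return a mapping of module name to whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


status = check_modules(REQUIRED_MODULES)
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'missing'}")
```

Any module reported as missing would have to be installed in the environment that backs the kernel.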

Introduction

First we import NumPy and pandas to create a small set of random data:

[1]:
import numpy as np
import pandas as pd


index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(
    np.random.randn(1000, 4), index=index, columns=list("ABCD")
).cumsum()

df.head()
[1]:
A B C D
2000-01-01 1.431985 1.378913 0.539567 0.257977
2000-01-02 1.266573 0.834050 1.176750 1.458363
2000-01-03 1.799283 0.945437 1.792629 -0.220764
2000-01-04 2.131248 0.160377 1.186063 -0.702810
2000-01-05 3.574972 2.274926 1.759324 -1.297244
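
Because the data is random, the numbers shown above differ on every run. If reproducible values are preferred, a seeded generator can be used instead. This is a minimal sketch; the seed 42 and the name df_seeded are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

index = pd.date_range("2000-01-01", periods=1000)
df_seeded = pd.DataFrame(
    rng.standard_normal((1000, 4)), index=index, columns=list("ABCD")
).cumsum()

print(df_seeded.shape)
```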

pandas .plot() API

pandas offers Matplotlib-based plotting with the .plot() method by default:

[2]:
%matplotlib inline

df.plot();

The result is a PNG image that can be easily displayed, but is otherwise static.

Note: In pandas ≥ 0.25.0 the plotting backend can be exchanged, for example with pd.options.plotting.backend = "holoviews". You can find more information on this at pandas-API.

.hvplot()

If we import hvplot.pandas and call df.hvplot() instead of using %matplotlib inline and df.plot(), an interactively explorable Bokeh diagram is generated, with panning, zooming in and out, and a clickable legend:

[3]:
import hvplot.pandas

df.hvplot()
[3]:

Such an interactive diagram makes it much easier to explore the data without having to write additional code.

Native hvPlot API

For the above diagram, hvPlot has dynamically added the pandas .hvplot() method so that you can use the same syntax as for pandas plots. If you prefer a more explicit approach, you can work directly with the hvPlot objects instead:

[4]:
import holoviews as hv

from hvplot import hvPlot


hv.extension("bokeh")

plot = hvPlot(df)
plot(y=["A", "B", "C", "D"])
[4]:

Help

If you are working in IPython or Jupyter notebooks, the hvplot methods automatically complete valid keywords. For example, if you press the Tab key after declaring the plot type, all valid keywords and the docstring will be displayed:

df.hvplot.line(TAB

Outside of an interactive environment, hvplot.help displays all information for a plot type, for example:

[ ]:
hvplot.help("line")

See also:

Further information on the available options can be found in Customization.

Plotting

In the following examples, the hvPlot API for Dask is used in addition to the pandas API:

[6]:
import hvplot.dask

The hvplot.sample_data module provides sample data sets as Intake data catalogues, which we can load with pandas:

[7]:
from hvplot.sample_data import airline_flights, us_crime


crime = us_crime.read()
print(type(crime))
crime.head()
<class 'pandas.core.frame.DataFrame'>
[7]:
Year Population Violent crime total Murder and nonnegligent Manslaughter Legacy rape /1 Revised rape /2 Robbery Aggravated assault Property crime total Burglary ... Violent Crime rate Murder and nonnegligent manslaughter rate Legacy rape rate /1 Revised rape rate /2 Robbery rate Aggravated assault rate Property crime rate Burglary rate Larceny-theft rate Motor vehicle theft rate
0 1960 179323175 288460 9110 17190 NaN 107840 154320 3095700 912100 ... 160.9 5.1 9.6 NaN 60.1 86.1 1726.3 508.6 1034.7 183.0
1 1961 182992000 289390 8740 17220 NaN 106670 156760 3198600 949600 ... 158.1 4.8 9.4 NaN 58.3 85.7 1747.9 518.9 1045.4 183.6
2 1962 185771000 301510 8530 17550 NaN 110860 164570 3450700 994300 ... 162.3 4.6 9.4 NaN 59.7 88.6 1857.5 535.2 1124.8 197.4
3 1963 188483000 316970 8640 17650 NaN 116470 174210 3792500 1086400 ... 168.2 4.6 9.4 NaN 61.8 92.4 2012.1 576.4 1219.1 216.6
4 1964 191141000 364220 9360 21420 NaN 130390 203050 4200400 1213200 ... 190.6 4.9 11.2 NaN 68.2 106.2 2197.5 634.7 1315.5 247.4

5 rows × 22 columns

Alternatively, we can load the data as a Dask DataFrame:

[8]:
flights = airline_flights.to_dask().persist()
print(type(flights))
flights.head()
<class 'dask.dataframe.core.DataFrame'>
[8]:
year month day dayofweek dep_time crs_dep_time arr_time crs_arr_time carrier flight_num ... taxi_in taxi_out cancelled cancellation_code diverted carrier_delay weather_delay nas_delay security_delay late_aircraft_delay
0 2008.0 11.0 15.0 6.0 1411.0 1420.0 1535.0 1546.0 b'OO' 4391.0 ... 5.0 11.0 0.0 None 0.0 NaN NaN NaN NaN NaN
1 2008.0 11.0 28.0 5.0 1222.0 1230.0 1345.0 1356.0 b'OO' 4391.0 ... 5.0 15.0 0.0 None 0.0 NaN NaN NaN NaN NaN
2 2008.0 11.0 22.0 6.0 1414.0 1420.0 1540.0 1546.0 b'OO' 4391.0 ... 5.0 10.0 0.0 None 0.0 NaN NaN NaN NaN NaN
3 2008.0 11.0 15.0 6.0 1304.0 1305.0 1507.0 1519.0 b'OO' 4392.0 ... 10.0 9.0 0.0 None 0.0 NaN NaN NaN NaN NaN
4 2008.0 11.0 22.0 6.0 1323.0 1305.0 1536.0 1519.0 b'OO' 4392.0 ... 5.0 21.0 0.0 None 0.0 0.0 0.0 0.0 0.0 17.0

5 rows × 29 columns

The plot API

The interfaces

  • dask.dataframe.DataFrame.hvplot

  • pandas.DataFrame.hvplot

  • intake.DataSource.plot

and their Series equivalents offer a powerful high-level API for generating even complex plots. The .hvplot API can be used either directly or as a namespace to generate specific plot types.

The most explicit method of using the plot API is to specify the names of the columns to be plotted on the x or y axis:

[9]:
crime.hvplot.line(x="Year", y="Violent Crime rate")
[9]:

The diagram type can also be specified with kind:

[10]:
crime.hvplot(x="Year", y="Violent Crime rate", kind="scatter")
[10]:

You can use the by keyword to group the data by one or more additional columns. As an example, the departure delay ("depdelay") is shown below as a function of "distance" and the data is grouped by "carrier":

[11]:
flight_subset = flights[flights.carrier.isin([b"OH", b"F9"])]
flight_subset.hvplot(
    x="distance",
    y="depdelay",
    by="carrier",
    kind="scatter",
    alpha=0.2,
    persist=True,
)
[11]:

In the example above, we have explicitly specified the x and y axes.

Otherwise, the pandas index would be used for the x-axis and all non-index columns would be plotted on the y-axis, labelled with the default "value". If you only want to set the y-axis label explicitly, you can use the value_label option.

[12]:
crime.hvplot(
    x="Year",
    y=["Violent Crime rate", "Robbery rate", "Burglary rate"],
    value_label="Rate (per 100k people)",
)
[12]:
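
The filter in the grouped scatter example uses b"OH" rather than "OH" because the carrier codes in the flight data are stored as byte strings. The following minimal sketch uses a small made-up pandas frame (the values are invented for illustration) to show how such a column could be decoded to ordinary strings before plotting:

```python
import pandas as pd

# Made-up rows that mimic the byte-string carrier column of the flight data.
sample = pd.DataFrame(
    {
        "carrier": [b"OH", b"F9", b"AA", b"OH"],
        "distance": [300.0, 900.0, 1200.0, 450.0],
        "depdelay": [5.0, -2.0, 30.0, 12.0],
    }
)

# Filtering works on the raw bytes, as in the scatter example ...
subset = sample[sample["carrier"].isin([b"OH", b"F9"])]

# ... or after decoding the column to text.
decoded = sample.assign(carrier=sample["carrier"].str.decode("utf-8"))
subset_decoded = decoded[decoded["carrier"].isin(["OH", "F9"])]

print(subset_decoded["carrier"].tolist())
```

Decoded labels are usually easier to read in legends, though whether decoding is worthwhile depends on the size of the data.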

The hvplot namespace

Instead of the kind argument, we can also use the hvplot namespace for the plot call. The supported plot types can be easily determined using tab completion:

crime.hvplot.TAB

Available diagram types are

  • area() draws an area chart similar to a line chart, except that the area under the curve is filled and optionally stacked

  • bar() draws a bar chart that can be stacked or grouped

  • bivariate() draws the 2D density of a set of points

  • box() draws a box-whisker diagram in which the distribution of one or more variables is compared

  • heatmap() draws a heatmap that visualises a variable across two independent dimensions

  • hexbin() draws hex bins

  • hist() draws the distribution of one or more variables as a set of bins

  • kde() draws the kernel density estimate of one or more variables

  • line() draws a line chart (for example for a time series)

  • step() draws a step diagram that resembles a line diagram

  • scatter() draws a scatter diagram in which two variables are compared

  • table() creates a SlickGrid data table

  • violin() draws a violin diagram in which the distribution of one or more variables is compared using kernel density estimation

area()

Like most other chart types, the area chart supports the three ways of defining a chart described above. An area graph is most useful when plotting multiple variables in a stacked graph. This can be achieved by specifying the x, y and by columns or using columns and index/use_index as options for the x-axis.

[13]:
crime.hvplot.area(x="Year", y=["Robbery", "Aggravated assault"])
[13]:
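
In a stacked area chart each series is drawn on top of the previous one, so the upper edge of each band is the running total of the columns. A small sketch with invented numbers (not taken from the crime data) illustrates this with plain pandas:

```python
import pandas as pd

# Invented values for illustration only.
counts = pd.DataFrame(
    {
        "Year": [2000, 2001, 2002],
        "Robbery": [10, 12, 9],
        "Aggravated assault": [20, 18, 25],
    }
).set_index("Year")

# Upper edge of each band in a stacked chart: cumulative sum across columns.
stacked_tops = counts.cumsum(axis=1)

print(stacked_tops)
```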

We can also explicitly set stacked to False and define an alpha value so that the values can be compared directly:

[14]:
crime.hvplot.area(
    x="Year", y=["Aggravated assault", "Robbery"], stacked=False, alpha=0.4
)
[14]:

Another use for an area chart is to visualise the dispersion of a value. For example, if we are using the flight dataset, we may want to see the spread of mean delay values between airlines. To do this, we calculate the mean delay by day and carrier and then the minimum/maximum mean delay for all carriers. Since the output of hvplot is just a regular holoviews object, we can use the overlay operator (*) to place the charts on top of each other.

[15]:
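# Mean delay per day and carrier, then the smallest and largest mean per day.
# String aggregation names give stable column names ("min" and "max").
delay_min_max = (
    flights.groupby(["day", "carrier"])["carrier_delay"]
    .mean()
    .groupby("day")
    .agg(["min", "max"])
)
# Overall mean delay per day, overlaid as a line.
delay_mean = flights.groupby("day")["carrier_delay"].mean()

delay_min_max.hvplot.area(
    x="day", y="min", y2="max", alpha=0.2
) * delay_mean.hvplot()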
[15]:
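
The two-step aggregation behind this chart is easier to follow on a tiny example. The sketch below uses plain pandas and invented delays for two days and two carriers; it only illustrates the calculation, not the plot itself.

```python
import pandas as pd

# Invented delays for two days and two carriers.
delays = pd.DataFrame(
    {
        "day": [1, 1, 1, 1, 2, 2, 2, 2],
        "carrier": ["AA", "AA", "OH", "OH", "AA", "AA", "OH", "OH"],
        "carrier_delay": [10.0, 20.0, 30.0, 50.0, 0.0, 4.0, 6.0, 10.0],
    }
)

# Step 1: mean delay per day and carrier.
mean_by_day_carrier = delays.groupby(["day", "carrier"])["carrier_delay"].mean()

# Step 2: smallest and largest carrier mean within each day.
spread = mean_by_day_carrier.groupby(level="day").agg(["min", "max"])

# Overall mean per day, which corresponds to the overlaid line.
overall = delays.groupby("day")["carrier_delay"].mean()

print(spread)
print(overall)
```

The band of the area chart spans the min and max columns, while the line shows the overall mean.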

bar()

In the simplest case, we can use .hvplot.bar. To rotate the labelling on the x-axis by 90°, we specify rot=90.

[16]:
crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90)
[16]:

If we want to compare multiple columns instead, we can specify a list of columns. With the stacked option, we can then compare the column values more easily:

[17]:
crime.hvplot.bar(
    x="Year",
    y=["Violent crime total", "Property crime total"],
    stacked=True,
    rot=90,
    width=800,
    legend="top_left",
)
[17]:

scatter()

The scatter diagram supports many of the options of the diagram types above, and the points can also be coloured by a column using the c option.

[18]:
crime.hvplot.scatter(x="Violent Crime rate", y="Burglary rate", c="Year")
[18]:

When colour is used to display a dimension, the cmap option specifies the colour map. In addition, the colour bar can be deactivated with colorbar=False.

step()

A step chart is very similar to a line chart, but instead of interpolating linearly between samples, the step chart visualises discrete steps. The position of the steps can be controlled with the where keyword and the values "pre", "mid" (default) and "post".

[19]:
crime.hvplot.step(x="Year", y=["Robbery", "Aggravated assault"])
[19]:
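
The effect of the where keyword can be made concrete by evaluating a step function by hand. The following NumPy sketch is an illustration only: the helper step_value is hypothetical, and the conventions for "pre", "mid" and "post" are assumed to match those of Matplotlib's step plots.

```python
import numpy as np


def step_value(x, y, t, where="mid"):
    """Evaluate a step function through (x, y) at position t.

    Conventions assumed here (as in Matplotlib's step plots):
    "pre"  - the step happens before each sample, so y[i] holds on (x[i-1], x[i]]
    "post" - the step happens after each sample, so y[i] holds on [x[i], x[i+1])
    "mid"  - the step happens halfway between neighbouring samples
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if where == "post":
        idx = np.searchsorted(x, t, side="right") - 1
    elif where == "pre":
        idx = np.searchsorted(x, t, side="left")
    elif where == "mid":
        midpoints = (x[:-1] + x[1:]) / 2
        idx = np.searchsorted(midpoints, t, side="right")
    else:
        raise ValueError(f"unknown where value: {where!r}")
    # Clamp so that positions outside the data reuse the first or last sample.
    return y[np.clip(idx, 0, len(y) - 1)]


x = [0, 1, 2]
y = [10, 20, 30]
for mode in ("pre", "mid", "post"):
    print(mode, step_value(x, y, 0.25, where=mode))
```

At t = 0.25 only "pre" has already switched to the second sample; "mid" and "post" still show the first one.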

hexbin()

You can use the hexbin method to create hexagonal bin charts. They can be a useful alternative to scatter plots when the data is too dense to plot each point individually. Since our flight data is not evenly distributed on a linear scale, we use the logz option for a logarithmic scale.

[20]:
flights.hvplot.hexbin(
    x="airtime", y="arrdelay", width=600, height=500, logz=True
)
[20]:

bivariate()

You can use the bivariate method to create a 2D density diagram. Like hexbin diagrams, bivariate diagrams are an alternative to scatter diagrams if the data is too dense to plot each point individually.

[21]:
crime.hvplot.bivariate(
    x="Violent Crime rate", y="Burglary rate", width=600, height=500
)
[21]:

heatmap()

heatmap displays the relationship between three variables: in addition to the variables x and y, the values of C are shown as colour. The value for each cell is calculated from the samples using reduce_function.

[22]:
flights.compute().hvplot.heatmap(
    x="day", y="carrier", C="depdelay", reduce_function=np.mean, colorbar=True
)
[22]:
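
Conceptually, the reduction collapses all samples that fall into the same (x, y) cell into a single number. A comparable result can be sketched with a pandas pivot table; the records below are invented, and the pivot table stands in for the reduction rather than reproducing hvPlot's internals.

```python
import pandas as pd

# Invented flight records; several rows share the same (day, carrier) cell.
records = pd.DataFrame(
    {
        "day": [1, 1, 1, 2, 2, 2],
        "carrier": ["AA", "AA", "OH", "AA", "OH", "OH"],
        "depdelay": [10.0, 30.0, 5.0, 0.0, 8.0, 12.0],
    }
)

# One cell per (carrier, day) pair, reduced with the mean.
grid = records.pivot_table(
    index="carrier", columns="day", values="depdelay", aggfunc="mean"
)

print(grid)
```

Each entry of grid corresponds to the colour of one heatmap cell.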

table()

Unlike the other plot types, a table only lets you choose whether all columns or just a subset, given with columns, should be displayed.

[23]:
crime.hvplot.table(
    columns=["Year", "Population", "Violent Crime rate"], width=400
)
[23]:

hist()

The drawing of distributions differs slightly from other plots, as in the simple case they only represent one variable. Therefore, no index or x-value needs to be specified for this plot type, but instead

  • declare a single y variable, for example source.plot.hist(variable) or

  • declare a y variable and a by variable, for example source.plot.hist(variable, by="Group") or

  • declare columns, or plot all columns by default, for example source.plot.hist(columns=["A", "B", "C"])

[24]:
crime.hvplot.hist(y="Violent Crime rate")
[24]:

Alternatively, we can also display the distribution of several columns:

[25]:
columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.hist(y=columns, bins=50, alpha=0.5, legend="top", height=400)
[25]:

We can also group the data by other variables and split the carriers into their own subplots:

[26]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.hist(
    "depdelay",
    by="carrier",
    bins=20,
    bin_range=(-20, 100),
    width=300,
    subplots=True,
)
[26]:
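
The bins and bin_range options follow the usual histogram terminology: the range is divided into equally wide bins and the samples in each bin are counted. The sketch below shows this with NumPy and invented delays, assuming that the options have the same meaning as the bins and range arguments of np.histogram.

```python
import numpy as np

# Invented departure delays in minutes.
depdelay = np.array([-25.0, -5.0, 0.0, 3.0, 14.0, 41.0, 99.0, 180.0])

# Same binning parameters as in the cell above: 20 bins between -20 and 100.
counts, edges = np.histogram(depdelay, bins=20, range=(-20, 100))

print(edges[:3])     # first bin edges
print(counts.sum())  # number of samples that fall inside the range
```

Values outside the range, here -25 and 180, are not counted in any bin.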

kde(), density()

You can also create density plots with hvplot.kde() or hvplot.density():

[27]:
crime.hvplot.kde(y="Violent Crime rate")
[27]:

It is also possible to compare the distribution of several columns:

[28]:
columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.kde(y=columns, alpha=0.5, value_label="Rate", legend="top_right")
[28]:

hvplot.kde also supports the by keyword:

[29]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.kde(
    "depdelay", by="carrier", xlim=(-20, 70), width=300, subplots=True
)
[29]:
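
A kernel density estimate smooths the samples by placing a small bell curve on each value and averaging the curves. The following NumPy sketch shows the idea with invented data. The function gaussian_kde_1d and the bandwidth rule are assumptions made for illustration, not hvPlot's implementation.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Estimate a density on grid by averaging Gaussian kernels.

    The default bandwidth uses Scott's rule, which is an assumption made
    for this sketch and not necessarily what hvPlot uses.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    # Distance of every grid point to every sample, in bandwidth units.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
samples = rng.normal(loc=400.0, scale=100.0, size=500)  # invented data
grid = np.linspace(0.0, 800.0, 401)
density = gaussian_kde_1d(samples, grid)

# A density should integrate to roughly one over a wide enough range.
area = float(np.sum((density[:-1] + density[1:]) / 2 * np.diff(grid)))
print(round(area, 2))
```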

box()

Just like the other distribution-based diagram types, the box-whisker diagram supports the drawing of a single column:

[30]:
crime.hvplot.box(y="Violent Crime rate")
[30]:

The box-whisker diagram also supports multiple columns and options such as legend, invert and value_label:

[31]:
columns = [
    "Burglary rate",
    "Larceny-theft rate",
    "Motor vehicle theft rate",
    "Property crime rate",
    "Violent Crime rate",
]
crime.hvplot.box(
    y=columns,
    group_label="Crime",
    legend=False,
    value_label="Rate (per 100k)",
    invert=True,
)
[31]:

The use of the by keyword to split the data into several subsets is also supported:

[32]:
flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.box("depdelay", by="carrier", ylim=(-10, 70))
[32]:
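
The quantities behind a box-whisker diagram can be computed by hand. The sketch below uses NumPy and invented values; the fences at 1.5 times the interquartile range are a common convention that is assumed here, and the exact rule used for rendering may differ.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)  # invented data

# The box spans the first to the third quartile, with a line at the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR (a common convention, assumed here).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values that still lie inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```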

Composite diagrams

One of the main strengths of HoloViews is how easily different diagrams can be combined. Individual diagrams can be overlaid with the * operator or placed side by side with the + operator.

[33]:
crime.hvplot(x="Year", y="Violent Crime rate") * crime.hvplot.scatter(
    x="Year", y="Violent Crime rate", c="k"
)
[33]:

We can also lay out diagrams and tables side by side:

[34]:
(
    crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90, width=550)
    + crime.hvplot.table(
        ["Year", "Population", "Violent Crime rate"], width=420
    )
)
[34]:

Big Data

In the previous examples, we summarised the relatively large airline dataset by creating subsets for display. However, we can also aggregate the data using Datashader instead, rendering the entire available raw dataset (if the resolution of the screen allows it).

[35]:
flights.hvplot.scatter(x="distance", y="airtime", datashade=True)
[35]:
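
The basic idea of such an aggregation is that the points are binned onto a grid of pixels and only the per-pixel result is rendered. The following NumPy sketch illustrates this with invented data and a count per cell. It is a conceptual illustration, not Datashader's implementation, and the grid size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 100_000
# Invented data that loosely resembles distance and airtime.
distance = rng.uniform(0.0, 3000.0, n_points)
airtime = distance / 8.0 + rng.normal(0.0, 10.0, n_points)

# Aggregate onto a grid of 300 x 200 "pixels"; each cell stores a count.
counts, x_edges, y_edges = np.histogram2d(distance, airtime, bins=(300, 200))

print(counts.shape)
print(int(counts.sum()))
```

However many points go in, the amount of data that has to be drawn is bounded by the size of the grid.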

groupby

Thanks to HoloViews’ ability to explore a parameter space with a series of widgets, we can apply a groupby along a specific column or dimension. For example, we can display the distribution of departure delays by carrier, grouped by day of the week, so that users can choose which day to display:

[36]:
flights.hvplot.violin(
    y="depdelay", by="carrier", groupby="dayofweek", ylim=(-20, 60), height=500
)
[36]:
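
The difference between groupby and by can be sketched with plain pandas: groupby selects one subset per widget value, and by splits that subset further within a single plot. The frame below is invented, and the median merely stands in for the full distribution that a violin diagram would show.

```python
import pandas as pd

# Invented flight records for two days of the week.
flights_small = pd.DataFrame(
    {
        "dayofweek": [1, 1, 1, 2, 2, 2],
        "carrier": ["AA", "OH", "AA", "AA", "OH", "OH"],
        "depdelay": [5.0, 10.0, 15.0, -3.0, 20.0, 40.0],
    }
)

# One frame per widget value; each would yield its own violin plot.
frames = {day: group for day, group in flights_small.groupby("dayofweek")}

# Within a frame, "by" splits the data further, here summarised by the median.
medians = {
    day: group.groupby("carrier")["depdelay"].median().to_dict()
    for day, group in frames.items()
}
print(medians)
```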