hvPlot examples¶

How to install¶

To be able to run the examples, [Snappy](https://github.com/google/snappy) must also be installed.

With Spack you can provide Snappy in your kernel, for example with:

$ spack env activate python-311
$ spack install snappy

Alternatively, you can install Snappy with other package managers, for example

for Debian/Ubuntu:
```
$ sudo apt install libsnappy-dev
```
for Windows:

Snappy requires Microsoft Visual C++ ≥ 14.0, which can be installed with the Microsoft C++ Build Tools.

for macOS:
```
$ brew install snappy
```

Afterwards, additional packages should be installed for your kernel, for example with:

$ pipenv install intake intake-parquet s3fs python-snappy pyviz-comms
…
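To check whether the packages ended up in the kernel you are actually using, a small standard-library sketch like the following may help. The module names are assumptions derived from the package names above (for example, python-snappy is imported as `snappy` and pyviz-comms as `pyviz_comms`), and `missing_modules` is a hypothetical helper, not part of any of these packages:

```python
from importlib.util import find_spec

# Import names assumed from the packages installed above.
REQUIRED_MODULES = ["intake", "intake_parquet", "s3fs", "snappy", "pyviz_comms"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only looks the modules up and does not import them, so the check is quick and has no side effects.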

Introduction¶

First we import NumPy and pandas to create a small set of random data:

[1]:

import numpy as np
import pandas as pd


index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(
    np.random.randn(1000, 4), index=index, columns=list("ABCD")
).cumsum()

df.head()

[1]:

	A	B	C	D
2000-01-01	1.431985	1.378913	0.539567	0.257977
2000-01-02	1.266573	0.834050	1.176750	1.458363
2000-01-03	1.799283	0.945437	1.792629	-0.220764
2000-01-04	2.131248	0.160377	1.186063	-0.702810
2000-01-05	3.574972	2.274926	1.759324	-1.297244

pandas `.plot()` API¶

pandas offers Matplotlib-based plotting with the .plot() method by default:

[2]:

%matplotlib inline

df.plot();

../../../../_images/bokeh_integration_holoviews_hvplot_examples_5_0.png

The result is a PNG image that can be easily displayed, but is otherwise static.

Note: In pandas 0.25.0 and later the plotting backend can be exchanged, for example with pd.options.plotting.backend = "holoviews". You can find more information on this at pandas-API.

`.hvplot()`¶

If we import hvplot.pandas and call df.hvplot() instead of using %matplotlib inline and df.plot(), we get an interactive Bokeh diagram with panning, zooming and a clickable legend:

[3]:

import hvplot.pandas

df.hvplot()

[3]:

Such an interactive diagram makes it much easier to explore the data without having to write additional code.

Native `hvPlot` API¶

For the above diagram, hvPlot has dynamically added the .hvplot() method to pandas objects so that you can use the same syntax as for pandas plots. If you prefer a more explicit approach, you can work directly with the hvPlot objects instead:

[4]:

import holoviews as hv

from hvplot import hvPlot

hv.extension("bokeh")

plot = hvPlot(df)
plot(y=["A", "B", "C", "D"])

[4]:

Help¶

If you are working in IPython or Jupyter notebooks, the hvplot methods automatically complete valid keywords. For example, if you press the Tab key after declaring the plot type, all valid keywords and the docstring will be displayed:

df.hvplot.line(TAB

Outside of an interactive environment, hvplot.help displays all information for a plot type, for example:

[ ]:

hvplot.help("line")

See also:

Further information on the available options can be found in Customization.

Plotting¶

In the following examples, the hvPlot API for Dask is used in addition to the pandas API:

[6]:

import hvplot.dask

The hvplot.sample_data module provides sample data sets as an Intake data catalogue, which we can load with pandas:

[7]:

from hvplot.sample_data import airline_flights, us_crime


crime = us_crime.read()
print(type(crime))
crime.head()

<class 'pandas.core.frame.DataFrame'>

[7]:

	Year	Population	Violent crime total	Murder and nonnegligent Manslaughter	Legacy rape /1	Revised rape /2	Robbery	Aggravated assault	Property crime total	Burglary	...	Violent Crime rate	Murder and nonnegligent manslaughter rate	Legacy rape rate /1	Revised rape rate /2	Robbery rate	Aggravated assault rate	Property crime rate	Burglary rate	Larceny-theft rate	Motor vehicle theft rate
0	1960	179323175	288460	9110	17190	NaN	107840	154320	3095700	912100	...	160.9	5.1	9.6	NaN	60.1	86.1	1726.3	508.6	1034.7	183.0
1	1961	182992000	289390	8740	17220	NaN	106670	156760	3198600	949600	...	158.1	4.8	9.4	NaN	58.3	85.7	1747.9	518.9	1045.4	183.6
2	1962	185771000	301510	8530	17550	NaN	110860	164570	3450700	994300	...	162.3	4.6	9.4	NaN	59.7	88.6	1857.5	535.2	1124.8	197.4
3	1963	188483000	316970	8640	17650	NaN	116470	174210	3792500	1086400	...	168.2	4.6	9.4	NaN	61.8	92.4	2012.1	576.4	1219.1	216.6
4	1964	191141000	364220	9360	21420	NaN	130390	203050	4200400	1213200	...	190.6	4.9	11.2	NaN	68.2	106.2	2197.5	634.7	1315.5	247.4

5 rows × 22 columns

Alternatively, we can use dask.DataFrame:

[8]:

flights = airline_flights.to_dask().persist()
print(type(flights))
flights.head()

<class 'dask.dataframe.core.DataFrame'>

[8]:

	year	month	day	dayofweek	dep_time	crs_dep_time	arr_time	crs_arr_time	carrier	flight_num	...	taxi_in	taxi_out	cancellation_code	carrier_delay	weather_delay	nas_delay	security_delay	late_aircraft_delay
0	2008.0	11.0	15.0	6.0	1411.0	1420.0	1535.0	1546.0	b'OO'	4391.0	...	5.0	11.0	None	NaN	NaN	NaN	NaN	NaN
1	2008.0	11.0	28.0	5.0	1222.0	1230.0	1345.0	1356.0	b'OO'	4391.0	...	5.0	15.0	None	NaN	NaN	NaN	NaN	NaN
2	2008.0	11.0	22.0	6.0	1414.0	1420.0	1540.0	1546.0	b'OO'	4391.0	...	5.0	10.0	None	NaN	NaN	NaN	NaN	NaN
3	2008.0	11.0	15.0	6.0	1304.0	1305.0	1507.0	1519.0	b'OO'	4392.0	...	10.0	9.0	None	NaN	NaN	NaN	NaN	NaN
4	2008.0	11.0	22.0	6.0	1323.0	1305.0	1536.0	1519.0	b'OO'	4392.0	...	5.0	21.0	None	0.0	0.0	0.0	0.0	17.0

5 rows × 29 columns

The plot API¶

The interfaces

dask.dataframe.DataFrame.hvplot
pandas.DataFrame.hvplot
intake.DataSource.plot

and their Series equivalents offer a powerful high-level API for generating even complex plots. The .hvplot API can be used either directly or as a namespace to generate specific plot types.

The most explicit method of using the plot API is to specify the names of the columns to be plotted on the x or y axis:

[9]:

crime.hvplot.line(x="Year", y="Violent Crime rate")

[9]:

The diagram type can also be specified with kind:

[10]:

crime.hvplot(x="Year", y="Violent Crime rate", kind="scatter")

[10]:

You can use the by option to group the data by one or more additional columns. As an example, the departure delay ("depdelay") is shown below as a function of "distance", with the data grouped by "carrier":

[11]:

flight_subset = flights[flights.carrier.isin([b"OH", b"F9"])]
flight_subset.hvplot(
    x="distance",
    y="depdelay",
    by="carrier",
    kind="scatter",
    alpha=0.2,
    persist=True,
)

[11]:

In the example above, we have explicitly specified the x and y axes.

Had we omitted them, the pandas index would have been used for the x-axis and all other columns would have been plotted on the y-axis under a default label. If you only want to set the y-axis label explicitly, you can use the value_label option:

[12]:

crime.hvplot(
    x="Year",
    y=["Violent Crime rate", "Robbery rate", "Burglary rate"],
    value_label="Rate (per 100k people)",
)

[12]:

The `hvplot` namespace¶

Instead of the kind argument, we can also use the hvplot namespace for the plot call. The supported plot types can be easily determined using tab completion, for example:

crime.hvplot.TAB

Available diagram types are

area() draws an area chart similar to a line chart, except that the area under the curve is filled and optionally stacked
bar() draws a bar chart that can be stacked or grouped
bivariate() draws the 2D density of a set of points
box() draws a box-whisker diagram in which the distribution of one or more variables is compared
heatmap() draws a heatmap that visualises a variable across two independent dimensions
hexbin() draws hex bins
hist() draws the distribution of one or more variables as a set of bins
kde() draws the kernel density estimate of one or more variables
line() draws a line chart (for example for a time series)
step() draws a step diagram that resembles a line diagram
scatter() draws a scatter diagram in which two variables are compared
table() creates a SlickGrid data table
violin() draws a violin diagram in which the distribution of one or more variables is compared using kernel density estimation

`area()`¶

Like most other chart types, the area chart supports the three ways of defining a chart described above. An area graph is most useful when plotting multiple variables in a stacked graph. This can be achieved by specifying the x, y and by columns or using columns and index/use_index as options for the x-axis.

[13]:

crime.hvplot.area(x="Year", y=["Robbery", "Aggravated assault"])

[13]:

We can also explicitly set stacked to False and define an alpha value to be able to compare the values directly:

[14]:

crime.hvplot.area(
    x="Year", y=["Aggravated assault", "Robbery"], stacked=False, alpha=0.4
)

[14]:

Another use for an area chart is to visualise the dispersion of a value. For example, if we are using the flight dataset, we may want to see the spread of mean delay values between airlines. To do this, we calculate the mean delay by day and carrier and then the minimum/maximum mean delay for all carriers. Since the output of hvplot is just a regular holoviews object, we can use the overlay operator (*) to place the charts on top of each other.

[15]:

# String names give the result columns the stable labels "min" and "max",
# independent of the installed NumPy version.
delay_min_max = (
    flights.groupby(["day", "carrier"])["carrier_delay"]
    .mean()
    .groupby("day")
    .agg(["min", "max"])
)
delay_mean = flights.groupby("day")["carrier_delay"].mean()

delay_min_max.hvplot.area(
    x="day", y="min", y2="max", alpha=0.2
) * delay_mean.hvplot()

[15]:

`bar()`¶

In the simplest case, we can use .hvplot.bar. To rotate the labelling on the x-axis by 90°, we specify rot=90.

[16]:

crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90)

[16]:

If we want to compare multiple columns instead, we can specify a list of columns. With the stacked option, we can then compare the column values more easily:

[17]:

crime.hvplot.bar(
    x="Year",
    y=["Violent crime total", "Property crime total"],
    stacked=True,
    rot=90,
    width=800,
    legend="top_left",
)

[17]:

`scatter()`¶

The scatter diagram supports many of the options of the above diagram types, but the points can also be coloured by a column using the c option.

[18]:

crime.hvplot.scatter(x="Violent Crime rate", y="Burglary rate", c="Year")

[18]:

To use colour to display a dimension, the cmap option can be used to specify the colour map to be used. In addition, the colour bar can be deactivated with colorbar=False.

`step()`¶

A step chart is very similar to a line chart, but instead of interpolating linearly between samples, the step chart visualises discrete steps. The position of the steps can be controlled with the where keyword and the values "pre", "mid" (default) and "post".

[19]:

crime.hvplot.step(x="Year", y=["Robbery", "Aggravated assault"])

[19]:

`hexbin()`¶

You can use the hexbin method to create hexagonal bin charts. They can be a useful alternative to scatter plots when the data is too dense to plot each point individually. Since our flight data is not evenly distributed on a linear scale, we use the logz option for a logarithmic colour scale.

[20]:

flights.hvplot.hexbin(
    x="airtime", y="arrdelay", width=600, height=500, logz=True
)

[20]:

`bivariate()`¶

You can use the bivariate method to create a 2D density diagram. In addition to hexbin diagrams, bivariate diagrams are another alternative to scatter diagrams if the data is too dense to plot each point individually.

[21]:

crime.hvplot.bivariate(
    x="Violent Crime rate", y="Burglary rate", width=600, height=500
)

[21]:

`heatmap()`¶

heatmap displays the relationship between three variables: in addition to the variables x and y, a third variable C is encoded as colour. The value of each cell is calculated from the samples that fall into it using the reduce_function.

[22]:

flights.compute().hvplot.heatmap(
    x="day", y="carrier", C="depdelay", reduce_function=np.mean, colorbar=True
)

[22]:
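What the reduction does can be illustrated without any plotting. The following pandas-only sketch is a conceptual equivalent, not hvPlot's own implementation, and uses hypothetical sample values:

```python
import pandas as pd

# Hypothetical samples: several delay measurements per (day, carrier) cell.
samples = pd.DataFrame(
    {
        "day": [1, 1, 1, 2, 2, 2],
        "carrier": ["AA", "AA", "OH", "AA", "OH", "OH"],
        "depdelay": [10.0, 20.0, 5.0, 0.0, 8.0, 12.0],
    }
)

# One value per cell: the mean of all samples that fall into it.
cells = samples.pivot_table(
    index="carrier", columns="day", values="depdelay", aggfunc="mean"
)
print(cells)
```

Each entry of `cells` corresponds to one coloured cell of the heatmap; a different reduce_function (for example a maximum) would change only how the samples within a cell are combined.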

`table()`¶

Unlike the other plot types, a table has no axes to configure: you can only specify whether all columns or just a subset, given with columns, should be displayed.

[23]:

crime.hvplot.table(
    columns=["Year", "Population", "Violent Crime rate"], width=400
)

[23]:

`hist()`¶

Plotting distributions differs slightly from the other plot types because, in the simple case, only one variable is represented. Therefore, no index or x-value needs to be specified for this plot type. Instead, you can:

declare a single y variable, for example source.plot.hist(variable) or
declare a y variable and a by variable, for example source.plot.hist(variable, by="Group") or
declare columns, or plot all columns, for example source.plot.hist(columns=["A", "B", "C"])

[24]:

crime.hvplot.hist(y="Violent Crime rate")

[24]:

Alternatively, we can also display the distribution of several columns:

[25]:

columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.hist(y=columns, bins=50, alpha=0.5, legend="top", height=400)

[25]:

We can also group the data by other variables and split the carriers into their own subplots:

[26]:

flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.hist(
    "depdelay",
    by="carrier",
    bins=20,
    bin_range=(-20, 100),
    width=300,
    subplots=True,
)

[26]:

`kde()`, `density()`¶

You can also create density plots with hvplot.kde() or hvplot.density():

[27]:

crime.hvplot.kde(y="Violent Crime rate")

[27]:

It is also possible to compare the distribution of several columns:

[28]:

columns = ["Violent Crime rate", "Property crime rate", "Burglary rate"]
crime.hvplot.kde(y=columns, alpha=0.5, value_label="Rate", legend="top_right")

[28]:

hvplot.kde also supports the by keyword:

[29]:

flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.kde(
    "depdelay", by="carrier", xlim=(-20, 70), width=300, subplots=True
)

[29]:

`box()`¶

Just like the other distribution-based diagram types, the box-whisker diagram supports the drawing of a single column:

[30]:

crime.hvplot.box(y="Violent Crime rate")

[30]:

It also supports multiple columns and options such as legend, invert and value_label:

[31]:

columns = [
    "Burglary rate",
    "Larceny-theft rate",
    "Motor vehicle theft rate",
    "Property crime rate",
    "Violent Crime rate",
]
crime.hvplot.box(
    y=columns,
    group_label="Crime",
    legend=False,
    value_label="Rate (per 100k)",
    invert=True,
)

[31]:

The use of the by keyword to split the data into several subsets is also supported:

[32]:

flight_subset = flights[flights.carrier.isin([b"AA", b"US", b"OH"])]
flight_subset.hvplot.box("depdelay", by="carrier", ylim=(-10, 70))

[32]:

Composite diagrams¶

One of the main strengths of HoloViews is how easily diagrams can be composed. Individual diagrams can be overlaid with the * operator or placed side by side with the + operator.

See also:

Composing Elements

[33]:

crime.hvplot(x="Year", y="Violent Crime rate") * crime.hvplot.scatter(
    x="Year", y="Violent Crime rate", c="k"
)

[33]:

We can also lay out diagrams and tables side by side:

[34]:

(
    crime.hvplot.bar(x="Year", y="Violent Crime rate", rot=90, width=550)
    + crime.hvplot.table(
        ["Year", "Population", "Violent Crime rate"], width=420
    )
)

[34]:

Big Data¶

In the previous examples, we summarised the relatively large airline dataset by creating subsets for display. However, we can also aggregate the data using Datashader instead, rendering the entire available raw dataset (if the resolution of the screen allows it).

[35]:

flights.hvplot.scatter(x="distance", y="airtime", datashade=True)

[35]:
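The idea behind this kind of aggregation can be sketched with NumPy alone. This is a simplified illustration and not how Datashader is implemented: the points are binned into a fixed grid, here an assumed 400 × 300 "pixel" canvas, and only the per-bin counts would then be coloured and displayed. The data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 200_000

# Synthetic, correlated point cloud standing in for a large data set.
x = rng.normal(size=n_points)
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

# Aggregate all points into a fixed-size grid; the size of the result
# depends on the grid, not on the number of points.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(400, 300))
print(counts.shape, int(counts.sum()))
```

However many points go in, only the 400 × 300 array has to be sent to the browser, which is why this approach scales to data sets that are too large to draw point by point.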

`groupby`¶

Thanks to HoloViews’ ability to explore a parameter space with a set of widgets, we can group the data along a specific column or dimension with groupby. For example, we can display the distribution of departure delays by carrier, grouped by day of the week, so that users can choose which day to display:

[36]:

flights.hvplot.violin(
    y="depdelay", by="carrier", groupby="dayofweek", ylim=(-20, 60), height=500
)

[36]:
