Data sources and transformations
Overview
Bokeh accepts Python lists, NumPy arrays, pandas Series and similar sequences. These inputs are converted into a Bokeh ColumnDataSource. Bokeh usually does this transparently, but it is occasionally useful to create a ColumnDataSource explicitly.
[1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()
Python dicts
The ColumnDataSource can be imported from bokeh.models:
[2]:
from bokeh.models import ColumnDataSource
A ColumnDataSource maps column names to sequences of values. All columns must have the same length:
[3]:
source = ColumnDataSource(
    data={
        "x": [1, 2, 3, 4, 5],
        "y": [3, 7, 8, 5, 1],
    }
)
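Because every column must have the same length, it can help to check a dict before handing it to ColumnDataSource. The following is a minimal sketch in plain Python; the helpers column_lengths and has_equal_lengths are hypothetical names introduced for illustration and are not part of Bokeh:

```python
def column_lengths(data):
    """Return a dict mapping each column name to the length of its sequence."""
    return {name: len(values) for name, values in data.items()}


def has_equal_lengths(data):
    """Return True if all columns hold the same number of values."""
    # An empty dict or a single column trivially passes the check.
    return len(set(column_lengths(data).values())) <= 1


# Same data as in the example above: both columns have five values.
good = {"x": [1, 2, 3, 4, 5], "y": [3, 7, 8, 5, 1]}
# A deliberately broken variant: "y" is one value short.
bad = {"x": [1, 2, 3], "y": [3, 7]}

print(has_equal_lengths(good))
print(has_equal_lengths(bad), column_lengths(bad))
```

Reporting the individual lengths makes it easier to see which column is the odd one out.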
So far, we have called glyph methods such as p.circle by passing lists or arrays directly, and Bokeh has automatically created a ColumnDataSource for us. However, it is also possible to pass a ColumnDataSource explicitly to a glyph method as the source argument. The coordinates are then given as column names:
[4]:
p = figure(width=400, height=400)
p.circle("x", "y", size=20, source=source)
show(p)
pandas.DataFrame
It is also easy to create ColumnDataSource objects directly from pandas DataFrames:
[5]:
from bokeh.sampledata.iris import flowers as df
source = ColumnDataSource(df)
p = figure(width=400, height=400)
p.circle("petal_length", "petal_width", source=source)
show(p)
Transformations
If a data source does not need to be shared, dicts, pandas.DataFrame or GroupBy objects can be passed directly to the glyph method without explicitly creating a ColumnDataSource. In this case, the conversion takes place automatically.
Glyph properties can be configured not only with the names of columns in a data source, but also with transformation objects from bokeh.transform. Note that these transformations are evaluated in the browser and not in Python.
cumsum
First, we look at the cumsum transformation, which generates a new sequence of values from a column by summing its values cumulatively. This can be useful for pie charts or doughnut charts:
[6]:
from math import pi
import pandas as pd
from bokeh.palettes import Category20c
from bokeh.transform import cumsum
x = {
    "United States": 157,
    "United Kingdom": 93,
    "Japan": 89,
    "China": 63,
    "Germany": 44,
    "India": 42,
    "Italy": 40,
    "Australia": 35,
    "Brazil": 32,
    "France": 31,
    "Taiwan": 31,
    "Spain": 29,
}
data = (
    pd.Series(x).reset_index(name="value").rename(columns={"index": "country"})
)
data["color"] = Category20c[len(x)]
# represent each value as an angle = value / total * 2pi
data["angle"] = data["value"] / data["value"].sum() * 2 * pi
p = figure(
    height=350,
    title="Pie Chart",
    toolbar_location=None,
    tools="hover",
    tooltips="@country: @value",
)
p.wedge(
    x=0,
    y=1,
    radius=0.4,
    # use cumsum to cumulatively sum the values for start and end angles
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    # legend_field creates one legend entry per value in the country column
    legend_field="country",
    source=data,
)
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
show(p)
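The arithmetic behind the two cumsum calls can be mirrored in plain Python. This is only an illustrative sketch: Bokeh evaluates the transformation in the browser, and the variable names below, as well as the choice to use only the first four values of the example, are assumptions made to keep it short. With include_zero=True the sequence starts at zero and drops the final sum, so each wedge starts where the previous one ends:

```python
from itertools import accumulate
from math import isclose, pi

# First four values from the example above, used here as a small sample.
values = [157, 93, 89, 63]
total = sum(values)

# Represent each value as an angle: value / total * 2 * pi.
angles = [v / total * 2 * pi for v in values]

# cumsum("angle"): running total, used as the end angle of each wedge.
end_angles = list(accumulate(angles))

# cumsum("angle", include_zero=True): same running total shifted by one
# position, starting at zero and keeping the same length as the input.
start_angles = [0.0] + end_angles[:-1]

for start, end in zip(start_angles, end_angles):
    print(f"{start:.3f} -> {end:.3f}")

# The last wedge closes the circle (up to floating-point rounding).
print(isclose(end_angles[-1], 2 * pi))
```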
linear_cmap
The linear_cmap transformation maps the values of a data source column linearly onto a palette, between a lower and an upper bound, and so generates a new sequence of colours:
[7]:
import numpy as np
from bokeh.transform import linear_cmap
N = 4000
data = dict(
    x=np.random.random(size=N) * 100,
    y=np.random.random(size=N) * 100,
    r=np.random.random(size=N) * 1.5,
)
p = figure()
p.circle(
    "x",
    "y",
    radius="r",
    source=data,
    fill_alpha=0.6,
    # color map based on the x-coordinate
    color=linear_cmap("x", "Viridis256", 0, 100),
)
show(p)
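To make the idea of a linear colour mapping concrete, the following plain-Python sketch divides the range between the lower and upper bound into equal bins, one per palette entry. It is an approximation for illustration only, not Bokeh's implementation: the function linear_color and the four-colour palette are hypothetical, and clamping out-of-range values to the first and last colour is an assumption of this sketch:

```python
def linear_color(value, palette, low, high):
    """Pick the palette entry whose equal-width bin in [low, high] contains value."""
    # Assumption of this sketch: out-of-range values are clamped to the ends.
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    fraction = (value - low) / (high - low)
    index = int(fraction * len(palette))
    # Guard against rounding pushing the index past the last entry.
    return palette[min(index, len(palette) - 1)]


# Hypothetical four-colour palette, ordered from dark to light.
palette = ["#440154", "#31688e", "#35b779", "#fde725"]

for x in (0, 24.9, 25, 50, 100):
    print(x, linear_color(x, palette, 0, 100))
```

With 256 colours instead of four, as in Viridis256, the bins become narrow enough that the result looks like a continuous gradient along the x-axis.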