Data sources and transformations¶
Overview¶
Bokeh can work with Python lists, NumPy arrays, pandas Series and similar sequences. These inputs are converted into a Bokeh ColumnDataSource. Although Bokeh usually does this transparently, it is sometimes useful to create one explicitly.
[1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()
Python dicts¶
The ColumnDataSource can be imported from bokeh.models:
[2]:
from bokeh.models import ColumnDataSource
A ColumnDataSource is a mapping of column names to sequences of values. All columns must have the same length:
[3]:
source = ColumnDataSource(
    data={
        "x": [1, 2, 3, 4, 5],
        "y": [3, 7, 8, 5, 1],
    }
)
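Because every column must have the same length, it can help to check a dict before handing it to ColumnDataSource. The following is a minimal sketch in plain Python; the helpers column_lengths and has_equal_lengths are hypothetical names for illustration and are not part of Bokeh:

```python
def column_lengths(data):
    """Return a dict mapping each column name to the length of its values."""
    return {name: len(values) for name, values in data.items()}


def has_equal_lengths(data):
    """Return True if all columns in the dict have the same length."""
    # A set of lengths with at most one element means no mismatch
    return len(set(column_lengths(data).values())) <= 1


good = {"x": [1, 2, 3, 4, 5], "y": [3, 7, 8, 5, 1]}
bad = {"x": [1, 2, 3], "y": [3, 7]}  # "y" is one value short

print(has_equal_lengths(good))
print(has_equal_lengths(bad), column_lengths(bad))
```

If the check fails, the output of column_lengths shows which column is too short or too long.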
So far, we have called functions like p.circle by passing lists or data arrays directly. Bokeh then automatically creates a ColumnDataSource for us. However, it is also possible to specify a ColumnDataSource explicitly by passing it to a glyph method as the source argument. The coordinates are then given as column names:
[4]:
p = figure(width=400, height=400)
p.circle("x", "y", size=20, source=source)
show(p)
pandas.DataFrame¶
It is also easy to create ColumnDataSource objects directly from pandas DataFrames:
[5]:
from bokeh.sampledata.iris import flowers as df
source = ColumnDataSource(df)
p = figure(width=400, height=400)
p.circle("petal_length", "petal_width", source=source)
show(p)
Transformations¶
If a data source does not need to be shared, dicts, pandas.DataFrame or GroupBy objects can be passed directly to the glyph method without explicitly creating a ColumnDataSource. In this case, the conversion takes place automatically.
Glyph properties can be configured not only with the names of columns from data sources, but also with transformation objects from bokeh.transform. Note that these transformations are carried out in the browser and not in Python.
cumsum¶
We first look at the cumsum transformation, which generates a new sequence of values from a column by summing its values cumulatively. This is useful for pie charts or doughnut charts:
[6]:
from math import pi
import pandas as pd
from bokeh.palettes import Category20c
from bokeh.transform import cumsum
x = {
    "United States": 157,
    "United Kingdom": 93,
    "Japan": 89,
    "China": 63,
    "Germany": 44,
    "India": 42,
    "Italy": 40,
    "Australia": 35,
    "Brazil": 32,
    "France": 31,
    "Taiwan": 31,
    "Spain": 29,
}
data = (
    pd.Series(x).reset_index(name="value").rename(columns={"index": "country"})
)
data["color"] = Category20c[len(x)]
# represent each value as an angle = value / total * 2pi
data["angle"] = data["value"] / data["value"].sum() * 2 * pi
p = figure(
    height=350,
    title="Pie Chart",
    toolbar_location=None,
    tools="hover",
    tooltips="@country: @value",
)
p.wedge(
    x=0,
    y=1,
    radius=0.4,
    # use cumsum to cumulatively sum the values for start and end angles
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    # legend_field creates one legend entry per value in the "country" column
    legend_field="country",
    source=data,
)
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
show(p)
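To make clear what the two cumsum calls produce, the start and end angles can be reproduced in plain Python. This is only a sketch of the arithmetic using the values from the example above; in the plot itself, Bokeh performs the summation in the browser:

```python
from itertools import accumulate
from math import isclose, pi

values = [157, 93, 89, 63, 44, 42, 40, 35, 32, 31, 31, 29]
total = sum(values)

# Same formula as the "angle" column: value / total * 2pi
angles = [v / total * 2 * pi for v in values]

# cumsum("angle"): running total, used as the end angle of each wedge
end_angles = list(accumulate(angles))

# cumsum("angle", include_zero=True): starts at zero and drops the last
# running total, so the result has the same length as the column
start_angles = [0.0] + end_angles[:-1]

for start, end in zip(start_angles[:3], end_angles[:3]):
    print(f"{start:.3f} -> {end:.3f}")
```

Each wedge starts where the previous one ends, and the last wedge ends at a full circle.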
linear_cmap¶
The linear_cmap transformation maps the values in a column of a data source linearly onto the colours of a palette and thus generates a new colour sequence:
[7]:
import numpy as np
from bokeh.transform import linear_cmap
N = 4000
data = dict(
    x=np.random.random(size=N) * 100,
    y=np.random.random(size=N) * 100,
    r=np.random.random(size=N) * 1.5,
)
p = figure()
p.circle(
    "x",
    "y",
    radius="r",
    source=data,
    fill_alpha=0.6,
    # color map based on the x-coordinate
    color=linear_cmap("x", "Viridis256", 0, 100),
)
show(p)
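The idea behind a linear colour mapping can be sketched in plain Python. The function linear_color_index and the four-colour palette below are hypothetical and serve only as an illustration; Bokeh's own mapping runs in the browser and may treat edge cases differently. The sketch assumes that values outside the range are clamped to the first or last colour:

```python
from math import floor


def linear_color_index(value, low, high, n_colors):
    """Map a value in [low, high] linearly to a palette index."""
    # Assumption: values outside the range are clamped to the end colours
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    fraction = (value - low) / (high - low)
    return min(floor(fraction * n_colors), n_colors - 1)


# Illustrative four-colour palette, much coarser than Viridis256
palette = ["#440154", "#31688e", "#35b779", "#fde725"]

xs = [0, 30, 60, 100]
colors = [palette[linear_color_index(x, 0, 100, len(palette))] for x in xs]
print(colors)
```

With only four colours, each colour covers a quarter of the range from 0 to 100; with Viridis256 the steps are fine enough to look like a continuous gradient.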