Data sources and transformations

Overview

Bokeh can work with Python lists, NumPy arrays, pandas Series and similar sequences. Bokeh converts these inputs into a ColumnDataSource. This usually happens transparently, but it is sometimes useful to create a ColumnDataSource explicitly.

[1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure


output_notebook()

Python dicts

The ColumnDataSource can be imported from bokeh.models:

[2]:
from bokeh.models import ColumnDataSource

A ColumnDataSource is a mapping of column names to sequences of values. All columns must have the same length:

[3]:
source = ColumnDataSource(
    data={
        "x": [1, 2, 3, 4, 5],
        "y": [3, 7, 8, 5, 1],
    }
)
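You can check the same-length rule before handing a dict to ColumnDataSource. The helper check_equal_lengths below is hypothetical and not part of Bokeh. It is a minimal sketch of the rule stated above and uses only plain Python.

```python
def check_equal_lengths(data):
    """Return the common column length of a dict of sequences.

    Hypothetical helper, not part of Bokeh: it raises ValueError when the
    columns differ in length, which a ColumnDataSource does not allow.
    """
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # An empty dict has no columns, so report a length of 0.
    return next(iter(lengths.values()), 0)


data = {
    "x": [1, 2, 3, 4, 5],
    "y": [3, 7, 8, 5, 1],
}
print(check_equal_lengths(data))  # 5
```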

So far, we have called functions like p.circle by passing lists or arrays of data directly, and Bokeh created a ColumnDataSource for us automatically. You can also create a ColumnDataSource explicitly and pass it to a glyph method as the source argument. The glyph properties then refer to its columns by name:

[4]:
p = figure(width=400, height=400)
p.circle("x", "y", size=20, source=source)
show(p)

pandas.DataFrame

You can also create ColumnDataSource objects directly from pandas DataFrames:

[5]:
from bokeh.sampledata.iris import flowers as df


source = ColumnDataSource(df)
p = figure(width=400, height=400)
p.circle("petal_length", "petal_width", source=source)
show(p)
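The following is a rough sketch of what ColumnDataSource(df) stores. It approximates Bokeh's own conversion code rather than reproducing it. Each DataFrame column becomes one column of the source, and the DataFrame index is added as well. An unnamed index becomes a column called index. The small DataFrame below is made-up sample data standing in for the iris flowers.

```python
import pandas as pd

# Made-up sample rows standing in for the iris flowers data.
df = pd.DataFrame(
    {"petal_length": [1.4, 4.7, 6.0], "petal_width": [0.2, 1.4, 2.5]}
)

# Approximation of the conversion: one entry per DataFrame column ...
data = {name: column.to_numpy() for name, column in df.items()}
# ... plus the (unnamed) index, stored under the name "index".
data["index"] = df.index.to_numpy()

print(sorted(data))  # ['index', 'petal_length', 'petal_width']
```

Because the index becomes an ordinary column, glyphs and tooltips can refer to it by name like any other column.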

Transformations

If a data source does not need to be shared between glyphs or plots, you can pass dicts, pandas DataFrames or pandas GroupBy objects directly to a glyph method without creating a ColumnDataSource yourself. The conversion then takes place automatically.

Glyph properties can be set to column names from a data source. They can also be set to transformation objects from bokeh.transform. Note that BokehJS computes these transformations in the browser, not in Python.

cumsum

The cumsum transformation generates a new sequence of values from a column by summing its values cumulatively. This is useful for pie and donut charts:

[6]:
from math import pi

import pandas as pd

from bokeh.palettes import Category20c
from bokeh.transform import cumsum


x = {
    "United States": 157,
    "United Kingdom": 93,
    "Japan": 89,
    "China": 63,
    "Germany": 44,
    "India": 42,
    "Italy": 40,
    "Australia": 35,
    "Brazil": 32,
    "France": 31,
    "Taiwan": 31,
    "Spain": 29,
}

data = (
    pd.Series(x).reset_index(name="value").rename(columns={"index": "country"})
)
data["color"] = Category20c[len(x)]

# represent each value as an angle = value / total * 2pi
data["angle"] = data["value"] / data["value"].sum() * 2 * pi

p = figure(
    height=350,
    title="Pie Chart",
    toolbar_location=None,
    tools="hover",
    tooltips="@country: @value",
)

p.wedge(
    x=0,
    y=1,
    radius=0.4,
    # use cumsum to cumulatively sum the values for start and end angles
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
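    # legend_field creates one legend entry per value of the "country" column;
    # legend_label would add a single entry with the literal text "country"
    legend_field="country",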
    source=data,
)

p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None

show(p)
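Because cumsum runs in the browser, its results never appear in Python. The plain-Python sketch below reproduces what the two cumsum calls compute, so you can check which angles each wedge receives. The function cumsum_sketch is hypothetical. It only mirrors the behaviour of include_zero: the result always has the same length as the input, so with include_zero=True the sequence starts at 0 and the final total is dropped. The four values are the first four entries of the dict above.

```python
from itertools import accumulate
from math import isclose, pi


def cumsum_sketch(values, include_zero=False):
    """Hypothetical stand-in for the browser-side cumsum transform."""
    totals = list(accumulate(values))  # running totals, same length as input
    if include_zero and totals:
        # Start at 0 and drop the last total so the length is unchanged.
        return [0] + totals[:-1]
    return totals


values = [157, 93, 89, 63]
total = sum(values)
angles = [v / total * 2 * pi for v in values]

start = cumsum_sketch(angles, include_zero=True)  # start_angle per wedge
end = cumsum_sketch(angles)                       # end_angle per wedge

# Each wedge starts where the previous one ends, and the last one closes the circle.
print(start[1:] == end[:-1])       # True
print(isclose(end[-1], 2 * pi))    # True
```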

linear_cmap

The linear_cmap transformation maps the values of a data source column linearly onto a palette, producing a new sequence of colours:

[7]:
import numpy as np

from bokeh.transform import linear_cmap


N = 4000
data = dict(
    x=np.random.random(size=N) * 100,
    y=np.random.random(size=N) * 100,
    r=np.random.random(size=N) * 1.5,
)

p = figure()

p.circle(
    "x",
    "y",
    radius="r",
    source=data,
    fill_alpha=0.6,
    # color map based on the x-coordinate
    color=linear_cmap("x", "Viridis256", 0, 100),
)

show(p)
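BokehJS also evaluates linear_cmap in the browser. The sketch below shows the underlying idea under a simplifying assumption. The range [low, high] is split into as many equal bins as the palette has colours, and values outside the range are clamped to the first or last colour. The function linear_palette_index and the four-colour palette are hypothetical stand-ins for BokehJS's internal mapping and for Viridis256. The exact bin edges BokehJS uses may differ slightly.

```python
def linear_palette_index(value, low, high, n_colors):
    """Hypothetical sketch: map value to a palette index 0..n_colors-1.

    [low, high] is split into n_colors equal bins; values outside the
    range are clamped to the first or last bin.
    """
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    return int((value - low) / (high - low) * n_colors)


# Hypothetical four-colour palette standing in for "Viridis256".
palette = ["#440154", "#31688e", "#35b779", "#fde725"]

xs = [3.0, 42.0, 77.0, 100.0]
colors = [palette[linear_palette_index(x, 0, 100, len(palette))] for x in xs]
print(colors)  # ['#440154', '#31688e', '#fde725', '#fde725']
```

With 256 colours instead of four, the bins become narrow enough that the colour appears to change continuously with the x-coordinate, as in the plot above.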