PdVega examples¶
Imports¶
[1]:
import numpy as np
import pandas as pd
import pdvega
Declarative description of the data visualisation¶
Vega-Lite can be used to describe declaratively how data should be mapped to visual encodings. With pdvega, such a specification is created in much the same way as a plot with the Matplotlib-based pandas API: data.plot simply needs to be replaced by data.vgplot, where data refers to a pandas Series or DataFrame object.
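To make the declarative idea concrete, the following sketch builds a small Vega-Lite line-chart specification by hand as a plain Python dictionary. It is only an illustration of the kind of description involved, not the specification pdvega actually emits; the tiny DataFrame and the column names variable and value are assumptions made for this example.

```python
import json

import pandas as pd

# Tiny stand-in for a wide time-series table (the values are made up).
df = pd.DataFrame(
    {"AAPL": [10.0, 12.0], "AMZN": [20.0, 18.0]},
    index=pd.to_datetime(["2010-01-01", "2010-02-01"]).rename("date"),
)

# Vega-Lite works on long-form records: one row per (date, column, value).
long_df = df.reset_index().melt(id_vars="date", var_name="variable", value_name="value")
long_df["date"] = long_df["date"].dt.strftime("%Y-%m-%d")

# The chart is described, not drawn: which mark to use and which field
# feeds which visual channel.
spec = {
    "data": {"values": long_df.to_dict(orient="records")},
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "value", "type": "quantitative"},
        "color": {"field": "variable", "type": "nominal"},
    },
}
print(json.dumps(spec["encoding"], indent=2))
```

A renderer that understands Vega-Lite can turn such a dictionary into a chart; the Python side only has to describe the mapping.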
Loading a DataFrame with time series of share prices:
[2]:
from vega_datasets import data
stocks = data.stocks(pivoted=True)
[3]:
stocks.plot.line()
[3]:
<Axes: xlabel='date'>

pdvega API¶
[5]:
stocks.vgplot.line()

The result is an attractive data visualisation with a minimum of boilerplate. In addition, the diagrams created with pdvega are interactive: they can be panned and zoomed.
Simple data visualisations with data.vgplot¶
The central interface of pdvega is the vgplot attribute, which is added to pandas DataFrame and Series objects.
As with the pandas plots, there are two ways to create diagrams. In the examples, iris is the Iris DataFrame that is loaded further below with data.iris().
The vgplot attribute of a pandas object can be called directly, for example:
iris.vgplot(kind="scatter", x="sepalLength", y="petalLength", c="species")
Alternatively, the specific method assigned to each chart type can be called:
iris.vgplot.scatter(x="sepalLength", y="petalLength", c="species")
The second approach has the advantage that the available plot types can be discovered via tab completion. The individual methods also document the arguments available for each chart type in more detail.
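How an attribute can support both calling styles is sketched below with pandas' accessor registration API. This is a minimal illustration under stated assumptions, not pdvega's implementation: demo_vgplot is a hypothetical accessor name, and its scatter method merely returns a Vega-Lite-style dictionary instead of rendering a chart.

```python
import pandas as pd


# "demo_vgplot" is a hypothetical accessor name chosen for this sketch;
# it is not part of pdvega or pandas.
@pd.api.extensions.register_dataframe_accessor("demo_vgplot")
class DemoPlotAccessor:
    def __init__(self, frame):
        self._frame = frame

    def __call__(self, kind="line", **kwargs):
        # Direct call: look up the public method named by ``kind``.
        if kind.startswith("_") or not hasattr(self, kind):
            raise ValueError(f"unknown plot kind: {kind!r}")
        return getattr(self, kind)(**kwargs)

    def scatter(self, x, y, c=None):
        # Describe the chart instead of drawing it.
        encoding = {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        }
        if c is not None:
            encoding["color"] = {"field": c, "type": "nominal"}
        return {"mark": "point", "encoding": encoding}


frame = pd.DataFrame({"u": [1, 2], "v": [3, 4], "g": ["a", "b"]})
via_kind = frame.demo_vgplot(kind="scatter", x="u", y="v", c="g")
via_method = frame.demo_vgplot.scatter(x="u", y="v", c="g")
print(via_kind == via_method)
```

Because each chart type is an ordinary method on the accessor, it shows up in tab completion and can carry its own docstring, which is the advantage described above.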
Diagram types¶
The vgplot API provides nine basic diagram types.
Line charts with vgplot.line¶
The default chart type for vgplot is a line chart.
Unless otherwise specified, the index of the DataFrame or Series is used as the x-axis variable, and a separate line is drawn for the y-values of each column of the DataFrame. If you only want to plot a subset of the columns, you can use pandas indexing to select the columns you are interested in:
[6]:
stocks[['AAPL', 'AMZN']].vgplot.line()

Line diagrams can be further customised; see the pdvega documentation for details.
Scatter plots with vgplot.scatter¶
[7]:
stocks.vgplot.scatter(x="AAPL", y="AMZN")

To further customise scatter plots, check out pdvega.FramePlotMethods.scatter().
Area plots with vgplot.area¶
[8]:
stocks[["MSFT", "AAPL", "AMZN"]].vgplot.area()

Area diagrams can also be drawn unstacked, so that the areas overlap. In this case, transparency is often helpful.
[9]:
stocks[["MSFT", "AAPL", "AMZN"]].vgplot.area(stacked=False, alpha=0.4)
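The difference between stacked and unstacked areas comes down to the baseline of each series. The sketch below computes both variants for a small made-up table; it illustrates the geometry only and makes no claim about how pdvega or Vega-Lite computes it internally.

```python
import pandas as pd

values = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [2.0, 2.0, 1.0], "c": [0.5, 1.0, 1.5]}
)

# Unstacked: every area spans from 0 up to its own value, so areas overlap
# and transparency is needed to keep all of them visible.
unstacked_top = values

# Stacked: each area sits on the running total of the columns before it.
stacked_top = values.cumsum(axis=1)
stacked_bottom = stacked_top - values
print(stacked_top)
```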

Area diagrams can be further customised; see the pdvega documentation for details.
Bar charts with vgplot.bar¶
[10]:
np.random.seed(1234)
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])
df.vgplot.bar()

As with area diagrams, you can stack the bars with stacked=True:
[11]:
df.vgplot.bar(stacked=True)

In addition, horizontal bar charts can be created with barh:
[12]:
df.vgplot.barh(stacked=True)

Histograms with vgplot.hist¶
[13]:
df = pd.DataFrame(
{
"a": np.random.randn(1000) + 1,
"b": np.random.randn(1000),
"c": np.random.randn(1000) - 1,
},
columns=["a", "b", "c"],
)
df.vgplot.hist(bins=50, alpha=0.5)
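For overlaid histograms to be comparable, all columns need to be counted against the same bin edges. The following NumPy sketch shows that binning step on synthetic data. Whether pdvega bins in exactly this way is an assumption; the sketch only illustrates the idea behind bins=50.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
samples = pd.DataFrame(
    {"a": rng.normal(1, 1, 1000), "b": rng.normal(0, 1, 1000)}
)

# Shared edges over the full range of all columns, so layers line up.
edges = np.histogram_bin_edges(samples.to_numpy().ravel(), bins=50)

# Count each column separately against the shared edges.
counts = {col: np.histogram(samples[col], bins=edges)[0] for col in samples}
print({col: int(c.max()) for col, c in counts.items()})
```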

Histograms can be further customised; see the pdvega documentation for details.
Kernel density estimation diagrams with vgplot.kde¶
Similar to histograms, kernel density estimation (KDE) diagrams generate smooth curves that indicate the density of the measurement points.
[14]:
df.vgplot.kde()
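What a KDE curve represents can be sketched in a few lines of NumPy: one Gaussian bump per data point, averaged over all points. The function name gaussian_kde_1d and the choice of Scott's rule for the bandwidth are assumptions made for this illustration; pdvega may use a different estimator or bandwidth.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of ``samples`` on ``grid``."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data.
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
x = rng.normal(size=500)
grid = np.linspace(-6, 6, 241)
density = gaussian_kde_1d(x, grid)

# Trapezoidal rule: a density should integrate to roughly 1.
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(round(area, 3))
```

A smaller bandwidth follows the data more closely but gives a bumpier curve; a larger one smooths more aggressively.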

KDE diagrams can be further customised; see the pdvega documentation for details.
Heatmaps with vgplot.heatmap¶
pandas.plotting has a function for creating a hexagonally binned heatmap of two-dimensional data. Unfortunately, neither Vega nor Vega-Lite currently supports hexagonal heatmaps. However, both support Cartesian heatmaps, and this functionality is also included in pdvega:
[15]:
df.vgplot.heatmap(x="a", y="b", C="c", gridsize=20)
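A Cartesian heatmap of this kind amounts to cutting both axes into equal-width intervals and aggregating a third column per cell. The pandas sketch below shows one way to compute such a grid on synthetic data. Taking the mean of c per cell is an assumption made for this example and is not confirmed as pdvega's default aggregation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pts = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])

gridsize = 20
# Cut each axis into ``gridsize`` equal-width intervals (integer bin codes).
x_bin = pd.cut(pts["a"], bins=gridsize, labels=False)
y_bin = pd.cut(pts["b"], bins=gridsize, labels=False)

# One cell per (x_bin, y_bin) pair; the colour value is the mean of "c".
# Cells without any points stay NaN after unstacking.
cells = pts.groupby([x_bin, y_bin])["c"].mean().unstack()
print(cells.shape)
```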

Heatmap diagrams can be further customised, see pdvega.FramePlotMethods.heatmap().
Statistical visualisation with pdvega.plotting¶
pdvega also supports many of the more complex plotting routines available in the pandas.plotting submodule. Below is an example of a multi-panel scatterplot matrix of Fisher’s Iris dataset:
[16]:
iris = data.iris()
pdvega.scatter_matrix(iris, "species", figsize=(7, 7))

You can interactively pan and zoom in this diagram. You can also select individual data points by holding down the Shift key.
Parallel coordinates¶
Another way to visualise multidimensional data is to view each dimension independently using a diagram with parallel coordinates. This can be done with pdvega.parallel_coordinates(), whose API corresponds to that of pandas.plotting.parallel_coordinates():
[17]:
pdvega.parallel_coordinates(iris, "species")

At a glance, you can recognise relationships between points and, in particular, that the ‘setosa’ species differs significantly from the other two species in the width and length of the petals.
Andrews curves¶
A similar approach to visualising data dimensions is known as Andrews curves: the idea is to construct a Fourier series from the features of each object in order to visualise the aggregated differences between classes qualitatively. This can be done with the function pdvega.andrews_curves(), whose API corresponds to that of pandas.plotting.andrews_curves():
[17]:
pdvega.andrews_curves(iris, "species")

This gives a similar impression to the parallel coordinates, but provides less quantitative information on the characteristics that lead to this distinction.
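The Fourier-series construction behind Andrews curves can be written out explicitly. For one observation with features x1, x2, x3, ..., the standard definition is f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... The sketch below evaluates this formula with NumPy; the function name andrews_curve and the sample row are illustrative, and the sampling of t that pdvega uses is not shown here.

```python
import numpy as np


def andrews_curve(features, t):
    """Evaluate x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ... at ``t``."""
    features = np.asarray(features, dtype=float)
    t = np.asarray(t, dtype=float)
    curve = np.full_like(t, features[0] / np.sqrt(2))
    for i, coeff in enumerate(features[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        basis = np.sin if i % 2 == 1 else np.cos
        curve += coeff * basis(harmonic * t)
    return curve


t = np.linspace(-np.pi, np.pi, 200)
row = [5.1, 3.5, 1.4, 0.2]  # one illustrative four-feature observation
curve = andrews_curve(row, t)
print(curve[:3])
```

Each observation becomes one curve, so observations with similar features produce curves that stay close together over the whole range of t.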
Correlogram (lag plots)¶
Lag plots are implemented with pdvega.plotting.lag_plot(), whose API corresponds to that of pandas.plotting.lag_plot(). In the following, the share prices of Amazon and Microsoft from 2000 to 2010 are visualised with a lag of 12 months:
[18]:
pdvega.lag_plot(stocks[["AMZN", "MSFT"]], lag=12)

It is immediately apparent from this plot that Amazon was far more volatile during this period: the values at each point in time showed only a very low correlation with the value one year later. Conversely, Microsoft’s value was much more stable during this decade.
We can also see this interpretation in the simple time series chart of each company’s share price:
[19]:
stocks[["AMZN", "MSFT"]].vgplot.line()
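The reading of a lag plot can also be checked numerically. A lag plot pairs every value y(t) with the value y(t + lag), and the Pearson correlation of those pairs is the lag-N autocorrelation, which pandas exposes as Series.autocorr. The sketch below uses two synthetic series rather than the real share prices, so the numbers say nothing about Amazon or Microsoft; lag_pairs is a helper name introduced for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, lag = 240, 12

# Synthetic monthly series (not real prices): a slow trend and pure noise.
smooth = pd.Series(np.linspace(0, 10, n) + rng.normal(scale=0.2, size=n))
noisy = pd.Series(rng.normal(size=n))


def lag_pairs(series, lag):
    """Return the (y(t), y(t + lag)) pairs that a lag plot draws."""
    frame = pd.DataFrame({"y(t)": series, "y(t+lag)": series.shift(-lag)})
    return frame.dropna()


pairs = lag_pairs(smooth, lag)

# Pearson correlation between each series and its shifted self.
smooth_r = smooth.autocorr(lag=lag)
noisy_r = noisy.autocorr(lag=lag)
print(len(pairs), round(smooth_r, 2), round(noisy_r, 2))
```

Points that lie close to a straight line in the lag plot correspond to an autocorrelation near 1, while a diffuse cloud corresponds to a value near 0.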
