Day 16 — What's inside a Jupyter notebook?

Today I was reading up on some Jupyter things and wondered how images, widgets, and maps are stored within a notebook. I love the literate programming model, where your docs, code, and outputs are all viewable in just one file!

A Jupyter notebook is basically a JSON document, which becomes a Python dictionary once you load it. It has some metadata about the kernel used by the notebook, along with a cells list that contains all the docs, code, and outputs.


  {
      "cells": [],
      "metadata": {
          "kernelspec": {
              "display_name": "Python 3",
              "language": "python",
              "name": "python3"
          },
          "language_info": {
              "codemirror_mode": {
                  "name": "ipython",
                  "version": 3
              },
              "file_extension": ".py",
              "mimetype": "text/x-python",
              "name": "python",
              "nbconvert_exporter": "python",
              "pygments_lexer": "ipython3",
              "version": "3.6.9"
          }
      },
      "nbformat": 4,
      "nbformat_minor": 2
  }
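To see the "it's just a dictionary" part for yourself, here is a minimal sketch using only the standard library. The `raw` string is a trimmed stand-in I made up for the text of an .ipynb file; with a real notebook you would read the file contents instead.

```python
import json

# Stand-in for the text of a .ipynb file (trimmed, illustrative values).
raw = """
{
    "cells": [],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}
"""

# Parsing the JSON text gives back a plain Python dict.
notebook = json.loads(raw)

print(type(notebook).__name__)
print(sorted(notebook))
print(notebook["metadata"]["kernelspec"]["name"])
```

Nothing here is Jupyter-specific: any JSON parser can open a notebook.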

Let's look at some notebooks! I've removed some fields to make code blocks smaller.

In [1]:
import os
In [2]:
os.getcwd()
Out[2]:
'/home/vinayak/dev/playground'

Under the hood:


  "cells": [
      {
          "cell_type": "code",
          "execution_count": 1,
          "outputs": [],
          "source": [
              "import os"
          ]
      },
      {
          "cell_type": "code",
          "execution_count": 2,
          "outputs": [
              {
                  "data": {
                      "text/plain": [
                          "'/home/vinayak/dev/playground'"
                      ]
                  },
                  "output_type": "execute_result"
              }
          ],
          "source": [
              "os.getcwd()"
          ]
      }
  ]

Each cell in the cells list is a dict with a cell_type (code, markdown, or raw) and a source. Code cells also carry an execution_count and outputs. The source list can hold multiple strings, since the code you write in a Jupyter notebook cell can span multiple lines. The most interesting field is outputs, which is another list of dicts. Each output has an output_type (execute_result here) and a data dict keyed by MIME type. The MIME type decides how an output is rendered when you open the notebook on a vanilla Jupyter server (using jupyter notebook) or on JupyterLab. In the above notebook, the MIME type is text/plain since the output is just plain text.
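Walking that structure takes only a few lines. This is a rough sketch, not an official API: `summarize_cells` is a helper name I made up, and the `notebook` dict is the trimmed example from above typed in by hand.

```python
def summarize_cells(notebook):
    """Return (cell_type, source, outputs) for every cell in the notebook."""
    summary = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # source may be a list of lines or a single string.
        code = "".join(source) if isinstance(source, list) else source
        # Not every output carries a data dict, so fall back to an empty one.
        outputs = [
            (out["output_type"], sorted(out.get("data", {})))
            for out in cell.get("outputs", [])
        ]
        summary.append((cell["cell_type"], code, outputs))
    return summary


notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "outputs": [],
            "source": ["import os"],
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "outputs": [
                {
                    "data": {"text/plain": ["'/home/vinayak/dev/playground'"]},
                    "output_type": "execute_result",
                }
            ],
            "source": ["os.getcwd()"],
        },
    ]
}

summary = summarize_cells(notebook)
for cell_type, code, outputs in summary:
    print(cell_type, repr(code), outputs)
```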

But what about images, widgets, and maps?

In [1]:
import camelot
In [2]:
tables = camelot.read_pdf("foo.pdf")
In [3]:
camelot.plot(tables[0], kind="contour").show()

Under the hood:


  "cells": [
      ...
      {
          "cell_type": "code",
          "execution_count": 3,
          "outputs": [
              {
                  "data": {
                      "image/png": "A base64 encoded string"
                  },
                  "output_type": "display_data"
              }
          ],
          "source": [
              "camelot.plot(tables[0], kind=\"contour\").show()"
          ]
      }
  ]

Images are stored as a base64 encoded string! The output_type is display_data, and the data is keyed by the MIME type image/png.
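Getting the picture back out is a base64 decode. Here is a small sketch under some assumptions: `png_bytes` is a name I made up, and `fake_output` is a stand-in that encodes only the eight-byte PNG signature rather than a real image, so the example stays short.

```python
import base64

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bytes(output):
    """Decode the image/png payload of a single output dict into raw bytes."""
    payload = output["data"]["image/png"]
    # The payload may be one string or a list of strings.
    if isinstance(payload, list):
        payload = "".join(payload)
    return base64.b64decode(payload)


# Stand-in output: the payload is just the PNG signature, not a real image.
fake_output = {
    "output_type": "display_data",
    "data": {"image/png": base64.b64encode(PNG_SIGNATURE).decode("ascii")},
}

image = png_bytes(fake_output)
print(image.startswith(PNG_SIGNATURE))
```

With a real notebook, writing the returned bytes to a file opened in binary mode should give you a PNG you can view.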

In [1]:
from ipywidgets import interact
In [2]:
def f(x):
    return x
In [3]:
interact(f, x=10);

Under the hood:


  "cells": [
      ...
      {
          "cell_type": "code",
          "execution_count": 3,
          "outputs": [
              {
                  "data": {
                      "application/vnd.jupyter.widget-view+json": {
                          "model_id": "62b096eac6254feeb9624bdd53c1e54d",
                          "version_major": 2,
                          "version_minor": 0
                      }
                  },
                  "output_type": "display_data"
              }
          ],
          "source": [
              "interact(f, x=10);"
          ]
      }
  ]

Widgets are stored under a custom MIME type, application/vnd.jupyter.widget-view+json, and only work when your notebook has a live kernel attached to it. The kernel holds the Python object the widget is built on, and the model_id identifies it. The notebook uses this model_id to talk to the kernel every time the widget is changed. Since widgets are stateful, you see the following message when you restart your kernel:

A Jupyter widget could not be displayed because the widget state could not be found. This could happen if the kernel storing the widget is no longer available, or if the widget state was not saved in the notebook. You may be able to create the widget by running the appropriate cells.
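The "widget state was not saved in the notebook" part of that message can be checked by hand. This sketch assumes saved widget state lives in the notebook metadata under widgets, keyed by application/vnd.jupyter.widget-state+json with a state dict inside. That layout is my understanding of the ipywidgets convention, so treat it as an assumption. `widgets_missing_state` is a made-up helper name.

```python
WIDGET_VIEW = "application/vnd.jupyter.widget-view+json"
# Assumed location of saved widget state inside the notebook metadata.
WIDGET_STATE = "application/vnd.jupyter.widget-state+json"


def widgets_missing_state(notebook):
    """Return model_ids that outputs reference but the metadata has no state for."""
    saved = (
        notebook.get("metadata", {})
        .get("widgets", {})
        .get(WIDGET_STATE, {})
        .get("state", {})
    )
    missing = []
    for cell in notebook["cells"]:
        for out in cell.get("outputs", []):
            view = out.get("data", {}).get(WIDGET_VIEW)
            if view is not None and view["model_id"] not in saved:
                missing.append(view["model_id"])
    return missing


notebook = {
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "outputs": [
                {
                    "data": {
                        WIDGET_VIEW: {
                            "model_id": "62b096eac6254feeb9624bdd53c1e54d",
                            "version_major": 2,
                            "version_minor": 0,
                        }
                    },
                    "output_type": "display_data",
                }
            ],
            "source": ["interact(f, x=10);"],
        }
    ],
}

missing = widgets_missing_state(notebook)
print(missing)
```

In this trimmed notebook the metadata holds no widget state, so the one model_id is reported as missing.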

In [1]:
from random import randint

import folium
import geopatra
import geopandas
In [2]:
cities = geopandas.read_file(
    geopandas.datasets.get_path("naturalearth_cities")
)
cities["value"] = [randint(1, 10) for _ in range(len(cities))]
In [3]:
cities.folium.plot()
Out[3]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Under the hood:


  "cells": [
      ...
      {
          "cell_type": "code",
          "execution_count": 3,
          "outputs": [
              {
                  "data": {
                      "text/html": ["All the html needed to render that map!"]
                  },
                  "output_type": "execute_result"
              }
          ],
          "source": [
              "cities.folium.plot()"
          ]
      }
  ]

Maps are displayed by storing all the HTML needed to render them! The output_type is execute_result, and the MIME type is text/html.
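Since the map is just HTML sitting in the notebook, you can pull it out like any other output. A minimal sketch: `html_outputs` is a made-up helper, and the HTML in the example notebook is a tiny placeholder rather than what folium actually generates.

```python
def html_outputs(notebook):
    """Collect every text/html payload in the notebook as a single string each."""
    pages = []
    for cell in notebook["cells"]:
        for out in cell.get("outputs", []):
            html = out.get("data", {}).get("text/html")
            if html is not None:
                # The payload may be a list of lines or one string.
                pages.append("".join(html) if isinstance(html, list) else html)
    return pages


notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "outputs": [
                {
                    # Placeholder markup, not real folium output.
                    "data": {"text/html": ["<div>\n", "map goes here</div>"]},
                    "output_type": "execute_result",
                }
            ],
            "source": ["cities.folium.plot()"],
        }
    ]
}

pages = html_outputs(notebook)
print(pages)
```

Saving one of those strings to an .html file and opening it in a browser should show the map without Jupyter, as long as the HTML is self-contained.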


Also, I recently learned that Jupyter comes from Ju(lia) + Pyt(hon) + R, the three languages that were initially supported!