jupyter-repo2docker¶
jupyter-repo2docker
is a tool to build, run, and push Docker
images from source code repositories.
repo2docker
fetches a repository
(from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory)
and builds a container image in which the code can be executed.
The image build process is based on the configuration files found in the repository.
repo2docker
can be
used to explore a repository locally by building and executing the
constructed image of the repository, or as a means of building images that
are pushed to a Docker registry.
repo2docker
is the tool used by BinderHub
to build images on demand.
Please report Bugs, ask questions or contribute to the project.
Getting Started¶
Instructions and information on how to get started with repo2docker on your own machine. Select from the pages listed below to begin.
Installing repo2docker
¶
repo2docker requires Python 3.6 or above on Linux and macOS. See below for more information about Windows support.
Prerequisite: Docker¶
Install Docker as it is required to build Docker images. The Community Edition, is available for free.
Recent versions of Docker are recommended.
The latest version of Docker, 18.03
, successfully builds repositories from
binder-examples.
The BinderHub helm chart uses version
17.11.0-ce-dind
. See the
helm chart
for more details.
Optional: Mercurial¶
For Mercurial repositories, Mercurial and hg-evolve need to be installed. For example, on Debian based distributions, one can do:
sudo apt install mercurial
$(hg debuginstall --template "{pythonexe}") -m pip install hg-evolve --user
To install Mercurial on other systems, see here.
Note that for old Mercurial versions, you may need to specify a version for
hg-evolve. For example, hg-evolve==9.2
for hg 4.5 (which is installed with
apt on Ubuntu 18.4).
Installing with pip
¶
We recommend installing repo2docker
with the pip
tool:
python3 -m pip install jupyter-repo2docker
for the latest release. To install the most recent code from the upstream repository, run:
python3 -m pip install https://github.com/jupyterhub/repo2docker/archive/main.zip
For information on using repo2docker
, see Using repo2docker.
Installing from source code¶
Alternatively, you can install repo2docker from a local source tree, e.g. in case you are contributing back to this project:
git clone https://github.com/jupyterhub/repo2docker.git
cd repo2docker
python3 -m pip install -e .
That’s it! For information on using repo2docker
, see
Using repo2docker.
Windows support¶
Windows support for repo2docker
is still in the experimental stage.
An article about using Windows and the WSL (Windows Subsytem for Linux or Bash on Windows) provides additional information about Windows and docker.
Using repo2docker
¶
Note
Docker must be running in
order to run repo2docker
. For more information on installing
repo2docker
, see Installing repo2docker.
repo2docker
can build a reproducible computational environment for any repository that
follows The Reproducible Execution Environment Specification. repo2docker is called with the URL of a Git repository,
a DOI from Zenodo or Figshare,
a Handle or DOI from a Dataverse installation,
a SWHID of a directory of a revision archived in the
Software Heritage Archive,
or a path to a local directory.
It then performs these steps:
Inspects the repository for configuration files. These will be used to build the environment needed to run the repository.
Builds a Docker image with an environment specified in these configuration files.
Launches the image to let you explore the repository interactively via Jupyter notebooks, RStudio, or many other interfaces (optional)
Pushes the images to a Docker registry so that it may be accessed remotely (optional)
Calling repo2docker¶
repo2docker is called with this command:
jupyter-repo2docker <source-repository>
where <source-repository>
is:
a URL of a Git repository (
https://github.com/binder-examples/requirements
),a Zenodo DOI (
10.5281/zenodo.1211089
),a SWHID (
swh:1:rev:999dd06c7f679a2714dfe5199bdca09522a29649
), ora path to a local directory (
a/local/directory
)
of the source repository you want to build.
For example, the following command will build an image of Peter Norvig’s Pytudes repository:
jupyter-repo2docker https://github.com/norvig/pytudes
Building the image may take a few minutes.
Pytudes
uses a requirements.txt file
to specify its Python environment. Because of this, repo2docker
will use
pip
to install dependencies listed in this requirement.txt
file, and
these will be present in the generated Docker image. To learn more about
configuration files in repo2docker
visit Configuration Files.
When the image is built, a message will be output to your terminal:
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0
Pasting the URL into your browser will open Jupyter Notebook with the dependencies and contents of the source repository in the built image.
Building a specific branch, commit or tag¶
To build a particular branch and commit, use the argument --ref
and
specify the branch-name
or commit-hash
. For example:
jupyter-repo2docker --ref 9ced85dd9a84859d0767369e58f33912a214a3cf https://github.com/norvig/pytudes
Tip
For reproducible builds, we recommend specifying a commit-hash to deterministically build a fixed version of a repository. Not specifying a commit-hash will result in the latest commit of the repository being built.
Where to put configuration files¶
repo2docker
will look for configuration files in:
A folder named
binder/
in the root of the repository.A folder named
.binder/
in the root of the repository.The root directory of the repository.
Having both binder/
and .binder/
folders is not allowed.
If one of these folders exists, only configuration files in that folder are considered, configuration in the root directory will be ignored.
Check the complete list of configuration files supported
by repo2docker
to see how to configure the build process.
Note
repo2docker
builds an environment with Python 3.7 by default. If you’d
like a different version, you can specify this in your
configuration files.
Debugging repo2docker with --debug
and --no-build
¶
To debug the docker image being built, pass the --debug
parameter:
jupyter-repo2docker --debug https://github.com/norvig/pytudes
This will print the generated Dockerfile
, build it, and run it.
To see the generated Dockerfile
without actually building it,
pass --no-build
to the commandline. This Dockerfile
output
is for debugging purposes of repo2docker
only - it can not
be used by docker directly.
jupyter-repo2docker --no-build --debug https://github.com/norvig/pytudes
Command line API¶
jupyter-repo2docker¶
Fetch a repository and build a container image
usage: jupyter-repo2docker [-h] [--help-all] [--version] [--config CONFIG]
[--json-logs] [--image-name IMAGE_NAME] [--ref REF]
[--debug] [--no-build]
[--build-memory-limit BUILD_MEMORY_LIMIT]
[--no-run] [--publish PORTS] [--publish-all]
[--no-clean] [--push] [--volume VOLUMES]
[--user-id USER_ID] [--user-name USER_NAME]
[--env ENVIRONMENT] [--editable]
[--target-repo-dir TARGET_REPO_DIR]
[--appendix APPENDIX] [--label LABELS]
[--build-arg BUILD_ARGS] [--subdir SUBDIR]
[--cache-from CACHE_FROM] [--engine ENGINE]
repo ...
- repo¶
Path to repository that should be built. Could be local path or a git URL.
- cmd¶
Custom command to run after building container
- -h, --help¶
show this help message and exit
- --help-all¶
Display all configurable options and exit.
- --version¶
Print the repo2docker version and exit.
- --config <config>¶
Path to config file for repo2docker
- --json-logs¶
Emit JSON logs instead of human readable logs
- --image-name <image_name>¶
Name of image to be built. If unspecified will be autogenerated
- --ref <ref>¶
Reference to build instead of default reference. For example branch name or commit for a Git repository.
- --debug¶
Turn on debug logging
- --no-build¶
Do not actually build the image. Useful in conjunction with –debug.
- --build-memory-limit <build_memory_limit>¶
Total Memory that can be used by the docker build process
- --no-run¶
Do not run container after it has been built
- --publish <ports>, -p <ports>¶
Specify port mappings for the image. Needs a command to run in the container.
- --publish-all, -P¶
Publish all exposed ports to random host ports.
- --no-clean¶
Don’t clean up remote checkouts after we are done
- --push¶
Push docker image to repository
- --volume <volumes>, -v <volumes>¶
Volumes to mount inside the container, in form src:dest
- --user-id <user_id>¶
User ID of the primary user in the image
- --user-name <user_name>¶
Username of the primary user in the image
- --env, -e¶
Environment variables to define at container run time
- --editable, -E¶
Use the local repository in edit mode
- --target-repo-dir <target_repo_dir>¶
Path inside the image where contents of the repositories are copied to, and where all the build operations (such as postBuild) happen. Defaults to ${HOME} if not set
- --appendix <appendix>¶
Appendix of Dockerfile commands to run at the end of the build. Can be used to customize the resulting image after all standard build steps finish.
- --label <labels>¶
Extra label to set on the image, in form name=value
- --build-arg <build_args>¶
Extra build arg to pass to the build process, in form name=value
- --subdir <subdir>¶
Subdirectory of the git repository to examine. Defaults to ‘’.
- --cache-from <cache_from>¶
List of images to try & re-use cached image layers from. Docker only tries to re-use image layers from images built locally, not pulled from a registry. We can ask it to explicitly re-use layers from non-locally built images by through the ‘cache_from’ parameter.
- --engine <engine>¶
Name of the container engine
Frequently Asked Questions (FAQ)¶
A collection of frequently asked questions with answers. If you have a question and have found an answer, send a PR to add it here!
How should I specify another version of Python?¶
One can specify a Python version in the environment.yml
file of a repository
or runtime.txt
file if using requirements.txt
instead of environment.yml
.
What versions of Python (or R or Julia…) are supported?¶
Python¶
Repo2docker officially supports the following versions of Python (specified in your environment.yml or runtime.txt file):
3.7 (added in 0.7, default in 0.8)
3.6 (default in 0.7 and earlier)
3.5
2.7
Additional versions may work, as long as the base environment can be installed for your version of Python. The most likely source of incompatibility is if one of the packages in the base environment is not packaged for your Python, either because the version of the package is too new and your chosen Python is too old, or vice versa.
I Python 2.7 is specified, a separate environment for the kernel will be installed with Python 2. The notebook server will run in the default Python 3.7 environment.
Julia¶
All Julia versions since Julia 0.7.0 are supported via a Project.toml file, and this is the recommended way to install Julia environments. Julia versions 0.6.x and earlier are supported via a REQUIRE file.
R¶
The default version of R is currently R 3.6.1. You can select the version of R you want to use by specifying it in the runtime.txt file.
We support R versions 3.4, 3.5 and 3.6.
Why is my repository failing to build with ResolvePackageNotFound
?¶
If you used conda env export
to generate your environment.yml
it will
generate a list of packages and versions of packages that is pinned to platform
specific versions. These very specific versions are not available in the linux
docker image used by repo2docker
. A typical error message will look like
the following:
Step 39/44 : RUN conda env update -n root -f "environment.yml" && conda clean -tipsy && conda list -n root
---> Running in ebe9a67762e4
Solving environment: ...working... failed
ResolvePackageNotFound:
- jsonschema==2.6.0=py36hb385e00_0
- libedit==3.1.20181209=hb402a30_0
- tornado==5.1.1=py36h1de35cc_0
...
We recommend to use conda env export --no-builds -f environment.yml
to export
your environment and then edit the file by hand to remove platform specific
packages like appnope
.
See How to automatically create a environment.yml that works with repo2docker for a recipe on how to create strict exports of
your environment that will work with repo2docker
.
Can I add executable files to the user’s PATH?¶
Yes! With a postBuild - Run code after installing the environment file, you can place any files that should be called
from the command line in the folder ~/.local/
. This folder will be
available in a user’s PATH, and can be run from the command line (or as
a subsequent build step.)
How do I set environment variables?¶
To configure environment variables for all users of a repository use the start configuration file.
When running repo2docker locally you can use the -e
or --env
command-line
flag for each variable that you want to define.
For example jupyter-repo2docker -e VAR1=val1 -e VAR2=val2 ...
Can I use repo2docker to bootstrap my own Dockerfile?¶
No, you can’t.
If you pass the --debug
flag to repo2docker
, it outputs the
intermediate Dockerfile that is used to build the docker image. While
it is tempting to copy this as a base for your own Dockerfile, that is
not supported & in most cases will not work. The --debug
output is
just our intermediate generated Dockerfile, and is meant to be built
in a very specific way. Hence the output of --debug
can not be
built with a normal docker build -t .
or similar traditional
docker command.
Check out the binder-examples GitHub organization for example repositories you can copy & modify for your own use!
Can I use repo2docker to edit a local host repository within a Docker environment?¶
Yes: use the --editable
or -E
flag (don’t confuse this with
the -e
flag for environment variables), and run repo2docker on a
local repository:
repo2docker -E my-repository/
This builds a Docker container from the files in that repository
(using, for example, a requirements.txt
or install.R
file),
then runs that container, while connecting the working directory
inside the container to the local repository outside the
container. For example, in case there is a notebook file (.ipynb
),
this will open in a local web browser, and one can edit it and save
it. The resulting notebook is updated in both the Docker container and
the local repository. Once the container is exited, the changed file
will still be in the local repository.
This allows for easy testing of the container while debugging some items, as well as using a fully customizable container to edit notebooks (among others).
Note
Editable mode is a convenience option that will bind the
repository to the container working directory (usually
$HOME
). If you need to mount to a different location in
the container, use the --volumes
option instead. Similarly,
for a fully customized user Dockerfile, this option is not
guaranteed to work.
Why is my R shiny app not launching?¶
If you are trying to run an R shiny app using the /shiny/folder_containing_shiny
url option, but the launch returns “The application exited during initialization.”,
there might be something wrong with the specification of the app. One way of debugging
the app in the container is by running the rstudio
url, open either the ui or
server file for the app, and run the app in the container rstudio. This way you can
see the rstudio logs as it tries to initialise the shiny app. If you a missing a
package or other dependency for the container, this will be obvious at this stage.
Why does repo2docker need to exist? Why not use tool like source2image?¶
The Jupyter community believes strongly in building on top of pre-existing tools whenever possible (this is why repo2docker buildpacks largely build off of patterns that already exist in the data analytics community). We try to perform due-diligence and search for other communities to leverage and help, but sometimes it makes the most sense to build our own new tool. In the case of repo2docker, we spent time integrating with a pre-existing tool called source2image. This is an excellent open tool for containerization, but we ultimately decided that it did not fit the use-case we wanted to address. For more information, here is a short blog post about the decision and the reasoning behind it.
How-to Guides¶
Short, actionable guides that cover specific topics with repo2docker. Select from the pages listed below to get started.
Configure the user interface¶
You can build several user interfaces into the resulting Docker image. This is controlled with various configuration files.
JupyterLab¶
JupyterLab is the default interface for repo2docker.
The following Binder URL will open the
pyTudes repository
and begin a JupyterLab session in the ipynb
folder:
https://mybinder.org/v2/gh/norvig/pytudes/HEAD?urlpath=lab/tree/ipynb
The /tree/ipynb
above is how JupyterLab directs you to a specific file
or folder.
To learn more about URLs in JupyterLab and Jupyter Notebook, visit starting JupyterLab.
Classic Notebook Interface¶
The classic notebook is also available without any configuration.
To switch to the classic notebook,
You do not need any extra configuration in order to allow the use
of the classic notebook interface.
You can launch the classic notebook interface from within a user
session by opening JupyterLab and replacing /lab/
with /tree/
in the default juptyerlab URL
like so:
http(s)://<server:port>/tree/
And you can switch back to JupyterLab by replacing /tree/
with /lab/
:
http(s)://<server:port>/lab/
nteract¶
nteract is a notebook interface built with React. It is similar to a more feature-filled version of the traditional Jupyter Notebook interface.
nteract comes pre-installed in any session that has been built from a Python repository.
You can launch nteract from within a user
session by replacing /tree
with /nteract
at the end of a notebook
server’s URL like so:
http(s)://<server:port>/nteract
For example, the following Binder URL will open the
pyTudes repository
and begin an nteract session in the ipynb
folder:
https://mybinder.org/v2/gh/norvig/pytudes/HEAD?urlpath=nteract/tree/ipynb
The /tree/ipynb
above is how nteract directs you to a specific file
or folder.
To learn more about nteract, visit the nteract website.
RStudio¶
The RStudio user interface is automatically enabled if a configuration file for
R is detected (i.e. an R version specified in runtime.txt
). If this is detected,
RStudio will be accessible by appending /rstudio
to the URL, like so:
http(s)://<server:port>/rstudio
For example, the following Binder link will open an RStudio session in the R demo repository.
http://mybinder.org/v2/gh/binder-examples/r/HEAD?urlpath=rstudio
Shiny¶
Shiny lets you create interactive visualizaions with R.
Shiny is automatically enabled if a configuration file for
R is detected (i.e. an R version specified in runtime.txt
). If
this is detected, Shiny will be accessible by appending
/shiny/<folder-w-shiny-files>
to the URL, like so:
http(s)://<server:port>/shiny/bus-dashboard
This assumes that a folder called bus-dashboard
exists in the root
of the repository, and that it contains all of the files needed to run
a Shiny app.
For example, the following Binder link will open a Shiny session in the R demo repository.
http://mybinder.org/v2/gh/binder-examples/r/HEAD?urlpath=shiny/bus-dashboard/
Stencila¶
Note
Stencila support has been removed due to changes in stencila making it incompatible. Please get in touch if you would like to help restore stencila support.
Choose languages for your environment¶
You can define many different languages in your configuration files. This page describes how to use some of the more common ones.
Python¶
Your environment will have Python (and specified dependencies) installed when you use one of the following configuration files:
requirements.txt
environment.yml
Note
By default, the environment will have Python 3.7.
Changed in version 0.8: Upgraded default Python from 3.6 to 3.7.
Specifying a version of Python¶
To specify a specific version of Python, you have two options:
Use environment.yml. Conda environments let you define the Python version in
environment.yml
. To do so, addpython=X.X
to your dependencies section, like so:name: python 2.7 dependencies: - python=2.7 - numpy
Use runtime.txt with requirements.txt. If you are using
requirements.txt
instead ofenvironment.yml
, you can specify the Python runtime version in a separate file calledruntime.txt
. This file contains a single line of the following form:python-X.X
For example:
python-3.6
The R Language¶
To ensure that R is installed, you must specify a version of R in a runtime.txt
file. This takes the following form:
r-YYYY-MM-DD
The date corresponds to the state of the MRAN repository at this day. Make sure
that you choose a day with the desired version of your packages. For example,
to use the MRAN repository on January 1st, 2018, add this line to runtime.txt
:
r-2018-01-01
Note that to install specific packages with the R environment, you should
use the install.R
configuration file.
Julia¶
To build an environment with Julia, include a configuration file called
Project.toml
. The format of this file is documented at
the Julia Pkg.jl documentation.
To specify a specific version of Julia to install, put a Julia version in the
[compat]
section of the Project.toml
file, as described
here: https://julialang.github.io/Pkg.jl/v1/compatibility/.
Languages not covered here¶
If a language is not “officially” supported by a build pack, it can often be
installed with a postBuild
script. This will run arbitrary bash
commands,
and can be used to download / install a language.
Using multiple languages at once¶
It may also be possible to combine multiple languages in a single environment. The details on how to accomplish this with all possible combinations are outside the scope of this guide. However we recommend that you take a look at the Multi-Language Demo repository for some inspiration.
How to automatically create a environment.yml
that works with repo2docker¶
This how-to explains how to create a environment.yml
that specifies all
installed packages and their precise versions from your environment.
The challenge¶
conda env export -f environment.yml
creates a strict export of all packages.
This is the most robust for reproducibility, but it does bake in potential
platform-specific packages, so you can only use an exported environment on the
same platform.
repo2docker
uses a linux based image as the starting point for every docker
image it creates. However a lot of people use OSX or Windows as their day to
day operating system. This means that the environment.yml
created by a strict
export will not work with error messages saying that certain packages can not
be resolved (ResolvePackageNotFound
).
The solution¶
Export your explicit install commands¶
To get a minimal environment.yml
that only contains the packages you
explicitly installed run
conda env export --from-history -f environment.yml
. We recommend that you
use this option to create your environment.yml
. The resulting
environment.yml
then contains a loose pinning of the versions used, e.g.
pandas=0.25
if you explicitly requested this pandas
version on
installation. If you didn’t list a version constraint during installation, it
will also not be listed in your environment.yml
.
While this approach doesn’t lead to perfect reproducibilty, it will contain just the same packages as if you would recreate the enviroment with the same commands again today.
Strict version export¶
Follow this procedure to create a strict export of your environment that will
work with repo2docker
and sites like mybinder.org.
We will launch a terminal inside a basic docker image, install the packages you need and then perform a strict export of the environment.
install repo2docker on your computer by following Installing repo2docker
in a terminal launch a basic repository
repo2docker https://github.com/binder-examples/conda-freeze
inside repo2dockeropen the URL printed at the end in a browser, the URL should look like
http://127.0.0.1:61037/?token=30e61ec80bda6dd0d14805ea76bb59e7b0cd78b5d6b436f0
open a terminal by clicking “New -> Terminal” next to the “Upload” button on the right hand side of the webpage
install the packages your project requires with
conda install <yourpackages>
use
conda env export -n root
to print the environmentcopy and paste the environment you just printed into a
environment.yml
in your projects repositoryclose your browser tabs and exit the repo2docker session by pressing Ctrl-C.
This will give you a strict export of your environment that precisely pins the versions of packages in your environment based on a linux environment.
Build JupyterHub-ready images¶
JupyterHub allows multiple
users to collaborate on a shared Jupyter server. repo2docker
can build
Docker images that can be shared within a JupyterHub deployment. For example,
mybinder.org uses JupyterHub and repo2docker
to allow anyone to build a Docker image of a git repository online and
share an executable version of the repository with a URL to the built image.
To build JupyterHub-ready Docker images with repo2docker
, the
version of your JupyterHub deployment must be included in the
environment.yml
or requirements.txt
of the git repositories you
build.
If your instance of JupyterHub uses DockerSpawner
, you will need to set its
command to run jupyterhub-singleuser
by adding this line in your
configuration file:
c.DockerSpawner.cmd = ['jupyterhub-singleuser']
Using repo2docker
as part of your Continuous Integration¶
We’ve created for you the continuous-build repository so that you can push a Docker container to Docker Hub directly from a GitHub repository that has a Jupyter notebook. Here are instructions to do this.
Getting Started¶
Today you will be doing the following:
Fork and clone the continuous-build GitHub repository to obtain the hidden
.circleci
folder.Creating an image repository on Docker Hub
Connecting your repository to CircleCI
Push, commit, or create a pull request to trigger a build.
You don’t need to install any dependencies on your host to build the container, it will be done on a continuous integration server, and the container built and available to you to pull from Docker Hub.
Step 1. Clone the Repository¶
First, fork the continuous-build GitHub repository to your account, and clone the branch via either:
git clone https://www.github.com/<username>/continuous-build
or
git clone git@github.com:<username>/continuous-build.git
Step 2. Choose your Configuration¶
The hidden folder .circleci/config.yml
has instructions for CircleCI
to automatically discover and build your repo2docker Jupyter notebook container.
The default template provided in the repository in this folder will do the most basic steps,
including:
Clone the repository with the notebook that you specify
Build a Docker image
Push the build image to Docker Hub
This repository aims to provide templates for your use. If you have a request for a new template, please let us know. We will add templates as they are requested to do additional tasks like test containers, run nbconvert, etc.
Thus, if I have a repository named myrepo
and I want to use the default configuration on circleCI,
I would copy it there from the continuous-build
folder. In the example below, I’m
creating a new folder called “myrepo” and then copying the entire folder there:
mkdir -p myrepo
cp -R continuous-build/.circleci myrepo/
You would then logically create a GitHub repository in the “myrepo” folder, add the circleci configuration folder, and continue on to the next steps.
cd myrepo
git init
git add .circleci
Step 3. Docker Hub¶
Go to Docker Hub, log in, and click the big blue
button that says “create repository” (not an automated build). Choose an organization
and name that you like (in the traditional format <ORG>/<NAME>
), and
remember it! We will be adding it, along with your
Docker credentials, to be encrypted CircleCI environment variables.
Step 4. Connect to CircleCI¶
If you navigate to the main app page you should be able to click “Add Projects” and then select your repository. If you don’t see it on the list, then select a different organization in the top left. Once you find the repository, you can click the button to “Start Building” and accept the defaults.
Before you push or trigger a build, let’s set up the following environment variables. Also in the project interface on CirleCi, click the gears icon next to the project name to get to your project settings. Under settings, click on the “Environment Variables” tab. In this section, you want to define the following:
CONTAINER_NAME
should be the name of the Docker Hub repository you just created.DOCKER_TAG
is the tag you want to use. If not defined, will use first 10 characters of commit.DOCKER_USER
andDOCKER_PASS
should be your credentials (to allowing pushing)REPO_NAME
should be the full GitHub url (or other) of the repository with the notebook. This doesn’t have to coincide with the repository you are using to do the build (e.g., “myrepo” in our example).
If you don’t define the CONTAINER_NAME
it will default to be the repository where it is
building from, which you should only do if the Docker Hub repository is named equivalently.
If you don’t define either of the variables from step 3. for the Docker credentials, your
image will build but not be pushed to Docker Hub. Finally, if you don’t define the REPO_NAME
it will again use the name of the repository defined for the CONTAINER_NAME
.
Step 5. Push Away, Merrill!¶
Once the environment variables are set up, you can push or issue a pull request
to see circle build the workflow. Remember that you only need the .circleci/config.yml
and not any other files in the repository. If your notebook is hosted in the same repository,
you might want to add these, along with your requirements.txt, etc.
Tip
By default, new builds on CircleCI will not build for
pull requests and you can change this default in the settings. You can easily add
filters (or other criteria and actions) to be performed during or after the build
by editing the .circleci/config.yml
file in your repository.
Step 5. Use Your Container!¶
You should then be able to pull your new container, and run it! Here is an example:
docker pull <ORG>/<NAME>
docker run -it --name repo2docker -p 8888:8888 <ORG>/<NAME> jupyter notebook --ip 0.0.0.0
For a pre-built working example, try the following:
docker pull vanessa/repo2docker
docker run -it --name repo2docker -p 8888:8888 vanessa/repo2docker jupyter notebook --ip 0.0.0.0
You can then enter the url and token provided in the browser to access your notebook. When you are done and need to stop and remove the container:
docker stop repo2docker
docker rm repo2docker
Configuring your repository¶
Information about configuring your repository to work with repo2docker, and controlling elements of the built environment using configuration files.
For information on where to put your configuration files see Where to put configuration files.
Configuration Files¶
repo2docker
looks for configuration files in the repository being built
to determine how to build it. In general, repo2docker
uses the same
configuration files as other software installation tools,
rather than creating new custom configuration files.
A number of repo2docker
configuration files can be combined to compose more
complex setups.
The binder examples organization on
GitHub contains a list of sample repositories for common configurations
that repo2docker
can build with various configuration files such as
Python and R installation in a repository.
A list of supported configuration files (roughly in the order of build priority) can be found on this page (and to the right).
environment.yml
- Install a conda environment¶
environment.yml
is the standard configuration file used by conda
that lets you install any kind of package,
including Python, R, and C/C++ packages.
repo2docker
does not use your environment.yml
to create and activate a new conda environment.
Rather, it updates a base conda environment defined here with the packages listed in your environment.yml
.
This means that the environment will always have the same default name, not the name
specified in your environment.yml
.
Note
You can install files from pip in your environment.yml
as well.
For example, see the binder-examples environment.yml
file.
You can also specify which Python version to install in your built environment
with environment.yml
. By default, repo2docker
installs
Python 3.7 with your environment.yml
unless you include the version of
Python in this file. conda
supports all versions of Python,
though repo2docker
support is best with Python 3.7, 3.6, 3.5 and 2.7.
Warning
If you include a Python version in a runtime.txt
file in addition to your
environment.yml
, your runtime.txt
will be ignored.
Pipfile
and/or Pipfile.lock
- Install a Python environment¶
pipenv allows you to manage a virtual
environment Python dependencies. When using pipenv
, you end up with
Pipfile
and Pipfile.lock
files. The lock file contains explicit details
about the packages that has been installed that met the criteria within the
Pipfile
.
If both Pipfile
and Pipfile.lock
are found by repo2docker, the former
will be ignored in favor of the lock file. Also note that these files
distinguish packages and development packages and that repo2docker will install
both kinds.
requirements.txt
- Install a Python environment¶
This specifies a list of Python packages that should be installed in your environment. Our requirements.txt example on GitHub shows a typical requirements file.
setup.py
- Install Python packages¶
To install your repository like a Python package, you may include a
setup.py
file. repo2docker installs setup.py
files by running
pip install -e .
.
Project.toml
- Install a Julia environment¶
A Project.toml
(or JuliaProject.toml
) file can specify both the
version of Julia to be used and a list of Julia packages to be installed.
If a Manifest.toml
is present, it will determine the exact versions
of the Julia packages that are installed.
REQUIRE
- Install a Julia environment (legacy)¶
A REQUIRE
file can specify both the version of Julia to be used and
which Julia packages should be used. The use of REQUIRE
is only
recommended for pre 1.0 Julia versions. The recommended way of installing
a Julia environment that uses Julia 1.0 or newer is to use a Project.toml
file. If both a REQUIRE
and a Project.toml
file are detected,
the REQUIRE
file is ignored. To see an example of a Julia repository
with REQUIRE
and environment.yml
, visit
binder-examples/julia-python.
install.R
- Install an R/RStudio environment¶
This is used to install R libraries pinned to a specific snapshot on
MRAN.
To set the date of the snapshot add a runtime.txt.
For an example install.R
file, visit our example install.R file.
apt.txt
- Install packages with apt-get¶
A list of Debian packages that should be installed. The base image used is usually the latest released version of Ubuntu.
We use apt.txt
, for example, to install LaTeX in our
example apt.txt for LaTeX.
DESCRIPTION
- Install an R package¶
To install your repository like an R package, you may include a
DESCRIPTION
file. repo2docker installs the package and dependencies
from the DESCRIPTION
by running devtools::install_local(getwd())
.
You also need to have a runtime.txt
file that is formatted as
r-<YYYY>-<MM>-<DD>
, where YYYY-MM-DD is a snapshot of MRAN that will be
used for your R installation.
postBuild
- Run code after installing the environment¶
A script that can contain arbitrary commands to be run after the whole repository has been built. If you
want this to be a shell script, make sure the first line is #!/bin/bash
.
Note that by default the build will not be stopped if an error occurs inside a shell script.
You should include set -e
or the equivalent at the start of the script to avoid errors being silently ignored.
An example use-case of postBuild
file is JupyterLab’s demo on mybinder.org.
It uses a postBuild
file in a folder called binder
to prepare
their demo for binder.
start
- Run code before the user sessions starts¶
A script that can contain simple commands to be run at runtime (as an
ENTRYPOINT
to the docker container). If you want this to be a shell script, make sure the
first line is #!/bin/bash
. The last line must be exec "$@"
or equivalent.
Use this to set environment variables that software installed in your container expects to be set. This script is executed each time your binder is started and should at most take a few seconds to run.
If you only need to run things once during the build phase use postBuild - Run code after installing the environment.
runtime.txt
- Specifying runtimes¶
Sometimes you want to specify the version of the runtime
(e.g. the version of Python or R),
but the environment specification format will not let you specify this information
(e.g. requirements.txt or install.R).
For these cases, we have a special file, runtime.txt
.
Note
runtime.txt
is only supported when used with environment specifications
that do not already support specifying the runtime
(when using environment.yml
for conda or Project.toml
for Julia,
runtime.txt
will be ignored).
Have python-x.y
in runtime.txt
to run the repository with Python version x.y.
See our Python2 example repository.
Have r-<RVERSION>-<YYYY>-<MM>-<DD>
in runtime.txt
to run the repository with R version RVERSION and libraries from a YYYY-MM-DD snapshot of MRAN.
RVERSION can be set to 3.4, 3.5, 3.6, or to patch releases for the 3.5 and 3.6 series.
If you do not specify a version, the latest release will be used (currently R 3.6).
See our R example repository.
default.nix
- the nix package manager¶
Specify packages to be installed by the nix package manager.
When you use this config file all other configuration files (like requirements.txt
)
that specify packages are ignored. When using nix
you have to specify all
packages and dependencies explicitly, including the Jupyter notebook package that
repo2docker expects to be installed. If you do not install Jupyter explicitly
repo2docker will no be able to start your container.
nix-shell is used to evaluate
a nix
expression written in a default.nix
file. Make sure to
pin your nixpkgs
to produce a reproducible environment.
To see an example repository visit nix binder example.
Dockerfile
- Advanced environments¶
In the majority of cases, providing your own Dockerfile is not necessary as the base images provide core functionality, compact image sizes, and efficient builds. We recommend trying the other configuration files before deciding to use your own Dockerfile.
With Dockerfiles, a regular Docker build will be performed.
Note
If a Dockerfile is present, all other configuration files will be ignored.
See the Advanced Binder Documentation for best-practices with Dockerfiles.
The Reproducible Execution Environment Specification¶
repo2docker scans a repository for particular Configuration Files, such
as requirements.txt
or REQUIRE
. The collection of files, their contents,
and the resulting actions that repo2docker takes is known
as the Reproducible Execution Environment Specification (or REES).
The goal of the REES is to automate and encourage existing community best practices
for reproducible computational environments. This includes installing pacakges using
community-standard specification files and their corresponding tools,
such as requirements.txt
(with pip
), REQUIRE
(with Julia), or
apt.txt
(with apt
). While repo2docker automates the
creation of the environment, a human should be able to look at a REES-compliant
repository and reproduce the environment using common, clear steps without
repo2docker software.
Currently, the definition of the REE Specification is the following:
Any directory containing zero or more files from the Configuration Files list is a valid reproducible execution environment as defined by the REES. The configuration files have to all be placed either in the root of the directory, in a
binder/
sub-directory or a.binder/
sub-directory.
For example, the REES recognises requirements.txt
as a valid config file.
The file format is as defined by the requirements.txt
standard of the Python
community. A REES-compliant tool will install a Python interpreter (of unspecified version)
and perform the equivalent action of pip install -r requirements.txt
so that the
user can afterwards run python and use the packages installed.
Contributing¶
The repo2docker community is welcoming of all kinds of help and participation from others. Below are a few ways that you can get involved, as well as resources for understanding the structure and design of the repo2docker package.
Contributing to repo2docker¶
Thank you for thinking about contributing to repo2docker! This is an open source project that is developed and maintained entirely by volunteers. Your contribution is integral to the future of the project. THANK YOU!
Types of contribution¶
There are many ways to contribute to repo2docker:
Update the documentation. If you’re reading a page or docstring and it doesn’t make sense (or doesn’t exist!), please let us know by opening a bug report. It’s even more amazing if you can give us a suggested change.
Fix bugs or add requested features. Have a look through the issue tracker and see if there are any tagged as “help wanted”. As the label suggests, we’d love your help!
Report a bug. If repo2docker isn’t doing what you thought it would do then open a bug report. That issue template will ask you a few questions described in more detail below.
Suggest a new feature. We know that there are lots of ways to extend repo2docker! If you’re interested in adding a feature then please open a feature request. That issue template will ask you a few questions described in detail below.
Review someone’s Pull Request. Whenever somebody proposes changes to the repo2docker codebase, the community reviews the changes, and provides feedback, edits, and suggestions. Check out the open pull requests and provide feedback that helps improve the PR and get it merged. Please keep your feedback positive and constructive!
Tell people about repo2docker. As we said above, repo2docker is built by and for its community. If you know anyone who would like to use repo2docker, please tell them about the project! You could give a talk about it, or run a demonstration. The sky is the limit :rocket::star2:.
If you’re not sure where to get started, then please come and say hello in our Gitter channel, or open an discussion thread at the Jupyter discourse forum.
Process for making a contribution¶
This outlines the process for getting changes to the repo2docker project merged.
Identify the correct issue template: bug report or feature request.
Bug reports (examples, new issue) will ask you for a description of the problem, the expected behaviour, the actual behaviour, how to reproduce the problem, and your personal set up. Bugs can include problems with the documentation, or code not running as expected.
It is really important that you make it easy for the maintainers to reproduce the problem you’re having. This guide on creating a minimal, complete and verifiable example is a great place to start.
Feature requests (examples, new issue) will ask you for the proposed change, any alternatives that you have considered, a description of who would use this feature, and a best-guess of how much work it will take and what skills are required to accomplish.
Very easy feature requests might be updates to the documentation to clarify steps for new users. Harder feature requests may be to add new functionality to the project and will need more in depth discussion about who can complete and maintain the work.
Feature requests are a great opportunity for you to advocate for the use case you’re suggesting. They help others understand how much effort it would be to integrate the work,and - if you’re successful at convincing them that this effort is worth it - make it more likely that they to choose to work on it with you.
Open an issue. Getting consensus with the community is a great way to save time later.
Make edits in your fork of the repo2docker repository.
Make a pull request. Read the next section for guidelines for both reviewers and contributors on merging a PR.
Wait for a community member to merge your changes. Remember that someone else must merge your pull request. That goes for new contributors and long term maintainers alike. Because
main
is continuously deployed to mybinder.org it is essential thatmain
is always in a deployable state.(optional) Deploy a new version of repo2docker to mybinder.org by following these steps
Guidelines to getting a Pull Request merged¶
These are not hard rules to be enforced by 🚓 but they are suggestions written by the repo2docker maintainers to help complete your contribution as smoothly as possible for both you and for them.
Create a PR as early as possible, marking it with
[WIP]
while you work on it. This avoids duplicated work, lets you get high level feedback on functionality or API changes, and/or helps find collaborators to work with you.Keep your PR focused. The best PRs solve one problem. If you end up changing multiple things, please open separate PRs for the different conceptual changes.
Add tests to your code. PRs will not be merged if Travis is failing.
Apply PEP8 as much as possible, but not too much. If in doubt, ask.
Use merge commits instead of merge-by-squashing/-rebasing. This makes it easier to find all changes since the last deployment
git log --merges --pretty=format:"%h %<(10,trunc)%an %<(15)%ar %s" <deployed-revision>..
and your PR easier to review.Make it clear when your PR is ready for review. Prefix the title of your pull request (PR) with
[MRG]
if the contribution is complete and should be subjected to a detailed review.Use commit messages to describe why you are proposing the changes you are proposing.
Try to not rush changes (the definition of rush depends on how big your changes are). Remember that everyone in the repo2docker team is a volunteer and we can not (nor would we want to) control their time or interests. Wait patiently for a reviewer to merge the PR. (Remember that someone else must merge your PR, even if you have the admin rights to do so.)
Setting up for Local Development¶
To develop & test repo2docker locally, you need:
Familiarity with using a command line terminal
A computer running macOS / Linux
Some knowledge of git
At least python 3.6
Your favorite text editor
A recent version of Docker Community Edition
Clone the repository¶
First, you need to get a copy of the repo2docker git repository on your local disk. Fork the repository on GitHub, then clone it to your computer:
git clone https://github.com/<your-username>/repo2docker
This will clone repo2docker into a directory called repo2docker
. You can
make that your current directory with cd repo2docker
.
Set up a local virtual environment¶
After cloning the repository, you should set up an isolated environment to install libraries required for running / developing repo2docker.
python3 -m venv .
source bin/activate
pip3 install -e .
pip3 install -r dev-requirements.txt
pip3 install -r docs/doc-requirements.txt
pip3 install black
This should install all the libraries required for testing & running repo2docker!
Code formatting¶
We use black
as code formatter to
get a consistent layout for all the code in this project. This makes reading
the code easier.
To format your code run black .
in the top-level directory of this repository.
Many editors have plugins that will automatically apply black as you edit files.
We also have a pre-commit hook setup that will check that code is formatted
according to black’s style guide. You can activate it with pre-commit install
.
As part of our continuous integration tests we will check that code is formatted properly and the tests will fail if this is not the case.
Verify that docker is installed and running¶
If you do not already have Docker, you should be able to download and install it for your operating system using the links from the official website. After you have installed it, you can verify that it is working by running the following commands:
docker version
It should output something like:
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:42:45 2017
OS/Arch: linux/amd64
Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:41:24 2017
OS/Arch: linux/amd64
Experimental: false
Then you are good to go!
Building the documentation locally¶
If you only changed the documentation, you can also build the documentation locally using sphinx
.
pip install -r docs/doc-requirements.txt
cd docs/
make html
Then open the file docs/build/html/index.html
in your browser.
Roadmap¶
This roadmap collects “next steps” for repo2docker. It is about creating a shared understanding of the project’s vision and direction amongst the community of users, contributors, and maintainers. The goal is to communicate priorities and upcoming release plans. It is not a aimed at limiting contributions to what is listed here.
Using the roadmap¶
Sharing Feedback on the Roadmap¶
All of the community is encouraged to provide feedback as well as share new ideas with the community. Please do so by submitting an issue. If you want to have an informal conversation first use one of the other communication channels. After submitting the issue, others from the community will probably respond with questions or comments they have to clarify the issue. The maintainers will help identify what a good next step is for the issue.
What do we mean by “next step”?¶
When submitting an issue, think about what “next step” category best describes your issue:
now, concrete/actionable step that is ready for someone to start work on. These might be items that have a link to an issue or more abstract like “decrease typos and dead links in the documentation”
soon, less concrete/actionable step that is going to happen soon, discussions around the topic are coming close to an end at which point it can move into the “now” category
later, abstract ideas or tasks, need a lot of discussion or experimentation to shape the idea so that it can be executed. Can also contain concrete/actionable steps that have been postponed on purpose (these are steps that could be in “now” but the decision was taken to work on them later)
Reviewing and Updating the Roadmap¶
The roadmap will get updated as time passes (next review by 31st January 2019) based on discussions and ideas captured as issues. This means this list should not be exhaustive, it should only represent the “top of the stack” of ideas. It should not function as a wish list, collection of feature requests or todo list. For those please create a new issue.
The roadmap should give the reader an idea of what is happening next, what needs input and discussion before it can happen and what has been postponed.
The roadmap proper¶
Project vision¶
Repo2docker is a dependable tool used by humans that reduces the complexity of creating the environment in which a piece of software can be executed.
Now¶
The “Now” items are being actively worked on by the project:
reduce documentation typos and syntax errors
increase test coverage to 80% (see https://codecov.io/gh/jupyterhub/repo2docker/tree/main/repo2docker for low coverage files)
mounting repository contents in locations that is not
/home/jovyan
investigate options for pinning repo2docker versions (#490)
Soon¶
The “Soon” items are being discussed/a plan of action is being made. Once an item reaches the point of an actionable plan and person who wants to work on it, the item will be moved to the “Now” section. Typically, these will be moved at a future review of the roadmap.
Later¶
The “Later” items are things that are at the back of the project’s mind. At this time there is no active plan for an item. The project would like to find the resources and time to discuss and then execute these ideas.
support execution on a remote host (with more resources than available locally) via the command-line
add support for using ZIP files as the repo (
repo2docker https://example.com/an-archive.zip
)
Architecture¶
This is a living document talking about the architecture of repo2docker from various perspectives.
Buildpacks¶
The buildpack concept comes from Heroku and Ruby on Rails’ Convention over Configuration doctrine.
Instead of the user specifying a complete specification of exactly how they want their environment to be, they can focus only on how their environment differs from a conventional environment. This means instead of deciding ‘should I get Python from Apt or pyenv or ?’, user can just specify ‘I want python-3.6’. Usually, specifying a runtime and list of libraries with explicit versions is all that is needed.
In repo2docker, a Buildpack does the following things:
Detect if it can handle a given repository
Build a base language environment in the docker image
Copy the contents of the repository into the docker image
Assemble a specific environment in the docker image based on repository contents
Push the built docker image to a specific docker registry (optional)
Run the build docker image as a docker container (optional)
Detect¶
When given a repository, repo2docker first has to determine which buildpack to use. It takes the following steps to determine this:
Look at the ordered list of
BuildPack
objects listed inRepo2Docker.buildpacks
traitlet. This is populated with a default set of buildpacks in most-specific-to-least-specific order. Other applications using this can add / change this using traditional traitlet configuration mechanisms.Calls the
detect
method of eachBuildPack
object. This method assumes that the repository is present in the current working directory, and should returnTrue
if the repository is something that it should be used for. For example, aBuildPack
that usesconda
to install libraries can check for presence of anenvironment.yml
file and say ‘yes, I can handle this repository’ by returningTrue
. Usually buildpacks look for presence of specific files (requirements.txt
,environment.yml
,install.R
,manifest.xml
etc) to determine if they can handle a repository or not. Buildpacks may also look into specific files to determine specifics of the required environment. More than one buildpack may use such information, as properties can be inherited.If no
BuildPack
returns true, then repo2docker will use the defaultBuildPack
(defined inRepo2Docker.default_buildpack
traitlet).
Build base environment¶
Once a buildpack is chosen, it builds a base environment that is mostly the same for various repositories built with the same buildpack.
For example, in CondaBuildPack
, the base environment consists of installing miniconda
and basic notebook packages (from repo2docker/buildpacks/conda/environment.yml
). This is going
to be the same for most repositories built with CondaBuildPack
, so we want to use
docker layer caching as
much as possible for performance reasons. Next time a repository is built with CondaBuildPack
,
we can skip straight to the copy step (since the base environment docker image layers have
already been built and cached).
The get_build_scripts
and get_build_script_files
methods are primarily used for this.
get_build_scripts
can return arbitrary bash script lines that can be run as different users,
and get_build_script_files
is used to copy specific scripts (such as a conda installer) into
the image to be run as pat of get_build_scripts
. Code in either has following constraints:
You can not use the contents of repository in them, since this happens before the repository is copied into the image. For example,
pip install -r requirements.txt
will not work, since there’s norequirements.txt
inside the image at this point. This is an explicit design decision, to enable better layer caching.You may, however, read the contents of the repository and modify the scripts emitted based on that! For example, in
CondaBuildPack
, if there’s Python 2 specified inenvironment.yml
, a different kind of environment is set up. The reading of theenvironment.yml
is performed in the BuildPack itself, and not in the scripts returned byget_build_scripts
. This is fine. BuildPack authors should still try to minimize the variants created in this fashion, to optimize the build cache.
Copy repository contents¶
The contents of the repository are copied unconditionally into the Docker image, and made
available for all further commands. This is common to most BuildPack
s, and the code is in
the build
method of the BuildPack
base class.
Assemble repository environment¶
The assemble stage builds the specific environment that is requested by the repository.
This usually means installing required libraries specified in a format native to the language
(requirements.txt
, environment.yml
, REQUIRE
, install.R
, etc).
Most of this work is done in get_assemble_scripts
method. It can return arbitrary bash script
lines that can be run as different users, and has access to the repository contents (unlike
get_build_scripts
). The docker image layers produced by this usually can not be cached,
so less restrictions apply to this than to get_build_scripts
.
At the end of the assemble step, the docker image is ready to be used in various ways!
Push¶
Optionally, repo2docker can push a built image to a docker registry.
This is done as a convenience only (since you can do the same with a docker push
after using repo2docker
only to build), and implemented in Repo2Docker.push
method. It is only activated if using the
--push
commandline flag.
Run¶
Optionally, repo2docker can run the built image and allow the user to access the Jupyter Notebook
running inside by default. This is also done as a convenience only (since you can do the same with docker run
after using repo2docker only to build), and implemented in Repo2Docker.run
. It is activated by default
unless the --no-run
commandline flag is passed.
ContentProviders¶
ContentProviders provide a way for repo2docker
to know how to find and
retrieve a repository. They follow a similar pattern as the BuildPacks
described above. When repo2docker
is called, its main argument will be
a path to a repository. This might be a local path or a URL. Upon being called,
repo2docker
will loop through all ContentProviders and perform the following
commands:
Run the
detect()
method on the repository path given torepo2docker
. This should return any value other thanNone
if the path matches what the ContentProvider is looking for.For example, the
Local
ContentProvider checks whether the argument is a valid local path. If so, thendetect(
returns a dictionary:{'path': source}
which defines the path to the repository. This path is used byfetch()
to check that it matches the output directory.If
detect()
returns something other thanNone
, runfetch()
with the returned value as its argument. This should result in the contents of the repository being placed locally to a folder.
For more information on ContentProviders, take a look at the ContentProvider base class which has more explanation.
Design of repo2docker¶
The repo2docker buildpacks are inspired by Heroku’s Build Packs. The philosophy for the repo2docker buildpacks includes:
using common configuration files for familiar installation and packaging tools
allowing configuration files to be combined to compose more complex setups
specifying default locations for configuration files (in the repository’s root,
binder
or.binder
directory)
When designing repo2docker
and adding to it in the future, the
developers are influenced by two primary use cases.
The use cases for repo2docker
which drive most design decisions are:
Automated image building used by projects like BinderHub
Manual image building and running the image from the command line client,
jupyter-repo2docker
, by users interactively on their workstations
Deterministic output¶
The core of repo2docker
can be considered a
deterministic algorithm.
When given an input directory which has a particular repository checked out, it
deterministically produces a Dockerfile based on the contents of the directory.
So if we run repo2docker
on the same directory multiple times, we get the
exact same Dockerfile output.
This provides a few advantages:
Reuse of cached built artifacts based on a repository’s identity increases efficiency and reliability. For example, if we had already run
repo2docker
on a git repository at a particular commit hash, we know we can just re-use the old output, since we know it is going to be the same. This provides massive performance & architectural advantages when building additional tools (like BinderHub) on top ofrepo2docker
.We produce Dockerfiles that have as much in common as possible across multiple repositories, enabling better use of the Docker build cache. This also provides massive performance advantages.
Reproducibility and version stability¶
Many ingredients go into making an image from a repository:
version of the base docker image
version of
repo2docker
itselfversions of the libraries installed by the repository
repo2docker
controls the first two, the user controls the third one. The current
policy for the version of the base image is that we will use the current LTS
version Bionic Beaver (18.04) for the foreseeable future.
The version of repo2docker
used to build an image can influence which packages
are installed by default and which features are supported during the build
process. We will periodically update those packages to keep step with releases
of Jupyter Notebook, JupyterLab, etc. For packages that are installed by
default but where you want to control the version we recommend you specify them
explicitly in your dependencies.
Unix principles “do one thing well”¶
repo2docker
should do one thing, and do it well. This one thing is:
Given a repository, deterministically build a docker image from it.
There’s also some convenience code (to run the built image) for users, but that’s separated out cleanly. This allows easy use by other projects (like BinderHub).
There is additional (and very useful) design advice on this in the Art of Unix Programming which is a highly recommended quick read.
Composability¶
Although other projects, like
s2i, exist to convert source to
Docker images, repo2docker
provides the additional functionality to support
composable environments. We want to easily have an image with
Python3+Julia+R-3.2 environments, rather than just one single language
environment. While generally one language environment per container works well,
in many scientific / datascience computing environments you need multiple
languages working together to get anything done. So all buildpacks are
composable, and need to be able to work well with other languages.
Pareto principle (The 80-20 Rule)¶
Roughly speaking, we want to support 80% of use cases, and provide an escape hatch (raw Dockerfiles) for the other 20%. We explicitly want to provide support only for the most common use cases - covering every possible use case never ends well.
An easy process for getting support for more languages here is to demonstrate
their value with Dockerfiles that other people can use, and then show that this
pattern is popular enough to be included inside repo2docker
. Remember that ‘yes’
is forever (very hard to remove features!), but ‘no’ is only temporary!
Common tasks¶
These are some common tasks to be done as a part of developing and maintaining repo2docker. If you’d like more guidance for how to do these things, reach out in the JupyterHub Gitter channel.
Set up your development environment first!
Before attempting most tasks, follow the directions to set up your development environment.
Though note that many of these tasks do not require Docker.
Running tests¶
We have a lot of tests for various cases supported by repo2docker in the tests/
subdirectory. If you fix a bug or add new functionality consider adding a new
test to prevent the bug from coming back. These use
py.test.
You can run all the tests with:
py.test -s tests/*
If you want to run a specific test, you can do so with:
py.test -s tests/<path-to-test>
To skip the tests related to Mercurial repositories (to avoid to install
Mercurial and hg-evolve), one can use the environment variable
REPO2DOCKER_SKIP_HG_TESTS
.
Troubleshooting Tests¶
Some of the tests have non-python requirements for your development machine. They are:
git-lfs
must be installed (instructions). It need not be activated – there is no need to run thegit lfs install
command. It just needs to be available to the test suite.If your test failure messages include “
git-lfs filter-process: git-lfs: command not found
”, this step should address the problem.
Minimum Docker Image size of 128GB is required. If you are not running docker on a linux OS, you may need to expand the runtime image size for your installation. See Docker’s instructions for macOS or Windows 10 for more information.
If your test failure messages include “
No space left on device: '/home/...
”, this step should address the problem.
Update and Freeze BuildPack Dependencies¶
This section covers the process by which repo2docker defines and updates the dependencies that are installed by default for several buildpacks.
For both the conda
and virtualenv
(pip
) base environments in the Conda BuildPack and Python BuildPack,
we install specific pinned versions of all dependencies. We explicitly list the dependencies
we want, then freeze them at commit time to explicitly list all the
transitive dependencies at current versions. This way, we know that
all dependencies will have the exact same version installed at all times.
To update one of the dependencies shared across all repo2docker
builds, you
must follow these steps (with more detailed information in the sections below):
Bump the version numbers of the dependencies you want to update in the
conda
environment (link)Make a pull request with your changes (link)
See the subsections below for more detailed instructions.
Conda dependencies¶
There are two files related to conda dependencies. Edit as needed.
repo2docker/buildpacks/conda/environment.yml
Contains list of packages to install in Python3 conda environments, which are the default. This is where all Notebook versions & notebook extensions (such as JupyterLab / nteract) go.
repo2docker/buildpacks/conda/environment.py-2.7.yml
Contains list of packages to install in Python2 conda environments, which can be specifically requested by users. This only needs
IPyKernel
and kernel related libraries. Notebook / Notebook Extension need not be installed here.
Once you edit either of these files to add a new package / bump version on an existing package, you should then run:
cd ./repo2docker/buildpacks/conda/ python freeze.py
This script will resolve dependencies and write them to the respective
.lock
files.After the freeze script finishes, a number of files will have been modified. They roughly follow this pattern:
repo2docker/buildpacks/conda/environment*
You should commit all modified files to git.
Make a Pull Request to the
jupyterhub/repo2docker
repository, with a description of what versions were bumped / what new packages were added and why. If you fix a bug or add new functionality consider adding a new test to prevent the bug from coming back/the feature breaking in the future.Once the pull request is approved (but not yet merged), Update the change log (details below) and commit the change log, then update the pull request.
Creating a Release¶
We make a release of whatever is on main
every month. We use “calendar versioning”.
Monthly releases give users a predictable pattern for when releases are going to
happen and prevents locking up improvements for fixes for long periods of time.
A new release will automatically be created when a new git tag is created and pushed to the repository.
To create a new release, follow these steps:
Create a new tag and push it¶
First, tag a new release locally:
V=YYYY.MM.0; git tag -am "release $V" $V
If you need to make a second (or third) release in a month increment the trailing 0 of the version to 1 (or 2).
Then push this change up to the main repository
git push origin --tags
GitHub Actions should create a new release on the repo2docker PyPI. Once this has completed, make sure that the new version has been updated.
Create a new release on the GitHub repository¶
Once the new release has been pushed to PyPI, we need to create a new release on the GitHub repository releases page. Once on that page, follow these steps:
Click “Draft a new release”
Choose a tag version using the same tag you just created above
The release name is simply the tag version
Finally, click “Publish release”
That’s it!
Keeping the Pipfile and requirements files up to date¶
We now have both a dev-requirements.txt
and a Pifile
for repo2docker, as
such it is important to keep these in sync/up-to-date.
Both files use pip identifiers
so if you are updating for example the Sphinx version
in the doc-requirements.txt
(currently Sphinx = ">=1.4,!=1.5.4"
) you can use the
same syntax to update the Pipfile and viceversa.
At the moment this has to be done manually so please make sure to update both files accordingly.
Uncommon tasks¶
Compare generated Dockerfiles between repo2docker versions¶
For larger refactorings it can be useful to check that the generated Dockerfiles match between an older version of r2d and the current version. The following shell script automates this test.
#! /bin/bash -e
current_version=$(jupyter-repo2docker --version | sed s@+@-@)
echo "Comparing $(pwd) (local $current_version vs. $R2D_COMPARE_TO)"
basename="dockerfilediff"
diff_r2d_dockerfiles_with_version () {
docker run --rm -t -v "$(pwd)":"$(pwd)" --user 1000 jupyterhub/repo2docker:"$1" jupyter-repo2docker --no-build --debug "$(pwd)" &> "$basename"."$1"
jupyter-repo2docker --no-build --debug "$(pwd)" &> "$basename"."$current_version"
# remove first line logging the path
sed -i '/^\[Repo2Docker\]/d' "$basename"."$1"
sed -i '/^\[Repo2Docker\]/d' "$basename"."$current_version"
diff --strip-trailing-cr "$basename"."$1" "$basename"."$current_version" | colordiff
rm "$basename"."$current_version" "$basename"."$1"
}
startdir="$(pwd)"
cd "$1"
#diff_r2d_dockerfiles 0.10.0-22.g4f428c3.dirty
diff_r2d_dockerfiles_with_version "$R2D_COMPARE_TO"
cd "$startdir"
Put the code above in a file tests/dockerfile_diff.sh
and make it executable: chmod +x dockerfile_diff.sh
.
Configure the repo2docker version you want to compare with your local version in the environment variable R2D_COMPARE_TO
.
The scripts takes one input: the directory where repo2docker should be executed.
cd tests/
R2D_COMPARE_TO=0.10.0 ./dockerfile_diff.sh venv/py35/
Run it for all directories where there is a verify
file:
cd tests/
R2D_COMPARE_TO=0.10.0 CMD=$(pwd)/dockerfile_diff.sh find . -name 'verify' -execdir bash -c '$CMD $(pwd)' \;
To keep the created Dockefilers for further inspection, comment out the deletion line in the script.
Add a new buildpack¶
A new buildpack is needed when a new language or a new package manager should be supported. Existing buildpacks are a good model for how new buildpacks should be structured. See the Buildpacks page for more information about the structure of a buildpack.
Criteria to balance and consider¶
Criteria to balance are:
Maintenance burden on repo2docker.
How easy it is to use a given setup without support from repo2docker natively. There are two escape hatches here -
postBuild
andDockerfile
.How widely used is this language / package manager? This is the primary tradeoff with point (1). We (the Binder / Jupyter team) want to make new formats as little as possible, so ideally we can just say “X repositories on binder already use this using one of the escape hatches in (2), so let us make it easy and add native support”.
Adding libraries or UI to existing buildpacks¶
Note that this doesn’t apply to adding additional libraries / UI to existing buildpacks. For example, if we had an R buildpack and it supported IRKernel, it is much easier to just support RStudio / Shiny with it, since those are library additions instead of entirely new buildpacks.
Add a new content provider¶
Adding a new content provider allows repo2docker to grab repositories from new locations on the internet. To do so, you should take the following steps:
Sub-class the ContentProvider class. This will give you a skeleton class you can modify to support your new content provider.
Implement a detect() method for the class. This takes an input string (e.g., a URL or path) and determines if it points to this particular content provider. It should return a dictionary (called
spec
that will be passed to thefetch()
method. For example, see the ZenodoProvider detect method.Implement a fetch() method for the class. This takes the dictionary
spec
as input, and ensures the repository exists on disk (e.g., by downloading it) and returns a path to it. For example, see the ZenodoProvider fetch method.
Changelog¶
Version 2022.02.0¶
New features¶
Update ipywidgets jupyter-offlinenotebook jupyterlab PR #1127 by @manics
Allow passing in extra args to Docker initialization PR #1124 by @yuvipanda
Allow passing in traitlets via commandline PR #1123 by @yuvipanda
Bump default R version to 4.1 PR #1107 by @yuvipanda
Update jupyterlab 3.2.5 jupyter-resource-usage 0.6.1 PR #1105 by @manics
Get binary R packages from packagemanager.rstudio.com PR #1104 by @yuvipanda
Support R 4.1 PR #1102 by @yuvipanda
Add command line option to pass extra build args PR #1100 by @TimoRoth
Set labels when building image from Dockerfile PR #1097 by @TimoRoth
Bump JupyterLab to 3.1.11 PR #1081 by @choldgraf
Bootstrap base env with micromamba PR #1062 by :user:`wolfv `
Default UI to JupyterLab PR #1035 by @SylvainCorlay
API changes¶
Bug fixes¶
Other merged PRs¶
Put micromamba in /usr/local/bin and use mamba for installs PR #1128 by @minrk
Delete /tmp/downloaded_packages after running install.R PR #1119 by @yuvipanda
Use a smaller R library in our tests PR #1118 by @yuvipanda
Only get R itself (r-base-core) from apt, not CRAN packages PR #1117 by @minrk
Say ‘apt repository’ rather than PPA PR #1111 by @yuvipanda
Quieter R builds PR #1103 by @yuvipanda
update user_interface doc to reflect that lab is default PR #1085 by @minrk
Updates to dev docs + Recommonmark -> MyST Parser PR #1082 by @choldgraf
[mrg] __init__.py: r_version: fixed description PR #1074 by @magnush0lm
Rename requirements.py-3.5.txt to requirements.py-3.5.pip PR #1061 by @manics
Remove nodesource’ nodejs PR #847 by @yuvipanda
Version 2021.08.0¶
The repo2docker container image has moved to quay.io/jupyterhub/repo2docker
New features¶
API changes¶
Bug fixes¶
Workaround docker-py dependency’s failure to import six PR #1066: by @consideratio
fix: add chardet, a not explicitly declared dependency PR #1064 by @johnhoman
Add build-base to build stage of docker image PR #1051 by @yuvipanda
Fix regression in hydroshare introduced after moving to requests PR #1034 by @MridulS
Other merged PRs¶
Update README quay.io URL, Add docker latest tag PR #1075 by @manics
GitHub workflow build and push to Docker hub PR #1071 by @manics
Remove Pipfile & Pipfile.lock PR #1054 by @yuvipanda
Remove CircleCI docs build PR #1053 by @yuvipanda
Pin doc requirements to avoid CI breakages PR #1052 by @manics
Stop using deprecated add_stylesheet in sphinx PR #1050 by @yuvipanda
Add study participation notice to readme PR #1046 by @sgibson91
Bump urllib3 from 1.26.4 to 1.26.5 PR #1045 by @dependabot
Bump pyyaml from 5.1.1 to 5.4 PR #1029 by @dependabot
Version 2021.03.0¶
New features¶
Add GH workflow to push releases to PYPi and introduce CalVer PR #1004 by @betatim
Add entrypoint script which automatically propagates *_PROXY env vars… PR #1003 ([@g-braeunlich](https://github.com/g-braeunlich))
Fetch available Julia versions from hosted json PR #994 by @tomyun
Add a contentprovider for Software Heritage persistent ID (SWHID) PR #988 by @douardda
add 4.0, 4.0.2 to list of supported R versions PR #960 by @minrk
API changes¶
Bug fixes¶
fix dataverse regression introduced in last release PR #1011 by @MridulS
buildpacks.r: dont use apt-key directly to respect *_proxy env vars PR #1019 ([@g-braeunlich](https://github.com/g-braeunlich))
Version 2021.01.0¶
New features¶
Replace urllib by requests in contentproviders PR #993 by @douardda
Use mambaforge instead of miniforge PR #992 by @SylvainCorlay
Mamba number three PR #962 by @SylvainCorlay
Add Julia 1.4.2 support PR #899 by @davidanthoff
chmod start script from repo2docker-entrypoint PR #886 by @danlester
Add support for Julia 1.4.1 PR #878 by @davidanthoff
Add support for Julia 1.4.0 PR #870 by @davidanthoff
Use miniforge instead of miniconda to get conda PR #859 by @yuvipanda
If looking for latest MRAN URL try earlier snapshots too PR #851 by @manics
Bug fixes¶
Other merged PRs¶
Weekly test of master to check for external failures PR #998 by @manics
Remove reference to master branch from CLI doc PR #977 by @betatim
add chown to COPY commands to reduce layer count PR #969 by @bollwyvl
set TIMEFORMAT for timed bash conda commands PR #966 by @manics
Disable jupyterlab extension build minimize PR #963 by @manics
Bump Black version to 20.8b1 and use –target-version=py36 PR #955 by @paugier
Crosslink ‘Configuring your repository’ with usage PR #952 by @manics
Add www-frame-origin=same to /etc/rstudio/rserver.conf PR #944 ([@rkevin-arch](https://github.com/rkevin-arch))
update repo URLs for jupyterhub/repo2docker PR #939 by @minrk
Upgrade custom test infrastructure for pytest 6.0.0 PR #936 by @betatim
validate_image_name: mention lowercase, fix formatting PR #934 by @manics
little improvement for testing binder_dir PR #928 by @bitnik
doc: runtime.txt installs python x.y (& concise rewording) PR #914 by @mdeff
doc: environment.yml installs a conda env, not only python PR #913 by @mdeff
Workaround Docker issue impacting some tests on macOS PR #882 by @hwine
[docs] fix grammatical error in section title PR #872 by @jameslamb
Document loose conda export with –from-history PR #863 by @xhochy
docs: postBuild warn about shell script errors being ignored PR #844 by @manics
Version 0.11.0¶
Release date: 2020-02-05
New features¶
Add support for Dataverse in PR #739 by @Xarthisius.
Add support for configuring the version of R installed in PR #772 by @betatim.
Add support for Julia 1.2.0 in PR #768 by @davidanthoff.
Add support for Julia 1.3.0 and 1.0.5 in PR #822 by @davidanthoff.
Add support for Julia 1.3.1 in PR #831 by @davidanthoff.
Update Miniconda to 4.7.10 in PR #769 by @davidrpugh.
Update IRKernel to 1.0.2 in PR #770 by @GeorgianaElena.
Update RStudio to 1.2 in PR #803 by @pablobernabeu.
Switch to “pandas” sphinx theme for documentation in PR #816 by @choldgraf.
Add content provider documentation in PR #824 by @choldgraf.
Add support for automatic RStudio install when using R packages via conda in PR #838 by @xhochy.
Add Hydroshare as content provider in PR #800 by @sblack-usu.
Update to Jupyter Notebook 6 and Lab 1.2 in PR #839 by @minrk.
Bug fixes¶
Fix for submodule check out in PR #809 by @davidbrochart.
Handle requirements.txt files with different encodings in PR #771 by @GeorgianaElena.
Update to nteract-on-jupyter 2.1.3 in PR #2.1.3 by :user:`betatim.
Use useradd –no-log-init to fix exhausting disk space in PR #804 by @manics.
Add help text for commandline arguments in PR #517 by @yuvipanda.
Fix submodule checkout in PR #809 by @davidbrochart.
Version 0.10.0¶
Release date: 2019-08-07
New features¶
Increased minimum Python version supported for running repo2docker itself to Python 3.5 in PR #684 by @betatim.
Support for Pipfile and Pipfile.lock implemented in PR #649 by @consideratio.
Use only conda packages for our base environments in PR #728 by @scottyhq.
Fast rebuilds when repo dependencies haven’t changed by @minrk and @betatim in PR #743, PR #752, PR #718 and PR #716.
Add support for general Invenio repositories in PR #704 by @tmorrell.
Add support for julia 1.0.4 and 1.1.1 in PR #710 by @davidanthoff.
Bump Conda from 4.6.14 to 4.7.5 in PR #719 by @davidrpugh.
API changes¶
Bug fixes¶
Prevent building the image as root if –user-id and –user-name are not specified in PR #676 by @Xarthisius.
Add bash to Dockerfile to fix usage of private repos with git-crendential-env in PR #738 by @eexwhyzee.
Version 0.9.0¶
Release date: 2019-05-05
New features¶
Support for julia Project.toml, JuliaProject.toml and Manifest.toml files in PR #595 by @davidanthoff
Set JULIA_PROJECT globally, so that every julia instance starts with the julia environment activated in PR #612 by @davidanthoff.
Update Miniconda version to 4.6.14 and Conda version to 4.6.14 in PR #637 by @jhamman
Install notebook into notebook env instead of root. Activate conda environments and shell integration via ENTRYPOINT in PR #651 by @minrk
Support for .binder directory in addition to binder directory for location of configuration files, in PR #653 by @jhamman.
Updated contributor guide and issue templates for bugs, feature requests, and support questions in PR #654 and PR #655 by @KirstieJane and @betatim.
Create a page naming and describing the “Reproducible Execution Environment Specification” (the specification used by repo2docker) in PR #662 by @choldgraf.
API changes¶
Bug fixes¶
Install IJulia kernel into ${NB_PYTHON_PREFIX}/share/jupyter in PR #622 by @davidanthoff.
Ensure git submodules are updated and initilized correctly in PR #639 by @djhoese.
Use archive.debian.org as source for the debian jessie based legacy buildpack in PR #633 by @betatim.
Update to version 5.7.6 of the notebook package used in all environments in PR #628 by @betatim.
Update to version 5.7.8 of the notebook package and version 2.0.12 of nteract-on-jupyter in PR #650 by @betatim.
Switch to newer version of jupyter-server-proxy to fix websocket handling in PR #646 by @betatim.
Ensure ENTRYPOINT is an absolute path in PR #657 by @yuvipanda.
Fix handling of –build-memory-limit values without a postfix in PR #652 by @betatim.
Version 0.8.0¶
Release date: 2019-02-21
New features¶
Add additional metadata to docker images about how they were built PR #500 by @jrbourbeau.
Allow users to install global NPM packages: PR #573 by @GladysNalvarte.
Add documentation on switching the user interface presented by a container. PR #568 by user:choldgraf.
Increased test coverage to ~87% by @betatim and @yuvipanda.
Documentation improvements and additions by @lheagy, @choldgraf.
Remove f-strings from code base, repo2docker is compatible with Python 3.4+ again by @jrbourbeau in PR #520.
Local caching of previously built repostories to speed up launch times by @betatim in PR #511.
Make destination of repository content in the container image configurable on the CLI via
--target-repo-dir
. By @yuvipanda in PR #507.Expose CPU limit settings for building and running containers. By @GladysNalvarte in PR #579.
Make Python 3.7 the default version. By @yuvipanda and @minrk in PR #539.
API changes¶
Bug fixes¶
In some cases the version of conda installed in images was not pinned and got upgraded by user actions. Fixed in PR #576 by @minrk.
Fix an error related to checking if debug output was enabled or not: PR #575 by @yuvipanda.
Update nteract frontend to version 2.0.0 by @yuvipanda in PR #571.
Fix quoting issue in
GIT_CREDENTIAL_ENV
environment variable by @minrk in PR #572.Change to using the first 8 characters of each Git commit, not the last 8, to tag each built docker image of repo2docker itself. @minrk in PR #562.
Allow users to select the Julia when using a
requirements.txt
by @yuvipanda in PR #557.Set
JULIA_DEPOT_PATH
to install packages outside the home directory by @yuvipanda in PR #555.
Version 0.7.0¶
Release date: 2018-12-12
New features¶
Build from sub-directory: build the image based on a sub-directory of a repository PR #413 by @dsludwig.
Editable mode: allows editing a local repository from a live container PR #421 by @evertrol.
Documentation: improved the documentation for contributors PR #453 by @choldgraf.
Buildpack: added support for the nix package manager PR #407 by @costrouc.
Log a ‘success’ message when push is complete PR #482 by @yuvipanda.
Allow specifying images to reuse cache from PR #478 by @yuvipanda.
Add JupyterHub back to base environment PR #476 by @yuvipanda.
Repo2docker has a logo! by @agahkarakuzu and @blairhudson.
Improve support for Stencila, including identifying stencila runtime from document context PR #457 by @nuest.
Version 0.6¶
Released 2018-09-09
Version 0.5¶
Released 2018-02-07
Version 0.4.1¶
Released 2018-09-06
Version 0.2¶
Released 2018-05-25
Version 0.1.1¶
Released 2017-04-19
Version 0.1¶
Released 2017-04-14