jupyter-repo2docker is a tool to build, run, and push Docker images from source code repositories.
repo2docker fetches a repository (from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory) and builds a container image in which the code can be executed. The image build process is based on the configuration files found in the repository.
repo2docker can be used to explore a repository locally by building and executing the constructed image of the repository, or as a means of building images that are pushed to a Docker registry.
repo2docker is the tool used by BinderHub to build images on demand.
Please report bugs, ask questions, or contribute to the project.
Instructions and information on how to get started with repo2docker on your own machine. Select from the pages listed below to begin.
repo2docker requires Python 3.6 or above on Linux and macOS. See below for more information about Windows support.
Install Docker, as it is required to build Docker images. The Community Edition is available for free.
Recent versions of Docker are recommended. The latest version of Docker, 18.03, successfully builds repositories from binder-examples. The BinderHub helm chart uses version 17.11.0-ce-dind. See the helm chart for more details.
For Mercurial repositories, Mercurial and hg-evolve need to be installed. For example, on Debian based distributions, one can do:
sudo apt install mercurial
$(hg debuginstall --template "{pythonexe}") -m pip install hg-evolve --user
To install Mercurial on other systems, see here.
Note that for old Mercurial versions, you may need to specify a version for hg-evolve. For example, hg-evolve==9.2 for hg 4.5 (which is installed with apt on Ubuntu 18.04).
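Combining the command above with such a pin, the install step would look like:

$(hg debuginstall --template "{pythonexe}") -m pip install hg-evolve==9.2 --user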
We recommend installing repo2docker with the pip tool:
python3 -m pip install jupyter-repo2docker
for the latest release. To install the most recent code from the upstream repository, run:
python3 -m pip install https://github.com/jupyterhub/repo2docker/archive/master.zip
For information on using repo2docker, see Using repo2docker.
Alternatively, you can install repo2docker from a local source tree, e.g. in case you are contributing back to this project:
git clone https://github.com/jupyterhub/repo2docker.git
cd repo2docker
python3 -m pip install -e .
That’s it! For information on using repo2docker, see Using repo2docker.
Windows support for repo2docker is still in the experimental stage.
An article about using Windows and the WSL (Windows Subsystem for Linux, or Bash on Windows) provides additional information about Windows and Docker.
Note
Docker must be running in order to run repo2docker. For more information on installing repo2docker, see Installing repo2docker.
repo2docker can build a reproducible computational environment for any repository that follows The Reproducible Execution Environment Specification. repo2docker is called with the URL of a Git repository, a DOI from Zenodo or Figshare, a Handle or DOI from a Dataverse installation, or a path to a local directory.
It then performs these steps:
1. Inspects the repository for configuration files.
2. Builds a Docker image based on those configuration files.
3. Runs the image to let you explore the repository interactively, or pushes it to a Docker registry so it can be accessed remotely.
repo2docker is called with this command:
jupyter-repo2docker <source-repository>
where <source-repository> is a URL of a Git repository (https://github.com/binder-examples/requirements), a Zenodo DOI (10.5281/zenodo.1211089), or a path to a local directory (a/local/directory) of the source repository you want to build.
For example, the following command will build an image of Peter Norvig’s Pytudes repository:
jupyter-repo2docker https://github.com/norvig/pytudes
Building the image may take a few minutes.
Pytudes uses a requirements.txt file to specify its Python environment. Because of this, repo2docker will use pip to install the dependencies listed in this requirements.txt file, and these will be present in the generated Docker image. To learn more about configuration files in repo2docker, visit Configuration Files.
When the image is built, a message will be output to your terminal:
Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0
Pasting the URL into your browser will open Jupyter Notebook with the dependencies and contents of the source repository in the built image.
To build a particular branch and commit, use the --ref argument and specify the branch name or commit hash. For example:
jupyter-repo2docker --ref 9ced85dd9a84859d0767369e58f33912a214a3cf https://github.com/norvig/pytudes
Tip
For reproducible builds, we recommend specifying a commit hash to deterministically build a fixed version of a repository. Not specifying a commit hash will result in the latest commit of the repository being built.
repo2docker will look for configuration files in:
binder/
.binder/
Having both binder/ and .binder/ folders is not allowed. If one of these folders exists, only configuration files in that folder are considered; configuration files in the root directory will be ignored.
Check the complete list of configuration files supported by repo2docker to see how to configure the build process.
repo2docker builds an environment with Python 3.7 by default. If you’d like a different version, you can specify this in your configuration files.
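For example, a runtime.txt containing the single line below would select Python 3.6 instead (the same mechanism is described later under Configuration Files):

python-3.6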
--debug and --no-build
To debug the docker image being built, pass the --debug parameter:
jupyter-repo2docker --debug https://github.com/norvig/pytudes
This will print the generated Dockerfile, build it, and run it.
To see the generated Dockerfile without actually building it, pass --no-build on the command line. This Dockerfile output is for debugging purposes of repo2docker only; it cannot be used by docker directly.
jupyter-repo2docker --no-build --debug https://github.com/norvig/pytudes
Fetch a repository and build a container image
usage: jupyter-repo2docker [-h] [--config CONFIG] [--json-logs] [--image-name IMAGE_NAME] [--ref REF] [--debug] [--no-build] [--build-memory-limit BUILD_MEMORY_LIMIT] [--no-run] [--publish PORTS] [--publish-all] [--no-clean] [--push] [--volume VOLUMES] [--user-id USER_ID] [--user-name USER_NAME] [--env ENVIRONMENT] [--editable] [--target-repo-dir TARGET_REPO_DIR] [--appendix APPENDIX] [--subdir SUBDIR] [--version] [--cache-from CACHE_FROM] repo ...
repo
Path to repository that should be built. Could be a local path or a git URL.
cmd
Custom command to run after building the container.
-h, --help
Show this help message and exit.
--config <config>
Path to config file for repo2docker.
--json-logs
Emit JSON logs instead of human-readable logs.
--image-name <image_name>
Name of image to be built. If unspecified, it will be autogenerated.
--ref <ref>
Reference to build instead of the default reference. For example, a branch name or commit for a Git repository.
--debug
Turn on debug logging.
--no-build
Do not actually build the image. Useful in conjunction with --debug.
--build-memory-limit <build_memory_limit>
Total memory that can be used by the docker build process.
--no-run
Do not run the container after it has been built.
--publish <ports>, -p <ports>
Specify port mappings for the image. Needs a command to run in the container.
--publish-all, -P
Publish all exposed ports to random host ports.
--no-clean
Don't clean up remote checkouts after we are done.
--push
Push docker image to repository.
--volume <volumes>, -v <volumes>
Volumes to mount inside the container, in the form src:dest.
--user-id <user_id>
User ID of the primary user in the image.
--user-name <user_name>
Username of the primary user in the image.
--env <environment>, -e <environment>
Environment variables to define at container run time.
--editable, -E
Use the local repository in edit mode.
--target-repo-dir <target_repo_dir>
Path inside the image where the contents of the repository are copied to, and where all the build operations (such as postBuild) happen. Defaults to ${HOME} if not set.
--appendix <appendix>
Appendix of Dockerfile commands to run at the end of the build. Can be used to customize the resulting image after all standard build steps finish.
--subdir <subdir>
Subdirectory of the git repository to examine. Defaults to ''.
--version
Print the repo2docker version and exit.
--cache-from <cache_from>
List of images to try and re-use cached image layers from. Docker only tries to re-use image layers from images built locally, not pulled from a registry; the cache_from parameter explicitly asks docker to re-use layers from non-locally built images as well.
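As an illustration of combining these flags, the following sketch builds a pinned commit into a named image and pushes it without running it (the image name is a made-up example):

jupyter-repo2docker --ref 9ced85dd9a84859d0767369e58f33912a214a3cf --image-name example/pytudes:v1 --no-run --push https://github.com/norvig/pytudes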
A collection of frequently asked questions with answers. If you have a question and have found an answer, send a PR to add it here!
You can specify a Python version in the environment.yml file of a repository, or in a runtime.txt file if you are using requirements.txt instead of environment.yml.
Repo2docker officially supports the following versions of Python (specified in your environment.yml or runtime.txt file): 3.7, 3.6, 3.5, and 2.7.
Additional versions may work, as long as the base environment can be installed for your version of Python. The most likely source of incompatibility is if one of the packages in the base environment is not packaged for your Python, either because the version of the package is too new and your chosen Python is too old, or vice versa.
If Python 2.7 is specified, a separate environment for the kernel will be installed with Python 2. The notebook server will run in the default Python 3.7 environment.
All Julia versions since Julia 0.7.0 are supported via a Project.toml file, and this is the recommended way to install Julia environments. Julia versions 0.6.x and earlier are supported via a REQUIRE file.
The default version of R is currently R 3.6.1. You can select the version of R you want to use by specifying it in the runtime.txt file.
We support R versions 3.4, 3.5 and 3.6.
ResolvePackageNotFound
If you used conda env export to generate your environment.yml, it will contain a list of packages pinned to platform-specific versions. These very specific versions are not available in the Linux docker image used by repo2docker. A typical error message will look like the following:
Step 39/44 : RUN conda env update -n root -f "environment.yml" && conda clean -tipsy && conda list -n root
 ---> Running in ebe9a67762e4
Solving environment: ...working... failed
ResolvePackageNotFound:
  - jsonschema==2.6.0=py36hb385e00_0
  - libedit==3.1.20181209=hb402a30_0
  - tornado==5.1.1=py36h1de35cc_0
  ...
We recommend using conda env export --no-builds -f environment.yml to export your environment, and then editing the file by hand to remove platform-specific packages like appnope.
See How to automatically create an environment.yml that works with repo2docker for a recipe on how to create strict exports of your environment that will work with repo2docker.
Yes! Using a postBuild file (see postBuild - Run code after installing the environment), you can place any files that should be called from the command line in the folder ~/.local/. This folder will be available in a user's PATH, and files in it can be run from the command line (or as a subsequent build step).
To configure environment variables for all users of a repository use the start configuration file.
When running repo2docker locally you can use the -e or --env command-line flag for each variable that you want to define.
For example:
jupyter-repo2docker -e VAR1=val1 -e VAR2=val2 ...
No, you can’t.
If you pass the --debug flag to repo2docker, it outputs the intermediate Dockerfile that is used to build the docker image. While it is tempting to copy this as a base for your own Dockerfile, that is not supported & in most cases will not work. The --debug output is just our intermediate generated Dockerfile, and is meant to be built in a very specific way. Hence the output of --debug cannot be built with a normal docker build -t . or similar traditional docker command.
Check out the binder-examples GitHub organization for example repositories you can copy & modify for your own use!
Yes: use the --editable or -E flag (don’t confuse this with the -e flag for environment variables), and run repo2docker on a local repository:
repo2docker -E my-repository/
This builds a Docker container from the files in that repository (using, for example, a requirements.txt or install.R file), then runs that container, while connecting the working directory inside the container to the local repository outside the container. For example, if there is a notebook file (.ipynb), it will open in a local web browser, and you can edit it and save it. The resulting notebook is updated in both the Docker container and the local repository. Once the container exits, the changed file will still be in the local repository.
This allows for easy testing of the container while debugging some items, as well as using a fully customizable container to edit notebooks (among others).
Editable mode is a convenience option that will bind the repository to the container working directory (usually $HOME). If you need to mount to a different location in the container, use the --volume option instead. Similarly, with a fully customized user Dockerfile, this option is not guaranteed to work.
If you are trying to run an R Shiny app using the /shiny/folder_containing_shiny URL option, but the launch returns "The application exited during initialization.", there might be something wrong with the specification of the app. One way of debugging the app in the container is to open the rstudio URL, open either the ui or server file for the app, and run the app in the container's RStudio. This way you can see the RStudio logs as it tries to initialise the Shiny app. If you are missing a package or other dependency for the container, this will be obvious at this stage.
The Jupyter community believes strongly in building on top of pre-existing tools whenever possible (this is why repo2docker buildpacks largely build off of patterns that already exist in the data analytics community). We try to perform due diligence and search for other communities to leverage and help, but sometimes it makes the most sense to build our own new tool. In the case of repo2docker, we spent time integrating with a pre-existing tool called source2image. This is an excellent open tool for containerization, but we ultimately decided that it did not fit the use case we wanted to address. For more information, here is a short blog post about the decision and the reasoning behind it.
Short, actionable guides that cover specific topics with repo2docker. Select from the pages listed below to get started.
You can build several user interfaces into the resulting Docker image. This is controlled with various configuration files.
You do not need any extra configuration in order to allow the use of the JupyterLab interface. You can launch JupyterLab from within a user session by opening the Jupyter Notebook and appending /lab to the end of the URL like so:
http(s)://<server:port>/lab
To switch back to the classic notebook, add /tree to the URL like so:
http(s)://<server:port>/tree
For example, the following Binder URL will open the pyTudes repository and begin a JupyterLab session in the ipynb folder:
https://mybinder.org/v2/gh/norvig/pytudes/master?urlpath=lab/tree/ipynb
The /tree/ipynb above is how JupyterLab directs you to a specific file or folder.
To learn more about URLs in JupyterLab and Jupyter Notebook, visit starting JupyterLab.
nteract is a notebook interface built with React. It is like a more feature-filled version of the traditional Jupyter Notebook interface.
nteract comes pre-installed in any session that has been built from a Python repository.
You can launch nteract from within a user session by replacing /tree with /nteract at the end of a notebook server’s URL like so:
http(s)://<server:port>/nteract
For example, the following Binder URL will open the pyTudes repository and begin an nteract session in the ipynb folder:
https://mybinder.org/v2/gh/norvig/pytudes/master?urlpath=nteract/tree/ipynb
The /tree/ipynb above is how nteract directs you to a specific file or folder.
To learn more about nteract, visit the nteract website.
The RStudio user interface is automatically enabled if a configuration file for R is detected (i.e. an R version specified in runtime.txt). If this is detected, RStudio will be accessible by appending /rstudio to the URL, like so:
http(s)://<server:port>/rstudio
For example, the following Binder link will open an RStudio session in the R demo repository.
http://mybinder.org/v2/gh/binder-examples/r/master?urlpath=rstudio
Shiny lets you create interactive visualizations with R. Shiny is automatically enabled if a configuration file for R is detected (i.e. an R version specified in runtime.txt). If this is detected, Shiny will be accessible by appending /shiny/<folder-w-shiny-files> to the URL, like so:
http(s)://<server:port>/shiny/bus-dashboard
This assumes that a folder called bus-dashboard exists in the root of the repository, and that it contains all of the files needed to run a Shiny app.
For example, the following Binder link will open a Shiny session in the R demo repository.
http://mybinder.org/v2/gh/binder-examples/r/master?urlpath=shiny/bus-dashboard/
Stencila support has been removed due to changes in Stencila that made it incompatible. Please get in touch if you would like to help restore Stencila support.
You can define many different languages in your configuration files. This page describes how to use some of the more common ones.
Your environment will have Python (and specified dependencies) installed when you use one of the following configuration files:
By default, the environment will have Python 3.7.
Changed in version 0.8: Upgraded default Python from 3.6 to 3.7.
To specify a specific version of Python, you have two options:
Use environment.yml. Conda environments let you define the Python version in environment.yml. To do so, add python=X.X to your dependencies section, like so:
name: python 2.7
dependencies:
  - python=2.7
  - numpy
Use runtime.txt with requirements.txt. If you are using requirements.txt instead of environment.yml, you can specify the Python runtime version in a separate file called runtime.txt. This file contains a single line of the following form:
python-X.X
For example:
python-3.6
To ensure that R is installed, you must specify a version of R in a runtime.txt file. This takes the following form:
r-YYYY-MM-DD
The date corresponds to the state of the MRAN repository on that day. Make sure that you choose a day with the desired version of your packages. For example, to use the MRAN repository as of January 1st, 2018, add this line to runtime.txt:
r-2018-01-01
Note that to install specific packages with the R environment, you should use the install.R configuration file.
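A minimal install.R is simply a series of R install commands; the packages named here are illustrative:

install.packages("ggplot2")
install.packages(c("dplyr", "shiny"))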
To build an environment with Julia, include a configuration file called Project.toml. The format of this file is documented at the Julia Pkg.jl documentation. To specify a specific version of Julia to install, put a Julia version in the [compat] section of the Project.toml file, as described here: https://julialang.github.io/Pkg.jl/v1/compatibility/.
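As a sketch, a Project.toml pinning the Julia version could contain the following (the version shown is illustrative):

[deps]
# PackageName = "UUID" entries for your dependencies go here

[compat]
julia = "1.3"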
If a language is not “officially” supported by a build pack, it can often be installed with a postBuild script. This will run arbitrary bash commands, and can be used to download / install a language.
It may also be possible to combine multiple languages in a single environment. The details on how to accomplish this with all possible combinations are outside the scope of this guide. However we recommend that you take a look at the Multi-Language Demo repository for some inspiration.
This how-to explains how to create an environment.yml that specifies all installed packages and their precise versions from your environment.
conda env export -f environment.yml creates a strict export of all packages. This is the most robust for reproducibility, but it does bake in potential platform-specific packages, so you can only use an exported environment on the same platform.
repo2docker uses a Linux-based image as the starting point for every docker image it creates. However, a lot of people use OSX or Windows as their day-to-day operating system. This means that an environment.yml created by a strict export will not work: you will see error messages saying that certain packages cannot be resolved (ResolvePackageNotFound).
To get a minimal environment.yml that only contains the packages you explicitly installed, run conda env export --from-history -f environment.yml. We recommend that you use this option to create your environment.yml. The resulting environment.yml then contains a loose pinning of the versions used, e.g. pandas=0.25 if you explicitly requested this pandas version on installation. If you didn't list a version constraint during installation, it will also not be listed in your environment.yml.
While this approach doesn't lead to perfect reproducibility, the file will contain just the same packages as if you recreated the environment with the same commands again today.
Follow this procedure to create a strict export of your environment that will work with repo2docker and sites like mybinder.org.
We will launch a terminal inside a basic docker image, install the packages you need, and then perform a strict export of the environment:
1. Run: repo2docker https://github.com/binder-examples/conda-freeze
2. Open the URL printed in the terminal (it will look like http://127.0.0.1:61037/?token=30e61ec80bda6dd0d14805ea76bb59e7b0cd78b5d6b436f0) and start a terminal in the Jupyter session.
3. Install the packages you need: conda install <yourpackages>
4. Perform the strict export: conda env export -n root
This will give you a strict export of your environment that precisely pins the versions of packages, based on a Linux environment.
JupyterLab uses workspaces to save the current state of windows, settings, and documents that are open in a JupyterLab session. It is a way to persist the general configuration over time.
It is possible to export JupyterLab workspaces and load them in to another JupyterLab installation in order to share a workspace with someone else.
In order to package your workspace with a repository, we recommend following the steps in this example repository:
https://github.com/ian-r-rose/binder-workspace-demo/
JupyterHub allows multiple users to collaborate on a shared Jupyter server. repo2docker can build Docker images that can be shared within a JupyterHub deployment. For example, mybinder.org uses JupyterHub and repo2docker to allow anyone to build a Docker image of a git repository online and share an executable version of the repository with a URL to the built image.
To build JupyterHub-ready Docker images with repo2docker, the version of your JupyterHub deployment must be included in the environment.yml or requirements.txt of the git repositories you build.
If your instance of JupyterHub uses DockerSpawner, you will need to set its command to run jupyterhub-singleuser by adding this line in your configuration file:
c.DockerSpawner.cmd = ['jupyterhub-singleuser']
We've created the continuous-build repository so that you can push a Docker container to Docker Hub directly from a GitHub repository that has a Jupyter notebook. Here are the instructions to do this.
Today you will be doing the following:
1. Fork and clone the continuous-build GitHub repository to obtain the hidden .circleci folder.
2. Create an image repository on Docker Hub.
3. Connect your repository to CircleCI.
4. Push, commit, or create a pull request to trigger a build.
You don't need to install any dependencies on your host to build the container; the build happens on a continuous integration server, and the built container is available for you to pull from Docker Hub.
First, fork the continuous-build GitHub repository to your account, and clone the branch via either:
git clone https://www.github.com/<username>/continuous-build
or
git clone git@github.com:<username>/continuous-build.git
The hidden file .circleci/config.yml has instructions for CircleCI to automatically discover and build your repo2docker Jupyter notebook container. The default template provided in this folder performs the most basic steps: cloning your repository, building the container, and pushing it to Docker Hub.
This repository aims to provide templates for your use. If you have a request for a new template, please let us know. We will add templates as they are requested, for additional tasks like testing containers, running nbconvert, etc.
Thus, if I have a repository named myrepo and I want to use the default configuration on CircleCI, I would copy it there from the continuous-build folder. In the example below, I'm creating a new folder called "myrepo" and then copying the entire folder there:
mkdir -p myrepo
cp -R continuous-build/.circleci myrepo/
You would then logically create a GitHub repository in the “myrepo” folder, add the circleci configuration folder, and continue on to the next steps.
cd myrepo
git init
git add .circleci
Go to Docker Hub, log in, and click the big blue button that says "create repository" (not an automated build). Choose an organization and name that you like (in the traditional format <ORG>/<NAME>), and remember it! We will be adding it, along with your Docker credentials, as encrypted CircleCI environment variables.
If you navigate to the main app page you should be able to click “Add Projects” and then select your repository. If you don’t see it on the list, then select a different organization in the top left. Once you find the repository, you can click the button to “Start Building” and accept the defaults.
Before you push or trigger a build, let's set up the following environment variables. In the project interface on CircleCI, click the gears icon next to the project name to get to your project settings. Under settings, click on the "Environment Variables" tab. In this section, you want to define the following:
CONTAINER_NAME
DOCKER_TAG
DOCKER_USER
DOCKER_PASS
REPO_NAME
If you don't define CONTAINER_NAME, it will default to the repository the build runs from, which you should only rely on if the Docker Hub repository is named equivalently. If you don't define the Docker credential variables (DOCKER_USER and DOCKER_PASS), your image will build but not be pushed to Docker Hub. Finally, if you don't define REPO_NAME, it will again use the name of the repository defined for CONTAINER_NAME.
Once the environment variables are set up, you can push or issue a pull request to see CircleCI build the workflow. Remember that you only need the .circleci/config.yml and not any other files in the repository. If your notebook is hosted in the same repository, you might want to add these, along with your requirements.txt, etc.
By default, new builds on CircleCI will not run for pull requests; you can change this default in the settings. You can easily add filters (or other criteria and actions) to be performed during or after the build by editing the .circleci/config.yml file in your repository.
You should then be able to pull your new container, and run it! Here is an example:
docker pull <ORG>/<NAME>
docker run -it --name repo2docker -p 8888:8888 <ORG>/<NAME> jupyter notebook --ip 0.0.0.0
For a pre-built working example, try the following:
docker pull vanessa/repo2docker
docker run -it --name repo2docker -p 8888:8888 vanessa/repo2docker jupyter notebook --ip 0.0.0.0
You can then enter the URL and token provided in the browser to access your notebook. When you are done and need to stop and remove the container:
docker stop repo2docker
docker rm repo2docker
Information about configuring your repository to work with repo2docker, and controlling elements of the built environment using configuration files.
For information on where to put your configuration files see Where to put configuration files.
repo2docker looks for configuration files in the repository being built to determine how to build it. In general, repo2docker uses the same configuration files as other software installation tools, rather than creating new custom configuration files.
A number of repo2docker configuration files can be combined to compose more complex setups.
The binder examples organization on GitHub contains a list of sample repositories for common configurations that repo2docker can build with various configuration files such as Python and R installation in a repository.
A list of supported configuration files (roughly in the order of build priority) can be found on this page.
environment.yml is the standard configuration file used by conda that lets you install any kind of package, including Python, R, and C/C++ packages. repo2docker does not use your environment.yml to create and activate a new conda environment. Rather, it updates a base conda environment defined here with the packages listed in your environment.yml. This means that the environment will always have the same default name, not the name specified in your environment.yml.
You can install files from pip in your environment.yml as well. For example, see the binder-examples environment.yml file.
You can also specify which Python version to install in your built environment with environment.yml. By default, repo2docker installs Python 3.7 with your environment.yml unless you include the version of Python in this file. conda supports all versions of Python, though repo2docker support is best with Python 3.7, 3.6, 3.5 and 2.7.
Warning
If you include a Python version in a runtime.txt file in addition to your environment.yml, your runtime.txt will be ignored.
Pipfile and/or Pipfile.lock
pipenv allows you to manage Python dependencies in a virtual environment. When using pipenv, you end up with Pipfile and Pipfile.lock files. The lock file contains explicit details about the packages that have been installed that met the criteria within the Pipfile.
If both Pipfile and Pipfile.lock are found by repo2docker, the former will be ignored in favor of the lock file. Also note that these files distinguish packages and development packages and that repo2docker will install both kinds.
This specifies a list of Python packages that should be installed in your environment. Our requirements.txt example on GitHub shows a typical requirements file.
setup.py
To install your repository like a Python package, you may include a setup.py file. repo2docker installs setup.py files by running pip install -e ..
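A minimal setup.py sketch (the project name and version are illustrative):

from setuptools import setup, find_packages

setup(
    name="myrepo",          # illustrative project name
    version="0.1",
    packages=find_packages(),
)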
A Project.toml (or JuliaProject.toml) file can specify both the version of Julia to be used and a list of Julia packages to be installed. If a Manifest.toml is present, it will determine the exact versions of the Julia packages that are installed.
REQUIRE
A REQUIRE file can specify both the version of Julia to be used and which Julia packages should be used. The use of REQUIRE is only recommended for pre-1.0 Julia versions. The recommended way of installing a Julia environment that uses Julia 1.0 or newer is to use a Project.toml file. If both a REQUIRE and a Project.toml file are detected, the REQUIRE file is ignored. To see an example of a Julia repository with REQUIRE and environment.yml, visit binder-examples/julia-python.
This is used to install R libraries pinned to a specific snapshot on MRAN. To set the date of the snapshot, add a runtime.txt. For an example install.R file, visit our example install.R file.
apt.txt
A list of Debian packages that should be installed. The base image used is usually the latest released version of Ubuntu.
We use apt.txt, for example, to install LaTeX in our example apt.txt for LaTeX.
DESCRIPTION
To install your repository like an R package, you may include a DESCRIPTION file. repo2docker installs the package and dependencies from the DESCRIPTION by running devtools::install_git(".").
You also need to have a runtime.txt file that is formatted as r-<YYYY>-<MM>-<DD>, where YYYY-MM-DD is a snapshot of MRAN that will be used for your R installation.
postBuild
A script that can contain arbitrary commands to be run after the whole repository has been built. If you want this to be a shell script, make sure the first line is #!/bin/bash.
Note that by default the build will not be stopped if an error occurs inside a shell script. You should include set -e or the equivalent at the start of the script to avoid errors being silently ignored.
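A minimal postBuild sketch that follows these recommendations (the installed package is illustrative):

#!/bin/bash
set -e
# arbitrary commands can go here; for example, install an extra tool
pip install --no-cache-dir nbgitpuller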
An example use case of a postBuild file is JupyterLab's demo on mybinder.org. It uses a postBuild file in a folder called binder to prepare the demo for Binder.
start
A script that can contain simple commands to be run at runtime (as an ENTRYPOINT to the docker container). If you want this to be a shell script, make sure the first line is #!/bin/bash. The last line must be exec "$@" or equivalent.
Use this to set environment variables that software installed in your container expects to be set. This script is executed each time your binder is started and should at most take a few seconds to run.
If you only need to run things once during the build phase use postBuild - Run code after installing the environment.
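A minimal start sketch (the variable name and value are illustrative):

#!/bin/bash
# environment variables exported here will be visible in the user's session
export EXAMPLE_DATA_DIR="${HOME}/data"
exec "$@"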
Sometimes you want to specify the version of the runtime (e.g. the version of Python or R), but the environment specification format will not let you specify this information (e.g. requirements.txt or install.R). For these cases, we have a special file, runtime.txt.
runtime.txt is only supported when used with environment specifications that do not already support specifying the runtime (when using environment.yml for conda or Project.toml for Julia, runtime.txt will be ignored).
Have python-x.y in runtime.txt to run the repository with Python version x.y. See our Python2 example repository.
Have r-<RVERSION>-<YYYY>-<MM>-<DD> in runtime.txt to run the repository with R version RVERSION and libraries from a YYYY-MM-DD snapshot of MRAN. RVERSION can be set to 3.4, 3.5, 3.6, or to patch releases for the 3.5 and 3.6 series. If you do not specify a version, the latest release will be used (currently R 3.6). See our R example repository.
default.nix
Specify packages to be installed by the nix package manager. When you use this config file, all other configuration files (like requirements.txt) that specify packages are ignored. When using nix you have to specify all packages and dependencies explicitly, including the Jupyter notebook package that repo2docker expects to be installed. If you do not install Jupyter explicitly, repo2docker will not be able to start your container.
nix-shell is used to evaluate a nix expression written in a default.nix file. Make sure to pin your nixpkgs to produce a reproducible environment.
To see an example repository visit nix binder example.
In the majority of cases, providing your own Dockerfile is not necessary as the base images provide core functionality, compact image sizes, and efficient builds. We recommend trying the other configuration files before deciding to use your own Dockerfile.
With Dockerfiles, a regular Docker build will be performed.
If a Dockerfile is present, all other configuration files will be ignored.
See the Advanced Binder Documentation for best-practices with Dockerfiles.
repo2docker scans a repository for particular Configuration Files, such as requirements.txt or REQUIRE. The collection of files, their contents, and the resulting actions that repo2docker takes is known as the Reproducible Execution Environment Specification (or REES).
The goal of the REES is to automate and encourage existing community best practices for reproducible computational environments. This includes installing packages using community-standard specification files and their corresponding tools, such as requirements.txt (with pip), REQUIRE (with Julia), or apt.txt (with apt). While repo2docker automates the creation of the environment, a human should be able to look at a REES-compliant repository and reproduce the environment using common, clear steps without repo2docker software.
Currently, the definition of the REE Specification is the following:
Any directory containing zero or more files from the Configuration Files list is a valid reproducible execution environment as defined by the REES. The configuration files have to all be placed either in the root of the directory, in a binder/ sub-directory or a .binder/ sub-directory.
For example, the REES recognises requirements.txt as a valid config file. The file format is as defined by the requirements.txt standard of the Python community. A REES-compliant tool will install a Python interpreter (of unspecified version) and perform the equivalent action of pip install -r requirements.txt so that the user can afterwards run python and use the packages installed.
The repo2docker community is welcoming of all kinds of help and participation from others. Below are a few ways that you can get involved, as well as resources for understanding the structure and design of the repo2docker package.
Thank you for thinking about contributing to repo2docker! This is an open source project that is developed and maintained entirely by volunteers. Your contribution is integral to the future of the project. THANK YOU!
There are many ways to contribute to repo2docker:
If you're not sure where to get started, then please come and say hello in our Gitter channel, or open a discussion thread at the Jupyter discourse forum.
This outlines the process for getting changes to the repo2docker project merged.
Identify the correct issue template: bug report or feature request.
Bug reports (examples, new issue) will ask you for a description of the problem, the expected behaviour, the actual behaviour, how to reproduce the problem, and your personal setup. Bugs can include problems with the documentation, or code not running as expected.
It is really important that you make it easy for the maintainers to reproduce the problem you’re having. This guide on creating a minimal, complete and verifiable example is a great place to start.
Feature requests (examples, new issue) will ask you for the proposed change, any alternatives that you have considered, a description of who would use this feature, and a best guess of how much work it will take and what skills are required to accomplish it.
Very easy feature requests might be updates to the documentation to clarify steps for new users. Harder feature requests may be to add new functionality to the project and will need more in depth discussion about who can complete and maintain the work.
Feature requests are a great opportunity for you to advocate for the use case you're suggesting. They help others understand how much effort it would be to integrate the work, and, if you're successful at convincing them that this effort is worth it, make it more likely that they choose to work on it with you.
Open an issue. Getting consensus with the community is a great way to save time later.
Make edits in your fork of the repo2docker repository.
Make a pull request. Read the next section for guidelines for both reviewers and contributors on merging a PR.
Wait for a community member to merge your changes. Remember that someone else must merge your pull request. That goes for new contributors and long term maintainers alike. Because master is continuously deployed to mybinder.org it is essential that master is always in a deployable state.
(optional) Deploy a new version of repo2docker to mybinder.org by following these steps
These are not hard rules to be enforced by 🚓 but they are suggestions written by the repo2docker maintainers to help complete your contribution as smoothly as possible for both you and for them.
[WIP]
git log --merges --pretty=format:"%h %<(10,trunc)%an %<(15)%ar %s" <deployed-revision>..
[MRG]
To develop & test repo2docker locally, you need Python (version 3.6 or above), Git, and Docker.
First, you need to get a copy of the repo2docker git repository on your local disk. Fork the repository on GitHub, then clone it to your computer:
git clone https://github.com/<your-username>/repo2docker
This will clone repo2docker into a directory called repo2docker. You can make that your current directory with cd repo2docker.
After cloning the repository, you should set up an isolated environment to install libraries required for running / developing repo2docker.
There are many ways to do this but here we present you with two approaches: virtual environment or pipenv.
virtual environment
python3 -m venv .
source bin/activate
pip3 install -e .
pip3 install -r dev-requirements.txt
pip3 install -r docs/doc-requirements.txt
pip3 install black
This should install all the libraries required for testing & running repo2docker!
Note that you will need to install pipenv first using pip3 install pipenv. Then from the root directory of this project you can use the following commands:
pipenv install --dev
This should install both the dev and docs requirements at once!
We use black as a code formatter to get a consistent layout for all the code in this project. This makes reading the code easier.
To format your code run black . in the top-level directory of this repository. Many editors have plugins that will automatically apply black as you edit files.
We also have a pre-commit hook set up that will check that code is formatted according to black's style guide. You can activate it with pre-commit install.
As part of our continuous integration tests we will check that code is formatted properly and the tests will fail if this is not the case.
If you do not already have Docker, you should be able to download and install it for your operating system using the links from the official website. After you have installed it, you can verify that it is working by running the following commands:
docker version
It should output something like:
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:45 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:41:24 2017
 OS/Arch:      linux/amd64
 Experimental: false
Then you are good to go!
If you only changed the documentation, you can also build the documentation locally using Sphinx.
pip install -r docs/doc-requirements.txt
cd docs/
make html
Then open the file docs/build/html/index.html in your browser.
This roadmap collects "next steps" for repo2docker. It is about creating a shared understanding of the project's vision and direction amongst the community of users, contributors, and maintainers. The goal is to communicate priorities and upcoming release plans. It is not aimed at limiting contributions to what is listed here.
All of the community is encouraged to provide feedback as well as share new ideas with the community. Please do so by submitting an issue. If you want to have an informal conversation first use one of the other communication channels. After submitting the issue, others from the community will probably respond with questions or comments they have to clarify the issue. The maintainers will help identify what a good next step is for the issue.
When submitting an issue, think about what “next step” category best describes your issue:
The roadmap will get updated as time passes (next review by 31st January 2019) based on discussions and ideas captured as issues. This means this list is not exhaustive; it should only represent the "top of the stack" of ideas. It should not function as a wish list, collection of feature requests, or todo list. For those, please create a new issue.
The roadmap should give the reader an idea of what is happening next, what needs input and discussion before it can happen and what has been postponed.
Repo2docker is a dependable tool used by humans that reduces the complexity of creating the environment in which a piece of software can be executed.
The “Now” items are being actively worked on by the project:
/home/jovyan
The “Soon” items are being discussed/a plan of action is being made. Once an item reaches the point of an actionable plan and person who wants to work on it, the item will be moved to the “Now” section. Typically, these will be moved at a future review of the roadmap.
The “Later” items are things that are at the back of the project’s mind. At this time there is no active plan for an item. The project would like to find the resources and time to discuss and then execute these ideas.
repo2docker https://example.com/an-archive.zip
This is a living document talking about the architecture of repo2docker from various perspectives.
The buildpack concept comes from Heroku and Ruby on Rails’ Convention over Configuration doctrine.
Instead of the user specifying a complete specification of exactly how they want their environment to be, they can focus only on how their environment differs from a conventional environment. This means that instead of deciding 'should I get Python from apt or pyenv or somewhere else?', the user can just specify 'I want python-3.6'. Usually, specifying a runtime and a list of libraries with explicit versions is all that is needed.
In repo2docker, a Buildpack does the following things:
1. Detects whether it can handle a given repository.
2. Builds a base language environment that is mostly shared by repositories of the same type.
3. Copies the repository contents into the image.
4. Assembles the repository-specific environment.
5. Optionally pushes the built image to a registry and runs it.
When given a repository, repo2docker first has to determine which buildpack to use. It takes the following steps to determine this:
1. Loop through the ordered list of BuildPack classes configured in Repo2Docker.buildpacks.
2. Call the detect method of each BuildPack. This method inspects the contents of the repository (for example, looking for marker files such as manifest.xml) and returns True if the buildpack can build it.
3. Use the first buildpack whose detect method returns True.
4. If no buildpack matches, fall back to the default buildpack (Repo2Docker.default_buildpack).
Once a buildpack is chosen, it builds a base environment that is mostly the same for various repositories built with the same buildpack.
For example, in CondaBuildPack, the base environment consists of installing miniconda and basic notebook packages (from repo2docker/buildpacks/conda/environment.yml). This is going to be the same for most repositories built with CondaBuildPack, so we want to use docker layer caching as much as possible for performance reasons. Next time a repository is built with CondaBuildPack, we can skip straight to the copy step (since the base environment docker image layers have already been built and cached).
The get_build_scripts and get_build_script_files methods are primarily used for this. get_build_scripts can return arbitrary bash script lines that can be run as different users, and get_build_script_files is used to copy specific scripts (such as a conda installer) into the image to be run as part of get_build_scripts. Code in either runs before the repository contents are copied in, so it cannot depend on them (unlike get_assemble_scripts).
The contents of the repository are copied unconditionally into the Docker image, and made available for all further commands. This is common to most BuildPacks, and the code is in the build method of the BuildPack base class.
The assemble stage builds the specific environment that is requested by the repository. This usually means installing required libraries specified in a format native to the language (requirements.txt, environment.yml, REQUIRE, install.R, etc).
Most of this work is done in the get_assemble_scripts method. It can return arbitrary bash script lines that can be run as different users, and has access to the repository contents (unlike get_build_scripts). The docker image layers produced by this usually cannot be cached, so fewer restrictions apply to it than to get_build_scripts.
At the end of the assemble step, the docker image is ready to be used in various ways!
Optionally, repo2docker can push a built image to a docker registry. This is done as a convenience only (since you can do the same with a docker push after using repo2docker only to build), and implemented in Repo2Docker.push method. It is only activated if using the --push commandline flag.
Optionally, repo2docker can run the built image and allow the user to access the Jupyter Notebook running inside by default. This is also done as a convenience only (since you can do the same with docker run after using repo2docker only to build), and implemented in Repo2Docker.run. It is activated by default unless the --no-run commandline flag is passed.
ContentProviders provide a way for repo2docker to know how to find and retrieve a repository. They follow a similar pattern as the BuildPacks described above. When repo2docker is called, its main argument will be a path to a repository. This might be a local path or a URL. Upon being called, repo2docker will loop through all ContentProviders and perform the following commands:
Run the detect() method on the repository path given to repo2docker. This should return any value other than None if the path matches what the ContentProvider is looking for.
For example, the Local ContentProvider checks whether the argument is a valid local path. If so, detect() returns a dictionary {'path': source}, which defines the path to the repository. This path is used by fetch() to check that it matches the output directory.
If detect() returns something other than None, run fetch() with the returned value as its argument. This should result in the contents of the repository being placed locally to a folder.
For more information on ContentProviders, take a look at the ContentProvider base class which has more explanation.
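As a rough illustration of the detect/fetch pattern described above (a simplified sketch; this is not repo2docker's actual base class or its method signatures):

class ExampleContentProvider:
    """Illustrative content provider: detect() returns a spec or None,
    and fetch() places the repository contents locally."""

    def detect(self, source):
        # Return any non-None spec if we recognize this source
        if source.startswith("example://"):
            return {"path": source[len("example://"):]}
        return None

    def fetch(self, spec, output_dir):
        # Place the repository contents into output_dir
        import shutil
        shutil.copytree(spec["path"], output_dir, dirs_exist_ok=True)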
The repo2docker buildpacks are inspired by Heroku’s Build Packs. The philosophy for the repo2docker buildpacks includes:
.binder
When designing repo2docker and adding to it in the future, the developers are influenced by two primary use cases. The use cases for repo2docker which drive most design decisions are:
The core of repo2docker can be considered a deterministic algorithm. When given an input directory which has a particular repository checked out, it deterministically produces a Dockerfile based on the contents of the directory. So if we run repo2docker on the same directory multiple times, we get the exact same Dockerfile output.
This provides a few advantages:
Many ingredients go into making an image from a repository:
repo2docker controls the first two, the user controls the third one. The current policy for the version of the base image is that we will use the current LTS version Bionic Beaver (18.04) for the foreseeable future.
The version of repo2docker used to build an image can influence which packages are installed by default and which features are supported during the build process. We will periodically update those packages to keep step with releases of Jupyter Notebook, JupyterLab, etc. For packages that are installed by default but where you want to control the version we recommend you specify them explicitly in your dependencies.
repo2docker should do one thing, and do it well. This one thing is:
Given a repository, deterministically build a docker image from it.
There’s also some convenience code (to run the built image) for users, but that’s separated out cleanly. This allows easy use by other projects (like BinderHub).
There is additional (and very useful) design advice on this in the Art of Unix Programming which is a highly recommended quick read.
Although other projects, like s2i, exist to convert source to Docker images, repo2docker provides the additional functionality to support composable environments. We want to easily have an image with Python3+Julia+R-3.2 environments, rather than just one single language environment. While generally one language environment per container works well, in many scientific / data science computing environments you need multiple languages working together to get anything done. So all buildpacks are composable, and need to be able to work well with other languages.
Roughly speaking, we want to support 80% of use cases, and provide an escape hatch (raw Dockerfiles) for the other 20%. We explicitly want to provide support only for the most common use cases - covering every possible use case never ends well.
An easy process for getting support for more languages here is to demonstrate their value with Dockerfiles that other people can use, and then show that this pattern is popular enough to be included inside repo2docker. Remember that ‘yes’ is forever (very hard to remove features!), but ‘no’ is only temporary!
These are some common tasks to be done as a part of developing and maintaining repo2docker. If you’d like more guidance for how to do these things, reach out in the JupyterHub Gitter channel.
We have a lot of tests for various cases supported by repo2docker in the tests/ subdirectory. If you fix a bug or add new functionality consider adding a new test to prevent the bug from coming back. These use py.test.
You can run all the tests with:
py.test -s tests/*
If you want to run a specific test, you can do so with:
py.test -s tests/<path-to-test>
To skip the tests related to Mercurial repositories (to avoid installing Mercurial and hg-evolve), set the environment variable REPO2DOCKER_SKIP_HG_TESTS.
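For example (assuming any non-empty value enables the skip):

REPO2DOCKER_SKIP_HG_TESTS=1 py.test -s tests/*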
Some of the tests have non-python requirements for your development machine. They are:
git-lfs
git lfs install
git-lfs filter-process: git-lfs: command not found
No space left on device: '/home/...
This section covers the process by which repo2docker defines and updates the dependencies that are installed by default for several buildpacks.
For both the conda and virtualenv (pip) base environments in the Conda BuildPack and Python BuildPack, we install specific pinned versions of all dependencies. We explicitly list the dependencies we want, then freeze them at commit time to explicitly list all the transitive dependencies at current versions. This way, we know that all dependencies will have the exact same version installed at all times.
To update one of the dependencies shared across all repo2docker builds, you must follow these steps (with more detailed information in the sections below):
See the subsections below for more detailed instructions.
There are two files related to conda dependencies. Edit as needed.
repo2docker/buildpacks/conda/environment.yml
Contains the list of packages to install in Python 3 conda environments, which are the default. This is where all Notebook versions & notebook extensions (such as JupyterLab / nteract) go.
repo2docker/buildpacks/conda/environment.py-2.7.yml
Contains the list of packages to install in Python 2 conda environments, which can be specifically requested by users. This only needs IPyKernel and kernel-related libraries; Notebook / notebook extensions need not be installed here.
Once you edit either of these files to add a new package / bump version on an existing package, you should then run:
cd ./repo2docker/buildpacks/conda/
python freeze.py
This script will resolve dependencies and write them to the respective .frozen.yml files. You will need docker installed to run this script.
After the freeze script finishes, a number of files will have been created. Commit the following subset of files to git:
repo2docker/buildpacks/conda/environment.yml
repo2docker/buildpacks/conda/environment.frozen.yml
repo2docker/buildpacks/conda/environment.py-2.7.yml
repo2docker/buildpacks/conda/environment.py-2.7.frozen.yml
repo2docker/buildpacks/conda/environment.py-3.5.frozen.yml
repo2docker/buildpacks/conda/environment.py-3.6.frozen.yml
Make a pull request; see details below.
Once the pull request is approved (but not yet merged), update the change log (details below) and commit the change log, then update the pull request.
Once you’ve made the commit, please make a Pull Request to the jupyterhub/repo2docker repository, with a description of what versions were bumped / what new packages were added and why. If you fix a bug or add new functionality consider adding a new test to prevent the bug from coming back/the feature breaking in the future.
We make a release of whatever is on master every month. We use "calendar versioning". Monthly releases give users a predictable pattern for when releases are going to happen and prevent improvements or fixes from being locked up for long periods of time.
A new release will automatically be created when a new git tag is created and pushed to the repository.
To create a new release, follow these steps:
First, tag a new release locally:
V=YYYY.MM.0; git tag -am "release $V" $V
If you need to make a second (or third) release in a month increment the trailing 0 of the version to 1 (or 2).
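For example, a second release in February 2020 would be tagged like this:

V=2020.02.1; git tag -am "release $V" $V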
Then push this change up to the master repository
git push origin --tags
GitHub Actions should create a new release of repo2docker on PyPI. Once this has completed, make sure that the new version is available.
Once the new release has been pushed to PyPI, we need to create a new release on the GitHub repository releases page. Once on that page, follow these steps:
That’s it!
We now have both a dev-requirements.txt and a Pipfile for repo2docker, so it is important to keep these in sync and up-to-date.
Both files use pip identifiers, so if you are updating, for example, the Sphinx version in doc-requirements.txt (currently Sphinx = ">=1.4,!=1.5.4"), you can use the same syntax to update the Pipfile, and vice versa.
At the moment this has to be done manually so please make sure to update both files accordingly.
For larger refactorings it can be useful to check that the generated Dockerfiles match between an older version of r2d and the current version. The following shell script automates this test.
#!/bin/bash -e
current_version=$(jupyter-repo2docker --version | sed s@+@-@)
echo "Comparing $(pwd) (local $current_version vs. $R2D_COMPARE_TO)"
basename="dockerfilediff"

diff_r2d_dockerfiles_with_version () {
    docker run --rm -t -v "$(pwd)":"$(pwd)" --user 1000 jupyterhub/repo2docker:"$1" jupyter-repo2docker --no-build --debug "$(pwd)" &> "$basename"."$1"
    jupyter-repo2docker --no-build --debug "$(pwd)" &> "$basename"."$current_version"

    # remove first line logging the path
    sed -i '/^\[Repo2Docker\]/d' "$basename"."$1"
    sed -i '/^\[Repo2Docker\]/d' "$basename"."$current_version"

    diff --strip-trailing-cr "$basename"."$1" "$basename"."$current_version" | colordiff
    rm "$basename"."$current_version" "$basename"."$1"
}

startdir="$(pwd)"
cd "$1"
#diff_r2d_dockerfiles 0.10.0-22.g4f428c3.dirty
diff_r2d_dockerfiles_with_version "$R2D_COMPARE_TO"
cd "$startdir"
Put the code above in a file tests/dockerfile_diff.sh and make it executable: chmod +x dockerfile_diff.sh.
Configure the repo2docker version you want to compare with your local version in the environment variable R2D_COMPARE_TO. The script takes one input: the directory where repo2docker should be executed.
cd tests/
R2D_COMPARE_TO=0.10.0 ./dockerfile_diff.sh venv/py35/
Run it for all directories where there is a verify file:
cd tests/
R2D_COMPARE_TO=0.10.0 CMD=$(pwd)/dockerfile_diff.sh find . -name 'verify' -execdir bash -c '$CMD $(pwd)' \;
To keep the created Dockerfiles for further inspection, comment out the deletion line in the script.
A new buildpack is needed when a new language or a new package manager should be supported. Existing buildpacks are a good model for how new buildpacks should be structured. See the Buildpacks page for more information about the structure of a buildpack.
Criteria to balance are:
Note that this doesn’t apply to adding additional libraries / UI to existing buildpacks. For example, if we had an R buildpack and it supported IRKernel, it is much easier to just support RStudio / Shiny with it, since those are library additions instead of entirely new buildpacks.
Adding a new content provider allows repo2docker to grab repositories from new locations on the internet. To do so, you should take the following steps:
spec
Release date: 2020-02-05
Release date: 2019-08-07
Release date: 2019-05-05
Release date: 2019-02-21
GIT_CREDENTIAL_ENV
JULIA_DEPOT_PATH
Release date: 2018-12-12
Released 2018-09-09
Released 2018-02-07
Released 2018-09-06
Released 2018-05-25
Released 2017-04-19
Released 2017-04-14