Using repo2docker

Note

Docker must be running in order to run repo2docker. For more information on installing repo2docker, see Installing repo2docker.

repo2docker can build a reproducible computational environment for any repository that follows The Reproducible Execution Environment Specification. repo2docker is called with the URL of a Git repository, a DOI from Zenodo or Figshare, a Handle or DOI from a Dataverse installation, a SWHID of a directory or a revision archived in the Software Heritage Archive, or a path to a local directory.

It then performs these steps:

  1. Inspects the repository for configuration files. These will be used to build the environment needed to run the repository.

  2. Builds a Docker image with an environment specified in these configuration files.

  3. Launches the image to let you explore the repository interactively via Jupyter notebooks, RStudio, or many other interfaces (optional).

  4. Pushes the image to a Docker registry so that it may be accessed remotely (optional).

Calling repo2docker

repo2docker is called with this command:

jupyter-repo2docker <source-repository>

where <source-repository> is:

  • a URL of a Git repository (https://github.com/binder-examples/requirements),

  • a Zenodo DOI (10.5281/zenodo.1211089),

  • a SWHID (swh:1:rev:999dd06c7f679a2714dfe5199bdca09522a29649), or

  • a path to a local directory (a/local/directory)

of the source repository you want to build.

For example, the following command will build an image of Peter Norvig’s Pytudes repository:

jupyter-repo2docker https://github.com/norvig/pytudes

Building the image may take a few minutes.

Pytudes uses a requirements.txt file to specify its Python environment. Because of this, repo2docker will use pip to install the dependencies listed in this requirements.txt file, and these will be present in the generated Docker image. To learn more about configuration files in repo2docker, visit Configuration Files.

When the image is built, a message will be output to your terminal:

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0

Pasting the URL into your browser will open Jupyter Notebook with the dependencies and contents of the source repository in the built image.
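Launching the image is one of the optional steps described above. As a minimal sketch, the function below builds an image under a chosen name and skips the launch, using the --no-run and --image-name options from the command line reference on this page. The function name build_only and the R2D variable are hypothetical helpers introduced here for illustration; they are not part of repo2docker.

```shell
# Sketch: build an image but skip the optional launch step.
# R2D is a hypothetical override, not a repo2docker feature. Setting
# R2D="echo jupyter-repo2docker" prints the command instead of running it.
build_only() {
  # $1: name to give the built image
  # $2: source repository (URL, DOI, SWHID, or local path)
  ${R2D:-jupyter-repo2docker} --no-run --image-name "$1" "$2"
}
```

With the function defined, a call could look like this:

build_only pytudes https://github.com/norvig/pytudes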

Building a specific branch, commit or tag

To build a particular branch, commit, or tag, use the --ref argument and specify the branch name, commit hash, or tag. For example:

jupyter-repo2docker --ref 9ced85dd9a84859d0767369e58f33912a214a3cf https://github.com/norvig/pytudes

Tip

For reproducible builds, we recommend specifying a commit-hash to deterministically build a fixed version of a repository. Not specifying a commit-hash will result in the latest commit of the repository being built.
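One way to follow this recommendation is to resolve a branch to the commit it currently points at, and then pass that commit hash to --ref. The sketch below assumes git is installed and uses git ls-remote for the lookup. The function names and the R2D variable are hypothetical helpers introduced for illustration, not part of repo2docker.

```shell
# Sketch: pin a build to the commit a branch currently points at.
# R2D is a hypothetical override, not a repo2docker feature. Setting
# R2D="echo jupyter-repo2docker" prints the command instead of running it.

branch_head_sha() {
  # $1: Git URL, $2: branch name.
  # git ls-remote prints "<hash><TAB><ref>"; keep only the hash.
  git ls-remote "$1" "refs/heads/$2" | cut -f1 | head -n 1
}

build_pinned() {
  # $1: Git URL, $2: branch name.
  # Note: sha is a plain (global) shell variable in this sketch.
  sha="$(branch_head_sha "$1" "$2")"
  if [ -z "$sha" ]; then
    echo "could not resolve branch '$2' in $1" >&2
    return 1
  fi
  # Report the resolved hash so it can be recorded alongside the build.
  echo "building $1 at $sha" >&2
  ${R2D:-jupyter-repo2docker} --ref "$sha" "$1"
}
```

Recording the printed hash lets you rebuild the same version later by passing it to --ref directly. The branch name in a call such as the following is an assumption; replace it with a branch that exists in the repository:

build_pinned https://github.com/norvig/pytudes main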

Where to put configuration files

repo2docker will look for configuration files in:

  • A folder named binder/ in the root of the repository.

  • A folder named .binder/ in the root of the repository.

  • The root directory of the repository.

Having both binder/ and .binder/ folders is not allowed. If one of these folders exists, only configuration files in that folder are considered, and configuration files in the root directory are ignored.
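The lookup rule above can be sketched as a small shell function that reports which directory would be searched for a local checkout. This only illustrates the rule as stated on this page; it is not repo2docker's own implementation, and the name config_dir is hypothetical.

```shell
# Sketch of the configuration lookup rule described above.
config_dir() {
  # $1: path to a local checkout of the repository.
  if [ -d "$1/binder" ] && [ -d "$1/.binder" ]; then
    # Both folders present: not allowed.
    echo "error: both binder/ and .binder/ exist in $1" >&2
    return 1
  elif [ -d "$1/binder" ]; then
    echo "$1/binder"
  elif [ -d "$1/.binder" ]; then
    echo "$1/.binder"
  else
    # Neither folder exists: fall back to the repository root.
    echo "$1"
  fi
}
```

Running config_dir on a checkout before building is a quick way to confirm that configuration files in the root directory are not being shadowed by a binder/ or .binder/ folder.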

Check the complete list of configuration files supported by repo2docker to see how to configure the build process.

Note

repo2docker builds an environment with Python 3.7 by default. If you’d like a different version, you can specify this in your configuration files.

Debugging repo2docker with --debug and --no-build

To debug the docker image being built, pass the --debug parameter:

jupyter-repo2docker --debug https://github.com/norvig/pytudes

This will print the generated Dockerfile, build it, and run it.

To see the generated Dockerfile without actually building it, pass --no-build on the command line. This Dockerfile output is only for debugging repo2docker itself; it cannot be used by docker directly.

jupyter-repo2docker --no-build --debug https://github.com/norvig/pytudes
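When the debug output is long, it can help to save it to a file for inspection. The sketch below redirects both stdout and stderr, since it makes no assumption about which stream the Dockerfile is printed on. The function name dump_dockerfile and the R2D variable are hypothetical helpers introduced for illustration.

```shell
# Sketch: save the --no-build --debug output to a file.
# R2D is a hypothetical override, not a repo2docker feature. Setting
# R2D="echo jupyter-repo2docker" prints the command instead of running it.
dump_dockerfile() {
  # $1: source repository
  # $2: file that receives the combined stdout and stderr output
  ${R2D:-jupyter-repo2docker} --no-build --debug "$1" > "$2" 2>&1
}
```

A call could look like this, with the output file name chosen freely:

dump_dockerfile https://github.com/norvig/pytudes pytudes-debug.log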

Command line API

jupyter-repo2docker

Fetch a repository and build a container image

usage: jupyter-repo2docker [-h] [--config CONFIG] [--json-logs]
                           [--image-name IMAGE_NAME] [--ref REF] [--debug]
                           [--no-build]
                           [--build-memory-limit BUILD_MEMORY_LIMIT]
                           [--no-run] [--publish PORTS] [--publish-all]
                           [--no-clean] [--push] [--volume VOLUMES]
                           [--user-id USER_ID] [--user-name USER_NAME]
                           [--env ENVIRONMENT] [--editable]
                           [--target-repo-dir TARGET_REPO_DIR]
                           [--appendix APPENDIX] [--subdir SUBDIR] [--version]
                           [--cache-from CACHE_FROM]
                           repo ...

repo

Path to the repository that should be built. Can be a local path or a Git URL.

cmd

Custom command to run after building container

-h, --help

show this help message and exit

--config <config>

Path to config file for repo2docker

--json-logs

Emit JSON logs instead of human readable logs

--image-name <image_name>

Name of the image to be built. If unspecified, a name will be autogenerated.

--ref <ref>

Reference to build instead of default reference. For example branch name or commit for a Git repository.

--debug

Turn on debug logging

--no-build

Do not actually build the image. Useful in conjunction with --debug.

--build-memory-limit <build_memory_limit>

Total memory that can be used by the Docker build process

--no-run

Do not run container after it has been built

--publish <ports>, -p <ports>

Specify port mappings for the image. Needs a command to run in the container.

--publish-all, -P

Publish all exposed ports to random host ports.

--no-clean

Don’t clean up remote checkouts after we are done

--push

Push docker image to repository

--volume <volumes>, -v <volumes>

Volumes to mount inside the container, in form src:dest

--user-id <user_id>

User ID of the primary user in the image

--user-name <user_name>

Username of the primary user in the image

--env <environment>, -e <environment>

Environment variables to define at container run time

--editable, -E

Use the local repository in edit mode

--target-repo-dir <target_repo_dir>

Path inside the image where contents of the repositories are copied to, and where all the build operations (such as postBuild) happen. Defaults to ${HOME} if not set

--appendix <appendix>

Appendix of Dockerfile commands to run at the end of the build. Can be used to customize the resulting image after all standard build steps finish.

--subdir <subdir>

Subdirectory of the git repository to examine. Defaults to ‘’.

--version

Print the repo2docker version and exit.

--cache-from <cache_from>

List of images to try to re-use cached image layers from. Docker only tries to re-use image layers from images built locally, not pulled from a registry. We can ask it to explicitly re-use layers from non-locally built images through the ‘cache_from’ parameter.
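Several of the run-time options above are typically combined. As a hedged sketch, the function below publishes a port, sets an environment variable, mounts a volume, and passes a custom command, which --publish requires. The port mapping 8888:8888, the variable EXAMPLE_VAR, the volume paths, and the command shown in the usage line are all assumptions chosen for illustration, as are the function name run_with_options and the R2D variable.

```shell
# Sketch: combine --publish, --env, and --volume with a custom command.
# R2D is a hypothetical override, not a repo2docker feature. Setting
# R2D="echo jupyter-repo2docker" prints the command instead of running it.
run_with_options() {
  # $1: source repository; remaining arguments: command to run in the container.
  # repo is a plain (global) shell variable in this sketch.
  repo="$1"
  shift
  # All option values below are placeholder examples.
  ${R2D:-jupyter-repo2docker} \
    --publish 8888:8888 \
    --env EXAMPLE_VAR=example-value \
    --volume /path/to/data:data \
    "$repo" "$@"
}
```

A call for a repository in the current directory could look like this, assuming the image provides Jupyter Notebook:

run_with_options . jupyter notebook --ip 0.0.0.0 --port 8888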