Using repo2docker#

Note

Docker must be running in order to run repo2docker. For more information on installing repo2docker, see Installing repo2docker.

repo2docker can build a reproducible computational environment for any repository that follows The Reproducible Execution Environment Specification. repo2docker is called with the URL of a Git repository, a DOI from Zenodo or Figshare, a Handle or DOI from a Dataverse installation, a SWHID of a directory or a revision archived in the Software Heritage Archive, or a path to a local directory.

It then performs these steps:

  1. Inspects the repository for configuration files. These will be used to build the environment needed to run the repository.

  2. Builds a Docker image with an environment specified in these configuration files.

  3. Optionally launches the image so that you can explore the repository interactively via Jupyter notebooks, RStudio, or many other interfaces.

  4. Optionally pushes the image to a Docker registry so that it can be accessed remotely.
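
The two optional steps correspond to flags listed in the command line API: launching is on by default and is skipped with --no-run, while pushing is off by default and is requested with --push. The following is a minimal sketch of that mapping. The helper name r2d_command is hypothetical and not part of repo2docker, and it only prints the command rather than running it:

```shell
# Hypothetical helper (not part of repo2docker): print the command line
# matching a choice of the two optional steps. It only prints the command;
# it never invokes repo2docker or Docker.
#   $1 = source repository, $2 = "yes" to launch, $3 = "yes" to push
r2d_command() {
  r2d_source=$1
  r2d_run=$2
  r2d_push=$3
  set -- jupyter-repo2docker
  # Launching is the default, so only the opt-out flag is ever needed.
  if [ "$r2d_run" != yes ]; then
    set -- "$@" --no-run
  fi
  # Pushing is off by default, so it has to be requested explicitly.
  if [ "$r2d_push" = yes ]; then
    set -- "$@" --push
  fi
  printf '%s\n' "$* $r2d_source"
}

# Build and launch, without pushing (the default behaviour).
r2d_command https://github.com/norvig/pytudes yes no
# Build and push, without launching.
r2d_command https://github.com/norvig/pytudes no yes
```

Once the printed command looks right, it can be copied and run by hand.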

Calling repo2docker#

repo2docker is called with this command:

jupyter-repo2docker <source-repository>

where <source-repository> identifies the source repository you want to build. It can be:

  • a URL of a Git repository (https://github.com/binder-examples/requirements),

  • a Zenodo DOI (10.5281/zenodo.1211089),

  • a SWHID (swh:1:rev:999dd06c7f679a2714dfe5199bdca09522a29649), or

  • a path to a local directory (a/local/directory).

For example, the following command will build an image of Peter Norvig’s Pytudes repository:

jupyter-repo2docker https://github.com/norvig/pytudes

Building the image may take a few minutes.

Pytudes uses a requirements.txt file to specify its Python environment. Because of this, repo2docker will use pip to install the dependencies listed in this requirements.txt file, and these will be present in the generated Docker image. To learn more about configuration files in repo2docker, visit Configuration Files.

When the image is built, a message like the following is printed to your terminal:

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0

Pasting the URL into your browser will open Jupyter Notebook with the dependencies and contents of the source repository in the built image.
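
If you want to pick the URL out of the output in a script rather than by eye, a text filter is enough. The sketch below works on a copy of the sample message shown above. The port and token differ on every run, and the pattern is an assumption based on that one sample rather than a guaranteed output format:

```shell
# Sample startup message, copied from the output shown above.
message='Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0'

# Keep only the part that looks like a URL carrying a hexadecimal token.
url=$(printf '%s\n' "$message" | grep -o 'http://[^ ]*token=[0-9a-f]*')

# The token is everything after "token=".
token=${url#*token=}

printf 'URL:   %s\n' "$url"
printf 'Token: %s\n' "$token"
```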

Building a specific branch, commit or tag#

To build a particular branch, commit, or tag, pass the --ref argument with the branch name, tag name, or commit hash. For example:

jupyter-repo2docker --ref 9ced85dd9a84859d0767369e58f33912a214a3cf https://github.com/norvig/pytudes

Tip

For reproducible builds, we recommend specifying a commit-hash to deterministically build a fixed version of a repository. Not specifying a commit-hash will result in the latest commit of the repository being built.
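
A script can guard against passing a branch name by accident when a pinned build is intended. The following sketch uses a hypothetical helper, is_full_commit_hash, which is not part of repo2docker or Git. It assumes lower-case, 40-character SHA-1 hashes like the one in the example above:

```shell
# Hypothetical helper: succeed only if the argument is a full 40-character
# lower-case hexadecimal string, the usual form of a Git SHA-1 commit hash.
is_full_commit_hash() {
  case $1 in
    *[!0-9a-f]*) return 1 ;;   # contains a character that is not hex
  esac
  [ "${#1}" -eq 40 ]
}

for ref in 9ced85dd9a84859d0767369e58f33912a214a3cf main; do
  if is_full_commit_hash "$ref"; then
    echo "$ref: looks like a full commit hash"
  else
    echo "$ref: not a full commit hash, the build may change as the repository changes"
  fi
done
```

The check is purely syntactic: it does not confirm that the commit exists in the repository.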

Where to put configuration files#

repo2docker will look for configuration files in:

  • A folder named binder/ in the root of the repository.

  • A folder named .binder/ in the root of the repository.

  • The root directory of the repository.

Having both binder/ and .binder/ folders is not allowed. If one of these folders exists, only configuration files in that folder are considered, and configuration files in the root directory are ignored.
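
The lookup rule can be expressed as a small shell function. This is only a sketch of the rule as described here, not repo2docker's actual implementation, and the function name config_dir is made up for illustration:

```shell
# Sketch of the lookup rule described above; not repo2docker's actual code.
# Prints the directory whose configuration files would be considered.
config_dir() {
  repo=$1
  if [ -d "$repo/binder" ] && [ -d "$repo/.binder" ]; then
    echo "error: both binder/ and .binder/ exist in $repo" >&2
    return 1
  elif [ -d "$repo/binder" ]; then
    echo "$repo/binder"
  elif [ -d "$repo/.binder" ]; then
    echo "$repo/.binder"
  else
    # Neither folder exists, so the repository root is used.
    echo "$repo"
  fi
}

# Try the function on a throwaway directory tree.
demo=$(mktemp -d)
mkdir -p "$demo/plain" "$demo/visible/binder" "$demo/hidden/.binder"
config_dir "$demo/plain"     # prints the repository root
config_dir "$demo/visible"   # prints the binder/ folder
config_dir "$demo/hidden"    # prints the .binder/ folder
rm -rf "$demo"
```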

Check the complete list of configuration files supported by repo2docker to see how to configure the build process.

Note

repo2docker builds an environment with Python 3.7 by default. If you’d like a different version, you can specify this in your configuration files.

Debugging repo2docker with --debug and --no-build#

To debug the docker image being built, pass the --debug parameter:

jupyter-repo2docker --debug https://github.com/norvig/pytudes

This will print the generated Dockerfile, build it, and run it.

To see the generated Dockerfile without actually building it, pass --no-build on the command line. This Dockerfile output is intended only for debugging repo2docker; it cannot be used by docker directly.

jupyter-repo2docker --no-build --debug https://github.com/norvig/pytudes

Command line API#

jupyter-repo2docker#

Fetch a repository and build a container image

usage: jupyter-repo2docker [-h] [--help-all] [--version] [--config CONFIG]
                           [--json-logs] [--image-name IMAGE_NAME] [--ref REF]
                           [--debug] [--no-build] [--build]
                           [--build-memory-limit BUILD_MEMORY_LIMIT]
                           [--no-run] [--run] [--publish PORTS]
                           [--publish-all] [--no-clean] [--clean] [--push]
                           [--no-push] [--volume VOLUMES] [--user-id USER_ID]
                           [--user-name USER_NAME] [--env ENVIRONMENT]
                           [--editable] [--target-repo-dir TARGET_REPO_DIR]
                           [--appendix APPENDIX] [--label LABELS]
                           [--build-arg BUILD_ARGS] [--subdir SUBDIR]
                           [--cache-from CACHE_FROM] [--engine ENGINE]
                           repo ...

repo#

Path to repository that should be built. Could be local path or a git URL.

cmd#

Custom command to run after building container

-h, --help#

show this help message and exit

--help-all#

Display all configurable options and exit.

--version#

Print the repo2docker version and exit.

--config <config>#

Path to config file for repo2docker

--json-logs#

Emit JSON logs instead of human readable logs

--image-name <image_name>#

Name of image to be built. If unspecified will be autogenerated

--ref <ref>#

Reference to build instead of default reference. For example branch name or commit for a Git repository.

--debug#

Turn on debug logging

--no-build#

Do not actually build the image. Useful in conjunction with --debug.

--build#

Build the image (default)

--build-memory-limit <build_memory_limit>#

Total Memory that can be used by the docker build process

--no-run#

Do not run container after it has been built

--run#

Run container after it has been built (default).

--publish <ports>, -p <ports>#

Specify port mappings for the image. Needs a command to run in the container.

--publish-all, -P#

Publish all exposed ports to random host ports.

--no-clean#

Don’t clean up remote checkouts after we are done

--clean#

Clean up remote checkouts after we are done (default).

--push#

Push docker image to repository

--no-push#

Don’t push docker image to repository (default).

--volume <volumes>, -v <volumes>#

Volumes to mount inside the container, in form src:dest

--user-id <user_id>#

User ID of the primary user in the image

--user-name <user_name>#

Username of the primary user in the image

--env <environment>, -e <environment>#

Environment variables to define at container run time

--editable, -E#

Use the local repository in edit mode

--target-repo-dir <target_repo_dir>#

Path inside the image where contents of the repositories are copied to, and where all the build operations (such as postBuild) happen. Defaults to ${HOME} if not set

--appendix <appendix>#

Appendix of Dockerfile commands to run at the end of the build. Can be used to customize the resulting image after all standard build steps finish.

--label <labels>#

Extra label to set on the image, in form name=value

--build-arg <build_args>#

Extra build arg to pass to the build process, in form name=value

--subdir <subdir>#

Subdirectory of the git repository to examine. Defaults to ‘’.

--cache-from <cache_from>#

List of images to try & re-use cached image layers from. Docker only tries to re-use image layers from images built locally, not pulled from a registry. We can ask it to explicitly re-use layers from non-locally built images through the ‘cache_from’ parameter.

--engine <engine>#

Name of the container engine. Defaults to ‘docker’.