.. _docker-chapter:

=====================
MediaGoblin in Docker
=====================

Since version 0.14.0, MediaGoblin natively supports `Docker `_. We push
release versions of MediaGoblin as Docker images to
`the project's Docker Hub account `_. This makes it easy for anyone to spin up
a new service in Docker with no prerequisite other than the Docker runtime
itself.

You can start a single standalone container using the official images
published to Docker Hub. For real deployments, however, we recommend deploying
a multi-container stack using, e.g., `Docker Compose `_. This page documents
both approaches.

.. TODO: we'll get there later

   It is even possible to leverage Docker contexts to start the containers in
   the cloud, e.g., with `AWS ECS/Fargate `_.

   Data persistence, or sharing across containers, is done via volumes,
   mounted in `/srv`. In ECS, volumes are mounted from `EFS `_.

Quickstart
==========

A standalone container in charge of both serving and processing content can be
started with

.. parsed-literal::

   docker run --interactive --tty \\
     --publish=6543:6543 --volume=/PATH/TO/YOUR/DATA:/srv \\
     mediagoblin/mediagoblin:|release|

This will download the official image from Docker Hub, and create a container
running MediaGoblin. It will be accessible at http://localhost:6543.

The ``--publish`` option (or ``-p`` for short) makes the container's port 6543
available to the host. The ``--volume`` option (``-v`` for short) mounts a
path from the local filesystem (``/PATH/TO/YOUR/DATA``, in this example) into
the container. This is where MediaGoblin will store all its data. The
directory can be empty initially, or hold data initialised by a previous run.

The ``--interactive --tty`` (or ``-it``) options are not strictly needed, but
they let you terminate the running process with ``Ctrl+C``, rather than having
to use ``docker kill``.

.. note::

   See further down in this section for more details on data persistence.

On first run of the container, the administrator's password will be
autogenerated, and shown (once, and only once) in the log output.

.. parsed-literal::

   ===============================================================================
   NEW ADMINISTRATOR ACCOUNT CREATED
   ADMIN_USER=admin
   ADMIN_PASSWORD=
   ADMIN_EMAIL=admin@example.com
   ===============================================================================

.. note::

   See further down in this section to learn how to choose or change the
   admin's username, password or email.

First run
~~~~~~~~~

If all goes well, you should see the following output on first run.

.. code-block:: bash

   usermod: no changes
   Creating missing configuration file paste.ini ...
   Creating missing configuration file mediagoblin.ini ...
   Creating empty database mediagoblin.db ...
   INFO [alembic.runtime.migration] Context impl SQLiteImpl.
   INFO [alembic.runtime.migration] Will assume non-transactional DDL.
   INFO [alembic.runtime.migration] Running upgrade -> 52bf0ccbedc1, initial revision
   INFO [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> a98c1a320e88, Image media type initial migration
   INFO [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> 101510e3a713, #5382 Removes graveyard items from collections
   INFO [alembic.runtime.migration] Running upgrade 101510e3a713 -> 8429e33fdf7, Remove the Graveyard objects from CommentNotification objects
   INFO [alembic.runtime.migration] Running upgrade 8429e33fdf7 -> 4066b9f8b84a, use_comment_link_ids_notifications
   INFO [alembic.runtime.migration] Running upgrade 4066b9f8b84a -> 3145accb8fe3, remove tombstone comment wrappers
   INFO [alembic.runtime.migration] Running upgrade 3145accb8fe3 -> 228916769bd2, ensure Report.object_id is nullable
   INFO [alembic.runtime.migration] Running upgrade 228916769bd2 -> cc3651803714, add main transcoding progress column to MediaEntry
   INFO [alembic.runtime.migration] Running upgrade 228916769bd2 -> afd3d1da5e29, Subtitle plugin initial migration
   Laying foundations for __main__:
      + Laying foundations for Privilege table
   Cannot link theme... no theme set
   Linked asset directory for plugin "coreplugin_basic_auth":
     /opt/mediagoblin/lib/python3.11/site-packages/mediagoblin/plugins/basic_auth/static
   to:
     /srv/user_dev/plugin_static/coreplugin_basic_auth
   Creating admin user ...
   User created (and email marked as verified).
   The user admin is now an admin.
   ===============================================================================
   NEW ADMINISTRATOR ACCOUNT CREATED
   ADMIN_USER=admin
   ADMIN_PASSWORD=
   ADMIN_EMAIL=admin@example.com
   ===============================================================================
   Running /opt/mediagoblin/lazyserver.sh -c ./paste.ini --server-name=broadcast ...
   Using paster config: ./paste.ini
   Using paster from $PATH
   + export CELERY_ALWAYS_EAGER=true
   + paster serve ./paste.ini --server-name=broadcast --reload
   Starting subprocess with file monitor
   2024-07-14 08:09:30,760 INFO [mediagoblin.app] GNU MediaGoblin 0.14.0.dev main server starting
   2024-07-14 08:09:31,054 INFO [mediagoblin.app] Setting up plugins.
   2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.geolocation
   2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.processing_info
   2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.basic_auth
   2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.media_types.image
   2024-07-14 08:09:31,114 INFO [mediagoblin.init.celery] Setting celery configuration from object "mediagoblin.init.celery.dummy_settings_module"
   Starting server in PID 58.
   2024-07-14 08:09:31,122 INFO [waitress] Serving on http://0.0.0.0:6543

It will be terser on subsequent runs, because configuration and databases
already exist, and data migrations aren't necessary (unless upgrading to a new
version of the container).

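Because the generated password is only shown once, it can be worth pulling the
``ADMIN_*`` lines out of the startup log as soon as the container is up. The
following is a minimal sketch, assuming the banner lines look like the output
above; the function name ``extract_admin_credentials`` and the container name
``mediagoblin`` are hypothetical and chosen purely for illustration.

.. code-block:: bash

   # Print every ADMIN_<NAME>=<value> pair found on standard input.
   # Assumption: values contain no whitespace, as in the banner shown above.
   extract_admin_credentials() {
     grep -o 'ADMIN_[A-Z]*=[^[:space:]]*'
   }

   # Hypothetical usage, for a container started with "--name mediagoblin":
   #   docker logs mediagoblin 2>&1 | extract_admin_credentials

Reading from standard input keeps the helper independent of how the logs are
obtained, so it works equally well on a saved log file.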
You can confirm that the container is running happily with the ``docker ps``
command, which shows the running containers, ports and health status (if
configured).

.. code-block:: bash

   CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS                    PORTS                                       NAMES
   541710f616d5   mediagoblin/mediagoblin:0.14.0.dev   "/opt/mediagoblin/en…"   37 seconds ago   Up 36 seconds (healthy)   0.0.0.0:6543->6543/tcp, :::6543->6543/tcp   vibrant_germain

At this point, you should be able to point your browser to
http://localhost:6543 and be greeted by the MediaGoblin landing page.

Data persistence
~~~~~~~~~~~~~~~~

Data in a Docker image is read-only. Any change in a live container remains
until the container is destroyed, and is lost thereafter. This includes all
changes made in ``/srv``, where the MediaGoblin data resides.

This is obviously not desirable for media storage.
`Docker has support for various types of storage mechanisms `_ for this
purpose.

We saw in the previous section how to start the container in such a way that a
local path is `bind-mounted `_ onto ``/srv``.

.. parsed-literal::

   -v /PATH/TO/YOUR/DATA:/srv

This means that any data written by the container will be written to
``/PATH/TO/YOUR/DATA`` in the host filesystem. As it is outside of Docker's
control, this data will persist even if the MediaGoblin container is
destroyed.

A new container instance can then be started with the same bind-mount volume
option. It will resume serving the data transparently. This is useful for
backups, as well as an upgrade path between subsequent versions of MediaGoblin
without losing data.

Starting with an empty data directory, the container will create the
configuration and the database on first run. You can confirm it with
``ls /PATH/TO/YOUR/DATA`` outside of the container.

.. code-block:: bash

   $ ls /PATH/TO/YOUR/DATA
   mediagoblin  mediagoblin.db  mediagoblin.ini  paste.ini  user_dev

You can also make manual changes to the data if needed.

.. warning::

   The host path given to the ``--volume`` option must be an absolute path,
   otherwise it will be interpreted as the name of a Docker volume.

Using a `Docker volume `_ is another way to ensure data persistence across
container recreation. Rather than writing data out to a specified path on the
host filesystem, Docker will manage the volume (``volume-name``, in the
following example) internally.

.. parsed-literal::

   -v `volume-name`:/srv

While this offers the same data persistence benefits, management of the data
should be done with the ``docker volume`` command. Moreover, the data may not
be as straightforward to access and back up without more Docker knowledge.
This approach is therefore only recommended to users already familiar with it.

Administrator account
~~~~~~~~~~~~~~~~~~~~~

A default administrator account is created by the entrypoint script. The login
is ``admin``, and the password is automatically generated if unspecified. The
details of the admin account are output in the logs the very first time a new
instance is initialised.

You can override both values on first run by passing them via the
environment.

.. parsed-literal::

   docker run -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \\
     -e ADMIN_USER=myadmin -e ADMIN_PASSWORD=generateme \\
     mediagoblin/mediagoblin:|release|

.. note::

   If the ``ADMIN_PASSWORD`` is set to ``generateme`` (the default), it will
   be auto-generated on first run, i.e., when no database exists in the data
   directory yet. The generated password will be output, once, in the startup
   logs.

Alternatively, you can change the current admin password at any time
afterwards by using the ``gmg`` tool.

.. parsed-literal::

   docker run -v /PATH/TO/YOUR/DATA:/srv \\
     mediagoblin/mediagoblin:|release| \\
     gmg changepw admin ``

You can, of course, use ``gmg`` in this way for any other task you would
generally perform in non-containerised environments.

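If you would rather choose the administrator password yourself than rely on
the auto-generated one, you can create one locally and pass it through
``ADMIN_PASSWORD``. The following is only a sketch: the helper name
``generate_password`` is hypothetical, and it assumes that ``/dev/urandom``,
``tr`` and ``head -c`` are available on the host.

.. code-block:: bash

   # Print a random alphanumeric password; $1 is its length (default: 24).
   generate_password() {
     length="${1:-24}"
     # Read 16 random bytes per requested character, keep only letters and
     # digits, then truncate to the requested length.
     head -c "$((length * 16))" /dev/urandom \
       | LC_ALL=C tr -dc 'A-Za-z0-9' \
       | head -c "$length"
   }

   # Hypothetical usage: pass -e ADMIN_PASSWORD="$(generate_password)" to the
   # first "docker run" of a new data directory, in place of "generateme".

Keep in mind that a password chosen this way is not echoed in a form you can
recover later unless you store it yourself.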
Configuring plugins
~~~~~~~~~~~~~~~~~~~

By default, no plugin is enabled in the example configuration file. As for
non-containerised deployments of MediaGoblin,
:doc:`plugins can be enabled by adding relevant sections ` to the
``mediagoblin.ini`` configuration file.

However, plugins can also be preconfigured when a new containerised
environment is initialised. To do so, pass a configuration snippet for the
``[plugins]`` section, with embedded newlines, via the ``PLUGINS`` environment
variable.

.. parsed-literal::

   docker run --interactive --tty \\
     -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \\
     -e PLUGINS='[[mediagoblin.media_types.audio]]\\n[[mediagoblin.media_types.video]]\\navailable_resolutions = 144p,240p\\n' \\
     mediagoblin/mediagoblin:|release|

This mechanism is only active on first initialisation of an empty data
directory. It can however be forced by setting the ``FORCE_RECONFIG``
environment variable to ``true``.

.. parsed-literal::

   ... -e FORCE_RECONFIG=true ...

.. warning::

   Force-reconfiguration has not been thoroughly tested, and may not behave
   flawlessly.

Docker Compose stack
====================

Docker Compose lets you encode more details about how to run a container, such
as volumes, ports and environment variables. This is done via a
`configuration file `_ instead of the command line. It also allows spinning up
more than one container at a time, and setting up the necessary network
environment so they can communicate with each other.

Multiple configuration files can be used at the same time, to selectively
configure or override various aspects of the desired stack. MediaGoblin takes
this approach, providing a basic ``docker-compose.yml`` which contains shared
options.

.. , and a number of additional overlays allowing to run a non-lazy
   deployment locally, or a similar deployment in AWS ECS.

.. note::

   Historically, ``docker-compose`` was a command separate from ``docker``
   itself, but functionality has now been merged and extended.
   This guide therefore uses the ``docker compose`` subcommand.

Standalone service
~~~~~~~~~~~~~~~~~~

Prior to delving into multi-container stacks, you can have a look at the
standalone ``docker-compose.standalone.yml``, which does very little more than
the ``docker`` commands in the previous section. There are however two
noteworthy differences.

.. literalinclude:: ../../../docker-compose.standalone.yml
   :language: yaml

First, in the ``volumes`` section, a named Docker volume,
``mediagoblin-data``, is created for ``/srv``. As discussed before, the volume
will be reused every time a stack is brought up. At the end of the file, in
the ``volumes`` section, additional parameters are provided so the
``mediagoblin-data`` volume is actually mapped to a bind mount. It is
configured to use the ``data`` subdirectory of the current path where the
stack was started.

Second, it uses an ``env_file``, which conveniently passes a number of
environment variables to the container. Those can include ``ADMIN_PASSWORD``
or ``PLUGINS``, as discussed previously.

These changes will be carried over through the next few sections.

.. note::

   ``docker compose`` uses the file ``docker-compose.yml`` by default, which
   we'll discuss later. To use the standalone variation, the ``-f`` option can
   be used.

.. code-block:: bash

   docker compose -f docker-compose.standalone.yml up

.. note::

   By default, Docker will keep hold of the terminal, and output logs from the
   application. To regain use of the terminal, you can add the ``-d`` flag at
   the end of this command. To see the logs, you can then use
   ``docker compose logs -f``.

As before, this will make the MediaGoblin instance available at
http://localhost:6543/. You can log in as the admin, and upload a file before
moving on to the next section.

You can shut the container down with

.. code-block:: bash

   docker compose -f docker-compose.standalone.yml down

Multi-container stack
~~~~~~~~~~~~~~~~~~~~~

The previous section was a light introduction to ``docker-compose.yml`` files,
but didn't achieve much. We can now move on to defining more than one service
in the stack: separate Paste and Celery containers, with a side of RabbitMQ
and Nginx.

The basic ``docker-compose.yml`` file does just that.

.. literalinclude:: ../../../docker-compose.yml
   :language: yaml

It is fairly similar to the standalone setup, except it defines all three
services. Both ``paste`` and ``celery`` are essentially the same, except for
the ``command`` that is executed. Some additional environment variables are
set in the ``environment`` section, most notably where to find RabbitMQ. The
``healthcheck`` of the Celery container is also adjusted to remain useful.

One last service is started, based on the official RabbitMQ images, to support
communication between both containers, and some start-up order rules are
defined via the ``depends_on`` sections.

As this configuration is in the default ``docker-compose.yml`` file, starting
the stack up is fairly straightforward.

.. code-block:: bash

   docker compose up

As before, this stack uses the ``mediagoblin-data`` named volume, which is
mounted in both Paste and Celery containers. If you started a fresh lazyserver
before, and uploaded some test data, you should still be able to access it
now.

.. not relevant at the moment

   Working with named volumes
   ~~~~~~~~~~~~~~~~~~~~~~~~~~

   So far, data has been stored in the ``mediagoblin-data`` named volume. You
   can list all existing named volumes with

   .. code-block:: bash

      docker volume ls

   If all the containers using them are down, you can also delete them. A
   quirk is that even though Docker Compose sees it as ``mediagoblin-data``,
   it prefixes its name with that of the stack getting created. By default
   this is simply the name of the directory where the compose file is.
   In this case, the full volume name will be
   ``mediagoblin_mediagoblin-data``.

   .. code-block:: bash

      docker volume rm mediagoblin_mediagoblin-data

Nginx reverse-proxy
~~~~~~~~~~~~~~~~~~~

When running a non-test instance, it is not recommended to expose the
application straight to the public internet. Instead, it is good practice to
put a reverse-proxy in between, to handle the fine details of the HTTP
protocol. Nginx tends to be a good choice.

As discussed in :ref:`the deployment documentation `, the Nginx configuration
needs to be adjusted to best work with MediaGoblin. For ease of use, we build
and publish a pre-configured Nginx image to Docker Hub alongside the
MediaGoblin one.

You can extend your Compose stack from ``docker-compose.yml`` by also
including the Nginx service defined in ``docker-compose.nginx.yml``.

.. code-block:: bash

   docker compose -f docker-compose.yml -f docker-compose.nginx.yml up

For simplicity in your own deployment, you can include all ``services`` in a
single file.

.. note::

   As the nginx container is added via an override, the ``paste`` container
   continues to expose its own port to the rest of the system.

Third-party cloud providers
===========================

Containerisation of MediaGoblin offers a new way to run the service on
third-party hosts. However, as the containerisation of MediaGoblin is still
very recent, we haven't explored the various cloud providers and deployment
methods.

If you've had success with this type of deployment, please consider
`contributing your experience to the documentation `_!

.. ecs

   .. code-block:: bash

      docker compose -f docker-compose.yml -f docker-compose.ecs.yml up

Dockerised Build
================

.. XXX: move this to a separate section

It is possible (and perhaps even preferred) to build MediaGoblin within a
container.
This will create a Docker image suitable to `run on its own `_ (using
:ref:`lazyserver `), or as part of a `Docker Compose `_ stack with separate
containers for Paste, Celery, and RabbitMQ, as well as the optional
pre-configured Nginx.

Core container
~~~~~~~~~~~~~~

Unlike a local build, the only dependency required by a Docker build is the
``docker`` tool itself. When it is present, the ``configure`` script will
prefer this approach (unless ``--without-docker`` is explicitly passed). The
steps to perform a build nonetheless follow the familiar incantation.

.. code-block:: bash

   ./configure && make

This will create a build stage with the necessary build dependencies, such as
``npm`` and ``-dev`` packages, create a final image containing the built
package, and run the tests within a container started from that image.

The image will be named ``mediagoblin/mediagoblin`` and tagged with the
version set in ``configure.ac``, e.g., ``mediagoblin/mediagoblin:0.14.0.dev``.

When building this way, the dependencies for most plugins
(:ref:`media types ` and :ref:`core plugins `) are included. Two notable
exceptions are support of :ref:`Documents ` (but not PDFs), and
:ref:`STL files `. Their dependencies (``unoconv`` and ``blender``,
respectively) were deemed too large to include by default.

While the ``make``-based build is the simplest, it is possible to build custom
containers, with a preferred set of dependencies, directly with
``docker build .``. Detailing this process is beyond the scope of this
chapter. However, you can have a look at the ``Dockerfile`` to see what build
arguments (``ARG``, configurable via ``--build-arg``) are supported.

Python wheel and documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is also possible to build the Python wheel and the docs out of the image,
with

.. code-block:: bash

   make dist
   # and
   make docs

respectively.

.. note::

   While the wheel builds successfully, it is still a work in progress, and it
   has not been tested yet.

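For the custom ``docker build`` route mentioned under *Core container*, a
quick way to see which build arguments a ``Dockerfile`` declares is to list
its ``ARG`` lines. This is a sketch only: the helper name ``list_build_args``
is hypothetical, and it assumes nothing beyond the standard
``ARG NAME[=default]`` Dockerfile syntax (it does not know which arguments
this project actually defines).

.. code-block:: bash

   # Print the NAME or NAME=default part of every ARG instruction.
   # $1: path to a Dockerfile (defaults to ./Dockerfile).
   list_build_args() {
     sed -n 's/^[[:space:]]*ARG[[:space:]]\{1,\}//p' "${1:-Dockerfile}"
   }

   # Each reported NAME can then be overridden at build time with
   #   docker build --build-arg NAME=value .
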
Preconfigured Nginx image
~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the Docker-based build process, a dedicated ``Dockerfile.nginx`` is
also created. This allows us to build the pre-configured Nginx Docker image
which gets pushed to Docker Hub.

.. parsed-literal::

   docker build -f Dockerfile.nginx . -t mediagoblin/nginx:|release|

.. Development
   ~~~~~~~~~~~

   TODO

   Main motivation: working on MG code in docke

   .. code-block:: bash

      docker run -it -p 6543:6543 \
        -v $(pwd):/opt/mediagoblin \
        -v $(pwd)/data:/srv \
        mediagoblin:|release|