.. _docker-chapter:
=====================
MediaGoblin in Docker
=====================
Since version 0.14.0, Mediagoblin natively supports `Docker
`_. We push release versions of Mediagoblin as Docker
images to `the project's Docker Hub account
`_. This makes it easy for anyone to spin
up a new service in Docker with no more prerequisite than the Docker runtime
itself.
You can start a single standalone container using the official images published
to Docker Hub. For real deployments, however, it is recommended to deploy a
multi-container stack using, e.g., `Docker Compose
`_.
This page documents how to do either of those things.
..
TODO: we'll get there later
It is even possible to leverage Docker contexts
to start the containers in the cloud, e.g., with `AWS ECS/Fargate
`_.
Data persistence, or sharing across containers, is done via volumes, mounted in
`/srv`. In ECS, volumes are mounted from `EFS
`_.
Quickstart
==========
A standalone container in charge of both serving and processing content can
simply be started with
.. parsed-literal::
docker run --interactive --tty \\
--publish=6543:6543 --volume=/PATH/TO/YOUR/DATA:/srv \\
mediagoblin/mediagoblin:|release|
This will download the official image from Docker Hub, and create a container
running Mediagoblin. It will be accessible at http://localhost:6543.
The ``--publish`` option (or ``-p`` for short) makes the container's port 6543
available to the host.
The ``--volume`` option (``-v`` for short) mounts a path from the local filesystem
(``/PATH/TO/YOUR/DATA``, in this example) into the container. This is where
Mediagoblin will store all its data. It can be empty initially, or have been
previously initialised.
The ``--interactive --tty`` (or ``-it``) options are not strictly needed, but
they should allow you to terminate the running process by sending it a
``Ctrl+C``, rather than having to use ``docker kill``.
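
For anything longer than a quick test, it may be more convenient to run the
container in the background. The following is a minimal sketch using standard
``docker`` options only. The container name ``mediagoblin`` and the
``MG_VERSION`` variable are illustrative choices, not something the image
requires, and the tag shown is an assumption to be replaced with the release
you actually want.

.. code-block:: bash

   # Assumption: set this to the release tag you want to run.
   MG_VERSION=0.14.0

   # Start in the background, under a fixed name for later reference.
   docker run --detach --name mediagoblin \
       --publish=6543:6543 --volume=/PATH/TO/YOUR/DATA:/srv \
       "mediagoblin/mediagoblin:${MG_VERSION}"

   # Follow the logs (Ctrl+C stops following, not the container).
   docker logs --follow mediagoblin

   # Stop and remove the container; the bind-mounted data stays on the host.
   docker stop mediagoblin
   docker rm mediagoblin
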
.. note:: See further down in this section for more details on data persistence.
On first run of the container, the administrator's password will be
autogenerated, and shown (once, and only once) in the log output.
.. parsed-literal::
===============================================================================
NEW ADMINISTRATOR ACCOUNT CREATED
ADMIN_USER=admin
ADMIN_PASSWORD=
ADMIN_EMAIL=admin@example.com
===============================================================================
.. note:: See further down in this section to learn how to choose or change the
admin's username, password or email.
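
If the banner has scrolled out of view, it can usually be recovered from the
container logs, as long as the container from the first run still exists. The
sketch below assumes the container was started with ``--name mediagoblin`` (a
hypothetical name; otherwise use the name or ID reported by ``docker ps``).

.. code-block:: bash

   # Print the banner line and the three lines following it
   # (ADMIN_USER, ADMIN_PASSWORD and ADMIN_EMAIL).
   docker logs mediagoblin 2>&1 \
       | grep -A 3 'NEW ADMINISTRATOR ACCOUNT CREATED'
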
First run
~~~~~~~~~
If all goes well, you should see the following output on first run.
.. code-block:: bash
usermod: no changes
Creating missing configuration file paste.ini ...
Creating missing configuration file mediagoblin.ini ...
Creating empty database mediagoblin.db ...
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 52bf0ccbedc1, initial revision
INFO [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> a98c1a320e88, Image media type initial migration
INFO [alembic.runtime.migration] Running upgrade 52bf0ccbedc1 -> 101510e3a713, #5382 Removes graveyard items from collections
INFO [alembic.runtime.migration] Running upgrade 101510e3a713 -> 8429e33fdf7, Remove the Graveyard objects from CommentNotification objects
INFO [alembic.runtime.migration] Running upgrade 8429e33fdf7 -> 4066b9f8b84a, use_comment_link_ids_notifications
INFO [alembic.runtime.migration] Running upgrade 4066b9f8b84a -> 3145accb8fe3, remove tombstone comment wrappers
INFO [alembic.runtime.migration] Running upgrade 3145accb8fe3 -> 228916769bd2, ensure Report.object_id is nullable
INFO [alembic.runtime.migration] Running upgrade 228916769bd2 -> cc3651803714, add main transcoding progress column to MediaEntry
INFO [alembic.runtime.migration] Running upgrade 228916769bd2 -> afd3d1da5e29, Subtitle plugin initial migration
Laying foundations for __main__:
+ Laying foundations for Privilege table
Cannot link theme... no theme set
Linked asset directory for plugin "coreplugin_basic_auth":
/opt/mediagoblin/lib/python3.11/site-packages/mediagoblin/plugins/basic_auth/static
to:
/srv/user_dev/plugin_static/coreplugin_basic_auth
Creating admin user ...
User created (and email marked as verified).
The user admin is now an admin.
===============================================================================
NEW ADMINISTRATOR ACCOUNT CREATED
ADMIN_USER=admin
ADMIN_PASSWORD=
ADMIN_EMAIL=admin@example.com
===============================================================================
Running /opt/mediagoblin/lazyserver.sh -c ./paste.ini --server-name=broadcast ...
Using paster config: ./paste.ini
Using paster from $PATH
+ export CELERY_ALWAYS_EAGER=true
+ paster serve ./paste.ini --server-name=broadcast --reload
Starting subprocess with file monitor
2024-07-14 08:09:30,760 INFO [mediagoblin.app] GNU MediaGoblin 0.14.0.dev main server starting
2024-07-14 08:09:31,054 INFO [mediagoblin.app] Setting up plugins.
2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.geolocation
2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.processing_info
2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.plugins.basic_auth
2024-07-14 08:09:31,054 INFO [mediagoblin.init.plugins] Importing plugin module: mediagoblin.media_types.image
2024-07-14 08:09:31,114 INFO [mediagoblin.init.celery] Setting celery configuration from object "mediagoblin.init.celery.dummy_settings_module"
Starting server in PID 58.
2024-07-14 08:09:31,122 INFO [waitress] Serving on http://0.0.0.0:6543
The output will be terser on subsequent runs, because configuration and databases
already exist, and data migrations aren't necessary (unless upgrading to a new version of the container).
You can confirm that the container is running happily with the ``docker ps``
command, which will show the running containers, ports and health status (if configured).
.. code-block:: bash
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
541710f616d5 mediagoblin/mediagoblin:0.14.0.dev "/opt/mediagoblin/en…" 37 seconds ago Up 36 seconds (healthy) 0.0.0.0:6543->6543/tcp, :::6543->6543/tcp vibrant_germain
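
For scripting, the health status alone can be queried with ``docker inspect``.
This is a small sketch that reuses the container name from the sample output
above (``vibrant_germain``); substitute your own container name or ID. It only
returns a value when the image defines a health check.

.. code-block:: bash

   # Prints the health state reported by Docker,
   # e.g. "starting", "healthy" or "unhealthy".
   docker inspect --format '{{.State.Health.Status}}' vibrant_germain
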
At this point, you should be able to point your browser to http://localhost:6543
and be greeted by the Mediagoblin landing page.
Data persistence
~~~~~~~~~~~~~~~~
Data in a Docker image is read-only. Any change in a live container remains
until the container is destroyed, and is lost thereafter. This includes all
changes made in `/srv`, where the Mediagoblin data resides. This is obviously
not desirable for media storage.
`Docker has support for various types of storage mechanisms
`_ for this purpose. We saw in the
previous section how to start the container in such a way that a local path is
`bind-mounted `_ onto `/srv`.
.. parsed-literal::
-v /PATH/TO/YOUR/DATA:/srv
This means that any data written by the containers will be written to
`/PATH/TO/YOUR/DATA` in the host filesystem. As it is outside of Docker's
control, this data will persist even if the Mediagoblin container is
destroyed.
A new container instance can then be restarted with the same bind-mount volume
option. It will resume serving the data transparently. This is useful for
backups, and it also provides an upgrade path between subsequent versions of
Mediagoblin without losing data.
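
As the data lives in an ordinary host directory, a backup can be as simple as
archiving that directory. The helper below is a sketch, not part of
Mediagoblin: the function name and archive name are made up for illustration.
It is probably safest to stop the container first, so the database is not
being written to while it is archived.

.. code-block:: bash

   # Archive the contents of a data directory into a compressed tarball.
   # Usage: backup_mediagoblin_data DATA_DIR BACKUP_FILE
   backup_mediagoblin_data() {
       local data_dir="$1"
       local backup_file="$2"
       # -C makes the paths in the archive relative to the data directory.
       tar -czf "$backup_file" -C "$data_dir" .
   }

   # Example call (left as a comment, because the path is a placeholder):
   # backup_mediagoblin_data /PATH/TO/YOUR/DATA "mediagoblin-$(date +%Y%m%d).tar.gz"

The archive should be written outside of the data directory, so it does not
end up including itself.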
Starting with an empty data directory, the container will create the
configuration and the database on first run. You can confirm it with ``ls
/PATH/TO/YOUR/DATA`` outside of the container.
.. code-block:: bash
$ ls /PATH/TO/YOUR/DATA
mediagoblin mediagoblin.db mediagoblin.ini paste.ini user_dev
You can also make manual changes to the data if needed.
.. warning::
   The host side of the argument to the `--volume` option must be an absolute
   path, otherwise it will be interpreted as the name of a Docker volume.
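
One way to avoid this pitfall is to resolve the directory to an absolute path
before building the option. This is a minimal sketch; the ``./data`` directory
and the ``DATA_DIR`` variable are arbitrary examples.

.. code-block:: bash

   # Create the directory if needed, then resolve it to an absolute path.
   mkdir -p ./data
   DATA_DIR="$(cd ./data && pwd)"

   # Show the option to pass to "docker run".
   echo "Bind-mount option: --volume=${DATA_DIR}:/srv"

The resulting option can then be passed to ``docker run`` in place of the
placeholder path used in the Quickstart.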
Using a `Docker volume `_ is another
way to ensure data persistence across container recreation. Rather than writing
data out into the specified host filesystem, Docker will manage the volume
(`volume-name`, in the following example) internally.
.. parsed-literal::
-v `volume-name`:/srv
While this offers the same data persistence benefits, management of the data
should be done with the ``docker volume`` command. Moreover, it may not be as
straightforward to access and back up without more Docker knowledge. Using this
approach is therefore only recommended to users already familiar with it.
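
For those who do go down this route, the following sketch shows some commonly
used commands. It assumes a volume called ``volume-name``, as above, and uses
the small ``alpine`` image as a throwaway helper to archive the volume's
contents. That image and the archive name are choices made for this example,
not something Mediagoblin provides.

.. code-block:: bash

   # List volumes and show details (including the mountpoint) of one of them.
   docker volume ls
   docker volume inspect volume-name

   # Archive the volume's contents into the current host directory, using a
   # temporary container that mounts the volume read-only.
   docker run --rm \
       --volume=volume-name:/srv:ro \
       --volume="$(pwd)":/backup \
       alpine tar -czf /backup/mediagoblin-volume.tar.gz -C /srv .
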
Administrator account
~~~~~~~~~~~~~~~~~~~~~
A default administrator account
is created by the entrypoint script. The login is ``admin``, and the
password is automatically generated if unspecified. The details of the admin
account are output in the logs the very first time a new instance is initialised.
You can override both those values on first run, by passing overrides via the
environment.
.. parsed-literal::
   docker run -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \\
     -e ADMIN_USER=myadmin -e ADMIN_PASSWORD=generateme \\
     mediagoblin/mediagoblin:|release|
.. note::
If the ``ADMIN_PASSWORD`` is set to ``generateme`` (the default), it will be
auto-generated on first run, i.e., when no database exists in the data
directory yet. The generated password will be output, once, in the startup
logs.
Alternatively, you can change the current admin password at any time by
using the ``gmg`` tool.
.. parsed-literal::
   docker run -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \\
     mediagoblin/mediagoblin:|release| \\
     gmg changepw admin ``
You can, of course, use ``gmg`` in this way for any other task you would
generally perform in non-containerised environments.
Configuring plugins
~~~~~~~~~~~~~~~~~~~
By default, no plugin is enabled in the example configuration file. As for
non-containerised deployments of Mediagoblin, :doc:`plugins can be enabled by
adding relevant sections ` to the `mediagoblin.ini`
configuration file.
However, plugins can be preconfigured when a new containerised environment is
initialised, by passing a snippet of configuration file, with embedded
newlines, for the ``[plugins]`` section via the ``PLUGINS`` environment variable.
.. parsed-literal::
   docker run --interactive --tty \\
     -p 6543:6543 -v /PATH/TO/YOUR/DATA:/srv \\
     -e PLUGINS='[[mediagoblin.media_types.audio]]\n[[mediagoblin.media_types.video]]\navailable_resolutions = 144p,240p\n' \\
     mediagoblin/mediagoblin:|release|
This mechanism is only active on first initialisation of an empty data
directory. It can, however, be forced by setting the ``FORCE_RECONFIG``
environment variable to ``true``.
.. parsed-literal::
... -e FORCE_RECONFIG=true ...
.. warning::
Force-reconfiguration has not been thoroughly tested, and may not behave flawlessly.
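
Because the ``PLUGINS`` snippet is passed as a single string, it can be hard
to proofread. Assuming that the ``\n`` sequences are meant to be expanded into
real newlines (which the example in this section suggests, but which is not
verified here), the resulting lines can be previewed locally with ``printf``
before starting a container.

.. code-block:: bash

   # Same value as in the docker run example in this section.
   PLUGINS='[[mediagoblin.media_types.audio]]\n[[mediagoblin.media_types.video]]\navailable_resolutions = 144p,240p\n'

   # %b expands backslash escapes such as \n in the argument.
   printf '%b' "$PLUGINS"

This prints the three configuration lines one per line, which makes a missing
bracket or separator easier to spot.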
Docker Compose stack
====================
Docker Compose allows you to encode more details about how to run a container, such
as volumes, ports and environment variables. This is done via a `configuration
file `_ instead of the command
line. It also allows spinning up more than one container at a time, and setting
up the necessary network environment so they can communicate with each other.
Multiple configuration files can be used at the same time, to selectively
configure or override various aspects of the desired stack. Mediagoblin takes this
approach by providing a basic ``docker-compose.yml``, which contains shared
options.
..
, and a number of additional overlays allowing to run a non-lazy
deployment locally, or a similar deployment in AWS ECS.
.. note::
Historically, ``docker-compose`` was a command separate to ``docker``
itself, but functionality has now been merged and extended. This guide
therefore uses the ``docker compose`` subcommand.
Standalone service
~~~~~~~~~~~~~~~~~~
Prior to delving into multi-container stacks, you can have a look at the
standalone ``docker-compose.standalone.yml``, which does little more than
the ``docker`` commands in the previous section. There are however two
noteworthy differences.
.. literalinclude:: ../../../docker-compose.standalone.yml
:language: yaml
First, in the ``volumes`` section, a named docker volume, ``mediagoblin-data``
is created for ``/srv``. As discussed before, the volume will be reused every
time a stack is brought up. At the end of the file, in the ``volumes`` section,
additional parameters are provided so the ``mediagoblin-data`` volume is
actually mapped to a bind mount. It is configured to use the ``data``
subdirectory of the current path where the stack was started.
Second, it uses an ``env_file``, which allows you to conveniently pass a number of
environment variables to the container. Those can include values for
``ADMIN_PASSWORD`` or ``PLUGINS``, as discussed previously.
These changes will be carried over through the next few sections.
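
As an illustration, an environment file could be created as follows. This is
only a sketch: the file name ``mediagoblin.env`` is hypothetical, and the real
name is whatever the ``env_file`` entry in the compose file points to. The
variables are the ones introduced in the Quickstart.

.. code-block:: bash

   # Write one KEY=value pair per line; no quoting or "export" is needed.
   cat > mediagoblin.env <<'EOF'
   ADMIN_USER=myadmin
   ADMIN_PASSWORD=generateme
   EOF

   # The file may contain a password, so restrict access to it.
   chmod 600 mediagoblin.env
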
.. note::
``docker compose`` uses file ``docker-compose.yml`` by default, which we'll
discuss later. To use the standalone variation, the ``-f`` option can be used.
.. code-block:: bash
docker compose -f docker-compose.standalone.yml up
.. note::
By default, docker will keep hold of the terminal, and output logs from the
application. To regain use of the terminal, you can add the ``-d`` flag at the
end of this command. To see the logs, you can then use ``docker compose logs
-f``.
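
Put together, a detached session might look like the sketch below. Since the
standalone stack does not use the default file name, the same ``-f`` option
most likely needs to be repeated for every subcommand.

.. code-block:: bash

   # Start the stack in the background.
   docker compose -f docker-compose.standalone.yml up -d

   # Check the state of its containers, then follow their logs.
   docker compose -f docker-compose.standalone.yml ps
   docker compose -f docker-compose.standalone.yml logs -f
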
As before, this will make the Mediagoblin instance available at
http://localhost:6543/. You can log in as the admin, and upload a file before moving on to the next section.
You can shut the container down with
.. code-block:: bash
docker compose -f docker-compose.standalone.yml down
Multi-container stack
~~~~~~~~~~~~~~~~~~~~~
The previous section was a light introduction to ``docker-compose.yml``
files, but didn't achieve much. We can now move on to defining more than one
service in the stack: separate Paste and Celery containers, with a side of
RabbitMQ and Nginx.
The basic ``docker-compose.yml`` file does just that.
.. literalinclude:: ../../../docker-compose.yml
:language: yaml
It is fairly similar to the standalone setup, except it defines all three
services. Both ``paste`` and ``celery`` are essentially the same, except for
the ``command`` that is executed. Some additional environment variables are set
in the ``environment`` section, most notably where to find RabbitMQ. The
``healthcheck`` of the Celery container is also adjusted to remain useful.
One last service is started, based on the official RabbitMQ images, to support
communication between both containers, and some start-up order rules are
defined via the ``depends_on`` sections.
As this configuration is in the default ``docker-compose.yml`` file, starting the stack up is fairly straightforward.
.. code-block:: bash
docker compose up
As before, this stack uses the ``mediagoblin-data`` named volume, which is
mounted in both Paste and Celery containers. If you started a fresh lazyserver
before, and uploaded some test data, you should still be able to access it now.
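
With several services running, it often helps to look at them one at a time.
The following sketch uses standard ``docker compose`` subcommands, and assumes
the service names ``paste`` and ``celery`` mentioned above.

.. code-block:: bash

   # Show the state of every service in the stack.
   docker compose ps

   # Follow the logs of the Celery worker only.
   docker compose logs -f celery

   # Restart a single service without touching the others.
   docker compose restart paste
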
.. not relevant at the moment
Working with named volumes
~~~~~~~~~~~~~~~~~~~~~~~~~~
So far, data has been stored in the ``mediagoblin-data`` named volume. You can
list all existing named volumes with
.. code-block:: bash
docker volume ls
If all the containers using them are down, you can also delete them. A quirk is
that even though Docker Compose sees it as ``mediagoblin-data``, it prefixes
its name with that of the stack getting created. By default this is simply the
name of the directory where the compose file is. In this case, the full volume
name will be ``mediagoblin_mediagoblin-data``.
.. code-block:: bash
docker volume rm mediagoblin_mediagoblin-data
Nginx reverse-proxy
~~~~~~~~~~~~~~~~~~~
When running a non-test instance, it is not recommended to expose the
application straight to the public internet. Instead, it is good practice to
put a reverse-proxy in between, to handle the fine details of the HTTP
protocol. Nginx tends to be a good choice.
As discussed in :ref:`the deployment documentation `, the
Nginx configuration needs to be adjusted to best work with Mediagoblin. For
ease of use, we build and publish a pre-configured Nginx image to Docker Hub
alongside the Mediagoblin one.
You can extend your Compose stack from ``docker-compose.yml`` by also including the Nginx service defined in ``docker-compose.nginx.yml``.
.. code-block:: bash
docker compose -f docker-compose.yml -f docker-compose.nginx.yml up
For simplicity in your own deployment, you can include all ``services`` in a
single file.
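
One possible way to produce such a single file is to let Compose render the
merged configuration. This is a sketch only, and the output file name
``docker-compose.merged.yml`` is made up for the example. Note that the
rendered file has variables and paths resolved, so it may contain values taken
from your environment files.

.. code-block:: bash

   # Render both files into one merged configuration.
   docker compose -f docker-compose.yml -f docker-compose.nginx.yml config \
       > docker-compose.merged.yml

   # Start the stack from the merged file.
   docker compose -f docker-compose.merged.yml up
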
.. note:: As the nginx container is added via an override, the ``paste``
   container continues to expose its own port to the rest of the system.
Third-party cloud providers
===========================
Containerisation of Mediagoblin offers a new way to run the service on
third-party hosts. However, as the containerisation of Mediagoblin is still
very recent, we haven't explored the various cloud providers and deployment
methods.
If you've had success with this type of deployment, please consider
`contributing your experience to the documentation
`_!
..
ecs
.. code-block:: bash
docker compose -f docker-compose.yml -f docker-compose.ecs.yml up
Dockerised Build
================
..
XXX: move this to a separate section
It is possible (and perhaps even preferred) to build Mediagoblin within a
container.
This will create a Docker image suitable to `run on its own
`_ (using :ref:`lazyserver
`), or as part of a `Docker Compose
`_ stack with separate containers
for Paste, Celery, and RabbitMQ, as well as the optional pre-configured Nginx.
Core container
~~~~~~~~~~~~~~
Unlike a local build, the only dependency required by a Docker build is the
``docker`` tool itself. When present, the ``configure`` script will prefer this
approach (unless ``--without-docker`` is explicitly passed).
The steps to perform a build nonetheless follow the familiar incantation.
.. code-block:: bash
./configure && make
This will create a build stage with the necessary build dependencies, such as
``npm`` and ``-dev`` packages, create a final image containing the built package,
and run the tests within a container started from that image.
The name of the image will be ``mediagoblin/mediagoblin:``, where
```` is set from the ``configure.ac`` e.g.,
``mediagoblin/mediagoblin:0.14.0.dev``.
When building this way, the dependencies for most plugins (:ref:`media types
` and :ref:`core plugins `) are
included. Two notable exceptions are support of :ref:`Documents ` (but
not PDFs), and :ref:`STL files `. Their dependencies (``unoconv`` and
``blender``, respectively) were deemed too large to include by default.
While the ``make``-based build is the simplest, it is possible to build custom
containers, with a preferred set of dependencies, directly with ``docker build
.``. Detailing this process is beyond the scope of this chapter. However you can
have a look at the ``Dockerfile`` to see what build arguments (``ARG``,
configurable via ``--build-arg``), are supported.
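
As a starting point, the supported build arguments can be listed straight from
the ``Dockerfile``, and then overridden on the command line. In the sketch
below, ``SOME_ARG`` and the ``custom`` tag are placeholders, not names taken
from the actual ``Dockerfile``.

.. code-block:: bash

   # List the build arguments declared in the Dockerfile, with line numbers.
   grep -n '^ARG' Dockerfile

   # Build a custom image, overriding one of them (placeholder name and value).
   docker build --build-arg SOME_ARG=value \
       --tag mediagoblin/mediagoblin:custom .
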
Python wheel and documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is also possible to build the Python Wheel and the docs out of the image,
with
.. code-block:: bash
make dist
# and
make docs
respectively.
.. note::
   Although the wheel builds successfully, it is still a work in progress and
   has not been tested yet.
Preconfigured Nginx image
~~~~~~~~~~~~~~~~~~~~~~~~~
As part of the Docker-based build process, a dedicated ``Dockerfile.nginx`` is
also created. This allows us to build the pre-configured Nginx Docker image which gets pushed to Docker Hub.
.. parsed-literal::
docker build -f Dockerfile.nginx . -t mediagoblin/nginx:|release|
..
Development
~~~~~~~~~~~
TODO
Main motivation: working on MG code in Docker
.. code-block:: bash
docker run -it -p 6543:6543 \
-v $(pwd):/opt/mediagoblin \
-v $(pwd)/data:/srv \
mediagoblin:|release|