diff --git a/docs/docs/features/ml-hardware-acceleration.md b/docs/docs/features/ml-hardware-acceleration.md
index 9f2d33cc35..ca1cb8edb1 100644
--- a/docs/docs/features/ml-hardware-acceleration.md
+++ b/docs/docs/features/ml-hardware-acceleration.md
@@ -53,6 +53,12 @@ You do not need to redo any machine learning jobs after enabling hardware accele
 3. Still in `immich-machine-learning`, add one of -[armnn, cuda, openvino] to the `image` section's tag at the end of the line.
 4. Redeploy the `immich-machine-learning` container with these updated settings.
 
+### Confirming Device Usage
+
+You can confirm the device is being recognized and used by checking its utilization. Several tools can display this, such as `nvtop` (NVIDIA or Intel) and `intel_gpu_top` (Intel only).
+
+You can also check the logs of the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, or when you search with text in Immich, you should either see a log for `Available ORT providers` containing the relevant provider (e.g. `CUDAExecutionProvider` in the case of CUDA), or a `Loaded ANN model` log entry without errors in the case of ARM NN.
+
 #### Single Compose File
 
 Some platforms, including Unraid and Portainer, do not support multiple Compose files as of writing. As an alternative, you can "inline" the relevant contents of the [`hwaccel.ml.yml`][hw-file] file into the `immich-machine-learning` service directly.
@@ -95,9 +101,22 @@ immich-machine-learning:
 
 Once this is done, you can redeploy the `immich-machine-learning` container.
 
-:::info
-You can confirm the device is being recognized and used by checking its utilization (via `nvtop` for CUDA, `intel_gpu_top` for OpenVINO, etc.). You can also enable debug logging by setting `IMMICH_LOG_LEVEL=debug` in the `.env` file and restarting the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, you should see a log for `Available ORT providers` containing the relevant provider. In the case of ARM NN, the absence of a `Could not load ANN shared libraries` log entry means it loaded successfully.
-:::
+#### Multi-GPU
+
+If you want to use multiple NVIDIA or Intel GPUs, set the `MACHINE_LEARNING_DEVICE_IDS` environment variable to a comma-separated list of device IDs and set `MACHINE_LEARNING_WORKERS` to the number of listed devices. You can run a command such as `nvidia-smi -L` or `glxinfo -B` to see the currently available devices and their corresponding IDs.
+
+For example, if you have devices 0 and 1, set the values as follows:
+
+```
+MACHINE_LEARNING_DEVICE_IDS=0,1
+MACHINE_LEARNING_WORKERS=2
+```
+
+In this example, the machine learning service will spawn two workers, one of which will allocate models to device 0 and the other to device 1. Different requests will be processed by one worker or the other.
+
+You can also use this approach to select a single specific device. For example, setting `MACHINE_LEARNING_DEVICE_IDS=1` will ensure device 1 is always used instead of device 0.
+
+Note that you should increase job concurrencies to increase overall utilization and more effectively distribute work across multiple GPUs. Additionally, each GPU must be able to load all models. It is not possible to distribute a single model to multiple GPUs that individually have insufficient VRAM, or to delegate a specific model to one GPU.
 
 [hw-file]: https://github.com/immich-app/immich/releases/latest/download/hwaccel.ml.yml
 [nvct]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
diff --git a/docs/docs/install/environment-variables.md b/docs/docs/install/environment-variables.md
index 29549586d3..b8fdf72234 100644
--- a/docs/docs/install/environment-variables.md
+++ b/docs/docs/install/environment-variables.md
@@ -164,6 +164,7 @@ Redis (Sentinel) URL example JSON before encoding:
 | `MACHINE_LEARNING_ANN`                                    | Enable ARM-NN hardware acceleration if supported                                                    |                `True`                 | machine learning |
 | `MACHINE_LEARNING_ANN_FP16_TURBO`                         | Execute operations in FP16 precision: increasing speed, reducing precision (applies only to ARM-NN) |                `False`                | machine learning |
 | `MACHINE_LEARNING_ANN_TUNING_LEVEL`                       | ARM-NN GPU tuning level (1: rapid, 2: normal, 3: exhaustive)                                        |                  `2`                  | machine learning |
+| `MACHINE_LEARNING_DEVICE_IDS`<sup>\*4</sup>               | Device IDs to use in multi-GPU environments                                                         |                  `0`                  | machine learning |
 
 \*1: It is recommended to begin with this parameter when changing the concurrency levels of the machine learning service and then tune the other ones.
 
@@ -171,6 +172,8 @@ Redis (Sentinel) URL example JSON before encoding:
 
 \*3: For scenarios like HPA in K8S. https://github.com/immich-app/immich/discussions/12064
 
+\*4: Using multiple GPUs requires `MACHINE_LEARNING_WORKERS` to be set greater than 1. Each worker is assigned a single device, with devices allocated to workers in round-robin order.
+
 :::info
 
 Other machine learning parameters can be tuned from the admin UI.
diff --git a/machine-learning/Dockerfile b/machine-learning/Dockerfile
index 3bfdf7d2e2..155d78f4a3 100644
--- a/machine-learning/Dockerfile
+++ b/machine-learning/Dockerfile
@@ -104,7 +104,7 @@ RUN echo "hard core 0" >> /etc/security/limits.conf && \
 
 COPY --from=builder /opt/venv /opt/venv
 COPY ann/ann.py /usr/src/ann/ann.py
-COPY start.sh log_conf.json ./
+COPY start.sh log_conf.json gunicorn_conf.py ./
 COPY app .
 ENTRYPOINT ["tini", "--"]
 CMD ["./start.sh"]
diff --git a/machine-learning/app/config.py b/machine-learning/app/config.py
index af2d0aa4b9..52be0a30c8 100644
--- a/machine-learning/app/config.py
+++ b/machine-learning/app/config.py
@@ -39,6 +39,10 @@ class Settings(BaseSettings):
         case_sensitive = False
         env_nested_delimiter = "__"
 
+    @property
+    def device_id(self) -> str:
+        return os.environ.get("MACHINE_LEARNING_DEVICE_ID", "0")
+
 
 class LogSettings(BaseSettings):
     immich_log_level: str = "info"
diff --git a/machine-learning/app/sessions/ort.py b/machine-learning/app/sessions/ort.py
index 1a244b7c57..00c7ad50a9 100644
--- a/machine-learning/app/sessions/ort.py
+++ b/machine-learning/app/sessions/ort.py
@@ -86,11 +86,13 @@ class OrtSession:
         provider_options = []
         for provider in self.providers:
             match provider:
-                case "CPUExecutionProvider" | "CUDAExecutionProvider":
+                case "CPUExecutionProvider":
                     options = {"arena_extend_strategy": "kSameAsRequested"}
+                case "CUDAExecutionProvider":
+                    options = {"arena_extend_strategy": "kSameAsRequested", "device_id": settings.device_id}
                 case "OpenVINOExecutionProvider":
                     options = {
-                        "device_type": "GPU",
+                        "device_type": f"GPU.{settings.device_id}",
                         "precision": "FP32",
                         "cache_dir": (self.model_path.parent / "openvino").as_posix(),
                     }
diff --git a/machine-learning/app/test_main.py b/machine-learning/app/test_main.py
index 5f8e5b9e9c..ad8986d572 100644
--- a/machine-learning/app/test_main.py
+++ b/machine-learning/app/test_main.py
@@ -210,10 +210,24 @@ class TestOrtSession:
         session = OrtSession(model_path, providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"])
 
         assert session.provider_options == [
-            {"device_type": "GPU", "precision": "FP32", "cache_dir": "/cache/ViT-B-32__openai/openvino"},
+            {"device_type": "GPU.0", "precision": "FP32", "cache_dir": "/cache/ViT-B-32__openai/openvino"},
             {"arena_extend_strategy": "kSameAsRequested"},
         ]
 
+    def test_sets_device_id_for_openvino(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        monkeypatch.setenv("MACHINE_LEARNING_DEVICE_ID", "1")  # restored after the test, so it cannot leak
+
+        session = OrtSession("ViT-B-32__openai", providers=["OpenVINOExecutionProvider"])
+
+        assert session.provider_options[0]["device_type"] == "GPU.1"
+
+    def test_sets_device_id_for_cuda(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        monkeypatch.setenv("MACHINE_LEARNING_DEVICE_ID", "1")  # restored after the test, so it cannot leak
+
+        session = OrtSession("ViT-B-32__openai", providers=["CUDAExecutionProvider"])
+
+        assert session.provider_options[0]["device_id"] == "1"
+
+
     def test_sets_provider_options_kwarg(self) -> None:
         session = OrtSession(
             "ViT-B-32__openai",
diff --git a/machine-learning/gunicorn_conf.py b/machine-learning/gunicorn_conf.py
new file mode 100644
index 0000000000..efec3a95aa
--- /dev/null
+++ b/machine-learning/gunicorn_conf.py
@@ -0,0 +1,12 @@
+import os
+
+from gunicorn.arbiter import Arbiter
+from gunicorn.workers.base import Worker
+
+device_ids = os.environ.get("MACHINE_LEARNING_DEVICE_IDS", "0").replace(" ", "").split(",")
+env = os.environ
+
+
+# Round-robin device assignment for each worker
+def pre_fork(arbiter: Arbiter, _: Worker) -> None:
+    env["MACHINE_LEARNING_DEVICE_ID"] = device_ids[len(arbiter.WORKERS) % len(device_ids)]
diff --git a/machine-learning/start.sh b/machine-learning/start.sh
index c3fda523df..552cca1f5e 100755
--- a/machine-learning/start.sh
+++ b/machine-learning/start.sh
@@ -17,6 +17,7 @@ fi
 
 gunicorn app.main:app \
 	-k app.config.CustomUvicornWorker \
+	-c gunicorn_conf.py \
 	-b "$IMMICH_HOST":"$IMMICH_PORT" \
 	-w "$MACHINE_LEARNING_WORKERS" \
 	-t "$MACHINE_LEARNING_WORKER_TIMEOUT" \