From cb437829f3d5f291ed28709508f75fd37064e122 Mon Sep 17 00:00:00 2001
From: Mert <101130780+mertalev@users.noreply.github.com>
Date: Tue, 12 Sep 2023 02:22:42 -0400
Subject: [PATCH] chore(docs): updated ML documentation (#4063)

---
 docs/docs/FAQ.md                           | 37 ++++++++++++++++++----
 docs/docs/developer/architecture.md        |  6 ++++
 docs/docs/install/config-file.md           | 23 +++++++++++++-
 docs/docs/install/environment-variables.md | 20 +++++++-----
 machine-learning/README.md                 |  6 ++--
 5 files changed, 75 insertions(+), 17 deletions(-)
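
Note on the Locust guidance in machine-learning/README.md: the user-count rule at the end of that file is a simple product. The sketch below is illustrative only and is not part of the patch. The function name `locust_users` is hypothetical, and it assumes, as the README text implies, that Locust spreads users evenly across the endpoints.

```python
def locust_users(endpoints: int, per_endpoint_concurrency: int) -> int:
    """Total Locust users needed for a target per-endpoint concurrency.

    Hypothetical helper: assumes each user runs one task at a time and
    that users are spread evenly across the queried endpoints.
    """
    if endpoints < 1 or per_endpoint_concurrency < 1:
        raise ValueError("both arguments must be positive integers")
    return endpoints * per_endpoint_concurrency


# The README's own example: 3 endpoints, 8 concurrent requests each.
print(locust_users(3, 8))  # 24
```

The result is the value to enter as the number of users in the Locust UI.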

diff --git a/docs/docs/FAQ.md b/docs/docs/FAQ.md
index b7a61b543a..4548cba372 100644
--- a/docs/docs/FAQ.md
+++ b/docs/docs/FAQ.md
@@ -39,15 +39,40 @@ This often happens when using a reverse proxy or cloudflare tunnel in front of I
 
 ### Why is Immich slow on low-memory systems like the Raspberry Pi?
 
-Immich uses optional machine-learning features to enhance search results. This feature, however, can be too heavy to run on a Raspberry Pi. To disable machine learning, comment out the `immich-machine-learning` section of your docker-compose.yml and set `IMMICH_MACHINE_LEARNING_ENABLED=false` in your .env file.
+Immich optionally uses machine learning for several features. However, it can be too heavy to run on a Raspberry Pi. You can [mitigate](/docs/FAQ.md#how-can-i-lower-immichs-cpu-usage) this or [disable](/docs/FAQ.md#how-can-i-disable-machine-learning) machine learning entirely.
 
-### How to disable machine-learning and TypeSense?
+### How can I lower Immich's CPU usage?
 
-:::warning
-Disabling both will result in poor search experience and typesense utilizes CLIP embeddings which are generated by machine-learning.
+The initial backup is the most intensive due to the number of jobs running. The most CPU-intensive ones are transcoding and machine learning jobs (Tag Images, Encode CLIP, Recognize Faces), and to a lesser extent thumbnail generation. Here are some ways to lower their CPU usage:
+
+- Lower the job concurrency for these jobs to 1.
+- Under Settings > Transcoding Settings > Threads, set the number of threads to a low number like 1 or 2.
+- Set the `TYPESENSE_THREAD_POOL_SIZE` environment variable and restart the Typesense container. For instance, `TYPESENSE_THREAD_POOL_SIZE=8` will limit it to 8 threads.
+- Under Settings > Machine Learning Settings > Facial Recognition > Model Name, you can change the facial recognition model to `buffalo_s` instead of `buffalo_l`. The former is a smaller and faster model, albeit not as good.
+  - You _must_ re-run the Recognize Faces job for all images after this for facial recognition on new images to work properly.
+- If these changes are not enough, see [below](/docs/FAQ.md#how-can-i-disable-machine-learning) for how you can disable machine learning.
+
+### How can I disable machine learning?
+
+:::info
+Disabling machine learning will result in a poor experience for searching and the 'Explore' page, as these are reliant on it to work as intended.
 :::
 
-These features can be disabled by commenting out `immich-typesense` and `immich-machine-learning` sections of the docker-compose.yml and setting `IMMICH_MACHINE_LEARNING_ENABLED=false` & `TYPESENSE_ENABLED=false` in your .env file.
+Machine learning can be disabled under Settings > Machine Learning Settings, either entirely or by model type. For instance, you can choose to disable smart search with CLIP, but keep facial recognition enabled. This means that the machine learning service will only process the enabled jobs.
+
+However, disabling all jobs will not disable the machine learning service itself. To prevent it from starting up at all in this case, you can comment out the `immich-machine-learning` section of the docker-compose.yml.
+
+### How can I disable Typesense?
+
+:::info
+Disabling Typesense will result in a poor search experience since searching is reliant on it.
+:::
+
+You can disable Typesense by commenting out the `immich-typesense` section of the docker-compose.yml and setting `TYPESENSE_ENABLED=false` in your .env file.
+
+### I'm getting errors about models being corrupt or failing to download. What do I do?
+
+You can delete the model cache volume, which is where models are downloaded. This will give the service a clean environment to download the model again.
 
 ### What happens to existing files after I choose a new [Storage Template](/docs/administration/storage-template.mdx)?
 
@@ -59,7 +84,7 @@ This is fixed by running the storage migration job.
 
 ### Why is object detection not very good?
 
-The model we used for machine learning is a prebuilt model, so the accuracy is not very good. It will hopefully be replaced with a better solution in the future.
+The default image tagging model is relatively small. You can switch to a larger model like `google/vit-base-patch16-224` by setting the model name under Settings > Machine Learning Settings > Image Tagging. You can then re-run the Image Tagging job to get improved tags.
 
 ### How can I see Immich logs?
 
diff --git a/docs/docs/developer/architecture.md b/docs/docs/developer/architecture.md
index 26faafabb3..36e6ea939f 100644
--- a/docs/docs/developer/architecture.md
+++ b/docs/docs/developer/architecture.md
@@ -89,6 +89,12 @@ The machine learning service is written in [Python](https://www.python.org/) and
 
 All machine learning related operations have been externalized to this service, `immich-machine-learning`. Python is a natural choice for AI and machine learning. It also has some pretty specific hardware requirements. Running it as a separate container makes it possible to run the container on a separate machine, or easily disable it entirely.
 
+Each request to the machine learning service contains the relevant metadata, such as the model task and model name. These settings are stored in Postgres along with other system configs. For each request, the microservices container fetches these settings in order to attach them to the request.
+
+Internally, the machine learning service downloads, loads and configures the specified model for a given request before processing the text or image payload with it. Models that have been loaded are cached and reused across requests. A thread pool is used to process each request in a different thread so as not to block the async event loop.
+
+All models are in ONNX format. This format has wide industry support, meaning that most other model formats can be exported to it and many hardware APIs support it. It's also quite fast.
+
 Machine learning models are also quite _large_, requiring _quite a bit_ of memory. We are always looking for ways to improve and optimize this aspect of this container specifically.
 
 ### Postgres
diff --git a/docs/docs/install/config-file.md b/docs/docs/install/config-file.md
index 0cf131a02f..a569df4609 100644
--- a/docs/docs/install/config-file.md
+++ b/docs/docs/install/config-file.md
@@ -54,6 +54,25 @@ The default configuration looks like this:
       "concurrency": 1
     }
   },
+  "machineLearning": {
+    "classification": {
+      "minScore": 0.7,
+      "enabled": true,
+      "modelName": "microsoft/resnet-50"
+    },
+    "enabled": true,
+    "url": "http://immich-machine-learning:3003",
+    "clip": {
+      "enabled": true,
+      "modelName": "ViT-B-32::openai"
+    },
+    "facialRecognition": {
+      "enabled": true,
+      "modelName": "buffalo_l",
+      "minScore": 0.7,
+      "maxDistance": 0.6
+    }
+  },
   "oauth": {
     "enabled": false,
     "issuerUrl": "",
@@ -75,7 +94,9 @@ The default configuration looks like this:
   },
   "thumbnail": {
     "webpSize": 250,
-    "jpegSize": 1440
+    "jpegSize": 1440,
+    "quality": 90,
+    "colorspace": "p3"
   }
 }
 ```
diff --git a/docs/docs/install/environment-variables.md b/docs/docs/install/environment-variables.md
index a006a3b0cc..98ca8595e5 100644
--- a/docs/docs/install/environment-variables.md
+++ b/docs/docs/install/environment-variables.md
@@ -188,15 +188,19 @@ Typesense URL example JSON before encoding:
 
 ## Machine Learning
 
-| Variable                                         | Description                                |       Default       | Services         |
-| :----------------------------------------------- | :----------------------------------------- | :-----------------: | :--------------- |
-| `MACHINE_LEARNING_MODEL_TTL`                     | Model TTL                                  |        `300`        | machine learning |
-| `MACHINE_LEARNING_CACHE_FOLDER`                  | ML Cache Location                          |      `/cache`       | machine learning |
-| `MACHINE_LEARNING_REQUEST_THREADS`<sup>\*1</sup> | Request thread pool size                   | number of CPU cores | machine learning |
-| `MACHINE_LEARNING_MODEL_INTER_OP_THREADS`        | Number of parallel model operations        |         `1`         | machine learning |
-| `MACHINE_LEARNING_MODEL_INTRA_OP_THREADS`        | Number of threads for each model operation |         `2`         | machine learning |
+| Variable                                         | Description                                                       |       Default       | Services         |
+| :----------------------------------------------- | :---------------------------------------------------------------- | :-----------------: | :--------------- |
+| `MACHINE_LEARNING_MODEL_TTL`<sup>\*1</sup>       | Inactivity time (s) before a model is unloaded (disabled if <= 0) |         `0`         | machine learning |
+| `MACHINE_LEARNING_CACHE_FOLDER`                  | Directory where models are downloaded                             |      `/cache`       | machine learning |
+| `MACHINE_LEARNING_REQUEST_THREADS`<sup>\*2</sup> | Thread count of the request thread pool (disabled if <= 0)        | number of CPU cores | machine learning |
+| `MACHINE_LEARNING_MODEL_INTER_OP_THREADS`        | Number of parallel model operations                               |         `1`         | machine learning |
+| `MACHINE_LEARNING_MODEL_INTRA_OP_THREADS`        | Number of threads for each model operation                        |         `2`         | machine learning |
+| `MACHINE_LEARNING_WORKERS`<sup>\*3</sup>         | Number of worker processes to spawn                               |         `1`         | machine learning |
+| `MACHINE_LEARNING_WORKER_TIMEOUT`                | Maximum time (s) of unresponsiveness before a worker is killed    |        `120`        | machine learning |
 
-\*1: It is recommended to begin with this parameter when changing the concurrency levels of the machine learning service and then tune the other ones.
+\*1: This is an experimental feature. It may result in increased memory use over time when loading models repeatedly.
+\*2: It is recommended to begin with this parameter when changing the concurrency levels of the machine learning service and then tune the other ones.
+\*3: Since each process duplicates models in memory, changing this is not recommended unless you have abundant memory to go around.
 
 :::info
 
diff --git a/machine-learning/README.md b/machine-learning/README.md
index 61ecc23ba9..cb1aa4312f 100644
--- a/machine-learning/README.md
+++ b/machine-learning/README.md
@@ -17,6 +17,8 @@ Be sure to commit the `poetry.lock` and `pyproject.toml` files to reflect any ch
 
 To measure inference throughput and latency, you can use [Locust](https://locust.io/) using the provided `locustfile.py`.
 Locust works by querying the model endpoints and aggregating their statistics, meaning the app must be deployed.
-You can run `load_test.sh` to automatically deploy the app locally and start Locust, optionally adjusting its env variables as needed.
+You can change the models or adjust options like score thresholds through the Locust UI.
 
-Alternatively, for more custom testing, you may also run `locust` directly: see the [documentation](https://docs.locust.io/en/stable/index.html). Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.
\ No newline at end of file
+To get started, you can simply run `locust --web-host 127.0.0.1` and open `localhost:8089` in a browser to access the UI. See the [Locust documentation](https://docs.locust.io/en/stable/index.html) for more info on running Locust.
+
+Note that in Locust's jargon, concurrency is measured in `users`, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.
\ No newline at end of file