mirror of
https://github.com/immich-app/immich.git
synced 2025-01-09 13:26:47 +01:00
7fc1954e2a
Fixes https://github.com/immich-app/immich/issues/5982.

There are basically three options:

1. Search `originalFileName` by dropping a file extension from the query (if present). Lower fidelity but very easy: just a standard index and an equality check.
2. Search `originalPath` by adding an index on `reverse(originalPath)` and using `starts_with(reverse(query) + "/", reverse(originalPath))`. A weird index and query, but high fidelity.
3. Add a new generated column called `originalFileNameWithExtension` or something. More storage, kinda jank.

TBH, I think (1) is good enough and easy to make better in the future. For example, if I search "DSC_4242.jpg", I don't really think it matters if "DSC_4242.mov" also shows up.

edit: There's a fourth approach that we discussed a bit in Discord and decided we could switch to in the future: using a GIN index. The minor issue is that Postgres doesn't tokenize paths in a useful way (each path is a single token, so it won't match against partial components). We can solve that by tokenizing it ourselves. For example:

```
immich=# with vecs as (select to_tsvector('simple', array_to_string(string_to_array('upload/library/sushain/2015/2015-08-09/IMG_275.JPG', '/'), ' ')) as vec) select * from vecs where vec @@ phraseto_tsquery('simple', array_to_string(string_to_array('library/sushain', '/'), ' '));
                                      vec
-------------------------------------------------------------------------------
 '-08':6 '-09':7 '2015':4,5 'img_275.jpg':8 'library':2 'sushain':3 'upload':1
(1 row)
```

The query is also tokenized with the 'split-by-slash-join-with-space' strategy. With this strategy, `IMG_275.JPG`, `2015`, `sushain` and `library/sushain` match, but `08` and `IMG_275` do not. The former fails because the token is `-08`, and the latter because the `img_275.jpg` token is matched exactly.
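To make the trade-off between options (1) and (2) concrete, here is a minimal Python sketch of the two matching rules, run in memory rather than against Postgres. The function names are hypothetical and not part of the Immich codebase. The sketch assumes that `originalFileName` is stored without its extension, and it uses Python's `startswith(prefix)` argument order for the reversed-path check.

```python
import posixpath


def strip_extension(query: str) -> str:
    """Option 1 helper: drop a trailing file extension from the query, if present."""
    stem, _ext = posixpath.splitext(query)
    return stem


def matches_file_name(query: str, original_file_name: str) -> bool:
    # Plain equality, which a standard index can serve.
    # Assumption: original_file_name is stored without its extension.
    return strip_extension(query) == original_file_name


def matches_path_suffix(query: str, original_path: str) -> bool:
    """Option 2: a suffix match expressed as a prefix match on reversed strings."""
    reversed_path = original_path[::-1]
    # The "/" anchors the match to a whole path component, so a query
    # such as "275.JPG" does not match "IMG_275.JPG".
    reversed_prefix = query[::-1] + "/"
    return reversed_path.startswith(reversed_prefix)


if __name__ == "__main__":
    path = "upload/library/sushain/2015/2015-08-09/IMG_275.JPG"
    print(matches_file_name("DSC_4242.jpg", "DSC_4242"))
    print(matches_file_name("DSC_4242.mov", "DSC_4242"))
    print(matches_path_suffix("IMG_275.JPG", path))
    print(matches_path_suffix("275.JPG", path))
```

Option (1) treats "DSC_4242.jpg" and "DSC_4242.mov" as the same query, which is the lower fidelity mentioned above. Option (2) keeps the extension and whole-component boundaries. As written, the sketch does not match a query equal to the entire path, because there is no leading "/" to anchor on.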
||
---|---|---|
api | ||
jobs | ||
docker-compose.server-e2e.yml |