[GH-ISSUE #128] Not all image formats are supported. #102

Open
opened 2026-03-02 11:46:37 +03:00 by kerem · 7 comments

Originally created by @lucius346346 on GitHub (Apr 29, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/128

Some image formats don't work correctly in Hoarder.

PNG and BMP can't be added at all using the Web UI.
WEBP can't be parsed with AI - Ollama in my case.


@scubanarc commented on GitHub (Apr 29, 2024):

Didn't test BMP or WEBP but agree with PNG:

https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png


@MohamedBassem commented on GitHub (May 1, 2024):

PNGs seem to be working fine for me.

![Screenshot 2024-05-01 at 9 20 21 AM](https://github.com/MohamedBassem/hoarder-app/assets/2418637/4f74e48b-4e72-40c0-bb25-73f61c00eab5)

As for BMP, yeah, I didn't add support for that just yet. Should be easy to add.

> WEBP can't be parsed with AI - Ollama in my case.

hmmm, yeah, this depends on the model. One thing we can consider is to convert the image before passing it to the tag inference.
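A minimal sketch of what such a pre-inference conversion could look like with sharp (the helper name and the PNG/JPEG pass-through are assumptions for illustration, not the actual Hoarder implementation):

```
import sharp from "sharp";

// Hypothetical helper: re-encode any input image as JPEG so that backends
// which only accept PNG/JPEG (e.g. Ollama's current multimodal support)
// can still process it. Input and output are base64 strings.
async function normalizeImageForInference(
  imageBase64: string,
  contentType: string,
): Promise<{ imageBase64: string; contentType: string }> {
  // PNG and JPEG are assumed to be accepted as-is.
  if (contentType === "image/png" || contentType === "image/jpeg") {
    return { imageBase64, contentType };
  }
  const converted = await sharp(Buffer.from(imageBase64, "base64"))
    .jpeg({ quality: 80 })
    .toBuffer();
  return {
    imageBase64: converted.toString("base64"),
    contentType: "image/jpeg",
  };
}
```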


@lucius346346 commented on GitHub (May 1, 2024):

> PNGs seem to be working fine for me.

Ok, that one is on me. Misconfiguration of Nginx on my part.


@Deathproof76 commented on GitHub (May 7, 2024):

> PNGs seem to be working fine for me.
>
> ![Screenshot 2024-05-01 at 9 20 21 AM](https://github.com/MohamedBassem/hoarder-app/assets/2418637/4f74e48b-4e72-40c0-bb25-73f61c00eab5)
>
> As for BMP, yeah, I didn't add support for that just yet. Should be easy to add.
>
> > WEBP can't be parsed with AI - Ollama in my case.
>
> hmmm, yeah, this depends on the model. One thing we can consider is to convert the image before passing it to the tag inference.

The problem with WEBP definitely seems to lie with Ollama's implementation (https://github.com/ollama/ollama/issues/2457): currently only PNG and JPEG work. Multimodal LLMs based on LLaVA, for example, should be able to handle WebP and many other formats too.


@Deathproof76 commented on GitHub (May 7, 2024):

@MohamedBassem maybe sharp could be used for something like this for Ollama in [inference.ts](https://github.com/MohamedBassem/hoarder-app/blob/main/apps/workers/inference.ts)? Convert to temporary JPEG images which get sent to Ollama and are discarded afterwards (disclaimer: not a programmer, don't understand the code, just used AI):

```
import { Ollama } from "ollama";
import OpenAI from "openai";
import sharp from "sharp";

import serverConfig from "@hoarder/shared/config";
import logger from "@hoarder/shared/logger";

export interface InferenceResponse {
  response: string;
  totalTokens: number | undefined;
}

export interface InferenceClient {
  inferFromText(prompt: string): Promise<InferenceResponse>;
  inferFromImage(
    prompt: string,
    contentType: string,
    image: string,
  ): Promise<InferenceResponse>;
}

export class InferenceClientFactory {
  static build(): InferenceClient | null {
    if (serverConfig.inference.openAIApiKey) {
      return new OpenAIInferenceClient();
    }

    if (serverConfig.inference.ollamaBaseUrl) {
      return new OllamaInferenceClient();
    }
    return null;
  }
}

class OpenAIInferenceClient implements InferenceClient {
  openAI: OpenAI;

  constructor() {
    this.openAI = new OpenAI({
      apiKey: serverConfig.inference.openAIApiKey,
      baseURL: serverConfig.inference.openAIBaseUrl,
    });
  }

  async inferFromText(prompt: string): Promise<InferenceResponse> {
    const chatCompletion = await this.openAI.chat.completions.create({
      messages: [{ role: "system", content: prompt }],
      model: serverConfig.inference.textModel,
      response_format: { type: "json_object" },
    });

    const response = chatCompletion.choices[0].message.content;
    if (!response) {
      throw new Error(`Got no message content from OpenAI`);
    }
    return { response, totalTokens: chatCompletion.usage?.total_tokens };
  }

  async inferFromImage(
    prompt: string,
    contentType: string,
    image: string,
  ): Promise<InferenceResponse> {
    const chatCompletion = await this.openAI.chat.completions.create({
      model: serverConfig.inference.imageModel,
      response_format: { type: "json_object" },
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            {
              type: "image_url",
              image_url: {
                url: `data:${contentType};base64,${image}`,
                detail: "low",
              },
            },
          ],
        },
      ],
      max_tokens: 2000,
    });

    const response = chatCompletion.choices[0].message.content;
    if (!response) {
      throw new Error(`Got no message content from OpenAI`);
    }
    return { response, totalTokens: chatCompletion.usage?.total_tokens };
  }
}

class OllamaInferenceClient implements InferenceClient {
  ollama: Ollama;

  constructor() {
    this.ollama = new Ollama({
      host: serverConfig.inference.ollamaBaseUrl,
    });
  }

  async runModel(model: string, prompt: string, image?: string) {
    const chatCompletion = await this.ollama.chat({
      model: model,
      format: "json",
      stream: true,
      messages: [
        { role: "user", content: prompt, images: image ? [image] : undefined },
      ],
    });

    let totalTokens = 0;
    let response = "";
    try {
      for await (const part of chatCompletion) {
        response += part.message.content;
        if (!isNaN(part.eval_count)) {
          totalTokens += part.eval_count;
        }
        if (!isNaN(part.prompt_eval_count)) {
          totalTokens += part.prompt_eval_count;
        }
      }
    } catch (e) {
      // There seems to be a bug in Ollama where you can get a successful response but still have an error thrown.
      // Using stream + accumulating the response so far is a workaround.
      // https://github.com/ollama/ollama-js/issues/72
      totalTokens = NaN;
      logger.warn(
        `Got an exception from ollama, will still attempt to deserialize the response we got so far: ${e}`,
      );
    }

    return { response, totalTokens };
  }

  async inferFromText(prompt: string): Promise<InferenceResponse> {
    return await this.runModel(serverConfig.inference.textModel, prompt);
  }

  async inferFromImage(
    prompt: string,
    contentType: string,
    image: string,
  ): Promise<InferenceResponse> {
    // Decode the base64 input into a Buffer
    const buffer = Buffer.from(image, "base64");

    // Check if the image format is webp or heic
    const isWebp = contentType.includes("image/webp");
    const isHeic = contentType.includes("image/heic");

    // If the image format is webp or heic, convert it to jpeg
    let convertedBuffer = buffer;
    if (isWebp || isHeic) {
      convertedBuffer = await sharp(buffer)
        .jpeg({ quality: 80 }) // You can adjust the quality as needed
        .toBuffer();
    }

    // Encode the (possibly converted) image as a base64 string.
    // Note: Ollama expects plain base64 in the `images` array, not a data URL.
    const convertedImage = convertedBuffer.toString("base64");

    // Run the model with the converted image. The temporary buffers are only
    // held in memory and are released by the garbage collector afterwards.
    return await this.runModel(
      serverConfig.inference.imageModel,
      prompt,
      convertedImage,
    );
  }
}
```

HEIC and WebP are just examples. But it seems that sharp doesn't even support HEIC out of the box (https://obviy.us/blog/sharp-heic-on-aws-lambda/: "only JPEG, PNG, WebP, GIF, AVIF, TIFF and SVG images"). Well, maybe it helps 😅👍
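As a side note, which input formats a given sharp build can actually decode can be checked at runtime through `sharp.format` (a small sketch; the result depends on how libvips was compiled in the deployment environment):

```
import sharp from "sharp";

// List every format this sharp/libvips build can decode from a buffer.
const decodable = Object.entries(sharp.format)
  .filter(([, info]) => info.input?.buffer)
  .map(([name]) => name);
console.log("decodable inputs:", decodable);

// Check a single format, e.g. HEIF (the container used by HEIC).
// Prebuilt sharp binaries usually report false here.
console.log("heif input supported:", sharp.format.heif?.input.buffer ?? false);
```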


@MohamedBassem commented on GitHub (May 10, 2024):

@Deathproof76 thanks for sharing the code, I'm already working on something similar using sharp as well :)


@m00nwtchr commented on GitHub (Oct 22, 2025):

AVIF support would be appreciated as well. Even better if archived images could be automatically converted into AVIF for efficiency, but that's a separate issue.
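For context, sharp can already encode AVIF, so an automatic conversion of archived images could in principle be a small step like the sketch below (the quality value is an arbitrary example, not anything Karakeep does today):

```
import sharp from "sharp";

// Sketch: re-encode an archived image buffer as AVIF to reduce storage size.
async function toAvif(original: Buffer): Promise<Buffer> {
  return sharp(original).avif({ quality: 50 }).toBuffer();
}
```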
