[GH-ISSUE #289] monolith is not embedding SVG files correctly #190

Closed
opened 2026-03-02 11:47:28 +03:00 by kerem · 7 comments

Originally created by @scubanarc on GitHub (Jul 9, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/289

Monolith is capable of embedding SVG files in the output HTML, but when I hoard a page with Hoarder that includes SVG images, the monolith output is broken.

Here's an example page that fails:

https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

If I grab this page with Hoarder, the archive is broken. Anywhere that there was an SVG there is a broken image icon.

However, if I grab it with monolith manually, the output HTML file is correct. Here's my monolith command line:

monolith -o svgtest.htm https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

My testing was done with monolith 2.8.1, downloaded directly from their GitHub releases:

https://github.com/Y2Z/monolith/releases/download/v2.8.1/monolith-gnu-linux-x86_64
kerem 2026-03-02 11:47:28 +03:00
  • closed this issue
  • added the bug label

@scubanarc commented on GitHub (Jul 9, 2024):

More info...

I entered the worker container and ran the monolith command above. From within the worker container, the SVG files are not pulled correctly. From outside the container, the SVG files are pulled and embedded correctly.

Inside the container, the resulting HTML file is 14.2 MB, while outside the container the resulting HTML file is 17.6 MB.

The monolith logs from inside and outside the container are identical.


@kamtschatka commented on GitHub (Jul 12, 2024):

hm I just ran the same test and it works fine for me.
The worker has monolith version 2.8.1 as you said:

/app/apps/workers # monolith --version
monolith 2.8.1

The file size is 17.58 MB after downloading it in the container.
Is it working for you now? If not, maybe there is a network or firewall issue on your side?


@scubanarc commented on GitHub (Jul 13, 2024):

I just ran the command in the worker container again and it worked this time. I got the full 17.6 MB capture from inside the container, which is different from a few days ago.

So I deleted my capture in Hoarder and recreated it. The problem persists. That's got me scratching my head.

I was able to delete the asset.bin and replace it with my manually captured "svgtest.htm" file (renamed) just to make sure that it wasn't a render issue, and it renders just fine as if it was captured correctly.

It's possibly a network/firewall issue, but I'm a network person and we are having no other issues that I can detect.

Can you try hoarding that page through the web interface and see if you get the full 17.6 MB? If you do, then I'll know that I'm having a local issue.


@kamtschatka commented on GitHub (Jul 13, 2024):

OK, I tried it out and can confirm that the images are not shown correctly in Hoarder.

The code shows that this is the command line used:

monolith - -Ije -t 5 -b ${baseUrl} -o ${assetPath}

I modified your command to this:

monolith -Ije -t 5 -o svg.html https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

This also works fine (please double-check) and the file contains the proper images. (I did not provide the baseUrl, but I think that is fine.)

The thing is that the code is not passing the URL to monolith. It passes the HTML from the previous crawling step instead, which seems to cause this issue.
Once you've confirmed that the above command works fine for you as well, I can dig a bit deeper into this and ask @MohamedBassem why it was implemented like this.

Edit: OK, I tried piping the HTML of the page into monolith directly, and the outcome is different: the SVGs are no longer captured. I guess we should simply make a new request to the page to get all the resources properly.


@MohamedBassem commented on GitHub (Jul 13, 2024):

> The thing is, that the code is actually also passing the html from the previous crawling step into monolith instead of providing the URL, which seems to cause this issue.

Monolith doesn't execute JavaScript. So if you have a pure SPA, monolith will see an empty page. That's why you want Chrome to first run the JavaScript and load the page, and then pass the final HTML to monolith.
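
As a rough illustration of that flow, here is a minimal sketch of how the argument list for the stdin-based invocation might be assembled. The flags are copied from the command line quoted earlier in this thread. The helper name `buildMonolithArgs` and the output path are hypothetical, and this is not the actual worker code.

```typescript
// Hypothetical helper: only the flags themselves come from the command
// quoted in this thread ("monolith - -Ije -t 5 -b ${baseUrl} -o ${assetPath}").
function buildMonolithArgs(baseUrl: string, assetPath: string): string[] {
  return [
    "-",             // take the already-rendered HTML from stdin instead of a URL
    "-Ije",          // short flags exactly as quoted in the thread
    "-t", "5",
    "-b", baseUrl,   // with stdin input there is no page URL, so a base URL is supplied
    "-o", assetPath, // where the single-file archive is written
  ];
}

// Example with a made-up output path. The array would be handed to a process
// spawner, with the HTML rendered by Chrome written to the child's stdin.
const exampleArgs = buildMonolithArgs(
  "https://musictheory.pugetsound.edu/mt21c/",
  "/tmp/asset.html",
);
console.log(["monolith", ...exampleArgs].join(" "));
```

Because monolith only ever sees the HTML text in this setup, the value given to `-b` is its only hint for where relative resources live.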


@kamtschatka commented on GitHub (Jul 13, 2024):

OK, turns out this is caused by the base URL we are passing to monolith.
Currently we pass https://musictheory.pugetsound.edu, which causes it not to work, but if we pass https://musictheory.pugetsound.edu/mt21c, it works.

Gotta figure out a way to pass the correct path to it.
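
One possible way to get there, sketched under assumptions: derive the base from the page URL itself rather than from its origin. The snippet below uses the standard WHATWG `URL` class (as in browsers and Node.js) to show how a relative reference resolves against each candidate base. The relative reference `images/example.svg` is hypothetical and not taken from the actual page, and whether monolith resolves its `-b` value in exactly the same way is also an assumption.

```typescript
// Sketch only: "images/example.svg" is a made-up relative reference.
const pageUrl =
  "https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html";

// Origin-only base, like the value that currently gets passed.
const originBase = new URL(pageUrl).origin;

// Directory that contains the page, trailing slash included.
const directoryBase = new URL(".", pageUrl).href;

// Resolve a relative reference against a base using standard URL rules.
function resolveAgainst(base: string, relativeRef: string): string {
  return new URL(relativeRef, base).href;
}

const relativeRef = "images/example.svg";
console.log(resolveAgainst(originBase, relativeRef));    // lands at the site root
console.log(resolveAgainst(directoryBase, relativeRef)); // lands under /mt21c/
console.log(resolveAgainst(pageUrl, relativeRef));       // same as the directory base
```

Under these rules, using the full page URL as the base gives the same result as using its directory, so passing the page URL may be the simplest option. A base without a trailing slash would have its last path segment dropped during resolution, which is why the sketch keeps the slash.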


@scubanarc commented on GitHub (Jul 13, 2024):

> monolith -Ije -t 5 -o svg.html https://musictheory.pugetsound.edu/mt21c/DiatonicChordsInMinor.html

Yes, this works fine from inside the container.

Strangely this file is smaller, but still complete.
