[GH-ISSUE #663] Question: How to get Chromium Browsing data to ArchiveBox when it is in a VM? #415

Closed
opened 2026-03-01 14:43:22 +03:00 by kerem · 13 comments
Owner

Originally created by @voarsh2 on GitHub (Mar 15, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/663

I'm running Chromium on my desktop, completely separate from my ArchiveBox VM.
How would I get my browser history into it?
The documentation seems to assume they're both running on the same machine?

kerem closed this issue 2026-03-01 14:43:22 +03:00

@pirate commented on GitHub (Mar 15, 2021):

They don't have to be running together, you can copy/mount your CHROME_DATA_DIR inside the VM and pass it to archivebox. Or export your history using https://github.com/ArchiveBox/ArchiveBox/blob/dev/bin/export_browser_history.sh on the desktop and pass the list of URLs to archivebox as text.
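A minimal sketch of the second option (exporting URLs as text) in Python, which runs on Windows without a shell. It assumes you point it at a copy of the browser's `History` file, which is a SQLite database with a `urls` table. The function name `export_history_urls` and the output file name are made up for illustration; this is not the linked helper script.

```python
import sqlite3
from pathlib import Path


def export_history_urls(history_db: Path, out_file: Path) -> int:
    """Write every http(s) URL from a Chromium History database to out_file.

    One URL per line, most recently visited first.
    Returns the number of URLs written.
    """
    conn = sqlite3.connect(str(history_db))
    try:
        rows = conn.execute(
            "SELECT url FROM urls ORDER BY last_visit_time DESC"
        ).fetchall()
    finally:
        conn.close()

    # Skip internal pages such as chrome://settings/ that cannot be archived.
    urls = [r[0] for r in rows if r[0] and r[0].startswith(("http://", "https://"))]

    # Force "\n" line endings so the file reads cleanly on a Linux VM.
    with open(out_file, "w", encoding="utf-8", newline="\n") as fh:
        for url in urls:
            fh.write(url + "\n")
    return len(urls)
```

The resulting text file can then be copied to the VM and fed to ArchiveBox on stdin, for example with `archivebox add < urls.txt`.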


@voarsh2 commented on GitHub (Mar 15, 2021):

> They don't have to be running together, you can copy/mount your CHROME_DATA_DIR inside the VM and pass it to archivebox. Or export your history using https://github.com/ArchiveBox/ArchiveBox/blob/dev/bin/export_browser_history.sh on the desktop and pass the list of URLs to archivebox as text.

I can't run .sh on Windows, I need Linux for that.
The VM I'm referring to is on another host system, so I can't mount the Chrome directory from my PC over the network to that host.
I might be able to install Chrome on a network share, but that's getting messy.


@mAAdhaTTah commented on GitHub (Mar 15, 2021):

> I can't run .sh on Windows, I need Linux for that.

The new WSL2 should be able to run it. If you're not familiar with it, you can find the install instructions here: https://docs.microsoft.com/en-us/windows/wsl/install-win10


@voarsh2 commented on GitHub (Mar 16, 2021):

> The new WSL2 should be able to run it. If you're not familiar you can install it here: https://docs.microsoft.com/en-us/windows/wsl/install-win10

Luckily I have Windows 10 Pro. The average home user doesn't have this, I believe.

Basically you're suggesting that I run ArchiveBox via WSL2 (Ubuntu), which I will not do. I have my own dedicated hardware. I cannot mount Chrome data into WSL2 and send my data to the remote ArchiveBox VM (over LAN).

If this is so tricky, then I think the project needs some sort of API, a Chrome extension, or a native Windows app. As it stands, if I want my Chrome browser history I must use WSL2 (on my desktop).


@pirate commented on GitHub (Mar 16, 2021):

ArchiveBox itself doesn't require WSL2; only the browser history export helper script does. I don't use Windows, nor do I have a Windows machine to test on. You'll have to figure out a different way to get your history into ArchiveBox if you don't have a way to run the bash script or share your data dir with archivebox.


@voarsh2 commented on GitHub (Mar 16, 2021):

> I don't use Windows, nor do I have a Windows machine to test on, you'll have to figure out a different way to get your history into ArchiveBox if you don't have a way to run the bash script or share your data dir with archivebox.

Basically telling me to stuff it, not your problem, because you have Linux and run this completely on your desktop computer, not as a web service on a remote machine, not that most people would be on Windows....

> ArchiveBox doesn't require WSL2

Unlikely. Even when I install Chrome, it doesn't let me specify an install location.
The only thing I can think of is moving the AppData location to a shared NTFS drive... not ideal.
The project should really think about running this like an actual web service. If I have lots of data to "archive", I'm hardly going to run this on a desktop computer unless I have tons of HDDs/USB hard drives. The average user doesn't.

Instead I have a dedicated server with tons of drives, and isolated VM environments, and I can't even send an API call or run a little script to "post" my chrome data to it....
There's no extension or way to run a script in Windows to post data to a remote address, which is what I would expect from a project like this.

Any funding options?


@pirate commented on GitHub (Mar 16, 2021):

I run it as a web service in a Docker container on Linux, and on my desktop Mac. I don't archive all my history (it would quickly fill many terabytes). It's much easier to archive a subset of your history using bookmarks or a tool like Pocket / Pinboard, and send that to your VM.
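One way to narrow history down to a subset is to keep only recently visited URLs. The sketch below assumes rows of `(url, last_visit_time)` as stored in Chromium's `History` database, where timestamps are microseconds since 1601-01-01 UTC. The helper names and the seven-day default are illustrative assumptions, not part of ArchiveBox.

```python
from datetime import datetime, timedelta, timezone

# Chromium stores timestamps as microseconds since 1601-01-01 00:00:00 UTC.
CHROMIUM_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chromium_time_to_datetime(microseconds: int) -> datetime:
    """Convert a Chromium timestamp to a timezone-aware UTC datetime."""
    return CHROMIUM_EPOCH + timedelta(microseconds=microseconds)


def recent_urls(rows, now: datetime, max_age_days: int = 7) -> list:
    """Return URLs whose last visit falls within max_age_days before `now`.

    rows is an iterable of (url, last_visit_time) tuples; `now` must be
    timezone-aware so it can be compared with the converted timestamps.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [url for url, ts in rows if chromium_time_to_datetime(ts) >= cutoff]
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the filter easy to test and rerun for a fixed date.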


@voarsh2 commented on GitHub (Mar 16, 2021):

> Much easier to archive a subset of your history using bookmarks or a tool like pocket / pinboard, and send that to your vm.

Yes, but when you say VM, you mean like oracle box, or some wsl2 or something like that on a normal PC.
I have TBs of storage... not a problem.
I just can't get my browsing data to a remote machine for ArchiveBox.
The project in its current state isn't fit for my purpose, which is sad for me. But I'm sure it works for many who only have a handful of sites to archive and don't have dedicated hardware.


@pirate commented on GitHub (Mar 16, 2021):

No, it's a dedicated VM running in the cloud, not on a local PC. Why not rsync your Chrome data dir to the server periodically and then run the export script on the VM?


@voarsh2 commented on GitHub (Mar 16, 2021):

> No it's a dedicated VM running in the cloud, not on a local PC. Why not rsync your chrome data dir to the server periodically and then run the export script on the VM?

What folder exactly needs to go to the remote?
I am using a Chromium browser (not official Chrome). Userdata folder?


@pirate commented on GitHub (Mar 16, 2021):

Yup, whatever folder contains the `Default` folder (your default chrome profile).

`C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default` <- not this
`C:\Users\<username>\AppData\Local\Google\Chrome\User Data` <- this one

Chromium is what ArchiveBox uses (not Chrome), so it should be fine. Just make sure you're on or near the latest version.
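To illustrate the layout described above, here is a small Python sketch that takes the `User Data` folder and lists the `History` file of each profile inside it. It assumes the usual Chromium naming, where extra profiles live in folders called `Profile 1`, `Profile 2`, and so on next to `Default`. The function name `find_history_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_history_files(user_data_dir: Path) -> List[Path]:
    """Return the History file of every profile found under user_data_dir.

    user_data_dir is the folder that CONTAINS "Default", not "Default" itself.
    """
    # "Default" first, then any additional "Profile N" folders in name order.
    profiles = [user_data_dir / "Default"] + sorted(user_data_dir.glob("Profile *"))
    return [p / "History" for p in profiles if (p / "History").is_file()]
```

An empty result usually means the path points one level too deep (at `Default`) or at a different browser's data folder.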


@voarsh2 commented on GitHub (Mar 17, 2021):

Closing for now.
I will need to check that rsync works with Windows; perhaps I can write a batch script that'll send it to my NTFS share.
Thanks.


@voarsh2 commented on GitHub (Mar 27, 2021):

I tried using scp from cmd to copy the files over to my remote Linux VM, but it didn't work. I can't even zip my Chromium data, as I get permission/read errors while the browser is open (I'm not going to close it every time it tries to copy /User Data/).

I hope this project can be more friendly towards Windows.
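A possible workaround for the read errors, sketched in Python: copy only the single `History` file to a temporary location and query the copy, rather than zipping the whole `User Data` folder. Whether the copy succeeds while the browser is running is an assumption that may depend on the OS and browser version, and the snapshot may miss the most recent visits. `read_urls_from_snapshot` is a hypothetical helper name.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def read_urls_from_snapshot(history_db: Path) -> List[str]:
    """Copy the History database to a temp dir and read URLs from the copy.

    Querying a copy avoids opening the live database the browser is using.
    """
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "History.snapshot"
        shutil.copy2(history_db, snapshot)
        conn = sqlite3.connect(str(snapshot))
        try:
            rows = conn.execute("SELECT url FROM urls ORDER BY id").fetchall()
        finally:
            # Close before the temp dir is removed, which Windows requires.
            conn.close()
    return [row[0] for row in rows]
```

The returned list can be written to a text file and transferred to the VM, which moves far less data than the full profile folder.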
