My personal bookmark archive system with SingleFile and a static file server
I am looking for a replacement for part of my bookmarking workflow. Currently I use a combination of Shaarli and ArchiveBox. I am keeping Shaarli, but I have decided to replace ArchiveBox with a combination of SingleFile and dufs.
The primary reason I want to replace it is that I can’t save bookmarks for offline use on the devices I carry with me, like my Chromebook and my phone. Well, technically I can, since ArchiveBox is saving… archives. But my main complaint with ArchiveBox and the other bookmarking apps that save copies of pages is that they don’t save files with human-readable names. They use obfuscated page IDs instead, which makes it impossible to sort through the archived pages without using the app.
The key features I’m looking for are:
- Save a full copy of the page.
- Write the filename in human readable format.
- Open the files with the browser and not locked to the service/app.
- Sync the files to any device and open with standard tools.
- Organize files into collections or folders.
- Tag files.
- Basic search. Mostly file names and tags.
I decided against other bookmark managers, like Linkwarden, because I don’t need to save a copy of every page I bookmark. Instead, I want a simple, cross-platform, standards-based bookmark manager (Shaarli) that just collects pages I want to visit. Then, if a page has something I want to keep, I can use SingleFile to save an archive of it to my NAS.
There are over 1,000 bookmarks in my Shaarli instance, but I don’t need literally all of them saved forever in my personal archive. I like to dump links in there and revisit as necessary. Then, when I find a good read, a quality how-to, a blog post I enjoyed, or software I used in a project, I want to keep that archived. That is when I save it with SingleFile and have it sent to the NAS. I can then search and view the archive with the dufs server.
Web server - dufs
One of my favorite self-hosted apps is dufs. I know Copyparty is trendy right now. I’ve used it on other projects, like my Pi Zero bug boredom buster build, and I even tried it with this project. But, Copyparty is too much for my needs in this project.
Here’s what I love about dufs and why I chose it:
- Tiny. The entire Docker image is 4.72MB.
- Lightning fast because of its simplicity and small size.
- Built-in WebDAV server.
- Built-in static web server. This is critical for using it with SingleFile because it displays the full archive without having to download anything. I can just click the link in dufs and it will render the page.
- Easy-to-configure access controls.
- I already have it running on my NAS 😉.
I use dufs as a web frontend for my NAS so I can easily access it from any browser, namely my Chromebook. Since it was already running, using it as the endpoint for SingleFile and the frontend for me to browse my bookmark archive was a no-brainer.
Serving static files
SingleFile compresses all of the page assets into a single html file (hence the name). This html file can be served by dufs and looks exactly like the original page. Clicking on the link in dufs opens a new tab and renders the full page without any extra tools.
SingleFile + WebDAV
In the configuration of SingleFile you can set the Destination to be WebDAV. To send the files to the right location the WebDAV URL should be something like this:
http://IP_ADDR:PORT/PATH/
⚠️ The trailing slash is important and follows Unix path syntax. Without the trailing slash, the files are saved to the root dir instead of the intended directory.
Since I am using the WebDAV feature in dufs the Destination location is https://example.com/SingleFile/ with my username & password.
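For reference, the relevant fields in SingleFile’s options look roughly like this (field labels are approximate, and the URL and username are placeholders for my setup):

```
Destination:  upload to a WebDAV server
WebDAV URL:   https://example.com/SingleFile/   <- trailing slash required
Username:     dom
Password:     ********
```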
Downloading the bookmarks directory locally
One of the reasons I love dufs is because it has easy-to-use command line options for interacting with it using curl. This is a command to download a dir as a zip and save to the current working directory on the local machine.
curl -o ./singlefile_bookmarks.zip 'https://example.com/SingleFile?zip' --user dom:[REDACTED_PASS]
You can do much more than download a zip archive, including uploading, downloading, and moving files with just curl.
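A sketch of a few of those operations. dufs exposes them as standard WebDAV verbs, so plain curl covers upload, delete, and move. The URL, paths, and credentials below are placeholders, and the guard variable keeps the commands from firing unless pointed at a live server:

```shell
# Placeholders -- point these at a real dufs instance to run the commands.
DUFS="https://example.com"
AUTH="dom:PASSWORD"

if [ -n "${RUN_LIVE:-}" ]; then
  # Upload a local file (WebDAV PUT)
  curl --user "$AUTH" -T ./page.html "$DUFS/SingleFile/page.html"

  # Delete a remote file
  curl --user "$AUTH" -X DELETE "$DUFS/SingleFile/page.html"

  # Rename/move a file with the WebDAV MOVE method
  curl --user "$AUTH" -X MOVE "$DUFS/SingleFile/old.html" \
       -H "Destination: $DUFS/SingleFile/new.html"
fi
echo "dry run (set RUN_LIVE=1 to execute against a live server)"
```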
If you are curious to see my compose file for dufs it will be at the bottom.
SingleFile
This is how I’m saving the pages for my archive. I chose SingleFile because it does exactly what I want: It saves full copies of pages in a single html file with a human readable title and will upload to a WebDAV endpoint.
SingleFile is also available for both Firefox and Chrome, so it will work on both my desktop and my Chromebook. Additionally, I can add the extension in Firefox for Android, which has almost all the same features. Basically, I can use it on all my devices.
What I love about this setup is I just have a directory of saved pages. Nothing fancy.
But, I don’t have to use SingleFile to view them. It is, in fact, just a directory of files that I can open with any browser offline. That means I can simply sync the directory to any machine I want to have these pages available on in an offline capacity. On my Chromebook I rsync the directory on a scheduled job.
🔔 TODO: I might set up Syncthing (again) just for syncing specific directories from my NAS to various devices (like the Chromebook and my phone). Not a fan of this, though.
Screenshot
In SingleFile’s settings, under “File Format”, you can choose to also save an image of the page.
⚠️ This only works on desktop apps when saving the page and not in Firefox on Android. You can still view the image on Android if saved from another machine.
Once that option is selected, you can change the file extension to png and it will load as an image.
Annotation
SingleFile has a built-in annotator, and any annotations made before saving are rendered with the archive, even in dufs.
Extracting page contents
Since I’m using the universal zip format I can unzip the file and have access to all of the assets, including all the images on the page.
I tried this with XArchiver, which ships with XFCE desktops, and didn’t have any luck extracting it. The SingleFile documentation notes that some programs can have issues extracting the archive. However, popping over to the command line and using unzip works totally fine.
Tagging
Since I’m using SingleFile and just dumping the saved pages into a directory, I have to make a custom system for tagging. I am opting to add tags to the file name, which makes them searchable. To be honest, tags are the most important feature I need, as I don’t need to search inside the documents (though that is possible if I ever want to pursue it).
This is the syntax I will be using:
name_of_file[tag1 tag2 tag3].u.zip.html
All tags are separated by spaces and placed inside brackets following the name of the site. In SingleFile I have it configured to pop up and ask for the file name before saving so I can add the tags.
Here is a real world example:
TSDProxy [linux_server homelab tailscale docker].u.zip.html
Now any simple search can find the tags.
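A sketch of what that search looks like in practice, using a few fabricated sample filenames that follow the `name [tag1 tag2].u.zip.html` scheme:

```shell
set -eu
DIR="$(mktemp -d)"
# Fabricated sample archive names for the demo
touch "$DIR/TSDProxy [linux_server homelab tailscale docker].u.zip.html"
touch "$DIR/Dufs notes [homelab selfhosted].u.zip.html"
touch "$DIR/Sourdough recipe [cooking].u.zip.html"

# Everything tagged homelab: match the tag in the file name
ls "$DIR" | grep 'homelab'

# Or with find, matching anywhere in the name
find "$DIR" -name '*tailscale*'
```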
Misc. stuff
I know other bookmark managers can save pages in various other formats, such as a PDF. If I wanted to do that I could easily convert the pages myself, instead of having all of my bookmarks saved in every format possible. It is already saving the pages in html and as a png. If I wanted a PDF I already self-host an instance of StirlingPDF that can convert the files.
My system is using dufs to serve the pages, but this isn’t necessary. In fact, if it wasn’t for my Chromebook I wouldn’t need it at all since it is just a directory of files and I have it mapped to other various devices with samba. I could have SingleFile save the pages to the connected samba share at my desktop and access with network shares on other devices. Or, I can switch to Copyparty or similar tools at any time.
As an avid note-taker I am not concerned with finding specific text inside the saved pages. I know if there is a line or specific flag I need from a page, it will be in my notes.
- - - - -
Did you like this post? Give it an upvote by clicking on the arrows below! Sending me an upvote is like you and I giving each other a high five.
🙏 😎
Thank you for reading! If you would like to comment on this post you can start a conversation on the Fediverse. Message me on Mastodon at @cinimodev@masto.ctms.me. Or, you may email me at blog.discourse904@8alias.com. This is an intentionally masked email address that will be forwarded to the correct inbox. If you enjoy the random stuff I write here, post to Mastodon, or watch on YouTube, and are feeling generous, I am open to tips on Ko-fi.
Dufs compose
```yaml
dufs:
  image: sigoden/dufs
  container_name: dufs
  environment:
    - TZ=America/Los_Angeles
    - PGID=1000
    - PUID=1000
  volumes:
    - /path/to/config/dir:/config
    - /path/to/share/dir:/data
  ports:
    - 5000:5000
  command: -c /config/config.yaml
  restart: unless-stopped
```
And the config file:
```yaml
serve-path: '/data'
bind:
  - 0.0.0.0
port: 5000
hidden:
  - '*.log'
  - '*.lock'
auth:
  - [USER]:[REDACTED]:rw
allow-all: true
log-format: '$remote_addr "$request" $status $http_user_agent'
```
Copyparty - TESTING | DEPRECATED
💡 I moved on from Copyparty because it doesn’t do anything different from dufs, which I already have running.
This is the testing branch on my desktop.
Docker
First, we need to create a compose file that will map the volumes, ports, etc.
```yaml
version: '3'
services:
  copyparty:
    image: copyparty/ac:latest
    container_name: copyparty
    user: "1000:1000"
    ports:
      - 3939:3939
    volumes:
      - /home/dominic/docker_config/copyparty/config:/cfg
      - /home/dominic/docker_config/copyparty/data:/w:z
    stop_grace_period: 15s  # thumbnailer is allowed to continue finishing up for 10s after the shutdown signal
    healthcheck:
      test: ["CMD-SHELL", "wget --spider -q 127.0.0.1:3939/?reset"]  # port must match p: in the config
      interval: 1m
      timeout: 2s
      retries: 5
      start_period: 15s
```
This compose file is mapping $HOME/docker_config/copyparty/data to the internal path of /w. This is important for the conf file as we don’t need to pass the local dir, just the dir inside the container.
Configuration
```
[global]
  e2dsa        # enable file indexing and filesystem scanning
  e2ts         # enable multimedia indexing
  ansi         # enable colors in log messages
  daw          # WebDAV write/delete
  allow-csrf   # fix CORS headers to accept uploads over WebDAV
  usernames    # allow multiple users; username must be entered at login
  p: 3939      # port
  ver          # show copyparty version in the control panel
  # theme: 2   # monokai
  # name: datasaver  # change the server name displayed in the browser
  no-robots, force-js  # try to block search-engine indexing

[accounts]
  dominic: [REDACTED]  # username: password

[/]            # create a volume at "/"
  /w           # share the docker data volume path
  accs:
    A: dominic # this user gets all permissions
```
In this config I have allowed full indexing, passed some flags to fix uploading to WebDAV from SingleFile, set the port, and created my user. No one is allowed to view any files except the user.
Failure to upload
The issue with SingleFile was that it sends an “Origin” header when uploading. The server either expects no Origin header or requires it to match its own origin. SingleFile sends the extension’s origin instead, which throws errors and blocks the upload. To fix this I simply disabled the header checks with allow-csrf. This would be a problem if the server were public.
SingleFile & Copyparty | DEPRECATED
I am passing “usernames” in the configuration file, so now we can add the user and pass to the creds section of SingleFile for the WebDAV path.
⚠️ If you don’t enable usernames you need to put the password in the username field in SingleFile.