Displaying my bookmarks RSS feed on a static page in Hugo

Posted on Feb 7, 2024

I was recently inspired by Ana Rodrigues to start regularly posting my bookmarks to the blog. The goal is to have a place where I can share some of my favorite things I’ve discovered.

Rather than a regular blog post, below are my notes on how I created a page on my Hugo blog that displays the title, description, and URL of items tagged web-list in my self-hosted Shaarli bookmarks manager. The site is rebuilt automatically once a day to pick up any changes to the feed, so the page updates on a rolling basis.

Shaarli provides RSS feeds for bookmarks, including separate RSS feeds for tags. I keep Shaarli as a LAN-only tool so I don’t expose all of my bookmarks. But, I want to expose specific bookmarks onto my Hugo blog. The notes below include:

Create a Hugo shortcode to pull the RSS feed, clean it up, and display it in a way that fits my blog theme.
Make a static page that displays this information.
Automate rebuilding the Hugo site once a day to pull in any changes to the RSS feed.

I will preface all of this by saying I am not a developer or an IT professional. I am a landscape contractor who likes to play with computers and self-hosting. Everything I do is self-taught through curiosity and effort (plus some good notes). I’m not friends with anyone who is a professional in these fields, so most of the time it’s just me feverishly reading blog posts and application documentation.

I’m sure there are better ways to do this. But, this works for me and I’m incredibly proud of what I’ve accomplished with this website.

Step 1: Create shortcode

I created this shortcode by reverse engineering a few different posts from people who were doing something similar. This file lives in /layouts/shortcodes and is named shaarli_feed.html.

{{ $u := "https://favorites.proxy.TLD/feed/atom?&searchtags=web_list" }}
{{ with resources.GetRemote $u }}
  {{ with .Err }}
    {{ errorf "%s" . }}
  {{ else }}
    {{ with . | unmarshal }}
      {{ range .entry }}
      <div>
        <h2>{{ .title }}</h2>
        <p>Added: {{ .updated | time.AsTime | time.Format ":date_long" }}</p>
        <p>{{ index .content "#text" | replaceRE "<br>|&#8212; <a[^>]*>.*?</a>" "" | safeHTML | truncate 300 }}</p>
        <p><a href="{{ index .link `-href` }}">Link</a></p>
      </div>
      {{ end }}
    {{ end }}
  {{ end }}
{{ else }}
  {{ errorf "Unable to get remote resource %q" $u }}
{{ end }}

Some key points about this shortcode:

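{{ range .entry }} is the only way I was able to get it to work. Other people had range .limit with some qualifiers, but that never worked for me. This is likely because the Shaarli URL above is an Atom feed, where each item is an <entry> element, rather than an RSS 2.0 feed, where items are nested under <channel>.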
replaceRE "<br>|&#8212; <a[^>]*>.*?</a>" "" removes the <br> tags, plus the em dash and the <a> tag (with its contents) that Shaarli adds as a permalink. I only wanted the description of the bookmark, but the Shaarli feed XML puts a permalink in the <content> section. This is to remove that.
index .content "#text" pulls the text from <content> in the RSS XML from Shaarli. It took a while to figure this one out. When you look at the XML file for an RSS feed, there are tags for certain kinds of content. Some call it <description> or something else. In the Shaarli XML for the RSS feed, the description is in a tag called <content>. This can be adjusted depending on the RSS XML.
I asked Google Bard for help with this and it gave me re.ReplaceAll as the function for removing specific HTML from the .content section. It didn’t work and I had to look around. I eventually found this, which shows the correct function is replaceRE.
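
To see what that replaceRE pattern is doing outside of Hugo, here is a minimal shell sketch that applies a similar substitution with sed. The sample content string and its example.invalid URL are made up for illustration and are not copied from a real Shaarli feed. Also note that sed -E has no non-greedy .*?, so [^<]* is swapped in to match the link text.

```shell
#!/bin/sh
# Hypothetical <content> text shaped like what the post describes:
# a description, a <br>, then an em dash entity and a permalink anchor.
content='A handy article about Hugo.<br>&#8212; <a href="https://example.invalid/shaare/abc">Permalink</a>'

# Same idea as the replaceRE call: drop <br> and the "&#8212; <a ...>...</a>" tail.
cleaned=$(printf '%s\n' "$content" | sed -E 's/<br>|&#8212; <a[^>]*>[^<]*<\/a>//g')

printf '%s\n' "$cleaned"
```

If the output still has leftovers, the real feed markup probably differs from this assumed sample and the pattern needs adjusting.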

I used Google Bard on and off throughout this process and I will say it was mostly useless. The information it provided was consistently wrong. It wasn’t that it didn’t understand what I was doing; it was just flat wrong. The code samples would include tags that don’t exist. The information would clearly contradict the official Hugo documentation. I would ask follow-up questions and it would admit being wrong, then give another wrong answer.

If any IT pro is reading this, why do you use AI for coding? Is Bard just that bad? Is Copilot better?

Create page

For the bookmarks page all we need is a markdown file like this:

---
title: Bookmarks
description: Recent bookmarks I found interesting. This is a rolling page, updated once a day.
date: 2024-02-06
tags: bookmarks
---

{/{/< shaarli_feed >/}/}

Remove the / from the brackets. Those have been added so this post won’t call the shortcode and display the entire contents of the feed.

This calls the shortcode and displays the content. This file lives in /content.

Refreshing page

I’m still trying to figure this out. I don’t know what triggers the feed to be pulled again by Hugo.

Looking at this page I think I’m having cache issues. Stopping Hugo and then reloading with --ignoreCache loads the new list right away.

So my startup command looked like this (deprecated, don’t use):

hugo serve --bind=0.0.0.0 --baseURL=https://blog.ctms.me --appendPort=false --environment=production --disableFastRender --ignoreCache

I found this is better:

hugo serve --bind=0.0.0.0 --baseURL=https://blog.ctms.me --appendPort=false --environment=production --disableFastRender --cacheDir /path/to/blog_files/cache/

In this launch command I am passing the directory for the cache, which I created inside the root directory for the site. This makes it so I can easily clear the cache whenever I want to and then point to it with a clean slate.

Rebuilding automation

First I need to create a fish alias for starting the blog. I prefer using fish as my shell, and the instructions will be different depending on your shell. I am also using the snap version of Hugo for the added security of snap confinement. Also, the version in apt is absolutely ancient.

alias blog_run='cd /path/to/blog_files && /snap/bin/hugo serve --bind=0.0.0.0 --baseURL=https://blog.ctms.me --appendPort=false --environment=production --disableFastRender --cacheDir /path/to/blog_files/cache/'

funcsave blog_run

This command changes to the blog directory and then starts Hugo with all the flags.
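
For anyone not on fish, a rough equivalent for bash or another POSIX shell might look like the sketch below. BLOG_DIR and HUGO_BIN are hypothetical variable names introduced only for this example, with the placeholder paths from the post as defaults. The flags are the same ones used in the fish alias.

```shell
#!/bin/sh
# BLOG_DIR and HUGO_BIN are hypothetical names used for this sketch;
# the defaults are the placeholder paths from the post.
BLOG_DIR=${BLOG_DIR:-/path/to/blog_files}
HUGO_BIN=${HUGO_BIN:-/snap/bin/hugo}

# Change to the blog directory, then start Hugo with the same flags as the fish alias.
blog_run() {
    cd "$BLOG_DIR" || return 1
    "$HUGO_BIN" serve --bind=0.0.0.0 --baseURL=https://blog.ctms.me \
        --appendPort=false --environment=production --disableFastRender \
        --cacheDir "$BLOG_DIR/cache/"
}
```

Putting this in a shell startup file such as ~/.bashrc would make blog_run available in new sessions, which is roughly what funcsave does for fish.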

Now I’m creating a script that will stop the blog, delete the cache directory, and then restart the blog. We can do this with screen instead of tmux. I found this thread about killing a screen using the CLI.

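#!/bin/bash
# Stop the running Hugo server by quitting its screen session
screen -X -S "hugo" quit
# Delete the cached remote resources so the feed is fetched again
rm -rf /path/to/blog_files/cache/filecache/getresource
# Start a new detached screen session named "hugo" that runs blog_run in fish
screen -S hugo -d -m fish -c 'blog_run; exec fish'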

It works! This script kills the screen, then deletes the cache folder that has the cache file inside (doing it this way in case the file name changes), and then restarts screen with the right name and the blog_run command.
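
One thing to watch with rm -rf is that it exits quietly when the path doesn’t exist, so a mistyped cache path would go unnoticed. Here is a minimal sketch of a guard, assuming a hypothetical helper named clear_feed_cache and a throwaway demo directory rather than the real blog path:

```shell
#!/bin/sh
# clear_feed_cache DIR (hypothetical helper): delete DIR, but fail loudly
# when DIR is missing instead of silently doing nothing.
clear_feed_cache() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "cache directory not found: $dir" >&2
        return 1
    fi
    rm -rf -- "$dir"
}

# Demo against a temporary directory; the layout below is only an assumption.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/cache/filecache/getresource"

clear_feed_cache "$demo_root/cache/filecache/getresource" && echo "cleared"
clear_feed_cache "$demo_root/cache/filecache/getresource" || echo "nothing to clear"

rm -rf -- "$demo_root"
```

The first call removes the directory and the second one reports that it is already gone, which is the signal a plain rm -rf would never give.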

I then set up a cron job to run the script every day at 1:01 PM.

01 13 * * * cd /path/to/scripts && sh blog_reload.sh
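
Cron reads the first five fields as minute, hour, day of month, month, and day of week, in that order. As a quick sanity check, this sketch splits the entry above into those fields with plain shell. The variable names are only for illustration.

```shell
#!/bin/sh
# Split the crontab entry into its five schedule fields plus the command.
# The quoted heredoc keeps the shell from expanding the * characters.
read -r minute hour day_of_month month day_of_week command <<'EOF'
01 13 * * * cd /path/to/scripts && sh blog_reload.sh
EOF

printf 'minute=%s hour=%s\n' "$minute" "$hour"
printf 'command=%s\n' "$command"
```

So this entry fires at 13:01 each day. For exactly 1:00 PM, the first field would be 0 instead of 01.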
