
Static Website - Github Pages

[Image: the Google search home page with "your name here" typed into the query input]

You don't always need a full stack. When all you need to do is show information, a static website is enough. And with Github Pages, it is possible to have a fully functioning website for free, just like this one! It could just be the perfect solution for your portfolio page.

In fact, in 2024, static sites can do a bit more than just send information. This site, for example, has a comment section on each project page (scroll down and see for yourself). There's an ocean's worth of webdev lessons and tutorials out there, but not many really talk about getting the site up and indexed. So instead of talking about how to code a website, I thought I'd go over some of the things I've learned in building this website and getting it bing-ready, as well as some of the interesting things you can get working on static websites today.

Why Not a CMS?

Good question. You can absolutely use a Content Management System. Github Pages even has one built in: Jekyll. But I had a very specific vision for my website. Story time.

I bought my domain name a while ago. I had been learning web development, and thought I might like to do it professionally. And so, naturally, I would need an online portfolio. Eventually, I changed my mind, and decided that I prefer to play with mono fonts as a hobby, so I scrapped that idea. But I kept the domain name, just in case I decided I might eventually want to do something with it. And I did, I'm doing it now, it's this site. Anyway, when I was learning all these amazing CSS and Javascript spells and incantations, two things would often happen: more code would lead to more bugs, and I was never satisfied with a project. I would always want to improve or fix something, and nothing ever got finished.

Then I came across a website, just some motherfucking website, and I realised then that less is more. When I decided I wanted to build this website to log my projects, I decided I wanted it to follow those design principles (or lack thereof). In doing very little, I wanted to achieve a responsive, hierarchical website that just worked everywhere, and loaded up fast. You don't get that with CMSs. And to that end, I use very little javascript on this site (even though I love javascript): it highlights my syntax, runs a filter feature on the home page, and runs the comments sections on these project pages. I don't import fonts, and as much as possible is hosted in the website itself. The only external service is the comments because, well, static.

So, to keep to those design principles, I've opted to do everything myself (and also because I enjoy coding). That doesn't mean things have to be difficult, though. After looking at the things you need to get done to pass the bing test, we will take a look at automating some of the tedious stuff with python scripts.

Github Pages

So, assuming you've got a basic website built and ready to go, you will need somewhere to serve the files from. I'm not savvy enough to want to brave self hosting on a home server, so I opted for Github Pages. If you're here reading this, then chances are you either have a github account, or are familiar with the git CLI. Github Pages websites are free, and though there are limitations, for a portfolio page, a blog, or documentation they are ideal.

There are two different types of github pages websites. The first is the user/organisation site. This will give you a url in the format <username>.github.io, which, in itself, is pretty tidy as far as free urls go. You will be able to use advanced DNS configuration with a DNS provider such as namecheap if you go this route, meaning it will be much easier to set up with search engines. The second is a project site. Although trickier, it is still possible to set up a custom domain in this case, and without advanced DNS configuration, setting up with search engines will involve adding meta tags into the site headers for verification.

Custom Domain Name

You don't need a custom domain name. But with a DNS provider, such as namecheap, you get the benefit of advanced DNS configuration, making it far easier to set up SEO, using CNAME and txt records, instead of uglying up your code with unnecessary meta tags. Also, <yourname>.com looks far nicer, and more professional, than <yourname>.github.io. The process for getting custom domains set up for user and organisation pages is pretty thoroughly documented by github here. For project pages, there is a solution that worked for me over at Stack Overflow here.
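
To give a concrete picture, on a user/organisation site the advanced DNS records end up looking something like this. The A record IPs are the ones github documents for Pages; the bracketed values are placeholders for your own details:

Type     Host    Value
A        @       185.199.108.153
A        @       185.199.109.153
A        @       185.199.110.153
A        @       185.199.111.153
CNAME    www     <username>.github.io
TXT      @       <verification-token-from-the-search-engine>

Github Pages also expects a CNAME file in the root of the repository containing the custom domain itself; it creates one for you when you set the domain in the repository settings.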

Getting Bing Ready

Okay, so Bing is not the only search engine. But out of Bing and Google, Bing is definitely the hard-arse parent that isn't going to tolerate any slack from you. Google is quite lax, provided your urls are all correct and your site is verified. But Bing runs a number of tests and won't index if your site fails them. The only other pure search engines remaining today might be Yahoo and DuckDuckGo, but both of those pull indexes off of Bing, so through the hoops we jump. There may well be more issues than what I bring up here, but assuming you're starting with something like the emmet boilerplate, you shouldn't run into too many more than I did.

H1's

They tried to tell you. You didn't listen. Semantics is important. But you kept following those CSS gurus on youtube like they were bitches on heat. Now all you know about web dev is how to nest spans in divs in spans in divs to make a sidemenu icon change shape when the menu is hidden.

Bing wants to see an h1 on every page. Be smart and use it on the main title of the relevant content of each page. I didn't really see the need for one on my splash page, but Bing flagged it, so I had to put one in and cancel the styling on it.

Alt Text

Yes, semantics is important. And yes, I regularly forget to put the alt text on images. Oh, fine. I just don't bother: I remember, and then decide I'm too lazy to do it. Anyway, Bing pulled me up on it, and now I try to be a good web dev.

Meta Description

SEO has changed over the years, and search engines rely less on the meta tags in html headers than they used to. However, Bing still wants a meta description tag, and it won't index a page without one. You can throw in the meta keywords list and all the other bells and whistles if you wish, but just make sure the description is in there, as it seems to be the one Bing cares about.
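
If you'd rather catch these complaints before Bing does, a few lines of python will do it. This is just a rough sketch using the Beautiful Soup module that turns up again later in the automation scripts, and it assumes this site's flat structure, so adjust the paths to suit your own:

import os
from bs4 import BeautifulSoup

# pages to check - the paths here assume this site's structure
pages = ['index.html', 'home.html'] + [f'projects/{f}' for f in os.listdir('projects')]

for page in pages:
    with open(page) as f:
        soup = BeautifulSoup(f.read(), 'lxml')
    # flag the things Bing wants to see on every page
    if not soup.select_one('h1'):
        print(f'{page}: missing h1')
    if not soup.select_one('meta[name=description]'):
        print(f'{page}: missing meta description')
    for img in soup.select('img'):
        if img.get('alt') is None:
            print(f'{page}: image missing alt text ({img.get("src")})')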

Favicon

Technically, this wasn't really an obstacle to getting indexed, but it is rumoured that it can improve your ranking in search engines, and it also gets rid of that annoying console warning. The easiest way is to use one of the favicon.io favicon generators: they will not only render all the files required, they'll also provide the markup for the headers, as well as a site.webmanifest file that lets android users create shortcuts to your website on their homescreen. Just don't forget to update the name and short_name values in that file before pushing.
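
For reference, site.webmanifest is just a small json file, roughly like the following (the name values here are placeholders, and the icon filenames are the ones favicon.io generates by default):

{
    "name": "Your Site Name",
    "short_name": "YourSite",
    "icons": [
        { "src": "/android-chrome-192x192.png", "sizes": "192x192", "type": "image/png" },
        { "src": "/android-chrome-512x512.png", "sizes": "512x512", "type": "image/png" }
    ],
    "theme_color": "#ffffff",
    "background_color": "#ffffff",
    "display": "standalone"
}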

Sitemap

There are two main types of sitemap. The first is the html sitemap, intended for users, but search engines can also crawl it to find links to all the pages on your site. The second is the xml sitemap, which you can submit directly to search engines, so they have an immediate directory of your site's urls.

Creating an html sitemap isn't too difficult if you have already managed to code a static website, so I'll look at the xml sitemap, as there's a bit to learn with xml if you haven't used it much before.

The sitemap xml file isn't too complicated, and is a good thing to look at before moving onto RSS feeds, which are similar but have a fair bit more going on. Let's take a look at the sitemap for this website, and then break it down:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://aaronwattsdev.com/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/home/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/projects/gpicase/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/projects/kde-plasma-bigscreen/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/projects/media-keyboard/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/projects/pi5-desktop/</loc>
    </url>
    <url>
        <loc>https://aaronwattsdev.com/projects/retropie-nespi-4/</loc>
    </url>
</urlset>

This should all look a little familiar if you've ever played around with svgs in the browser, as it's the same type of markup. The first line just states which xml version and encoding the document is using.

<?xml version="1.0" encoding="UTF-8"?>

The next line is the opening tag for the root element, <urlset>. It's what it sounds like, a set of urls.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- urls go here -->
</urlset>

Nested within the <urlset> element are the <url> elements, one for each page in your website that you wish to be known to a search engine's crawler. Nested within each <url> element is the information about that url. <loc> contains the url itself, and is the only tag that's required within the <url> tag, but you can check what other tags can be included, such as <lastmod>, <changefreq> and <priority>, in the protocol documentation at sitemap.org.

<url>
    <loc>https://aaronwattsdev.com/</loc>
</url>

A CMS would write these out for you, and doing them by hand can become a bit boring, as well as prone to human error, so we will take a look at writing scripts to build these xml files for us shortly. But first, let's look at the rss feed.

RSS Feed

So, the following xml file contains the bare minimum for an rss feed to function in most readers. The required elements are all there. We can add more for a far better experience, but already, it is quite a bit to write each time you want to add to the feed. Further, all the required information can be gathered from within our project in some way, so before we flesh this file out to be a better feed, we would be foolish not to get some kind of automation set up to do this for us.

But first, let's have a quick look at what we have so far:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>AaronWattsDev Projects</title>
        <link>https://aaronwattsdev.com/home</link>
        <description>Projects in coding, raspberry pi, linux and more</description>
        <item>
            <title>Raspberry Pi 5 - Desktop Computer</title>
            <link>https://aaronwattsdev.com/projects/pi5-desktop</link>
            <description>This is less of a guide, and more just an outline of how, and why, I'm doing it. As well as a review of how it's going. It's not here as clickbait, you might not want to do it yourself, but I'm here to say that it's working for me.</description>
        </item>
        <item>
            <title>KDE Plasma Bigscreen - Raspberry Pi 4</title>
            <link>https://aaronwattsdev.com/projects/kde-plasma-bigscreen</link>
            <description>I built this media centre at the start of 2024. There may be easier methods of installation by now. When I built this project, existing guides mentioned an installation image on KDE's bigscreen page, however, it doesn't seem to exist currently, except for a manjaro image, and I prefer to stick to Raspberry Pi's own debian OS when I can, especially that it ensures the best compatibility with things like cases and accessories.</description>
        </item>
        <item>
            <title>RetroPie - RetroFlag NesPi 4 Case</title>
            <link>https://aaronwattsdev.com/projects/retropie-nespi-4</link>
            <description>The Raspberry Pi 4, with Retroflag's NesPi 4 Case, running RetroPie is the ultimate retro gaming rig. Lower power consumption? Check. Light gun games? You got it. Great customisation options? Of course. Did you know you can even use a wiimote as the stylus for the Drastic Nintendo DS Emulator?? Did I hear someone say Wario Ware?</description>
        </item>
        <item>
            <title>Media Keyboard - Pico Controller</title>
            <link>https://aaronwattsdev.com/projects/media-keyboard</link>
            <description>My keyboard doesn't include the media keys, so I built a macro keyboard to control the media on my desktop computer. It is plug'n'play and requires no configuration to work between different devices. The project can be easily tweaked to program macro's for work, steam games, and whatever else you can't be bothered to type out manually.</description>
        </item>
        <item>
            <title>GPi Case 2 - Compute Module 4</title>
            <link>https://aaronwattsdev.com/projects/gpicase</link>
            <description>The GPi Case 2 with Raspberry Pi's Compute Module 4 is the ultimate portable gaming device. The case offers amazing functionality, and when combined with a CM4 and 64-bit Recalbox, there is little that can challenge it for a portable retro gaming experience.</description>
        </item>
    </channel>
</rss>

So, ignoring the first line, which is the same as in the sitemap xml file, we can see that the root element in the rss feed is the <rss> element. The rss version is specified as an attribute; it is using rss 2.0. Within that is the nested <channel> element. Inside the <channel> element, we have the feed information: <title>, <link> and <description>. This provides rss aggregators with the information they need about the channel itself.

<rss version="2.0">
    <channel>
        <title>AaronWattsDev Projects</title>
        <link>https://aaronwattsdev.com/home</link>
        <description>Projects in coding, raspberry pi, linux and more</description>
        <!-- rss items go here -->
    </channel>
</rss>

Finally, we have the rss items themselves. These each consist of an <item> element, each with a nested <title>, <link> and <description> element. What's helpful about these elements is that the information they require is all available on this site's home page, where each project is listed with links and descriptions. This will come in handy when we automate the feed.

<item>
    <title>Raspberry Pi 5 - Desktop Computer</title>
    <link>https://aaronwattsdev.com/projects/pi5-desktop</link>
    <description>This is less of a guide, and more just an outline of how, and why, I'm doing it. As well as a review of how it's going. It's not here as clickbait, you might not want to do it yourself, but I'm here to say that it's working for me.</description>
</item>

Automating the Boring Stuff

One of the main benefits of using a CMS is that you don't have to do, or even think about, most of this stuff. Not only is amending the sitemap and rss feed by hand another job that can be time consuming, forgotten, or even done wrong, it's also not necessary. Even without a CMS, we can make our own lives easier with a bit of messy hacking. First let's handle the easy one, which is the sitemap. But before we begin, it's worth mentioning that a sensible directory structure will help a lot with automation, and that what I write here isn't necessarily going to work for your own site's structure, so you will need to hack around with it yourself to get it working for you. I will talk about the structure I use and how it affects or benefits the scripts I'm writing, for reference.

A Sitemap Generator in Python

So the sitemap is essentially just a set of links to each html page in the website. Easy: my site only has two folders containing any html files at all, the root and projects/ directories.

/
index.html
home.html
projects/
    gpicase.html
    kde-plasma-bigscreen.html
    media-keyboard.html
    pi5-desktop.html
    retropie-nespi-4.html

With this in mind, all we need to do is build a <url> element for each file, append it to the root element, and then write it all to a file. We can use the os module to probe our filesystem, and the xml.etree.ElementTree module to parse and write the xml. Both are included in python3.

The following snippet will build a root element, append a subelement to it, and fill in the text. We also have to register the xml namespace we will be using before we make the root element, and set the namespace on the root element after we have created it; not doing so can result in an ns0 prefix inserting itself into all the following elements down the tree.

import xml.etree.ElementTree as ET

# create the root element and set namespace
ET.register_namespace('', 'http://www.sitemaps.org/schemas/sitemap/0.9')
urlset = ET.Element('urlset')
urlset.set('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9')

# append a subelement to the root element
url = ET.SubElement(urlset, 'url')
loc = ET.SubElement(url, 'loc')
loc.text = 'https://aaronwattsdev.com/'

We are going to have to build a lot more urls, so that may as well be extracted out into a function. We will pass the filepath to it as a parameter, and use it to build the url itself. We will also set the default path param to an empty string, for when we need to build the index url.

def build_url(path=''):
    url = ET.SubElement(urlset, 'url')
    loc = ET.SubElement(url, 'loc')
    loc.text = f'{root_url}{path}'

I'm going to keep the script simple, and just call the two files in the root directory directly, as I don't expect these to change any time soon. But the projects directory is likely to have new files going into it regularly, so for that, I will use the os python module to iterate through those files, and call the build_url function, passing each path with the .html extension truncated off.

build_url()
build_url('home/')

for filename in os.listdir('projects/'):
    build_url(f'projects/{filename[:-5]}/')

Finally, we just need to declare the element tree, indent it for readability, and write it to sitemap.xml. We can tell the write function to include the xml declaration and set the encoding.

tree = ET.ElementTree(urlset)
ET.indent(tree)
tree.write('sitemap.xml', xml_declaration=True, encoding='UTF-8')

A quick look at the finished script and you will see I have stored the base url as a string, to help in the build_url function. It only gets used on the one line within the function, but declaring it at the start of the script makes it easier to change later if we need to.

import os
import xml.etree.ElementTree as ET

root_url = 'https://aaronwattsdev.com/'

def build_url(path=''):
    url = ET.SubElement(urlset, 'url')
    loc = ET.SubElement(url, 'loc')
    loc.text = f'{root_url}{path}'

ET.register_namespace('', 'http://www.sitemaps.org/schemas/sitemap/0.9')
urlset = ET.Element('urlset')
urlset.set('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9')

build_url()
build_url('home/')

for filename in os.listdir('projects/'):
    build_url(f'projects/{filename[:-5]}/')

tree = ET.ElementTree(urlset)
ET.indent(tree)
tree.write('sitemap.xml', xml_declaration=True, encoding='UTF-8')

A Simple RSS Generator in Python

There is a bit more to do in this script. As well as urls for each page, we will need a title and a description for each project. Well, that all exists in /home.html, and we can use the python bs4 module, a.k.a. Beautiful Soup, to read and parse the html file and extract all the data we need. Unlike os and ElementTree, bs4 isn't in the standard library, so it needs installing first (pip install beautifulsoup4 lxml, as we're also using the lxml parser).

from bs4 import BeautifulSoup

with open('home.html') as f:
    txt = f.read()
    soup = BeautifulSoup(txt, 'lxml')

projects = soup.select('.project')

for project in projects:
    title = project.select_one('h2').text
    link = project.select_one('a')['href']
    description = project.select_one('.description').text

The next step is to build the xml, which is essentially the same process as with the sitemap, although the rss elements are a little more complex.

import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# strings for the channel information
base_url = 'https://aaronwattsdev.com'
title_text = 'AaronWattsDev Projects'
description_text = 'Projects in coding, raspberry pi, linux and more'

# build a single element
# params: parent, type, text
# returns: element
def build_element(el_parent, el_type, el_text=''):
    element = ET.SubElement(el_parent, el_type)
    # leaves text blank if empty
    if len(el_text):
        element.text = el_text
    return element

# build rss item element
# params: title, link, description
def build_item(title, link, description):
    link_text = f'{base_url}{link}'
    rss_item = build_element(rss_channel, 'item')
    build_element(rss_item, 'title', title)
    build_element(rss_item, 'link', link_text)
    # join and split to remove excess whitespace from html formatting
    build_element(rss_item, 'description', ' '.join(description.split()))

# make root element
rss = ET.Element('rss')
rss.set('version', '2.0')

# append channel element and append channel details
rss_channel = build_element(rss, 'channel')
build_element(rss_channel, 'title', title_text)
build_element(rss_channel, 'link', f'{base_url}/home')
build_element(rss_channel, 'description', description_text)

# make soup
with open('home.html') as f:
    txt = f.read()
    soup = BeautifulSoup(txt, 'lxml')

# parse projects from soup and build rss items with data
projects = soup.select('.project')
for project in projects:
    title = project.select_one('h2').text
    link = project.select_one('a')['href']
    description = project.select_one('.description').text
    build_item(title, link, description)

# build and write xml tree
tree = ET.ElementTree(rss)
ET.indent(tree)
tree.write('rss.xml', xml_declaration=True, encoding='UTF-8')

A Better RSS Feed

We have the basics now for a working RSS feed, but loaded up into an aggregator, it looks a little empty compared to other RSS feeds out there. There are a few tweaks we can make to improve it. By including the atom namespace and an <atom:link> element in the channel, we can increase the feed's compatibility with more aggregators. We can also use <enclosure> and media namespace elements (<media:thumbnail> and <media:content>) to include images in the feed. A <guid> element helps aggregators work out whether they have already received an item. And some aggregators use the <category> element to categorise feeds.

There are other elements you can include, such as publication and update dates, but I am not yet including this information in articles; I may implement it in the near future. Check the rssboard media-rss documentation for more quality-of-life elements around media.

<?xml version='1.0' encoding='UTF-8'?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <atom:link href="https://aaronwattsdev.com/feed.xml" rel="self" type="application/rss+xml" />
        <title>AaronWattsDev Projects</title>
        <link>https://aaronwattsdev.com/home</link>
        <description>Projects in coding, raspberry pi, linux and more</description>
        <category>Technology</category>
        <item>
            <title>Raspberry Pi 5 - Desktop Computer</title>
            <link>https://aaronwattsdev.com/projects/pi5-desktop</link>
            <description>This is less of a guide, and more just an outline of how, and why, I'm doing it. As well as a review of how it's going. It's not here as clickbait, you might not want to do it yourself, but I'm here to say that it's working for me.</description>
            <guid>https://aaronwattsdev.com/projects/pi5-desktop</guid>
            <enclosure url="https://aaronwattsdev.com/images/projects/pi5-desktop.jpg" length="0" type="image/jpeg" />
            <media:thumbnail url="https://aaronwattsdev.com/images/projects/pi5-desktop.jpg" width="1920" height="1080" />
            <media:content type="image/jpeg" url="https://aaronwattsdev.com/images/projects/pi5-desktop.jpg" />
        </item>
    </channel>
</rss>

My design choices and project structure have influenced how I have written the script to collect all the required information to build the individual rss items. For example, I don't have any image links on the page I am scraping - I just didn't want the page to have to load an image for every article. However, I have only one image per article, and each image has the same filename as the html file it's associated with. So by knowing the page url, the script can easily derive the image url that's associated with it.

/
index.html
home.html
projects/
    gpicase.html
    kde-plasma-bigscreen.html
    ...
images/
    projects/
        gpicase.jpg
        kde-plasma-bigscreen.jpg

A different solution here could be to include a hidden element in each project listing on the home page, with inner text that defines an image url. Here is the final script for generating a more complex rss feed. There's a lot happening, but broken down, it's not too different from the simple rss script; it just has more jobs to do now:

import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

base_url = 'https://aaronwattsdev.com'
title_text = 'AaronWattsDev Projects'
description_text = 'Projects in coding, raspberry pi, linux and more'

# builds a basic xml element, populates text, and appends to specified parent
def build_element(el_parent, el_type, el_text=''):
    element = ET.SubElement(el_parent, el_type)
    if len(el_text):
        element.text = el_text
    return element

# builds a rss item from scraped project data
def build_item(title, link, description):
    link_text = f'{base_url}{link}'
    rss_item = build_element(rss_channel, 'item')
    build_element(rss_item, 'title', title)
    build_element(rss_item, 'link', link_text)
    build_element(rss_item, 'description', ' '.join(description.split()))
    build_element(rss_item, 'guid', link_text)
    build_media(rss_item, link)

# called by build_item: builds complex media elements and appends to item parent
def build_media(el_parent, el_link):
    img_link = f'{base_url}/images{el_link}.jpg'
    enclosure = ET.SubElement(el_parent, 'enclosure')
    enclosure.set('url', img_link)
    enclosure.set('length', '0')
    enclosure.set('type','image/jpeg')
    media_thumbnail = ET.SubElement(el_parent, 'media:thumbnail')
    media_thumbnail.set('url', img_link)
    media_thumbnail.set('width', '1920')
    media_thumbnail.set('height', '1080')
    media_content = ET.SubElement(el_parent, 'media:content')
    media_content.set('type', 'image/jpeg')
    media_content.set('url', img_link)

# Declare the XML namespace prefixes to be used
ET.register_namespace('atom', 'http://www.w3.org/2005/Atom')
ET.register_namespace('media', 'http://search.yahoo.com/mrss/')

# create root rss element and set attributes
rss = ET.Element('rss')
rss.set('version', '2.0')
rss.set('xmlns:atom', 'http://www.w3.org/2005/Atom')
rss.set('xmlns:media', 'http://search.yahoo.com/mrss/')

# add necessary child elements to root element and set attributes if required
rss_channel = build_element(rss, 'channel')
atom_link = ET.SubElement(rss_channel, 'atom:link')
atom_link.set('href', f'{base_url}/feed.xml')
atom_link.set('rel', 'self')
atom_link.set('type', 'application/rss+xml')
build_element(rss_channel, 'title', title_text)
build_element(rss_channel, 'link', f'{base_url}/home')
build_element(rss_channel, 'description', description_text)
build_element(rss_channel, 'category', 'Technology')

# parse contents of home.html
with open('home.html') as f:
    txt = f.read()
    soup = BeautifulSoup(txt, 'lxml')

# create and populate an item element for each project
projects = soup.select('.project')
for project in projects:
    title = project.select_one('h2').text
    link = project.select_one('a')['href']
    description = project.select_one('.description').text
    build_item(title, link, description)

# build and write the tree
tree = ET.ElementTree(rss)
ET.indent(tree)
tree.write('feed.xml', xml_declaration=True, encoding='UTF-8')

Further Automation

We could expand on this even further. The list of projects in home.html could, with not much more work than we've already done, be generated from the files in the projects/ directory. Right now I haven't implemented published dates for each article, so unless I handle it another way, articles will be listed in alphabetical order, and I think I would prefer to have the newest appear first. We could even strip away the repeated code from each project page, such as the headers, links and style tags, and have Beautiful Soup build the pages from very simple html pages that only contain the article information, the way Jekyll builds from markdown files, but as we are working directly in html, we would have far more control over CSS and JavaScript (see the sketch below). Then, after that, I'm not really sure what a CMS could offer that we haven't already just done ourselves. These are things I hope to look at in the near future, but for now, I can feel my pigments fading and I would like to go out into the sunlight for a while, so let's start wrapping this up.
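
To give a rough idea of that last templating point, here is a minimal sketch of how it could work with Beautiful Soup. It assumes a hypothetical template.html holding the shared boilerplate with an empty <main>, and a content/ directory of bare article fragments - neither exists in this project yet, it's just the shape of the idea:

from bs4 import BeautifulSoup
import os

# hypothetical layout: template.html holds the shared headers and style tags with an
# empty <main>, and content/ holds bare article fragments that each fill in a <main>
with open('template.html') as f:
    template_txt = f.read()

for entry in os.scandir('content'):
    with open(entry.path) as f:
        article_soup = BeautifulSoup(f.read(), 'lxml')
    # start from a fresh copy of the template for each page
    page_soup = BeautifulSoup(template_txt, 'lxml')
    # swap the template's empty <main> for the article's <main>
    page_soup.select_one('main').replace_with(article_soup.select_one('main'))
    # write the assembled page out to the projects directory
    with open(f'projects/{entry.name}', 'w') as outf:
        outf.write(page_soup.prettify())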

Index Page Generator

Eagle eyed readers may have noticed that I have, in fact, added published dates to the projects on this site. I did also write a script that takes the existing projects and generates the content within the home.html page, where the full list of projects currently lives. It was not an incredibly complex script either.

Using the Beautiful Soup Python module, the script crawls through the projects directory, and scrapes the relevant data required to make project summaries and links. And, thanks to the power of Beautiful Soup, we can work with the html much like we would with the DOM, and instead of rewriting the entire page, we can simply rewrite the contents of the element that contains the full list of projects, which, in my case, is the main element.

from bs4 import BeautifulSoup
import os
from datetime import date

projects = []

# crawl projects folder
for f in os.scandir('projects'):
    path_name = f'projects/{f.name}'
    with open(path_name) as projectf:
        project_txt = projectf.read()
        project_soup = BeautifulSoup(project_txt, 'lxml')
    # scrape the relevant data
    title = project_soup.select_one('h1').text
    intro = project_soup.select_one('p#intro').text
    description = ' '.join(intro.split())
    html_date = project_soup.select_one('time')
    project_date = date.fromisoformat(html_date['datetime'])
    keywords = project_soup.select_one('meta[name=keywords]')['content'].split(',')
    keywords = [kw.lstrip() for kw in keywords]
    
    projects.append({
        'title' : title,
        'description' : description,
        'keywords' : keywords,
        'date' : project_date,
        'datehtml' : html_date,
        'link': f'/{path_name[:-5]}'
    })

# callback function to organise into order of date
def date_sort(e):
    return e['date']

# sort projects by date - newest first
projects.sort(reverse=True, key=date_sort)

with open('home.html') as inf:
    txt = inf.read()
    soup = BeautifulSoup(txt, 'lxml')

# select and clear main element
main_element = soup.select_one('main')
main_element.clear()

# for each project, create and add the relevant html to the main element
for project in projects:
    project_div = soup.new_tag('div')
    project_div['class'] = 'project'
    project_header = soup.new_tag('h2')
    project_header.string = project['title']
    project_div.append(project_header)
    project_date = soup.new_tag('time', datetime=project['datehtml']['datetime'])
    project_date.string = project['datehtml'].text
    project_div.append(project_date)
    project_ul = soup.new_tag('ul')
    project_ul['class'] = 'topic-container'
    for topic in project['keywords']:
        topic_li = soup.new_tag('li')
        topic_li['class'] = 'topic'
        topic_li.string = topic
        project_ul.append(topic_li)
    project_div.append(project_ul)
    project_link = soup.new_tag('a', href=project['link'])
    project_link.string = 'Go to project'
    project_div.append(project_link)
    project_description = soup.new_tag('p')
    project_description['class'] = 'description'
    project_description.string = project['description']
    project_div.append(project_description)
    main_element.append(project_div)

# write changes to file home.html
with open('home.html', 'w') as outf:
    outf.write(soup.prettify())

Adding A Comments Section

This wasn't something that I was previously aware you could do with static websites, and I'm not even sure what first brought my attention to it. But it turns out there are quite a few options here. There are one or two options that use github issues to serve the comments, which is actually pretty smart. But the one I chose to go with was Cactus Comments.

Cactus Comments is free, and doesn't require users to sign in like the solutions based on github issues do. It runs on matrix, which is a federated chat and messaging service. And I'm a big fan of federation. As it runs on matrix, you can use the Element desktop and mobile apps to get alerts of, reply to, and moderate comments (there is an arm 64-bit version for linux too, which is great now that I do all of my computing on my raspberry pi 5!).

I won't explain how to get started; the documentation for cactus is pretty easy to follow. At first look you'll wonder if there's stuff missing from it, as it all seems too simple, but it really is as simple as it looks!
