DevLog: Implementing a Weblog Markdown Parser

Published November 6, 2025

Over the past couple of weeks, I've been building a personal website to showcase my projects and whatever else I want to share. "Whatever else" includes a weblog hosted on the site, where I can add content incrementally and which is easy both to manage and to browse. To keep content management simple, I've chosen to store my articles in their own Git repository, separate from the website code. This means my weblog catalog is independent of the website application (which actually ends up being just a template), making it easy to manage with the deployed configuration.

Since the article files are meant to be human-readable, easy to edit, and universally compatible across common renderers, I wrote the content in Markdown. The simplest option would have been to just link to the source Markdown files, which GitHub can render natively. However, I wanted to provide a more polished reading experience on the site and tie in the application's theming. I also plan to add more features in the future that would require integrating the content into the application, one way or another.

Parsing the Options

There was no need to reinvent the wheel here; there are many established Markdown parsers out there. These parsers transpile Markdown syntax into HTML, wrapping segments of content in the appropriate HTML tags to structure the content and apply styling in a web browser. The challenge was simply finding one that met the requirements I had in mind. The parser needed to:

convert common Markdown syntax to HTML,
render Markdown footnotes,
and include some support for adding conditional HTML classes for styling.

The most active and popular choice that met these requirements was markdown-it.

In most other cases, it's not necessary to add custom HTML classes (identifying labels used to target specific elements for styling or scripting) to the parsed elements. Typically, one could simply use tag names for styling: e.g., .article h1. But since my dynamic theme system needs to apply different styles based on additional context (not just the element being styled), styling by tag name wouldn't work. To the best of my knowledge, there currently isn't a runtime protocol for applying the Cascading Style Sheets (CSS) definitions of one selector to another, something like Sass's @extend directive.

.article h1 {
    @extend .typography-heading-1;
}

If there were, HTML tags could be mapped to the appropriate theme classes (i.e. the HTML class names generated by the theme system) in the CSS, without needing to modify the HTML output of the parser. Alas, this is not the case, and the HTML output of the parser needs to include the HTML classes matching the theme classes.

Thankfully, the render interface exposes the tokenized elements during parsing, allowing HTML classes to be added as needed. In the end, it was easy enough to implement the conditional application of HTML classes with markdown-it. The only exception was the footnote plugin, which required a small patch to apply styling correctly to the footnote references (the superscript numbers inline with the text).^[1]
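To make the idea concrete, here is a minimal, self-contained sketch of mapping tags to theme classes on a token stream. It is not the site's actual code: the Token interface is a simplified stand-in for the parser's real token objects (markdown-it's tokens offer a comparable attrJoin method), and every class name other than typography-heading-1 is an assumption made up for illustration.

```typescript
// Simplified stand-in for a parser token (an assumption for illustration;
// the real token objects carry more fields).
interface Token {
  type: string;               // e.g. "heading_open"
  tag: string;                // e.g. "h1"
  attrs: [string, string][];  // attribute name/value pairs
}

// Hypothetical theme classes keyed by tag name. Only typography-heading-1
// appears in the post; the others are invented for this sketch.
const themeClasses: Record<string, string> = {
  h1: "typography-heading-1",
  h2: "typography-heading-2",
  p: "typography-body",
};

// Append a class to a token, joining with any class already present.
function addClass(token: Token, className: string): void {
  for (const attr of token.attrs) {
    if (attr[0] === "class") {
      attr[1] = attr[1] + " " + className;
      return;
    }
  }
  token.attrs.push(["class", className]);
}

// Add the mapped theme class to every opening token that has a mapping.
function applyThemeClasses(tokens: Token[]): Token[] {
  for (const token of tokens) {
    const className = themeClasses[token.tag];
    if (className && /_open$/.test(token.type)) {
      addClass(token, className);
    }
  }
  return tokens;
}

// Render an opening tag from a token, including its attributes.
function renderOpenTag(token: Token): string {
  const attrs = token.attrs
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");
  return `<${token.tag}${attrs}>`;
}
```

Keeping the tag-to-class table as plain data means the renderer never needs to know which theme is active; it only attaches names that the theme CSS resolves later.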

Thinking Abstractly

One additional thing I needed to consider when rendering an article page was the metadata that browsers, search engines, and other services expect when displaying pages or links. Basic search engine optimization (SEO) requires a page title and a summary description to be included in the HTML document. To handle this, I created a separate directory of article abstracts within the Git repository that mirrors the article directory structure. Each abstract file includes the title and description for the corresponding article file. Putting the optional article abstract into its own Markdown file keeps the article content to its own dedicated file. Both the abstract and article files can each be rendered to provide either a simple article preview or the full article, respectively.
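As a rough sketch of how the mirrored layout could be used, the snippet below maps an article path to its abstract path and pulls a title and description out of an abstract file. The "abstracts" directory name and the abstract layout (first "# " heading as the title, first following text line as the description) are assumptions for illustration; the post does not specify either.

```typescript
// Hypothetical directory names; the post only says that the abstracts
// directory mirrors the articles directory.
const ARTICLES_DIR = "articles";
const ABSTRACTS_DIR = "abstracts";

interface ArticleMeta {
  title: string;
  description: string;
}

// Map an article path to the mirrored abstract path,
// e.g. "articles/hello-world.md" -> "abstracts/hello-world.md".
function abstractPathFor(articlePath: string): string {
  const prefix = ARTICLES_DIR + "/";
  if (articlePath.indexOf(prefix) !== 0) {
    throw new Error(`Not an article path: ${articlePath}`);
  }
  return ABSTRACTS_DIR + "/" + articlePath.slice(prefix.length);
}

// Assumed abstract layout: the first "# " heading is the title and the
// first non-empty, non-heading line is the description.
function parseAbstract(markdown: string): ArticleMeta {
  let title = "";
  let description = "";
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (title === "" && line.indexOf("# ") === 0) {
      title = line.slice(2).trim();
    } else if (description === "" && line.charAt(0) !== "#") {
      description = line;
    }
  }
  return { title, description };
}
```

The resulting title and description could then feed the page's title and meta description tags, while the same abstract file still renders as a short preview.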

I Do Declare

There wasn't much else to it, really. Since the theme styles are globally available as plain CSS, there was no need to implement any theme-specific logic in the parser. Had I used scoped styles (e.g., CSS Modules, Svelte component style definitions) or CSS-in-JavaScript (e.g., styled-components, Emotion), this wouldn't have been as simple. The article component would have needed to be aware of the current theme and dynamically apply the correct styles to match it.

While I have criticized declarative models in the past, this solution is a great example of how declarative design can significantly simplify implementation. Simply mapping HTML elements to theme classes leaves the styling logic to the CSS engine, allowing me to focus on the composition and structure of the content. And because the HTML classes used by the application are just semantic tokens within the theming system, the theme CSS generator takes care of the rest naturally.

Not-So-Hyper Text

In terms of code, this update to render articles within the application was quite trivial. However, a lot had to be considered beyond simply choosing the right parser library. When designing a content system, one of the core questions that must be addressed is: how do I format the content data? With any system that's not fully self-contained, data needs to be stored and transmitted in a consistent format. If that format changes with new features, all existing content may need to be updated to match — which can be challenging to manage, even if it's just emptying the browser cache of a few users. Your data structure is always coupled with the software that consumes it, one way or another.^[2] And since I chose to store my articles outside the application repository, I needed to really think about how I structured the content.

So, the idea was to keep things as simple as possible, and there's nothing simpler than plain text. Markdown isn't much more than plain text, and Markdown's syntax is as close to plain text styling as you can get. At a glance, a Markdown document just looks like an old text file from before rich text editors were commonplace, when documents were broken up with dashes, slashes, and pound signs. This means that my content will remain quite stable and accessible, regardless of how Markdown specifications change over time. Since I'm relying on an off-the-shelf parser to handle the Markdown validation, it shouldn't be too difficult to ensure that the content meets the expected structure, as long as it's easily decoded text (which is really anything transferred over HTTP).

Text is Plain

But this does raise another question: how will I add more features in the future? For example, one of my goals is to allow embedding images and other media files directly in the articles. Also, it would be really great to feed in code files and design documents directly from the source files. Since this would require merging content from multiple sources, and likely different formats, this wouldn't be appropriate to handle with a Markdown parser alone.

The simplest solution to consider would be to extend the Markdown syntax with custom directives that the parser could interpret, then pass back the appropriate command to the application and embed the content correctly.

On GitHub, you can just include a link to raw files directly, like so:
https://github.com/systemcarl/weblog/blob/2bd7694f413c43640033e09db42eca7aad80b58d/articles/hello-world.md?plain=1#L2

However, this only works if you're viewing the content in an environment that understands that GitHub URLs link to a file resource. Any other parser would just leave the directives as-is in the output HTML as unintelligible text. This is not ideal, but there could be ways to mitigate this problem with cleverly designed link types or code blocks that would still fall back to a standard element if whatever new feature (file embedding in this example) were not supported.

Including a query parameter could signal that the link should be embedded:
[Example](https://github.com/systemcarl/weblog/blob/2bd7694f413c43640033e09db42eca7aad80b58d/articles/hello-world.md?embed=1&plain=1#L2)

When parsed with any standard Markdown parser, this would just render as a normal link that directs to the file on GitHub (so long as the embed parameter is irrelevant to GitHub). The downside is that when previewing the content on GitHub, you no longer see embedded content.
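A minimal sketch of how an application might detect such a link is shown below, using the standard URL API. The embed=1 parameter is the hypothetical convention from the example above, and the EmbedRequest shape and parseEmbedLink name are made up for illustration. Links without the flag return null, so the caller can fall back to rendering an ordinary link.

```typescript
interface EmbedRequest {
  url: string;              // link target with the embed flag removed
  startLine: number | null; // line number from a "#L2"-style fragment, if any
}

// Return an embed request when the link opts in via the (hypothetical)
// "embed=1" query parameter; return null so the caller renders a plain link.
function parseEmbedLink(href: string): EmbedRequest | null {
  let parsed: URL;
  try {
    parsed = new URL(href);
  } catch {
    return null; // relative or malformed links are left alone
  }
  if (parsed.searchParams.get("embed") !== "1") return null;

  // Drop the flag so the remaining URL points at the original resource.
  parsed.searchParams.delete("embed");
  const match = /^#L(\d+)/.exec(parsed.hash);
  return {
    url: parsed.toString(),
    startLine: match ? Number(match[1]) : null,
  };
}
```

Because the flag lives in the query string, a renderer that knows nothing about it still produces a working link, which is the fallback behavior described above.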

Another option would be to allow HTML in the content. This is generally supported by most Markdown parsers. Common HTML elements can be used directly within the Markdown content, allowing for more complex structures to be included in the document. And since HTML is also just a plain text format, it fits well within my design's principles of simplicity and stability. However, HTML is not nearly as nice to read and thus requires a more involved HTML renderer to present the content in an accessible way. But if used sparingly within a Markdown document, this solution could still work well as a compromise between readability and functionality — the odd HTML tag appearing in a text document would not present a polished experience, but the average reader would likely understand the intent and carry on unfazed.

A Common Content Concern

Since the start of my project, content format (how the content is both stored and presented) has been a common concern for me. This simple but critical element of the design is a key decision that will impact the overall structure and experience of the application. The structure of your data shapes the outcome of a project, well beyond the current iteration of its software. Deciding how to structure the theming system also required carefully considering not just my initial design, but how the design might evolve over time, with more features and different themes. In the same way, building out the content template required thinking about how the content might be structured to handle different use cases beyond just my own personal website. Each of these problems required thinking abstractly about how the underlying data (be it the article content, the theme definitions, or other application configurations) is transformed into the final user experience.

I expect this will continue to be a theme (philosophical, not visual) as this project continues. As I add more features to the underlying application, the constraints will continue to tighten around how the content must be structured. The application still needs to provide a way to browse, filter, and sort articles — which will likely require some additional metadata to be included with each article. But at this point, keeping the system simple and flexible was the best way to prevent the destiny of the application from being determined by today's assumptions.

As for the article content, the Markdown format is simple and stable. I haven't had any issues working with Markdown files. I've since added themed syntax highlighting and extended the parsing to handle GitHub-style alert blockquotes. It has also been easy to update and edit articles (thanks, Ren^[3]), since the Markdown files are just plain text hosted on GitHub. Worst case, if I decide to scrap the whole website and application, I still have a neat and tidy collection of Markdown articles that anyone can read.

[1] This is a footnote, here! This definition links back to the text where it was referenced.
[2] Discussion of the content files' coupling to the application's implementation came from feedback provided by fellow software developer Chris Adkins.
[3] Another huge thanks to my partner, Ren, for their editorial assistance and unrelenting emotional support.

Edited by Renata Soljmosi