Tracking the "last modified" date in markdown

  • javascript
  • markdown
  • git
posted on 6th January 2022 ∙ by Hynek Svacha ∙ 8 min read

I’ve migrated my blog completely to Markdown (MDX) after I realized that it’s probably the best match for my workflow. So far, I’m quite happy with the decision. However, I was missing a few features that typically come out of the box when using a conventional CMS, like tracking the date of the last edit.

I will go through a couple of possible solutions to this problem.

Use a Markdown CMS

If you are using VSCode, there is the vscode-front-matter plugin that claims to “give you the power and control of a full-blown CMS running straight in the editor.”

I tested it briefly, and it looks cool. It has some issues, e.g., it doesn’t seem to fully support MDX at the time of writing (although this is probably being fixed).

While I would probably recommend this plugin to a friend if asked, it was not the solution I chose. Why?

Firstly, I like the pristine experience of creating posts solely by editing text files (just like the elders around the campfires remember creating web pages using nothing but HTML and CSS).

Secondly, I didn’t want to become dependent on another tool (however cool it could be).

Use filesystem metadata

My first, rather naïve, attempt was to use metadata that are accessible from the filesystem.

But there is an issue: while some metadata do live in the file itself (like image EXIF data), the main info (including the last edit) is stored in the filesystem itself. That’s why it is available even when the file is completely empty (its length is exactly 0 bytes).

(I don’t want to dive too deep into details, because I am nowhere near an expert on this topic.)

The fs built-in library of Node.js has multiple methods to access file metadata. For the sake of simplicity, I will only refer to the synchronous API method, fs.statSync.

This method takes a path and returns a fs.Stats instance. So, to get the last edit date, you can do something like this:

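import * as fs from "fs"
import * as path from "path"

const filePath = path.join(process.cwd(), `<PATH>/<TO>/<FILE>.mdx`)
// mtime is a Date holding the time the file contents were last modified
const { mtime } = fs.statSync(filePath)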

Caveats

Using the edit date from the file system can be helpful, but it has quite a limited use case.

It can be used if you are directly deploying the static build, like when you build the site using Next.js or Gatsby and then deploy the contents of an out or public dir using Netlify’s drag’n’drop feature (or good ol’ FTP).

It won’t work if you are deploying using continuous integration, like when your site is linked to a GitHub repo and is being built from scratch on someone’s (e.g., Netlify’s) server each time you push a new commit to GitHub.

Why? Because the files are created from scratch on every build. So, while it hopefully won’t break anything, the last modified date on all your files will match the date of the last build, which is probably not what you want.
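To make the problem tangible, here is a minimal sketch that imitates what happens on a build server. It backdates a file with fs.utimesSync and then writes its contents to a new file, the way a fresh checkout would. The temp directory and the file names are made up for the demo.

```javascript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"

// Hypothetical scratch directory standing in for both the local copy and the CI checkout
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "mtime-demo-"))
const original = path.join(dir, "post.mdx")
const rebuilt = path.join(dir, "post.rebuilt.mdx")

// Write a post and pretend it was last edited on 1st January 2021
fs.writeFileSync(original, "# Hello")
const oldDate = new Date("2021-01-01T00:00:00Z")
fs.utimesSync(original, oldDate, oldDate)

// The build server writes the same contents anew, so the file gets a fresh mtime
fs.writeFileSync(rebuilt, fs.readFileSync(original))

const originalMtime = fs.statSync(original).mtime
const rebuiltMtime = fs.statSync(rebuilt).mtime

console.log(originalMtime.getUTCFullYear()) // 2021
console.log(rebuiltMtime > originalMtime) // true
```

The contents are identical, yet the second file claims to have been modified just now, which is exactly what every post would report after a CI build.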

Next.js getStaticProps example (filesystem)

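import * as fs from "fs"
import * as path from "path"
// Assumed third-party imports (not shown in the original snippet)
import matter from "gray-matter"
import { serialize } from "next-mdx-remote/serialize"

export const getStaticProps = async context => {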
  const slug = String(context.params?.slug)
  const filePath = path.join(process.cwd(), `_mdx_/${slug}/index.mdx`)
  const rawContents = fs.readFileSync(filePath, "utf8")

  const { mtime } = fs.statSync(filePath)

  const { content, data: meta } = matter(rawContents)

  const mdxSource = await serialize(content, {
    scope: meta,
  })

  return {
    props: {
      source: mdxSource,
      slug,
      content,
      meta: {
        ...meta,
        // Next.js requires props to be JSON-serializable, so the Date is converted to an ISO string
        dateLastModified: mtime.toISOString(),
      },
    },
  }
}

Use Git

There is good news, though: if you are deploying using continuous integration, then you most likely track your files with Git. And Git actually stores a lot of metadata about the committed files.

Using the git log command, we can obtain two dates related to the last modification: the author date and the committer date. The “author date” is when the commit was originally made, while the “committer date” is when the commit was last rewritten (e.g., using --amend or rebase). In the examples, I’ll be using the “author date.”

But how do you obtain it in Node.js? There is a built-in library, child_process, which enables us to spawn a separate process that can run any system command. For simplicity’s sake, we will use its synchronous method execSync to capture the output of the parameterized git log command. The method returns a Buffer, so we’ll need to convert it to a date string using the toString() method:

import { execSync } from "child_process"

const lastAuthorDate = execSync(
  `git log -1 --pretty=format:%aI -- <PATH>/<TO>/<FILE>.mdx`,
).toString()
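If you want both dates at once, the following sketch may help. It assumes the %cI placeholder (committer date, strict ISO 8601) next to %aI, and it swaps execSync for execFileSync, which passes the arguments to git without a shell, so a path with spaces needs no quoting. The helper names parseCommitDates and getLastCommitDates are mine, not part of any library.

```javascript
import { execFileSync } from "child_process"

// Split the "authorDate|committerDate" line printed by git into Date objects.
// Returns null when git printed nothing (e.g., the file has no commits yet).
function parseCommitDates(output) {
  const line = output.trim()
  if (line === "") return null
  const [authorDate, committerDate] = line.split("|")
  return {
    authorDate: new Date(authorDate),
    committerDate: new Date(committerDate),
  }
}

// Hypothetical helper: asks git for the dates of the last commit touching the file
function getLastCommitDates(filePath) {
  const output = execFileSync(
    "git",
    ["log", "-1", "--pretty=format:%aI|%cI", "--", filePath],
    { encoding: "utf8" },
  )
  return parseCommitDates(output)
}
```

Calling getLastCommitDates("<PATH>/<TO>/<FILE>.mdx") inside a Git repository should return an object with two Date instances, or null for a file that was never committed.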

Caveats(?)

There are just a few minor ones that I know of. We need to keep in mind that Git doesn’t track the moment edits are saved, but rather the moment they are committed. That is not necessarily bad, just different from the way the filesystem tracks updates. (Once I was baffled by the funny dates of my post edits, then I realized that I had just kinda forgotten about the need to commit them.)

The second difference I can think of is in the way Git tracks file path changes. When renaming or moving a file, you’ll have to make a commit, and the commit date will appear as the last modification date, which is not optimal (present-day filesystems, AFAIK, would ignore such a name and/or path change).

We can avoid this, but it’ll take some effort. Git, among other things, reports a brief status for each file in a commit (A for added, M for modified, R for renamed/moved; see the full docs). Therefore, if we log the status along with the dates, we can easily filter out those starting with R:
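One way to catch forgotten commits is to ask Git at build time whether a post has uncommitted edits. This is only a sketch: it relies on git status --porcelain printing one line per changed path and nothing for a clean file, and the helper names are hypothetical.

```javascript
import { execFileSync } from "child_process"

// `git status --porcelain` prints nothing for a file without pending changes
function hasUncommittedChanges(porcelainOutput) {
  return porcelainOutput.trim() !== ""
}

// Hypothetical helper: warns when the Git date would be older than the file contents
function warnIfUncommitted(filePath) {
  const output = execFileSync(
    "git",
    ["status", "--porcelain", "--", filePath],
    { encoding: "utf8" },
  )
  const dirty = hasUncommittedChanges(output)
  if (dirty) {
    console.warn(`${filePath} has uncommitted edits, so its Git date lags behind its content`)
  }
  return dirty
}
```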

import { execSync } from "child_process"

const allAuthorDates = execSync(
  `git log --follow --name-status --pretty=format:%aI -- <PATH>/<TO>/<FILE>.mdx`,
).toString()

/*
 *  This command will give us something like the following commit list:
 *  (commits with R100 status are merely renames/moves with no content updates)
 *
 *  2021-12-07T16:18:59+01:00
 *  R100    old.test.md  test.md
 *
 *  2021-12-07T15:37:15+01:00
 *  M       old.test.md
 *
 *  2021-10-07T10:30:11+01:00
 *  A       old.test.md
 */

/*
 *  Match the first date that is NOT followed by a line starting with R (for rename/move).
 *  The most recent commit from the example above (the rename) will be skipped.
 */
const [lastEditExceptPathChangeDate] = allAuthorDates.match(/20[\d-T:.Z+]+$(?!\r?\nR)/m) || [] // 2021-12-07T15:37:15+01:00
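If the regular expression feels too terse, the same filtering can be sketched with a small line-by-line parser. The function names parseNameStatusLog and lastContentEditDate are made up for this example, and the parser assumes the output layout shown in the comment above: a date line followed by a status line.

```javascript
// Parse the output of
//   git log --follow --name-status --pretty=format:%aI -- <file>
// into a list of { date, status } entries, newest first.
function parseNameStatusLog(output) {
  const entries = []
  for (const line of output.split(/\r?\n/)) {
    if (/^\d{4}-\d{2}-\d{2}T/.test(line)) {
      // A line starting with an ISO date opens a new commit entry
      entries.push({ date: line.trim(), status: null })
    } else if (line.trim() !== "" && entries.length > 0) {
      // The first non-empty line after the date starts with the status, e.g. "M" or "R100"
      const last = entries[entries.length - 1]
      if (last.status === null) last.status = line.trim().split(/\s+/)[0]
    }
  }
  return entries
}

// Pick the newest commit whose status does not start with R (rename/move)
function lastContentEditDate(output) {
  const entry = parseNameStatusLog(output).find(
    e => e.status !== null && !e.status.startsWith("R"),
  )
  return entry ? entry.date : null
}
```

The number after R is a similarity score, so comparing the status with "R100" instead of checking for a leading R would skip only the renames that left the content untouched.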

Next.js getStaticProps example (Git)

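import { execSync } from "child_process"
import * as fs from "fs"
import * as path from "path"
// Assumed third-party imports (not shown in the original snippet)
import matter from "gray-matter"
import { serialize } from "next-mdx-remote/serialize"

export const getStaticProps = async context => {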
  const slug = String(context.params?.slug)
  const filePath = path.join(process.cwd(), `_mdx_/${slug}/index.mdx`)
  const rawContents = fs.readFileSync(filePath, "utf8")

  // The path is quoted so that the shell keeps it as one argument even if it contains spaces
  const allAuthorDates = execSync(
    `git log --follow --name-status --pretty=format:%aI -- "${filePath}"`,
  ).toString()

  // Match the first date that is NOT followed by a line starting with R (for rename/move)
  const [lastEditExceptPathChangeDate] = allAuthorDates.match(/20[\d-T:.Z+]+$(?!\r?\nR)/m) || []

  const { content, data: meta } = matter(rawContents)

  const mdxSource = await serialize(content, {
    scope: meta,
  })

  return {
    props: {
      source: mdxSource,
      slug,
      content,
      meta: {
        ...meta,
        // undefined is not JSON-serializable, so fall back to null when no date was matched
        dateLastModified: lastEditExceptPathChangeDate ?? null,
      },
    },
  }
}

🔔 Update: A year later

While the above can work fine, it surely did not withstand the migration of my site from Next.js to Astro. The issue is that there is a good chance that some non-content-related updates will become necessary, e.g., updating imported libraries. Yeah, it can be fine with plain Markdown, but I want to be fancy and use MDX. So I had to change some meta-stuff, and I lost track of the content updates.

So is there some hope?

There is. But you can’t rely fully on Git; it needs a hint to determine which changes were content-related and which weren’t. Lucky for us, there is a place we can put such a hint: the commit message.

Let’s say we add the “Edit content” key phrase to every commit that is content-related. Then, we can acquire the commit message like this:

git log --follow --name-status --pretty="format:%aI|%s" -- <OUR_FILE>

This will give us the date as before, followed by the commit message, with the two delimited by the pipe character ‘|’. As before, we can use regular expressions to parse the data. The nasty regex in the snippet below will match everything that’s a date followed by a pipe followed by a string that starts with “Edit content”, followed by a new line that doesn’t start with “R”:

// Using this, we'll get our date in the first capture group:
const [_, lastContentEditDate] = allAuthorDates.match(/(20[\d\-T:.Z+]+)\|(Edit content.*)$(?!\r?\nR)/im) || []

Probably not bulletproof, but for a blog? Good enough.
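For those who would rather avoid the regex, here is a rough parser-based sketch of the same rule. It assumes the "date|subject" line is followed by a status line, as in the earlier examples, and the names parseLog and lastContentEdit are hypothetical.

```javascript
// Parse the output of
//   git log --follow --name-status --pretty="format:%aI|%s" -- <file>
// into a list of { date, subject, status } entries, newest first.
function parseLog(output) {
  const commits = []
  for (const line of output.split(/\r?\n/)) {
    const header = line.match(/^(\d{4}-\d{2}-\d{2}T[^|]+)\|(.*)$/)
    if (header) {
      commits.push({ date: header[1], subject: header[2], status: null })
    } else if (line.trim() !== "" && commits.length > 0) {
      // The first non-empty line after the header starts with the status, e.g. "M" or "R100"
      const last = commits[commits.length - 1]
      if (last.status === null) last.status = line.trim().split(/\s+/)[0]
    }
  }
  return commits
}

// Newest commit whose subject starts with the key phrase and that is not a rename/move
function lastContentEdit(output, keyPhrase = "Edit content") {
  const wanted = keyPhrase.toLowerCase()
  const commit = parseLog(output).find(
    c => c.subject.toLowerCase().startsWith(wanted) && !String(c.status).startsWith("R"),
  )
  return commit ? commit.date : null
}
```

The key phrase is compared case-insensitively, mirroring the i flag of the regex.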

Appendix: What about the date of publishing?

Honestly, I am tracking it manually (when I’m done writing, I update the frontmatter). One of the reasons is that I’ve migrated my older blog posts to MDX, so neither the file creation dates nor the commit dates match the genuine birth dates.

But can it be done programmatically?

If you are deploying the already-built static site, you can use the filesystem metadata: Node.js exposes the creation time as birthtime (ctime, despite the name, is the time of the last status change, not of creation). But beware: the date when the file was created can vastly differ from the actual date of publishing (e.g., a long post that takes two weeks to finish).
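A minimal sketch of reading the creation time follows. In Node.js, fs.Stats exposes it as birthtime, while ctime holds the time of the last status change. On filesystems that do not record a creation time, birthtime may hold the ctime or the Unix epoch instead, so treat the value with care. The temp file is made up for the demo.

```javascript
import * as fs from "fs"
import * as os from "os"
import * as path from "path"

// Hypothetical scratch file standing in for a freshly created post
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "birthtime-demo-"))
const filePath = path.join(dir, "post.mdx")
fs.writeFileSync(filePath, "# Draft")

// birthtime: creation, mtime: last content change, ctime: last status change
const { birthtime, mtime, ctime } = fs.statSync(filePath)
console.log({ birthtime, mtime, ctime })
```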

So how about Git? Well, it can be done, but it is not safe. According to my research, there is no reliable way to determine the date of the first file commit after renaming or moving the original file. See answers in this SO thread.

I made some tests. Once, I lost a file history as soon as I renamed a text file and added a question mark to its “Hello World!” content in a single commit. My blog’s post history, on the other hand, has survived the path updates. Generally, if the content is big enough (how big exactly?), Git should be able to track the file path updates, but it’s not guaranteed.

If you’re willing to take a chance, you can get the first commit date as follows:

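// ✋ WARNING: read the above before copy & paste!
import { execSync } from "child_process"

// The command must be passed as a string; trim() drops the trailing newline left by head
const firstAuthorDate = execSync(
  `git log --pretty=format:%aI --reverse -- <PATH>/<TO>/<FILE>.mdx | head -1`,
).toString().trim()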
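A variation you may come across filters the log with --diff-filter=A, which keeps only the commits that added the file, and combines it with --follow. The sketch below takes that approach and picks the oldest date in JavaScript instead of piping to head, so it does not need a Unix shell. It is subject to the same rename-detection caveat as above, and the helper names are hypothetical.

```javascript
import { execFileSync } from "child_process"

// git log lists the newest commits first, so the oldest match is the last non-empty line
function pickOldestDate(output) {
  const lines = output.split(/\r?\n/).filter(line => line.trim() !== "")
  return lines.length > 0 ? lines[lines.length - 1].trim() : null
}

// Hypothetical helper: --follow asks git to trace the file across renames,
// which is a heuristic and not a guarantee
function getFirstAuthorDate(filePath) {
  const output = execFileSync(
    "git",
    ["log", "--follow", "--diff-filter=A", "--pretty=format:%aI", "--", filePath],
    { encoding: "utf8" },
  )
  return pickOldestDate(output)
}
```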

👍 Enjoy!

last modified on 28th May 2023

