Tracking the "last modified" date in markdown
- javascript
- markdown
- git
I've migrated my blog completely to Markdown (MDX) after I realized that it's probably the best match for my workflow. So far, I'm quite happy with the decision. However, I was missing a few features that typically come out of the box when using a conventional CMS, like tracking the date of the last edit.
I will go through a couple of possible solutions to this problem.
Use a Markdown CMS
If you are using VSCode, there is the vscode-front-matter plugin that claims to "give you the power and control of a full-blown CMS running straight in the editor."
I tested it briefly, and it looks cool. It has some issues, e.g., it didn't seem to fully support MDX at the time of writing (although this may have been fixed since).
While I would probably recommend this plugin to a friend if asked, it was not the solution I chose. Why?
Firstly, I like the pristine experience of creating posts solely by editing text files (just like the elders around the campfires remember creating web pages using nothing but HTML and CSS).
Secondly, I didn't want to become dependent on another tool (however cool it could be).
Use filesystem metadata
My first, rather naïve, attempt was to use the metadata that is accessible from the filesystem.
But there is an issue: while some metadata does live in the file itself (like image EXIF data), the main info (including the date of the last edit) is stored in the filesystem. That's why it is available even when the file is completely empty (its length is exactly `0` bytes).
(I don't want to dive too deep into details, because I am nowhere near an expert on this topic.)
The `fs` built-in library of Node.js has multiple methods to access file metadata. For the sake of simplicity, I will only refer to the synchronous API method, `fs.statSync`.
This method takes a path and returns an `fs.Stats` instance. So, to get the last edit date, you can do something like this:
    import * as fs from "fs"
    import * as path from "path"

    const filePath = path.join(process.cwd(), `<PATH>/<TO>/<FILE>.mdx`)
    // mtime is a Date holding the time of the last content modification
    const { mtime } = fs.statSync(filePath)
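To see that the modification time lives in the filesystem rather than in the file, here is a small sketch that creates an empty temporary file and reads its stats. The temporary directory prefix and the `empty.mdx` file name are made up for the demonstration.

```javascript
import fs from "node:fs"
import os from "node:os"
import path from "node:path"

// Create a zero-byte file in a throwaway directory (hypothetical names)
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "mtime-demo-"))
const emptyFile = path.join(tmpDir, "empty.mdx")
fs.writeFileSync(emptyFile, "")

// The file has no content, yet the filesystem still knows when it was modified
const { size, mtime } = fs.statSync(emptyFile)
console.log(size) // 0
console.log(mtime instanceof Date) // true

// Clean up the throwaway directory
fs.rmSync(tmpDir, { recursive: true, force: true })
```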
Caveats
Using the edit date from the filesystem can be helpful, but its use case is quite limited.
It can be used if you are directly deploying the static build, like when you build the site using Next.js or Gatsby and then deploy the contents of the `out` or `public` directory using Netlify's drag'n'drop feature (or good ol' FTP).
It won't work if you are deploying using continuous integration, like when your site is linked to a GitHub repo and is built from scratch on someone's (e.g., Netlify's) server each time you push a new commit to GitHub.
Why? Because the files are created from scratch on every build. So, while it hopefully won't break anything, the last modification date of all your files will match the date of the last build, which is probably not what you want.
Next.js `getStaticProps` example (filesystem)

    import * as fs from "fs"
    import * as path from "path"
    // Assumption: `matter` and `serialize` come from these two packages
    import matter from "gray-matter"
    import { serialize } from "next-mdx-remote/serialize"

    export const getStaticProps = async context => {
      const slug = String(context.params?.slug)
      const filePath = path.join(process.cwd(), `_mdx_/${slug}/index.mdx`)
      const rawContents = fs.readFileSync(filePath, "utf8")
      const { mtime } = fs.statSync(filePath)

      const { content, data: meta } = matter(rawContents)
      const mdxSource = await serialize(content, {
        scope: meta,
      })

      return {
        props: {
          source: mdxSource,
          slug,
          content,
          meta: {
            ...meta,
            // Next.js throws an error unless the date is serialized to a string
            dateLastModified: mtime.toISOString(),
          },
        },
      }
    }
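Next.js only accepts JSON-serializable values in the props returned from `getStaticProps`, which is why a `Date` object has to be converted to a string first. Here is a minimal sketch of one way to do the round trip, using an ISO 8601 string; the helper names `serializeDate` and `deserializeDate` are hypothetical:

```javascript
// Hypothetical helpers, not part of Next.js or Node.js

// Turn a Date (e.g., the mtime from fs.statSync) into a JSON-safe ISO 8601 string
function serializeDate(date) {
  return date.toISOString()
}

// Turn the string back into a Date, e.g., inside the page component
function deserializeDate(isoString) {
  return new Date(isoString)
}

const mtime = new Date(Date.UTC(2021, 11, 7, 15, 18, 59))
const serialized = serializeDate(mtime)

console.log(serialized) // 2021-12-07T15:18:59.000Z
console.log(deserializeDate(serialized).getTime() === mtime.getTime()) // true
```

An ISO string is also unambiguous about the time zone, which the output of a plain `toString()` is not guaranteed to be across environments.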
Use Git
There is good news though: if you are deploying using continuous integration, then you most likely track your files with Git. And Git actually stores a lot of metadata about the committed files.
Using the `git log` command, we can obtain two meta properties related to the last modification date: author date and committer date. The "author date" is the date the commit was originally made, while the "committer date" is the date the commit was last modified (e.g., using `--amend` or `rebase`). In the examples, I'll be using the "author date."
But how do you obtain it in Node.js? There is a built-in library, `child_process`, which enables us to spawn a separate process that can run any system command. For simplicity's sake, we will use its synchronous method `execSync` to capture the output of the parameterized `git log` command. The method returns a `Buffer`, so we'll need to convert it to a date string using the `toString()` method:
    import { execSync } from "child_process"

    // %aI prints the author date in strict ISO 8601 format
    const lastAuthorDate = execSync(
      `git log -1 --pretty=format:%aI -- <PATH>/<TO>/<FILE>.mdx`,
    ).toString()
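If you want to see both dates side by side, `git log` can print them on one line: `%aI` is the author date and `%cI` the committer date, both in strict ISO 8601 format. Below is a sketch of parsing such a line; the `parseCommitDates` helper, the `|` delimiter, and the sample line are made up for illustration:

```javascript
// Parses one line produced by: git log -1 --pretty="format:%aI|%cI" -- <file>
// (hypothetical helper; the "|" delimiter is my choice, not a Git convention)
function parseCommitDates(line) {
  const [authorDate, committerDate] = line.trim().split("|")
  return {
    authorDate: new Date(authorDate),
    committerDate: new Date(committerDate),
  }
}

// Made-up sample: a commit authored on Dec 7 and amended a day later
const sample = "2021-12-07T16:18:59+01:00|2021-12-08T09:05:00+01:00"
const { authorDate, committerDate } = parseCommitDates(sample)

console.log(authorDate.toISOString()) // 2021-12-07T15:18:59.000Z
console.log(committerDate > authorDate) // true
```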
Caveats(?)
There are just a few minor ones that I know of. We need to keep in mind that Git doesn't track the moment edits are saved, but rather the moment they are committed. This is not necessarily bad, just different from the way the filesystem tracks updates. (Once I was baffled by the funny dates of my post edits, then I realized that I had simply forgotten to commit them.)
The second difference I can think of is in the way Git tracks file path changes. When renaming or moving a file, you'll have to make a commit, and the commit date will appear as the last modification date, which is not optimal (present-day filesystems, AFAIK, would ignore such a name and/or path change).
We can avoid this, but it'll take some effort. Git, among other things, saves the brief commit status of a file (A for added, M for modified, R for renamed/moved; see the full docs). Therefore, if we log the status among the other stats, we can easily filter out those starting with `R`:
    import { execSync } from "child_process"

    const allAuthorDates = execSync(
      `git log --follow --name-status --pretty=format:%aI -- <PATH>/<TO>/<FILE>.mdx`,
    ).toString()

    /*
     * This command will give us something like the following commit list
     * (newest first; commits with R100 status are merely renames/moves, no content updates):
     *
     * 2021-12-07T16:18:59+01:00
     * R100    old.test.md    test.md
     *
     * 2021-12-07T15:37:15+01:00
     * M       old.test.md
     *
     * 2021-10-07T10:30:11+01:00
     * A       old.test.md
     */

    /*
     * Match the first date that is NOT followed by a line starting with R (for rename/move).
     * The most recent commit in the example above (the rename) will be skipped.
     */
    const [lastEditExceptPathChangeDate] =
      allAuthorDates.match(/20[\d-T:.Z+]+$(?!\r?\nR)/m) || []
    // 2021-12-07T15:37:15+01:00
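If the regular expression feels too cryptic, the same filtering can be done by walking through the lines. This is only a minimal sketch, assuming the output has exactly the shape shown in the comment above (a date line followed by one status line per commit); the `lastContentChangeDate` function name is hypothetical:

```javascript
// Hypothetical helper: returns the author date of the newest commit
// whose status is not a rename (R...), or undefined if there is none.
function lastContentChangeDate(gitLogOutput) {
  const lines = gitLogOutput
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(Boolean)

  // Assumption: lines come in pairs [date, status, date, status, ...], newest first
  for (let i = 0; i + 1 < lines.length; i += 2) {
    const date = lines[i]
    const status = lines[i + 1]
    if (!status.startsWith("R")) return date
  }
  return undefined
}

// The sample commit list from the comment above
const sampleOutput = [
  "2021-12-07T16:18:59+01:00",
  "R100\told.test.md\ttest.md",
  "",
  "2021-12-07T15:37:15+01:00",
  "M\told.test.md",
  "",
  "2021-10-07T10:30:11+01:00",
  "A\told.test.md",
].join("\n")

console.log(lastContentChangeDate(sampleOutput)) // 2021-12-07T15:37:15+01:00
```

It is more verbose than the regex, but each step can be logged and debugged separately.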
Next.js `getStaticProps` example (Git)

    import { execSync } from "child_process"
    import * as fs from "fs"
    import * as path from "path"
    // Assumption: `matter` and `serialize` come from these two packages
    import matter from "gray-matter"
    import { serialize } from "next-mdx-remote/serialize"

    export const getStaticProps = async context => {
      const slug = String(context.params?.slug)
      const filePath = path.join(process.cwd(), `_mdx_/${slug}/index.mdx`)
      const rawContents = fs.readFileSync(filePath, "utf8")

      // The path is quoted so that spaces in it don't break the command
      const allAuthorDates = execSync(
        `git log --follow --name-status --pretty=format:%aI -- "${filePath}"`,
      ).toString()

      // Match the first date that is NOT followed by a line starting with R (for rename/move)
      const [lastEditExceptPathChangeDate] =
        allAuthorDates.match(/20[\d-T:.Z+]+$(?!\r?\nR)/m) || []

      const { content, data: meta } = matter(rawContents)
      const mdxSource = await serialize(content, {
        scope: meta,
      })

      return {
        props: {
          source: mdxSource,
          slug,
          content,
          meta: {
            ...meta,
            // Fall back to null when no date was matched (undefined is not serializable)
            dateLastModified: lastEditExceptPathChangeDate ?? null,
          },
        },
      }
    }

Update: A year later
While the above can work fine, it surely did not withstand the migration of my site from Next.js to Astro. The issue is that there is a fine chance that some non-content-related updates will become necessary, e.g., updating imported libraries. Yeah, it can be fine with plain Markdown, but I want to be fancy and use MDX. So I had to change some meta-stuff, and I lost track of the content updates.
So is there some hope?
There is. But you can't rely fully on Git; it needs a hint to determine which changes were content-related and which weren't. Lucky for us, there is a place we can put such a hint: the commit message.
Let's say we add the "Edit content" key phrase to every commit that is content related. Then, we can acquire the commit message like this:
    git log --follow --name-status --pretty="format:%aI|%s" -- <OUR_FILE>
This will give us the date as before, but with the commit message appended, delimited by the pipe "|" character. As before, we can use regular expressions to parse the data. The nasty regex in the snippet below will match a date followed by a pipe followed by a message that starts with "Edit content", as long as the next line doesn't start with "R":

    // The date lands in the first capture group, i.e., the second array item:
    const [_, lastContentEditDate] =
      allAuthorDates.match(/(20[\d\-T:.Z+]+)\|(Edit content.*)$(?!\r?\nR)/im) || []
Probably not bulletproof, but for a blog? Good enough.
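To try the key-phrase idea without touching a real repository, here is a self-contained sketch that works line by line on a made-up log output. The `lastContentEditDate` helper and the sample commit messages are hypothetical:

```javascript
// Hypothetical helper: finds the newest commit whose message starts with
// the key phrase (case-insensitive) and which is not a mere rename/move.
function lastContentEditDate(gitLogOutput, keyPhrase = "Edit content") {
  const lines = gitLogOutput.split(/\r?\n/)

  for (let i = 0; i < lines.length; i++) {
    const separatorIndex = lines[i].indexOf("|")
    if (separatorIndex === -1) continue // not a "date|message" line

    const date = lines[i].slice(0, separatorIndex)
    const message = lines[i].slice(separatorIndex + 1)
    const nextLine = lines[i + 1] || ""

    const isContentEdit = message.toLowerCase().startsWith(keyPhrase.toLowerCase())
    const isRename = nextLine.startsWith("R")
    if (isContentEdit && !isRename) return date
  }
  return undefined
}

// Made-up output of: git log --follow --name-status --pretty="format:%aI|%s"
const sampleOutput = [
  "2022-11-20T10:00:00+01:00|Update imports after Astro migration",
  "M\ttest.md",
  "",
  "2022-03-01T09:30:00+01:00|Edit content: fix typos",
  "M\ttest.md",
  "",
  "2021-12-07T15:37:15+01:00|Edit content: first draft",
  "A\ttest.md",
].join("\n")

console.log(lastContentEditDate(sampleOutput)) // 2022-03-01T09:30:00+01:00
```

Note that this sketch assumes file names contain no pipe character; otherwise a status line could be mistaken for a commit line.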
Appendix: What about the date of publishing?
Honestly, I am tracking it manually (when I'm done writing, I update the frontmatter). One of the reasons is that I've migrated my older blog posts to MDX, so neither the file creation dates nor the commit dates match the genuine birth dates.
But can it be done programmatically?
If you are deploying the already-built static site, you can use the filesystem metadata: in Node.js, the creation time is available as `birthtime` on the `fs.Stats` instance (not `ctime`, which is the time of the last status change). But beware: the date when the file was created can vastly differ from the actual date of publishing (e.g., a long post that takes two weeks to finish).
So how about Git? Well, it can be done, but it is not safe. According to my research, there is no reliable way to determine the date of the first file commit after renaming or moving the original file. See answers in this SO thread.
I ran some tests. Once, I lost a file's history as soon as I renamed a text file and added a question mark to its "Hello World!" content in a single commit. My blog's post history, on the other hand, has survived the path updates. Generally, if the content is big enough (how big exactly?), Git should be able to track the file path updates, but it's not guaranteed.
If you're willing to take a chance, you can get the first commit date as follows:
    // WARNING: read the above before copy & paste!
    import { execSync } from "child_process"

    // --reverse lists the oldest commit first; head -1 keeps only that line
    const firstAuthorDate = execSync(
      `git log --pretty=format:%aI --reverse -- <PATH>/<TO>/<FILE>.mdx | head -1`,
    ).toString()
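Since `head` is not available in every shell (e.g., the default Windows command prompt), the first line can also be picked in JavaScript. A small sketch, with a hypothetical `firstLineOf` helper applied to a made-up output of `git log --pretty=format:%aI --reverse`:

```javascript
// Hypothetical helper: returns the first non-empty line of a command output,
// or undefined when the output is empty (e.g., the file has no commits yet).
function firstLineOf(output) {
  const [firstLine = ""] = output.trim().split(/\r?\n/)
  return firstLine || undefined
}

// Made-up output: oldest commit first, thanks to --reverse
const sampleOutput = [
  "2021-10-07T10:30:11+01:00",
  "2021-12-07T15:37:15+01:00",
  "2021-12-07T16:18:59+01:00",
].join("\n")

console.log(firstLineOf(sampleOutput)) // 2021-10-07T10:30:11+01:00
console.log(firstLineOf("")) // undefined
```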
Enjoy!