node: async filtering

2022-01-30 | ~5 min read | 966 words

Preamble

I was recently working on porting my blog to Remix. As part of this, I needed to parse my markdown files that constitute all of the posts.

The folder structure I use is very flat: it’s just a directory with a long list of files in it. All of the other data is stored in the frontmatter of the posts.

That’s actually not quite true: I have one subdirectory to store some assets temporarily.

Why is that important? Because if you don’t check whether a path is a file or a directory before reading it, it’s easy to throw an exception and kill the process.

Let’s paint a picture. Imagine the directory looks a bit like this:

drwxr-xr-x    4 stephen.weiss  staff     128 Jan 10 20:06 temp
-rw-r--r--    1 stephen.weiss  staff     275 Jan 30 08:07 blogpost1.md
-rw-r--r--    1 stephen.weiss  staff      91 Jan 10 19:59 blogpost2.md
...

One directory. Lots of files.

Initial Implementation

Now, let’s try to read them and filter out the directories.

import fs from "fs/promises"
import path from "path"

// postsPath (defined elsewhere) is the directory that holds the posts
const dir = await fs.readdir(postsPath).then((paths) =>
  paths.filter(async (pathName: string) => {
    const fullPath = path.join(postsPath, pathName)
    const isFile = (await fs.lstat(fullPath)).isFile()
    if (!isFile) console.log(`initial filter`, { isFile, fullPath })
    return isFile
  }),
)

console.log(dir) // ['temp', 'blogpost1.md', 'blogpost2.md']

Hmm! That’s not what we wanted, why is that?

Well, we’re trying to use the filter prototype method, but that method expects a predicate that synchronously returns a boolean (or at least a truthy or falsy value). Our async callback returns a promise that resolves to a boolean instead.

This is easier to see if we tease this apart into multiple pieces:

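const filter = async (filePath: string): Promise<boolean> => {
  const fullPath = path.join(postsPath, filePath)
  return (await fs.stat(fullPath)).isFile()
}

const dir = await fs.readdir(postsPath)
// dir.filter(filter) gets a Promise back for every entry and keeps them all;
// awaiting that string[] and wrapping it in Promise.all changes nothing
const filtered = Promise.all(await dir.filter(filter))
return await filtered // still contains 'temp'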

A Promise is truthy, so the filter sees no reason to exclude the non-files.
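
The same behavior can be reproduced without touching the file system. This is a minimal sketch; `isEven` and `numbers` are hypothetical names used only for illustration:

```typescript
// Hypothetical async predicate, standing in for the file check.
const isEven = async (n: number): Promise<boolean> => n % 2 === 0

const numbers = [1, 2, 3, 4]

// filter only looks at whether the callback's return value is truthy.
// Each call returns a Promise object, which is truthy even when it will
// eventually resolve to false, so no element is dropped.
const kept = numbers.filter((n) => isEven(n))

console.log(kept) // [1, 2, 3, 4]
```

Note that this compiles without complaint: the type of filter's predicate allows any return value, so the compiler does not flag the promise.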

Initial Solution

Okay, so if the prototype method cannot act on promises, what options do we have?

In my case, I want to actually remove the element from the list, so reduce feels like a good option:

const filter = async (filePath) => {
  const fullPath = path.join(postsPath, filePath)
  return (await fs.stat(fullPath)).isFile()
}

const dir = await fs.readdir(postsPath)
const filtered = await dir.reduce(
  async (acc, cur) => ((await filter(cur)) ? [...(await acc), cur] : acc),
  [],
)
return await filtered

The money line is:

const filtered = await dir.reduce(
  async (acc, cur) => ((await filter(cur)) ? [...(await acc), cur] : acc),
  [],
)

Note that we are first awaiting the result of our filter function - a promise that resolves to a boolean value. Then, if that resolves to true, we will hit the first branch of the ternary: [...(await acc), cur]. We need the await here because the reducer is an async function: after the first pass, the accumulator it receives is a promise rather than a plain array.

async/await provides syntactic sugar for the promise chain, but in the background, we have a promise chain being built up with each pass through.
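
To make that chain visible, here is a rough sketch of the same reduce written with explicit `.then` calls. The predicate `isShort` and the `words` array are hypothetical stand-ins for the file check and the directory listing, and seeding the reduce with `Promise.resolve` plus an explicit type parameter is my assumption rather than something taken from the code above:

```typescript
// Hypothetical async predicate, standing in for the file check.
const isShort = async (s: string): Promise<boolean> => s.length <= 3

const words = ["dir", "post1", "abc"]

// Every pass returns a new promise that depends on the previous
// accumulator promise, so the result is one chain with a link per element.
const filteredWords = words.reduce<Promise<string[]>>(
  (accPromise, cur) =>
    isShort(cur).then((keep) =>
      // Keep: append once the previous link resolves. Drop: pass it through.
      keep ? accPromise.then((acc) => [...acc, cur]) : accPromise,
    ),
  Promise.resolve<string[]>([]),
)

filteredWords.then((result) => console.log(result)) // ["dir", "abc"]
```

Because each link waits on the one before it, the original order of the elements is preserved even if the predicates resolve out of order.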

TypeScript Caveats

Unfortunately, when I tried this in an actual project, my TypeScript linter started screaming.

Two separate issues:

  1. The returned type is assumed to be a string when I very much intended this to be a promise of a list of strings (the promise is ignored because of the use of await).
  2. The function itself seems to be having type problems.

What to do?

Instead of trying to use the prototypal methods on the array, we can create our own map and filter functions that are designed to be asynchronous.

Let’s start with the filterAsync since that will be our entry point:

filterAsync.ts
async function filterAsync<T>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<boolean>,
): Promise<T[]> {
  const filterMap = await mapAsync(array, callbackfn)
  return array.filter((value, index) => filterMap[index])
}

The function takes an array of type T and returns a promise of an array of type T. The second argument is a callback that is presumed to be a deferred predicate, i.e. one that returns a promise of a boolean.

The cool part about this is how that works with the mapAsync function:

mapAsync.ts
function mapAsync<T, U>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<U>,
): Promise<U[]> {
  return Promise.all(array.map(callbackfn))
}

The map function transforms the array of type T into one of type U. This is a standard idea in map functions: since the whole point is manipulating each element, we wouldn’t expect the types to match.

In our case, since we’re passing in a callback that results in a boolean, when we get to the Promise.all line, we have a list of promises that will each resolve to true or false.

Because these have all resolved before we run the .filter method back in filterAsync, we can use the index to look up the value of the predicate.

Stepping through one piece at a time with pseudocode:

const arr = ['dir', 'post1', 'post2']
// after array.map(callbackfn): one pending promise per element
const mappedArr = [Promise<isFile('dir')>, Promise<isFile('post1')>, Promise<isFile('post2')>]
// after Promise.all resolves
const resolvedMap = [false, true, true]

So, when we get to the filter, we’re asking questions like, “for the first element, at index 0, if we look at the resolvedMap, is it true or false?”

The point is that the predicate has already been calculated, and we’re now looking things up based on index.
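
Putting the two helpers together, a self-contained sketch might look like the following. The scratch directory, its file names, and the `listPostFiles` helper are assumptions made up for this example so that it can run anywhere; only `mapAsync` and `filterAsync` come from the snippets above.

```typescript
import * as fs from "fs/promises"
import * as os from "os"
import * as path from "path"

function mapAsync<T, U>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<U>,
): Promise<U[]> {
  return Promise.all(array.map(callbackfn))
}

async function filterAsync<T>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<boolean>,
): Promise<T[]> {
  const filterMap = await mapAsync(array, callbackfn)
  return array.filter((_value, index) => filterMap[index])
}

// Hypothetical helper: builds a scratch directory with one subdirectory and
// two markdown files, then returns only the entries that are files.
async function listPostFiles(): Promise<string[]> {
  const postsPath = await fs.mkdtemp(path.join(os.tmpdir(), "posts-"))
  await fs.mkdir(path.join(postsPath, "temp"))
  await fs.writeFile(path.join(postsPath, "blogpost1.md"), "# one")
  await fs.writeFile(path.join(postsPath, "blogpost2.md"), "# two")

  const entries = await fs.readdir(postsPath)
  const files = await filterAsync(entries, async (entry) =>
    (await fs.stat(path.join(postsPath, entry))).isFile(),
  )

  // Clean up the scratch directory before returning.
  await fs.rm(postsPath, { recursive: true, force: true })
  // Sorted so the output does not depend on directory listing order.
  return files.sort()
}

listPostFiles().then((files) => console.log(files)) // ["blogpost1.md", "blogpost2.md"]
```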

This approach does have the drawback of allocating a second array, the same length as the input, to hold the booleans. That might matter if the lists are really big, though it’s not a concern in my case.
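
If that extra array ever did matter, one possible alternative is to test elements one at a time and collect the survivors directly. This is only a sketch of a different technique, not the approach used in this post; `filterAsyncSequential` is a hypothetical name, and the trade-off is that the predicates no longer run concurrently.

```typescript
// Hypothetical alternative to filterAsync: no intermediate array of booleans,
// at the cost of running the predicates one after another.
async function filterAsyncSequential<T>(
  array: T[],
  predicate: (value: T, index: number, array: T[]) => Promise<boolean>,
): Promise<T[]> {
  const kept: T[] = []
  for (let index = 0; index < array.length; index++) {
    const value = array[index]
    // Each predicate call finishes before the next one starts.
    if (await predicate(value, index, array)) kept.push(value)
  }
  return kept
}

// Usage with a stand-in predicate instead of a real file check.
const postNames = filterAsyncSequential(
  ["dir", "post1", "post2"],
  async (entry) => entry.startsWith("post"),
)

postNames.then((names) => console.log(names)) // ["post1", "post2"]
```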

Wrap up

I put together a tiny example repo here, with this diff showing how I refactored a filter that wants to use an async function.

Additional resources are this article on advanced web covering different approaches and this Stack Overflow question about filtering arrays with async functions.

This is also similar in nature to what I was writing about in Javascript: Awaiting Asynchronous Operations on Lists (Arrays).



Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn. Want more? See about and get in touch!