Graham Marlow

Pulling Puzzles from Lichess

03 Feb, 2025 til

Lichess is an awesome website, made even more awesome by the fact that it is free and open source. Perhaps lesser known is that the entire Lichess puzzle database is available for free download under the Creative Commons CC0 license. Every puzzle that you normally find under lichess.org/training is available for your perusal.

This is a quick guide for pulling that CSV and seeding a SQLite database so you can do something cool with it. You will need zstd.

First, wget the file from the Lichess.org open database and save it into a temporary directory. Run zstd to decompress it into a CSV that we can read via Ruby.

wget https://database.lichess.org/lichess_db_puzzle.csv.zst -P tmp/
zstd -d tmp/lichess_db_puzzle.csv.zst

With the CSV pulled down and decompressed, it's time to read it into the application. I'm using Ruby on Rails, so I generate a database model like so:

bin/rails g model Puzzle \
  puzzle_id:string fen:string moves:string rating:integer \
  rating_deviation:integer popularity:integer nb_plays:integer \
  themes:string game_url:string opening_tags:string

Which creates the following migration:

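# Rails 5+ requires a version tag on the superclass; 8.0 is assumed here, use your own release.
class CreatePuzzles < ActiveRecord::Migration[8.0]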
  def change
    create_table :puzzles do |t|
      t.string :puzzle_id
      t.string :fen
      t.string :moves
      t.integer :rating
      t.integer :rating_deviation
      t.integer :popularity
      t.integer :nb_plays
      t.string :themes
      t.string :game_url
      t.string :opening_tags

      t.timestamps
    end
  end
end

A separate seed script pulls items from the CSV and bulk-inserts them into SQLite. I have the following in my db/seeds.rb, with a few omitted additions that check whether or not the puzzles have already been migrated.

require "csv"

csv_path = Rails.root.join("tmp", "lichess_db_puzzle.csv")
raise "CSV not found" unless File.exist?(csv_path)

buffer = []
buffer_size = 500
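flush = ->() do
  # Skip empty batches; some Rails versions raise when insert_all gets an empty list.
  next if buffer.empty?

  Puzzle.insert_all(buffer)
  buffer.clear
end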

CSV.foreach(csv_path, headers: true) do |row|
  buffer << {
    puzzle_id: row["PuzzleId"],
    fen: row["FEN"],
    moves: row["Moves"],
    rating: row["Rating"],
    rating_deviation: row["RatingDeviation"],
    popularity: row["Popularity"],
    nb_plays: row["NbPlays"],
    themes: row["Themes"],
    game_url: row["GameUrl"],
    opening_tags: row["OpeningTags"]
  }

  if buffer.count >= buffer_size
    flush.()
  end
end

flush.()
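The manual buffer can also be expressed with each_slice, which groups rows for you and never yields an empty batch. This is only a sketch of that idea, not the script I actually run: FakePuzzle is a hypothetical stand-in for the Puzzle model so the example works without Rails, seed_in_batches is a made-up helper name, and the sample rows are invented and trimmed to three columns.

```ruby
require "csv"

# Hypothetical stand-in for the Puzzle model so this sketch runs without Rails.
# It only records each batch passed to insert_all.
class FakePuzzle
  @batches = []

  class << self
    attr_reader :batches

    def insert_all(rows)
      @batches << rows
    end
  end
end

# Made-up sample rows, trimmed to three of the CSV's columns.
SAMPLE_ROWS = <<~ROWS
  PuzzleId,Rating,Themes
  aaaaa,1500,mate mateIn2
  bbbbb,1720,fork middlegame
  ccccc,1340,endgame
  ddddd,1999,pin
  eeeee,1105,short
ROWS

# CSV includes Enumerable, so each_slice hands over rows in groups of
# batch_size and never yields an empty group.
def seed_in_batches(source, model:, batch_size: 500)
  CSV.new(source, headers: true).each_slice(batch_size) do |rows|
    model.insert_all(
      rows.map do |row|
        { puzzle_id: row["PuzzleId"], rating: row["Rating"], themes: row["Themes"] }
      end
    )
  end
end

seed_in_batches(SAMPLE_ROWS, model: FakePuzzle, batch_size: 2)
p FakePuzzle.batches.map(&:size) # => [2, 2, 1]
```

Against the real file, the same helper would presumably be called as File.open(csv_path) { |file| seed_in_batches(file, model: Puzzle) }, with the hash extended to all ten columns.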

And with that you have the entire Lichess puzzle database available at your fingertips. The whole process takes less than a minute.

Puzzle.where("rating < 1700").count
# => 3035233
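Once the rows are in, the moves and themes columns are still plain space-separated strings. Here's a rough sketch of splitting them apart. It assumes the Lichess convention that the first move in Moves is the opponent's move that sets up the puzzle and the remaining moves are the solution; PuzzleLine and parse_puzzle are hypothetical names, and the sample strings are only illustrative.

```ruby
# Hypothetical value object for illustration; not part of the schema above.
PuzzleLine = Struct.new(:setup_move, :solution, :themes)

# Splits the space-separated Moves and Themes strings from a puzzle row.
# Assumes the first move is the opponent's setup move and the rest is the solution.
def parse_puzzle(moves:, themes:)
  setup_move, *solution = moves.split
  PuzzleLine.new(setup_move, solution, themes.split)
end

line = parse_puzzle(
  moves: "f2g3 e6e7 b2b1 b3c1 b1c1 h6c1",
  themes: "crushing hangingPiece long middlegame"
)

puts line.setup_move               # prints "f2g3"
puts line.solution.length          # prints 5
puts line.themes.include?("long")  # prints true
```

With the model from earlier, that would look like parse_puzzle(moves: puzzle.moves, themes: puzzle.themes).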

Logseq Has Perfected Note Organization

01 Feb, 2025 blog

A little while ago Apple Notes left me with quite the scare. I booted up the app to jot down an idea and found my entire collection of notes erased. I re-synced iCloud, nothing. Just the blank welcome screen.

Luckily my notes were still backed up to iCloud, even though they weren't displaying in the app (I checked via the web interface). After 40 minutes of debugging and toggling a series of obtuse settings, my notes were back on my phone. Yet the burn remained.

Since then I've been looking at alternatives for my long-term document/note storage. Apple Notes was never meant to be a formal archive of my written work; it just came out that way due to laziness in moving my notes somewhere permanent. I investigated the usual suspects: Notion, Obsidian, Bear, Org mode, good ol' git and markdown. Nothing stuck. Then I found Logseq and was immediately smitten.

The truth is, I don't actually use Logseq. I use Obsidian. You see, Logseq is an outliner. Every piece of text is attached to some kind of bulleted list, whether you're writing a code sample or attaching an image. Bulleted lists are great for notes, but not so great for blog posts or longform writing. I need a tool that can easily handle standard markdown for this blog, for example.

But despite not actually using Logseq, I've structured my Obsidian identically to Logseq. The Logseq method of organization is just so good. Everything boils down to three folders:

  • journal/: the place for daily notes.
  • pages/: high-level concepts that link between other pages or entries from the journal.
  • assets/: storage for images pasted from clipboard.

That's it! Just three folders, each containing a ton of flat files. All of my actual writing happens in journal pages, titled with the current day in YYYY-MM-DD format. I never need to think about file organization, nor do I struggle to find information.

Looking at a long list of YYYY-MM-DD files sounds difficult to navigate, but the key is that they're tagged with links to relevant pages (like [[disco-elysium]]) that attach the journal entry to a concept. When I want to view my notes on a concept, I navigate to the concept page (disco-elysium) and read through the linked mentions. I don't need to worry about placing a particular thought in a particular place because the link doesn't care.

I got hooked on this workflow because Logseq is incredible at linked mentions. Just take a look at this example page:

Logseq linked mentions example

All of the linked mentions (journal entries containing the tag [[disco-elysium]]) are directly embedded into the concept page. Logseq will even embed images, code samples, to-do items, you name it. It works incredibly well.

The Obsidian equivalent isn't quite as nice, but it gets the job done. Obsidian mentions are briefer, lack context, and are stripped of formatting:

Obsidian linked mentions example

The flip side is that I don't need to write notes in outline form and can more easily move my Obsidian notes into plain markdown files for my blog.

If you're like me and you want to use Logseq-style features in Obsidian there are a few configuration settings that are worth knowing about:

  • In your Core plugins/Daily notes settings, set the New file location to journal/ and turn on "Open daily note on startup".
  • In Core plugins/Backlinks, toggle "Show backlinks at the bottom of notes".
  • In Files and links, set the "Default location for new attachments" path to assets/.

These three settings changes will get you most of the way there. That said, before messing with those settings I encourage you to give Logseq a try. It's free and open source, it's built in Clojure, and it has an excellent community forum. Although I don't use it for my longform/personal writing, I use it at work where outlining fits my workflow better.

Paper Puzzle Remixes

11 Jan, 2025 blog, puzzles, gamedev

The holidays are always a great time for puzzles. My parents still receive print newspapers, offering an ideal opportunity to catch up on crosswords. This year I also picked up NYT's Puzzle Mania, a treasure-trove of paper puzzle goodness. Just a few days ago my partner and I finished the whopping 50x50 crossword puzzle. That's over 1000 clues!

What struck me as especially interesting with Puzzle Mania were the paper remixes of the popular "-dles":[1] Wordle, Spelling Bee, and Connections. Each remix tweaks the digital puzzle form so that it suits a printed medium, changing a few mechanics but keeping the puzzle evocative of its original design. Puzzmo did something similar with their Crossword Vol. 1, offering print versions of Really Bad Chess and Flipart.

In fact, when Puzzmo soft-launched they sent out beta invites via physical postcards to your address. Solve the puzzle on the postcard to unlock your way into the app.

Puzzmo beta invite postcards

The first and third pictured are remixes of Zach Gage games: Typeshift and Really Bad Chess. Typeshift is the more interesting of the two, since the digital version relies on a clever sliding interface to differentiate the game from a simple wordsearch. Adapting the game to print means the player can no longer find words by randomly moving the slider up and down.[2] It also means lowering the number of possible words to simplify the search.

I think the popularity of "-dle" puzzle games, the kind of daily games that one finds on NYT and Puzzmo, has to do with their resemblance to newspaper puzzles. They're short and snackable, perfect while waiting for coffee to brew. They're also crunchy enough that the player makes observable improvements over a long period of time, often in the form of a solving streak.

However, despite that resemblance there's a design tension that arises when adapting a digital puzzle into a print puzzle. What kinds of mechanics are translatable and why? How do the designers behind games like Flipart approach print adaptation of their digital games?

Zach Gage (creator of Flipart) gives some insight into the process in the Crossword Vol. 1 collection:

When we first started thinking about what kinds of puzzles we could make in print, we felt like Flipart was one Puzzmo game that truly could not work on paper. It was friend and fellow game designer JW Nijman who suggested a grid with embedded shapes that players would have to draw corresponding shapes on top of. [...] I didn't want players to have to do shape rotation in their heads (this is tough for many people!), so I brought JW's idea to Jack...

I recommend playing through a game of Flipart to get a sense of the difficulties Zach alludes to in this quote. A game of Flipart only takes tens of seconds. It's borderline instinctual; the ocular faculties take control as shapes rotate to avoid overlapping.

In contrast, the print version of the game is slow and methodical. Rotation is removed in favor of drawing the shape as-is. The fundamental constraint is drawing the shape in the grid such that the drawn shape contains the square that originally depicted it. Shapes cannot rotate and drawings cannot overlap. Print Flipart is much more of a logic puzzle.[3]

The first four print Flipart puzzles

Both the digital and print forms of Flipart play to the strengths of their medium. The digital form takes advantage of the fact that the computer can trivially render shapes in different rotations, something that's incredibly difficult for the human mind (and tedious to draw). The print form remains evocative of the digital, but ditches rotation in favor of something easier to both conceptualize and draw.

Converting a digital puzzle to a print puzzle is an interesting exercise. What can we learn from the process? A few rules come to mind:

  • Keep state simple. Unlike their digital counterparts, print puzzles cannot represent game state that often changes or changes in unintuitive ways (like rotations in Flipart). The best print puzzles have the player fill in the game state as they progress, e.g. letters in crosswords, numbers in sudoku, and shapes in print Flipart.

  • Complicated rule evaluations are a better fit for digital puzzles. Chess puzzles often feel more like an academic exercise than a casual puzzle, as the player must not only think about their own optimal move, but also the optimal response from their opponent. A puzzle that requires multiple back-and-forth turns quickly balloons into an overwhelming number of possibilities.

  • Rethink UI affordances. On the web, Typeshift uses a vertical slider to add extra flavor to the puzzle-solving experience. On paper, implementing a vertical slider is impossible. To compensate, the overall complexity of the puzzle is reduced.

  • Grids make for great playgrounds. I don't think it's a coincidence that crosswords and sudokus are confined to a grid. The grid is satisfying to fill and clearly denotes progress. It also provides a natural place to store game state.

I also want to shout out a fantastic game that released last year: LOK Digital. It's relevant to this whole conversation because it actually goes in the reverse direction, adapting a print puzzle into a digital form. Because LOK leans so heavily on rules evaluation, I personally think the digital adaptation is the way to go. It makes the overall experience quite a bit more enjoyable.


  1. Not my favorite term, but an apt description of the genre after the popularity of Wordle. ↩︎

  2. Not like I've done that before, obviously. ↩︎

  3. The print Flipart puzzles are surprisingly similar to the tetris puzzles from The Witness. ↩︎

Best of 2024

29 Dec, 2024 blog

I'm always surprised when distilling a year into a single post just how many things take place over those 365 days. When I'm in the thick of it I'm rarely thinking about the details. Events and projects come and go; rarely do I take a step back and properly register their impact or my feelings. So forgive me a moment of catharsis.

Game development

I made a game! It's a little game, but I'm proud of it. It received second place in a game jam and I think it's pretty good (only 25-ish entries in the jam so rein in the enthusiasm). At the very least, it contains my current best attempt at level design. Play it for free: Kat's Ghost.

Unsurprisingly the game is a block-pushing puzzle game similar to Sokoban. I say unsurprisingly because the Sokoban-like has been one of my favorite subgenres of puzzle games ever since Stephen's Sausage Roll (which I haven't even finished because it's devilishly hard). The Sokoban-like is the platonic ideal of a puzzle game: all logic, simple controls, simple constraints.

I also got into crossword construction this year, releasing two midi-sized American-style crosswords, both of which are Dungeons and Dragons themed:

I tried (and failed) to get the first of those puzzles accepted into Puzzmo during their open submission period. Here's hoping my next submission does better.

2024 was a big year for puzzles. The availability of free online puzzle games like Minute Cryptic, Blockables, and the mainstays of Puzzmo or NYT has made puzzle-solving a daily exercise. We're living in the golden age of snackable puzzle games. My morning routine has suffered.

This year also marks the release of Braid Anniversary Edition, released 16 years after the original. It includes the most in-depth commentary I've ever seen for a video game, talking game design, programming, art, and music. It offers a ton of wisdom and has inspired me to create. It's also just a phenomenal game.

Start (and end) Emacs

Late 2023 and early 2024 I spent quite a bit of time on Crafted Emacs with the goal of helping folks get started with Emacs. I've always felt that most of the starter kits pack too much extra stuff into the base Emacs installation, making for a very complicated or cumbersome first experience. Ditto for distributions like DOOM or Spacemacs that effectively hijack the built-in Emacs configuration tools in favor of custom ones (e.g. layers). Crafted Emacs felt like a nice, intermediate step.

That said, there was still something about Crafted Emacs that prevented me from recommending it to folks that were interested in switching to Emacs. For one, the README is that particular breed of verbosity that old-school Emacs hackers are so fond of. Heavy on the philosophy, light on the examples. For two, the module system is just inherently complicated. I really wanted to push new Emacs users towards a single-file configuration, just like how I started.

And so I created Start Emacs. It's basically just a "better defaults" setup for Emacs with some packages that align the Emacs and VSCode experience. I'm particularly happy with the extension guide, which moves a lot of the optional configuration into a handful of recipes.

During the making of Start Emacs I moved back to Windows as my primary dev machine and was absolutely hating the experience. Emacs mostly worked, but mainstays like Magit were horribly slow and many packages assumed access to standard Linux utilities like diff or grep. I spent so many hours messing around with different Windows development kits (MSYS2, w64devkit, etc.) but couldn't find something I was happy with. Finally I gave up and swapped over to WSL.

This period of Windows hacking had me switching back and forth between a few different text editors while I troubleshot Emacs, finally motivating me to try out Helix. The vim-ish keybindings definitely threw me for a loop, sitting in that awkward area of close enough to vim that it feels familiar, yet far enough away that I'm constantly invoking the wrong commands. But after I garnered enough experience with it I grew to like it so much that I started questioning my motivations. Why am I spending so much time setting up Emacs when I have a capable editor already working?

I switched and haven't looked back.

I've tried writing a blog post about my new setup but I can't motivate myself because it's so banal. I use Helix for editing text, tmux to manage terminal windows (which works surprisingly well in the Windows Terminal), and have replaced all of my usual Emacs power features with CLI tools like ripgrep or Awk. I'm probably not as productive since I still lack familiarity with my tools, but I've really been enjoying leveraging a console workflow instead of relying on a GUI editor.

Am I done with Emacs? Probably. Do I still think Emacs is a great tool? Absolutely! Don't let my experience dissuade you from trying it out.

Ruby on Rails

This year felt like a great one for Ruby on Rails. The release of Rails 8 brings a bunch of awesome improvements, including built-in authentication, full-stack SQLite, and zero-build frontend development. Folks are talking about Rails again and they're doing so with a ton of enthusiasm.

Coincidentally, all of this Rails enthusiasm lines up with a job change for myself, taking on a new role that does a lot more traditional Rails development. I'm thankful that I have the opportunity to work with Ruby every day.

That said, I've never worked at a Rails shop that actually used Rails for the frontend. Every single app that I've worked on professionally with Rails has been a Rails JSON API paired with a SPA frontend, usually React. With SSR making a big comeback this year (thanks to Hotwire, HTMX, among others) I'm eager to dive into the new suite of Rails tools.

Books

This year continues a reading trend from the past few years: an exploration into Japanese literature through Haruki Murakami. Since then I've expanded to another Japanese-born author, Kazuo Ishiguro, and am dabbling in the works of Yukio Mishima. But Murakami still reigns as my most-read author for the third year in a row.

He's especially notable this year thanks to the release of The City and Its Uncertain Walls in November. Let's just say the Murakami excitement was high.

Here are some of my reading highlights for this year:

  • The City and Its Uncertain Walls by Haruki Murakami. I just finished this one last week so it's fresh in my memory. I was surprised at how much of this book rehashes content from Hard-boiled Wonderland, with the exploration of consciousness as a town surrounded by a wall. Despite that, I enjoyed the deeper exploration into the shadow-self. "My real self isn't here. It's somewhere else. The me that's here looks like me, but is nothing more than a shadow projected onto the ground and walls..." Quite a few aspects of this novel parallel 1Q84, particularly the protagonist who searches for a long-lost love that rules his heart. The City and Its Uncertain Walls is an exploration of the self and how it relates to the world around us.

  • Anathem by Neal Stephenson. I've seen the name Neal Stephenson on many a massive tome at my local bookstore but haven't read any until this year. Now I'm hooked. Anathem is a slow novel in every category, but its exploration of philosophical topics is thorough and endlessly interesting for a layperson like me. Underpinning the novel is an exploration of realism and nominalism, depicted through manufactured names created for the world of Anathem. Just don't come to Anathem looking for plot.

  • 1Q84 by Haruki Murakami. It's long, ponderous, and contains one too many Proust references, but aspects of the work feel cohesive in a way that Murakami's other novels don't. I'm also a sucker for a story about a writer. I am not prepared for a literary analysis of 1Q84 though; I was mostly sailing on vibes.

  • Never Let Me Go by Kazuo Ishiguro. I was introduced to Ishiguro through his latest novel, Klara and the Sun, which I found to be an enjoyable exploration of empathy, if a bit superficial on the Sci-Fi implications of an Android protagonist. Never Let Me Go has similar themes but delivers on them more successfully. But man, is this book a bummer. Where Klara and the Sun is light and forgiving, Never Let Me Go is oppressive and unyielding.

I also wanted to shout out The Awk Programming Language, which had a second edition release late last year that I finished in February. It's unexpectedly one of the best programming books that I've read recently for a language that I had no prior experience with. I bought the book expecting perl-ish one-liners for simple problems, but stayed for its profound analysis of DSLs and Awk as a toolkit for building them. Incredible stuff. These days I have too much enjoyment searching for problems that I can solve using little Awk scripts.

Movies

Over the last couple years I've met with a group of friends every weekend to discuss a movie that one of us picked. A kind of movie-book-club.

The result has been great. I'm thinking more critically about the media I consume and my relationship to it. I'm exposed to other perspectives that reflect experience I would've never gathered myself. I'm thankful to have the opportunity to meet and talk with others about this kind of stuff.

Notable films that I watched this year:

  • Perfect Days. I would describe this film as a personification of Taoism. It follows the daily ritual of a janitor for The Tokyo Toilet, an artsy urban development project distilled into fancy toilets. The movie is slow and contemplative and well worth the watch.

  • Vertigo. Lately I've been on a little Hitchcock kick, Vertigo being the first of the bunch that I hadn't already seen. Unsurprisingly, it's great. It's a bit slow, but the twists are worth it.

  • Evil Does Not Exist. 2021's Drive My Car is one of my favorite films, period. So I went into Evil Does Not Exist with high expectations. Unfortunately this one did not do much for me. There's some allegorical storytelling underpinning this movie, filling in the lines between some light plot elements and nature cinematography. And while that cinematography is gorgeous, I couldn't shake a sense of boredom at the many extended pauses between beats. Normally contemplative movies are a hit for me, but this movie didn't spark any thoughts with its storytelling that were worthy of the thoughtful moments.

  • Autumn Sonata. Speaking of thoughtful moments. Look, Ingmar Bergman makes excellent movies. Autumn Sonata is no exception. There's a scene in this movie that is a slow pan onto the face of Liv Ullmann, broadcasting an entire life's worth of emotions into a mere thirty seconds.

Games

I was so starved for puzzles after beating Braid that I followed it up by playing through all of The Talos Principle and about a fourth of the sequel. But neither of those games came out this year, so here's a short list of a few others that sparked my interest.

  • Braid: Anniversary Edition. Already mentioned above. Do yourself a favor and pick it up, both for the game and the commentary.

  • The Rookery. You have to be some kind of Chess sicko to get a kick out of this game, but if you are, it will suck up a ton of your time. It's effectively Chess: the roguelike, but executed incredibly well. It lacks the presentational details of something like Balatro (another great game this year) but still offers a tight gameplay loop.

  • UFO 50. An incredible achievement that is an easy recommendation for anyone remotely interested in game design. There are so many ideas in this game (well, at least 50) that twist well-known game mechanics in compelling ways. When I first heard about this game years ago I thought it was going to be a Warioware-like collection of minigames. Imagine my surprise when almost every one of the 50 games is about the length of an original NES title. The fact that this game was ever finished is an achievement. That it includes so many great games is nothing short of amazing.

  • Animal Well. I have played many metroidvanias over the years but have finished almost none of them. Animal Well is an exception. It wasn't my favorite game to play in 2024 but it was certainly my favorite one to talk about. There was a general sense of excitement around this title that was infectious, helped along by some devilish secrets.

Looking ahead

Not mentioned in this post are a couple months that I spent working on a Chess engine, or numerous other side projects that have been tabled, resumed, and tabled again. I'm thinking a lot about my reading stack, for lack of a better term. I've been noodling on a few ideas for building my own Goodreads alternative that doesn't have any of the AI cruft from Storygraph, focused purely on reading and notetaking. We'll see where it goes.

I'm also attempting to break into the world of longform writing, in the way of nonfiction. In other words, I'm writing a book. Well, several. Most of my attempts have suffered the same fate as the average side project, with myself working furiously until interest wanes, then promptly abandoning the idea.

Eventually one of my many book ideas will make its way into a finished product, and when that happens I hope those of you still reading this post will enjoy the result.

Automating Quick Notes with iOS Shortcuts

24 Dec, 2024 til

I've blogged before about why I really dislike apps like Notion for taking quick notes since they're so slow to open. The very act of opening the app to take said note often takes 10 or more seconds, typically with a whole bunch of JavaScript-inflicted loading states and blank screens. By the time I get to the note, I've already lost my train of thought.

As it turns out, this pain point is a perfect candidate for the iOS Shortcuts app. I can create an automated workflow that captures my text input instantly but pushes to Notion in the background, allowing me to benefit from Notion's database-like organization but without dealing with the pitiful app performance.

Here's my Shortcut:

Notion Shortcut Workflow

Super simple but it gets the job done.

Solving Puzzles by Making Puzzles

19 Dec, 2024 blog, gamedev

This year I've substantially buffed up my crosswording skills. Mon-Wed on the NYT pose no threat, and I can even occasionally solve the Thu/Fri without checking an answer. Saturday remains befuddling.

One reason for my skill improvement is repetition. The more puzzles I solve, the more I recognize clue patterns and common words. Drill those puzzles frequently enough and skill inevitably trickles in.

In reality, repetition only explains a small sliver of my improvement. The bulk of my newfound skill doesn't come from training crossword puzzles out in the wild, but from making my own.

Building a crossword puzzle requires activating a whole bunch of underused brain wrinkles that remain latent when solving. Thinking of a theme and filling a bunch of words into a grid is just one small part of the equation. How do I measure difficulty so solvers don't get stuck? How do I compromise in a tradeoff between word quality and theme? Why does the software keep suggesting I use Australian birds?

The construction of quality reveals the heart of the puzzle. The very same questions I ask myself when endeavoring to make a good puzzle help reveal the construction of puzzles created by other people. For example, I now come equipped with a backlog of words that appear frequently thanks to their helpful vowels (OPAL, EMU, ERODE, ...). Difficult corners are made easier when I consider that the uncommon words are probably grouped with more common words. Themes are easier to spot now that I have thought of a few of my own.

This same skill applies to other puzzle genres, like the humble block-pushing puzzle game. Building interesting levels is a tough job that requires the constructor to think deeply about the constraints of their game. I don't know about other gamedevs, but I start by fiddling around with a random level layout, paring things back again and again until a single core concept is revealed to be interesting. I take that concept and build three or four levels around it, tutorializing it, expanding it, and remixing it.

This thought process has me thinking about other block-pushing puzzle games in a completely different way. Now when I get stuck on Patrick's Parabox I take a step back and attempt to reverse engineer the mechanic at play. Why did the constructor choose this level layout? What mechanic are they trying to showcase? What am I supposed to take away?

I suppose this same skill applies to programming, in the way of framework design. As a user of React, I may get frustrated at the hook APIs and the design of useEffect. But if I pare back the layers and think about what the framework is fundamentally accomplishing (that is, virtual DOM rendering with a JSX backend) the thought process of re-renders and useEffect dependencies starts to reveal itself. Without going out and building my own virtual DOM framework (something like snabbdom is a great start) it's hard to recognize the tradeoffs.

Will constructing crossword puzzles make you a better developer? Almost certainly not. But it's a ton of fun regardless.

The Most Common React Mistake

10 Dec, 2024 blog

The React homepage promises that "learning React is learning programming" and I think the framework somewhat delivers on it. At the very least you don't need to learn a new templating language thanks to JSX.

That said, don't be completely fooled by this promise. Like every other JavaScript framework, React is full of subtle complexities and esoteric nuances that have nothing to do with the language it's programmed in. In vanilla JavaScript there's no such thing as "the rules of hooks" or the need to avoid mutable variables in favor of useState.[1]

The subject of this post is one such piece of esoteric knowledge that I see newcomers trip up against when learning React (spoilers: it's useEffect). It's a great demonstration of the subtle complexities of React, where the promises of JavaScript-ness meet the reality of framework design.

The problems of syncing async state

A classic point of friction is the introduction of asynchronous code. You have some data from the server and you want to render it in your component to populate the initial values of a form. That last bit is where the bug arises: forms usually use controlled components, which hold onto their values via useState calls. Attempting to populate the initial value of useState hooks from asynchronous code inevitably runs into a tricky issue. It's easiest to demonstrate by example.

Here's a simple component that wraps an HTML input and captures its value:

import { useState } from 'react'

const MyInput = ({ initialText = '' }) => {
  const [text, setText] = useState(initialText)

  const handleChange = (ev) => {
    setText(ev.target.value)
  }

  return <input value={text} onChange={handleChange} />
}

It's a controlled input because the state variable text dictates the value of input. You might render MyInput in the template of a form, like so:

const App = () => {
  return (
    <form>
      <MyInput />
    </form>
  )
}

Perhaps even with an initial value by passing the prop initialText:

const App = () => {
  return (
    <form>
      <MyInput initialText="starting value" />
    </form>
  )
}

This is all fine and dandy. The input correctly initializes with the value of initialText when passing a string and correctly handles user input.

The problem arises when initialText is asynchronous, as is often the case when dealing with forms that are populated with data from a server. For example, introducing a new function getTextFromServer that simulates a 300ms response time:

const getTextFromServer = (ms = 300) =>
  new Promise((resolve) => {
    setTimeout(() => {
      resolve('text from server')
    }, ms)
  })

const App = () => {
  const [asyncText, setAsyncText] = useState('')

  useEffect(() => {
    const fetch = async () => {
      const text = await getTextFromServer()
      setAsyncText(text)
    }

    fetch()
  }, [])

  return (
    <form>
      <MyInput initialText={asyncText} />
    </form>
  )
}

A routine operation in React code: wrap an async fetch call with a useEffect and monitor the async state with useState. However, run this code and you'll find a bug. Can you spot it in the code?

Here's the problem: the initial value of MyInput is never populated with the value of asyncText. It remains blank, even after the getTextFromServer promise resolves.

Naturally the first step is to log out what's going on with initialText. Is the prop not being updated?

const MyInput = ({ initialText }) => {
  console.log(initialText)
  // ...

Here's what you'll see:

""
"text from server"

Well, actually this looks right. On the first render pass, the value is "", the initial value of the useState in the parent. After getTextFromServer responds with the string "text from server", that useState is updated and the child component, MyInput, is re-rendered. It receives the new value of "text from server" from props.

Well then, how come MyInput is blank?

This is where the most common React mistake is introduced. At this point in debugging, a new developer searches for a framework solution to this problem. We just encountered one such solution for handling async state: useEffect. What if we were to use it again?

const MyInput = ({ initialText = '' }) => {
  const [text, setText] = useState(initialText)

  useEffect(() => {
    setText(initialText)
  }, [initialText])

  const handleChange = (ev) => {
    setText(ev.target.value)
  }

  return <input value={text} onChange={handleChange} />
}

Now when the value of the initialText prop updates asynchronously, MyInput updates to match. The useEffect monitors the dependency change in initialText and calls setText in response. No more blank input!

Generally when I see this kind of code appear in the wild, it's accompanied by the text "for some reason React isn't updating MyInput with the new value of initialText so I put in a useEffect to keep things in sync." That "for some reason" is revealing: something is happening in React-land that I don't really understand, but at least I solved it using a React-like solution.[2]

Here's the rub: sure, this code solves the problem. But it's also incredibly brittle. This solution isn't obviously incorrect because developer machines are fast and we're usually dealing with sub-100ms response times from whatever API we're working with. In other words, because of quick response times, a developer might not notice the pop-in when MyInput is updated with the asynchronous value.

The thing is, slow connections (e.g. mobile phones accessing your application, a saturated server, etc.) will experience increasingly worse pop-in because of this useEffect change. In the worst-case scenario, a user could type text into MyInput and have that text wiped out by the useEffect once asyncText loads! Try increasing the getTextFromServer delay to 3000ms and see the result yourself.
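To make the worst case concrete, here is a minimal sketch in plain TypeScript that simulates the sequence of events without React. The simulateSyncEffect function and its variables are hypothetical stand-ins for the state and the effect's dependency check, not anything React exposes.

```typescript
// Plain simulation of the sync-effect version of MyInput (no React involved).
const simulateSyncEffect = (): string => {
  let text = ''            // stands in for the input's state
  let lastInitialText = '' // what the effect last saw in its dependency array

  // Models one render pass followed by the effect's dependency check.
  const render = (initialText: string): void => {
    if (initialText !== lastInitialText) {
      lastInitialText = initialText
      text = initialText   // the effect body: setText(initialText)
    }
  }

  render('')                   // first render, the server has not responded yet
  text = 'hello from the user' // the user types while the request is in flight
  render('text from server')   // the response arrives, the prop changes, the effect fires

  return text
}

// The user's input has been replaced by the server value.
console.log(simulateSyncEffect())
```

The longer the request takes, the more time the user has to type something that is later thrown away.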

The other problem with this kind of code is that we've effectively doubled the number of renders of the MyInput component. Sure, in this contrived example the extra renders do no harm, but you can imagine that for particularly complicated components that set tens or hundreds of different pieces of state, additional renders are to be avoided. State-syncing code of the kind in this example often leads to more state-syncing due to extraneous render passes, a problem that keeps on giving as your application grows.[3]

So what's actually happening with the MyInput useState? Why isn't it picking up the new value of initialText from the component prop? The answer is hidden away in the React documentation (emphasis mine):

useState Parameters:

  • initialState: The value you want the state to be initially. It can be a value of any type, but there is a special behavior for functions. This argument is ignored after the initial render.

"Ignored after the initial render", meaning even though the prop initialText is updated correctly, the useState that wraps text doesn't care. It's memoized such that any additional renders of the component will have no effect on the state variable it encapsulates.

If you think about it, this behavior makes sense. In 90% of cases, you wouldn't want your state variables to be blown away by component re-renders. When you use useState you expect it to hold onto a value until its setter is called, and the memoization achieves that goal.
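A toy model can make this behavior easier to picture. The sketch below is plain TypeScript and is not how React implements useState; createComponentState and useStateModel are made-up names that only mimic the "argument is ignored after the initial render" rule.

```typescript
type Setter<T> = (next: T) => void

// Creates a single state slot, roughly what one useState call owns in one component.
const createComponentState = <T>() => {
  let initialized = false
  let value: T | undefined

  // Mimics useState: the argument only matters on the very first call.
  const useStateModel = (initial: T): [T, Setter<T>] => {
    if (!initialized) {
      value = initial
      initialized = true
    }
    const set: Setter<T> = (next) => {
      value = next
    }
    return [value as T, set]
  }

  return useStateModel
}

const useText = createComponentState<string>()

// First render: the prop is still empty, so the state starts as ''.
const [firstRender] = useText('')

// Second render: the prop has changed, but the argument is ignored.
const [secondRender, setText] = useText('text from server')

// Only the setter changes the stored value.
setText('typed by user')
const [thirdRender] = useText('text from server')

console.log({ firstRender, secondRender, thirdRender })
```

In this model the second call returns the same empty string as the first, which is the blank input from the example.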

Now that we know more about how useState works behind the scenes, we can find a different solution for the problem of handling asynchronous initial state.

Solution: handle the pending state

So what should you do instead? The easiest solution is to have the parent component own the loading state:

const MyInput = ({ initialText = '' }) => {
  const [text, setText] = useState(initialText)

  const handleChange = (ev) => {
    setText(ev.target.value)
  }

  return <input value={text} onChange={handleChange} />
}

function App() {
  // Start in the loading state so MyInput never mounts with an empty value.
  const [isLoading, setIsLoading] = useState(true)
  const [asyncText, setAsyncText] = useState('')

  useEffect(() => {
    const fetch = async () => {
      const text = await getTextFromServer()
      // Store the text before leaving the loading state.
      setAsyncText(text)
      setIsLoading(false)
    }

    fetch()
  }, [])

  return (
    <form>
      {isLoading ? <p>loading...</p> : <MyInput initialText={asyncText} />}
    </form>
  )
}

MyInput goes back to its original form: a single useState that accepts initialText as an argument. Because MyInput is only rendered when asyncText has been fetched from the server (determined via isLoading in the parent component) the resulting useState is called once with an initial value of "text from server". There's no longer any need to sync state because the initial render of the component has the desired state.

I'll argue that being forced to think about loading states is the real benefit of avoiding useEffect for these kinds of problems. By moving control of the loading state up the component hierarchy, developers need to put more thought into the async nature of their application and how the UI will handle it.
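One way to make that thinking explicit, sketched here as an assumption rather than as part of the example above, is to model the parent's async state as a single tagged union instead of two separate state variables. The Remote type and the view function are hypothetical names used only for illustration.

```typescript
// Hypothetical shape for the parent's async state: either still loading, or ready with a value.
type Remote<T> =
  | { phase: 'loading' }
  | { phase: 'ready'; value: T }

// Describes what the parent would show for a given state.
// The 'ready' branch is the only place where the value is available.
const view = (state: Remote<string>): string => {
  switch (state.phase) {
    case 'loading':
      return 'loading...'
    case 'ready':
      return `input with initial value "${state.value}"`
  }
}

console.log(view({ phase: 'loading' }))
console.log(view({ phase: 'ready', value: 'text from server' }))
```

With this shape the compiler will not allow the value to be read while the state is still loading, so the child can only ever be handed a fetched value.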

Going back to the discussion of React complexity and the burden of frameworks, the whole counter-intuitive nature of useState discarding its argument after the first render is a mind-bender for the beginner. I could imagine spending a few hours on this problem and getting nowhere because it's hard to conceptualize that the cause is actually within the framework itself, buried in the implementation detail of memoization in the useState hook. It takes time to encounter these kinds of issues in React, but spend enough time with it and they will inevitably rise to the surface.


  1. Although of course there's the vanilla JS alternative of needing to re-render the DOM when you update application state, but that's neither here nor there. ↩︎

  2. I want to re-emphasize that I don't think the developer is at fault here. They encountered a subtle problem that is super confusing and solved it using the tools React gives them. I think it's a very natural way of thinking about things. ↩︎

  3. State-syncing begets more state-syncing because the lifecycle of state values becomes hard to reconcile, and the only solution is to set state again to ensure everything is the most recent. ↩︎

Type predicates to avoid casting

03 Dec, 2024 til

Type predicates have been around for a while, but today I found a particularly nice application. The situation is this: I have an interface that has an optional field, where the presence of that field means I need to create a new object on the server, and the lack of the field means the object has already been created and I'm just holding on to it for later. Here's what it looked like:

interface Thing {
  name: string
  blob?: File
}

const things: Thing[] = [
  /* ... */
]

const uploadNewThings = (things: (Thing & { blob: File })[]) =>
  Promise.all(things.map((thing) => createThing(thing.name, thing.blob)))

The intersection type Thing & { blob: File } means that uploadNewThings only accepts things that have the field blob. In other words, things that need to be created on the server because they have blob content.

However, TypeScript struggles if you try to simply filter the list of things before passing it into uploadNewThings:

uploadNewThings(things.filter((thing) => !!thing.blob))

The resulting error is this long stream of text:

Argument of type 'Thing[]' is not assignable to parameter of type '(Thing & { blob: File; })[]'.
  Type 'Thing' is not assignable to type 'Thing & { blob: File; }'.
    Type 'Thing' is not assignable to type '{ blob: File; }'.
      Types of property 'blob' are incompatible.
        Type 'File | undefined' is not assignable to type 'File'.
          Type 'undefined' is not assignable to type 'File'.

The tl;dr being that despite filtering things by thing => !!thing.blob, TypeScript does not recognize that the return value is actually (Thing & { blob: File })[].

Now you could just cast it,

things.filter((thing) => !!thing.blob) as (Thing & { blob: File })[]

But casting is bad! It's error-prone and doesn't really solve the problem that TypeScript is hinting at. Instead, use a type predicate:

const hasBlob = (t: Thing): t is Thing & { blob: File } => !!t.blob

uploadNewThings(things.filter(hasBlob))

With the type predicate (t is Thing & ...) I can inform TypeScript that I do in fact know what I'm doing, and that the call to filter returns an array of the narrower type.
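For a version that runs outside the browser, here is a self-contained sketch. FakeFile is a stand-in I'm assuming in place of File, and summarize is a hypothetical helper that shows the same predicate narrowing inside a conditional.

```typescript
// Stand-in for the browser's File type so the sketch runs anywhere (an assumption).
type FakeFile = { size: number }

interface Thing {
  name: string
  blob?: FakeFile
}

// Type predicate: a true result tells the compiler that blob is present.
const hasBlob = (t: Thing): t is Thing & { blob: FakeFile } => !!t.blob

const things: Thing[] = [
  { name: 'saved' },
  { name: 'new', blob: { size: 12 } },
]

// filter has an overload that accepts a type predicate, so the result is narrowed.
const pending = things.filter(hasBlob)

// No "possibly undefined" error here, and no cast.
const sizes = pending.map((t) => t.blob.size)

// The same predicate also narrows inside a conditional.
const summarize = (t: Thing): string =>
  hasBlob(t) ? `${t.name}: ${t.blob.size} bytes` : `${t.name}: already uploaded`

console.log(sizes, things.map(summarize))
```

One caveat: the compiler trusts the predicate's body, so a predicate that returns true for the wrong values is as unsafe as a cast. Keeping the check next to the type it asserts makes that easier to review.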

Running and writing

15 Nov, 2024 til

Most runners run not because they want to live longer, but because they want to live life to the fullest. If you're going to while away the years, it's far better to live them with clear goals and fully alive than in a fog, and I believe running helps you do that. Exerting yourself to the fullest within your individual limits: that's the essence of running, and a metaphor for life—and for me, writing as well. - Haruki Murakami

Data migrations with data-migrate

13 Nov, 2024 til

What I traditionally would've used Rake tasks for has been replaced with data-migrate, a little gem that handles data migrations in the same way as Rails schema migrations. It's the perfect way to automate data changes in production, offering a single pattern for handling data backfills, seed scripts, and the like.

The pros are numerous:

  • Data migrations are easily generated via CLI and are templated with an up and down case so folks think about rollbacks.
  • Just like with Rails schema migrations, there's a migration ID kept around that ensures data migrations are run in order. Old PRs will have merge conflicts.
  • You can conditionally run data migrations alongside schema migrations with bin/rails db:migrate:with_data.

It's a really neat gem. I'll probably still rely on the good ol' Rake task for my personal projects, but will doubtless keep data-migrate in the toolbox for teams.
