How I broke Gatsby JS conditional page build and learned to debug the tool chain

I built my AWS CodeBuild Pipeline with a new feature called "Conditional Page Builds" and it did not worked as expected in the build environment.

Starting the build process again with no change in the source code and with identical copies of the .cache and public folder generated this output:

1...
2info One or more of your plugins have changed since the last time you ran Gatsby. As a precaution, we're deleting your site's cache to ensure there's no stale data.
3success initialize cache - 0.011s
4...
5success Building production JavaScript and CSS bundles - 30.860s
6success Rewriting compilation hashes - 0.005s
7...
8success run queries - 49.184s - 44/44 0.89/s
9success Building static HTML for pages - 17.453s - 40/40 2.29/s
10success Delete previous page data - 0.000s
11success Generating image thumbnails - 166.557s - 316/316 1.90/s
12...

a second call to GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build --log-pages immediately following the first one shows the expected output:

1...
2success onPreInit - 0.020s
3success initialize cache - 0.012s
4...
5success Building production JavaScript and CSS bundles - 4.623s
6success run queries - 4.804s - 4/4 0.83/s
7success Building static HTML for pages - 0.732s - 0/0 0.00/s
8success Delete previous page data - 0.002s
9...

This proves the fact that the conditional page build is working and no page contains any data which is different on each creation.

Comparing the two outputs I found these problems:

  1. Cache invalidation by changed plugins
    info One or more of your plugins have changed since the last time you ran Gatsby. As a precaution, we're deleting your site's cache to ensure there's no stale data.
  2. Rewriting compilation hashes
  3. Regenerating Images
    success Generating image thumbnails

To open an issue on github I had to create a simple setup simulating the problems with the environment.

Issue 1 and 2 (posted on github) where my fault by creating a temporal file with a name starting with gatsby-* which conflicted with a part of gatsby creating a hash on the content of all files starting with this name.

creating a simpler setup

  1. create a gatsby site based on the default starter template
  2. upgrade gatsby and the tool chain to the latest stable versions
1npx gatsby new gatsby-starter
2cd gatsby-starter
3yarn upgrade --latest

on the first run of GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true yarn build --log-pages the result was as expected - there was no .cache and no public folder.

The build process creates the compilation hashes [24], the static HTML [26] and the image thumbnails [28].

1$ gatsby build --log-pages
2success open and validate gatsby-configs - 0.034s
3success load plugins - 1.153s
4success onPreInit - 0.020s
5success initialize cache - 0.012s
6success copy gatsby files - 0.122s
7success onPreBootstrap - 0.018s
8success createSchemaCustomization - 0.008s
9success source and transform nodes - 0.140s
10success building schema - 0.535s
11success createPages - 0.002s
12success createPagesStatefully - 0.082s
13success onPreExtractQueries - 0.002s
14success update schema - 0.051s
15success extract queries from components - 0.340s
16success write out requires - 0.007s
17success write out redirect data - 0.002s
18success Build manifest and related icons - 0.211s
19success onPostBootstrap - 0.220s
20
21info bootstrap finished - 8.052s
22
23success Building production JavaScript and CSS bundles - 6.059s
24success Rewriting compilation hashes - 0.061s
25success run queries - 6.757s - 7/7 1.04/s
26success Building static HTML for pages - 0.904s - 4/4 4.43/s
27success Delete previous page data - 0.003s
28success Generating image thumbnails - 7.766s - 6/6 0.77/s
29success onPostBuild - 0.018s
30info Done building in 15.908345021 sec
31info Built pages:
32Updated page: /404/
33Updated page: /
34Updated page: /page-2/
35Updated page: /404.html
36✨ Done in 16.24s.

running the same build command, a second time gives this output:

1$ gatsby build --log-pages
2success open and validate gatsby-configs - 0.033s
3success load plugins - 0.827s
4success onPreInit - 0.018s
5success initialize cache - 0.007s
6success copy gatsby files - 0.109s
7success onPreBootstrap - 0.009s
8success createSchemaCustomization - 0.006s
9success source and transform nodes - 0.067s
10success building schema - 0.375s
11success createPages - 0.002s
12success createPagesStatefully - 0.139s
13success onPreExtractQueries - 0.004s
14success update schema - 0.046s
15success extract queries from components - 0.286s
16success write out requires - 0.016s
17success write out redirect data - 0.004s
18success Build manifest and related icons - 0.223s
19success onPostBootstrap - 0.249s
20
21info bootstrap finished - 6.451s
22
23success Building production JavaScript and CSS bundles - 4.623s
24success run queries - 4.804s - 4/4 0.83/s
25success Building static HTML for pages - 0.732s - 0/0 0.00/s
26success Delete previous page data - 0.002s
27success onPostBuild - 0.002s
28info Done building in 12.096156853 sec
29✨ Done in 12.45s.

as expected the following steps have been skipped:

  • Rewriting compilation hashes
  • Building static HTML 0/0
  • Generating Images

simulating the build process of the build Pipeline

  1. create temporal directory
  2. clone the git repository to a temporal directory
  3. restore the build cache
  4. run the build command
  5. save the build cache

a simple bash script codepipeline.sh would look like this (don't use this for production as many edge cases are missing!):

1#!/bin/bash
2set -x
3
4BUILD_CACHE=/tmp/buildcache.tgz
5TMPDIR=$(mktemp -d)
6git clone https://github.com/aheissenberger/gatsby-cache-problem.git $TMPDIR
7if [ -f "$BUILD_CACHE" ]; then
8 tar -xf "$BUILD_CACHE" --directory $TMPDIR
9fi
10pushd $TMPDIR
11yarn install
12GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true yarn build --log-pages || exit 1
13if [ -f "$BUILD_CACHE" ]; then
14 rm "$BUILD_CACHE"
15fi
16tar czf "$BUILD_CACHE" .cache/ public/
17popd

Run this script twice and you will see that on the second run the image generation still happens:

1...
2success Building static HTML for pages - 1.621s - 0/0 0.00/s
3success Delete previous page data - 0.004s
4success Generating image thumbnails - 14.491s - 6/6 0.41/s <<==LOOK HERE
5success onPostBuild - 0.012s
6info Done building in 21.23928199 sec
7✨ Done in 21.56s.
8+ '[' -f /tmp/buildcache.tgz ']'
9+ rm /tmp/buildcache.tgz
10+ tar czf /tmp/buildcache.tgz .cache/ public/
11+ popd

I documented my environment and posted an issue on github after not finding any other issue with a similar problem.

Since I am not very patient, I decided to look at the problem myself.

How to debug Gatsby JS in VS Code

The first thing I do when testing out a new framework is setting up the Javascript debugger in VS Code. Gatsby provides a very good documentation which makes this a breeze.

Add this to your existing VS Code launch.json to debug conditional page building:

1{
2 "name": "Gatsby build Conditional",
3 "type": "node",
4 "request": "launch",
5 "protocol": "inspector",
6 "program": "${workspaceRoot}/node_modules/gatsby/dist/bin/gatsby",
7 "args": ["build", "--write-to-file", "--log-pages"],
8 "env": {"GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES":"true","CI":"true"},
9 "skipFiles": [],
10 "stopOnEntry": false,
11 "runtimeArgs": ["--nolazy"],
12 "sourceMaps": false
13}

activate searching in the node_modules folder

open the VS Code configuration .vscode/settings.json and add this:

1{
2 "search.exclude": {
3 "**/node_modules":false
4 },
5 "search.useIgnoreFiles":false
6}

Dig into the problem

The goal is to find the place in the code where the logic is which compares values from the cache with the existing environment and decides to rerun the image generation.

  1. let's search for Generating image thumbnails in the node_modules folder the result will list CHANGELOG.md and node_modules/gatsby-plugin-sharp/utils.js:38 looking into the second file will show us the code which prints the success message

  2. set a break point in this line [38]

  3. remove the .cache and public folder: rm -fr .cache/ ./public/

  4. start the VS Code Debugger and choose Gatsby build Conditional from the pulldown

  5. the debugger stops at line 38

    VS Code debugging / utils.js:38
    [VS Code debugging / utils.js:38]
  6. look at the CALL STACK in the left column and click on the second line - this is the script which called this function

    VS Code debugging / gatsby-node.js:91
    [VS Code debugging / gatsby-node.js:91]
  7. we see that this code part of the gatsby-plugin-sharp is activated by the event CREATE_JOB_V2

  8. if this would have been a normal function call, simple looking up the CALL STACK would be easy, but this is an event which could have been raised anywhere in the code based

  9. cancel the debugger and search again for CREATE_JOB_V2 inside the node_modules folder the search result contains 6 files where node_modules/gatsby/dist/redux/actions/public.js looks most promising

    VS Code search result / public.js:1198
    [VS Code search result / public.js:1198]
  10. looking at the 8 lines of this function:

    1actions.createJobV2 = (job, plugin) => (dispatch, getState) => {
    2 const currentState = getState();
    3 const internalJob = createInternalJob(job, plugin);
    4 const jobContentDigest = internalJob.contentDigest; // Check if we already ran this job before, if yes we return the result
    5 // We have an inflight (in progress) queue inside the jobs manager to make sure
    6 // we don't waste resources twice during the process
    7
    8 if (currentState.jobsV2 && currentState.jobsV2.complete.has(jobContentDigest)) {
    9 return Promise.resolve(currentState.jobsV2.complete.get(jobContentDigest).result);
    10 }
    11 // removed...
    12}
  • line 2 gets the state from the redux store which was populated from the .cache folder
  • line 3 creates a new Job object
  • line 4 gets the content digest hash from the new job
  • line 8 does a lookup in the redux store if the job has been done before
  1. set a breakpoint on const internalJob = createInternalJob(job, plugin);

  2. remove the .cache and public folder: rm -fr .cache/ ./public/

  3. run debugger - choose Gatsby build Conditional from the pulldown

  4. when the debugger pauses we can inspect the job attributes

    VS Code debug / public.js:1182
    [VS Code debug / public.js:1182]

    the left column shows us the following attributes with absolute paths /User/xxxxxxxx/...:
    job.inputPaths, job.outputDir, plugin.pluginFilepath, plugin.resolve

  5. step into the function createInternalJob to check on the creation of the diggest

    VS Code debug / jobs-manager.js
    [VS Code debug / jobs-manager.js]
  6. scrolldown until you see the place where the contentDigest is created:

    1internalJob.contentDigest = createContentDigest({
    2 name: job.name,
    3 inputPaths: internalJob.inputPaths.map(inputPath => inputPath.contentDigest),
    4 outputDir: internalJob.outputDir,
    5 args: internalJob.args,
    6 plugin: internalJob.plugin
    7});

We found the Problem

The hash created by Gatsby Job Queue Manager will always break when compared to values in the cache which have been created in a source folder with a different absolute path as long as the Job and the Plugin object contain absolute paths.

I also create an issue on github where you can follow the next steps.

Quick Fix

Always copy the source folder to the same path which has been used when the .cache folder was created.

Long term Solutions to the Problem

  • convert the object to a JSON string and replace the path to the root folder with a stable string or hash (still only a quick fix)
  • do not allow to store any absolute path in any Gatsby data structures: Job objects, plugin objects

I hope I could give some people a first impression on how to track down a problem by yourselves and provide maintainers of a project a better documentation to fix architectural problems.

Write a response on Medium