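npm run build is not cached when running docker build with the kaniko cache
I am trying to speed up my Google Cloud Build for a React application (github repo). To that end, I started using the Kaniko Cache, as suggested in the official Cloud Build documentation.
The npm install part of my build process now does seem to be cached. However, I expected npm run build to be cached as well when the source files have not changed.
My Dockerfile: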
# Base image has ubuntu, curl, git, openjdk, node & firebase-tools installed
FROM gcr.io/team-timesheets/builder as BUILDER
## Install dependencies for functions first
WORKDIR /functions
COPY functions/package*.json ./
RUN npm ci
## Install app dependencies next
WORKDIR /
COPY package*.json ./
RUN npm ci
# Copy all app source files
COPY . .
# THIS SEEMS TO BE NEVER CACHED, EVEN WHEN SOURCE FILES HAVENT CHANGED
RUN npm run build:refs \
 && npm run build:production
ARG VCS_COMMIT_ID
ARG VCS_BRANCH_NAME
ARG VCS_PULL_REQUEST
ARG CI_BUILD_ID
ARG CODECOV_TOKEN
ENV VCS_COMMIT_ID=$VCS_COMMIT_ID
ENV VCS_BRANCH_NAME=$VCS_BRANCH_NAME
ENV VCS_PULL_REQUEST=$VCS_PULL_REQUEST
ENV CI_BUILD_ID=$CI_BUILD_ID
ENV CODECOV_TOKEN=$CODECOV_TOKEN
RUN npm run test:cloudbuild \
 && if [ "$CODECOV_TOKEN" != "" ]; \
 then curl -s https://codecov.io/bash | bash -s - -X gcov -X coveragepy -X fix -s coverage; \
 fi
WORKDIR /functions
RUN npm run build
WORKDIR /
ARG FIREBASE_PROJECT_ID
ARG FIREBASE_TOKEN
RUN if [ "$FIREBASE_TOKEN" != "" ]; \
 then firebase deploy --project $FIREBASE_PROJECT_ID --token $FIREBASE_TOKEN; \
 fi
Build output:
BUILD
Pulling image: gcr.io/kaniko-project/executor:latest
latest: Pulling from kaniko-project/executor
Digest: sha256:b9eec410fa32cd77cdb7685c70f86a96debb8b087e77e63d7fe37eaadb178709
Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
gcr.io/kaniko-project/executor:latest
INFO[0000] Resolved base name gcr.io/team-timesheets/builder to builder
INFO[0000] Using dockerignore file: /workspace/.dockerignore
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder
INFO[0000] Retrieving image gcr.io/team-timesheets/builder
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder
INFO[0000] Retrieving image gcr.io/team-timesheets/builder
INFO[0000] Built cross stage deps: map[]
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder
INFO[0000] Retrieving image gcr.io/team-timesheets/builder
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder
INFO[0000] Retrieving image gcr.io/team-timesheets/builder
INFO[0001] Executing 0 build triggers
INFO[0001] Resolving srcs [functions/package*.json]...
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:9307850446a7754b17d62c95be0c1580672377c1231ae34b1e16fc284d43833a...
INFO[0001] Using caching version of cmd: RUN npm ci
INFO[0001] Resolving srcs [package*.json]...
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:7ca523b620323d7fb89afdd0784f1169c915edb933e1d6df493f446547c30e74...
INFO[0001] Using caching version of cmd: RUN npm ci
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:1fd7153f10fb5ed1de3032f00b9fb904195d4de9dec77b5bae1a3cb0409e4530...
INFO[0001] No cached layer found for cmd RUN npm run build:refs && npm run build:production
INFO[0001] Unpacking rootfs as cmd COPY functions/package*.json ./ requires it.
INFO[0026] WORKDIR /functions
INFO[0026] cmd: workdir
INFO[0026] Changed working directory to /functions
INFO[0026] Creating directory /functions
INFO[0026] Taking snapshot of files...
INFO[0026] Resolving srcs [functions/package*.json]...
INFO[0026] COPY functions/package*.json ./
INFO[0026] Resolving srcs [functions/package*.json]...
INFO[0026] Taking snapshot of files...
INFO[0026] RUN npm ci
INFO[0026] Found cached layer, extracting to filesystem
INFO[0029] WORKDIR /
INFO[0029] cmd: workdir
INFO[0029] Changed working directory to /
INFO[0029] No files changed in this command, skipping snapshotting.
INFO[0029] Resolving srcs [package*.json]...
INFO[0029] COPY package*.json ./
INFO[0029] Resolving srcs [package*.json]...
INFO[0029] Taking snapshot of files...
INFO[0029] RUN npm ci
INFO[0029] Found cached layer, extracting to filesystem
INFO[0042] COPY . .
INFO[0043] Taking snapshot of files...
INFO[0043] RUN npm run build:refs && npm run build:production
INFO[0043] Taking snapshot of full filesystem...
INFO[0061] cmd: /bin/sh
INFO[0061] args: [-c npm run build:refs && npm run build:production]
INFO[0061] Running: [/bin/sh -c npm run build:refs && npm run build:production]
> thdk-timesheets-app@1.2.16 build:refs /
> tsc -p common
> thdk-timesheets-app@1.2.16 build:production /
> webpack --env=prod
Hash: e33e0aec56687788a186
Version: webpack 4.43.0
Time: 81408ms
Built at: 12/04/2020 6:57:57 AM
....
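As it stands, there does not even seem to be a speed benefit, because of the overhead of the caching system.
I am fairly new to Dockerfiles, so hopefully I am just missing a simple line here.
Answer
Short answer: cache invalidation is hard.
A RUN instruction in a Dockerfile can execute any command. In general, Docker (when using its local cache) or kaniko then has to decide whether that step can be cached. This is usually determined by checking whether the output is deterministic. In other words: if the same command is run again, does it produce the same file changes (relative to the previous image) as before?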
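One way to picture the "Checking for cached layer ...:<hash>" lines in the build output is a key that chains the previous key, the instruction text, and the contents of any copied files. The sketch below is a simplified model under that assumption, not the actual algorithm of kaniko or Docker; the function name layer_key is hypothetical, and sha256sum (GNU coreutils) is assumed to be installed.

```shell
#!/usr/bin/env bash
# Simplified model of a layer cache key. This is an assumption for
# illustration, not the real algorithm used by kaniko or Docker.

# layer_key PARENT_KEY INSTRUCTION [FILE...]
# Prints a SHA-256 over the parent key, the instruction text, and the
# contents of any files the instruction copies into the image.
layer_key() {
  local parent="$1" instruction="$2" f
  shift 2
  {
    printf '%s\n%s\n' "$parent" "$instruction"
    for f in "$@"; do
      sha256sum < "$f"
    done
  } | sha256sum | cut -d' ' -f1
}
```

In this model, a change to any file picked up by COPY . . changes that step's key and therefore the key of every later RUN step, even when the RUN command text is identical.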
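This simple view is not enough to decide whether a command is cacheable, because any command can have side effects that do not touch the local filesystem, for example network traffic. If you run curl -XPOST https://notify.example.com/build/XYZ to report a successful or failed build to some notification API, that step should not be cached. Or maybe your command generates a random password for an admin user and stores it in an external database; that step should never be cached either.
On the other hand, a fully reproducible npm run build can still produce two different bundles because of how minifiers and bundlers work, for example when minified and uglified builds end up with different short variable names. The resulting builds are semantically identical but not byte-identical, so although this step could be cached, Docker or kaniko has no way of recognizing that.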
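To check whether a build is byte-identical across runs, one option is to hash the whole output directory after each run and compare the results. This is a minimal sketch: the function name tree_digest and the output directory dist are assumptions, and GNU find, sort, xargs and sha256sum are assumed to be available.

```shell
#!/usr/bin/env bash
# tree_digest DIR
# Prints one SHA-256 covering the relative paths and contents of all
# regular files below DIR, in a stable (byte-sorted) order.
tree_digest() {
  (
    cd "$1" &&
      find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum
  ) | sha256sum | cut -d' ' -f1
}

# Hypothetical usage ('dist' is an assumed output directory):
#   npm run build:production && first=$(tree_digest dist)
#   npm run build:production && second=$(tree_digest dist)
#   [ "$first" = "$second" ] && echo "byte-identical" || echo "differs"
```

If the two digests differ without any source change, the build is not byte-reproducible, even though the bundles may behave identically.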
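Telling cacheable from non-cacheable behavior is basically impossible, so you will keep running into problematic caching behavior in the form of false positives or false negatives.
When I consult for clients on build pipelines, I usually split Dockerfiles into multiple stages, or move the cache hit-or-miss logic into a script when Docker decides wrongly for a certain step.
When you split a Dockerfile, you end up with a base image (containing all dependencies and other preparation steps) and move the custom-cacheable part into its own Dockerfile, which then references that base image. This usually means you need some form of templating (for example, a FROM ${BASE_IMAGE} at the top that is rendered via envsubst or a more complex system such as helm).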
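The envsubst templating mentioned above could look roughly like the following. This is a sketch under assumptions: the file name Dockerfile.template and the function name render_dockerfile are hypothetical, and envsubst (from GNU gettext) is assumed to be installed.

```shell
#!/usr/bin/env bash
# render_dockerfile TEMPLATE BASE_IMAGE
# Prints TEMPLATE with ${BASE_IMAGE} replaced by the given image name.
# Passing '${BASE_IMAGE}' to envsubst limits substitution to that one
# variable, so other $VARS in the Dockerfile (e.g. $CODECOV_TOKEN) are
# left untouched for Docker to resolve at build time.
render_dockerfile() {
  local template="$1" base_image="$2"
  BASE_IMAGE="$base_image" envsubst '${BASE_IMAGE}' < "$template"
}

# Hypothetical usage, assuming Dockerfile.template starts with
# 'FROM ${BASE_IMAGE}':
#   render_dockerfile Dockerfile.template gcr.io/team-timesheets/builder > Dockerfile
```

Restricting the substitution to one variable is a deliberate choice here: an unrestricted envsubst would also blank out every other $NAME reference that is not set in the calling environment.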
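If that does not fit your use case, you can implement the logic yourself in a script. To find out which files changed, you can use git diff --name-only HEAD HEAD~1. Combined with a bit more logic, this lets you run certain steps only when a specific set of files has changed: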
#!/usr/bin/env bash
# Only rebuild if something changed in 'app/' or in a package* file.
# grep -E is required: with plain -e, '(' and '|' are matched literally.
if [[ -n "$(git diff --name-only HEAD HEAD~1 | grep -E '^(app/|package.*)')" ]]; then
  npm run build:refs
  curl -XPOST "https://notify.api/deploy/$(git rev-parse --short HEAD)"
  # ... further steps ...
fi
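You can easily extend this logic to your exact needs and take full control of the caching logic yourself. However, you should only do this for steps where Docker or kaniko produces false positives or false negatives, because all subsequent steps will not be cached due to the non-deterministic behavior.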
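As one possible way to extend the script, the path matching can be pulled into a small function that reads the changed paths from standard input, which keeps it independent of git and easy to try out. The function name needs_rebuild and the patterns in the usage lines are assumptions for illustration.

```shell
#!/usr/bin/env bash
# needs_rebuild PATTERN
# Reads changed file paths on stdin (one per line) and succeeds if at
# least one of them matches the extended regular expression PATTERN.
needs_rebuild() {
  grep -Eq "$1"
}

# Hypothetical usage with the same diff as in the script above:
#   if git diff --name-only HEAD HEAD~1 | needs_rebuild '^(app/|package.*)'; then
#     npm run build:refs
#   fi
#   if git diff --name-only HEAD HEAD~1 | needs_rebuild '^functions/'; then
#     (cd functions && npm run build)
#   fi
```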