# Tracing my GitHub Actions

I've added a LOT of actions to one of my Elixir projects, and I've blown through the 3000 minutes included in the GitHub Pro plan.

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/550905bd-f963-433d-bff1-4ac3f769b466.webp
alt: image.webp
caption: /repo/action/metrics/usage - the results of bumping the budget 5 times in a month
```

I wanted to dig a bit deeper into how long each step in a job, like `Playwright`, actually takes, but the screenshot above is the absolute limit of GitHub Actions observability.

I've heard cool things about [blacksmith.sh](https://www.blacksmith.sh/), but it (reasonably) requires the repo to be in an organization, and I didn't want to bother with creating an org and transferring the repo there just for that.

I'm already paying for [sentry](https://sentry.io/), so I figured there should be a way to generate OpenTelemetry traces as part of a GitHub Actions run and upload them to Sentry afterwards - and there is!

## opentelemetry-github action

There is, of course, already an action for that - the [opentelemetry-github](https://github.com/marketplace/actions/github-actions-opentelemetry) action.

I wanted to trace my main `CI` job and my post-merge `Cache Warm` action, which sets up the dependency caches for the Elixir project and the Rust NIF libs.

Here's the final `actions-observability.yml` file that worked for me.

```yml
name: Actions Observability

on: # zizmor: ignore[dangerous-triggers] Required for post-run telemetry; this workflow does not checkout or execute triggering workflow code.
  workflow_run:
    workflows:
      - Cache Warm
      - CI
    types:
      - completed

permissions:
  actions: read
  contents: read

jobs:
  export:
    runs-on: ubuntu-latest
    name: Export workflow trace

    steps:
      - name: Export workflow telemetry
        uses: plengauer/opentelemetry-github/actions/instrument/workflow@8a42906ace6a618a3b91545ef9b655b8c6d2ac23 # v5.54.0
        with:
          self_monitoring: "false"
        env:
          OTEL_SERVICE_NAME: github-actions
          OTEL_TRACES_EXPORTER: otlp
          OTEL_METRICS_EXPORTER: none
          OTEL_LOGS_EXPORTER: none
          OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
          OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_TRACES_ENDPOINT }}
          OTEL_EXPORTER_OTLP_HEADERS: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_HEADERS }}
          OTEL_RESOURCE_ATTRIBUTES: service.namespace=repo-ex,deployment.environment=ci,github.repository=JakubSokolowski/repo-ex
```

To connect it to the OTLP endpoint for your project, you can follow the [official sentry instructions](https://blog.sentry.io/send-your-existing-opentelemetry-traces/).

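For orientation, the two secrets referenced in the workflow tend to look something like the fragment below. This is only a sketch of the value shapes as far as I can tell from the Sentry docs: `<org-id>`, `<project-id>` and `<public-key>` are placeholders, and the exact host can differ per region, so copy the real values from your Sentry project settings rather than from here.

```yml
# Assumed shapes of the two repository secrets (placeholders, not real values):
#
# SENTRY_GITHUB_ACTIONS_OTLP_TRACES_ENDPOINT
#   https://o<org-id>.ingest.sentry.io/api/<project-id>/integration/otlp/v1/traces
# SENTRY_GITHUB_ACTIONS_OTLP_HEADERS
#   x-sentry-auth=sentry sentry_key=<public-key>
env:
  # The workflow only ever references the secrets, never the raw values.
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_TRACES_ENDPOINT }}
  OTEL_EXPORTER_OTLP_HEADERS: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_HEADERS }}
```

Keeping both values in secrets means the key never shows up in the workflow file or in the logs.
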
> [!WARNING]
> The CI was also running a [zizmor](https://github.com/zizmorcore/zizmor) step to audit GitHub Actions, and it warned me that the `workflow_run` trigger is dangerous. However, this workflow only reads metadata of completed Actions runs and exports telemetry, without checking out or executing any untrusted code, and the repo itself is private, so we're good.

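For contrast, here is roughly the kind of `workflow_run` job that warning is about. It is a deliberately bad sketch, not something from my repo: the job name and `./scripts/build.sh` are hypothetical, and the point is only that the code from the triggering run gets checked out and executed in a context that has access to this workflow's token and secrets.

```yml
# Anti-pattern, shown only for contrast: do NOT do this in a workflow_run workflow.
jobs:
  risky:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the commit of the triggering run, which may be untrusted (e.g. from a fork).
          ref: ${{ github.event.workflow_run.head_sha }}
      # Hypothetical script: runs that checked-out code next to this workflow's secrets.
      - run: ./scripts/build.sh
```

The observability workflow above has no checkout and no `run` step at all, which is why the `zizmor: ignore` comment felt justified.
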
## Results

Here's the same `Playwright` step now:

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/3fd7408c-b5e0-4d62-b780-889b71335b79.webp
alt: image.webp
caption: a lot of setup; the test run itself is pretty fast for ~80 specs.
```

Impressive, very nice, very observable.

Adding this also helped me identify a cache issue in the [schemathesis](https://schemathesis.io/) job; fixing it cut the time down from ~7 to 3 minutes.

It's also very informative to see that I didn't add some new shitty playwright tests, but rather that the `Install Playwright` step suddenly started taking ~17 minutes instead of the usual 20 seconds.

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/23b0c481-9d7f-4ead-ab2c-f25b4b6a714a.webp
alt: screenshot of the playwright trace waterfall in sentry
caption: my p95 of playwright tests
```

The Ubuntu/Azure apt mirror was shoveling bits one by one.

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/968e090e-d73a-4d9d-a297-25d55748c2d1.webp
alt: screenshot of the Playwright sentry trace waterfall
caption: 😡 !!!!
```

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/90f154c4-290f-4aa3-8c2c-e2849045d1ed.webp
alt: screenshot of the logs of a github actions step, showing fonts downloading very slowly
caption: 20 kB/s, what is the opposite of blazingly fast? glacially slow?
```

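One cheap guard against a repeat would be a step-level timeout, so a crawling mirror fails fast instead of quietly eating billable minutes. This is a minimal sketch, not what my workflow currently does: the install command and the 5-minute limit are assumptions, so adjust both to your own setup.

```yml
- name: Install Playwright
  # Assumed install command; use whatever your job already runs here.
  run: npx playwright install --with-deps
  # Fail the step if it runs longer than 5 minutes (normal runs take ~20 seconds).
  timeout-minutes: 5
```

A failed step is annoying, but a rerun is cheaper than paying for 17 minutes of font downloads.
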
I like the new observability very much. I'm doing a couple of PRs a week to that repo, and over the last month this used up ~10k spans in Sentry, which I think is totally acceptable.

```image
src: https://media.aezakmi.top/blog-images/articles/abb60c04-5ef9-4829-899f-f5b2872c45ff/0e28d743-73eb-44b4-b199-6a0c1cb233d9.webp
alt: image.webp
caption: last month span usage for this action.
```

This runs as a dedicated action, so it uses up about 1 extra billable minute per run, which I think is totally acceptable as well, especially given the time it helped me save in other actions. Overall, p cool! Thanks to [plengauer](https://github.com/plengauer/) for this action. It does even more: it can trace shell calls and all the curls and wgets run as part of the actions.

I don't have the need for that yet, unless the ubuntu mirrors become slow again.
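
If that day comes, the job-level instrumentation would presumably go in as the first step of the job I want to look inside. The fragment below is an untested sketch: I'm assuming the same repo exposes an `actions/instrument/job` action at the commit pinned above and that it reads the same `OTEL_*` variables, so check the project's README before copying it. The job name and the test command are hypothetical.

```yml
jobs:
  e2e: # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Assumed path within the same repo; must run before the steps that should be traced.
      - uses: plengauer/opentelemetry-github/actions/instrument/job@8a42906ace6a618a3b91545ef9b655b8c6d2ac23 # v5.54.0
        env:
          OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_TRACES_ENDPOINT }}
          OTEL_EXPORTER_OTLP_HEADERS: ${{ secrets.SENTRY_GITHUB_ACTIONS_OTLP_HEADERS }}
      # Hypothetical step whose shell commands would then show up as spans.
      - run: npx playwright test
```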