Access to the path '/proc/<ID>/oom_score_adj' is denied #3132

romanvogman · 2023-12-06T13:25:56Z

Checks

I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
I am using charts that are officially provided

Controller Version

0.7.0

Deployment Method

Helm

Checks

This isn't a question or user support case (For Q&A and community support, go to Discussions).
I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Trigger a jenkins-action job on a self-hosted GitHub runner

Describe the bug

I'm trying to set up a self-hosted runner in Kubernetes mode to replace a self-hosted runner that currently runs on a dedicated VM.
When running a GitHub action that runs a Jenkins action, I see a permissions error: Access to the path '/proc/<ID>/oom_score_adj' is denied

Describe the expected behavior

I expect the job to run in the Kubernetes cluster just as it does on a self-hosted runner on a dedicated VM.

Additional Context

Followed the following video to set it up

This is the runner config I'm currently using for the helm chart:

containerMode:
  type: "kubernetes"  ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "standard-rwo"
    resources:
      requests:
        storage: 1Gi

template:
  spec:
    securityContext:
      fsGroup: 123
      runAsUser: 1001
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "false"
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
          requests:
            memory: "128Mi"
            cpu: "100m"

Controller Logs

same

Runner Pod Logs

[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper] Starting process:
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   File name: '/home/runner/externals/node16/bin/node'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Arguments: '/home/runner/k8s/index.js'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Working directory: '/home/runner/_work/end2end/end2end'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'True'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 57.
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/57/oom_score_adj' is denied.
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]  ---> System.IO.IOException: Permission denied
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    --- End of inner exception stack trace ---
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.Strategies.OSFileStreamStrategy.Write(ReadOnlySpan`1 buffer)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.Strategies.BufferedFileStreamStrategy.Flush(Boolean flushToDisk)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.Strategies.BufferedFileStreamStrategy.Dispose(Boolean disposing)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.StreamWriter.Dispose(Boolean disposing)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at System.IO.File.WriteAllText(String path, String contents)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper]    at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
[WORKER 2023-12-06 12:45:00Z INFO ProcessInvokerWrapper] Process started with process id 57, waiting for process exit.

github-actions · 2023-12-06T13:26:21Z

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

nikola-jokic · 2023-12-06T13:40:47Z

Hey @romanvogman,

This issue is related to the runner. However, can you please confirm that the job executes without issues? I know the runner raises this exception, but it usually does not influence the execution of the job. I am curious: is this exception affecting your job, or are you reporting that the runner throws the exception?

romanvogman · 2023-12-06T13:49:17Z

Hi @nikola-jokic !
Sadly it fails with the following error, which also causes the runner pod to be terminated and a new one to be launched afterwards (min instances is set to 1, so perhaps that's the reason for scaling a new one):

[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner] Caught exception from step: System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]  ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'RunContainerStep' did not execute successfully
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    --- End of inner exception stack trace ---
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.RunContainerStepAsync(IExecutionContext context, ContainerInfo container, String dockerFile)
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.Handlers.ContainerActionHandler.RunAsync(ActionRunStage stage)
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.ActionRunner.RunAsync()
[WORKER 2023-12-06 13:45:23Z ERR  StepsRunner]    at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
[WORKER 2023-12-06 13:45:23Z INFO StepsRunner] Step result: Failed


[RUNNER 2023-12-06 13:45:26Z INFO Terminal] WRITE LINE: 2023-12-06 13:45:26Z: Job execute tests completed with result: Failed
2023-12-06 13:45:26Z: Job execute tests completed with result: Failed

√ Removed .credentials
√ Removed .runner
[RUNNER 2023-12-06 13:45:27Z INFO Listener] Runner execution has finished with return code 0
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...

nikola-jokic · 2023-12-06T14:01:17Z

Oh, from this report, it definitely is not causing failure of the job.

The output that you provided shows that the hook execution failed. We should include better error reporting in the hook. The "HTTP request failed" message is not nearly enough for users to troubleshoot configuration issues.

It is possible that the node pressure is causing this kind of issue. The job pod needs to land on the runner node, so that may be causing issues with the hook implementation.

I will close this issue here, since it is not ARC related, but feel free to comment on it!

zerola · 2024-01-24T10:02:31Z

Hi @romanvogman , we have encountered the same issue as you described (using our GKE cluster for ARC). Just wondering - have you managed to solve it?

dmalone-keebo · 2024-01-26T07:01:52Z

Same here GKE + ARC

Nuru · 2024-01-27T21:04:04Z

@nikola-jokic wrote:

This issue is related to the runner.
...
I will close this issue here, since it is not ARC related, but feel free to comment on it!

@nikola-jokic Where is the right place to open this issue so it gets addressed? I remain confused about where the source code is for the runners used for Runner Controller Sets and where to open issues about them.

This is still happening in version 0.8.2

System.UnauthorizedAccessException: Access to the path '/proc/224/oom_score_adj' is denied

nikola-jokic · 2024-01-29T08:45:45Z

Hey,

Just to clarify, the "Access to the path is denied" exception should not influence the workings of the runner at all. It is just an annoying exception that the runner throws. If you want to report it, you can create an issue in the runner repo.

As far as the error reporting goes with the hook, we are hoping to publish a new 0.5.1 release soon and re-publish the image. That can help troubleshoot the hook setup. However, the System.UnauthorizedAccessException has nothing to do with the hook's HTTP error.

Nuru · 2024-01-29T21:18:28Z

@nikola-jokic wrote:

If you want to submit it, you can create an issue in the runner repo.

See, this is what I'm talking about with regard to confusion. That repo (actions/runner), as far as I can tell, is for the Summerwind runner only (current version v2.312.0), but this issue is for the GitHub self-hosted runner image (version 0.7.0 as of this issue; the current version is v0.8.2). I don't know where to report issues on that runner (as opposed to the controller).

zerola · 2024-01-31T09:56:03Z

@nikola-jokic Could you advise how to troubleshoot the hook problems? As described above, we are both running in GKE (standard, not Autopilot). Regular GitHub jobs work fine; the problem is with containerized jobs in ARC's kubernetes mode. The runner pod should start a second workflow pod for the container, but this is not happening. I can see inside the runner pod that the hook process is running, but I do not see any relevant logs, even after setting the RUNNER_DEBUG variable. I have also checked the Kubernetes API for authorization problems with the service accounts in use, but found none. Kubernetes events do not show anything suspicious either. Thank you.

nikola-jokic · 2024-01-31T11:50:34Z

Hey @zerola,

Of course. Currently, debugging the hook is almost impossible, since the information about the error is hidden in the exception and not logged anywhere. This has been changed starting with the 0.5.0 release, but that release introduced a bug on alpine containers, so we ended up rolling back the hook version added to runner version 2.312.0. We have a PR ready that should be released, but for now you would have to build your own hook and provide it to the runner. If you decide to go with that approach, please use the branch where this PR is, or, if you don't use alpine containers in your workflow, you can probably safely use the 0.5.0 release.

zerola · 2024-01-31T16:48:45Z

Hi @nikola-jokic , thanks for instructions. I have built my own runner image based on your https://github.com/actions/runner/blob/main/images/Dockerfile and provided the RUNNER_VERSION=2.312.0 with RUNNER_CONTAINER_HOOKS_VERSION=0.5.0.
However, the logs from the runner container still contain only this sort of information:

[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner] Caught exception from step: System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]  ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    --- End of inner exception stack trace ---
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
[WORKER 2024-01-31 16:38:36Z ERR  StepsRunner]    at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
[WORKER 2024-01-31 16:38:36Z INFO StepsRunner] Step result: Failed

nikola-jokic · 2024-02-01T11:35:36Z

Can you please turn on debugging and see the output in the workflow?

zerola · 2024-02-01T14:20:20Z

Can you please turn on debugging and see the output in the workflow?

Could you advise please how to turn on debugging? I found only RUNNER_DEBUG env variable, which is set.

nikola-jokic · 2024-02-01T14:26:37Z

Does the step output that can be seen in the UI show the reason for the failure? Based on this issue, it does seem to help, so I'm trying to understand how we are missing the HTTP response log on the latest 0.5.0 version.

zerola · 2024-02-01T14:48:16Z

No, the output in UI is still this:

Error: Error: Client network socket disconnected before secure TLS connection was established
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
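
Editor's note: the "Client network socket disconnected before secure TLS connection was established" line suggests a TLS handshake is failing somewhere, but the thread does not establish which endpoint is involved. One thing that can be checked independently of the hook is whether a TLS handshake with the Kubernetes API server succeeds from inside the runner pod. The following is a minimal Python sketch, not the hook's own logic. It assumes the standard in-cluster environment variables (KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT) and the usual service-account CA path; the function names `api_server_address` and `check_tls` are hypothetical.

```python
import os
import socket
import ssl

# Conventional location of the cluster CA inside a pod (assumed to be mounted).
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def api_server_address(env=os.environ):
    """Return (host, port) of the in-cluster API server from the standard env vars."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = int(env.get("KUBERNETES_SERVICE_PORT", "443"))
    return host, port


def check_tls(host, port, ca_path=CA_PATH, timeout=5.0):
    """Attempt a TLS handshake and return the negotiated protocol version.

    Raises OSError or ssl.SSLError if the connection or handshake fails,
    which is the condition this sketch is meant to surface.
    """
    ctx = ssl.create_default_context(cafile=ca_path)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Run inside the runner pod, `check_tls(*api_server_address())` either returns a version string such as a TLS 1.x identifier or raises, which helps separate a cluster networking problem from a hook configuration problem.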

caiocsgomes · 2024-02-28T14:12:38Z

Have you guys managed to find a solution for this? I'm running into the same problem.

caiocsgomes · 2024-02-28T14:20:48Z

I'm not able to catch what the problem is, I'm getting the same logs as @zerola

zerola · 2024-02-28T14:41:24Z

@caiocsgomes - Unfortunately no, in the end we have decided to use Docker-In-Docker mode for containerized workflows and that works. In any case, I will keep an eye on this PR if someone manages to solve it.

MPV · 2024-03-12T15:51:57Z

If you want to submit it, you can create an issue in the runner repo.

So, is this related to / caused by this upstream issue?

Self hosted runner in Linux container exits job prematurely runner#921

romanvogman · 2024-03-14T09:13:32Z

Hey @zerola, sorry for the late reply.

As @nikola-jokic mentioned, the issue wasn't related to the Access to the path '/proc/57/oom_score_adj' is denied error. We assumed it was related because it was the main exception we saw in the logs.

On our side, we were running containerized tasks that required ARC to run in dind mode. After changing to dind (with a few other unrelated fixes), the issue was resolved.

Hope this helps anyone who encounters this issue.

remidebette · 2024-04-12T10:28:06Z

Hi, can someone definitely confirm that kubernetes mode does not support containerized task?

nikola-jokic · 2024-04-12T11:17:17Z

We should probably report the error better on the hook side. There is definitely room to improve.

@remidebette, I'm sorry I don't understand, what do you mean when you say containerized task? Are you referring to the container step?

remidebette · 2024-04-12T13:33:24Z

Hi @nikola-jokic, we have been trying a "vanilla" install of the scaleset helm chart in kubernetes mode, switched our CI jobs to containers and are encountering the issue that I see in several tickets:

##[debug] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully

In my understanding, this script is not stable and people in the discussions online have issues with it and switch back to dind instead.

For example
actions/runner-container-hooks#128 (comment)
actions/runner-container-hooks#103

What is specific to us is that we are using an on-premises Rancher cluster, the PVC class is ceph-rbd, and the helm charts are installed with flux.

nikola-jokic · 2024-04-12T14:38:04Z

The script should be fine, but the error reported does not give you any clue what is going on.
Maybe if you turn on debugging for the workflow, you can see it?
There are e2e tests confirming that container hook is running the job, and I'm using it regularly, so I'm wondering if there is something wrong either with the configuration, or with the image (i.e. it fails to pull, or something else)

If you can, please let me know if the workflow pod is created, but something is incorrect there. If there is an example workflow I can run to see what is going on, that would also be helpful. One thing to note, if you are using private images, the container hook will not inherit the pull policy of the runner pod

noamgreen · 2024-08-17T18:30:13Z

@nikola-jokic Hi, this error comes from the runner, in this code:

#if OS_LINUX
        private void WriteProcessOomScoreAdj(int processId, int oomScoreAdj)
        {
            try
            {
                string procFilePath = $"/proc/{processId}/oom_score_adj";
                if (File.Exists(procFilePath))
                {
                    File.WriteAllText(procFilePath, oomScoreAdj.ToString());
                    Trace.Info($"Updated oom_score_adj to {oomScoreAdj} for PID: {processId}.");
                }
            }
            catch (Exception ex)
            {
                Trace.Info($"Failed to update oom_score_adj for PID: {processId}.");
                Trace.Info(ex.ToString());
            }
        }
#endif

https://github.com/actions/runner/blob/2979fbad9460c32bea9419595d8c3eacc8f4930d/src/Runner.Sdk/ProcessInvoker.cs#L657

not sure why
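
Editor's note: the quoted method wraps the write in a try/catch and only logs the failure at Info level. That matches the maintainer's statement above that the exception does not affect the job, and the "Process started with process id 57" log line that follows it. The same best-effort pattern can be sketched in Python as below. This is an illustration of the pattern and not the runner's code; the `proc_root` parameter is a hypothetical addition so the function can be exercised against a temporary directory instead of the real /proc.

```python
import logging
import os

log = logging.getLogger("oom_sketch")


def write_oom_score_adj(pid: int, score: int, proc_root: str = "/proc") -> bool:
    """Best-effort write of oom_score_adj; returns True only if the value was written.

    Mirrors the quoted C# method: the file is written only if it exists, and
    any I/O failure (for example, permission denied) is logged, not raised.
    The C# code catches every Exception; this sketch catches OSError, which
    covers permission and other file-system errors.
    """
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    try:
        if os.path.exists(path):
            with open(path, "w") as f:
                f.write(str(score))
            log.info("Updated oom_score_adj to %d for PID: %d.", score, pid)
            return True
    except OSError as ex:
        # Swallow the error so the caller continues, as the runner does.
        log.info("Failed to update oom_score_adj for PID: %d.", pid)
        log.info("%r", ex)
    return False
```

Because the failure is swallowed, the caller proceeds to start and wait for the process regardless of whether the write succeeded.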

Nek0trkstr · 2024-08-29T14:47:06Z

Hi I've encountered the same error while trying to run ARC in kubernetes mode and was following the same guide as @romanvogman.
Adding following lines resolved the issue:

containerMode:
  type: "kubernetes"  ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "openebs-hostpath"
    resources:
      requests:
        storage: 1Gi
+  kubernetesModeServiceAccount:
+   annotations:

This wasn't part of the video that we both followed. I see that it already existed in 0.7.0, so the change was introduced somewhere between 0.4.0 and 0.7.0.

romanvogman added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Dec 6, 2023

nikola-jokic added question Further information is requested and removed bug Something isn't working needs triage Requires review from the maintainers labels Dec 6, 2023

nikola-jokic closed this as completed Dec 6, 2023

knkarthik mentioned this issue Mar 28, 2024

Job pod failed to start on GKE Autopilot with container hooks (kubernetes mode) actions/runner-container-hooks#152

Open

4 tasks

edhenry mentioned this issue Jun 25, 2024

Error: Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index') actions/runner#3358

Open

joosangkim mentioned this issue Jul 4, 2024

self-hosted action runner with kubernetes mode on EKS failed at Initialize containers step from Action UI actions/runner#3372

Open
