OTel Context is getting lost in GraphQL manual instrumentation #6583
Comments
Let me know if a sample project with minimal reproducible code is required. |
That would be very useful 🙂 |
@jack-berg here is a sample project with a reproducible example: https://github.com/govi20/dgs-otel. The OTel Context gets lost in DepartmentDataloader. The data loaders are called from EmployeeDataFetcher, and there is no thread switch in between. The thread pool I have configured uses an executor that wraps each task (the context-wrapped executor). I've added steps to reproduce in the bug report. |
It looks like dgs is based on graphql-java, which has library instrumentation (https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/graphql-java/graphql-java-20.0/library). I wonder if you just plug in that instrumentation library if a bunch of these issues would go away. Worth a shot, at least, to see how it does! |
@jkwatson does this library perform auto instrumentation? I need to rely on manual instrumentation logic |
The "library" instrumentation doesn't require the javaagent. You can just plug it in programmatically. You might need to figure out how to hook it into dgs, but it doesn't need the agent. |
Given this documentation, I would guess it'll work just fine: https://netflix.github.io/dgs/advanced/instrumentation/ |
@jkwatson That's exactly what I am doing in my code: manually instrumenting with a SimpleInstrumentation implementation. |
I recommend trying the library instrumentation then, and raising issues in the instrumentation repo if it has holes. |
@jkwatson Unfortunately, I can't use the library because it lacks the ability to instrument GraphQL data resolvers. Still, I've tried it out, and the issue is indeed reproducible with this library. I plan to report this issue to the instrumentation repo. However, I'm unsure whether the problem lies within the core libraries or in the instrumentation. |
The issue won't be with the core libraries. It's definitely an issue with making the instrumentation correctly propagate the context where it needs to go. |
Continuing @kilink's comment in Netflix/dgs-framework#1928 (comment): to me it seems that parameters.get(0).getContext().makeCurrent() is the only way to propagate the context without changing the current code. |
If I remember correctly, it doesn’t work even if the batch contains only 1 parameter.
Yes, that fixes the issue, but it is a workaround. It doesn’t look elegant because ‘parameter’ is a domain object. |
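For readers following along, here is a minimal sketch of the shape of that workaround, using only the JDK. It is not the OpenTelemetry or DataLoader API: the `ThreadLocal` stands in for the thread-bound OTel context, and `Key`, `loadBatch`, and `KeyCarriedContextDemo` are hypothetical names made up for illustration. The idea is that each batch key carries the context captured when it was created, and the batch loader makes the first key's context current for the duration of the batch.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class KeyCarriedContextDemo {
    // Stand-in for the thread-bound tracing context; NOT the OpenTelemetry API.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Hypothetical batch key: a domain id plus the context captured when the key was built.
    static final class Key {
        final int id;
        final String context;

        Key(int id, String context) {
            this.id = id;
            this.context = context;
        }
    }

    // Hypothetical batch loader: restores the first key's context while the batch runs.
    static CompletableFuture<List<String>> loadBatch(List<Key> keys) {
        String previous = CURRENT.get();
        CURRENT.set(keys.get(0).context); // plays the role of makeCurrent()
        try {
            List<String> result = keys.stream()
                    .map(k -> "department-" + k.id + " loaded under " + CURRENT.get())
                    .collect(Collectors.toList());
            return CompletableFuture.completedFuture(result);
        } finally {
            // Plays the role of closing the scope: put back whatever was current before.
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    public static void main(String[] args) {
        CURRENT.set("span-1");
        List<Key> keys = List.of(new Key(1, CURRENT.get()), new Key(2, CURRENT.get()));
        CURRENT.remove(); // simulate the context no longer being current when the batch is dispatched
        System.out.println(loadBatch(keys).join());
    }
}
```

In real OpenTelemetry code, `makeCurrent()` returns a `Scope` that should be closed, typically with try-with-resources, which is what the `finally` block imitates here. Note that taking the context from the first key assumes every key in the batch was created under the same context, which is part of why this reads as a workaround rather than a fix.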
It doesn't matter whether the batch has one parameter or more. The DataLoader framework is what produces the CompletableFuture with a fixed default executor.
This doesn't seem to be within OpenTelemetry's control. OpenTelemetry already provides context propagation by customizing CompletableFuture with an executor wrapped in the OTel context, but the DataLoader framework uses a fixed default executor in between. |
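To illustrate the mechanism described in this comment (not to confirm it as the root cause of this issue), here is a small JDK-only sketch. A `ThreadLocal` stands in for the OTel context, which by default is also bound to the current thread. `contextWrapping` is a hypothetical helper written for this example and is not the OpenTelemetry API. A value bound on the calling thread is not visible to a task run on the default `CompletableFuture` executor, but it is visible when the task goes through an executor that captures and restores it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextLossDemo {
    // Stand-in for the thread-bound tracing context; NOT the OpenTelemetry API.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Hypothetical helper: captures the submitter's value and restores it around each task.
    static Executor contextWrapping(Executor delegate) {
        return task -> {
            String captured = CURRENT.get(); // read on the submitting thread
            delegate.execute(() -> {
                String previous = CURRENT.get();
                CURRENT.set(captured);
                try {
                    task.run();
                } finally {
                    if (previous == null) {
                        CURRENT.remove();
                    } else {
                        CURRENT.set(previous);
                    }
                }
            });
        };
    }

    // Reads the value from a thread of CompletableFuture's default async executor.
    static String readOnDefaultExecutor() {
        return CompletableFuture.supplyAsync(CURRENT::get).join();
    }

    // Reads the value from whatever thread the given executor runs the task on.
    static String readOn(Executor executor) {
        return CompletableFuture.supplyAsync(CURRENT::get, executor).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT.set("span-1");
        try {
            System.out.println("default executor: " + readOnDefaultExecutor());
            System.out.println("wrapped executor: " + readOn(contextWrapping(pool)));
        } finally {
            CURRENT.remove();
            pool.shutdown();
        }
    }
}
```

The first read comes back as `null` because the task runs on a thread that never had the value bound, while the second comes back as `span-1`. If a framework hands out futures built on an executor the caller cannot replace, a wrapped executor configured elsewhere never sees those tasks, which is the situation this comment describes.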
I have an async GraphQL resolver which I am using along with a GraphQL data loader. The GraphQL service performs manual instrumentation.
The problem I am facing is that the OpenTelemetry context is getting lost in the load method. This happens even before the task is offloaded to the context-wrapped executor.
Steps to reproduce
What did you expect to see?
The OTel Span and Context should be available in the DepartmentDataLoader's load() method, as there is no thread switch in between. The span's end method is not called, so I assume the span is not closed either.
What did you see instead?
OTel Context is not getting propagated.
What version and what artifacts are you using?
Version: 1.38.0. I use a custom implementation of SpanExporter.
Environment
This is not an environment-specific issue; it's reproducible on macOS as well as CentOS.
Additional context
I've reported this issue to the GraphQL DGS Framework that I use: Netflix/dgs-framework#1928
The Netflix DGS folks recommended that I check with the tracing framework team.