Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Annotate all Rels with output_names #1145

Open
tokoko opened this issue Sep 20, 2024 · 3 comments
Open

feat: Annotate all Rels with output_names #1145

tokoko opened this issue Sep 20, 2024 · 3 comments

Comments

@tokoko
Copy link
Collaborator

tokoko commented Sep 20, 2024

substrait doesn't keep track of intermediate field names, which makes it hard to properly round-trip queries. Now there's a way to optionally include names in output_names field as a hint attached to every Rel. compiler should populate these fields in order for eventual decompiler to take advantage of them.

@tokoko
Copy link
Collaborator Author

tokoko commented Sep 20, 2024

@gforsyth this will probably result in considerably larger proto objects for some queries, so I'm thinking of making the feature optional.

@gforsyth
Copy link
Member

Hey @tokoko! I have a few open-ended questions that I think might be useful to flesh out a bit.

Would an eventual decompiler (oh no) require these intermediate field names or merely take advantage of them when they were available?

If they're required, then I think making them optional is going to be a future foot-gun.

I guess this next one isn't a question: I wouldn't worry about making the feature optional until a) we've confirmed how much larger it makes proto objects (probably something like O(num_rels)?) and b) that we have some notion that there is actually a performance penalty worth caring about, like deserialization cost vs query execution time.

@tokoko
Copy link
Collaborator Author

tokoko commented Sep 23, 2024

Would an eventual decompiler (oh no) require these intermediate field names or merely take advantage of them when they were available?

I think not. Decompiler should ideally handle all valid substrait plans, not just the ones that we produce, so it should be strictly optional with a fallback to "randomly" generated names.

I agree on the second point. We can make it work first and evaluate performance trade-off later.

gforsyth pushed a commit that referenced this issue Sep 24, 2024
This is first PR to solve #1145 and adds outputNames hint to all Rels
except for JoinRel and CrossRel. Join relations need more work to handle
name collisions as ibis doesn't do it for every join step as part of
JoinChain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: backlog
Development

No branches or pull requests

2 participants