[BUG] %%send-to-spark fails for dataframes with '\n' or ' characters #862
Comments
🤖 Hello @spbansal, I'm Dosu, a bot here to assist you while a human maintainer is not available. I'm designed to help resolve issues, answer questions, and guide you on how to contribute to our projects. I'm here to help, so let's work together to find a solution to your issue.

Thank you for reporting this issue. It seems like the problem is due to the '\n' and single-quote characters not being properly escaped when the pandas dataframe is converted to a JSON string. This can be fixed by modifying the `_pandas_df_to_json` method of the `SendPandasDfToSparkCommand` class. Here's a patch that should fix the issue:

```python
from pandas.io.json import dumps


class SendPandasDfToSparkCommand(Command):
    # ...
    def _pandas_df_to_json(self, records):
        # Escape newlines and single quotes so the serialized JSON does not
        # break the string literal embedded in the code sent to the remote
        # Spark kernel.
        return dumps(records, ensure_ascii=False).replace('\n', '\\n').replace("'", "\\'")
```

This patch modifies the `_pandas_df_to_json` method to escape '\n' and single-quote characters in the serialized dataframe. Please apply this patch to your local copy of sparkmagic and test it with your use case. If it works as expected, you may want to submit it as a pull request to the sparkmagic repository so that it can be included in future versions of the software.

As for your question about how the `%%send-to-spark` command is implemented, it is handled by the `SendPandasDfToSparkCommand` class in the sparkmagic codebase.

I hope this helps! If you have any other questions, feel free to ask.
Based on the error message, I think that a cell containing double quotes would also be a problem. I prefer the human suggestion to encode everything in base64 and decode on the other end; it would be safer.
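A minimal sketch of that base64 approach, assuming the fix lives in the same `_pandas_df_to_json` method patched above; the decoding snippet is hypothetical, since the code sparkmagic actually generates for the remote kernel would need the matching change:

```python
import base64

from pandas.io.json import dumps


class SendPandasDfToSparkCommand(Command):
    # ...
    def _pandas_df_to_json(self, records):
        # Base64-encode the entire JSON payload so that newlines, single
        # quotes, and double quotes cannot break the string literal that
        # gets embedded in the code sent to the remote kernel.
        raw = dumps(records, ensure_ascii=False)
        return base64.b64encode(raw.encode("utf-8")).decode("ascii")
```

The remote side would then decode before parsing, along these lines (`payload` stands in for the injected base64 string):

```python
import base64
import json

# 'payload' is the base64 string embedded in the generated code (hypothetical).
records = json.loads(base64.b64decode(payload).decode("utf-8"))
```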
Describe the bug
When using `%%send-to-spark` with a local pandas dataframe that contains a '\n' or ' (single quote) character, the command fails with the following error:

```
SyntaxError: invalid syntax
```
To Reproduce
You can run the following notebook to reproduce this issue.
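If the notebook is unavailable, a minimal reproduction looks something like the following; note the magic is spelled `%%send_to_spark` in the sparkmagic docs, and the flags shown here are an assumption based on those docs:

```python
# Local IPython cell: a dataframe containing a newline and a single quote.
import pandas as pd

df = pd.DataFrame({"text": ["line one\nline two", "it's quoted"]})
```

```
%%send_to_spark -i df -t df -n df
```

Sending `df` to the remote Spark kernel then fails with the SyntaxError above.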
Expected behavior
The dataframe should have been properly sent to the remote Spark kernel.
Screenshots
Versions:
Additional context
A workaround for this was to base64-encode any string fields before sending them to Spark; a sketch is below. It would be great if this or a different solution could be patched into the repo.
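A sketch of that workaround, assuming the sent dataframe arrives on the cluster as a Spark DataFrame named `df` (the column name `text` is illustrative only):

```python
# Local cell: base64-encode every string column before %%send-to-spark.
import base64

import pandas as pd

df = pd.DataFrame({"text": ["line one\nline two", "it's quoted"]})

for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].map(
        lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii")
    )
```

```python
# Remote (Spark) cell: decode the base64 columns back to plain strings.
from pyspark.sql import functions as F

df = df.withColumn("text", F.unbase64(F.col("text")).cast("string"))
```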