bug/renamed-columns #55

Merged
merged 6 commits into from
Jul 15, 2024

Conversation

fivetran-catfritz
Contributor

@fivetran-catfritz fivetran-catfritz commented Jul 9, 2024

PR Overview

This PR will address the following Issue/Feature:

  • internal ticket

This PR will result in the following new package version:

  • v1.1.0 - breaking since some datatypes can change and could affect customers' downstream uses.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Change 🚨

  • This change is made breaking due to changes made in the source package. See the v1.1.0 dbt_salesforce_source release notes for more details.
  • Added logic to support user-specified scenarios where the Fivetran Salesforce connector syncs column names using the original Salesforce API naming convention. For example, while Fivetran typically provides the column as created_date, some users might choose to receive it as CreatedDate according to the API naming. This update ensures the package is automatically compatible with both naming conventions.
    • Specifically, the package now performs a COALESCE, preferring the original Salesforce API naming. If the original naming is not present, the Fivetran version is used instead.
    • Renamed columns are now explicitly cast to prevent conflicts during the COALESCE.
    • ❗This change is considered breaking since the resulting column types may differ from prior versions of this package.
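The coalesce-and-cast behavior described in the CHANGELOG entry can be sketched as follows. This is a minimal illustration in Python with SQLite, not the package's actual macro: the helper `coalesced_column`, the `account` table, and the `text` cast type are all assumptions made for the example.

```python
import sqlite3


def coalesced_column(existing_columns, api_name, fivetran_name, sql_type):
    """Build a SELECT expression that prefers the API-style column.

    Hypothetical helper: `existing_columns` is a set of lower-cased column
    names found in the source table. Every candidate is cast to one type so
    the COALESCE never mixes datatypes.
    """
    present = [name for name in (api_name, fivetran_name)
               if name.lower() in existing_columns]
    casted = [f"cast({name} as {sql_type})" for name in present]
    if not casted:
        # Neither naming convention was synced: emit a typed null.
        expression = f"cast(null as {sql_type})"
    elif len(casted) == 1:
        expression = casted[0]
    else:
        # API naming is listed first, so it wins when both are populated.
        expression = f"coalesce({', '.join(casted)})"
    return f"{expression} as {fivetran_name}"


conn = sqlite3.connect(":memory:")
conn.execute("create table account (id text, CreatedDate text, created_date text)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [
        ("a1", "2024-07-01", None),          # only the API-style column is filled
        ("a2", None, "2024-07-02"),          # only the Fivetran-style column is filled
        ("a3", "2024-07-03", "1999-01-01"),  # both filled: API naming is preferred
    ],
)

# Discover which columns exist, then build the expression from that.
columns = {row[1].lower() for row in conn.execute("pragma table_info(account)")}
expression = coalesced_column(columns, "CreatedDate", "created_date", "text")
rows = conn.execute(f"select id, {expression} from account order by id").fetchall()
```

Because every branch casts to the same type, the output column type no longer depends on which naming convention the connector synced, which is also why the resulting types can differ from earlier versions.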

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

  • I had tried to write consistency tests, but because of the datatype changes necessitating the breaking change, comparing rows did not make sense. I extended the column test from the source package to look at all tables and column names, and that passed.
  • I also ran the transform package and confirmed that it ran without error and brought data through.

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-catfritz fivetran-catfritz self-assigned this Jul 9, 2024
packages.yml Outdated
Comment on lines 2 to 6
# - package: fivetran/salesforce_source
#   version: [">=1.1.0", "<1.2.0"]
- git: https://github.com/fivetran/dbt_salesforce_source.git
  revision: bug/renamed-columns-option3
  warn-unpinned: false
Contributor Author

Suggested change
# - package: fivetran/salesforce_source
#   version: [">=1.1.0", "<1.2.0"]
- git: https://github.com/fivetran/dbt_salesforce_source.git
  revision: bug/renamed-columns-option3
  warn-unpinned: false
- package: fivetran/salesforce_source
  version: [">=1.1.0", "<1.2.0"]

commit before merge

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-catfritz thanks for applying this breaking change update here and also adding the validation test! I have a few small comments and one thing I wanted to call out: I noticed the consistency_daily_activity test is failing for me when using our internal data 🤔

Would you be able to look into this and see if you also experience these row failures? At first glance it looks like the only field that is not tying out between dev and prod is the following:

  • tasks_completed

Contributor

Thanks for creating this here as well!

Comment on lines 16 to 18
case when lower(data_type) like '%numeric%' or lower(data_type) like '%float%' then 'numeric'
else data_type
end as data_type
Contributor

Now that we are making this a breaking change, would it make more sense to remove this case when statement? We know it will fail for this release (why it is breaking now), but it should not need this case statement for future updates and could risk us not catching datatype mismatches.

Contributor Author

Yeah it was helpful to run it this time with the case statement but it makes sense to remove it for the future. Updated!
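To illustrate the concern in this thread, here is a minimal sketch of how a datatype normalization like the case statement above can hide a real mismatch. The function names and the sample column metadata are hypothetical and only mirror the logic of the snippet under review.

```python
def normalize(data_type):
    """Mirror the reviewed case statement: fold numeric-like types together."""
    lowered = data_type.lower()
    if "numeric" in lowered or "float" in lowered:
        return "numeric"
    return data_type


def mismatched_columns(prod, dev, normalizer=None):
    """Return the (table, column) keys whose datatypes differ between schemas.

    `prod` and `dev` map (table, column) to a datatype string. A column that
    is missing on one side is always reported.
    """
    normalizer = normalizer or (lambda data_type: data_type)
    keys = sorted(set(prod) | set(dev))
    return [
        key for key in keys
        if normalizer(prod.get(key, "<missing>")) != normalizer(dev.get(key, "<missing>"))
    ]


# Hypothetical column metadata for the two schemas being compared.
prod = {("account", "annual_revenue"): "FLOAT64",
        ("account", "created_date"): "TIMESTAMP"}
dev = {("account", "annual_revenue"): "NUMERIC",
       ("account", "created_date"): "TIMESTAMP"}

strict = mismatched_columns(prod, dev)              # flags annual_revenue
lenient = mismatched_columns(prod, dev, normalize)  # flags nothing
```

The strict comparison reports the FLOAT64 versus NUMERIC difference, while the normalized comparison passes silently, which is the risk raised for future releases.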

integration_tests/dbt_project.yml
Comment on lines 32 to 44
final as (
    -- test will fail if any rows from prod are not found in dev
    (select * from prod
    except distinct
    select * from dev)

    union all -- union since we only care if rows are produced

    -- test will fail if any rows from dev are not found in prod
    (select * from dev
    except distinct
    select * from prod)
)
Contributor

This test is great! However, if there is a failure, it is really difficult to decipher: which record is from the prod/dev comparison and which is from the dev/prod comparison? Would there be a way to make this easier to understand when there is a failure?

Not required in this update, but I know we are reusing this a lot and would like to think about this for a future enhancement.

Contributor Author

Good point. I found a way and updated!
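The thread does not show the updated test, so the following is only one possible approach, sketched with SQLite from Python: tag each direction of the comparison with a label column so that a failing row says which side it came from. The `failure_source` column, the label values, and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table prod (date_day text, tasks_completed integer)")
conn.execute("create table dev (date_day text, tasks_completed integer)")
conn.executemany("insert into prod values (?, ?)",
                 [("2024-07-01", 5), ("2024-07-02", 3)])
conn.executemany("insert into dev values (?, ?)",
                 [("2024-07-01", 5), ("2024-07-02", 4)])

# SQLite spells the set difference as plain EXCEPT (it is always distinct).
labelled_query = """
    with prod_not_in_dev as (
        select * from prod
        except
        select * from dev
    ),
    dev_not_in_prod as (
        select * from dev
        except
        select * from prod
    )
    select 'prod_not_in_dev' as failure_source, * from prod_not_in_dev
    union all
    select 'dev_not_in_prod' as failure_source, * from dev_not_in_prod
    order by failure_source
"""
failures = conn.execute(labelled_query).fetchall()
```

Each returned row now carries its origin, so a mismatch on a single day shows up as one row per side rather than as two unlabelled rows.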

Contributor Author

@fivetran-catfritz fivetran-catfritz left a comment

@fivetran-joemarkiewicz I have made the update here as well!

Adding a note here about the consistency test failure for consistency_daily_activity. I added a filter to remove the current day's rows in case there is sync activity for the current day between prod and dev runs. There is still the chance of a failure if other changes happen in between runs, but removing the current day should reduce that chance.
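The current-day filter described above might look roughly like the sketch below, again using SQLite from Python. The `consistency_failures` helper, the explicit `today` parameter (used here so the example is deterministic), and the sample rows are assumptions; the real test would compare against the warehouse's current date.

```python
import sqlite3


def consistency_failures(conn, today):
    """Compare prod and dev in both directions, ignoring rows from `today` onward.

    Dates are ISO-8601 strings, so a plain string comparison orders them.
    """
    query = """
        with prod_filtered as (
            select * from prod where date_day < :today
        ),
        dev_filtered as (
            select * from dev where date_day < :today
        ),
        prod_not_in_dev as (
            select * from prod_filtered except select * from dev_filtered
        ),
        dev_not_in_prod as (
            select * from dev_filtered except select * from prod_filtered
        )
        select * from prod_not_in_dev
        union all
        select * from dev_not_in_prod
    """
    return conn.execute(query, {"today": today}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table prod (date_day text, tasks_completed integer)")
conn.execute("create table dev (date_day text, tasks_completed integer)")
# The runs agree on the 8th; a sync between runs changed the count on the 9th.
conn.executemany("insert into prod values (?, ?)",
                 [("2024-07-08", 5), ("2024-07-09", 2)])
conn.executemany("insert into dev values (?, ?)",
                 [("2024-07-08", 5), ("2024-07-09", 7)])

filtered = consistency_failures(conn, "2024-07-09")    # current day excluded
unfiltered = consistency_failures(conn, "9999-12-31")  # nothing excluded
```

With the current day excluded the comparison returns no rows, while the unfiltered comparison reports the in-flight day from both sides.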

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

LGTM once the package dep is bumped

CHANGELOG.md Outdated
- Specifically, the package now performs a COALESCE, preferring the original Salesforce API naming. If the original naming is not present, the Fivetran version is used instead.
- Renamed columns are now explicitly cast to prevent conflicts during the COALESCE.
- ❗This change is considered breaking since the resulting column types may differ from prior versions of this package.

Contributor

Would recommend adding an Under the Hood section, and a line for the consistency tests you created.

Contributor

@fivetran-avinash fivetran-avinash left a comment

@fivetran-catfritz Ready for release on Monday, with two minor edits on the CHANGELOG!

@fivetran-catfritz fivetran-catfritz merged commit e945a33 into main Jul 15, 2024
8 checks passed
3 participants