Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIG Storage] Convert all jobs to Ginkgo --label-filter #33152

Open
13 tasks
carlory opened this issue Jul 30, 2024 · 11 comments
Open
13 tasks

[SIG Storage] Convert all jobs to Ginkgo --label-filter #33152

carlory opened this issue Jul 30, 2024 · 11 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@carlory
Copy link
Member

carlory commented Jul 30, 2024

What would you like to be added:

  • sig-storage-kind.yaml

    • pull-kubernetes-e2e-storage-kind-disruptive (run_if_changed, @pohly do we need to add a canary job before change it?) testgrid
    • pull-kubernetes-e2e-storage-kind-alpha-beta-features (manual, no canary) testgrid
    • ci-kubernetes-e2e-storage-kind-disruptive testgrid
    • ci-kubernetes-e2e-storage-kind-alpha-beta-features testgrid
  • sig-storage-gce-config.yaml

    • pull-kubernetes-e2e-gce-storage-slow (run_if_changed, some question as above) testgrid
    • pull-kubernetes-e2e-gce-storage-snapshot (run_if_changed, some question as above) testgrid
    • pull-kubernetes-e2e-gce-csi-serial (run_if_changed, some question as above) testgrid
    • pull-kubernetes-e2e-gce-storage-disruptive (manual, no canary)
    • pull-kubernetes-e2e-aws-storage-selinux (manual, no canary)
    • ci-kubernetes-e2e-gce-iscsi-serial testgrid
    • ci-kubernetes-e2e-snapshot

Why is this needed:

Part of #32911

/sig storage

@carlory carlory added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 30, 2024
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jul 30, 2024
@carlory
Copy link
Member Author

carlory commented Jul 30, 2024

job: pull-kubernetes-e2e-storage-kind-alpha-beta-features

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: FOCUS
  value: \[Feature:VolumeAttributesClass\]|\[Feature:HonorPVReclaimPolicy\]|\[Feature:CSIVolumeHealth\]
- name: PARALLEL
  value: "true"

expected (if focus on some features)

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: LABEL_FILTER
  value: '!Slow && !Disruptive && !Flaky && FeatureGate: containsAny {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'
- name: PARALLEL
  value: "true"

Test on macos

(base) ➜  kubernetes git:(master) _output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && FeatureGate: containsAny {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}' -v ./test/e2e | grep SUCCESS
SUCCESS! -- 25 Passed | 0 Failed | 0 Pending | 6578 Skipped

@carlory
Copy link
Member Author

carlory commented Jul 30, 2024

job: pull-kubernetes-e2e-storage-kind-alpha-beta-features

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: FOCUS
  value: \[Feature:VolumeAttributesClass\]|\[Feature:HonorPVReclaimPolicy\]|\[Feature:CSIVolumeHealth\]
- name: PARALLEL
  value: "true"

expected (if focus on sig-storage without the feature label)

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: LABEL_FILTER
  value:  '!Slow && !Disruptive && !Flaky && (Feature: consistsOf Alpha || Feature: consistsOf Beta) && !(FeatureGate: isEmpty) && sig-storage'
- name: PARALLEL
  value: "true"

Test on macos with the master branch of k/k

(base) ➜  kubernetes git:(master) _output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && (Feature: consistsOf Alpha || Feature: consistsOf Beta) && !(FeatureGate: isEmpty) && sig-storage' -v ./test/e2e | grep SUCCESS
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 6603 Skipped

0 Passed ? related-to kubernetes/kubernetes#125452.

If we want to apply this change, the kind of sig-testing jobs must be updated first. cc @pohly @aojea

@pohly
Copy link
Contributor

pohly commented Jul 30, 2024

do we need to add a canary job before change it?

I don't think so. If there is one, use it, otherwise let's just modify the job directly.

Test on macos

A before/after comparison would be more useful than the absolute number. The goal is to run exactly the same tests as before.

If we want to apply this change, the kind of sig-testing jobs must be updated first.

Feature: consistsOf Alpha || Feature: consistsOf Beta doesn't make sense to me. It means "if the Feature set is either exactly Alpha or Beta", which is too restrictive.

@carlory
Copy link
Member Author

carlory commented Jul 31, 2024

_output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && !(FeatureGate: isEmpty) && sig-storage' -v ./test/e2e

------------------------------
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (ntfs)] [Feature:Windows] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Feature:Windows, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]
/Users/kiki/workspace/golang/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volume_modify.go:151
• [0.000 seconds]

------------------------------
[sig-storage] CSI Mock selinux on mount SELinuxMount [LinuxOnly] [Feature:SELinux] should pass SELinux mount option for RWOP volume and Pod with SELinux context set [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [sig-storage, Feature:SELinux, FeatureGate:SELinuxMountReadWriteOncePod, Feature:Beta]

Without Feature: consistsOf Alpha || Feature: consistsOf Beta, the above test cases will be selected. i.e. [Feature:Windows], [Feature:SELinux], and etc.

According to the strict definition of a feature label, it should depend on other extra requirements. If it's not skipped in their own test cases, the job may fail in kind cluster.

So if we can not use Feature: consistsOf Alpha || Feature: consistsOf Beta to filter out the test cases, I have to explicitly specify the feature-gate in the label-filter. i.e. #33152 (comment) @pohly

@pohly
Copy link
Contributor

pohly commented Jul 31, 2024

Let's first spell out what pull-kubernetes-e2e-storage-kind-alpha-beta-features is supposed to run: "all SIG Storage tests which are alpha or beta, are not flaky or slow, and which can run in a kind cluster where VolumeAttributesClass/HonorPVReclaimPolicy/CSIVolumeHealth are supported".

This leads me to:

sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && Feature: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth, Alpha, Beta}

One problem is that this includes variants of the tests for drivers other than [Driver: csi-hostpath], like [Driver: pd.csi.storage.gke.io]. It would be good to turn those inline tags into labels and exclude non-csi-hostpath drivers, then Ginkgo doesn't need to start tests that then skip themselves because the driver cannot be installed.

One thing that I noticed while looking at --dry-run output: why is this test marked as [Serial]?

[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io] [Serial] [Testpattern: Dynamic PV (block volmode)] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Serial, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]

If you have many of those, it can considerably slow down the job.

@carlory
Copy link
Member Author

carlory commented Jul 31, 2024

Feature: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth, Alpha, Beta}

If so, we have to introduce a new feature label even if the feature only depends feature-gate and alpha/beta API group.

Do you mean it is sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'?

If yes, it can be written as 'sig-storage && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'. When we use framework.WithFeatureGate, it always adds a Feature label, alpha or beta, to the test. So, we can use FeatureGate to filter the test.

@carlory
Copy link
Member Author

carlory commented Jul 31, 2024

then Ginkgo doesn't need to start tests that then skip themselves because the driver cannot be installed.

Those tests will skip when it's running unless tests run on specific environment.

I'm not sure whether I should filter out those tests or not in job configuration.

@carlory
Copy link
Member Author

carlory commented Jul 31, 2024

why is this test marked as [Serial]?
[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io] [Serial] [Testpattern: Dynamic PV (block volmode)] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Serial, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]

The gce-pd-csi-driver does not support custom provisioner name, it always uses pd.csi.storage.gke.io as the provisioner name. This is not configurable.

All in-tree integrations with cloud providers are removed. Should we remove related e2e tests from k/k. i.e.

https://github.com/kubernetes/kubernetes/blob/aab56e9b70b7d80f2a5a5f2907e172de257662b5/test/e2e/storage/drivers/csi.go#L830

cc @dims @jsafrane @msau42 @pohly

Should all jobs in sig-storage-gce-config.yaml be moved into sig-cloudprovider?

@pohly
Copy link
Contributor

pohly commented Jul 31, 2024

If so, we have to introduce a new feature label even if the feature only depends feature-gate and alpha/beta API group.

Yes. That is the current status.

Do you mean it is sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}?

That will work eventually, but right now a Feature: VolumeAttributesClass is required because some other job might not skip this test properly without such a Feature.

I'm not sure whether I should filter out those tests or not in job configuration.

You don't have to. Those e2eskipper.SkipUnlessProviderIs calls are intentionally executed before any costly per-test setup runs (like creating the per-test namespace). But I find skipping them in the job a bit nicer.

Should all jobs in sig-storage-gce-config.yaml be moved into sig-cloudprovider?

Probably? But I have to defer to the expert here.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 29, 2024
@carlory
Copy link
Member Author

carlory commented Oct 30, 2024

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Projects
None yet
Development

No branches or pull requests

4 participants