Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging to native health check #231

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fsamuel-bs
Copy link

We fail after 2 minutes when checking the native health check.
This might not be enough time to have all the docker containers up.
We can configure the timeout, but it was quite hard to debug it without logging.

We fail after 2 minutes when checking the native health check.
This might not be enough time to have all the docker containers up.
We can configure the timeout, but it was quite hard to debug this without logging.

@FunctionalInterface
public interface ClusterHealthCheck {
Logger log = LoggerFactory.getLogger(ClusterHealthCheck.class);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this here is sad. Should I make ClusterHealthCheck an abstract class and have this as a private variable?

Copy link
Contributor

@j-baker j-baker Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope - instead, note that this is only used in NativeHealthCheck and do

static ClusterHealthCheck nativeHealthChecks() {
    return new ClusterHealthCheck() {
        static final Logger log = LoggerFactory.getLogger(ClusterHealthCheck.class);
        @Override
        SuccessOrFailure isClusterHealthy(Cluster cluster) throws InterruptedException {
            Set<String> unhealthyContainers = new LinkedHashSet<>();
            .........
        }
    }
}

@@ -52,13 +56,18 @@ static ClusterHealthCheck nativeHealthChecks() {
return cluster -> {
Set<String> unhealthyContainers = new LinkedHashSet<>();
try {
log.info("Checking health of containers {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these may be a little spammy, FYI. Think this is done in a fairly tight loop, so you may end up with many log lines.

Instead, I'd recommend doing this like:

Set<String> lastUnhealthyContainers = new LinkedHashSet<>();
return new ClusterHealthCheck() {
    @Override
    public SuccessOrFailure isClusterHealthy(Cluster cluster) throws InterruptedException {
        Set<String> currentUnhealthyContainers = new LinkedHashSet<>();
        boolean healthyContainerSetChanged = false;
        for (Container container : cluster.allContainers()) {
            State state = container.state();
            if (state == State.UNHEALTHY) {
                currentUnhealthyContainers.add(container.getContainerName());
            }
        }

        if (currentUnhealthyContainers.equals(lastUnhealthyContainers)) {
              log(currentUnhealthyContainers);
              lastUnhealthyContainers.clear();
              lastUnhealthyContainers.addAll(currentUnhealthyContainers);
        }

        if (!currentUnhealthyContainers.isEmpty()) {
            return SuccessOrFailure.failure("The following containers are not healthy: " ...);
        }

        return SuccessOrFailure.success();
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where you only log if the state has actually changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants