Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log warning on same version upgrade to prevent failed status #6273

Merged
merged 18 commits into from
Jan 2, 2025
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# Kind can be one of:
# - breaking-change: a change to previously-documented behavior
# - deprecation: functionality that is being removed in a later release
# - bug-fix: fixes a problem in a previous version
# - enhancement: extends functionality but does not break or fix existing behavior
# - feature: new functionality
# - known-issue: problems that we are aware of in a given version
# - security: impacts on the security of a product or a user’s deployment.
# - upgrade: important information for someone upgrading from a prior version
# - other: does not fit into any of the other categories
kind: bug-fix

# Change summary; a 80ish characters long description of the change.
summary: Log warning on same version upgrade attempts

# Long description; in case the summary is not enough to describe the change
# this field accommodate a description without length limits.
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment.
description: |
Log a warning instead of reporting an error whan a same-version upgrade is
attempted. This prevents the agent from reporting a "failed" status.

# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc.
component: elastic-agent

# PR URL; optional; the PR number that added the changeset.
# If not present is automatically filled by the tooling finding the PR where this changelog fragment has been added.
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number.
# Please provide it if you are adding a fragment for a different PR.
#pr: https://github.com/owner/repo/1234

# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of).
# If not present is automatically filled by the tooling with the issue linked to the PR number.
issue: https://github.com/elastic/elastic-agent/issues/6186
5 changes: 5 additions & 0 deletions internal/pkg/agent/application/coordinator/coordinator.go
Original file line number Diff line number Diff line change
Expand Up @@ -575,6 +575,11 @@ func (c *Coordinator) Upgrade(ctx context.Context, version string, sourceURI str
cb, err := c.upgradeMgr.Upgrade(ctx, version, sourceURI, action, det, skipVerifyOverride, skipDefaultPgp, pgpBytes...)
if err != nil {
c.ClearOverrideState()
if errors.Is(err, upgrade.ErrUpgradeSameVersion) {
// Set upgrade state to completed, but return an error if a same version-upgrade is attempted.
det.SetState(details.StateCompleted)
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to consider this an error, that is what lead to the bug this is fixing in the first place. It needs to be like an idempotent API call. It returns success if the action has already completed successfully once.

}
det.Fail(err)
return err
}
Expand Down
24 changes: 17 additions & 7 deletions internal/pkg/agent/application/upgrade/upgrade.go
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ var agentArtifact = artifact.Artifact{
}

var ErrWatcherNotStarted = errors.New("watcher did not start in time")
var ErrUpgradeSameVersion = errors.New("upgrade did not occur because it is the same version")

// Upgrader performs an upgrade
type Upgrader struct {
Expand Down Expand Up @@ -162,6 +163,19 @@ func (av agentVersion) String() string {
func (u *Upgrader) Upgrade(ctx context.Context, version string, sourceURI string, action *fleetapi.ActionUpgrade, det *details.Details, skipVerifyOverride bool, skipDefaultPgp bool, pgpBytes ...string) (_ reexec.ShutdownCallbackFn, err error) {
u.log.Infow("Upgrading agent", "version", version, "source_uri", sourceURI)

currentVersion := agentVersion{
version: release.Version(),
snapshot: release.Snapshot(),
hash: release.Commit(),
}

// check version before download
same, _ := isSameVersion(u.log, currentVersion, packageMetadata{}, version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will not work for snapshot upgrades.
The packageMetadata is needed to check on the hash for snapshots in the check below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it should work for non-snapshot attempts, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchila so in this case this will fail because hash is unpopulated right? but then later it is extracted and checked again on line 224. so we lose a bit of time but will not perform upgrade

Copy link
Member

@pchila pchila Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are relying on the fact that the current version has a not empty hash compared to the zero value in packageMetadata{} so I don't see how it can ever detect this to be the same version as the current one if the hash in packageMetadata is ""

I guess we will never get true from here... Did you test this ? Is there a case where this returns true ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're correct, I forgot that comparing currentVersion vs version would also compare the hashes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new func that does this comparison correctly.

if same {
u.log.Warnf("Upgrade action skipped because agent is already at version %s", currentVersion)
return nil, ErrUpgradeSameVersion
}

// Inform the Upgrade Marker Watcher that we've started upgrading. Note that this
// is only possible to do in-memory since, today, the process that's initiating
// the upgrade is the same as the Agent process in which the Upgrade Marker Watcher is
Expand Down Expand Up @@ -206,15 +220,11 @@ func (u *Upgrader) Upgrade(ctx context.Context, version string, sourceURI string
return nil, fmt.Errorf("reading metadata for elastic agent version %s package %q: %w", version, archivePath, err)
}

currentVersion := agentVersion{
version: release.Version(),
snapshot: release.Snapshot(),
hash: release.Commit(),
}

// Recheck version here in case of a snapshot->snapshot upgrade on the same version.
same, newVersion := isSameVersion(u.log, currentVersion, metadata, version)
if same {
return nil, fmt.Errorf("agent version is already %s", currentVersion)
u.log.Warnf("Upgrade action skipped because agent is already at version %s", currentVersion)
return nil, ErrUpgradeSameVersion
michel-laterman marked this conversation as resolved.
Show resolved Hide resolved
}

u.log.Infow("Unpacking agent package", "version", newVersion)
Expand Down
17 changes: 17 additions & 0 deletions internal/pkg/agent/application/upgrade/upgrade_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -733,6 +733,23 @@ func TestIsSameVersion(t *testing.T) {
newVersion: agentVersion123SNAPSHOTghijkl,
},
},
{
name: "same version and snapshot, no hash (SNAPSHOT upgrade before download)",
args: args{
current: agentVersion123SNAPSHOTabcdef,
metadata: packageMetadata{
manifest: nil,
},
version: "1.2.3-SNAPSHOT",
},
want: want{
same: false,
newVersion: agentVersion{
version: "1.2.3",
snapshot: true,
},
},
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ func TestStandaloneUpgradeSameCommit(t *testing.T) {
upgradetest.WithUnprivileged(unprivilegedAvailable),
upgradetest.WithDisableHashCheck(true),
)
assert.ErrorContainsf(t, err, fmt.Sprintf("agent version is already %s", currentVersion), "upgrade should fail indicating we are already at the same version")
assert.ErrorContains(t, err, "upgrade did not occur because it is the same version")
})

t.Run(fmt.Sprintf("Upgrade on a repackaged version of agent %s (%s)", currentVersion, unPrivilegedString), func(t *testing.T) {
Expand Down
4 changes: 4 additions & 0 deletions testing/upgradetest/upgrader.go
Original file line number Diff line number Diff line change
Expand Up @@ -332,11 +332,15 @@ func PerformUpgrade(

upgradeOutput, err := startFixture.Exec(ctx, upgradeCmdArgs)
if err != nil {
logger.Logf("Upgrade err: %v, output: %s", err, upgradeOutput)
// Sometimes the gRPC server shuts down before replying to the command which is expected
// we can determine this state by the EOF error coming from the server.
// If the server is just unavailable/not running, we should not succeed.
// Starting with version 8.13.2, this is handled by the upgrade command itself.
outputString := string(upgradeOutput)
if strings.Contains(outputString, "upgrade did not occur because it is the same version") {
michel-laterman marked this conversation as resolved.
Show resolved Hide resolved
return fmt.Errorf("upgrade did not occur because it is the same version")
}
isConnectionInterrupted := strings.Contains(outputString, "Unavailable") && strings.Contains(outputString, "EOF")
if !isConnectionInterrupted {
return fmt.Errorf("failed to start agent upgrade to version %q: %w\n%s", endVersionInfo.Binary.Version, err, upgradeOutput)
Expand Down
Loading