[SDTEST-173] Retry new tests - parse remote configuration and fetch unique known tests #227

Merged
merged 16 commits into from
Sep 3, 2024
f0d2445
parse early_flake_detection_enabled? from remote library settings
anmarchenko Aug 29, 2024
57a0d01
add DD_CIVISIBILITY_EARLY_FLAKE_DETECTION_ENABLED killswitch to settings
anmarchenko Aug 29, 2024
1c91543
use DD_CIVISIBILITY_EARLY_FLAKE_DETECTION_ENABLED killswitch in TestR…
anmarchenko Aug 29, 2024
0798794
parse slow test retries payload from library settings
anmarchenko Aug 30, 2024
a8ed9a9
use slow_test_retries in TestRetries::Component
anmarchenko Aug 30, 2024
063dea7
move enabled and slow_test_retries keys to constants
anmarchenko Aug 30, 2024
5a163ce
fix DD_API_SETTINGS_RESPONSE_SLOW_TEST_RETRIES_KEY
anmarchenko Aug 30, 2024
b9aadf3
parse faulty_session_threshold from remote settings
anmarchenko Aug 30, 2024
080014c
set retry_new_tests_percentage_limit in TestRetries::Component
anmarchenko Aug 30, 2024
a99a46a
additional constants for new test retries, rename skippable_test_id t…
anmarchenko Aug 30, 2024
db34827
add UniqueTestsClient to fetch a set of unique tests from backend
anmarchenko Aug 30, 2024
d17805b
build and inject UniqueTestsClient in TestRetries::Component on libra…
anmarchenko Aug 30, 2024
ddc2443
fetch and store unique tests set when configuring TestRetries::Component
anmarchenko Sep 2, 2024
2296b7c
emit early_flake_detection.response_tests metric after unique tests a…
anmarchenko Sep 2, 2024
b58b477
configure test retries and test optimisation in parallel using thread…
anmarchenko Sep 2, 2024
46630f4
finish comment
anmarchenko Sep 2, 2024
13 changes: 12 additions & 1 deletion lib/datadog/ci/configuration/components.rb
@@ -11,6 +11,7 @@
require_relative "../test_optimisation/coverage/writer"
require_relative "../test_retries/component"
require_relative "../test_retries/null_component"
require_relative "../test_retries/unique_tests_client"
require_relative "../test_visibility/component"
require_relative "../test_visibility/flush"
require_relative "../test_visibility/null_component"
@@ -116,7 +117,9 @@ def activate_ci!(settings)
@test_retries = TestRetries::Component.new(
retry_failed_tests_enabled: settings.ci.retry_failed_tests_enabled,
retry_failed_tests_max_attempts: settings.ci.retry_failed_tests_max_attempts,
retry_failed_tests_total_limit: settings.ci.retry_failed_tests_total_limit
retry_failed_tests_total_limit: settings.ci.retry_failed_tests_total_limit,
retry_new_tests_enabled: settings.ci.retry_new_tests_enabled,
unique_tests_client: build_unique_tests_client(settings, test_visibility_api)
)
# @type ivar @test_optimisation: Datadog::CI::TestOptimisation::Component
@test_optimisation = build_test_optimisation(settings, test_visibility_api)
@@ -236,6 +239,14 @@ def build_library_settings_client(settings, api)
)
end

def build_unique_tests_client(settings, api)
TestRetries::UniqueTestsClient.new(
api: api,
dd_env: settings.env,
config_tags: custom_configuration(settings)
)
end

# fetch custom tags provided by the user in DD_TAGS env var
# with prefix test.configuration.
def custom_configuration(settings)
6 changes: 6 additions & 0 deletions lib/datadog/ci/configuration/settings.rb
@@ -106,6 +106,12 @@ def self.add_settings!(base)
o.default 1000
end

option :retry_new_tests_enabled do |o|
o.type :bool
o.env CI::Ext::Settings::ENV_RETRY_NEW_TESTS_ENABLED
o.default true
end

define_method(:instrument) do |integration_name, options = {}, &block|
return unless enabled

1 change: 1 addition & 0 deletions lib/datadog/ci/ext/settings.rb
@@ -18,6 +18,7 @@ module Settings
ENV_RETRY_FAILED_TESTS_ENABLED = "DD_CIVISIBILITY_FLAKY_RETRY_ENABLED"
ENV_RETRY_FAILED_TESTS_MAX_ATTEMPTS = "DD_CIVISIBILITY_FLAKY_RETRY_COUNT"
ENV_RETRY_FAILED_TESTS_TOTAL_LIMIT = "DD_CIVISIBILITY_TOTAL_FLAKY_RETRY_COUNT"
ENV_RETRY_NEW_TESTS_ENABLED = "DD_CIVISIBILITY_EARLY_FLAKE_DETECTION_ENABLED"

# Source: https://docs.datadoghq.com/getting_started/site/
DD_SITE_ALLOWLIST = %w[
7 changes: 7 additions & 0 deletions lib/datadog/ci/ext/telemetry.rb
@@ -54,6 +54,12 @@ module Telemetry
METRIC_CODE_COVERAGE_IS_EMPTY = "code_coverage.is_empty"
METRIC_CODE_COVERAGE_FILES = "code_coverage.files"

METRIC_EFD_UNIQUE_TESTS_REQUEST = "early_flake_detection.request"
METRIC_EFD_UNIQUE_TESTS_REQUEST_MS = "early_flake_detection.request_ms"
METRIC_EFD_UNIQUE_TESTS_REQUEST_ERRORS = "early_flake_detection.request_errors"
METRIC_EFD_UNIQUE_TESTS_RESPONSE_BYTES = "early_flake_detection.response_bytes"
METRIC_EFD_UNIQUE_TESTS_RESPONSE_TESTS = "early_flake_detection.response_tests"

METRIC_TEST_SESSION = "test_session"

TAG_TEST_FRAMEWORK = "test_framework"
@@ -73,6 +79,7 @@ module Telemetry
TAG_COMMAND = "command"
TAG_COVERAGE_ENABLED = "coverage_enabled"
TAG_ITR_SKIP_ENABLED = "itrskip_enabled"
TAG_EARLY_FLAKE_DETECTION_ENABLED = "early_flake_detection_enabled"
TAG_PROVIDER = "provider"
TAG_AUTO_INJECTED = "auto_injected"

7 changes: 7 additions & 0 deletions lib/datadog/ci/ext/transport.rb
@@ -37,6 +37,10 @@ module Transport
DD_API_SETTINGS_RESPONSE_TESTS_SKIPPING_KEY = "tests_skipping"
DD_API_SETTINGS_RESPONSE_REQUIRE_GIT_KEY = "require_git"
DD_API_SETTINGS_RESPONSE_FLAKY_TEST_RETRIES_KEY = "flaky_test_retries_enabled"
DD_API_SETTINGS_RESPONSE_EARLY_FLAKE_DETECTION_KEY = "early_flake_detection"
DD_API_SETTINGS_RESPONSE_ENABLED_KEY = "enabled"
DD_API_SETTINGS_RESPONSE_SLOW_TEST_RETRIES_KEY = "slow_test_retries"
DD_API_SETTINGS_RESPONSE_FAULTY_SESSION_THRESHOLD_KEY = "faulty_session_threshold"
DD_API_SETTINGS_RESPONSE_DEFAULT = {DD_API_SETTINGS_RESPONSE_ITR_ENABLED_KEY => false}.freeze

DD_API_GIT_SEARCH_COMMITS_PATH = "/api/v2/git/repository/search_commits"
@@ -46,6 +50,9 @@ module Transport
DD_API_SKIPPABLE_TESTS_PATH = "/api/v2/ci/tests/skippable"
DD_API_SKIPPABLE_TESTS_TYPE = "test_params"

DD_API_UNIQUE_TESTS_PATH = "/api/v2/ci/libraries/tests"
DD_API_UNIQUE_TESTS_TYPE = "ci_app_libraries_tests_request"

CONTENT_TYPE_MESSAGEPACK = "application/msgpack"
CONTENT_TYPE_JSON = "application/json"
CONTENT_TYPE_MULTIPART_FORM_DATA = "multipart/form-data"
15 changes: 13 additions & 2 deletions lib/datadog/ci/remote/component.rb
@@ -1,5 +1,7 @@
# frozen_string_literal: true

require_relative "../worker"

module Datadog
module CI
module Remote
@@ -27,8 +29,17 @@ def configure(test_session)
end
end

test_optimisation.configure(library_configuration, test_session)
test_retries.configure(library_configuration)
# configure different components in parallel because they might block on HTTP requests
configuration_workers = [
Worker.new { test_optimisation.configure(library_configuration, test_session) },
Worker.new { test_retries.configure(library_configuration, test_session) }
]

# launch configuration workers
configuration_workers.each(&:perform)

# block until all workers are done (or 60 seconds has passed)
configuration_workers.each(&:wait_until_done)
end

private
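The parallel configuration step in this hunk can be sketched with plain Ruby threads. This is only an illustration of the pattern, not the library's `Worker` class: `run_in_parallel` is a hypothetical helper, the two lambdas stand in for the `configure` calls, and the 60-second limit is taken from the comment in the diff.

```ruby
# Hypothetical stand-in for the Worker-based code above, using only core Ruby threads.
def run_in_parallel(tasks, timeout: 60)
  # start every task first so that blocking calls (for example HTTP requests) overlap
  threads = tasks.map { |task| Thread.new { task.call } }

  # then wait for each thread in turn; Thread#join returns nil if the limit expires
  threads.map { |thread| !thread.join(timeout).nil? }
end

results = []
mutex = Mutex.new

finished = run_in_parallel([
  -> { sleep 0.1; mutex.synchronize { results << :test_optimisation } },
  -> { sleep 0.1; mutex.synchronize { results << :test_retries } }
])
```

Starting all threads before joining any of them is what makes the waits overlap; joining inside the same loop that creates the threads would run the tasks one after another.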
55 changes: 48 additions & 7 deletions lib/datadog/ci/remote/library_settings.rb
@@ -7,6 +7,8 @@
require_relative "../transport/telemetry"
require_relative "../utils/parsing"

require_relative "slow_test_retries"

module Datadog
module CI
module Remote
@@ -49,37 +51,76 @@ def payload
def require_git?
return @require_git if defined?(@require_git)

@require_git = bool(Ext::Transport::DD_API_SETTINGS_RESPONSE_REQUIRE_GIT_KEY)
@require_git = Utils::Parsing.convert_to_bool(
payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_REQUIRE_GIT_KEY, false)
)
end

def itr_enabled?
return @itr_enabled if defined?(@itr_enabled)

@itr_enabled = bool(Ext::Transport::DD_API_SETTINGS_RESPONSE_ITR_ENABLED_KEY)
@itr_enabled = Utils::Parsing.convert_to_bool(
payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_ITR_ENABLED_KEY, false)
)
end

def code_coverage_enabled?
return @code_coverage_enabled if defined?(@code_coverage_enabled)

@code_coverage_enabled = bool(Ext::Transport::DD_API_SETTINGS_RESPONSE_CODE_COVERAGE_KEY)
@code_coverage_enabled = Utils::Parsing.convert_to_bool(
payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_CODE_COVERAGE_KEY, false)
)
end

def tests_skipping_enabled?
return @tests_skipping_enabled if defined?(@tests_skipping_enabled)

@tests_skipping_enabled = bool(Ext::Transport::DD_API_SETTINGS_RESPONSE_TESTS_SKIPPING_KEY)
@tests_skipping_enabled = Utils::Parsing.convert_to_bool(
payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_TESTS_SKIPPING_KEY, false)
)
end

def flaky_test_retries_enabled?
return @flaky_test_retries_enabled if defined?(@flaky_test_retries_enabled)

@flaky_test_retries_enabled = bool(Ext::Transport::DD_API_SETTINGS_RESPONSE_FLAKY_TEST_RETRIES_KEY)
@flaky_test_retries_enabled = Utils::Parsing.convert_to_bool(
payload.fetch(
Ext::Transport::DD_API_SETTINGS_RESPONSE_FLAKY_TEST_RETRIES_KEY, false
)
)
end

def early_flake_detection_enabled?
return @early_flake_detection_enabled if defined?(@early_flake_detection_enabled)

@early_flake_detection_enabled = Utils::Parsing.convert_to_bool(
early_flake_detection_payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_ENABLED_KEY, false)
)
end

def slow_test_retries
return @slow_test_retries if defined?(@slow_test_retries)

@slow_test_retries = SlowTestRetries.new(
early_flake_detection_payload.fetch(Ext::Transport::DD_API_SETTINGS_RESPONSE_SLOW_TEST_RETRIES_KEY, {})
)
end

def faulty_session_threshold
return @faulty_session_threshold if defined?(@faulty_session_threshold)

@faulty_session_threshold = early_flake_detection_payload.fetch(
Ext::Transport::DD_API_SETTINGS_RESPONSE_FAULTY_SESSION_THRESHOLD_KEY, 0
)
end

private

def bool(key)
Utils::Parsing.convert_to_bool(payload.fetch(key, false))
def early_flake_detection_payload
payload.fetch(
Ext::Transport::DD_API_SETTINGS_RESPONSE_EARLY_FLAKE_DETECTION_KEY,
{}
)
end

def default_payload
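The accessors in this file share two patterns: memoization guarded by `defined?`, and reading early flake detection values from a nested hash with defaults. The sketch below models both under simplifying assumptions: `SettingsSketch` and its `lookups` counter are hypothetical, the string keys mirror the constants added in `ext/transport.rb`, and `== true` replaces `Utils::Parsing.convert_to_bool`.

```ruby
# Simplified, hypothetical model of the settings object; not the library's class.
class SettingsSketch
  attr_reader :lookups

  def initialize(payload)
    @payload = payload
    @lookups = 0
  end

  def early_flake_detection_enabled?
    # defined? is truthy once the ivar has been assigned, even when the value is false,
    # so a false result is cached too (`@x ||= ...` would recompute it on every call)
    return @early_flake_detection_enabled if defined?(@early_flake_detection_enabled)

    @lookups += 1
    @early_flake_detection_enabled = early_flake_detection_payload.fetch("enabled", false) == true
  end

  def faulty_session_threshold
    early_flake_detection_payload.fetch("faulty_session_threshold", 0)
  end

  private

  # a missing "early_flake_detection" key falls back to an empty hash,
  # so every nested lookup ends up returning its own default
  def early_flake_detection_payload
    @payload.fetch("early_flake_detection", {})
  end
end

disabled = SettingsSketch.new({"early_flake_detection" => {"enabled" => false}})
2.times { disabled.early_flake_detection_enabled? }
disabled.lookups # the payload was read once, the second call hit the cache

empty = SettingsSketch.new({})
empty.early_flake_detection_enabled? # false
empty.faulty_session_threshold       # 0
```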
3 changes: 2 additions & 1 deletion lib/datadog/ci/remote/library_settings_client.rb
@@ -58,7 +58,8 @@ def fetch(test_session)
1,
{
Ext::Telemetry::TAG_COVERAGE_ENABLED => library_settings.code_coverage_enabled?.to_s,
Ext::Telemetry::TAG_ITR_SKIP_ENABLED => library_settings.tests_skipping_enabled?.to_s
Ext::Telemetry::TAG_ITR_SKIP_ENABLED => library_settings.tests_skipping_enabled?.to_s,
Ext::Telemetry::TAG_EARLY_FLAKE_DETECTION_ENABLED => library_settings.early_flake_detection_enabled?.to_s
}
)

53 changes: 53 additions & 0 deletions lib/datadog/ci/remote/slow_test_retries.rb
@@ -0,0 +1,53 @@
# frozen_string_literal: true

module Datadog
module CI
module Remote
# Parses "slow_test_retries" payload for early flake detection settings
#
# Example payload:
# {
# "5s" => 10,
# "10s" => 5,
# "30s" => 3,
# "5m" => 2
# }
#
# The payload above means that for tests that run less than 5 seconds, we should retry them 10 times,
# for tests that run less than 10 seconds, we should retry them 5 times, and so on.
class SlowTestRetries
attr_reader :entries

Entry = Struct.new(:duration, :max_attempts)

DURATION_MEASURES = {
"s" => 1,
"m" => 60
}.freeze

def initialize(payload)
@entries = parse(payload)
end

def max_attempts_for_duration(duration)
@entries.each do |entry|
return entry.max_attempts if duration < entry.duration
end

0
end

private

def parse(payload)
(payload || {}).keys.filter_map do |key|
duration, measure = key.match(/(\d+)(\w+)/)&.captures
next if duration.nil? || measure.nil? || !DURATION_MEASURES.key?(measure)

Entry.new(duration.to_f * DURATION_MEASURES.fetch(measure, 1), payload[key].to_i)
end.sort_by(&:duration)
end
end
end
end
end
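To see how the thresholds resolve, the class above can be exercised with the example payload from its comment. The class body is copied from the diff without the `Datadog::CI::Remote` nesting so that the snippet runs on its own; the extra `"1h"` key is an assumption added to show how an unknown unit is handled, and Ruby 2.7 or newer is assumed for `filter_map`.

```ruby
# Copied from the diff above, minus the module nesting.
class SlowTestRetries
  attr_reader :entries

  Entry = Struct.new(:duration, :max_attempts)

  DURATION_MEASURES = {"s" => 1, "m" => 60}.freeze

  def initialize(payload)
    @entries = parse(payload)
  end

  def max_attempts_for_duration(duration)
    @entries.each do |entry|
      return entry.max_attempts if duration < entry.duration
    end

    0
  end

  private

  def parse(payload)
    (payload || {}).keys.filter_map do |key|
      duration, measure = key.match(/(\d+)(\w+)/)&.captures
      next if duration.nil? || measure.nil? || !DURATION_MEASURES.key?(measure)

      Entry.new(duration.to_f * DURATION_MEASURES.fetch(measure, 1), payload[key].to_i)
    end.sort_by(&:duration)
  end
end

retries = SlowTestRetries.new({"5s" => 10, "10s" => 5, "30s" => 3, "5m" => 2, "1h" => 1})

retries.entries.map(&:duration)        # => [5.0, 10.0, 30.0, 300.0] ("1h" is dropped)
retries.max_attempts_for_duration(3)   # => 10
retries.max_attempts_for_duration(5)   # => 5 (the comparison is strict, so 5 falls in the next bucket)
retries.max_attempts_for_duration(120) # => 2
retries.max_attempts_for_duration(400) # => 0 (slower than every threshold)
```

Durations are in seconds here, and a `nil` payload yields an empty entry list, so every lookup returns 0.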
2 changes: 1 addition & 1 deletion lib/datadog/ci/test.rb
@@ -148,7 +148,7 @@ def parameters
private

def record_test_result(datadog_status)
test_id = Utils::TestRun.skippable_test_id(name, test_suite_name, parameters)
test_id = Utils::TestRun.datadog_test_id(name, test_suite_name, parameters)

# if this test was already executed in this test suite, mark it as retried
if test_suite&.test_executed?(test_id)
8 changes: 4 additions & 4 deletions lib/datadog/ci/test_optimisation/component.rb
@@ -143,18 +143,18 @@ def stop_coverage(test)
def mark_if_skippable(test)
return if !enabled? || !skipping_tests?

skippable_test_id = Utils::TestRun.skippable_test_id(test.name, test.test_suite_name, test.parameters)
if @skippable_tests.include?(skippable_test_id)
datadog_test_id = Utils::TestRun.datadog_test_id(test.name, test.test_suite_name, test.parameters)
if @skippable_tests.include?(datadog_test_id)
if forked?
Datadog.logger.warn { "Intelligent test runner is not supported for forking test runners yet" }
return
end

test.set_tag(Ext::Test::TAG_ITR_SKIPPED_BY_ITR, "true")

Datadog.logger.debug { "Marked test as skippable: #{skippable_test_id}" }
Datadog.logger.debug { "Marked test as skippable: #{datadog_test_id}" }
else
Datadog.logger.debug { "Test is not skippable: #{skippable_test_id}" }
Datadog.logger.debug { "Test is not skippable: #{datadog_test_id}" }
end
end

2 changes: 1 addition & 1 deletion lib/datadog/ci/test_optimisation/skippable.rb
@@ -36,7 +36,7 @@ def tests
next unless test_data["type"] == Ext::Test::ITR_TEST_SKIPPING_MODE

attrs = test_data["attributes"] || {}
res << Utils::TestRun.skippable_test_id(attrs["name"], attrs["suite"], attrs["parameters"])
res << Utils::TestRun.datadog_test_id(attrs["name"], attrs["suite"], attrs["parameters"])
end

res
44 changes: 40 additions & 4 deletions lib/datadog/ci/test_retries/component.rb
@@ -3,6 +3,9 @@
require_relative "strategy/no_retry"
require_relative "strategy/retry_failed"

require_relative "../ext/telemetry"
require_relative "../utils/telemetry"

module Datadog
module CI
module TestRetries
@@ -11,24 +14,57 @@ module TestRetries
# - retrying new tests - detect flaky tests as early as possible to prevent them from being merged
class Component
attr_reader :retry_failed_tests_enabled, :retry_failed_tests_max_attempts,
:retry_failed_tests_total_limit, :retry_failed_tests_count
:retry_failed_tests_total_limit, :retry_failed_tests_count,
:retry_new_tests_enabled, :retry_new_tests_duration_thresholds, :retry_new_tests_percentage_limit,
:retry_new_tests_unique_tests_set, :retry_new_tests_fault_reason

def initialize(
retry_failed_tests_enabled:,
retry_failed_tests_max_attempts:,
retry_failed_tests_total_limit:
retry_failed_tests_total_limit:,
retry_new_tests_enabled:,
unique_tests_client:
)
@retry_failed_tests_enabled = retry_failed_tests_enabled
@retry_failed_tests_max_attempts = retry_failed_tests_max_attempts
@retry_failed_tests_total_limit = retry_failed_tests_total_limit
# counter that store the current number of failed tests retried
# counter that stores the current number of failed tests retried
@retry_failed_tests_count = 0

@retry_new_tests_enabled = retry_new_tests_enabled
@retry_new_tests_duration_thresholds = nil
@retry_new_tests_percentage_limit = 0
@retry_new_tests_unique_tests_set = Set.new
# indicates that retrying new tests failed and was disabled
@retry_new_tests_fault_reason = nil

@unique_tests_client = unique_tests_client

@mutex = Mutex.new
end

def configure(library_settings)
def configure(library_settings, test_session)
@retry_failed_tests_enabled &&= library_settings.flaky_test_retries_enabled?
@retry_new_tests_enabled &&= library_settings.early_flake_detection_enabled?

return unless @retry_new_tests_enabled

# configure retrying new tests
@retry_new_tests_duration_thresholds = library_settings.slow_test_retries
@retry_new_tests_percentage_limit = library_settings.faulty_session_threshold
@retry_new_tests_unique_tests_set = @unique_tests_client.fetch_unique_tests(test_session)

if @retry_new_tests_unique_tests_set.empty?
@retry_new_tests_enabled = false
@retry_new_tests_fault_reason = "unique tests set is empty"

Datadog.logger.debug("Unique tests set is empty, retrying new tests disabled")
else
Utils::Telemetry.distribution(
Ext::Telemetry::METRIC_EFD_UNIQUE_TESTS_RESPONSE_TESTS,
@retry_new_tests_unique_tests_set.size.to_f
)
end
end

def with_retries(&block)
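The gating logic in `configure` can be modelled in isolation. The sketch below is a trimmed-down approximation, not the component itself: `RetryNewTestsSketch`, `RemoteSettings`, the `fetch_unique_tests` lambda, and the test id string are all hypothetical, and telemetry and logging are left out.

```ruby
require "set"

# Hypothetical stand-in for the remote library settings object.
RemoteSettings = Struct.new(:early_flake_detection_enabled)

# Hypothetical, trimmed-down model of the configure step above.
class RetryNewTestsSketch
  attr_reader :enabled, :fault_reason, :unique_tests

  def initialize(enabled:, fetch_unique_tests:)
    @enabled = enabled # local kill switch (set from an env var in the diff)
    @fetch_unique_tests = fetch_unique_tests
    @unique_tests = Set.new
    @fault_reason = nil
  end

  def configure(remote)
    # stays true only when both the local switch and the backend setting are true
    @enabled &&= remote.early_flake_detection_enabled
    return unless @enabled

    @unique_tests = @fetch_unique_tests.call
    return unless @unique_tests.empty?

    # an empty set disables the feature and records the reason
    @enabled = false
    @fault_reason = "unique tests set is empty"
  end
end

remote_on = RemoteSettings.new(true)

killed = RetryNewTestsSketch.new(enabled: false, fetch_unique_tests: -> { raise "must not be called" })
killed.configure(remote_on)
killed.enabled # false, and the unique tests were never fetched

empty = RetryNewTestsSketch.new(enabled: true, fetch_unique_tests: -> { Set.new })
empty.configure(remote_on)
empty.fault_reason # "unique tests set is empty"

ok = RetryNewTestsSketch.new(enabled: true, fetch_unique_tests: -> { Set["hypothetical test id"] })
ok.configure(remote_on)
ok.enabled # true
```

Because `&&=` short-circuits, a disabled kill switch returns before any request for unique tests is made.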