forked from linkedin/venice
Codecov t2 #20
Open
sushantmane wants to merge 791 commits into setup-codecov from codecov-t2
…edin#949) Add a dedicated thread pool executor for SSL tasks in router. This same feature has been running in production and should be stable. The value of "router.client.ssl.handshake.threads" controls the number of threads that will be used. A value of 0 (default) disables the feature.
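A minimal sketch of how such a config-gated executor might be wired up; the config key comes from the commit above, while the class and method names here are purely illustrative.
```
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SslExecutorConfigSketch {
  // Config key taken from the commit message; default of 0 disables the feature.
  private static final String SSL_HANDSHAKE_THREADS = "router.client.ssl.handshake.threads";

  public static ExecutorService maybeCreateSslExecutor(Properties routerProps) {
    int threads = Integer.parseInt(routerProps.getProperty(SSL_HANDSHAKE_THREADS, "0"));
    if (threads <= 0) {
      return null; // feature disabled: SSL tasks stay on the existing I/O threads
    }
    return Executors.newFixedThreadPool(threads); // dedicated pool for SSL handshake tasks
  }
}
```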
Expose partitionCount from VeniceChangelogConsumer
…d back the key parameter to put (linkedin#953) Add experimental annotation to DaVinciRecordTransformer, and add back the key parameter to put
* [fast-client] Long tail retry budget for fast client 1. Introduced RetryManager to measure and calculate a retry budget based on a percentage of requests. 2. The RetryManager is only applied to long tail retries and is currently hard-coded to 10% of the total original request rate. This means that any exception other than a 429 triggers a retry without going through the RetryManager. If the retry budget is exhausted, the retry task does nothing and the request either completes eventually (via the original future) or times out.
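As a rough illustration of a percentage-based retry budget, here is a simplified sketch using cumulative counts rather than the windowed request rate the RetryManager presumably tracks; the class and field names are hypothetical.
```
import java.util.concurrent.atomic.AtomicLong;

public class RetryBudgetSketch {
  private static final double RETRY_BUDGET_RATIO = 0.10; // 10%, per the description above
  private final AtomicLong originalRequests = new AtomicLong();
  private final AtomicLong retriesGranted = new AtomicLong();

  public void recordOriginalRequest() {
    originalRequests.incrementAndGet();
  }

  /** Returns true if a long-tail retry is allowed under the budget. */
  public boolean tryAcquireRetry() {
    long allowed = (long) (originalRequests.get() * RETRY_BUDGET_RATIO);
    while (true) {
      long used = retriesGranted.get();
      if (used >= allowed) {
        return false; // budget exhausted: skip the retry; the original future completes or times out
      }
      if (retriesGranted.compareAndSet(used, used + 1)) {
        return true;
      }
    }
  }
}
```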
…ue (linkedin#954) VeniceChangeLogConsumerImpl tries to perform a topic switch after receiving EOP. This is a no-op for after-image CDC, as it should always consume from VT. However, it should not drop unprocessed records in the same polled batch that contains the EOP control message. This PR fixes the issue by adding a simple check to see whether the topic switch actually happened. If it did not, the consumer finishes processing all the data records it polled.
…ic when bootstrapping current version (linkedin#957) During the analysis of DaVinci bootstrapping, we realized that metric collection takes a significant amount of CPU (>60%) when bootstrapping a store with many small entries. This PR adds a new config to disable record-level metrics when bootstrapping the current version: server.record.level.metrics.when.bootstrapping.current.version.enabled (default: true). When this config is disabled, record-level metrics are suppressed during bootstrapping, and record-level metric emission resumes after all the subscribed partitions report ready to serve.
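A hypothetical sketch of how such a gate could work; only the config semantics are taken from the description above, and the class and method names are invented for illustration.
```
public class RecordMetricsGateSketch {
  private final boolean recordLevelMetricsWhenBootstrappingEnabled; // maps to the new server config
  private volatile boolean allSubscribedPartitionsReadyToServe = false;

  public RecordMetricsGateSketch(boolean recordLevelMetricsWhenBootstrappingEnabled) {
    this.recordLevelMetricsWhenBootstrappingEnabled = recordLevelMetricsWhenBootstrappingEnabled;
  }

  public void markAllPartitionsReadyToServe() {
    this.allSubscribedPartitionsReadyToServe = true;
  }

  public boolean shouldEmitRecordLevelMetrics() {
    // Emit either when the config keeps metrics on during bootstrap,
    // or once bootstrapping is done (all subscribed partitions are ready to serve).
    return recordLevelMetricsWhenBootstrappingEnabled || allSubscribedPartitionsReadyToServe;
  }
}
```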
* [changelog] Add heartbeat lag and other metrics to change capture client This will report the most lagged partition for a given consumer, as well as the number of records consumed.
…n#959) This commit enables the "KafkaTopicDumper" in Admin tool to dump compressed records. It also fixes "TopicMessageFinder" to use the serialized key without the chunking suffix to determine the partition
Some links in our docs were broken. This PR standardizes to use relative paths, and fixes some broken links. These links work well in README and the website.
…izer (linkedin#962) Currently Routers and Servers log the CN to help with investigation of the callers. However, this information is insufficient and we'd like to log the service principal name instead.
…ut backup version (linkedin#963) Since fetching the current version from all routers and servers is an expensive operation, we should not call backup version cleanup on stores without a current version or a backup version. Also adds a keep-alive period of 1 hour to the HTTP client. --------- Co-authored-by: Sourav Maji <[email protected]>
…ion is empty (linkedin#965) There was a bug in the previous optimization, which pauses record-level metric emission during current-version bootstrapping. When the subscription is empty, we shouldn't resume emission, since an empty subscription can occur when the store ingestion task has been created but the subscription is not done yet.
…e stores (linkedin#939) * PR 775 introduced a feature for hybrid stores' followers to go ready-to-serve only after the leader is marked completed. It was disabled for non-AA stores because the heartbeats this feature piggybacks on are written only to local RTs, while certain non-AA cases' LFSIT consumes only from the parent RT. One fix considered for onboarding these non-AA stores was to write HB to the parent RT as well, but since the parent RT will soon be deprecated and we don't want to add more to it, this PR works around that based on the aspects below and enables the feature for all stores except non-AA and DRP=AGGREGATE stores.
1. All incremental push stores should be Active-Active by default.
2. All non-AA stores without incremental push, except DRP=AGGREGATE, will be reading from the local RT.
* The long-term goal is to remove DRP completely and make all existing hybrid stores either: 1. AA, if cross-region replication is needed; 2. non-AA, if cross-region replication is not needed.
* Also removed LEADER_COMPLETE_STATE_UNKNOWN and firstHeartBeatSOSReceived, which were introduced for the deployment transition phase to start using this header. The config SERVER_LEADER_COMPLETE_STATE_CHECK_IN_FOLLOWER_ENABLED should be used to enable this feature after deployment.
…inkedin#969) The overall status should not be updated to COMPLETE until all the regions are pushed when target region push is enabled. Co-authored-by: Sourav Maji <[email protected]>
… consumer is interrupted (linkedin#967) Currently, if a thread owning a consumer is interrupted, the consumer is not returned to the consumer pool. This occurs because the blocking calls throw an interrupted exception when invoked by a thread with the interrupt status set. In this case, it is the `LBQ::put` call in `TMDF::releaseConsumer`. As a result, we eventually run out of consumers to execute metadata requests. This situation typically arises when non-blocking metrics collection threads cancel active executions to terminate slow queries.
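A minimal sketch of one common way to make the hand-back resilient to interruption: clear the interrupt flag, retry the blocking put, and restore the flag afterwards. The names are illustrative and this is not necessarily how the actual fix was implemented.
```
import java.util.concurrent.BlockingQueue;

public final class ConsumerPoolSketch {
  private ConsumerPoolSketch() {}

  public static <C> void releaseConsumer(BlockingQueue<C> pool, C consumer) {
    boolean interrupted = Thread.interrupted(); // clear the flag so put() below can proceed
    try {
      while (true) {
        try {
          pool.put(consumer); // always return the consumer to the pool
          return;
        } catch (InterruptedException e) {
          interrupted = true; // remember the interrupt and retry the hand-back
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // restore the interrupt status for the caller
      }
    }
  }
}
```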
Recently, we refactored some code around compression metric collection and using the mapper to build the dictionary. This refactoring had a bug where dictionary building was skipped for repush jobs; if VPJ sends control messages directly, the push would fail, as the "START_OF_PUSH" would not have any dictionary. This change fixes the issue and re-enables dictionary building for KIF repush as well.
…own (linkedin#972) In "AbstractDataWriterSparkJob", we need a "SparkSession" and to get one, we use "SparkSession.Builder()...getOrCreate()". This either returns an existing "SparkSession" or creates a new one if none exists. "SparkContext" has a "ShutdownHook" that closes the session at the end of the JVM. When VPJ is executed via "spark-submit", it runs in the Spark driver. In this case, we do not wish to close the "SparkSession" and instead let the "ShutdownHook" take care of it. This is also okay for client mode of Spark execution, as those runs can be expected to be short-lived. If someone needs a way to explicitly close a "SparkSession", we can add a new function for that; we don't have such a need right now.
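For reference, a minimal sketch of the getOrCreate() pattern described above; the app name is an arbitrary placeholder and the session is deliberately not closed here.
```
import org.apache.spark.sql.SparkSession;

public class SparkSessionUsageSketch {
  public static SparkSession getSession() {
    // Reuses the session created by spark-submit if one exists, otherwise creates one.
    // The SparkContext shutdown hook closes it at JVM exit, so we never call close() here.
    return SparkSession.builder()
        .appName("venice-data-writer") // illustrative app name
        .getOrCreate();
  }
}
```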
…linkedin#966) Make the bootstrapping consumer also consume after-image data from the version topic, instead of before/after-image data from the CC topic. Fixed a few small bugs along the way and cleaned up the integration tests. Bugs: seekToCheckpoint needs to be blocked on with get(), otherwise the consumer will jump to non-deterministic offsets and yield wrong results; when persisting after-image values to local storage, we should extract the bytes from the ByteBuffer instead of converting it to an array directly.
…on (linkedin#968) Store migration's end-migration command assumes /store returns null for a non-existing store. However, it actually throws an exception instead, which is not handled properly and causes confusion for the user.
* [router] Added config to enable DNS resolution before SSL. Config to enable this feature: router.resolve.before.ssl. When this config is enabled, "SslInitializer#enableSslTaskExecutor" will not be called, and the SSL handshake thread pool count will be used to construct the DNS resolution thread pool. In addition, added two new SSL-related metrics using the Netty API: pending_ssl_handshake_count and total_failed_ssl_handshake_count.
…edin#964) Truncate the topic after the push is in a terminal state if 1. it's a hybrid store or a regular push (hybrid store target push uses repush, which does not have target regions), or 2. target region push is enabled and the job to push data only to the target region completed (status == PUSHED). --------- Co-authored-by: Sourav Maji <[email protected]>
…inkedin#973) Make separate drainers for batch and hybrid store ingestions so that batch store ingestion does not affect hybrid store ingestion. --------- Co-authored-by: Sourav Maji <[email protected]>
…error (linkedin#952) * [server][da-vinci] Log VersionTopic offset when logging lossy rewind error Without VT offset, it's difficult to find the right offset to dump VT for troubleshooting.
…ata validation error before EOP (linkedin#937) Today, when the ingestion of a new store version fails, we truncate the Kafka topic of the store version by updating its retention time to a small value (15 seconds), specified by DEPRECATED_TOPIC_RETENTION_MS. This is fine for regular failures, but for fatal data validation errors, which indicate critical issues during the batch push period (before EOP), truncating the topic too early can prevent us from finding the root cause, as the Kafka data is gone. This change adjusts the Kafka topic retention time (to 2 days) when a fatal data validation error is identified, so that we have enough time to investigate. Meanwhile, additional logic is added to the TopicCleanupService: even if a topic's retention time is > DEPRECATED_TOPIC_MAX_RETENTION_MS, it can still be considered for deletion:
- If the topic retention is 2 days (DEPRECATED_TOPIC_RETENTION_MS).
- If the topic is a version topic.
- Get the topic creation time (from venice_system_store_push_job_details_store) and check whether it is already more than 2 days old (DEPRECATED_TOPIC_RETENTION_MS); if yes, delete it.
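A simplified sketch of the extra deletion-eligibility check described above; the constant name, the two-day value's binding, and the creation-time lookup are assumptions for illustration.
```
import java.util.concurrent.TimeUnit;

public class FatalDivTopicCleanupSketch {
  private static final long EXTENDED_RETENTION_MS = TimeUnit.DAYS.toMillis(2); // illustrative constant

  public static boolean eligibleForDeletion(
      boolean isVersionTopic, long topicRetentionMs, long topicCreationTimeMs, long nowMs) {
    // Delete only version topics whose retention was set to the extended (2-day) value
    // and that were created more than 2 days ago.
    return isVersionTopic
        && topicRetentionMs == EXTENDED_RETENTION_MS
        && (nowMs - topicCreationTimeMs) > EXTENDED_RETENTION_MS;
  }
}
```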
…re at the same time. (linkedin#918) Current state: when we create an AdminExecutionTask, we do not check whether an AdminExecutionTask for the same store is already in flight. So if one AdminExecutionTask is waiting for a lock (e.g. an update-store operation waiting for the store-level write lock while that lock is unavailable), AdminConsumptionTask can create multiple AdminExecutionTasks for a single store until they occupy all threads in the ExecutorService's thread pool, all waiting for the same store-level lock. Besides, executorService.invokeAll(tasks, processingCycleTimeoutInMs, TimeUnit.MILLISECONDS) will cancel every invoked task, even one that never got a thread from the pool to execute. The ExecutorService will try to cancel an AdminExecutionTask waiting for a lock after the timeout, but that will not terminate the thread (as the lock acquisition via AutoCloseableLock is not interruptible). That AdminExecutionTask then keeps occupying a thread from the ExecutorService's thread pool until the lock is released. This PR makes the following changes to address the issue: add a check for whether an AdminExecutionTask is already running for a store; add an integration test that simulates the lock blocking a thread from the pool and recovering after the lock is acquired. Co-authored-by: Hao Xu <[email protected]>
…default (linkedin#950) * [server][da-vinci] Bumped RocksDB dep and adopt multiget async IO by default. This PR bumps up the RocksDB dependency and exposes a config to enable async IO for multi-get: rocksdb.read.async.io.enabled (default: true). In theory, with this config and a POSIX filesystem, the RocksDB multiget API will be sped up quite a bit based on the benchmarking: https://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html So far, this optimization only applies to the chunk lookup for large value/RMD, and if it proves more performant based on the lookup latency for large values in the read path, we can apply it in more areas: 1. DaVinci with DISK mode. 2. The ingestion code path, by looking up entries in batch for AA/WC use cases. 3. The regular read path. So far, the multi-get API can only be used against the same RocksDB database, so we might need some re-org of RocksDB databases if this API is truly helpful. * Removed unused code * Addressed the spotbugs issue * Addressed comment
…es (linkedin#978) Do not allow target region push for deferred version swap and hybrid stores. Since hybrid stores use repush, which allows concurrent pushes, this PR disables target region push for such cases.
* [changelog] Multiple fixes to changelog consumer logic * Fix test and comments * protected * fix static
…T DIV separation (linkedin#1179) * [compat][da-vinci][test] Global RT DIV improvement (part 1): RT and VT DIV separation
**Problem:** The Data Integrity Validator (DIV) is used for validating and discovering data issues in the Kafka topics as they are consumed. However, without a shared global view of the RealTime (RT) topics among all leaders, DIV states for RT topics are scattered among all past leaders, and without a way to pass RT DIV from a leader to its successors, DIV results can be inaccurate, especially during leadership transitions. At a high level, we propose to replicate the leader DIV states into the local VT periodically or on demand. Followers consume these leader DIV states and sync themselves up to the leader state along the way.
**This PR:** As the first part of the implementation of the global DIV improvement, this PR contains the following:
1. Introduce a new flag for the global RT DIV feature.
2. Introduce a new realtimeTopicProducerStates field in PartitionState in the local RocksDB checkpoint.
```
avro diff:
{
  "name": "realtimeTopicProducerStates",
  "doc": "A map that maps upstream Kafka bootstrap server url -> to a map of producer GUID -> producer state for real-time data.",
  "type": {
    "type": "map",
    "values": {
      "type": "map",
      "values": "com.linkedin.venice.kafka.protocol.state.ProducerPartitionState"
    },
    "java-key-class": "java.lang.String",
    "avro.java.string": "String"
  },
  "default": {}
}
```
3. Divide today's complete DIV into two groups, VT and RT DIV, with RT DIV further divided by its source fabric, so that each group can be updated separately based on which topic the Kafka records come from (without the feature enabled, only VT DIV is updated).
4. Add a test to verify the read/write of an `OffsetRecord` with the newly added field.
**Future changes:** After this PR, as the next step, we will start to implement the following:
1. In the leader replica, create a snapshot of the RT DIV and replicate it to the local VT by using chunking-supported heartbeat messages (DIV heartbeat).
2. In follower replicas, consume the leader's RT DIV snapshot and sync up to the leader state along the way. (This actually needs to be checked in first to handle potential deployment rollbacks.)
3. The lifecycle of the in-memory RT DIV states will be managed by the leader and is shown in the following figure: https://github.com/user-attachments/assets/b06d6a72-c617-42e1-a7ed-15a391b604c9
…topic partitions. (linkedin#1192)" (linkedin#1221) This reverts commit 06cc1fc.
Bumped up the tehuti dep to the latest 0.12.3 which has the fix to a deadlock issue involving lock in MetricRepository and Sensor.
…inkedin#1220) Fix priority determination in TopicCleanupService. Address review comment.
…eeded (linkedin#1204) * [common] Allow store config repo to fetch configs only when needed
Co-authored-by: Sourav Maji <[email protected]>
…ecific
- Enable incremental push status reads from PS3 based on cluster configuration.
- Query the push status store only if it's enabled for the specific user store; otherwise, fall back to Zookeeper for status reads.
- Ensure removal of the removeFromSupposedlyOngoingIncrementalPushVersions status from PS3 when PS3 is enabled, regardless of the read configuration.
Emit logs for push job status whenever push job metrics are emitted
…to Zookeeper or PS3 Currently, incremental push status updates are written to both Zookeeper and PS3. As part of a phased approach to eventually discontinue Zookeeper updates, this PR introduces a configuration option allowing selective control over where these updates are written, enabling flexibility to disable updates to either Zookeeper, PS3, or both.
…ta system store (linkedin#1228) When materializing the meta store, we generate a snapshot of ready to serve instances for all user store versions. However, if a version is in an ERROR state, its Helix resources may be absent. Checking these resources in such cases can trigger exceptions, potentially interrupting the addVersion process. Previously, the meta store required readyToServe instances data for client use, but since no clients currently rely on it, it's safe to discontinue storing this information in the meta system store.
… during blob transfer (linkedin#1218)
…the controller cluster, not on storage clusters (linkedin#1231) Fix bug where setting the cloud config was only done on the controller cluster, not on storage clusters. This was done by adding clusterName to the method signature of setCloudConfig.
…#1232) In the recent past we have seen cases where the end-offset fetch call stalls for a very long time. This can happen for multiple reasons, such as a general Kafka outage, a mismatched VT vs RT partition count leading to a non-existent partition, or other network issues. Even though we cache the offset value, there is a blocking call before the cache is updated. This holds the lock for a very long time, and subsequent calls (either from metrics collection or the ready-to-serve check) wait forever. Since the drainer thread is shared, this blocks the processing of other resources in that drainer thread, leading to cluster-wide impact. This PR checks for a cache miss and returns immediately with a sentinel value while a non-blocking call updates the cache asynchronously.
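A rough sketch of the cache-miss behavior described above, with a sentinel value and an asynchronous refresh; all names are illustrative, and the real code presumably also de-duplicates concurrent refreshes for the same partition.
```
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntToLongFunction;

public class EndOffsetCacheSketch {
  public static final long UNKNOWN_OFFSET = -1L; // sentinel for "not cached yet"

  private final ConcurrentHashMap<Integer, Long> endOffsetCache = new ConcurrentHashMap<>();
  private final ExecutorService refresher = Executors.newSingleThreadExecutor();

  public long getEndOffset(int partition, IntToLongFunction fetchFromBroker) {
    Long cached = endOffsetCache.get(partition);
    if (cached != null) {
      return cached; // fast path: never block the drainer thread
    }
    // Cache miss: kick off a non-blocking refresh and return the sentinel immediately.
    refresher.submit(() -> endOffsetCache.put(partition, fetchFromBroker.applyAsLong(partition)));
    return UNKNOWN_OFFSET;
  }
}
```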
…kedin#1233) Trying to deflake testDaVinciMemoryLimitShouldFailLargeDataPush --------- Co-authored-by: Sourav Maji <[email protected]>
…tion during loadAllPushes() call (linkedin#1217) The issue: we observed a race condition during controller deployment/restart. When the controller restarts, it calls loadAllPushes(), which loads the OfflinePushStatus snapshots for all topics from ZK, then iterates the list and tries to subscribe to changes. If there is any change in between, it will be missed and the push job can hang forever. This PR adds a refresh-offline-pushes call after the listener subscription. It uses the same store-level lock, so it should not cause any race condition or deadlock with the change listener callback.
…inkedin#1229) For a participant to auto-register with Helix without causing a rebalance every time a new participant joins the cluster (i.e. during deployment), we need to set the instance operation to UNKNOWN. Then these participants would be ENABLED in a batch, so it only rebalances once.
…dRepository (linkedin#1235) * [changelog][dvc] Disable TC internal retry in ThinClientMetaStoreBasedRepository 1. The existing TC internal retry implemented in RetriableStoreClient is vulnerable to a deadlock issue because the deserialization thread performs and waits on the retry. Once the thread pool is exhausted, all deserialization threads wait forever because there are no threads left to handle the transport response. We will fix this internal retry in another commit. 2. Disable the TC internal retry in ThinClientMetaStoreBasedRepository because we recently added an external retry with RetryUtils.executeWithMaxAttempt and a timeout. Remove the hardcoded retry configs from NativeMetadataRepository since they are no longer needed and they were leaking implementation details. 3. Added a staleness metric in NativeMetadataRepository that should help us detect any issue causing the local metadata to become stale (including the deadlock issue mentioned above) for both DVC and changelog use cases. * Addressed review comments
…it (linkedin#1238) The SystemStoreInitializationHelper, when running in parent controller mode, would keep creating new versions of system stores, leading to data loss in those system stores. This commit changes the check to make it idempotent. Also bumped up the Kafka dependency to 2.4.1.78. This fixes an issue in NetworkClient::leastLoadedNode which manifested as a flood of IllegalArgumentException when some Kafka partition became completely unavailable. Miscellaneous test changes: - Tactical changes to attempt to mitigate DaVinciClientMemoryLimitTest's flakiness; unfortunately, this does not address the root cause of the flakiness. - Also bumped up the RF to 2 on some flaky integration tests to see if it helps.
…ection (linkedin#1216) * [changelog] Centralize pubsubconsumer assignment to synchronized collection This is to handle potential concurrent modification exceptions when iterating on partitions
…atus (linkedin#1245) Offline push job polling checks a replica's current status to determine whether replicas are ready; if the status is not terminal, it also checks whether it is ready to start buffer rewind. Incremental push also updates the replica's current status, so it can flip the current status from terminal (COMPLETED) to non-terminal (SOIP). If there are concurrent or consecutive incremental pushes, offline push job polling could potentially never complete (especially when there are other factors, such as slow nodes). This PR changes the behavior so that SOIP/EOIP does not affect replica current-status polling. It also stops logging tons of "not ready to start buffer replay" caused by a partition status flipping to a non-terminal status other than EOIP.
linkedin#1214) * [common][controller] Introduce new VeniceView config: RePartitionView 1. Added a new VeniceView implementation, RePartitionView. This view will be used later in the controller, server ingestion and DVC to create, maintain and materialize the re-partition view. 2. Modified the corresponding static methods to ensure they function as expected for both ChangeCaptureView and RePartitionView. 3. Added logic to the update-view methods in the controller to put the view name into the view parameters for RePartitionView, since one store version can have multiple re-partition views.
… with a floor of 1 (linkedin#1242) Currently, the min replication factor (RF) for controllers is equal to the replication factor. During a deployment, the number of instances could drop below the RF, which would cause a rebalance failure when using WAGED. To prevent this, we will set min RF = max(RF - 1, 1). This will be applied to the storage clusters too.
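The floor rule above, expressed as a tiny helper (illustrative only):
```
public class MinReplicationFactorSketch {
  /** Applies min RF = max(RF - 1, 1), used for both controller and storage clusters. */
  public static int minReplicationFactor(int replicationFactor) {
    return Math.max(replicationFactor - 1, 1);
  }
}
```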
Use a smaller retry delay with exponential backoff in the metadata fetch, so that stores with few records do not miss the RTS check. --------- Co-authored-by: Sourav Maji <[email protected]>
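A generic illustration of exponential backoff with a small initial delay; the actual delays and attempt counts used by the metadata fetch are not specified here, so all values and names are placeholders.
```
import java.time.Duration;

public class BackoffSketch {
  /** Doubles the delay per attempt, starting from a small initial delay and capped at maxDelay. */
  public static Duration delayForAttempt(int attempt, Duration initialDelay, Duration maxDelay) {
    long delayMs = initialDelay.toMillis() * (1L << Math.min(attempt, 20)); // cap the shift to avoid overflow
    return delayMs > maxDelay.toMillis() ? maxDelay : Duration.ofMillis(delayMs);
  }
}
```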
…status (linkedin#1219) * [controller] Enable backend to do dual reads on DVC incremental push status Controller will first try to read from version level key for DVC incremental push status; if version level key exists and the overall status is END_OF_INCREMENTAL_PUSH, controller would still check the partition level key and combine the result from both keys, in case the DVC is partially upgraded. During partition level push status analysis, the instances that are reporting version level keys will be ignored. Other changes: Add a few more logs for each job status polling request to help us further confirm that the version level keys are working well, besides the signal of successful push jobs.
…ucer in Server for nearline workload (linkedin#1258) * [compat][server][controller] Introduced options to speed up KafkaProducer in Server for nearline workload. During benchmarking, we noticed that producing is slow and can sometimes take 10+ms for a single medium-size record (20-30KB). We discovered two issues: 1. Kafka producer compression takes a lot of time. Even if we disable KafkaProducer compression, the Kafka broker will compress the data according to the target compression setup in the broker. 2. The Kafka producer is not scalable, and when multiple threads invoke the same producer, there is a lot of contention. These optimizations have overheads: 1. By disabling compression, the Kafka workload will increase; since the optimization is only for nearline workloads, the hope is that the increase is not too large, and there is also a store-level control. 2. More producers in the Venice server means more memory consumption, as each producer has its own buffer, plus more producer threads. Ideally, we would like to enable these optimizations for critical stores with heavy AA/WC nearline workloads to improve the E2E write latency. New server config: server.nearline.workload.producer.throughput.optimization.enabled (default: true). This setting can override all the store-level configs to disable the optimization entirely. Two new store-level configs: NearlineProducerCountPerWriter (default: 1) and NearlineProducerCompressionEnabled (default: true). We can use the admin tool to update these store-level configs. This PR also changes the admin operation protocol, so we need to deploy child controllers first and then the parent controller.
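A sketch of the producer-side compression knob discussed above; "compression.type" is a standard Kafka producer config, while the store-level flag name and the choice of gzip as the enabled codec are assumptions for illustration.
```
import java.util.Properties;

public class NearlineProducerTuningSketch {
  public static Properties producerProps(boolean nearlineProducerCompressionEnabled) {
    Properties props = new Properties();
    // When compression is disabled for nearline workloads, skip producer-side compression
    // to avoid spending write-path latency on it (at the cost of more Kafka traffic).
    props.put("compression.type", nearlineProducerCompressionEnabled ? "gzip" : "none");
    return props;
  }
}
```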