test(elasticity): Added a test of 90% utilization with a lot of small tables #9785
base: master
Conversation
Keeping this PR as a draft since I'm not sure it's worth merging.
Could you elaborate more on why you need to make so many changes to decorators?
It's an ad hoc variation of a latency decorator. It's not a must either and can be removed.
If we need to make only small to no changes to the decorator, I think this is worth merging.
If we remove this new decorator, we still get all the needed latency results via Grafana.
I think for this test case it will be enough (even though I was pushing "let's use the decorator" before); it is a weird test case that we are not yet prepared for, and we cannot call it a performance test regardless, due to methodology changes. I'd add a configuration file and document the shortcomings of this test case in it, and then we can merge safely.
Force-pushed from ffb43a1 to 58d3702
Any reason why it is still in draft?
Force-pushed from 58d3702 to 245e5d2
Ready for review now.
Force-pushed from 7fa68dc to 2f70af1
Force-pushed from e601f9a to 958a02c
@pehala, all checks passed and comments addressed.
The test in Argus is failing: the timeout is reached, and from the graphs I see more than 90% disk utilization. Is this a Scylla issue? If yes, please add a reference to it.
Can you clarify what is tested here: the possibility of creating 500 tables? Or that each table has 1 tablet (since it's small)?
This was a debug run; this branch wasn't originally meant to be merged. I started another test to see that it runs as expected.
@pehala, the test basically accomplished the task of running a 500-table workload with 90% utilization.
They for sure need to be resolved.
… tables: Test splitting the 90% utilization among a lot of small tables.
Force-pushed from 6025504 to a5c93b9
Force-pushed from a5c93b9 to 348ad37
Force-pushed from 348ad37 to 06e0b4e
@vponomaryov, you recently optimized the 5000-tables-in-one-keyspace case that uses the machinery touched here. Can you please review it?
Your referenced test job has 8 batches with 62 stress commands, but its config has batch_size: 125.
So, you have some bug in this scope.
longevity_test.py (outdated)
user_profile_table_count = self.params.get('user_profile_table_count')  # pylint: disable=invalid-name
# Creating extra table during batch only for a high number of tables (>500).
create_extra_tables = True if int(user_profile_table_count) > 500 else False
This approach looks strange.
Why exactly 500? What if later we have 400 and 800 cases?
So, we need to either make it configurable or describe the reasoning for it here.
Also, this particular condition can be simplified to the following:
create_extra_tables = int(user_profile_table_count) > 500
Well, I don't really know the idea behind adding an extra table, so I didn't want to impact the existing code flow.
@fruch, can you please advise why an extra table is needed? Or perhaps it's not needed anymore and can be removed in a follow-up fix? Would adding a parameter for it be reasonable, or is it overkill?
In the original 5000-table case, we created all 5000 tables upfront.
We wanted a few to be created at run time, via c-s, as well.
In each cycle we added 4 of them; I don't know how it became just 1.
And yes, if you want to take it out for some tests, it should be configurable.
No one's going to remember what this logic you did here means in two weeks.
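For illustration only, a minimal sketch of how such a configuration-driven switch could look, assuming a dict-like SCT params object; the option name 'add_cs_user_profiles_extra_tables' is a hypothetical placeholder, not necessarily the option introduced in this PR:

# Hedged sketch: gate the extra-table creation on an explicit config option
# instead of the hard-coded ">500" heuristic.
def should_create_extra_tables(params) -> bool:
    """Decide whether extra tables are created during each batch.

    `params` is assumed to behave like a dict-style config (supports .get()).
    """
    explicit = params.get('add_cs_user_profiles_extra_tables')
    if explicit is not None:
        return bool(explicit)
    # Fall back to the previous behaviour when the option is not set.
    return int(params.get('user_profile_table_count') or 0) > 500

# Example usage with plain dicts standing in for the SCT configuration:
assert should_create_extra_tables({'add_cs_user_profiles_extra_tables': True}) is True
assert should_create_extra_tables({'user_profile_table_count': 500}) is False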
OK, @fruch, fixed by:
@vponomaryov, I don't see any bug according to the log:
Force-pushed from 06e0b4e to 47da586
Argus link from the PR description: https://argus.scylladb.com/tests/scylla-cluster-tests/a22f9c23-6289-497d-8202-db2b4ba85d2f
Part of the log:
Force-pushed from 47da586 to 7449ba7
@vponomaryov, the loader memory doesn't care how many tables there are; it cares about the c-s threads' memory consumption. So there are 1000 stresses in 8 cycles of 125 threads each, i.e. 2 c-s stresses per table.
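For reference, a small sketch of the batch arithmetic described in this thread (the constants mirror the numbers above; tables_per_batch is my interpretation of where the reviewer's "62" comes from):

import math

TABLE_COUNT = 500        # user-profile tables in the test
STRESSES_PER_TABLE = 2   # one c-s write and one c-s read command per table
BATCH_SIZE = 125         # stress commands (threads) started per batch cycle

total_stresses = TABLE_COUNT * STRESSES_PER_TABLE       # 1000
batch_count = math.ceil(total_stresses / BATCH_SIZE)    # 8 cycles
tables_per_batch = BATCH_SIZE // STRESSES_PER_TABLE     # 62

print(total_stresses, batch_count, tables_per_batch)    # 1000 8 62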
Force-pushed from f5a2b5e to d613b1a
Force-pushed from d613b1a to e377881
…% utilization scenario: The test_user_batch_custom_time test is fixed and improved in order to run both the 5000-tables scenario and a scenario like 500 tables for testing 90% utilization with many small tables.
Force-pushed from e377881 to 459b90b
@yarongilor new branch
@scylladb/qa-maintainers, all comments are resolved, please review. cc: @pehala
This is a test case for having 90% disk utilization with a lot of small tables.
The data is split equally among 500 tables.
The dataset size is aligned with 'i4i.xlarge'.
It uses a c-s user-profile template for all 500 tables.
It runs 4 batches of 125 tables each.
On each batch cycle, 125 tables are created, then a load is generated for all of these tables.
When all the 125 stress writes/reads are done, it continues with the next batch until the stress for all 500 tables is completed (after 4 cycles).
Each one of the 500 tables has both write and read load.
Closes #9309
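Below is a minimal, hedged sketch of the batch flow described in this PR; create_tables() and run_write_read_stress() are placeholder names for illustration, not real SCT helpers:

# Hedged sketch of the described test flow: 4 cycles, 125 tables per cycle.
TOTAL_TABLES = 500
TABLES_PER_BATCH = 125

def create_tables(names):
    # Placeholder: in the real test the tables come from a c-s user profile.
    print(f"creating {len(names)} tables: {names[0]} .. {names[-1]}")

def run_write_read_stress(names):
    # Placeholder: each table gets both a write and a read c-s command,
    # and the cycle waits for all of them to finish before continuing.
    print(f"running {2 * len(names)} stress commands")

all_tables = [f"table_{i}" for i in range(1, TOTAL_TABLES + 1)]
for start in range(0, TOTAL_TABLES, TABLES_PER_BATCH):   # 4 batch cycles
    batch = all_tables[start:start + TABLES_PER_BATCH]
    create_tables(batch)
    run_write_read_stress(batch)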