-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-7485: Add metrics for time taken during mutation plan creation and execution #2061
base: master
Are you sure you want to change the base?
Conversation
@virajjasani @tkhurana @jpisaac can you please review the PR. Thanks |
@sanjeet006py Using nanoseconds resolution for measuring calls which themselves take less than 1ms adds considerable measurement overhead. AFAIK each call to System.nanosecond itself takes ~25- 30ns so for 2 calls you are adding double overhead. Now if the operation itself takes < 1 ms let us say 600ns you are adding a 10% measurement overhead. https://www.javaadvent.com/2019/12/measuring-time-from-java-to-kernel-and-back.html |
Thanks, it was a great article. I got two key takeaways from the article:
I was of understanding that by measuring mutation plan creation and execution only, I am already minimizing amount of calls being made to track time. I have seen time taken by mutation plan creation and execution is of the order of 5-15 micro seconds (thus, using nano seconds clock). Should we just track time taken by a single |
Have you looked at the time taken by the execution of mutationPlan for upsert select on a table which has large number of rows and client side deletes which first require rows to be scanned on the server to determine which rows to be deleted ? I am pretty sure their execution times will be much larger. |
I haven't benchmarked this use case explicitly but I agree that execution of mutation plan will be time taking in this due to needing to scan first to prepare row mutations. I benchmarked upsert value and that is pretty fast for a single call (5-15 micro seconds) thus, decided to measure elapsed time in nano seconds. |
@@ -487,7 +495,7 @@ public void testMetricsForUpsert() throws Exception { | |||
String t = entry.getKey(); | |||
assertEquals("Table names didn't match!", tableName, t); | |||
Map<MetricType, Long> p = entry.getValue(); | |||
assertEquals("There should have been sixteen metrics", 16, p.size()); | |||
assertEquals("There should have been sixteen metrics", 22, p.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either remove the text, or update the text
assertEquals("There should have been sixteen metrics", 22, p.size()); | |
assertEquals("There should have been 22 metrics", 22, p.size()); |
// Hence mutation metrics are not expected during connection close | ||
loggedConn.close(); | ||
assertTrue("Mutation write metrics are not logged for " + tableName2, | ||
mutationWriteMetricsMap.size() == 0); | ||
mutationWriteMetricsMap.size() > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text for assertTrue
, and the comment above, should be updated to reflect the new reality.
@@ -1660,4 +1660,12 @@ public static long getCurrentScn(ReadOnlyProps props) { | |||
String scn = props.get(CURRENT_SCN_ATTRIB); | |||
return scn != null ? Long.parseLong(scn) : HConstants.LATEST_TIMESTAMP; | |||
} | |||
|
|||
public static long convertTimeInNsToMs(long value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these duplicate EnvironmentEdge#convertNsToMs
- can we share?
@@ -156,9 +156,9 @@ public void testPhoenixMetricsLoggedOnClose() throws Exception { | |||
// Autocommit is turned off by default | |||
// Hence mutation metrics are not expected during connection close |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line 157 comment is still here though (are mutation metrics now expected during connection close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Yes, now when auto-commit
is off then we do expect mutation metrics to be there on connection close() as we are now recording metrics for executeUpdate()
call also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this could be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the scenario and the impact. The test makes me think that before, without a call to commit
or setAutoCommit(true)
, then an executeUpdate
is insufficient to make any writes, and therefore you have no mutation metrics. But now, even without a commit
, we have some write metrics. But what are those metrics? Is it just the time taken for the mutation planning? Should the test confirm that those are the only metrics present?
Otherwise no, it does not seem to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the scenario and the impact.
The test seems to be asserting the metrics are being populated on calling Connection.close()
only when they should be populated and is testing two cases: for reads and for writes. Before my change, no metrics were expected for writes on calling Connection.close()
when commit()
has not been called (explicitly/implicitly). But now we do expect metrics to be populated for writes even if commit()
is not called and these metrics are for time taken in mutation plan creation and execution. So, time taken by mutation planning (plan creation + execution) is also being populated in MutationMetrics and will be available even if commit()
is not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the test confirm that those are the only metrics present?
Sure, I can add this assertion also. Thanks
Summary of changes:
EnvironmentEdgeManager
andEnvironmentEdge
.executeMutation
call in nano seconds. There are already metrics available to track time taken by upserts and deletes inexecuteMutation
call but they track at milli seconds granularity. If we change those to track at nano seconds granularity then this will be a breaking non-backward compatible change.