-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Vertex AI instrumentation #3069
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad to see this! I have some initial comments which are basically an effort to conserve time.
If the first PR has basically exactly the same tests, readme and examples that are in openai, we have less to talk about which means less latency in getting this merged.
Afterwards, we can build on it to additionally add vision model etc. Meanwhile, please keep tests and fixtures exactly the same except for obvious s/openai/vertexai and library invocation differences. Then, if you can self-review highlighting parts unique to vertex for the next person.
Hope this guidance helps!
...pentelemetry-instrumentation-vertexai-v2/tests/cassettes/test_vertexai_generate_content.yaml
Outdated
Show resolved
Hide resolved
@pytest.fixture(autouse=True) | ||
def vertexai_init(vcr: VCR) -> None: | ||
# Unfortunately I couldn't find a nice way to globally reset the global_config for each | ||
# test because different vertex submodules reference the global instance directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind opening a github issue on this in the google project, so we can link it and click there to see the details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear this isn't a bug or anything. It's just difficult to monkey patch. This more of a note to self during development, I'd prefer to remove this comment as it's a bit distracting
|
||
|
||
@pytest.mark.vcr | ||
def test_vertexai_generate_content(exporter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe start with plain chat tests from openai (just convert them) I think vision/multi-modal could be better discussed in aggregate across the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update tomorrow
instrumentation-genai/opentelemetry-instrumentation-vertexai-v2/example/main.py
Show resolved
Hide resolved
|
||
vertexai.init(location="us-central1") | ||
|
||
model = GenerativeModel("gemini-1.5-flash-002") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar can you use an ENV? even better to basically copy the openai example word for word then adjust the library calls etc. This reduces the amount of attention needed to land your change vs having differences in setup and also library https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation-genai/opentelemetry-instrumentation-openai-v2/examples/zero-code/main.py#L9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually copied this directly from the Vertex quickstart. I can try adapt it to be the same as the OpenAI sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @codefromthecrypt. Before you do a full review (I'll mark the PR as not draft), I know this PR is big with a lot of boilerplate and copied code from openllmetry package. I could split it into multiple PRs so we don't need to bikeshed on code from openllmetry and can rather just review the changes. Wdyt?
All the tests were disabled in openllmetry so I'd like to get test coverage up before doing any serious refactoring.
|
||
vertexai.init(location="us-central1") | ||
|
||
model = GenerativeModel("gemini-1.5-flash-002") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually copied this directly from the Vertex quickstart. I can try adapt it to be the same as the OpenAI sample
instrumentation-genai/opentelemetry-instrumentation-vertexai-v2/example/main.py
Show resolved
Hide resolved
...pentelemetry-instrumentation-vertexai-v2/tests/cassettes/test_vertexai_generate_content.yaml
Outdated
Show resolved
Hide resolved
@pytest.fixture(autouse=True) | ||
def vertexai_init(vcr: VCR) -> None: | ||
# Unfortunately I couldn't find a nice way to globally reset the global_config for each | ||
# test because different vertex submodules reference the global instance directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear this isn't a bug or anything. It's just difficult to monkey patch. This more of a note to self during development, I'd prefer to remove this comment as it's a bit distracting
|
||
|
||
@pytest.mark.vcr | ||
def test_vertexai_generate_content(exporter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update tomorrow
@aabmass since the code style, conventions, semver, approach to handling events etc are very different in openllmetry vs how openai is instrumented, still suggest to start with how openai is done here and adapt vertexai vs the other way. That's just my opinion considering how many review comments may result from doing otherwise. You choose! |
@lmolkova @lzchen @nirga @karthikscale3 @alizenhom I can see we have different coding pattern and file layout for instrumentation from different repos
Shall we reach to an agreement for the coding pattern and file layout, filing naming style etc. in this early stage? Thanks! |
@gyliu513 I actually think everything in this draft PR is matching the format and file layout of this repo. Is there something in particular that's different? |
Taking a step back, can we agree on a step by step process of migrating a package into this repo from openllmetry? As a reviewer it is easier to review PRs that are focused. I think it might be easier to have folks first send a PR copying the code in without any significant changes. Then they can send smaller focused PRs (a PR to get CI running with tox, then a PR migrating to events, etc.) to bring them up to repo style. There are 26 instrumentations in openllmetry to migrate, and it seems daunting to review a full rewrite for each one. I can write a proposal for this on an issue if folks would be interested. |
@aabmass I totally agree with you, based on the comments from yesterday's discussion, I think @nirga is going to share some migration plan soon. The first instrumentation for OpenAI is from https://github.com/Scale3-Labs/langtrace-python-sdk/tree/main/src/langtrace_python_sdk/instrumentation/openai , and it has different code style compared with existing traceloop/openllmetry instrumentation code, that's my concern. |
Ah thanks I didn't know this came from langtrace.
any specific differences you would like to call out? IMO it's OK for the structure to be different |
It really depends on the library to trace I guess, haven't looked at vertex ai but for openai it was splitted like this:
So I would do more or less the same for others too. Maybe we can add a first step adding only the instrumentor class with the tox and workflows to remove some boilerplate for later. |
@aabmass please take a look at the Having same code pattern, coding format, style etc. can help future contribution as well. |
@gyliu513 I see what you mean. I want to call out that there are already 51 instrumentation packages in this repo, with lots of different patterns and idioms. The techniques used for GenAI instrumentation are not really that different and we still don't have a consensus on how to instrument. So my preference here is more practical–I think we should lean on existing implementations (langtrace, openllmetry, etc.) and not make updating to the |
TL'DR' Ultimately, if the maintainers of this project are happy with intentional anchoring towards external repos, that's ok with me. Just it is more work to maintain change that crossed multiple projects and there will be a lot of that. So, just be conscious and meanwhile glad to see folks joining and I'll help regardless. FWIW, my main concern is at least the tests are the same as how we do them in openai-v2, as for example, this helps with things openllmetry doesn't yet do, like content capturing into logs. The openai-v2 code had a lot of diverse feedback, folks from openllemetry, developing alternatives, as well folks not using openai but experts at this repo. I believe it is helpful to look at that work in openai-v2, but even having many more people involved, it is indeed not a pure consensus (no code is with an unbounded group). I strongly believe in house rules. This is not the openllmetry project, this is opentelemetry contrib. We should try our best to follow what the maintainers here have approved, and not intentionally deviate in style or structure. This makes maintenance so much easier especially staring down many potential refactors in the future. If I've not convinced, that's fine, but thanks for reading! |
Another thing I will mention is that the authors of "openai-v2" were subject to a higher bar in review. They weren't told to skip steps in the process of review for convenience of making it similar to their original source. While we aren't talking about fairness, I think knowing that we are following a different model now is important. Fast and fix later vs get in in a shape we want. cc @lmolkova as if this is the case, we should probably appoint zookeepers to go back and fix the new packages once merged. I'm ok doing that. |
With many vendors involved and the first LLM providers PRs not authored by openllmetry, rather by different folks, with openllmetry also contributing, this begs some Q&A:
I started a google doc that isn't yet and may never be sanctioned by this group. I want to emphasize that there are several active, code contributing vendors who also have many of the same instrumentation. The process of merging openai-v2 was the first time multiple folks got together and agreed on a style. That is a moving target as more features are added. The main point is, let's please not assume there is one source project that code comes from, as so far that hasn't been the case. https://docs.google.com/document/d/16KLUHUUXKyNmpE8Vt9VFcJTiCcYd6AckL222b_DujAU/edit?tab=t.0 |
instrumentation-genai/opentelemetry-instrumentation-vertexai-v2/example/main.py
Show resolved
Hide resolved
|
||
|
||
def should_send_prompts(): | ||
# Previously was opt-in by the following check for privacy reasons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aabmass FYI, @lzchen is building some common utiles to help this https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3081/files#diff-cebc2e35b4637202a2226845d87b5093dff75151a31c30000201327be64e6cf0
# https://github.com/traceloop/openllmetry/blob/v0.33.12/packages/opentelemetry-instrumentation-vertexai/opentelemetry/instrumentation/vertexai/__init__.py#L141-L143 | ||
# I guess prompt may be in kwargs instead or in addition? | ||
prompt = kwargs.get("prompt") | ||
if prompt: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we handle prompt
twice here? Another one is in Line 165
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment, was wondering the same thing lol hoping to get clarity from TraceLoop folks who originally wrote this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nirga ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aabmass @gyliu513 it's to support the original language_models
API:
https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/latest/vertexai.language_models.TextGenerationModel#vertexai_language_models_TextGenerationModel_predict
@aabmass @codefromthecrypt @gyliu513 @xrmx my 2c for the discussion around structure above - |
"gen_ai.system": "vertex_ai", | ||
"gen_ai.operation.name": "text_completion", | ||
"gen_ai.request.model": "gemini-pro-vision", | ||
"gen_ai.response.model": "gemini-pro-vision", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it always the case that vertex request and response model are the same? If so, we should probably clarify semconv and only populate the request model in this case. WDYT?
instrumentation-genai/opentelemetry-instrumentation-vertexai-v2/example/.env
Show resolved
Hide resolved
instrumentation-genai/opentelemetry-instrumentation-vertexai-v2/example/.env
Show resolved
Hide resolved
span, event_logger, llm_model, complete_response, token_usage | ||
) | ||
|
||
span.set_status(Status(StatusCode.OK)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't set span status in instrumentations to OK
- only to ERROR
if there is an error. In general case we don't know what constitutes a success.
): | ||
complete_response = "" | ||
token_usage = None | ||
async for item in response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to know if response terminates due to an error? we should try to catch it and record an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out. Usually if code can be structured to use a with span: ...
block, it also records error on the span. I can dig into it
span, event_logger, llm_model, complete_response, token_usage | ||
) | ||
|
||
span.set_status(Status(StatusCode.OK)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span.set_status(Status(StatusCode.OK)) |
response.usage_metadata, | ||
) | ||
|
||
span.set_status(Status(StatusCode.OK)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span.set_status(Status(StatusCode.OK)) |
return await wrapped(*args, **kwargs) | ||
|
||
llm_model = "unknown" | ||
if hasattr(instance, "_model_id"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like a slippery slope to rely on internal APIs, is there some way we can avoid it?
**kwargs, | ||
): | ||
tracer = get_tracer( | ||
__name__, __version__, tracer_provider=tracer_provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add schema_url - that's important for consumers to know which version is being emitted.
event_logger = get_event_logger( | ||
__name__, | ||
version=__version__, | ||
event_logger_provider=event_logger_provider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add schema_url here too
def user_event( | ||
*, | ||
gen_ai_system: str, | ||
# TODO: should I just leave role out since it's not required if "user" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I don't see why it'd be necessary in the event body given it's already in the name.
}, | ||
body={ | ||
"role": role, | ||
"content": content, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we record content only if it's enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we skip emitting the event altogether if disabled, or emit event without content?
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
__version__ = "2.1b0.dev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not "2.0b0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how all of the packages are versioned not on the release commit, I guess similar to SNAPSHOT
in java. See e.g.
Line 15 in 54cbf59
__version__ = "2.1b0.dev" |
From my perspective what's important (in descending order):
All of this things are fixable while libs are not stable. It's definitely easier to maintain any of these libraries if they follow the same patterns and layout, but it not should be a hard requirement. I think we can move faster if reviewers would point to specific discrepancies if they see them and maybe we can create a template with all the boilerplate someone would copy when they contribute new instrumentation. Talking about copy-paste:
I'm not aware of any repos in OTel that prohibit legal copy-paste and I don't see why this repo should. |
build-backend = "hatchling.build" | ||
|
||
[project] | ||
name = "opentelemetry-instrumentation-vertexai-v2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this package will have feature parity with https://github.com/traceloop/openllmetry/blob/main/packages/opentelemetry-instrumentation-vertexai/
I think we should publish it as opentelemetry-instrumentation-vertexai
@nirga what would be your thoughts on this?
In particular, OTel can start publishing to v1 (or v2) train and OpenLLMetry can still continue v0 train if necessary.
On if we should copy paste, I wrote this
To answer that, @lmolkova clarified we don't mind copy/pasting and pointed a document describing what to do about it For example this sort of NOTICE entry or similar:
Since I gather at this point, the genai team are happy with copy/pasting, next step is to attribute copy/pasted parts back to openllmetry either file by file, a NOTICE in the package root or something else. A soft agreement in the PR comments won't be visible to those using this package, so I think a file change is needed. @lzchen @xrmx since this is ultimately a python repository and you are maintainers, what should be done here? I didn't see any NOTICE files, so it isn't quite clear in this specific repo the way out? |
Thanks everyone for your feedback. I'm breaking this out into smaller PRs incorporating the comments here. For now #3123 has the boilerplate and the examples/conftest.py are left as similar as possible to openai-v2. PTAL. To wrap up the discussions here
Any concerns, lets please discuss on an issue. I'll leave this PR open until I get it all sent out in smaller PRs, but planning to close it after that. |
Good plan @aabmass thanks for the effort put into it and the summary |
Description
Part of #3041
Adds vertexai instrumentation with updates to match the latest semantic conventions.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Added some new Cassette tests with a lot of sanitization logic which seems to work.
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.