LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394)
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching

* fix(context_caching/transformation.py): just use last identified cache point

Fixes #6738

* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google

Fixes #6738

* fix(vertex_ai/gemini/): track context caching tokens

* refactor(gemini/): place transformation.py inside `chat/` folder

makes it easy for users to know we support the equivalent endpoint

* fix: fix import

* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder

makes it easier to see the cost calculation logic

* fix: fix linting errors

* fix: fix circular import

* feat(gemini/cost_calculator.py): support gemini context caching cost calculation

generalizes Anthropic's cost calculation function and uses it across Anthropic + Gemini

* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching

Closes #6891

* docs(gemini.md): add gemini context caching architecture diagram

makes it easier for users to understand how context caching works

* docs(gemini.md): link to relevant gemini context caching code

* docs(gemini/context_caching): add README on GitHub, making it easy for devs to know context caching is supported + where to find the code

* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario (see the pricing sketch after this list)

* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation

* test: fix test
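
The 128k cost-calc change above maps to Gemini's published pricing, where prompts longer than 128k tokens are billed at a higher per-token rate. A rough sketch of the idea (an illustration with placeholder parameter names, not the actual code in `llm_cost_calc/utils.py`):

```python
# Illustrative sketch of Gemini's tiered prompt pricing (placeholder names).
# Assumption: once a prompt exceeds the 128k-token threshold, the entire
# prompt is billed at the higher above-128k rate.
THRESHOLD_TOKENS = 128_000


def prompt_cost(prompt_tokens: int,
                input_cost_per_token: float,
                input_cost_per_token_above_128k: float) -> float:
    if prompt_tokens > THRESHOLD_TOKENS:
        return prompt_tokens * input_cost_per_token_above_128k
    return prompt_tokens * input_cost_per_token
```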
krrishdholakia authored Dec 24, 2024
1 parent 442d309 commit c3edfc2
Showing 20 changed files with 692 additions and 420 deletions.
308 changes: 164 additions & 144 deletions docs/my-website/docs/providers/gemini.md
@@ -10,7 +10,8 @@ import TabItem from '@theme/TabItem';
| Provider Route on LiteLLM | `gemini/` |
| Provider Doc | [Google AI Studio ↗](https://ai.google.dev/aistudio) |
| API Endpoint for Provider | https://generativelanguage.googleapis.com |
| Supported Endpoints | `/chat/completions`, `/embeddings` |
| Supported OpenAI Endpoints | `/chat/completions`, `/embeddings`, `/completions` |
| Pass-through Endpoint | [Supported](../pass_through/google_ai_studio.md) |

<br />

@@ -552,24 +553,179 @@ content = response.get('choices', [{}])[0].get('message', {}).get('content')
print(content)
```

## Usage - PDF / Videos / etc. Files

### Inline Data (e.g. audio stream)

LiteLLM follows the OpenAI format and accepts sending inline data as an encoded base64 string.

The format to follow is:

```
data:<mime_type>;base64,<encoded_data>
```

**LiteLLM Call**

```python
import litellm
from pathlib import Path
import base64
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

audio_bytes = Path("speech_vertex.mp3").read_bytes()
encoded_data = base64.b64encode(audio_bytes).decode("utf-8")
print("Audio Bytes = {}".format(audio_bytes))
model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the audio."},
{
"type": "image_url",
"image_url": "data:audio/mp3;base64,{}".format(encoded_data), # 👈 SET MIME_TYPE + DATA
},
],
}
],
)
```

**Equivalent Google API Call**

```python
import pathlib
import google.generativeai as genai

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-flash')

# Create the prompt.
prompt = "Please summarize the audio."

# Load the samplesmall.mp3 file into a Python Blob object containing the audio
# file's bytes and then pass the prompt and the audio to Gemini.
response = model.generate_content([
prompt,
{
"mime_type": "audio/mp3",
"data": pathlib.Path('samplesmall.mp3').read_bytes()
}
])

# Output Gemini's response to the prompt and the inline audio.
print(response.text)
```

### https:// file

```python
import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the file."},
{
"type": "image_url",
"image_url": "https://storage..." # 👈 SET THE IMG URL
},
],
}
],
)
```

### gs:// file

```python
import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the file."},
{
"type": "image_url",
"image_url": "gs://..." # 👈 SET THE cloud storage bucket url
},
],
}
],
)
```


## Chat Models
:::tip

**We support ALL Gemini models. Just use the `gemini/` prefix, i.e. `model=gemini/<any-model-on-gemini>`, when sending litellm requests.**

:::
| Model Name | Function Call | Required OS Variables |
|-----------------------|--------------------------------------------------------|--------------------------------|
| gemini-pro | `completion(model='gemini/gemini-pro', messages)` | `os.environ['GEMINI_API_KEY']` |
| gemini-1.5-pro-latest | `completion(model='gemini/gemini-1.5-pro-latest', messages)` | `os.environ['GEMINI_API_KEY']` |
| gemini-pro-vision | `completion(model='gemini/gemini-pro-vision', messages)` | `os.environ['GEMINI_API_KEY']` |
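
For example, a minimal call using the table above (a sketch; set `GEMINI_API_KEY` to your Google AI Studio key first):

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = ""  # 👈 your Google AI Studio key

response = litellm.completion(
    model="gemini/gemini-1.5-pro-latest",
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```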



## Context Caching

Google AI Studio context caching is supported by adding `cache_control` to your message content block:

```python
{
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": "...",
            "cache_control": {"type": "ephemeral"} # 👈 KEY CHANGE
        }
    ],
},
...
```
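
For illustration, a minimal end-to-end sketch (assumptions: `gemini-1.5-flash-002` as the model, and a placeholder prompt repeated so the cached block exceeds Gemini's minimum cacheable token count):

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = ""

response = litellm.completion(
    model="gemini/gemini-1.5-flash-002",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    # placeholder: repeat a long document so the block is
                    # large enough for Gemini to cache
                    "text": "Here is the full text of a complex legal agreement. " * 4000,
                    "cache_control": {"type": "ephemeral"},  # 👈 KEY CHANGE
                }
            ],
        },
        {"role": "user", "content": "What are the key terms in this agreement?"},
    ],
)
print(response.usage)  # context caching tokens are tracked in the usage block
```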

:::note
### Architecture Diagram

<Image img={require('../../img/gemini_context_caching.png')} />



**Notes:**

- Gemini Context Caching only allows 1 block of continuous messages to be cached ([relevant code](https://github.com/BerriAI/litellm/blob/main/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py#L255)).

- If multiple non-continuous blocks contain `cache_control`, the first continuous block is used and sent to `/cachedContent` in the [Gemini format](https://ai.google.dev/api/caching#cache_create-SHELL). (A selection sketch follows the raw request below.)

- The raw request to Gemini's `/generateContent` endpoint looks like this:
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
@@ -587,7 +743,8 @@ curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5

```
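
For illustration, here is a sketch (an assumption, not the actual LiteLLM code linked above) of how the first contiguous `cache_control` block could be split out from the rest of the messages:

```python
from typing import List, Tuple


def _has_cache_control(msg: dict) -> bool:
    # A message is cache-marked if cache_control is set on the message
    # itself or on any of its content blocks.
    if "cache_control" in msg:
        return True
    content = msg.get("content")
    if isinstance(content, list):
        return any(isinstance(p, dict) and "cache_control" in p for p in content)
    return False


def split_first_cache_block(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (cached_block, remaining) where cached_block is the FIRST
    contiguous run of cache-marked messages; later marked messages are
    treated as normal messages."""
    cached, remaining = [], []
    block_started = block_ended = False
    for msg in messages:
        if _has_cache_control(msg) and not block_ended:
            block_started = True
            cached.append(msg)
        else:
            if block_started:
                block_ended = True
            remaining.append(msg)
    return cached, remaining
```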

:::

### Example Usage

<Tabs>
<TabItem value="sdk" label="SDK">
@@ -720,140 +877,3 @@ response = await client.chat.completions.create(

</TabItem>
</Tabs>

Binary file added docs/my-website/img/gemini_context_caching.png
6 changes: 4 additions & 2 deletions litellm/__init__.py
@@ -1049,9 +1049,11 @@ def add_known_models():
from .llms.deprecated_providers.aleph_alpha import AlephAlphaConfig
from .llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
VertexGeminiConfig,
GoogleAIStudioGeminiConfig,
VertexAIConfig,
GoogleAIStudioGeminiConfig as GeminiConfig,
)
from .llms.gemini.chat.transformation import (
GoogleAIStudioGeminiConfig,
GoogleAIStudioGeminiConfig as GeminiConfig, # aliased to maintain backwards compatibility
)
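
The alias keeps older imports working, e.g. (a sketch, assuming both names are exported from the top-level `litellm` package as shown above):

```python
# Both names refer to the same config class after this change (sketch).
from litellm import GeminiConfig, GoogleAIStudioGeminiConfig

assert GeminiConfig is GoogleAIStudioGeminiConfig
```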


