Google Gemini
Google's multimodal AI model family, available through Vertex AI and the Gemini API. Gemini models are natively multimodal, processing text, images, audio, and video in a unified architecture. Gemini 1.5 Pro has a 2M token context window.
Implements
Concepts this tool claims to implement:
- Large Language Model primary
The family includes Gemini 1.5 Pro, Gemini 1.5 Flash, and Gemini 1.0 Ultra; Google reports benchmark performance competitive with GPT-4.
- Multimodal Model primary
Native multimodal architecture—text, image, audio, video inputs processed by the same model, not separate encoders.
- Context Window primary
Gemini 1.5 Pro supports up to 2 million tokens, among the largest context windows commercially available, enough to hold an entire codebase or book in a single prompt.
- Function Calling secondary
The Gemini API supports function calling: developers supply tool definitions (function declarations with typed parameters), and the model can request one or more function calls, including several in parallel.
- Grounding secondary
Responses can be grounded in Google Search to connect them to current web information; Vertex AI additionally supports grounding against custom data.
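A minimal sketch of the function-calling flow described above. The weather tool, its name, and its parameters are hypothetical examples, not real Gemini built-ins; the JSON shape follows the OpenAPI-style function-declaration schema the Gemini API documents for tool definitions.

```python
# Hypothetical tool definition in the Gemini API's function-calling shape.
# "get_current_weather" and its parameters are illustrative assumptions.
get_weather_tool = {
    "function_declarations": [
        {
            "name": "get_current_weather",  # hypothetical tool name
            "description": "Return current weather for a city.",
            "parameters": {  # OpenAPI-style parameter schema
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        }
    ]
}

def build_request(prompt: str, tools: dict) -> dict:
    """Assemble a generateContent-style request body with tools attached."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [tools],
    }

request = build_request("What's the weather in Zurich?", get_weather_tool)
```

When the model decides a tool is needed, it returns a structured function call (name plus arguments) instead of free text; the caller executes the function and sends the result back in a follow-up turn.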
Integration Surfaces
Details
- Vendor: Google
- License: proprietary
- Runs On: cloud
- Used By: human, agent, system
Links
Notes
Gemini's 2M-token context window is a differentiator for processing large documents. The native multimodal architecture (rather than bolting separate vision encoders onto a text model) enables more natural cross-modal reasoning. Available on Google Cloud via Vertex AI and through the standalone Gemini API.
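The native multimodal design means text and image parts ride in the same request rather than going through a separate vision pipeline. A sketch of such a mixed request body, assuming the snake_case field names used by the Gemini Python SDK (`inline_data`, `mime_type`); treat the shape as illustrative, not an authoritative schema.

```python
import base64

def build_multimodal_request(text: str, image_bytes: bytes) -> dict:
    """Sketch of a generateContent-style request mixing text and image parts.
    Field names assume the Python SDK's snake_case convention; the image is
    sent inline, base64-encoded, alongside the text in one user turn."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": text},
                {"inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ],
        }]
    }

# Placeholder bytes stand in for a real PNG.
req = build_multimodal_request("Describe this chart.", b"\x89PNG placeholder")
```

Because both parts arrive in one `contents` turn, the same model attends across them directly, which is what enables the cross-modal reasoning noted above.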