How to Use Amazon Nova Now with Invoke API

Sebastian Petrus
5 min read · Dec 4, 2024


Amazon recently unveiled its latest generation of foundation models, Amazon Nova, at the AWS re:Invent conference. These models are designed to provide advanced capabilities in artificial intelligence, particularly in multimodal tasks that involve processing and generating text, images, and videos. With a focus on performance and cost-effectiveness, Amazon Nova aims to set new standards in the AI landscape, competing directly with leading models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.

Hey, if you are working with APIs, Apidog is here to make your life easier. It’s an all-in-one API development tool that streamlines the entire process — from design and documentation to testing and debugging.

Key Features of Amazon Nova

Amazon Nova comprises several models tailored for different tasks and performance needs:

  • Amazon Nova Micro: A text-only model optimized for low latency and cost, delivering rapid responses.
  • Amazon Nova Lite: A multimodal model that processes text, images, and videos quickly at a low cost.
  • Amazon Nova Pro: Offers a balanced combination of accuracy, speed, and cost for a variety of tasks.
  • Amazon Nova Premier: The most advanced model for complex reasoning tasks, set to be available in early 2025.
  • Amazon Nova Canvas: A model focused on generating high-quality images from textual prompts.
  • Amazon Nova Reel: Designed for creating high-quality videos based on text and image inputs.

These models are integrated into Amazon Bedrock, a fully managed service that allows users to access various foundation models via a single API.
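A quick way to see which Nova models are available to your account is the Bedrock control-plane API. Here is a minimal sketch; it assumes the models are offered in your chosen Region:

import boto3

# Note: model listing uses the "bedrock" control-plane client,
# while invocation uses "bedrock-runtime".
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.list_foundation_models(byProvider="Amazon")
for summary in response["modelSummaries"]:
    # Print only the Nova family entries.
    if "nova" in summary["modelId"]:
        print(summary["modelId"], "-", summary["modelName"])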

Amazon Nova’s Performance Benchmarks

Amazon conducted extensive testing of the Nova models against industry-standard benchmarks. Notably:

  • Amazon Nova Micro outperformed Meta’s Llama 3.1 8B and Google’s Gemini 1.5 Flash-8B across multiple benchmarks, achieving an output speed of 210 tokens per second.
  • Amazon Nova Lite demonstrated exceptional speed and efficiency in processing multimodal inputs.
  • Amazon Nova Pro excelled in accuracy across several benchmarks compared to GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet v2, achieving superior results in instruction-following tasks and multimodal workflows.

The models are also multilingual, supporting over 200 languages, and can handle extensive context lengths: up to 300K tokens for the Lite and Pro models.

Cost Efficiency

One of the standout features of Amazon Nova is its cost-effectiveness. The models are priced significantly lower than competing offerings while maintaining high performance levels. For instance, they are reported to be at least 75% less expensive than other leading models in their respective categories.

Customization and Fine-Tuning

The Amazon Nova models support custom fine-tuning, allowing users to adapt the models to their own datasets for improved accuracy. This feature is particularly beneficial for organizations looking to leverage proprietary data effectively. Additionally, distillation enables knowledge transfer from larger teacher models to smaller ones, enhancing efficiency while retaining high accuracy.
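As a rough sketch of what starting a fine-tuning job looks like through the Bedrock control-plane API: the create_model_customization_job call below is a real Boto3 operation, but every ARN, bucket path, and hyperparameter value is a placeholder, and the hyperparameter names accepted vary by model, so treat this as an outline rather than a recipe.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Every ARN, bucket, and hyperparameter value here is a placeholder.
response = bedrock.create_model_customization_job(
    jobName="nova-lite-finetune-demo",
    customModelName="nova-lite-custom",
    customizationType="FINE_TUNING",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",  # placeholder IAM role
    baseModelIdentifier="amazon.nova-lite-v1:0",                   # base model to customize
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},    # placeholder training data
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},
)
print("Job started:", response["jobArn"])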

Running Amazon Nova via API

To use Amazon Nova through an API, developers can integrate it with the AWS SDK for Python (Boto3).

The Invoke API lets you interact with the Amazon Nova models (Micro, Lite, and Pro) in a consistent manner, supporting multi-turn conversations and both buffered and streamed responses. Below is a step-by-step guide along with a code snippet that illustrates how to use it effectively.
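A multi-turn conversation, for example, is expressed by resending the full message history with alternating roles. A minimal sketch with a hypothetical exchange:

# Hypothetical conversation history: alternate "user" and "assistant" roles
# and resend the whole list with each new request.
message_list = [
    {"role": "user", "content": [{"text": "Write a haiku about autumn."}]},
    {"role": "assistant", "content": [{"text": "Crisp leaves drift and fall..."}]},
    {"role": "user", "content": [{"text": "Now translate it into French."}]},
]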

Step-by-Step Guide

  1. Set Up Your Environment: Ensure you have the AWS SDK for Python (Boto3) installed and configured with your AWS credentials.
  2. Create a Bedrock Runtime Client: This client will allow you to interact with the Nova models.
  3. Define Your System Prompt and User Messages: Create prompts that guide the model’s behavior.
  4. Configure Inference Parameters: Set parameters such as token limits, temperature, and other settings that affect the model’s output.
  5. Invoke the Model: Use invoke_model for a buffered response, or invoke_model_with_response_stream to receive tokens as they are generated.
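For step 1, a quick sanity check confirms that Boto3 can resolve credentials and a default Region before you call Bedrock (a minimal sketch):

import boto3

# Verify that credentials and a default Region are resolvable.
session = boto3.Session()
print("Credentials found:", session.get_credentials() is not None)
print("Default region:", session.region_name)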

Example Code

Here’s an example of how to use the Invoke API with Amazon Nova Lite, streaming the response:

import boto3
import json
from datetime import datetime

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"

# Define your system prompt(s).
system_list = [
    {
        "text": "Act as a creative writing assistant. When the user provides you with a topic, write a short story about that topic."
    }
]

# Define one or more messages using the "user" and "assistant" roles.
message_list = [{"role": "user", "content": [{"text": "A camping trip"}]}]

# Configure the inference parameters.
inf_params = {
    "max_new_tokens": 500,
    "top_p": 0.9,
    "top_k": 20,
    "temperature": 0.7,
}

request_body = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

start_time = datetime.now()

# Invoke the model with a streamed response.
response = client.invoke_model_with_response_stream(
    modelId=LITE_MODEL_ID,
    body=json.dumps(request_body),
)

request_id = response.get("ResponseMetadata").get("RequestId")
print(f"Request ID: {request_id}")
print("Awaiting first token...")

chunk_count = 0
time_to_first_token = None

# Process the response stream chunk by chunk.
stream = response.get("body")
if stream:
    for event in stream:
        chunk = event.get("chunk")
        if chunk:
            # Each chunk is a JSON document; text arrives as contentBlockDelta events.
            chunk_json = json.loads(chunk.get("bytes").decode())
            content_block_delta = chunk_json.get("contentBlockDelta")
            if content_block_delta:
                if time_to_first_token is None:
                    time_to_first_token = datetime.now() - start_time
                    print(f"Time to first token: {time_to_first_token}")
                chunk_count += 1
                print(content_block_delta.get("delta").get("text"), end="")
    print(f"\nTotal chunks: {chunk_count}")
else:
    print("No response stream received.")

Key Features of This Example

  • Buffered and Streamed Responses: The Invoke API supports both; this example demonstrates streaming, and a buffered counterpart using invoke_model is sketched after this list.
  • System Prompts: The system prompt directs the model’s behavior.
  • Inference Parameters: Adjusting max_new_tokens, top_p, top_k, and temperature lets you shape the responses to your requirements.
  • Response Handling: The code measures time to first token and prints each text delta as it arrives, providing an interactive experience.
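For completeness, here is the buffered counterpart using invoke_model, which returns the whole response in one payload instead of a stream. The response parsing below assumes the Nova messages-v1 output shape (output -> message -> content); double-check it against the current Bedrock documentation:

import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

request_body = {
    "schemaVersion": "messages-v1",
    "messages": [{"role": "user", "content": [{"text": "A camping trip"}]}],
    "system": [{"text": "Act as a creative writing assistant."}],
    "inferenceConfig": {"max_new_tokens": 500, "temperature": 0.7},
}

# invoke_model returns the complete response at once (no streaming).
response = client.invoke_model(
    modelId="us.amazon.nova-lite-v1:0",
    body=json.dumps(request_body),
)

model_response = json.loads(response["body"].read())
print(model_response["output"]["message"]["content"][0]["text"])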

If you are seeking an all-in-one AI platform that manages all your AI subscriptions in one place, Anakin AI supports:

  • Virtually any LLM, such as Claude 3.5 Sonnet, Google Gemini, GPT-4o and OpenAI o1, Qwen models, and other open-source models.
  • Uncensored models such as Dolphin Mistral and Llama variants.
  • The best AI image generation models, such as FLUX, Stable Diffusion 3.5, and Recraft.
  • AI video generation models such as MiniMax, Runway Gen-3, and Luma AI.
