Amazon Nova Micro: The Fastest Text Model on AWS Bedrock

Nova Micro is Amazon's text-only model, built for very low time-to-first-token and priced at $0.035 per 1M input tokens. It is designed for high-volume classification, extraction, and routing pipelines inside AWS infrastructure.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 2, 2026

7 min read

What Nova Micro Is Built For

Nova Micro is not designed to compete with GPT-4o or Claude 3.5 Sonnet on reasoning tasks. It is engineered for one purpose: the highest possible throughput at the lowest cost for simple text tasks that do not require image understanding or complex reasoning.

The target workloads:

Classification — route support tickets, categorize emails, classify intent
Extraction — pull structured fields from unstructured text
Routing — determine which specialized model or queue to send a request to
Simple Q&A — answer questions from a provided context passage
Filtering — determine if content meets criteria before expensive processing

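For these tasks, Nova Micro's low time-to-first-token (TTFT) changes what is architecturally possible: a classification or routing call can run synchronously in a request path without dominating end-to-end latency. Measure TTFT from your own region and network path before budgeting for it, since every Bedrock call includes a network round trip.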

Pricing

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context window |
|---|---|---|---|
| Nova Micro | $0.035 | $0.140 | 128k |
| Nova Lite | $0.060 | $0.240 | 300k |
| Nova Pro | $0.800 | $3.200 | 300k |
| Claude 3 Haiku | $0.250 | $1.250 | 200k |

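Nova Micro is roughly 7x cheaper than Claude 3 Haiku for input tokens ($0.250 vs. $0.035 per 1M), and about 9x cheaper for output tokens ($1.250 vs. $0.140 per 1M). That makes it one of the cheapest general-purpose LLMs available on AWS Bedrock.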
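To make the price gap concrete, here is a minimal cost estimator using the per-token prices from the table above. The workload numbers (10 million calls, 200 input tokens, 5 output tokens) are hypothetical and only there to illustrate the arithmetic; `monthly_cost` is an illustrative helper, not part of any AWS SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "Nova Micro": (0.035, 0.140),
    "Nova Lite": (0.060, 0.240),
    "Nova Pro": (0.800, 3.200),
    "Claude 3 Haiku": (0.250, 1.250),
}


def monthly_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for `requests` calls with the given average token counts."""
    input_price, output_price = PRICES[model]
    total_input = requests * input_tokens
    total_output = requests * output_tokens
    return (total_input * input_price + total_output * output_price) / 1_000_000


# Hypothetical workload: 10M classification calls, 200 tokens in, 5 tokens out.
for name in PRICES:
    print(f"{name:15s} ${monthly_cost(name, 10_000_000, 200, 5):,.2f}")
```

For short-output tasks like classification, the input price dominates the bill, which is why the input-token comparison is the one that matters most here.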

Using Nova Micro via Bedrock SDK

import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def classify_intent(user_message: str) -> str:
    response = bedrock.invoke_model(
        modelId="amazon.nova-micro-v1:0",
        body=json.dumps({
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "text": f"""Classify this customer message into one category.
Categories: BILLING, TECHNICAL, SHIPPING, RETURNS, OTHER

Message: {user_message}

Respond with only the category name."""
                        }
                    ]
                }
            ],
            "inferenceConfig": {
                "maxTokens": 10,
                "temperature": 0.0,
            }
        }),
    )
    result = json.loads(response["body"].read())
    return result["output"]["message"]["content"][0]["text"].strip()

# Example usage
intent = classify_intent("I was charged twice for my order last week")
print(intent)  # BILLING
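Even at temperature 0.0, a small model can return a label with different casing, trailing punctuation, or a short preamble. A hedged sketch of a post-processing step is shown below; `normalize_category` is a hypothetical helper (not in the original code) that maps whatever `classify_intent` returns onto the five allowed categories and falls back to OTHER.

```python
import re

# The category set used in the classification prompt.
ALLOWED = {"BILLING", "TECHNICAL", "SHIPPING", "RETURNS", "OTHER"}


def normalize_category(raw: str, default: str = "OTHER") -> str:
    """Map raw model text onto one of the allowed labels, or return `default`."""
    cleaned = raw.strip().upper()
    if cleaned in ALLOWED:
        return cleaned
    # Otherwise scan alphabetic tokens and take the first one that is a label,
    # which handles outputs such as "Billing." or "Category: SHIPPING".
    for token in re.findall(r"[A-Z]+", cleaned):
        if token in ALLOWED:
            return token
    return default


print(normalize_category(" billing.\n"))
print(normalize_category("Category: SHIPPING"))
print(normalize_category("I am not sure"))
```

Downstream code can then branch on a closed set of values instead of trusting free-form model text.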

Cascaded Routing Pattern with Nova Pro

A cost-effective pattern on Bedrock is cascaded routing: use Nova Micro to classify complexity, answer simple requests with Nova Micro itself, and route hard requests to Nova Pro.

def intelligent_route(user_query: str) -> str:
    # Step 1: Nova Micro classifies complexity (~$0.000035 per 1,000 input tokens)
    complexity_check = bedrock.invoke_model(
        modelId="amazon.nova-micro-v1:0",
        body=json.dumps({
            "messages": [{"role": "user", "content": [{"text": f"""Is this query simple or complex?
Simple: factual questions, short extraction, yes/no answers
Complex: multi-step reasoning, analysis, synthesis, long generation

Query: {user_query}
Answer with one word: SIMPLE or COMPLEX"""}]}],
            "inferenceConfig": {"maxTokens": 5, "temperature": 0.0},
        }),
    )
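    complexity = json.loads(complexity_check["body"].read())["output"]["message"]["content"][0]["text"].strip().upper()

    # Step 2: Route accordingly. Anything that is not clearly SIMPLE goes to
    # Nova Pro, so unexpected classifier output fails toward the stronger model.
    if complexity.startswith("SIMPLE"):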
        model_id = "amazon.nova-micro-v1:0"  # $0.035/1M
    else:
        model_id = "amazon.nova-pro-v1:0"    # $0.800/1M

    # Step 3: Generate actual response
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "messages": [{"role": "user", "content": [{"text": user_query}]}],
            "inferenceConfig": {"maxTokens": 1024},
        }),
    )
    return json.loads(response["body"].read())["output"]["message"]["content"][0]["text"]

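In practice, 60–80% of support queries are "simple" by this definition. For those queries, a pipeline that routes correctly pays Nova Micro rates instead of Nova Pro rates, which is roughly 23x cheaper per token ($0.800 vs. $0.035 per 1M input tokens). The overall saving is smaller, because complex queries still go to Nova Pro and every query pays for the classification call.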
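The effect of routing on the total bill can be sketched with a little arithmetic. The function below is a simplified model under stated assumptions: it counts input tokens only, and it assumes the classification call costs about as much as sending the query to Nova Micro once (the `classifier_overhead` factor, a hypothetical parameter, scales that). Prices come from the Pricing section.

```python
MICRO_INPUT = 0.035  # USD per 1M input tokens, Nova Micro
PRO_INPUT = 0.800    # USD per 1M input tokens, Nova Pro


def blended_input_price(simple_share: float, classifier_overhead: float = 1.0) -> float:
    """Effective USD per 1M query tokens when a share of traffic is routed to Nova Micro.

    Every query pays for one Nova Micro classification pass; simple queries are
    then answered by Nova Micro and the rest by Nova Pro.
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    classification = MICRO_INPUT * classifier_overhead
    generation = simple_share * MICRO_INPUT + (1.0 - simple_share) * PRO_INPUT
    return classification + generation


for share in (0.6, 0.7, 0.8):
    price = blended_input_price(share)
    print(f"simple share {share:.0%}: ${price:.4f} per 1M tokens, "
          f"{PRO_INPUT / price:.2f}x cheaper than sending everything to Nova Pro")
```

Under these assumptions, a 70% simple share brings the effective input price to about $0.30 per 1M tokens, so the blended saving is closer to 2.7x than 23x. The per-query saving on simple traffic is large, but the complex remainder sets the floor.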

Batch Inference on Bedrock

For non-latency-sensitive workloads (nightly report processing, weekly data enrichment), Bedrock Batch Inference applies additional discounts (up to 50% off) and removes the need to handle rate limits:

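# Batch jobs use the "bedrock" control-plane client, not "bedrock-runtime".
bedrock_client = boto3.client("bedrock", region_name="us-east-1")

job = bedrock_client.create_model_invocation_job(
    modelId="amazon.nova-micro-v1:0",
    jobName="nightly-classification-job",
    # roleArn is required. This ARN is a placeholder: substitute an IAM role
    # that Bedrock can assume with read/write access to the buckets below.
    roleArn="arn:aws:iam::123456789012:role/MyBedrockBatchRole",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}},
)
print(job["jobArn"])  # Use this ARN to poll status with get_model_invocation_job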
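The job reads its requests from JSONL files in the input S3 location, one record per line, where each record carries a `recordId` and a `modelInput` holding the same request body you would pass to `invoke_model`. The sketch below builds those lines for the classification prompt used earlier. `build_batch_records` and `PROMPT_TEMPLATE` are hypothetical names for illustration, and uploading the file to S3 is left out.

```python
import json

# Mirrors the classification prompt from the SDK example.
PROMPT_TEMPLATE = (
    "Classify this customer message into one category.\n"
    "Categories: BILLING, TECHNICAL, SHIPPING, RETURNS, OTHER\n\n"
    "Message: {message}\n\n"
    "Respond with only the category name."
)


def build_batch_records(messages: list[str], prompt_template: str = PROMPT_TEMPLATE) -> list[str]:
    """Return one JSONL line per message in the Bedrock batch input layout."""
    lines = []
    for index, message in enumerate(messages):
        record = {
            # An 11-character ID used to match outputs back to inputs.
            "recordId": f"REC{index:08d}",
            "modelInput": {
                "messages": [
                    {"role": "user", "content": [{"text": prompt_template.format(message=message)}]}
                ],
                "inferenceConfig": {"maxTokens": 10, "temperature": 0.0},
            },
        }
        lines.append(json.dumps(record))
    return lines


records = build_batch_records(["I was charged twice for my order last week"])
print(records[0])
```

Write the lines to a `.jsonl` file, upload it under the input prefix, and then start the job. Bedrock enforces minimum and maximum record counts per batch job, so check the current quotas for your account before relying on small test files.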
