What Nova Micro Is Built For
Nova Micro is not designed to compete with GPT-4o or Claude 3.5 Sonnet on reasoning tasks. It is engineered for one purpose: the highest possible throughput at the lowest cost for simple text tasks that do not require image understanding or complex reasoning.
The target workloads:
- Classification — route support tickets, categorize emails, classify intent
- Extraction — pull structured fields from unstructured text
- Routing — determine which specialized model or queue to send a request to
- Simple Q&A — answer questions from a provided context passage
- Filtering — determine if content meets criteria before expensive processing
For these tasks, Nova Micro's low time-to-first-token (TTFT) changes what is architecturally possible: you can run it synchronously in a request path, for example as a classifier or filter in front of a slower model, without adding much user-visible latency.
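As an illustration of the extraction workload listed above, here is a minimal sketch that builds a request body and parses the reply defensively. The helper names `build_extraction_request` and `parse_extraction` and the prompt wording are hypothetical, chosen for this example. The body mirrors the `messages` / `inferenceConfig` shape used in the SDK example later in this article, and it would be passed to `invoke_model` via `json.dumps`.

```python
import json


def build_extraction_request(text: str, fields: list[str]) -> dict:
    """Build a request body asking the model to return the given fields as JSON."""
    prompt = (
        "Extract the following fields from the text and respond with a single "
        "JSON object and nothing else. Use null for any field that is missing.\n"
        f"Fields: {', '.join(fields)}\n"
        f"Text: {text}"
    )
    return {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 200, "temperature": 0.0},
    }


def parse_extraction(raw_output: str, fields: list[str]) -> dict:
    """Parse the model's reply, keeping only the requested fields.

    Every field comes back as None if no JSON object can be parsed.
    """
    # Small models sometimes wrap the JSON in extra words, so cut from the
    # first "{" to the last "}" before parsing.
    start, end = raw_output.find("{"), raw_output.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(raw_output[start:end + 1])
        except json.JSONDecodeError:
            data = {}
    return {field: data.get(field) for field in fields}
```

Treating an unparseable reply as "all fields missing" keeps the calling code simple; a stricter pipeline might retry the request or escalate to a larger model instead.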
Pricing
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) |
|---|---|---|---|
| Nova Micro | $0.035 | $0.140 | 128k |
| Nova Lite | $0.060 | $0.240 | 300k |
| Nova Pro | $0.800 | $3.200 | 300k |
| Claude 3 Haiku | $0.250 | $1.250 | 200k |
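Nova Micro is roughly 7x cheaper than Claude 3 Haiku for input tokens ($0.035 vs. $0.250 per 1M) and roughly 9x cheaper for output tokens ($0.140 vs. $1.250), which makes it the cheapest model in this comparison and, at the time of writing, the cheapest general-purpose LLM on AWS Bedrock.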
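To make the price gap concrete, the following sketch turns the table into a per-workload cost estimate. The `estimate_cost` helper and the example workload (1M classification calls at about 200 input and 5 output tokens each) are assumptions for illustration. The prices are the list prices from the table, not a quote.

```python
# List prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Nova Micro": (0.035, 0.140),
    "Nova Lite": (0.060, 0.240),
    "Nova Pro": (0.800, 3.200),
    "Claude 3 Haiku": (0.250, 1.250),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 1M classification calls, ~200 input and ~5 output tokens each.
calls = 1_000_000
for model in PRICES:
    total = calls * estimate_cost(model, input_tokens=200, output_tokens=5)
    print(f"{model}: ${total:.2f}")
```

Under these assumptions the workload costs about $7.70 on Nova Micro and about $56.25 on Claude 3 Haiku.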
Using Nova Micro via Bedrock SDK
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def classify_intent(user_message: str) -> str:
    """Classify a customer message into one of five fixed categories."""
    response = bedrock.invoke_model(
        modelId="amazon.nova-micro-v1:0",
        body=json.dumps({
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "text": f"""Classify this customer message into one category.
Categories: BILLING, TECHNICAL, SHIPPING, RETURNS, OTHER
Message: {user_message}
Respond with only the category name."""
                        }
                    ]
                }
            ],
            "inferenceConfig": {
                "maxTokens": 10,      # a single category name needs only a few tokens
                "temperature": 0.0,   # keep the label as repeatable as possible
            }
        }),
    )
    # The response body is a stream; read it once and parse the JSON payload.
    result = json.loads(response["body"].read())
    return result["output"]["message"]["content"][0]["text"].strip()

# Example usage
intent = classify_intent("I was charged twice for my order last week")
print(intent)  # e.g. BILLING
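Even with `temperature` at 0.0, a small model can return the label with different casing, trailing punctuation, or a short preamble. A minimal sketch of a guard that maps the raw reply onto the known categories follows. The `normalize_category` helper is hypothetical and not part of any SDK.

```python
import re

ALLOWED_CATEGORIES = ("BILLING", "TECHNICAL", "SHIPPING", "RETURNS", "OTHER")

# Match any allowed category as a whole word, ignoring case.
_CATEGORY_PATTERN = re.compile(
    r"\b(" + "|".join(ALLOWED_CATEGORIES) + r")\b", re.IGNORECASE
)


def normalize_category(raw_output: str, default: str = "OTHER") -> str:
    """Return the first allowed category found in the reply, else the default."""
    match = _CATEGORY_PATTERN.search(raw_output)
    return match.group(1).upper() if match else default
```

Downstream code can then branch on `normalize_category(classify_intent(message))` without handling unexpected strings. Taking the first match is a simplification: a reply that names several categories resolves to whichever appears first.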
Cascaded Routing Pattern with Nova Pro
A cost-effective pattern on Bedrock is cascaded routing: use Nova Micro to classify each request's complexity, answer the simple ones with Nova Micro itself, and route the hard ones to Nova Pro.
def intelligent_route(user_query: str) -> str:
    # Step 1: Nova Micro classifies complexity (~$0.000035 per 1,000 input tokens)
    complexity_check = bedrock.invoke_model(
        modelId="amazon.nova-micro-v1:0",
        body=json.dumps({
            "messages": [{"role": "user", "content": [{"text": f"""Is this query simple or complex?
Simple: factual questions, short extraction, yes/no answers
Complex: multi-step reasoning, analysis, synthesis, long generation
Query: {user_query}
Answer with one word: SIMPLE or COMPLEX"""}]}],
            "inferenceConfig": {"maxTokens": 5, "temperature": 0.0},
        }),
    )
    complexity = json.loads(complexity_check["body"].read())["output"]["message"]["content"][0]["text"].strip()

    # Step 2: Route accordingly. Match case-insensitively on the prefix so replies
    # such as "Simple" or "SIMPLE." still count; anything else goes to Nova Pro.
    if complexity.upper().startswith("SIMPLE"):
        model_id = "amazon.nova-micro-v1:0"  # $0.035/1M input tokens
    else:
        model_id = "amazon.nova-pro-v1:0"  # $0.800/1M input tokens

    # Step 3: Generate actual response
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "messages": [{"role": "user", "content": [{"text": user_query}]}],
            "inferenceConfig": {"maxTokens": 1024},
        }),
    )
    return json.loads(response["body"].read())["output"]["message"]["content"][0]["text"]
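In practice, 60–80% of support queries are "simple" by this definition. For those queries, answering with Nova Micro instead of Nova Pro cuts the per-token price by roughly 23x ($0.035 vs. $0.800 per 1M input tokens), before the small cost of the classification call.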
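The per-query saving does not translate directly into the saving for the whole pipeline, because complex queries still go to Nova Pro and every query pays for the classification pass. The sketch below works through that arithmetic for input tokens only. It assumes the classification prompt is about as long as the query itself and ignores output tokens, and the `blended_input_price` helper is hypothetical.

```python
MICRO_INPUT_PRICE = 0.035  # USD per 1M input tokens
PRO_INPUT_PRICE = 0.800    # USD per 1M input tokens


def blended_input_price(simple_fraction: float) -> float:
    """Effective USD per 1M query tokens for the cascaded pipeline."""
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    # Every query is read once by Nova Micro for the complexity check.
    classification = MICRO_INPUT_PRICE
    # The answer is then generated by Micro (simple) or Pro (complex).
    generation = (
        simple_fraction * MICRO_INPUT_PRICE
        + (1.0 - simple_fraction) * PRO_INPUT_PRICE
    )
    return classification + generation


for fraction in (0.6, 0.7, 0.8):
    blended = blended_input_price(fraction)
    saving = 1.0 - blended / PRO_INPUT_PRICE
    print(f"{fraction:.0%} simple: ${blended:.4f} per 1M input tokens, "
          f"{saving:.0%} below sending everything to Nova Pro")
```

Under these assumptions the pipeline as a whole comes out roughly 53%, 63%, and 72% cheaper than an all-Pro setup at a 60%, 70%, and 80% simple share. That is a large saving, but well short of the per-query ratio, because the complex share dominates the bill.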
Batch Inference on Bedrock
For non-latency-sensitive workloads (nightly report processing, weekly data enrichment), Bedrock Batch Inference applies additional discounts (up to 50% off) and removes the need to handle rate limits:
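bedrock_client = boto3.client("bedrock", region_name="us-east-1")

# Input is one or more JSONL files in S3, one request per line.
job = bedrock_client.create_model_invocation_job(
    modelId="amazon.nova-micro-v1:0",
    jobName="nightly-classification-job",
    # roleArn is required; this ARN is a placeholder for an IAM role that
    # Bedrock can assume to read the input and write the output locations.
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}},
)
print(job["jobArn"])  # pass this ARN to get_model_invocation_job to poll status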
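The batch job reads JSONL from the S3 input location, with one record per line. A minimal sketch of preparing that file follows. It assumes each line uses the `recordId` / `modelInput` layout, with `modelInput` carrying the same body that `invoke_model` takes. The `build_batch_records` helper, the `REC` prefix, and the 11-character ID length are choices made for this example, so check them against the current Bedrock documentation, along with the quotas on record counts and file sizes.

```python
import json


def build_batch_records(messages: list[str]) -> str:
    """Return JSONL text with one batch-inference record per input message."""
    lines = []
    for index, message in enumerate(messages, start=1):
        record = {
            # 11-character alphanumeric ID used to match outputs back to inputs.
            "recordId": f"REC{index:08d}",
            # Same request body shape as a synchronous invoke_model call.
            "modelInput": {
                "messages": [{"role": "user", "content": [{"text": message}]}],
                "inferenceConfig": {"maxTokens": 10, "temperature": 0.0},
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"
```

The returned string can be uploaded under the job's input prefix, for example with the S3 client's `put_object`, before the job is created.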