What Makes Weaviate Different
Many vector databases expect pre-computed vectors: you embed content yourself and hand the vector to the database. Weaviate goes further with vectorizer modules that embed content automatically at ingest time (and embed the query text at search time). Configure a vectorizer on a collection once, and Weaviate generates a vector for every object you insert. This is especially useful for multimodal apps, where a module such as multi2vec-clip can embed both text and images into the same collection.
Schema and Data Classes
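Weaviate organizes data into collections (called "classes" in older client versions and in the GraphQL API). Each collection defines its properties and, optionally, a vectorizer: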
import weaviate
from weaviate.classes.config import Configure, DataType, Property

# Connects to a local instance (defaults: HTTP on 8080, gRPC on 50051).
client = weaviate.connect_to_local()

client.collections.create(
    name="Article",
    # Embed objects with OpenAI at insert time.
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)
With text2vec_openai configured, inserting a document triggers automatic embedding via the OpenAI embeddings API.
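For orientation, the collection above corresponds roughly to the following class definition in Weaviate's REST schema format. This is a sketch under assumptions: the dictionary mirrors the `class`, `vectorizer`, and `properties` fields of the schema API as commonly documented, and `article_class_definition` is a hypothetical helper name, not part of the client library.

```python
import json


def article_class_definition() -> dict:
    """Return a schema definition equivalent to the Article collection (sketch)."""
    return {
        "class": "Article",
        # Module name as it appears in the table of vectorizer modules.
        "vectorizer": "text2vec-openai",
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "body", "dataType": ["text"]},
        ],
    }


# Pretty-print the definition to inspect it.
print(json.dumps(article_class_definition(), indent=2))
```

Seeing the raw definition can help when reading schema dumps or comparing what the Python client created against what the server reports.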
Built-In Vectorizer Modules
| Module | Input | Notes |
|---|---|---|
| text2vec-openai | Text | Calls OpenAI embeddings API |
| text2vec-cohere | Text | Cohere embed API |
| text2vec-transformers | Text | Local HuggingFace model |
| multi2vec-clip | Text + Images | CLIP embeddings, multimodal |
| img2vec-neural | Images | ResNet-50, self-hosted |
Batch Import
articles = client.collections.get("Article")

# The context manager flushes any remaining objects on exit.
with articles.batch.dynamic() as batch:
    for item in my_data:
        batch.add_object(properties={"title": item["title"], "body": item["body"]})
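The dynamic batcher sizes and sends batches automatically. Objects that fail to import do not raise an exception; they are collected in articles.batch.failed_objects, so check that list after the import. If a third-party embedding API throttles you, articles.batch.rate_limit(requests_per_minute=...) can be used in place of dynamic().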
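The import loop assumes `my_data` is an iterable of dictionaries with `title` and `body` keys. A minimal sketch of preparing such records before import follows; the sample data and the helper name `to_article_properties` are hypothetical.

```python
from typing import Iterable, Iterator


def to_article_properties(items: Iterable[dict]) -> Iterator[dict]:
    """Yield only the properties the Article collection defines.

    Records without a non-empty title and body are skipped rather than imported.
    """
    for item in items:
        title = str(item.get("title") or "").strip()
        body = str(item.get("body") or "").strip()
        if title and body:
            yield {"title": title, "body": body}


# Hypothetical sample input; real data would come from files, a database, etc.
my_data = [
    {"title": "Vector search basics", "body": "An intro to embeddings.", "id": 1},
    {"title": "  ", "body": "Blank title, so this record is skipped."},
    {"title": "Hybrid search", "body": "Combining BM25 with vectors."},
]

cleaned = list(to_article_properties(my_data))
print(len(cleaned))  # 2
```

Each yielded dictionary can be passed to `batch.add_object` inside the batch loop.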
Hybrid Search (BM25 + Vector)
results = articles.query.hybrid(
    query="LLM memory management techniques",
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector
    limit=5,
)
for r in results.objects:
    print(r.properties["title"])
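alpha=0.5 weights keyword (BM25) and vector results equally. Lower it when exact terms such as product names or error codes matter; raise it when queries tend to be phrased differently from the documents.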
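To build intuition for `alpha`, here is a simplified sketch of score blending: each result set is min-max normalized, and the two scores are combined as `alpha * vector + (1 - alpha) * bm25`. It illustrates the idea only; Weaviate's actual fusion algorithms may differ in detail, and the scores and function names below are made up.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores to [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend(bm25: dict[str, float], vector: dict[str, float],
          alpha: float) -> list[tuple[str, float]]:
    """Rank documents by alpha * vector + (1 - alpha) * bm25 on normalized scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    b, v = normalize(bm25), normalize(vector)
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1.0 - alpha) * b.get(doc, 0.0)
        for doc in set(b) | set(v)
    }
    # Highest score first; ties broken by document id for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores: "b" wins on keywords, "a" wins on vectors, "c" is decent on both.
bm25_scores = {"a": 1.0, "b": 5.0, "c": 4.0}
vector_scores = {"a": 0.75, "b": 0.25, "c": 0.5}

print([doc for doc, _ in blend(bm25_scores, vector_scores, 0.0)])  # ['b', 'c', 'a']
print([doc for doc, _ in blend(bm25_scores, vector_scores, 1.0)])  # ['a', 'c', 'b']
print([doc for doc, _ in blend(bm25_scores, vector_scores, 0.5)])  # ['c', 'a', 'b']
```

With the blended setting, the document that scores reasonably well on both signals overtakes the documents that win on only one of them.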
GraphQL Query API
Weaviate also exposes a GraphQL API for complex queries:
{
  Get {
    Article(
      nearText: {concepts: ["memory management"]}
      limit: 3
    ) {
      title
      body
      _additional { certainty }
    }
  }
}
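A GraphQL query like this can be sent as JSON in an HTTP POST to the `/v1/graphql` endpoint. The sketch below builds the request with the standard library only. The helper names and the `http://localhost:8080` base URL are assumptions for a local instance, and `run_query` is defined but not called because it needs a running server.

```python
import json
import urllib.request


def build_near_text_query(concepts: list[str], limit: int = 3) -> str:
    """Build the nearText query for the Article collection."""
    # json.dumps yields a quoted, escaped list that is valid GraphQL for ordinary strings.
    concepts_literal = json.dumps(concepts)
    return (
        "{ Get { Article("
        f"nearText: {{concepts: {concepts_literal}}} limit: {int(limit)}"
        ") { title body _additional { certainty } } } }"
    )


def build_request(query: str,
                  base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Wrap the query in the JSON body that a GraphQL-over-HTTP POST expects."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


query = build_near_text_query(["memory management"])
request = build_request(query)
print(request.full_url)  # http://localhost:8080/v1/graphql
```

Against a live instance, `run_query(request)` would return the decoded response, with matches expected under the `data` key as in any GraphQL response.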
Docker Self-Host
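version: "3.8"
services:
  weaviate:
    image: semitechnologies/weaviate:latest  # pin a specific version for production
    ports:
      - "8080:8080"    # REST and GraphQL
      - "50051:50051"  # gRPC, required by the v4 Python client
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      DEFAULT_VECTORIZER_MODULE: text2vec-openai
      # multi2vec-clip also needs a separate CLIP inference container and
      # CLIP_INFERENCE_API pointing at it.
      ENABLE_MODULES: text2vec-openai,multi2vec-clip
      OPENAI_APIKEY: "${OPENAI_API_KEY}"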
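After the container starts, it can take a few seconds before it accepts requests. Below is a minimal sketch of a readiness poll, assuming the instance exposes its readiness endpoint at `/v1/.well-known/ready` on the mapped port 8080. `http_probe` and `wait_until_ready` are hypothetical helper names, and the probe is injectable so the loop can be exercised without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8080/v1/.well-known/ready") -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool] = http_probe,
                     timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Poll the probe until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready()` before `weaviate.connect_to_local()` avoids connection errors in scripts that start the container and connect immediately.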
Weaviate Cloud
Weaviate Cloud offers a managed service with a free sandbox tier. Connect with:
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=Auth.api_key("YOUR_KEY"),
)
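Rather than hard-coding the key, connection settings can be read from the environment. The sketch below assumes hypothetical variable names `WEAVIATE_URL` and `WEAVIATE_API_KEY`; the resulting values would be passed as `cluster_url` and to `Auth.api_key` in the connection call.

```python
import os
from typing import Mapping


def load_cloud_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the cluster URL and API key, failing early if either is missing."""
    required = ("WEAVIATE_URL", "WEAVIATE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    url = env["WEAVIATE_URL"].strip()
    # Accept a bare hostname and default to HTTPS.
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return {"cluster_url": url, "api_key": env["WEAVIATE_API_KEY"]}
```

Passing a plain dictionary as `env` makes the function easy to check without touching the real environment.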
Full documentation at weaviate.io/developers/weaviate.