Raw Embeddings
The RawEmbeddingsIndex is an embedding index without an encoder. You supply the vectors yourself (from an external service, a domain-specific model, a notebook, or whatever produced them), and the index stores them in an HNSW structure and serves nearest-neighbor queries.
Use it when:
- You already have embeddings for your data and don't want the workspace to recompute them.
- You need a model the workspace doesn't ship (e.g. a fine-tuned domain model, a multi-modal encoder, an internal proprietary embedder).
- You want to compare or A/B test different embedding sources side by side: register one RawEmbeddingsIndex per source and query each.
The class lives in Mosaik.GraphDB.Indexes.RawEmbeddingsIndex. It implements ITextSimilarityIndex (via an optional related text encoder) and ISimilarityIndex.
How it differs from the other indexes
| | Sentence Embeddings | Page Space | Raw Embeddings |
|---|---|---|---|
| Where vectors come from | Built-in / external encoder runs the text field | StarSpace training over the graph | You provide them. |
| Auto-indexed on commit? | Yes (encodes the field) | Predicted from PageSpace model | No — you push vectors |
| Searchable by text? | Yes | No | Only if you wire a related text encoder (RelatedTextEncoderIndex) |
Registering the index
```csharp
var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Binary), "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Buckets), "1");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.WithIndexes), "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch), "True");

var index = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType: N.Product.Type,
    fieldName: "DomainEmbedding", // logical name; vectors aren't read from this field
    setting: settings);
```
fieldName here is a label that lets you distinguish multiple raw indexes on the same node type. It does not read a property off the node — the workspace doesn't know how your vectors were produced.
Options reference
| Option | Type | Default | Effect |
|---|---|---|---|
| Binary | bool | false | Store as 1-bit vectors (smaller, faster, less accurate). |
| Buckets | int | 1 | Shard HNSW across N buckets. Use >1 for very large corpora. |
| WithIndexes | bool | false | Allow multiple vectors per node (chunk-like). Required if you call the UIDandVectorAndIndex overload of AddVectors. |
| EnableAISearch | bool | false | Allow the search controller to inject these vectors into hybrid search results. |
| InjectResultCutoff | float | — | Min cosine similarity to enter the result set during AI Search. |
| RerankResultCutoff | float | — | Min similarity to reorder an existing BM25 hit during AI Search. |
| ResultsToExpand | int | — | Top-N pulled from HNSW before cutoffs. |
| RelatedTextEncoderIndex | UID64 | default | UID of a SentenceEmbeddingsIndex whose encoder will be reused to embed query text. This lets you serve text-to-vector lookups against this index even though you supplied the vectors yourself. |
| RestrictToAccessGroup | UID128 | default | If set, the index is only queried for users in this access group (cheap row-level filtering). |
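
To make the interaction between ResultsToExpand and a cosine cutoff such as InjectResultCutoff more concrete, here is a minimal standalone sketch. It does not call the workspace or show its actual implementation, which this page does not describe. The helper names Cosine and ExpandThenCut, the string ids, and the in-memory candidate list are all hypothetical. It only mirrors the order stated in the table: take the top-N candidates first, then apply the cutoff.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors; returns 0 if either vector has zero norm.
float Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) return 0f;
    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

// Hypothetical helper: keep the top `resultsToExpand` candidates by score,
// then drop any whose score is below `cutoff`.
List<(string Id, float Score)> ExpandThenCut(
    float[] query,
    IEnumerable<(string Id, float[] Vector)> candidates,
    int resultsToExpand,
    float cutoff)
{
    return candidates
        .Select(c => (c.Id, Score: Cosine(query, c.Vector)))
        .OrderByDescending(c => c.Score)
        .Take(resultsToExpand)
        .Where(c => c.Score >= cutoff)
        .ToList();
}
```

In this sketch a larger resultsToExpand widens the candidate pool, while a higher cutoff trims weak matches from it. If most candidates fall below the cutoff, raising resultsToExpand alone will not add results.
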
Feeding vectors in
You have two channels to push vectors into the index.
From inside the workspace (C#)
Inside a custom endpoint, scheduled task, code index, or shell you can call RawEmbeddingsIndex.AddVectors directly:
```csharp
var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding");

// Compute vectors however you like.
var vectors = new List<(UIDandVector uidAndVector, float[] vector)>();

foreach (var productNode in Q().StartAt(N.Product.Type).AsEnumerable())
{
    float[] v = await MyEncoderClient.EncodeAsync(productNode.GetString(N.Product.Description));
    vectors.Add((
        UIDandVector.Initialize(productNode.UID, v, binary: false),
        v
    ));
}

await index.AddVectors(vectors);
```
If you configured WithIndexes = true, use the UIDandVectorAndIndex overload to push multiple vectors per node (e.g. one per chunk of a document):
```csharp
var chunked = new List<(UIDandVectorAndIndex uidAndVectorAndIndex, float[] vector)>();

// One vector per chunk; i is the chunk's position within the node.
for (ushort i = 0; i < chunks.Length; i++)
{
    var v = await MyEncoderClient.EncodeAsync(chunks[i]);
    chunked.Add((
        UIDandVectorAndIndex.Initialize(productNode.UID, i, v, binary: false),
        v
    ));
}

await index.AddVectors(chunked);
```
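
The snippet above assumes a chunks array already exists. How you split a document is up to you. As one illustration, the sketch below groups whitespace-separated words into fixed-size chunks. SplitIntoChunks and its maxWords parameter are hypothetical and not part of the workspace API; a production pipeline would more likely split on sentences or tokens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chunker: groups whitespace-separated words into chunks of at most `maxWords` words.
string[] SplitIntoChunks(string text, int maxWords)
{
    if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));

    var separators = new[] { ' ', '\t', '\r', '\n' };
    var words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

    var result = new List<string>();
    for (int start = 0; start < words.Length; start += maxWords)
    {
        int length = Math.Min(maxWords, words.Length - start);
        result.Add(string.Join(" ", words, start, length));
    }
    return result.ToArray();
}
```

Because the loop above indexes chunks with a ushort, the number of chunks per node has to stay within that type's range.
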
From outside the workspace (Curiosity.Library)
For ingestion from an external connector or notebook, push vectors over the Library HTTP API. The endpoint is Endpoints.Library.AddEmbeddingsToIndex (POST /library/add-embeddings?indexUID=…) and accepts a NodeAndVector[] body:
```csharp
// In a connector built with Curiosity.Library.
// library, rawIndexUID and productText are assumed to be defined elsewhere in the connector.
public async Task IndexExternalEmbeddingsAsync()
{
    var node = new NodeAndVector
    {
        T = N.Product.Type,
        K = "P-12345",
        Vector = await MyEncoder.EncodeAsync(productText)
    };

    await library.AddEmbeddingsToIndexAsync(rawIndexUID, new[] { node });
}
```
The workspace queues each batch on the connector queue and forwards it to RawEmbeddingsIndex.AddVectors, applying connector tracking and concurrency control. See Data Connectors for the connector lifecycle.
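
Since the endpoint accepts an array, a connector with many nodes will usually send them in several batches and not in one large request. The helper below is a generic sketch of that pattern. PushInBatchesAsync is a hypothetical name, and this page does not specify a recommended batch size, so any value you pass is your own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: splits `items` into arrays of at most `batchSize` elements
// and awaits `push` for each array in order. Returns the number of batches sent.
async Task<int> PushInBatchesAsync<T>(IEnumerable<T> items, int batchSize, Func<T[], Task> push)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    var buffer = new List<T>(batchSize);
    int batches = 0;

    foreach (var item in items)
    {
        buffer.Add(item);
        if (buffer.Count == batchSize)
        {
            await push(buffer.ToArray());
            buffer.Clear();
            batches++;
        }
    }

    // Send whatever is left over as a final, smaller batch.
    if (buffer.Count > 0)
    {
        await push(buffer.ToArray());
        batches++;
    }

    return batches;
}

// Usage with the call shown earlier (batch size 500 is an arbitrary example value):
// await PushInBatchesAsync(nodes, 500, batch => library.AddEmbeddingsToIndexAsync(rawIndexUID, batch));
```

Batches are awaited one at a time here, which keeps the sketch simple and leaves concurrency control to the workspace's connector queue.
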
Consuming vectors
Once vectors are in, the index behaves exactly like the other similarity indexes. Narrow to the specific index by UID when you want to consume only these vectors:
```csharp
var indexUID = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding")
    .UID;

// From a seed UID:
var neighbors = Q().StartAt(productUID)
    .Similar(indexUID: indexUID, count: 20)
    .EmitWithScores();

// From an external vector (e.g. encoded user query):
float[] queryVector = await MyEncoderClient.EncodeAsync(userQuery);

var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(indexUID)
    .First();

var hits = index.FindSimilar(
    userUID: CurrentUser,
    vector: queryVector,
    count: 20);

return Q().StartAt(hits).EmitWithScores();
```
If you set RelatedTextEncoderIndex on the options, you can also do StartAtSimilarTextAsync(text, …, indexUID: yourRawIndexUID) — the workspace will use the related sentence index's encoder to embed the text before searching your raw vectors.
See also
- Sentence Embeddings: when you want the workspace to do the encoding.
- IQuery Similarity Search: how Similar() and StartAtSimilarTextAsync() consume any embedding index.
- Code Indexes: running C# to compute derived vectors as part of the indexing pipeline.
- Data Connectors: the external-ingestion path that backs Curiosity.Library calls.