Raw Embeddings

The RawEmbeddingsIndex is an embedding index without an encoder. You supply the vectors yourself — from an external service, a domain-specific model, a notebook, or whatever produced them — and the index stores them in HNSW and serves nearest-neighbor queries.

Use it when:

  • You already have embeddings for your data and don't want the workspace to recompute them.
  • You need a model the workspace doesn't ship (e.g. a fine-tuned domain model, a multi-modal encoder, an internal proprietary embedder).
  • You want to compare/AB different embedding sources side by side — register one RawEmbeddingsIndex per source and query each.

The class lives in Mosaik.GraphDB.Indexes.RawEmbeddingsIndex. It implements ITextSimilarityIndex (via an optional related text encoder) and ISimilarityIndex.

How it differs from the other indexes

|                         | Sentence Embeddings | Page Space | Raw Embeddings |
|-------------------------|---------------------|------------|----------------|
| Where vectors come from | Built-in / external encoder encodes the text field | StarSpace training over the graph | You provide them |
| Auto-indexed on commit? | Yes (encodes the field) | Predicted from the PageSpace model | No — you push vectors |
| Searchable by text?     | Yes | No | Only if you wire a related text encoder (RelatedTextEncoderIndex) |

Registering the index

var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Binary),         "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Buckets),        "1");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.WithIndexes),    "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch), "True");

var index = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "DomainEmbedding",   // logical name — vectors aren't read from this field
    setting:   settings);

fieldName here is a label that lets you distinguish multiple raw indexes on the same node type. It does not read a property off the node — the workspace doesn't know how your vectors were produced.
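Because fieldName is only a label, you can register several raw indexes on the same node type to compare embedding sources side by side. A minimal sketch of that A/B pattern — the field names here are illustrative, not part of the API:

```csharp
// Two raw indexes on the same node type, one per embedding source.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch), "True");

var indexA = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "EmbeddingSourceA",   // label only — distinguishes the two indexes
    setting:   settings);

var indexB = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "EmbeddingSourceB",
    setting:   settings);

// Push each source's vectors into its own index, then query both and compare.
```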

Options reference

| Option | Type | Default | Effect |
|--------|------|---------|--------|
| Binary | bool | false | Store as 1-bit vectors (smaller, faster, less accurate). |
| Buckets | int | 1 | Shard HNSW across N buckets. Use >1 for very large corpora. |
| WithIndexes | bool | false | Allow multiple vectors per node (chunk-like). Required if you call the UIDandVectorAndIndex overload of AddVectors. |
| EnableAISearch | bool | false | Allow the search controller to inject these vectors into hybrid search results. |
| InjectResultCutoff | float | | Min cosine similarity to enter the result set during AI Search. |
| RerankResultCutoff | float | | Min similarity to reorder an existing BM25 hit during AI Search. |
| ResultsToExpand | int | | Top-N pulled from HNSW before cutoffs. |
| RelatedTextEncoderIndex | UID64 | default | UID of a SentenceEmbeddingsIndex whose encoder will be reused to embed query text — lets you serve text-to-vector lookups against this index even though you supplied the vectors yourself. |
| RestrictToAccessGroup | UID128 | default | If set, the index is only queried for users in this access group (cheap row-level filtering). |
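The AI Search options are set the same way as the registration settings above. A sketch — the cutoff values are illustrative and should be tuned for your own embedding space, and sentenceIndexUID is assumed to hold the UID of an existing SentenceEmbeddingsIndex:

```csharp
// Illustrative values — tune the cutoffs for your embedding space.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch),     "True");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.InjectResultCutoff), "0.65");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.RerankResultCutoff), "0.50");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.ResultsToExpand),    "100");

// Reuse an existing sentence index's encoder for text queries (assumed UID).
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.RelatedTextEncoderIndex),
                     sentenceIndexUID.ToString());
```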

Feeding vectors in

You have two channels to push vectors into the index.

From inside the workspace (C#)

Inside a custom endpoint, scheduled task, code index, or shell you can call RawEmbeddingsIndex.AddVectors directly:

var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding");

// Compute vectors however you like.
var vectors = new List<(UIDandVector uidAndVector, float[] vector)>();

foreach (var productNode in Q().StartAt(N.Product.Type).AsEnumerable())
{
    float[] v = await MyEncoderClient.EncodeAsync(productNode.GetString(N.Product.Description));

    vectors.Add((
        UIDandVector.Initialize(productNode.UID, v, binary: false),
        v
    ));
}

await index.AddVectors(vectors);

If you configured WithIndexes = true, use the UIDandVectorAndIndex overload to push multiple vectors per node (e.g. one per chunk of a document):

// chunks: string[] of text chunks for one node (productNode as in the previous snippet)
var chunked = new List<(UIDandVectorAndIndex uidAndVectorAndIndex, float[] vector)>();

for (ushort i = 0; i < chunks.Length; i++)
{
    var v = await MyEncoderClient.EncodeAsync(chunks[i]);
    chunked.Add((
        UIDandVectorAndIndex.Initialize(productNode.UID, i, v, binary: false),
        v
    ));
}

await index.AddVectors(chunked);

From outside the workspace (Curiosity.Library)

For ingestion from an external connector or notebook, push vectors over the Library HTTP API. The endpoint is Endpoints.Library.AddEmbeddingsToIndex (POST /library/add-embeddings?indexUID=…) and accepts a NodeAndVector[] body:

// In a connector built with Curiosity.Library
public async Task IndexExternalEmbeddingsAsync()
{
    var node = new NodeAndVector
    {
        T   = N.Product.Type,
        K   = "P-12345",
        Vector = await MyEncoder.EncodeAsync(productText)
    };

    await library.AddEmbeddingsToIndexAsync(rawIndexUID, new[] { node });
}

The workspace queues each batch on the connector queue and forwards it to RawEmbeddingsIndex.AddVectors, applying connector tracking and concurrency control. See Data Connectors for the connector lifecycle.
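For larger ingestions it helps to batch the calls rather than send one node per request. A sketch, assuming externalProducts is your own (key, text) source and 256 is an arbitrary batch size, not an API limit:

```csharp
// Hypothetical batching sketch: bound request sizes by flushing every 256 nodes.
const int batchSize = 256;
var batch = new List<NodeAndVector>(batchSize);

foreach (var (key, text) in externalProducts)   // externalProducts: your own source
{
    batch.Add(new NodeAndVector
    {
        T      = N.Product.Type,
        K      = key,
        Vector = await MyEncoder.EncodeAsync(text)
    });

    if (batch.Count == batchSize)
    {
        await library.AddEmbeddingsToIndexAsync(rawIndexUID, batch.ToArray());
        batch.Clear();
    }
}

if (batch.Count > 0)   // flush the final partial batch
    await library.AddEmbeddingsToIndexAsync(rawIndexUID, batch.ToArray());
```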

Consuming vectors

Once vectors are in, the index behaves exactly like the other similarity indexes. Narrow to the specific index by UID when you want to consume only these vectors:

var indexUID = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding")
    .UID;

// From a seed UID:
var neighbors = Q().StartAt(productUID)
                   .Similar(indexUID: indexUID, count: 20)
                   .EmitWithScores();

// From an external vector (e.g. encoded user query):
float[] queryVector = await MyEncoderClient.EncodeAsync(userQuery);

var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(indexUID)
    .First();

var hits = index.FindSimilar(
    userUID:  CurrentUser,
    vector:   queryVector,
    count:    20);

return Q().StartAt(hits).EmitWithScores();

If you set RelatedTextEncoderIndex on the options, you can also do StartAtSimilarTextAsync(text, …, indexUID: yourRawIndexUID) — the workspace will use the related sentence index's encoder to embed the text before searching your raw vectors.
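A sketch of that text-to-vector lookup — the exact parameter list of StartAtSimilarTextAsync may differ from what is shown here, and rawIndexUID is assumed to be the UID of your RawEmbeddingsIndex:

```csharp
// The workspace embeds userQuery with the related sentence index's encoder,
// then searches the raw vectors in this index.
var query = await Q().StartAtSimilarTextAsync(userQuery, indexUID: rawIndexUID);
return query.EmitWithScores();
```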

© 2026 Curiosity. All rights reserved.