Raw Embeddings

The RawEmbeddingsIndex is an embedding index without an encoder. You supply the vectors yourself — from an external service, a domain-specific model, a notebook, or whatever produced them — and the index stores them in HNSW and serves nearest-neighbor queries.

Use it when:

  • You already have embeddings for your data and don't want the workspace to recompute them.
  • You need a model the workspace doesn't ship (e.g. a fine-tuned domain model, a multi-modal encoder, an internal proprietary embedder).
  • You want to compare/AB different embedding sources side by side — register one RawEmbeddingsIndex per source and query each.

The class lives in Mosaik.GraphDB.Indexes.RawEmbeddingsIndex. It implements ITextSimilarityIndex (via an optional related text encoder) and ISimilarityIndex.

How it differs from the other indexes

|                         | Sentence Embeddings | Page Space | Raw Embeddings |
|-------------------------|---------------------|------------|----------------|
| Where vectors come from | Built-in / external encoder encodes the text field | StarSpace training over the graph | You provide them |
| Auto-indexed on commit? | Yes (encodes the field) | Predicted from the PageSpace model | No — you push vectors |
| Searchable by text?     | Yes | No | Only if you wire a related text encoder (RelatedTextEncoderIndex) |

Registering the index

var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Binary),         "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.Buckets),        "1");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.WithIndexes),    "False");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch), "True");

var index = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "DomainEmbedding",   // logical name — vectors aren't read from this field
    setting:   settings);

fieldName here is a label that lets you distinguish multiple raw indexes on the same node type. It does not read a property off the node — the workspace doesn't know how your vectors were produced.
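Because fieldName is only a label, you can register several raw indexes on the same node type to compare embedding sources side by side. A minimal sketch of that A/B pattern — the field names here are illustrative, not part of the API:

```csharp
// Two raw indexes on the same node type, one per embedding source.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch), "True");

var indexA = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "EmbeddingSourceA",   // label only — distinguishes the two indexes
    setting:   settings);

var indexB = await Graph.Indexes.AddRawEmbeddingsIndexAsync(
    nodeType:  N.Product.Type,
    fieldName: "EmbeddingSourceB",
    setting:   settings);

// Push each source's vectors into its own index, then query both and compare.
```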

Options reference

| Option | Type | Default | Effect |
|--------|------|---------|--------|
| Binary | bool | false | Store as 1-bit vectors (smaller, faster, less accurate). |
| Buckets | int | 1 | Shard HNSW across N buckets. Use >1 for very large corpora. |
| WithIndexes | bool | false | Allow multiple vectors per node (chunk-like). Required if you call the UIDandVectorAndIndex overload of AddVectors. |
| EnableAISearch | bool | false | Allow the search controller to inject these vectors into hybrid search results. |
| InjectResultCutoff | float | | Min cosine similarity to enter the result set during AI Search. |
| RerankResultCutoff | float | | Min similarity to reorder an existing BM25 hit during AI Search. |
| ResultsToExpand | int | | Top-N pulled from HNSW before cutoffs. |
| RelatedTextEncoderIndex | UID64 | default | UID of a SentenceEmbeddingsIndex whose encoder will be reused to embed query text — lets you serve text-to-vector lookups against this index even though you supplied the vectors yourself. |
| RestrictToAccessGroup | UID128 | default | If set, the index is only queried for users in this access group (cheap row-level filtering). |
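The AI Search options are set the same way as the registration settings above. A sketch — the cutoff values are illustrative and should be tuned for your own embedding space, and sentenceIndexUID is assumed to hold the UID of an existing SentenceEmbeddingsIndex:

```csharp
// Illustrative values — tune the cutoffs for your embedding space.
var settings = new SettingsHolder();
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.EnableAISearch),     "True");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.InjectResultCutoff), "0.65");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.RerankResultCutoff), "0.50");
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.ResultsToExpand),    "100");

// Reuse an existing sentence index's encoder for text queries (assumed UID).
settings.ManuallySet(nameof(RawEmbeddingsIndexOptions.RelatedTextEncoderIndex),
                     sentenceIndexUID.ToString());
```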

Feeding vectors in

You have two channels to push vectors into the index.

From inside the workspace (C#)

Inside a custom endpoint, scheduled task, code index, or shell you can call RawEmbeddingsIndex.AddVectors directly:

var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding");

// Compute vectors however you like.
var vectors = new List<(UIDandVector uidAndVector, float[] vector)>();

foreach (var productNode in Q().StartAt(N.Product.Type).AsEnumerable())
{
    float[] v = await MyEncoderClient.EncodeAsync(productNode.GetString(N.Product.Description));

    vectors.Add((
        UIDandVector.Initialize(productNode.UID, v, binary: false),
        v
    ));
}

await index.AddVectors(vectors);

If you configured WithIndexes = true, use the UIDandVectorAndIndex overload to push multiple vectors per node (e.g. one per chunk of a document):

// chunks: string[] of text chunks for one node (productNode as in the previous snippet)
var chunked = new List<(UIDandVectorAndIndex uidAndVectorAndIndex, float[] vector)>();

for (ushort i = 0; i < chunks.Length; i++)
{
    var v = await MyEncoderClient.EncodeAsync(chunks[i]);
    chunked.Add((
        UIDandVectorAndIndex.Initialize(productNode.UID, i, v, binary: false),
        v
    ));
}

await index.AddVectors(chunked);

From outside the workspace (Curiosity.Library)

For ingestion from an external connector or notebook, push vectors over the Library HTTP API. The endpoint is Endpoints.Library.AddEmbeddingsToIndex (POST /library/add-embeddings?indexUID=…) and accepts a NodeAndVector[] body:

// In a connector built with Curiosity.Library
public async Task IndexExternalEmbeddingsAsync()
{
    var node = new NodeAndVector
    {
        T   = N.Product.Type,
        K   = "P-12345",
        Vector = await MyEncoder.EncodeAsync(productText)
    };

    await library.AddEmbeddingsToIndexAsync(rawIndexUID, new[] { node });
}

The workspace queues each batch on the connector queue and forwards it to RawEmbeddingsIndex.AddVectors, applying connector tracking and concurrency control. See Data Connectors for the connector lifecycle.
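For larger ingestions it helps to batch the calls rather than send one node per request. A sketch, assuming externalProducts is your own (key, text) source and 256 is an arbitrary batch size, not an API limit:

```csharp
// Hypothetical batching sketch: bound request sizes by flushing every 256 nodes.
const int batchSize = 256;
var batch = new List<NodeAndVector>(batchSize);

foreach (var (key, text) in externalProducts)   // externalProducts: your own source
{
    batch.Add(new NodeAndVector
    {
        T      = N.Product.Type,
        K      = key,
        Vector = await MyEncoder.EncodeAsync(text)
    });

    if (batch.Count == batchSize)
    {
        await library.AddEmbeddingsToIndexAsync(rawIndexUID, batch.ToArray());
        batch.Clear();
    }
}

if (batch.Count > 0)   // flush the final partial batch
    await library.AddEmbeddingsToIndexAsync(rawIndexUID, batch.ToArray());
```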

Consuming vectors

Once vectors are in, the index behaves exactly like the other similarity indexes. Narrow to the specific index by UID when you want to consume only these vectors:

var indexUID = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(N.Product.Type)
    .First(i => i.FieldName == "DomainEmbedding")
    .UID;

// From a seed UID:
var neighbors = Q().StartAt(productUID)
                   .Similar(indexUID: indexUID, count: 20)
                   .EmitWithScores();

// From an external vector (e.g. encoded user query):
float[] queryVector = await MyEncoderClient.EncodeAsync(userQuery);

var index = Graph.Indexes
    .OfType<RawEmbeddingsIndex>(indexUID)
    .First();

var hits = index.FindSimilar(
    userUID:  CurrentUser,
    vector:   queryVector,
    count:    20);

return Q().StartAt(hits).EmitWithScores();

If you set RelatedTextEncoderIndex on the options, you can also do StartAtSimilarTextAsync(text, …, indexUID: yourRawIndexUID) — the workspace will use the related sentence index's encoder to embed the text before searching your raw vectors.
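A sketch of that text-to-vector lookup — the exact parameter list of StartAtSimilarTextAsync may differ from what is shown here, and rawIndexUID is assumed to be the UID of your RawEmbeddingsIndex:

```csharp
// The workspace embeds userQuery with the related sentence index's encoder,
// then searches the raw vectors in this index.
var query = await Q().StartAtSimilarTextAsync(userQuery, indexUID: rawIndexUID);
return query.EmitWithScores();
```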

© 2026 Curiosity. All rights reserved.