Curiosity

S3 recipe

Source: S3Sample/ · S3 bucket prefixes containing one JSON document per object. Falls back to a local directory when no S3 bucket is configured (RECIPE_S3_BUCKET left blank).

Owns in the academic graph: subjects, topics, books, authors — the course-material side of the picture.

What it teaches

  • An object-store abstraction (IObjectStore) with ListAsync(prefix) and GetObjectAsync semantics, implemented twice — once for S3, once for a local filesystem.
  • Prefix-based document organization — different prefixes hold different types (subjects/, books/).
  • Cross-reference by shared key in memory — load subjects/ and books/, then link them by ISBN.
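
The abstraction in the first bullet might look like the sketch below. Only the names IObjectStore, ListAsync(prefix) and GetObjectAsync come from the recipe; the exact signatures, the LocalObjectStore class name, and the convention that keys are '/'-separated paths relative to a root directory are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Assumed shape of the abstraction: the recipe names the two methods,
// but these signatures are a guess, not the recipe's actual API.
public interface IObjectStore
{
    // Returns the keys of all objects whose key starts with the prefix.
    Task<IReadOnlyList<string>> ListAsync(string prefix);

    // Opens the object stored under the given key for reading.
    Task<Stream> GetObjectAsync(string key);
}

// Hypothetical local-filesystem implementation. Keys are '/'-separated
// paths relative to the root, so "subjects/math.json" maps to
// <root>/subjects/math.json. No path-traversal guard: this is a sketch.
public sealed class LocalObjectStore : IObjectStore
{
    private readonly string _root;

    public LocalObjectStore(string root) => _root = Path.GetFullPath(root);

    public Task<IReadOnlyList<string>> ListAsync(string prefix)
    {
        if (!Directory.Exists(_root))
            return Task.FromResult<IReadOnlyList<string>>(Array.Empty<string>());

        // Mimic a prefix listing: plain string-prefix match on the key,
        // returned in ordinal order.
        IReadOnlyList<string> keys = Directory
            .EnumerateFiles(_root, "*", SearchOption.AllDirectories)
            .Select(path => Path.GetRelativePath(_root, path)
                                .Replace(Path.DirectorySeparatorChar, '/'))
            .Where(key => key.StartsWith(prefix, StringComparison.Ordinal))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
        return Task.FromResult(keys);
    }

    public Task<Stream> GetObjectAsync(string key)
    {
        var path = Path.Combine(_root, key.Replace('/', Path.DirectorySeparatorChar));
        return Task.FromResult<Stream>(File.OpenRead(path));
    }
}
```

An S3-backed class would implement the same two methods on top of the AWS SDK, leaving callers unchanged.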

Cross-reference by key

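// Loads both prefixes into memory, then links each subject to its books by ISBN.
public static async Task IngestAsync(Graph graph, IObjectStore store)
{
    var subjects = await ReadAllAsync<SubjectDoc>(store, "subjects/");
    var books    = await ReadAllAsync<BookDoc>(store, "books/");

    // Index books by ISBN so each lookup below is O(1).
    // Note: ToDictionary throws ArgumentException if two documents share an ISBN.
    var booksByIsbn = books.ToDictionary(b => b.Isbn);

    foreach (var s in subjects)
    {
        var subject = graph.AddOrUpdate(new Nodes.Subject
        {
            Name        = s.Name,
            Level       = s.Level,
            Description = s.Description,
        });

        foreach (var isbn in s.BookIsbns)
        {
            // Skip ISBNs that have no matching document under books/.
            if (!booksByIsbn.TryGetValue(isbn, out var book)) continue;

            // Book nodes are created only when a subject references them.
            var bookNode = graph.AddOrUpdate(new Nodes.Book
            {
                Isbn  = book.Isbn,
                Title = book.Title,
                Year  = book.Year,
            });

            // Link in both directions: subject → book and book → subject.
            graph.Link(subject, bookNode, Edges.RecommendsBook, Edges.RecommendedFor);
        }
    }
}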
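
The ReadAllAsync<T> helper used above is not shown in this recipe. A minimal sketch of what it could look like follows, assuming the IObjectStore signatures declared in the block (a guess), System.Text.Json for deserialization, and case-insensitive property matching; the ObjectStoreReader class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Assumed interface shape: the recipe names these methods but not their signatures.
public interface IObjectStore
{
    Task<IReadOnlyList<string>> ListAsync(string prefix);
    Task<Stream> GetObjectAsync(string key);
}

// Hypothetical home for the helper; the real recipe may place it elsewhere.
public static class ObjectStoreReader
{
    private static readonly JsonSerializerOptions Options = new()
    {
        // Assumption: documents may use camelCase property names.
        PropertyNameCaseInsensitive = true,
    };

    // Lists every key under the prefix and deserializes each object as one T.
    public static async Task<List<T>> ReadAllAsync<T>(IObjectStore store, string prefix)
    {
        var results = new List<T>();
        foreach (var key in await store.ListAsync(prefix))
        {
            await using var stream = await store.GetObjectAsync(key);
            var doc = await JsonSerializer.DeserializeAsync<T>(stream, Options);
            if (doc is not null) results.Add(doc); // skip objects that hold a JSON null
        }
        return results;
    }
}
```

Objects are read one at a time here for simplicity; reading them concurrently is a possible refinement when a prefix holds many small documents.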

Configuration

Variable            Purpose                                Default
RECIPE_S3_BUCKET    S3 bucket name (blank → local mode)    (blank)
RECIPE_S3_REGION    AWS region                             us-east-1
RECIPE_LOCAL_ROOT   Local fallback root                    data/

S3 credentials are picked up from the standard AWS chain (env vars, IAM role, ~/.aws/credentials).
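
Resolving these variables could look like the sketch below. The three variable names and their defaults come from the table; the StoreSettings record, its members, and the choice to treat whitespace-only values as blank are assumptions made for illustration.

```csharp
#nullable enable
using System;

// Hypothetical settings holder. Bucket is null when RECIPE_S3_BUCKET is blank.
public sealed record StoreSettings(string? Bucket, string Region, string LocalRoot)
{
    // Local mode applies when no bucket name is set, as the table describes.
    public bool UseLocal => Bucket is null;

    // Reads the process environment.
    public static StoreSettings FromEnvironment() => From(Environment.GetEnvironmentVariable);

    // Takes the lookup as a delegate so the logic can be exercised without
    // touching real environment variables.
    public static StoreSettings From(Func<string, string?> getEnv)
    {
        // Assumption: unset, empty, and whitespace-only values all count as blank.
        static string? NonBlank(string? v) => string.IsNullOrWhiteSpace(v) ? null : v.Trim();

        return new StoreSettings(
            Bucket: NonBlank(getEnv("RECIPE_S3_BUCKET")),
            Region: NonBlank(getEnv("RECIPE_S3_REGION")) ?? "us-east-1",
            LocalRoot: NonBlank(getEnv("RECIPE_LOCAL_ROOT")) ?? "data/");
    }
}
```

A caller would then branch on UseLocal to choose between the local and the S3 implementation of IObjectStore.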

Reuse notes

  • Swap the IObjectStore implementation, not the ingest method, when moving between S3, Azure Blob, or GCS.
  • The in-memory join is the right choice whenever both sides fit in memory; for data sets too large for that, stream the documents and link them in passes.
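
One possible reading of "stream and link in passes" is sketched below: a first pass keeps only the join keys of one side in memory, and a second pass streams the other side and emits a link whenever a referenced key is known. The StreamingJoin class and the LinkAsync signature are hypothetical and not part of the recipe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-pass join over streamed inputs.
public static class StreamingJoin
{
    // Pass 1 collects the keys of the right side (for example, book ISBNs).
    // Pass 2 streams the left side (for example, subjects) and yields one
    // (left, key) pair for every referenced key seen in pass 1.
    public static async IAsyncEnumerable<(TLeft Left, TKey Key)> LinkAsync<TLeft, TKey>(
        IAsyncEnumerable<TKey> rightKeys,
        IAsyncEnumerable<TLeft> leftItems,
        Func<TLeft, IEnumerable<TKey>> referencedKeys)
        where TKey : notnull
    {
        // Only keys are retained, not whole documents.
        var known = new HashSet<TKey>();
        await foreach (var key in rightKeys)
            known.Add(key);

        // Left-side items are processed one at a time and never accumulated.
        await foreach (var left in leftItems)
        {
            foreach (var key in referencedKeys(left))
            {
                if (known.Contains(key))
                    yield return (left, key);
            }
        }
    }
}
```

This still assumes the key set of one side fits in memory; if it does not, the keys would have to be partitioned and the passes repeated per partition.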