Excel Connector
Ingest rows from an Excel workbook (.xlsx) using ClosedXML — the highest-fidelity managed library for reading and writing the Open XML format. For read-only ingest of huge sheets, ExcelDataReader is faster; both are covered below.
Packages
Curiosity.Library on NuGet ClosedXML on NuGet
dotnet add package Curiosity.Library
dotnet add package ClosedXML
If you only need to read very large sheets and don't care about formulas, conditional formatting, or styles, swap ClosedXML for ExcelDataReader + ExcelDataReader.DataSet.
Expected source shape
A worksheet named Products with a header row in row 1:
| sku | name | category | price | updated_at |
|---|---|---|---|---|
| SKU-01 | Widget Mini | Hardware | 4.99 | 2025-10-01T0000Z |
| SKU-02 | Bracket Pack | Hardware | 12.50 | 2025-10-05T0000Z |
| SKU-03 | Foo Cleaner | Consumable | 8.00 | 2025-10-12T0000Z |
Connector code (ClosedXML)
using ClosedXML.Excel;
using Curiosity.Library;
[Node]
public class Product
{
[Key] public string Sku { get; set; }
[Property] public string Name { get; set; }
[Property] public string Category { get; set; }
[Property] public double Price { get; set; }
[Timestamp] public DateTimeOffset UpdatedAt { get; set; }
}
using var graph = Graph.Connect(
endpoint: Environment.GetEnvironmentVariable("CURIOSITY_ENDPOINT")!,
token: Environment.GetEnvironmentVariable("CURIOSITY_TOKEN")!,
connectorName: "excel-products");
await graph.CreateNodeSchemaAsync<Product>();
graph.SetAutoCommitCost(everyNodes: 10_000);
var path = args.Length > 0 ? args[0] : "products.xlsx";
using var workbook = new XLWorkbook(path);
var worksheet = workbook.Worksheet("Products");
// Map header text -> column number (1-based).
var headerRow = worksheet.Row(1);
var headers = headerRow.Cells(1, worksheet.LastColumnUsed().ColumnNumber())
.ToDictionary(c => c.GetString().Trim().ToLowerInvariant(), c => c.Address.ColumnNumber);
var ingested = 0;
foreach (var row in worksheet.RowsUsed().Skip(1)) // skip header
{
var product = new Product
{
Sku = row.Cell(headers["sku"]).GetString().Trim(),
Name = row.Cell(headers["name"]).GetString(),
Category = row.Cell(headers["category"]).GetString(),
Price = row.Cell(headers["price"]).GetDouble(),
UpdatedAt = new DateTimeOffset(row.Cell(headers["updated_at"]).GetDateTime(), TimeSpan.Zero),
};
if (string.IsNullOrWhiteSpace(product.Sku)) continue;
graph.AddOrUpdate(product);
ingested++;
}
await graph.CommitPendingAsync();
Console.WriteLine($"Ingested {ingested} products from {path}");
How it works
worksheet.RowsUsed() skips fully-empty rows, which is what you usually want — Excel sheets often have phantom blank rows at the end. The header-to-column-number dictionary decouples the connector from column ordering: rearrange columns in the workbook and the code keeps working as long as the headers stay the same.
ClosedXML uses IXLCell.GetString / GetDouble / GetDateTime for typed reads — if a cell contains a formula, you get the computed value, not the formula text. For raw values use cell.Value.ToString().
Fast path: ExcelDataReader for huge sheets
ClosedXML loads the whole workbook into a typed DOM. For sheets with millions of rows that's wasteful — ExcelDataReader streams.
ExcelDataReader on NuGet ExcelDataReader.DataSet on NuGet
using ExcelDataReader;
using System.Text;
// Required on .NET Core / .NET 5+ for some legacy code pages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
using var stream = File.OpenRead(path);
using var reader = ExcelReaderFactory.CreateReader(stream); // auto-detects .xls / .xlsx
// Move past the header row.
reader.Read();
while (reader.Read())
{
var sku = reader.GetString(0)?.Trim();
if (string.IsNullOrEmpty(sku)) continue;
graph.AddOrUpdate(new Product
{
Sku = sku,
Name = reader.GetString(1),
Category = reader.GetString(2),
Price = reader.GetDouble(3),
UpdatedAt = new DateTimeOffset(reader.GetDateTime(4), TimeSpan.Zero),
});
}
This is positional (reader.GetString(0), 1, …) — there's no convenient header mapping out of the box. Use IExcelDataReader's field count once on the header row if you need to handle arbitrary column orders.
Notes & pitfalls
Cell.GetString()vsCell.Value.ToString().GetString()formats the value the same way Excel displays it (respecting cell format).Value.ToString()ignores formatting and returns the raw stored value. For text columns either works; for dates and numbers prefer the typed accessors (GetDateTime,GetDouble).- Multiple sheets. Loop
workbook.Worksheetsif the file has a sheet per category/region/year. Use the sheet name as part of the[Key]to keep keys stable across sheets. - Merged cells. ClosedXML returns the value only on the top-left cell of a merged range; other cells in the range return empty. If your source uses merging for visual grouping, flatten it before iterating.
- Date interpretation. Excel stores dates as floating-point days since 1900-01-01 (with the famous 1900 leap-year bug).
GetDateTime()handles it correctly. If you read withGetDouble()by accident, you'll get the serial number — not what you want. .xlsvs.xlsx. ClosedXML only reads.xlsx. ExcelDataReader reads both. If you need to ingest legacy.xls, go with ExcelDataReader.- Locking on Windows. ClosedXML opens the file with shared read by default; ExcelDataReader needs an explicit
FileShare.ReadWriteif the file may be open in Excel during ingest.
See also
- Schemas —
[Node],[Key],[Property],[Timestamp]. - Idempotency — what to use as the
[Key]when the spreadsheet has no stable identifier. - CSV connector — when the source is an export of an Excel sheet without the formatting metadata.
- ClosedXML on GitHub · ExcelDataReader on GitHub