XML Connector
Stream XML documents (RSS feeds, SOAP exports, legacy data dumps) into the graph using System.Xml.XmlReader from the .NET BCL — no third-party package required. For small documents the example also shows the LINQ-to-XML alternative (XDocument), which is more convenient when memory isn't a concern.
Packages
System.Xml ships with the BCL, so Curiosity.Library is the only package to add:
dotnet add package Curiosity.Library
Expected source shape
A flat list of <article> elements inside a root:
<?xml version="1.0" encoding="utf-8"?>
<feed>
<article id="A-1001">
<title>Battery life improvements</title>
<author>Alice Anders</author>
<publishedAt>2025-10-21T09:00:00Z</publishedAt>
<body>...</body>
</article>
<article id="A-1002">
<title>Q4 roadmap</title>
<author>Bob Boniface</author>
<publishedAt>2025-10-22T14:30:00Z</publishedAt>
<body>...</body>
</article>
</feed>
Connector code — streaming with XmlReader
using System;
using System.Xml;
using Curiosity.Library;

using var graph = Graph.Connect(
    endpoint: Environment.GetEnvironmentVariable("CURIOSITY_ENDPOINT")!,
    token: Environment.GetEnvironmentVariable("CURIOSITY_TOKEN")!,
    connectorName: "xml-articles");
await graph.CreateNodeSchemaAsync<Article>();
graph.SetAutoCommitCost(everyNodes: 10_000);

var path = args.Length > 0 ? args[0] : "feed.xml";
var settings = new XmlReaderSettings
{
    Async = true,
    IgnoreWhitespace = true,
    IgnoreComments = true,
    DtdProcessing = DtdProcessing.Ignore, // safety: never resolve external DTDs
};
using var reader = XmlReader.Create(path, settings);
var ingested = 0;
while (await reader.ReadAsync())
{
    if (reader.NodeType != XmlNodeType.Element || reader.Name != "article") continue;
    var article = new Article
    {
        Id = reader.GetAttribute("id"),
    };
    // Read children of <article>. ReadElementContentAsStringAsync() already moves the
    // cursor past the closing tag, so ReadAsync() is called only when nothing else
    // advanced the reader. Calling it unconditionally skips every second child when
    // whitespace is ignored.
    using var subtree = reader.ReadSubtree();
    await subtree.ReadAsync(); // position on <article> itself
    await subtree.ReadAsync(); // move to its first child
    while (!subtree.EOF)
    {
        if (subtree.NodeType != XmlNodeType.Element) { await subtree.ReadAsync(); continue; }
        switch (subtree.Name)
        {
            case "title": article.Title = await subtree.ReadElementContentAsStringAsync(); break;
            case "author": article.Author = await subtree.ReadElementContentAsStringAsync(); break;
            case "body": article.Body = await subtree.ReadElementContentAsStringAsync(); break;
            case "publishedAt":
                var raw = await subtree.ReadElementContentAsStringAsync();
                // XmlConvert parses xs:dateTime values independently of the current culture.
                article.PublishedAt = XmlConvert.ToDateTimeOffset(raw);
                break;
            default: await subtree.SkipAsync(); break; // unknown element: skip it and its children
        }
    }
    if (string.IsNullOrEmpty(article.Id)) continue;
    graph.AddOrUpdate(article);
    ingested++;
}
await graph.CommitPendingAsync();
Console.WriteLine($"Ingested {ingested} articles from {path}");

// Type declarations must follow top-level statements when both live in one file.
[Node]
public class Article
{
    [Key] public string Id { get; set; }
    [Property] public string Title { get; set; }
    [Property] public string Author { get; set; }
    [Property] public string Body { get; set; }
    [Timestamp] public DateTimeOffset PublishedAt { get; set; }
}
How it works
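XmlReader is a forward-only cursor over the XML stream, so only the current node is in memory. ReadSubtree() returns a reader scoped to the current element and its descendants, so the inner while loop walks children of <article> without leaving the element. Note that ReadElementContentAsStringAsync() consumes the element's closing tag and leaves the reader on the node that follows it. Once the subtree reader is closed, the outer reader is positioned on the </article> end tag, and its next ReadAsync() call moves on to the following sibling.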
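DtdProcessing = DtdProcessing.Ignore is a security hardening step: the reader skips any DOCTYPE declaration instead of processing it, so external DTD references (the XXE attack vector) are never resolved. The XmlReaderSettings default, Prohibit, is equally safe but throws an XmlException as soon as a document contains a DOCTYPE. Never switch to Parse for untrusted input.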
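The difference between the two safe settings can be checked with a small in-memory experiment. This is a minimal sketch, not part of the connector: the sample document, the example.com DTD URL, and the TryRead helper are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical sample: a feed whose DOCTYPE points at an external DTD.
const string xmlWithDoctype =
    "<!DOCTYPE feed SYSTEM \"http://example.com/feed.dtd\">" +
    "<feed><article id=\"A-1\"><title>Hello</title></article></feed>";

// Reads the whole document and reports whether the reader accepted it.
static string TryRead(string xml, DtdProcessing mode)
{
    var settings = new XmlReaderSettings { DtdProcessing = mode };
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read()) { }
        return "ok";
    }
    catch (XmlException)
    {
        return "rejected";
    }
}

var prohibitResult = TryRead(xmlWithDoctype, DtdProcessing.Prohibit);
var ignoreResult = TryRead(xmlWithDoctype, DtdProcessing.Ignore);
Console.WriteLine($"Prohibit: {prohibitResult}");
Console.WriteLine($"Ignore:   {ignoreResult}");
```

Under these assumptions, Prohibit should reject the document and Ignore should read it to the end without fetching the DTD. Prohibit suits feeds that should never carry a DOCTYPE; Ignore suits legacy exports that include one you do not need.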
Small-file alternative — XDocument
For files small enough to fit comfortably in memory, LINQ-to-XML is much more readable:
using System.Xml.Linq;

var doc = XDocument.Load(path);
foreach (var el in doc.Root!.Elements("article"))
{
    graph.AddOrUpdate(new Article
    {
        Id = (string) el.Attribute("id"),
        Title = (string) el.Element("title"),
        Author = (string) el.Element("author"),
        Body = (string) el.Element("body"),
        PublishedAt = (DateTimeOffset) el.Element("publishedAt"),
    });
}
await graph.CommitPendingAsync();
The explicit casts behave differently when a child is missing: (string) element returns null, while (DateTimeOffset) element throws an ArgumentNullException. For an optional field, cast to the nullable type, (DateTimeOffset?) element, which returns null instead of throwing.
Use XDocument when the file is under ~100 MB. Past that, switch back to XmlReader to avoid loading the whole DOM.
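The cast behavior for missing children can be seen on a tiny element. This is an illustrative sketch only; the sample element and the variable names are assumptions, not connector code.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical article with no <author> and no <publishedAt>.
var el = XElement.Parse("<article id=\"A-1\"><title>Draft</title></article>");

// Reference-type and nullable casts return null for a missing child.
string? author = (string?)el.Element("author");
DateTimeOffset? published = (DateTimeOffset?)el.Element("publishedAt");

// The non-nullable cast throws when the child is missing.
var strictCastThrew = false;
try
{
    _ = (DateTimeOffset)el.Element("publishedAt")!;
}
catch (ArgumentNullException)
{
    strictCastThrew = true;
}

Console.WriteLine($"author is null: {author is null}");
Console.WriteLine($"published is null: {published is null}");
Console.WriteLine($"strict cast threw: {strictCastThrew}");
```

Using the non-nullable cast for required fields makes a malformed article fail loudly, while the nullable cast lets optional fields pass through as null.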
Namespace-aware reading
Real-world feeds (RSS, Atom, OData) use XML namespaces. With XmlReader check LocalName instead of Name, and NamespaceURI to disambiguate. With XDocument use XName.Get("article", "http://example.com/feed") or the XNamespace + "article" idiom.
XNamespace ns = "http://example.com/feed";
foreach (var el in doc.Root!.Elements(ns + "article"))
{
    var title = (string) el.Element(ns + "title");
    // ...
}
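The streaming counterpart of the snippet above might look like the following. It is a sketch under assumptions: the in-memory feed, the f: and x: prefixes, and the second namespace are invented to show why Name alone is not enough.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

const string feedNs = "http://example.com/feed";

// Hypothetical feed with two <article> elements in different namespaces.
const string xml =
    "<f:feed xmlns:f=\"http://example.com/feed\" xmlns:x=\"http://example.com/other\">" +
    "<f:article id=\"A-1001\"><f:title>Battery life improvements</f:title></f:article>" +
    "<x:article id=\"X-1\"><x:title>Unrelated</x:title></x:article>" +
    "</f:feed>";

var ids = new List<string>();
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        // Name would be "f:article" here; LocalName drops the prefix and
        // NamespaceURI tells the two article vocabularies apart.
        if (reader.NodeType == XmlNodeType.Element
            && reader.LocalName == "article"
            && reader.NamespaceURI == feedNs)
        {
            ids.Add(reader.GetAttribute("id") ?? "");
        }
    }
}
Console.WriteLine(string.Join(", ", ids));
```

Matching on the namespace URI rather than the prefix keeps the check stable when a producer renames its prefixes, since only the URI is part of the element's identity.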
Notes & pitfalls
- XXE. Keep DtdProcessing at Ignore or Prohibit for untrusted XML. Prohibit is the XmlReaderSettings default; switching to Parse opens the door to denial-of-service and file-disclosure attacks.
- Large files + LINQ. XDocument.Load loads the whole tree, and the in-memory DOM is several times larger than the file on disk: a 2 GB XML file can become a 10+ GB DOM. Use XmlReader instead.
- Mixed content. If an element contains text and child elements (<p>Hello <b>world</b></p>), ReadElementContentAsString throws. Use ReadInnerXml to get the raw content or implement a recursive walk.
- Encoding declarations. XmlReader.Create(path, ...) detects the encoding from the byte-order mark and the XML prolog. If you have already opened the file as a StreamReader with the wrong encoding, the reader only sees the mis-decoded characters and the result is silent character corruption. Let XmlReader handle the file directly.
- CDATA. Treated as text content. ReadElementContentAsString strips the <![CDATA[…]]> wrapper for you.
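The mixed-content and CDATA notes can be tried out on small in-memory documents. This is a minimal sketch; the sample XML and the OpenAtBody helper are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: opens a reader and positions it on the first <body> element.
static XmlReader OpenAtBody(string xml)
{
    var reader = XmlReader.Create(new StringReader(xml));
    reader.ReadToFollowing("body");
    return reader;
}

// CDATA is returned as plain text, without the wrapper.
string cdataText;
using (var r = OpenAtBody("<a><body><![CDATA[1 < 2]]></body></a>"))
    cdataText = r.ReadElementContentAsString();

// Mixed content makes ReadElementContentAsString throw.
const string mixed = "<a><body>Hello <b>world</b></body></a>";
var mixedThrew = false;
using (var r = OpenAtBody(mixed))
{
    try { r.ReadElementContentAsString(); }
    catch (XmlException) { mixedThrew = true; }
}

// ReadInnerXml returns the raw markup inside the element instead.
string innerXml;
using (var r = OpenAtBody(mixed))
    innerXml = r.ReadInnerXml();

Console.WriteLine(cdataText);
Console.WriteLine(mixedThrew);
Console.WriteLine(innerXml);
```

ReadInnerXml keeps the child markup, so the stored body still contains tags; a recursive walk is the better fit when only the text should reach the graph.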
See also
- Schemas: [Node], [Key], [Property], [Timestamp].
- JSON connector: when the feed is JSON instead of XML.
- Microsoft docs: XmlReader (full API reference).