Parsing XML

libxml2-wasm supports parsing XML from a string or from a buffer:

import fs from 'node:fs';
import { XmlDocument } from 'libxml2-wasm';

const doc1 = XmlDocument.fromString('<note><to>Tove</to></note>');
const doc2 = XmlDocument.fromBuffer(fs.readFileSync('doc.xml'));
doc1.dispose();
doc2.dispose();

The underlying libxml2 library processes only multibyte character sets (MBCS, mostly UTF-8), so a JavaScript string, which is UTF-16, needs an extra conversion step. As a result, XmlDocument.fromBuffer is much faster than XmlDocument.fromString. See the benchmark.
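
The extra conversion step can be pictured with the standard TextEncoder API. This is only an illustration of re-encoding UTF-16 text as UTF-8, not a description of how libxml2-wasm does it internally; the variable names are made up for the example.

```javascript
// A JavaScript string is a sequence of UTF-16 code units.
const xmlText = '<note><to>Zo\u00eb</to></note>';

// Re-encoding it as UTF-8 yields the kind of byte buffer libxml2 works with.
const xmlBytes = new TextEncoder().encode(xmlText);

// U+00EB is one UTF-16 code unit but two UTF-8 bytes, so the lengths differ.
console.log(xmlText.length, xmlBytes.length);
```

When the XML already exists as UTF-8 bytes, for example after reading a file, passing those bytes to fromBuffer avoids this re-encoding.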

XInclude 1.0 is supported.

When libxml2 finds an <xi:include> tag and needs the content of another XML document, it uses callbacks to read the data.

With xmlRegisterInputProvider, an XmlInputProvider object holding a set of four callbacks can be registered.

These four callbacks are:

  • match
  • open
  • read
  • close

First, match is called with the URL of the included XML. If match returns true, the other three callbacks of the same set are used to retrieve the content of the XML; otherwise, other sets of callbacks are considered.
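
The call sequence described above can be sketched without the library. The provider below serves documents from an in-memory map, and a small driver function stands in for libxml2. The callback signatures, the mem:/// URLs, and the names memoryFiles, memoryProvider and readAll are assumptions made for this sketch; check the XmlInputProvider type for the exact shapes.

```javascript
// In-memory "files", keyed by URL (hypothetical data for this sketch).
const memoryFiles = new Map([
  ['mem:///sub.xml', new TextEncoder().encode('<sub>included</sub>')],
]);

// Assumed callback shapes: match(url) -> boolean, open(url) -> handle,
// read(handle, buf) -> number of bytes copied, close(handle) -> boolean.
const memoryProvider = {
  match: (url) => memoryFiles.has(url),
  open: (url) => ({ data: memoryFiles.get(url), pos: 0 }),
  read: (handle, buf) => {
    const chunk = handle.data.subarray(handle.pos, handle.pos + buf.length);
    buf.set(chunk);
    handle.pos += chunk.length;
    return chunk.length; // 0 means there is no more data
  },
  close: () => true,
};

// Stand-in for the parser: match first, then open, read until empty, close.
function readAll(provider, url) {
  if (!provider.match(url)) return undefined; // another provider would be tried
  const handle = provider.open(url);
  const buf = new Uint8Array(8); // deliberately small to force several reads
  const decoder = new TextDecoder();
  let text = '';
  for (let n = provider.read(handle, buf); n > 0; n = provider.read(handle, buf)) {
    text += decoder.decode(buf.subarray(0, n), { stream: true });
  }
  text += decoder.decode(); // flush any buffered partial character
  provider.close(handle);
  return text;
}

console.log(readAll(memoryProvider, 'mem:///sub.xml'));
```

In real code, an object of this shape would be handed to xmlRegisterInputProvider instead of being driven by hand.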

Sometimes the href attribute of the <xi:include> tag holds a relative path. In this case, an initial URL can be passed to the parsing function so that libxml2 can compute the actual URL of the included XML.

For example, if the href is sub.xml and the parent XML is parsed with the following call,

import fs from 'node:fs/promises';
import { XmlDocument } from 'libxml2-wasm';

const doc = XmlDocument.fromBuffer(
    await fs.readFile('/path/to/doc.xml'),
    { url: 'file:///path/to/doc.xml' },
);
doc.dispose();

The registered callbacks will then be called with the URL file:///path/to/sub.xml.
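
The same resolution can be reproduced with the standard URL class, which is a convenient way to predict what the callbacks will receive. This only mirrors the general relative-reference rule; libxml2 performs its own resolution internally, and the variable names are illustrative.

```javascript
// Resolve the href from <xi:include> against the URL given to the parser.
const parentUrl = 'file:///path/to/doc.xml';
const includedUrl = new URL('sub.xml', parentUrl).href;

console.log(includedUrl); // file:///path/to/sub.xml
```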

For Node.js users who need callbacks for local file access, the nodejs module predefines fsInputProviders, which supports both file paths and file URLs. To enable it, register this provider, or simply call xmlRegisterFsInputProviders:

import fs from 'node:fs/promises';
import { XmlDocument } from 'libxml2-wasm';
import { xmlRegisterFsInputProviders } from 'libxml2-wasm/lib/nodejs.mjs';

xmlRegisterFsInputProviders();

const doc = XmlDocument.fromBuffer(
    await fs.readFile('path/to/doc.xml'),
    { url: 'path/to/doc.xml' },
);
doc.dispose();

Serializing XML

XmlDocument.toBuffer dumps the content of the XML DOM tree into a buffer gradually, and calls the XmlOutputBufferHandler to process the data.

Note that UTF-8 is the only supported encoding for now.
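
To make the gradual, chunk-by-chunk delivery concrete, here is a minimal sketch of a handler that collects the chunks and decodes them as UTF-8 at the end. The handler shape (write receiving a Uint8Array and returning a byte count, close returning a boolean) is an assumption to be checked against the XmlOutputBufferHandler type; createCollectingHandler and its text helper are hypothetical names, and the serializer is simulated with two manual write calls.

```javascript
// Assumed handler shape: write(buf) receives one chunk and returns the number
// of bytes consumed; close() is called once serialization has finished.
function createCollectingHandler() {
  const chunks = [];
  return {
    write(buf) {
      chunks.push(buf.slice()); // copy, in case the caller reuses the buffer
      return buf.length;
    },
    close() {
      return true;
    },
    // Helper for this sketch only: join the chunks and decode them as UTF-8.
    text() {
      const total = chunks.reduce((n, c) => n + c.length, 0);
      const all = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        all.set(c, offset);
        offset += c.length;
      }
      return new TextDecoder('utf-8').decode(all);
    },
  };
}

// Simulate a serializer delivering the document in two chunks.
const collector = createCollectingHandler();
const chunkEncoder = new TextEncoder();
collector.write(chunkEncoder.encode('<note><to>'));
collector.write(chunkEncoder.encode('Tove</to></note>'));
collector.close();
console.log(collector.text()); // <note><to>Tove</to></note>
```

Joining the bytes before decoding keeps multi-byte UTF-8 characters intact even when a chunk boundary falls in the middle of one.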

Based on toBuffer, two more convenience functions are provided: XmlDocument.toString and saveDocSync.

For example, to save an XML document to a compact string:

xml.toString({ format: false });

To save a formatted XML document to a file in a Node.js environment:

import { saveDocSync } from 'libxml2-wasm/lib/nodejs.mjs';

saveDocSync(xml);