
Testing Rule-Based Format Converters

Format conversion software — rule-driven transformation of data from one format into another — fails in characteristic ways: silent data loss, corner-case inputs, and environment-dependent output. This is a worked playbook for verifying it, applying the five-layer verification model to one demanding domain.

round-trip testing · metamorphic testing · differential testing · corner-case generation · rule coverage · environment matrix

The defining problem: a format converter takes input in format A, applies a ruleset, and emits format B. The hard part of testing it is the oracle problem — for an unusual input, you often cannot state the correct output in advance. Most of the strategies below exist to verify a conversion without a hand-written expected output for every case.

Beat the oracle problem

When you cannot write down the right answer, verify behavior through relationships that must hold regardless of the specific output.

Round-trip and inverse properties

If the conversion has an inverse (B→A), then A→B→A must return the original input, modulo normalization. Even where no exact inverse exists, the round trip should preserve semantic content. Add idempotence: feeding the converter's own normalized output back through it (B→B) should change nothing. These properties hold for every valid input, so you can check them on thousands of generated documents without ever specifying an expected output. For converters this is the single highest-leverage technique. Drive the input generation with a property-based testing framework such as Hypothesis (Python) or fast-check (JavaScript/TypeScript).
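A minimal sketch of the round-trip and idempotence checks, under stated assumptions: the toy converter pair `a_to_b` / `b_to_a`, the escaped key=value format, and the random generator are all hypothetical, and the standard library's `random` stands in for a property-based framework so the example has no dependencies.

```python
import random

def escape(s: str) -> str:
    # Escape the backslash first so the escapes added afterwards are not re-escaped.
    return s.replace("\\", "\\\\").replace("\n", "\\n").replace("=", "\\e")

def unescape(s: str) -> str:
    table = {"\\": "\\", "n": "\n", "e": "="}
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\":
            out.append(table[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def a_to_b(record: dict[str, str]) -> str:
    """Toy A->B conversion: a dict becomes sorted, escaped key=value lines."""
    return "".join(f"{escape(k)}={escape(v)}\n" for k, v in sorted(record.items()))

def b_to_a(text: str) -> dict[str, str]:
    """Toy B->A conversion: the inverse of a_to_b."""
    record = {}
    for line in text.split("\n"):
        if line:
            key, value = line.split("=", 1)
            record[unescape(key)] = unescape(value)
    return record

def random_record(rng: random.Random) -> dict[str, str]:
    # The alphabet is deliberately hostile: delimiters, escapes, newlines, non-ASCII.
    alphabet = "ab =\\\n,é"
    def text() -> str:
        return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 6)))
    return {text(): text() for _ in range(rng.randint(0, 4))}

def check_properties(cases: int = 1000, seed: int = 0) -> list[dict[str, str]]:
    """Return every generated record that violates a property (empty means all passed)."""
    rng = random.Random(seed)
    failures = []
    for _ in range(cases):
        record = random_record(rng)
        converted = a_to_b(record)
        round_trip_ok = b_to_a(converted) == record
        idempotent_ok = a_to_b(b_to_a(converted)) == converted
        if not (round_trip_ok and idempotent_ok):
            failures.append(record)
    return failures

print("violations:", check_properties())
```

No expected output appears anywhere in the check; only the two properties do. A property-based framework would add input shrinking and smarter generation on top of the same idea.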

Differential testing against an oracle

Run your converter and an independent one over the same corpus and diff the output. Three useful oracles: a competing or reference implementation (catches genuine rule bugs); the previous version of your own converter (a free regression oracle on every release); and a simpler, slower, obviously-correct implementation written purely for testing. Any divergence is a finding: a bug in one of the two implementations, a regression, or a deliberate change that should be recorded.
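As an illustration, the sketch below diffs a converter against a slow reference over a small corpus. Both functions are hypothetical stand-ins (a line-ending normalizer), and the candidate carries a deliberately seeded bug so that the harness has something to find.

```python
def convert_candidate(text: str) -> str:
    """Converter under test (hypothetical): normalize line endings to LF.
    Seeded bug for illustration: a lone CR is not handled."""
    return text.replace("\r\n", "\n")

def convert_reference(text: str) -> str:
    """Slow, obviously-correct reference: walk the text one character at a time."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\r":
            out.append("\n")
            # Treat CRLF as a single line ending.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
        else:
            out.append(text[i])
        i += 1
    return "".join(out)

def differential(corpus, candidate, reference):
    """Return (input, candidate_output, reference_output) for every divergence."""
    findings = []
    for doc in corpus:
        got, expected = candidate(doc), reference(doc)
        if got != expected:
            findings.append((doc, got, expected))
    return findings

corpus = ["a\nb\n", "a\r\nb\r\n", "a\rb\r", "a\r\n\rb", ""]
findings = differential(corpus, convert_candidate, convert_reference)
for doc, got, expected in findings:
    print(f"DIVERGENCE on {doc!r}: candidate={got!r} reference={expected!r}")
```

The same `differential` function works unchanged when the reference is the previous release of the converter, which is how it doubles as a regression oracle.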

Metamorphic relations

When you cannot know the exact output, assert relations that must hold between related conversions: adding an irrelevant element to the input changes only the corresponding region of the output; reordering independent elements reorders the output correspondingly; conversion composes (A→C equals A→B→C where both paths exist); a comment or no-op in A leaves B's data content unchanged. Each relation is a test that needs no expected value — only the rule that ties two runs together.
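Two of these relations can be sketched as follows. The `convert` function and its "key: value" input format are hypothetical; the point is that each check compares two runs with each other and never states an expected output.

```python
import random

def convert(lines: list[str]) -> list[dict[str, str]]:
    """Toy converter (hypothetical): 'key: value' lines become records.
    Lines starting with '#' are comments; blank lines are ignored."""
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        key, value = line.split(":", 1)
        records.append({"key": key.strip().lower(), "value": value.strip()})
    return records

def comment_is_a_no_op(lines: list[str], rng: random.Random) -> bool:
    # Relation 1: inserting a comment anywhere leaves the output unchanged.
    position = rng.randint(0, len(lines))
    with_comment = lines[:position] + ["# irrelevant"] + lines[position:]
    return convert(with_comment) == convert(lines)

def reordering_is_mirrored(lines: list[str]) -> bool:
    # Relation 2: reversing independent input lines reverses the output.
    return convert(lines[::-1]) == convert(lines)[::-1]

rng = random.Random(0)
source = ["Name: Ada", "# note", "Born: 1815", "", "Field: mathematics"]
assert all(comment_is_a_no_op(source, rng) for _ in range(50))
assert reordering_is_mirrored(source)
```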

Validate every output independently

Every output must pass a strict, independent validator for format B — never your own writer module, which shares your blind spots. Round-trip the output back through an independent strict reader. This catches the common failure where a converter emits plausible-looking but technically invalid B.
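A small sketch of the idea, assuming format B is JSON: the output of a hand-rolled writer (hypothetical, with a seeded escaping bug) is checked by the standard library's `json` parser rather than by the converter's own code. Rejecting `NaN` and `Infinity` through `parse_constant` is one way to make that parser stricter than its default.

```python
import json

def naive_write(record: dict[str, str]) -> str:
    """Hypothetical hand-rolled writer with a seeded bug: values are not escaped."""
    fields = ", ".join(f'"{k}": "{v}"' for k, v in record.items())
    return "{" + fields + "}"

def _reject_constant(name: str):
    # json.loads accepts NaN and Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")

def validate_json(text: str) -> tuple[bool, str]:
    """Independent check: parse with the standard library, not with our own writer."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return False, str(exc)
    return True, ""

print(validate_json(naive_write({"title": "plain"})))      # valid output
print(validate_json(naive_write({"title": 'say "hi"'})))   # plausible-looking, invalid
print(validate_json('{"ratio": NaN}'))                     # rejected by the strict hook
```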

Generate corner cases — don't list them

"A lot of data corner cases" is the signal to stop hand-enumerating them. Use three complementary generators, then make the remaining checklist explicit.

Schema- and grammar-driven generation

If format A has a schema or grammar — JSON Schema, XSD, an ABNF/EBNF grammar, a DTD — generate inputs from it rather than writing examples by hand. Property-based frameworks can drive this directly, producing documents that are valid but rare. That is exactly the region of the input space manual test design misses.
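To make the idea concrete, here is a minimal grammar-driven generator, assuming format A is JSON. The `GRAMMAR` table is a hand-written stand-in for a real schema or grammar file, and every name in it is illustrative. The generator readily produces documents a human rarely writes by hand, such as empty containers, escaped delimiters, and duplicate keys.

```python
import json
import random

# A hand-written stand-in for a real grammar: each rule lists its alternatives.
GRAMMAR = {
    "value": ["object", "array", "string", "number", "literal"],
    "literal": ["true", "false", "null"],
    "number": ["0", "-1", "3.5", "1e10", "-0.0", "1E-7"],
}
# Pieces of JSON string content, including escapes and non-ASCII text.
STRING_PIECES = ["a", " ", "é", "\\n", '\\"', "\\\\", "\\u0041", "/"]
MAX_DEPTH = 4

def generate(symbol: str, rng: random.Random, depth: int = 0) -> str:
    if symbol == "string":
        count = rng.randint(0, 5)
        return '"' + "".join(rng.choice(STRING_PIECES) for _ in range(count)) + '"'
    if symbol in ("array", "object"):
        # Cap nesting so generation always terminates.
        count = 0 if depth >= MAX_DEPTH else rng.randint(0, 3)
        if symbol == "array":
            items = [generate("value", rng, depth + 1) for _ in range(count)]
            return "[" + ",".join(items) + "]"
        members = [
            generate("string", rng) + ":" + generate("value", rng, depth + 1)
            for _ in range(count)
        ]
        return "{" + ",".join(members) + "}"
    if symbol in GRAMMAR:
        return generate(rng.choice(GRAMMAR[symbol]), rng, depth)
    return symbol  # terminal text such as "true" or "3.5"

rng = random.Random(0)
documents = [generate("value", rng) for _ in range(200)]
for document in documents:
    json.loads(document)  # every generated document should be valid input
print(documents[:5])
```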

Fuzz the parser

The input parser must never crash, hang, or silently corrupt on malformed input; it must produce a precise, located error. Run coverage-guided fuzzers (AFL++ or libFuzzer, run continuously via OSS-Fuzz if the project is open source) alongside grammar-aware fuzzers, which mutate otherwise-valid documents and so reach deep transformation logic rather than bouncing off the parser. Differential fuzzing feeds the same fuzzed inputs to two implementations and diffs the results.
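The contract can be sketched with a naive mutation harness. This is not a coverage-guided fuzzer and is no substitute for the tools named above; it only shows the shape of the check. The `parse` function, its key=value format, and `ParseError` are hypothetical.

```python
import random

class ParseError(Exception):
    """The one documented failure mode: a located, descriptive rejection."""
    def __init__(self, line: int, message: str):
        super().__init__(f"line {line}: {message}")
        self.line = line

def parse(data: bytes) -> dict[str, str]:
    """Toy parser under test (hypothetical format): UTF-8 'key=value' lines."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ParseError(0, f"invalid UTF-8 at byte {exc.start}") from exc
    record = {}
    for number, line in enumerate(text.split("\n"), start=1):
        if not line:
            continue
        if "=" not in line:
            raise ParseError(number, "missing '='")
        key, value = line.split("=", 1)
        if key in record:
            raise ParseError(number, f"duplicate key {key!r}")
        record[key] = value
    return record

def mutate(seed: bytes, rng: random.Random) -> bytes:
    data = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        choice = rng.randrange(3)
        position = rng.randrange(len(data) + 1)
        if choice == 0 and data:
            data[position % len(data)] = rng.randrange(256)  # overwrite a byte
        elif choice == 1:
            data.insert(position, rng.randrange(256))        # insert a byte
        elif data:
            del data[position % len(data)]                   # delete a byte
    return bytes(data)

def fuzz(seeds: list[bytes], runs: int = 5000, seed: int = 0):
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        sample = mutate(rng.choice(seeds), rng)
        try:
            parse(sample)
        except ParseError:
            pass  # a precise, located rejection is acceptable behavior
        except Exception as exc:  # anything else is a finding
            crashes.append((sample, repr(exc)))
    return crashes

crashes = fuzz([b"a=1\nb=2\n", "k=é\n".encode("utf-8")])
print("crashes:", crashes)
```

Every sample stored in `crashes` is a candidate for the permanent regression corpus.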

Keep a permanent regression corpus

Maintain a curated corpus of real-world inputs, and make one rule absolute: every bug ever found becomes a permanent corpus entry — the exact file that triggered it, kept forever. Pair it with approval testing (a stored snapshot of the output, with diffs human-reviewed when they change). Over time this corpus becomes the most valuable test asset you own.
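One possible shape for the corpus check is sketched below. The directory layout, the `.approved` / `.received` suffixes, and the stand-in `convert` function are assumptions for illustration, not a prescribed tool. Files are read as bytes on purpose, so that line endings and encodings in the corpus reach the converter untouched.

```python
from pathlib import Path

def convert(data: bytes) -> bytes:
    """Stand-in converter (hypothetical): upper-case ASCII letters."""
    return data.upper()

def check_corpus(corpus_dir: Path, approved_dir: Path) -> list[str]:
    """Compare each corpus file's conversion with its approved snapshot.

    Returns the names of files whose output is new or changed. The received
    output is written next to the snapshot so a human can diff and approve it.
    """
    needs_review = []
    for source in sorted(corpus_dir.iterdir()):
        received = convert(source.read_bytes())
        approved = approved_dir / (source.name + ".approved")
        if approved.exists() and approved.read_bytes() == received:
            continue
        (approved_dir / (source.name + ".received")).write_bytes(received)
        needs_review.append(source.name)
    return needs_review
```

Approving a change is then an explicit act: a reviewer inspects the diff and renames the `.received` file to `.approved`.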

Make the corner-case taxonomy explicit

Turn the implicit "weird input" space into an explicit checklist and a test-matrix axis. The dimensions that recur in format conversion:

Dimension	Cases to cover
Character encoding	UTF-8 / 16 / 32, BOM vs none, Latin-1, mixed encodings, invalid byte sequences
Line endings	LF, CRLF, CR, and mixed within a single file
Emptiness	Zero bytes, whitespace-only, empty elements or records, missing optional fields
Size & structure	1 byte, exactly buffer-size, very large inputs, maximum nesting depth, recursion
Unicode	Normalization forms (NFC/NFD), combining marks, surrogate pairs, RTL, zero-width and control characters
Escaping	The format's own delimiters and reserved characters appearing inside data values
Numbers & locale	Decimal and thousands separators, leading zeros, exponent forms, NaN / Infinity, integer overflow
Dates & times	Timezones, DST transitions, leap years and seconds, ambiguous formats
Structural quirks	Duplicate keys, element ordering, optional vs required, deeply nested constructs
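Treating the taxonomy as a matrix axis might look like the sketch below, which crosses three of the dimensions (encoding, line endings, BOM) over one sample document. The `decode_input` stage is a hypothetical stand-in for the converter's real input handling.

```python
import itertools

ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be"]
LINE_ENDINGS = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}
BOM = {"bom": "\ufeff", "no-bom": ""}

def decode_input(data: bytes, encoding: str) -> str:
    """Toy input stage (hypothetical): decode, drop a leading BOM, normalize line endings."""
    text = data.decode(encoding)
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.replace("\r\n", "\n").replace("\r", "\n")

def variants(lines: list[str]):
    """Yield (case_id, encoding, data) for every combination of the three axes."""
    for encoding, (eol_name, eol), (bom_name, bom) in itertools.product(
        ENCODINGS, LINE_ENDINGS.items(), BOM.items()
    ):
        text = bom + eol.join(lines) + eol
        yield f"{encoding}/{eol_name}/{bom_name}", encoding, text.encode(encoding)

lines = ["id,name", "1,Zoë", "2,李"]
expected = "id,name\n1,Zoë\n2,李\n"
failures = [
    case for case, encoding, data in variants(lines)
    if decode_input(data, encoding) != expected
]
print(f"{len(list(variants(lines)))} cases, failures: {failures}")
```

Each case carries an identifier such as `utf-16-le/CRLF/bom`, so a failure names the exact cell of the matrix that broke.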

Assert conservation invariants

A converter's worst failure is silent data loss. Assert what must be conserved across a conversion: record and element counts, the set of identifiers or keys, sums of numeric fields, word counts. Make every intentional drop explicit — emit a dropped-items report — and test that the report accounts for everything the output omits. The contract: every input element either maps into the output or appears in the drop report; nothing vanishes silently.
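The contract above can be checked mechanically. In this sketch the record shape, the field names, and the toy `convert` function are hypothetical; the checker returns the list of violated invariants, so an empty list means every input element is accounted for.

```python
from collections import Counter

def convert(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Toy converter (hypothetical): keep records that have an 'id'; report the rest."""
    output, dropped = [], []
    for index, record in enumerate(records):
        if record.get("id") is None:
            dropped.append({"index": index, "reason": "missing id"})
        else:
            output.append({"ID": record["id"], "AMOUNT": record.get("amount", 0)})
    return output, dropped

def check_conservation(records, output, dropped) -> list[str]:
    """Return the violated invariants; an empty list means nothing vanished silently."""
    problems = []
    # Every input record is either in the output or in the drop report.
    if len(output) + len(dropped) != len(records):
        problems.append("record count not conserved")
    dropped_indexes = {item["index"] for item in dropped}
    kept = [r for i, r in enumerate(records) if i not in dropped_indexes]
    # The multiset of identifiers survives the conversion.
    if Counter(r.get("id") for r in kept) != Counter(o["ID"] for o in output):
        problems.append("identifier multiset changed")
    # Numeric totals survive the conversion.
    if sum(r.get("amount", 0) for r in kept) != sum(o["AMOUNT"] for o in output):
        problems.append("amount total changed")
    return problems

records = [{"id": "a", "amount": 10}, {"amount": 5}, {"id": "b", "amount": 7}]
output, dropped = convert(records)
print(check_conservation(records, output, dropped))      # conversion with a full report
print(check_conservation(records, output[:1], dropped))  # simulated silent loss
```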

Test the rules, not just the runs

Because the converter is rule-based, the suite has to exercise the ruleset itself — and rule interactions, where most converter bugs actually live.

Rule coverage. Track which rules actually fired during the suite, separately from line coverage. A rule that never fired is untested, however green the build looks.
Combinatorial / pairwise testing. Rules interact, and the interaction space is too large to cover exhaustively. Pairwise tools — Microsoft PICT, NIST ACTS — generate a small set of input-feature combinations that covers every pair (or triple) of feature values, catching most interaction bugs cheaply.
Mutation testing on the ruleset. Mutate the rules (change a condition, a precedence, a mapping target) and confirm the suite catches it. This is the most direct measure of whether your tests exercise rule logic with real assertions rather than merely running it. See Layer 3 of the parent page, Testing in the Age of AI, for mutation-testing tools.
Rule-conflict and precedence tests. Construct inputs where several rules are eligible at once and assert the resolution behavior explicitly. This boundary is where converters most often surprise their authors.
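Rule coverage and a precedence test can be sketched together. The ruleset, the first-match-wins policy, and all rule names below are assumptions made for illustration; a real engine would expose its own hook for recording which rule fired.

```python
import re
from collections import Counter

# Each rule is (name, predicate, action). Assumed policy: the first match wins.
RULES = [
    ("empty-to-null", lambda v: v == "", lambda v: None),
    ("keep-leading-zeros", lambda v: re.fullmatch(r"0[0-9]+", v) is not None, lambda v: v),
    ("integer", lambda v: re.fullmatch(r"-?[0-9]+", v) is not None, lambda v: int(v)),
    ("boolean", lambda v: v.lower() in ("true", "false"), lambda v: v.lower() == "true"),
    ("yes-no", lambda v: v.lower() in ("yes", "no"), lambda v: v.lower() == "yes"),
    ("passthrough", lambda v: True, lambda v: v),
]

fired = Counter()  # rule name -> number of times it fired during the suite

def convert_value(value: str):
    for name, matches, action in RULES:
        if matches(value):
            fired[name] += 1
            return action(value)
    raise AssertionError("unreachable: passthrough matches everything")

def unfired_rules() -> list[str]:
    return [name for name, _, _ in RULES if fired[name] == 0]

# A small suite. Line coverage of convert_value is already complete after the
# first few calls, yet one rule is never exercised.
assert convert_value("") is None
assert convert_value("42") == 42
assert convert_value("TRUE") is True
assert convert_value("hello") == "hello"

# Precedence test: "007" is eligible for both keep-leading-zeros and integer.
# The assertion pins down which rule is supposed to win.
assert convert_value("007") == "007"

print("rules that never fired:", unfired_rules())
```

Here the report names `yes-no` as untested even though the build is green, which is the gap that line coverage alone hides.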

Environment corner cases

Conversion output that depends on the host environment is a defect waiting for a different machine to surface it. Address it from both ends.

Design the environment out

Never rely on platform defaults. Set input and output encoding, locale, timezone, decimal handling, and line-ending policy explicitly at the boundary. Read environment and configuration once, normalize it, and pass typed values inward — the same "parse, don't validate" discipline applied to the environment itself.
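A sketch of that boundary, assuming configuration arrives through environment variables: the `CONVERTER_*` variable names and the `ConversionSettings` type are hypothetical. The environment mapping is read once, validated, and turned into a typed value that the rest of the code receives explicitly.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class ConversionSettings:
    input_encoding: str
    output_encoding: str
    newline: str
    decimal_separator: str

NEWLINES = {"lf": "\n", "crlf": "\r\n"}

def load_settings(environ: Mapping[str, str]) -> ConversionSettings:
    """Read the environment once. Every default is chosen here, never by the platform.
    The CONVERTER_* variable names are hypothetical."""
    newline_name = environ.get("CONVERTER_NEWLINE", "lf").lower()
    if newline_name not in NEWLINES:
        raise ValueError(f"CONVERTER_NEWLINE must be one of {sorted(NEWLINES)}")
    separator = environ.get("CONVERTER_DECIMAL_SEPARATOR", ".")
    if separator not in (".", ","):
        raise ValueError("CONVERTER_DECIMAL_SEPARATOR must be '.' or ','")
    return ConversionSettings(
        input_encoding=environ.get("CONVERTER_INPUT_ENCODING", "utf-8"),
        output_encoding=environ.get("CONVERTER_OUTPUT_ENCODING", "utf-8"),
        newline=NEWLINES[newline_name],
        decimal_separator=separator,
    )

def format_amount(value: Decimal, settings: ConversionSettings) -> str:
    # Locale-independent: the separator comes from settings, not from the host locale.
    return f"{value:.2f}".replace(".", settings.decimal_separator)

def write_lines(lines: list[str], settings: ConversionSettings) -> bytes:
    text = "".join(line + settings.newline for line in lines)
    return text.encode(settings.output_encoding)

settings = load_settings({"CONVERTER_NEWLINE": "crlf", "CONVERTER_DECIMAL_SEPARATOR": ","})
print(write_lines([format_amount(Decimal("1234.5"), settings)], settings))
```

In production the caller would pass `os.environ`; tests pass a plain dict, which is what makes the hostile cases easy to construct.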

Run a hostile-environment matrix in CI

Containerize tests for a deterministic baseline, then add CI jobs that deliberately set hostile values. The classics that break format converters:

Turkish locale (tr_TR) — its dotted / dotless-i casing rules silently break case-insensitive string matching inside rules.
A timezone on a DST boundary, and TZ values far from the build host — exposes off-by-one-hour and ambiguous-time bugs in date conversion.
A non-UTF-8 default encoding — exposes every place the code quietly relied on the platform default.
Non-ASCII filesystem paths and CRLF-default platforms.
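A local approximation of such a matrix is sketched below: the same conversion runs in child processes under different environment overrides, and the outputs must be identical. The one-line "converter" is a hypothetical stand-in, and the specific `TZ` and locale values are examples. One caveat: a locale that is not installed on the host is silently ignored, so a real CI image has to install the locales it claims to test.

```python
import os
import subprocess
import sys

# The "converter" is a hypothetical one-liner that formats a fixed timestamp
# with an explicit UTC timezone, so its output should never vary.
CONVERTER = (
    "from datetime import datetime, timezone;"
    "print(datetime.fromtimestamp(0, timezone.utc).isoformat())"
)

HOSTILE_ENVIRONMENTS = {
    "baseline": {},
    "far-timezone": {"TZ": "Pacific/Kiritimati"},
    "dst-timezone": {"TZ": "America/New_York"},
    "turkish-locale": {"LC_ALL": "tr_TR.UTF-8", "LANG": "tr_TR.UTF-8"},
    "c-locale": {"LC_ALL": "C", "LANG": "C"},
}

def run_under(overrides: dict[str, str]) -> str:
    """Run the converter in a child process with the given environment overrides."""
    env = {**os.environ, **overrides}
    result = subprocess.run(
        [sys.executable, "-c", CONVERTER],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

outputs = {name: run_under(env) for name, env in HOSTILE_ENVIRONMENTS.items()}
# Environment independence: every job must produce the same bytes.
assert len(set(outputs.values())) == 1, outputs
print(outputs)
```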

Pin dependency versions and matrix-test the libraries that drive parsing, encoding, and Unicode behavior — ICU and the XML/JSON stacks among them. A converter's output can shift with a library upgrade alone.

Compare outputs the right way

If format B permits more than one valid representation of the same content — insignificant whitespace, attribute or key order, optional quoting — a byte-exact comparison produces false failures. Canonicalize both sides before diffing (Canonical XML, sorted-key JSON) or compare at the parsed-tree / semantic level. A suite that cries wolf gets its failures ignored; normalization-aware comparison is what keeps it trusted.
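For JSON output, a normalization-aware comparison can be as small as the sketch below, which uses only the standard library. It is an illustration rather than a full canonicalization scheme: for example, it still treats `1` and `1.0` as different, and duplicate keys collapse to the last value during parsing.

```python
import json

def canonical_json(text: str) -> str:
    """Parse, then re-serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )

def same_content(left: str, right: str) -> bool:
    return canonical_json(left) == canonical_json(right)

compact = '{"a":1,"b":[1,2],"name":"Zo\\u00eb"}'
pretty = '{\n  "name": "Zoë",\n  "b": [1, 2],\n  "a": 1\n}'

print(compact == pretty)              # byte-exact comparison: a false failure
print(same_content(compact, pretty))  # canonical comparison: same content
print(same_content('{"b":[1,2]}', '{"b":[2,1]}'))  # array order is significant
```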

Recommended approach for a format converter

Assembled into a working order of priority:

Make round-trip and idempotence properties the backbone of the suite, driven by schema-generated inputs.
Run differential testing against the previous release on every build.
Keep a permanent regression corpus — every bug found adds a file to it forever.
Validate every output with an independent strict validator for format B.
Assert conservation invariants; make every dropped element explicit and reported.
Track rule coverage, and use pairwise generation plus rule mutation for rule-interaction tests.
Design the environment out, then matrix-test hostile environments in CI.
Compare outputs normalization-aware — never byte-exact for a format with free representation.

The throughline: a format converter is mostly untestable by example, because the expected output for the awkward cases cannot be written down in advance. Almost every technique here replaces "assert the output equals X" with a property, a relation, an independent oracle, or an invariant — something true of a correct conversion that does not require you to already know the answer.
