Testing Rule-Based Format Converters
Format conversion software — rule-driven transformation of data from one format into another — fails in characteristic ways: silent data loss, corner-case inputs, and environment-dependent output. This is a worked playbook for verifying it, applying the five-layer verification model to one demanding domain.
round-trip testing · metamorphic testing · differential testing · corner-case generation · rule coverage · environment matrix
Beat the oracle problem
When you cannot write down the right answer, verify behavior through relationships that must hold regardless of the specific output.
Round-trip and inverse properties
If the conversion has an inverse (B→A), then A→B→A must return the original input, modulo normalization. Even where no exact inverse exists, the round trip should preserve semantic content. Add idempotence: converting an already-converted document (B→B) should be the identity. These properties hold for every valid input, so you can check them on thousands of generated documents without ever specifying an expected output. For converters, this is the single highest-leverage technique. Drive the input generation with a property-based testing framework such as Hypothesis (Python) or fast-check (JavaScript/TypeScript).
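A minimal sketch of both properties, assuming a toy converter between a list of records (format A) and CSV text (format B). The schema, `to_csv`, `from_csv`, and the generator are hypothetical stand-ins. The loop uses only the standard library's `random` module so the example stays self-contained; a property-based framework would replace `random_record` with a strategy and add shrinking.

```python
import csv
import io
import random
import string

FIELDS = ["id", "name", "note"]  # hypothetical schema for the example


def to_csv(records):
    """Format A (list of dicts) -> format B (CSV text)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_csv(text):
    """Format B (CSV text) -> format A (list of dicts)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def random_record(rng):
    # The alphabet deliberately contains the delimiter, the quote
    # character, and a newline, which are the classic CSV corner cases.
    alphabet = string.ascii_letters + ' ,"\n;'

    def text():
        return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))

    return {"id": str(rng.randint(0, 999)), "name": text(), "note": text()}


def check_round_trip(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        records = [random_record(rng) for _ in range(rng.randint(0, 5))]
        b = to_csv(records)
        assert from_csv(b) == records      # A -> B -> A returns the input
        assert to_csv(from_csv(b)) == b    # re-converting B output is stable
    return trials
```

No expected output appears anywhere in the check: the two assertions are the whole oracle.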
Differential testing against an oracle
Run your converter and an independent one over the same corpus and diff the output. Three useful oracles: a competing or reference implementation (catches genuine rule bugs); the previous version of your own converter (a free regression oracle on every release); and a simpler, slower, obviously-correct implementation written purely for testing. Any divergence is a finding — either a regression or a deliberate change that should be recorded.
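As an illustration, the sketch below diffs a hypothetical production converter against a slower reference written only for testing. Both functions, the `k=v;k=v` input format, and the corpus are assumptions made up for the example. Exceptions are treated as outcomes, so a crash in one implementation shows up as a divergence rather than aborting the run.

```python
def convert_fast(text):
    """Hypothetical production converter: 'k=v;k=v' -> dict."""
    return dict(pair.split("=", 1) for pair in text.split(";") if pair)


def convert_reference(text):
    """Slower character-by-character reference, written only for testing."""
    result, key, buf, in_value = {}, "", "", False
    for ch in text + ";":
        if ch == ";":
            if in_value:
                result[key] = buf
            key, buf, in_value = "", "", False
        elif ch == "=" and not in_value:
            key, buf, in_value = buf, "", True
        else:
            buf += ch
    return result


def outcome(fn, text):
    """Run fn and fold exceptions into a comparable value."""
    try:
        return ("ok", fn(text))
    except Exception as exc:
        return ("error", type(exc).__name__)


def diff_corpus(corpus, candidate, oracle):
    """Return (input, candidate outcome, oracle outcome) for every divergence."""
    findings = []
    for text in corpus:
        got, want = outcome(candidate, text), outcome(oracle, text)
        if got != want:
            findings.append((text, got, want))
    return findings


CORPUS = ["a=1;b=2", "a=1;;b=2", "x=p=q", "", "a=1;b"]
findings = diff_corpus(CORPUS, convert_fast, convert_reference)
```

Here the only divergence is the input `a=1;b`: the fast path raises on a pair with no `=`, while the reference skips it. Which behavior is correct is a specification question; the diff only guarantees that the question gets asked.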
Metamorphic relations
When you cannot know the exact output, assert relations that must hold between related conversions: adding an irrelevant element to the input changes only the corresponding region of the output; reordering independent elements reorders the output correspondingly; conversion composes (A→C equals A→B→C where both paths exist); a comment or no-op in A leaves B's data content unchanged. Each relation is a test that needs no expected value — only the rule that ties two runs together.
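Three of these relations can be sketched against a toy converter, assumed here to turn `key=value` lines with `#` comments into a JSON array of pairs. The converter and the sample lines are hypothetical. Each check compares two runs with each other, never a run with a stored answer.

```python
import json
import random


def convert(text):
    """Hypothetical converter: 'key=value' lines (with # comments) -> JSON array."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        pairs.append([key.strip(), value.strip()])
    return json.dumps(pairs)


def check_metamorphic(lines, rng):
    """lines must contain data lines only (no comments, no blanks)."""
    base = json.loads(convert("\n".join(lines)))

    # Relation 1: a comment inserted anywhere leaves the data unchanged.
    with_comment = list(lines)
    with_comment.insert(rng.randint(0, len(lines)), "# no-op comment")
    assert json.loads(convert("\n".join(with_comment))) == base

    # Relation 2: reordering independent lines reorders the output the same way.
    order = list(range(len(lines)))
    rng.shuffle(order)
    shuffled = [lines[i] for i in order]
    assert json.loads(convert("\n".join(shuffled))) == [base[i] for i in order]

    # Relation 3: appending a record adds one entry and touches nothing else.
    extended = json.loads(convert("\n".join(lines + ["extra=1"])))
    assert extended[:-1] == base
    assert extended[-1] == ["extra", "1"]
    return True
```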
Validate every output independently
Every output must pass a strict, independent validator for format B — never your own writer module, which shares your blind spots. Round-trip the output back through an independent strict reader. This catches the common failure where a converter emits plausible-looking but technically invalid B.
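A small illustration of why the default reader is not enough, assuming format B is JSON. Python's `json.dumps` emits `NaN` by default, which is not valid JSON, and `json.loads` accepts it back, so a writer and reader from the same module agree on an invalid document. The strict reader below is only a stand-in: it tightens the same library with hooks, whereas a real suite should use a validator from a separate codebase.

```python
import json


def write_b(record):
    """Hypothetical writer under test; json.dumps allows NaN unless told otherwise."""
    return json.dumps(record)


def _reject_constant(name):
    # Called by the decoder for 'NaN', 'Infinity', and '-Infinity'.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


def strict_validate(text):
    """Stricter-than-default reader standing in for an independent validator."""
    return json.loads(
        text,
        parse_constant=_reject_constant,
        object_pairs_hook=_reject_duplicates,
    )


def is_valid_b(text):
    try:
        strict_validate(text)
        return True
    except ValueError:
        return False
```

With this in place, `is_valid_b(write_b({"ratio": float("nan")}))` is false even though the lenient reader would have accepted the same text.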
Generate corner cases — don't list them
"A lot of data corner cases" is the signal to stop hand-enumerating them. Use three complementary generators, then make the remaining checklist explicit.
Schema- and grammar-driven generation
If format A has a schema or grammar — JSON Schema, XSD, an ABNF/EBNF grammar, a DTD — generate inputs from it rather than writing examples by hand. Property-based frameworks can drive this directly, producing documents that are valid but rare. That is exactly the region of the input space manual test design misses.
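To make the idea concrete, here is a minimal sketch of grammar-driven generation for a small JSON subset. The grammar table and the depth cap are assumptions chosen for the example; a real format would be generated from its published schema or grammar.

```python
import json
import random

# A tiny hypothetical grammar for a JSON subset.
# Each nonterminal maps to a list of alternatives; anything else is a terminal.
GRAMMAR = {
    "value": [["object"], ["array"], ["string"], ["number"], ["true"], ["null"]],
    "object": [["{", "}"], ["{", "string", ":", "value", "}"]],
    "array": [["[", "]"], ["[", "value", ",", "value", "]"]],
    "string": [['""'], ['"a"'], ['"\\u00e9"'], ['"\\n"']],
    "number": [["0"], ["-1"], ["1e3"], ["0.5"]],
}
LEAF_VALUES = ["string", "number", "true", "null"]


def generate(symbol, rng, depth=0, max_depth=4):
    """Expand symbol into a document by choosing random alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit literally
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth and symbol == "value":
        # Past the depth cap, allow only non-recursive alternatives.
        alternatives = [[leaf] for leaf in LEAF_VALUES]
    choice = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in choice)


rng = random.Random(0)
documents = [generate("value", rng) for _ in range(200)]
```

Every generated document is valid by construction, so it can be fed straight into the round-trip and metamorphic checks. Weighting the alternatives toward rare constructs is a natural next step.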
Fuzz the parser
The input parser must never crash, hang, or silently corrupt on malformed input — it must produce a precise, located error. Run coverage-guided fuzzers (AFL++, libFuzzer, continuously via OSS-Fuzz) and grammar-aware fuzzers — which mutate otherwise-valid documents and so reach deep transformation logic rather than bouncing off the parser. Differential fuzzing feeds the same fuzzed inputs to two implementations and diffs the results.
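The contract (a result or a located error, nothing else) can be sketched with a blind mutation loop. This is not coverage-guided fuzzing and is no substitute for AFL++ or libFuzzer; it only shows the harness shape. The parser, `ParseError`, and the seed inputs are hypothetical.

```python
import random


class ParseError(Exception):
    """The only exception the parser is allowed to raise; carries a location."""

    def __init__(self, message, line):
        super().__init__(f"line {line}: {message}")
        self.line = line


def parse(text):
    """Hypothetical parser under test: 'key=value' lines -> dict."""
    result = {}
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        if "=" not in line:
            raise ParseError("missing '='", number)
        key, value = line.split("=", 1)
        if not key.strip():
            raise ParseError("empty key", number)
        result[key.strip()] = value
    return result


def mutate(text, rng):
    """Apply one to four random character-level edits."""
    chars = list(text)
    for _ in range(rng.randint(1, 4)):
        pos = rng.randint(0, len(chars))
        op = rng.choice(["insert", "delete", "replace"])
        if op == "insert" or not chars:
            chars.insert(pos, chr(rng.randint(0, 0x2FF)))
        elif op == "delete":
            del chars[min(pos, len(chars) - 1)]
        else:
            chars[min(pos, len(chars) - 1)] = chr(rng.randint(0, 0x2FF))
    return "".join(chars)


def fuzz(seed_inputs, iterations=2000, seed=0):
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(seed_inputs), rng)
        try:
            parse(candidate)
        except ParseError as err:
            assert err.line >= 1  # a rejection must say where it happened
        except Exception as exc:  # any other exception is a finding
            crashes.append((candidate, repr(exc)))
    return crashes
```

Each entry in the returned list is a candidate for the regression corpus.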
Keep a permanent regression corpus
Maintain a curated corpus of real-world inputs, and make one rule absolute: every bug ever found becomes a permanent corpus entry — the exact file that triggered it, kept forever. Pair it with approval testing (a stored snapshot of the output, with diffs human-reviewed when they change). Over time this corpus becomes the most valuable test asset you own.
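One possible shape for the corpus check, assuming a directory of input files and a sibling directory of approved snapshots. The file layout, the `.approved` and `.received` suffixes, and the toy converter are all hypothetical. Files are read and written as bytes so that line endings in the corpus survive untouched.

```python
import tempfile
from pathlib import Path


def convert(text):
    """Hypothetical converter under test."""
    return text.upper()


def check_corpus(corpus_dir, approved_dir, convert_fn):
    """Return names of corpus files whose output differs from the approved snapshot."""
    changed = []
    for source in sorted(Path(corpus_dir).iterdir()):
        actual = convert_fn(source.read_bytes().decode("utf-8")).encode("utf-8")
        approved = Path(approved_dir) / (source.name + ".approved")
        if not approved.exists():
            # First run for this entry: record a snapshot for human review.
            approved.write_bytes(actual)
        elif approved.read_bytes() != actual:
            # Keep the new output next to the snapshot so a reviewer can diff them.
            approved.with_suffix(".received").write_bytes(actual)
            changed.append(source.name)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    corpus, approved = Path(tmp, "corpus"), Path(tmp, "approved")
    corpus.mkdir()
    approved.mkdir()
    (corpus / "bug-0001.txt").write_bytes(b"a,b\r\n")
    first = check_corpus(corpus, approved, convert)     # records the snapshot
    second = check_corpus(corpus, approved, str.lower)  # behavior changed
```

In the demo, `first` is empty and `second` names `bug-0001.txt`. A changed snapshot is not automatically a failure; it is a diff that a human has to approve or reject.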
Make the corner-case taxonomy explicit
Turn the implicit "weird input" space into an explicit checklist and a test-matrix axis. The dimensions that recur in format conversion:
| Dimension | Cases to cover |
|---|---|
| Character encoding | UTF-8 / 16 / 32, BOM vs none, Latin-1, mixed encodings, invalid byte sequences |
| Line endings | LF, CRLF, CR, and mixed within a single file |
| Emptiness | Zero bytes, whitespace-only, empty elements or records, missing optional fields |
| Size & structure | 1 byte, exactly buffer-size, very large inputs, maximum nesting depth, recursion |
| Unicode | Normalization forms (NFC/NFD), combining marks, surrogate pairs, RTL, zero-width and control characters |
| Escaping | The format's own delimiters and reserved characters appearing inside data values |
| Numbers & locale | Decimal and thousands separators, leading zeros, exponent forms, NaN / Infinity, integer overflow |
| Dates & times | Timezones, DST transitions, leap years and seconds, ambiguous formats |
| Structural quirks | Duplicate keys, element ordering, optional vs required, deeply nested constructs |
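Two rows of the table can be turned into a matrix mechanically. The sketch below crosses encoding, BOM presence, and line ending, and checks that a reader recovers the same logical lines from every variant. The sample content and `read_lines` are hypothetical; a real suite would add further axes from the table.

```python
import codecs
import itertools

ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le"]
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}
LINE_ENDINGS = ["\n", "\r\n", "\r"]
LINES = ["id,name", "1,Zoë", "2,"]  # sample logical content (hypothetical)


def build_cases():
    """Yield ((encoding, with_bom, eol), raw_bytes) for every combination."""
    axes = itertools.product(ENCODINGS, [False, True], LINE_ENDINGS)
    for encoding, with_bom, eol in axes:
        data = eol.join(LINES).encode(encoding)
        if with_bom:
            data = BOMS[encoding] + data
        yield (encoding, with_bom, eol), data


def read_lines(data, encoding):
    """Hypothetical reader under test: strip a BOM, decode, normalize line endings."""
    bom = BOMS[encoding]
    if data.startswith(bom):
        data = data[len(bom):]
    return data.decode(encoding).splitlines()


cases = list(build_cases())
failures = [key for key, data in cases if read_lines(data, key[0]) != LINES]
```

The full cross product is affordable here (24 cases). Once more axes are added it is not, which is where pairwise generation takes over.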
Assert conservation invariants
A converter's worst failure is silent data loss. Assert what must be conserved across a conversion: record and element counts, the set of identifiers or keys, sums of numeric fields, word counts. Make every intentional drop explicit — emit a dropped-items report — and test that the report accounts for everything the output omits. The contract: every input element either maps into the output or appears in the drop report; nothing vanishes silently.
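The contract can be checked mechanically. In this sketch the record shape, the converter, and the drop reason are hypothetical: format B is assumed unable to represent a record without an amount, so such records go to the drop report instead of disappearing.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    output: list = field(default_factory=list)
    dropped: list = field(default_factory=list)  # (id, reason) pairs


def convert(records):
    """Hypothetical converter: whole units in, cents out; no amount means a drop."""
    result = Result()
    for record in records:
        if record.get("amount") is None:
            result.dropped.append((record["id"], "missing amount"))
            continue
        result.output.append({"key": record["id"], "cents": record["amount"] * 100})
    return result


def assert_conserved(records, result):
    in_ids = [r["id"] for r in records]
    out_ids = [r["key"] for r in result.output]
    dropped_ids = [item_id for item_id, _ in result.dropped]

    # Every input id appears in the output or the drop report, exactly once.
    assert sorted(in_ids) == sorted(out_ids + dropped_ids)

    # Totals are conserved for everything that was not reported as dropped.
    kept_units = sum(r["amount"] for r in records if r["id"] not in dropped_ids)
    assert sum(r["cents"] for r in result.output) == kept_units * 100


records = [
    {"id": "a", "amount": 5},
    {"id": "b", "amount": None},
    {"id": "c", "amount": 7},
]
result = convert(records)
assert_conserved(records, result)
```

The invariant check never looks at how the converter works, so it keeps holding as rules are added.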
Test the rules, not just the runs
Because the converter is rule-based, the suite has to exercise the ruleset itself — and rule interactions, where most converter bugs actually live.
- Rule coverage. Track which rules actually fired during the suite, separately from line coverage. A rule that never fired is untested, however green the build looks.
- Combinatorial / pairwise testing. Rules interact, and the interaction space is too large to cover exhaustively. Pairwise tools — Microsoft PICT, NIST ACTS — generate a small set of input-feature combinations that covers every pair (or triple) of feature values, catching most interaction bugs cheaply.
- Mutation testing on the ruleset. Mutate the rules — change a condition, a precedence, a mapping target — and confirm the suite catches it. This is the only reliable measure that your tests exercise rule logic with real assertions rather than merely running it. See Layer 3 of the parent page for mutation-testing tools.
- Rule-conflict and precedence tests. Construct inputs where several rules are eligible at once and assert the resolution behavior explicitly. This boundary is where converters most often surprise their authors.
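Rule coverage and rule mutation can both be sketched over a toy first-match ruleset. The rules, the suite, and the two mutation operators (drop a rule, reverse the precedence order) are assumptions for illustration; dedicated mutation tools offer far more operators.

```python
import re
from collections import Counter

# Hypothetical ruleset: (name, predicate, action); the first matching rule wins.
RULES = [
    ("empty-to-null", lambda v: v == "", lambda v: None),
    ("integer", lambda v: re.fullmatch(r"-?[0-9]+", v) is not None, int),
    ("boolean", lambda v: v in ("true", "false"), lambda v: v == "true"),
    ("passthrough", lambda v: True, lambda v: v),
]


def convert(value, rules, fired):
    for name, matches, action in rules:
        if matches(value):
            fired[name] += 1  # record which rule fired
            return action(value)
    raise ValueError(f"no rule for {value!r}")


def run_suite(rules):
    """Return (passed, fired) for a small assertion suite."""
    fired = Counter()
    cases = [("", None), ("42", 42), ("-7", -7), ("true", True), ("abc", "abc")]
    passed = all(convert(given, rules, fired) == want for given, want in cases)
    return passed, fired


def unfired_rules(rules, fired):
    return [name for name, _, _ in rules if fired[name] == 0]


def surviving_mutants(rules):
    """Mutate the ruleset and list the mutants the suite fails to detect."""
    candidates = [
        (f"drop {rules[i][0]}", rules[:i] + rules[i + 1:]) for i in range(len(rules))
    ]
    candidates.append(("reverse precedence", rules[::-1]))
    survivors = []
    for description, mutated in candidates:
        try:
            passed, _ = run_suite(mutated)
        except Exception:
            passed = False  # a crash also counts as detecting the mutant
        if passed:
            survivors.append(description)
    return survivors


passed, fired = run_suite(RULES)
```

An empty result from `unfired_rules` says every rule ran; an empty result from `surviving_mutants` says the assertions would notice if a rule were removed or the precedence inverted. Swapping two rules whose predicates never overlap produces an equivalent mutant, which is why this sketch does not generate pairwise swaps.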
Environment corner cases
Conversion output that depends on the host environment is a defect waiting for a different machine to surface it. Address it from both ends.
Design the environment out
Never rely on platform defaults. Set input and output encoding, locale, timezone, decimal handling, and line-ending policy explicitly at the boundary. Read environment and configuration once, normalize it, and pass typed values inward — the same "parse, don't validate" discipline applied to the environment itself.
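A minimal sketch of that boundary, assuming configuration arrives as environment-style key/value pairs. The `CONV_*` names, the settings class, and the output format are hypothetical. Everything downstream of `load_settings` sees typed values only, and `render` refuses naive datetimes because converting them would silently consult the host timezone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConversionSettings:
    input_encoding: str
    output_encoding: str
    newline: str
    decimal_separator: str


def load_settings(environ):
    """Read configuration once at the boundary and return typed, validated values."""
    newline = {"lf": "\n", "crlf": "\r\n"}[environ.get("CONV_NEWLINE", "lf")]
    separator = environ.get("CONV_DECIMAL", ".")
    if separator not in (".", ","):
        raise ValueError(f"unsupported decimal separator: {separator!r}")
    return ConversionSettings(
        input_encoding=environ.get("CONV_IN_ENCODING", "utf-8"),
        output_encoding=environ.get("CONV_OUT_ENCODING", "utf-8"),
        newline=newline,
        decimal_separator=separator,
    )


def render(rows, settings):
    """Serialize rows to bytes using only the explicit settings."""
    lines = []
    for name, amount, stamp in rows:
        if stamp.tzinfo is None:
            raise ValueError("naive datetime: timezone must be explicit")
        number = f"{amount:.2f}".replace(".", settings.decimal_separator)
        when = stamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        lines.append(f"{name};{number};{when}")
    text = settings.newline.join(lines) + settings.newline
    return text.encode(settings.output_encoding)


settings = load_settings({"CONV_NEWLINE": "crlf", "CONV_DECIMAL": ","})
rows = [("Zoë", 1234.5, datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc))]
output = render(rows, settings)
```

In production the call would be `load_settings(os.environ)`; tests pass a plain dict, which is what makes the environment a test input rather than an ambient fact.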
Run a hostile-environment matrix in CI
Containerize tests for a deterministic baseline, then add CI jobs that deliberately set hostile values. The classics that break format converters:
- Turkish locale (tr_TR): its dotted and dotless i casing rules silently break case-insensitive string matching inside rules.
- A timezone on a DST boundary, and TZ values far from the build host: these expose off-by-one-hour and ambiguous-time bugs in date conversion.
- A non-UTF-8 default encoding: this exposes every place the code quietly relied on the platform default.
- Non-ASCII filesystem paths and CRLF-default platforms.
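Before wiring this into CI, the matrix can be approximated locally by running the converter as a child process under each hostile environment and requiring byte-identical output. The child program below is a hypothetical stand-in for a converter CLI, and the environment list is an assumption. Python's `str.upper()` does not consult the locale, so the Turkish job passes trivially here; it earns its place for runtimes and libraries whose case mapping does depend on the default locale.

```python
import os
import subprocess
import sys

# Child program standing in for the converter CLI (hypothetical).
# It writes bytes explicitly instead of relying on the stdout encoding.
CHILD = r'''
import sys
from datetime import datetime, timezone
stamp = datetime(2024, 3, 31, 1, 30, tzinfo=timezone.utc)
line = "TITLE=" + "title".upper() + ";name=Zo\u00eb;at=" + stamp.isoformat() + "\n"
sys.stdout.buffer.write(line.encode("utf-8"))
'''

HOSTILE_ENVIRONMENTS = [
    {},                                                # baseline
    {"LC_ALL": "tr_TR.UTF-8", "LANG": "tr_TR.UTF-8"},  # Turkish casing rules
    {"LC_ALL": "C", "LANG": "C", "PYTHONUTF8": "0"},   # non-UTF-8 default encoding
    {"TZ": "Pacific/Kiritimati"},                      # far from the build host
    {"TZ": "Europe/Berlin"},                           # DST-observing zone
]


def run_under(env_overrides):
    """Run the child with the given variables layered over the current environment."""
    env = {**os.environ, **env_overrides}
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        check=True,
    )
    return done.stdout


outputs = [run_under(env) for env in HOSTILE_ENVIRONMENTS]
distinct_outputs = set(outputs)
```

More than one element in `distinct_outputs` means the output depends on the host, and the differing environment identifies which default leaked in.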
Pin dependency versions and matrix-test the libraries that drive parsing, encoding, and Unicode behavior — ICU and the XML/JSON stacks among them. A converter's output can shift with a library upgrade alone.
Compare outputs the right way
If format B permits more than one valid representation of the same content — insignificant whitespace, attribute or key order, optional quoting — a byte-exact comparison produces false failures. Canonicalize both sides before diffing (Canonical XML, sorted-key JSON) or compare at the parsed-tree / semantic level. A suite that cries wolf gets its failures ignored; normalization-aware comparison is what keeps it trusted.
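For JSON output, a minimal canonicalization sketch looks like this. It handles whitespace, key order, and escaped versus literal characters, and it is an assumption that those are the only free choices in your format B. It is not a full canonicalization scheme; number formatting, for example, is left to the library.

```python
import json


def canonical_json(text):
    """Parse and re-serialize with sorted keys and fixed separators."""
    return json.dumps(
        json.loads(text),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


# Same content, different representation: key order, whitespace, \u escape.
expected = '{"id": 1, "tags": ["a", "b"], "name": "Zo\\u00eb"}'
actual = '{\n  "name": "Zoë",\n  "tags": ["a","b"],\n  "id": 1\n}'

byte_equal = expected == actual
semantically_equal = canonical_json(expected) == canonical_json(actual)
```

Array order is deliberately left alone: it usually carries meaning, so a reordered array should still fail the comparison.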
Recommended approach for a format converter
The techniques above, assembled in working order of priority:
- Make round-trip and idempotence properties the backbone of the suite, driven by schema-generated inputs.
- Run differential testing against the previous release on every build.
- Keep a permanent regression corpus — every bug found adds a file to it forever.
- Validate every output with an independent strict validator for format B.
- Assert conservation invariants; make every dropped element explicit and reported.
- Track rule coverage, and use pairwise generation plus rule mutation for rule-interaction tests.
- Design the environment out, then matrix-test hostile environments in CI.
- Compare outputs normalization-aware — never byte-exact for a format with free representation.