Schematron validation

Thanks to the European eInvoicing standard (EN 16931) and OpenPEPPOL, Schematron validation has become a well-known topic, especially in eProcurement.

Schema validation is a bad quality metric

Traditionally, senders have only been able to check their XML documents with schema validation. If an XML document passed schema validation, it was considered good enough to be sent forward.

Schema validation covers message structure and some content constraints, but it cannot express conditional and integrity requirements. In addition, message schemas usually aim to cover multiple use cases and are composed of reusable components. These are good design practices, but they make schemas very generic: only a handful of mandatory elements, no length constraints, no code list constraints, and no pattern constraints.

Schematron enables rule-based validation

Schematron is a rule-based validation language for turning business- and use-case-specific requirements into technical validation rules. Schematron helps to tighten and narrow down the message structure by stating which structures should or should not be used. It also supports integrity requirements such as sum checks, date comparisons, dependent elements (either-or, if-then, one-of, all-or-none, etc.), and conditional value requirements. Schematron should be used as an additional validation layer on top of schema validation.
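To make these rule types concrete, here is a minimal sketch of a date comparison and an if-then dependency. It is written in plain Python rather than Schematron, so it only illustrates the kind of check a Schematron rule would express in XPath. The invoice structure and all element names (IssueDate, DueDate, PaymentMeans, IBAN) are hypothetical and not taken from any real message standard.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical invoice; the element names are invented for this illustration.
INVOICE = """
<Invoice>
  <IssueDate>2024-03-15</IssueDate>
  <DueDate>2024-03-01</DueDate>
  <PaymentMeans>CreditTransfer</PaymentMeans>
</Invoice>
"""

def check_invoice(xml_text):
    """Return a list of plain-text rule violations for one invoice."""
    root = ET.fromstring(xml_text)
    errors = []

    # Date comparison: the due date must not precede the issue date.
    issue = date.fromisoformat(root.findtext("IssueDate"))
    due = date.fromisoformat(root.findtext("DueDate"))
    if due < issue:
        errors.append("DueDate must not be earlier than IssueDate")

    # If-then dependency: a credit transfer requires an IBAN element.
    if root.findtext("PaymentMeans") == "CreditTransfer" and root.find("IBAN") is None:
        errors.append("IBAN is required when PaymentMeans is CreditTransfer")

    return errors

errors = check_invoice(INVOICE)
for message in errors:
    print(message)
```

A schema would accept this document as long as the elements are in the right place and have the right types. Both violations only show up once the values are compared with each other.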

Why should Schematron validation be used?

Country-, industry-, and company-specific requirements are usually published as a lengthy PDF document called a message implementation guideline (MIG). It is definitely needed, but it leaves a lot of room for interpretation. Every implementer assumes that they have built a standard-compliant implementation, but in the end, the implementations do not interoperate. Schematron can be used to convert documented requirements into technical validation rules, which effectively sets the minimum requirements for all standard-compliant implementations.

Integrity and content requirements are often checked only when the data is read into the receiving system. When issues are found, the XML document is moved to an exception handling process, where either the missing or invalid data is completed manually, or an error notice is returned to the sender. This process wastes time and money. In addition, most error notices generated by receiving systems are not clear enough to identify and locate the issue in the XML document. If the sender is not sure how to fix the issue, even more time and money is wasted when the help desk gets involved.

Benefits of using Schematron

Schematron can be used to publish integrity and content requirements in a machine-readable format, so that the sender can apply detailed XML validation automatically. This ensures that most invalid XML documents never reach the receiver's system, minimizing the need for mutual exception handling.

Why has Schematron not been popular before? Mostly because people are unaware of it, but you may also hear bad excuses like "Schematron cannot cover all requirements because some checks need to be run against a back-end system, and that's why we are not using it." So what? Wouldn't it be great if 80% of the requirements could already be checked by the sender's system? Most issues would be solved by the sender alone, who would get instant and detailed feedback and could quickly iterate until the issue has been fixed.

What's lacking from Schematron

Schematron validation generates technical output, just like schema validation. A Schematron report is a long XML document listing which checks were applied and which errors were found. Each error notice contains the requirement in plain text, the XPath of the invalid element, and the validation rule's technical implementation as a piece of XPath "code". Unless calculation rules, such as sum checks, are implemented with special care, one cannot easily tell whether an issue is related to rounding or to something else. A Schematron error doesn't contain a line number for the invalid element either. In a bigger XML document, finding the element is like looking for a needle in a haystack.
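The shape of such a report can be illustrated with a short Python sketch. It assumes the report uses SVRL (Schematron Validation Report Language), the report format defined alongside ISO Schematron, where each failed check is a failed-assert element carrying test and location attributes. The sample report below is hypothetical and heavily shortened; its rule id, XPath expressions, and message text are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of SVRL, the report format defined alongside ISO Schematron.
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical, shortened report; rule id, XPath, and message are invented.
REPORT = """
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="/Invoice"/>
  <svrl:failed-assert id="SUM-01" test="Total = sum(Line/Amount)" location="/Invoice[1]">
    <svrl:text>Total must equal the sum of line amounts.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""

def failed_asserts(svrl_text):
    """Collect the failed assertions of an SVRL report as dictionaries."""
    root = ET.fromstring(svrl_text)
    findings = []
    for failed in root.iterfind(".//svrl:failed-assert", SVRL_NS):
        text = failed.findtext("svrl:text", default="", namespaces=SVRL_NS)
        findings.append({
            "id": failed.get("id"),              # rule identifier, if the rule has one
            "location": failed.get("location"),  # XPath of the offending element
            "test": failed.get("test"),          # XPath expression that evaluated to false
            "text": text.strip(),                # requirement in plain text
        })
    return findings

findings = failed_asserts(REPORT)
for finding in findings:
    print(f'{finding["id"]} at {finding["location"]}: {finding["text"]}')
```

Note that the location is an XPath, not a line number, which is exactly what makes the offending element hard to find by hand in a large document.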

Schematron validation is a perfect fit for automated content validation, when one needs to check that an XML document complies with the requirements. For locating and fixing issues, however, most people will find its raw output too technical.

Truugo + Schematron - perfect together

Specify your business rules as a Schematron file instead of embedding your requirements in your system code. Then you can use the same Schematron file both in your production flow and in Truugo.

Truugo makes it efficient to locate and fix issues instantly: each error notice is visualized and includes a link to the invalid element. Complicated checks can be split into parts, for example, to inform users about the size of the gap in a sum check.
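The idea of splitting a sum check into parts can be sketched in Python as follows. This is not Truugo's implementation, which is not described here. The function name, the returned fields, and the tolerance of 0.01 are assumptions chosen only to show how reporting the gap helps to tell a rounding difference from a real mismatch.

```python
from decimal import Decimal

def explain_sum_check(line_amounts, stated_total, tolerance=Decimal("0.01")):
    """Compare a stated total with the sum of its lines and report the gap.

    Amounts are passed as strings and handled as Decimal to avoid
    binary floating-point noise. The tolerance is an assumed value.
    """
    computed = sum((Decimal(amount) for amount in line_amounts), Decimal("0"))
    stated = Decimal(stated_total)
    gap = stated - computed

    if gap == 0:
        verdict = "ok"
    elif abs(gap) <= tolerance:
        verdict = "likely rounding"
    else:
        verdict = "mismatch"

    return {"computed": computed, "stated": stated, "gap": gap, "verdict": verdict}

rounding_case = explain_sum_check(["3.333", "3.333", "3.333"], "10.00")
mismatch_case = explain_sum_check(["10.00", "5.50"], "16.00")
print(rounding_case["verdict"], rounding_case["gap"])
print(mismatch_case["verdict"], mismatch_case["gap"])
```

A single pass/fail result would treat both cases the same. Reporting the computed sum and the gap separately lets the user see at a glance whether to look at rounding or at the line amounts themselves.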

Truugo's self-service model speeds up deployments by providing an easy user interface and test output that supports efficient error detection. In automated scenarios, the Truugo Validation API ensures that a detailed test report is always seconds away. In addition, the Schematron API can be used for plain Schematron validation.

Need to create your own Schematron file and wondering what would be the best way to get started? Please contact us!
