Yamato Daiwa E(CMA)S(cript) extensions

RawObjectDataProcessor — Problem Overview

Working with data that comes from outside and is not known in advance is one of the basic tasks of programming. Such external data may be:

  • The data retrieved from the client side during client–server interaction
  • Conversely, the data retrieved from the server side during client–server interaction
  • The data retrieved from a database
  • The data read from a file (JSON, YAML, and similar)

Because such data lies outside the reach of TypeScript, it initially has the type unknown or, even worse, any.

How is this situation usually handled? Unfortunately, often not in a way that leads to quality code. When data is received from the client side in client–server interaction, validation is regarded as a security measure and is therefore performed. In many other cases, for example when data is received from the server, the data is simply trusted and annotated with the desired type:
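A minimal sketch of this "trust and annotate" pattern is given below. The `Product` type and the `responseBody` string are hypothetical; the string stands in for whatever a real server might send.

```typescript
// Hypothetical type that the client expects to receive.
type Product = {
  ID: number;
  title: string;
  price: number;
};

// Stand-in for a server response. Note that `price` arrives as a string.
const responseBody: string = '{ "ID": 1, "title": "Tea", "price": "12.50" }';

// The data is simply trusted: `as` silences the compiler and checks nothing at runtime.
const product: Product = JSON.parse(responseBody) as Product;

// TypeScript believes `price` is a number, but at runtime it is a string.
const actualPriceType: string = typeof product.price;

// At runtime this is string concatenation, not addition.
const priceWithFee: number = product.price + 1;
```

The compiler accepts every line, yet `priceWithFee` ends up holding the string "12.501" instead of the number 13.5.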

If you want to ask, "Why do we need to validate data retrieved from the server side?", our counter-question is, "Why would we not need to validate it?" Does an answer like "because the server side holds only 100% correct data" match reality? Practice shows that in the overwhelming majority of medium and large projects, discrepancies occur between expected and actual data, especially if the client and server parts are developed in different programming languages by separate teams. The number of such discrepancies can be very large: from several dozen to several hundred or even thousands. The reasons range from simple human error to relevant engineers not being notified in time about changes in the data.

Also, data may be saved to the database in several ways: through the GUI of the application, a database manager, SQL queries, data import, and so on. Depending on the specific way, the data may not be fully validated, which may lead to invalid data being saved. Such discrepancies should be detected as soon as possible.

If you are creating a utility that is configured declaratively through a file (usually JSON or YAML), similar to Docker Compose, then an incorrectly specified configuration is a common scenario, so validation is required in this case as well.

Native Approaches

Type guards are a native feature of TypeScript. A type guard is a function that returns a boolean value; however, its return type is annotated not as boolean but according to the template x is T, where x is the parameter whose type is to be checked and T is the desired type:
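As an illustration, here is a minimal sketch of a correctly written guard. The `User` type with its two properties and the `isRecord` helper are assumptions made for this example.

```typescript
// Hypothetical target type.
type User = { ID: number; email: string; };

// Helper guard: lets us read arbitrary properties of a non-null object safely.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// The return type follows the "x is T" template: `potentialUser is User`.
function isUser(potentialUser: unknown): potentialUser is User {
  return isRecord(potentialUser) &&
      typeof potentialUser.ID === "number" &&
      typeof potentialUser.email === "string";
}

const externalData: unknown = JSON.parse('{ "ID": 1, "email": "user@example.com" }');

let greeting: string = "unknown visitor";

if (isUser(externalData)) {
  // Inside this branch TypeScript treats `externalData` as `User`.
  greeting = `User ${ externalData.ID } <${ externalData.email }>`;
}
```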

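In addition to the TypeScript documentation, type guards are well explained in an article by Marius Schulz, a frontend engineer. What matters to us for now is that TypeScript requires only two things of a type guard: that it returns a boolean value and that its return type has the special annotation. Whether the checks in the function body actually correspond to the desired type is not verified.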

Here is an erroneous example in which the guard isUser checks the parameter for fields that have nothing in common with the desired type User:
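A sketch of such an erroneous guard follows. The shape of `User` and the properties `title` and `price` are assumptions chosen for illustration.

```typescript
// Hypothetical target type.
type User = { ID: number; email: string; };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// The guard inspects `title` and `price`, which have nothing to do with `User`,
// yet the compiler accepts the `potentialUser is User` annotation.
function isUser(potentialUser: unknown): potentialUser is User {
  return isRecord(potentialUser) &&
      typeof potentialUser.title === "string" &&
      typeof potentialUser.price === "number";
}

const externalData: unknown = { title: "Tea", price: 12.5 };

let emailLength: number | null = null;
let failureName: string | null = null;

if (isUser(externalData)) {
  try {
    // Type-checks fine, but `email` is `undefined` at runtime.
    emailLength = externalData.email.length;
  } catch (error: unknown) {
    failureName = error instanceof Error ? error.name : "unknown";
  }
}
```

The guard returns true for an object that is not a user at all, and the mistake shows up only later as a runtime TypeError.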

Although the value of the parameter potentialUser is nowhere near the type User, isUser will return true, and TypeScript will not even suspect that something is wrong. Moreover, TypeScript will not raise the slightest complaint even if nothing at all is checked in the body of the guard function:
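A sketch of this extreme case, again with a hypothetical `User` type:

```typescript
// Hypothetical target type.
type User = { ID: number; email: string; };

// Nothing is checked, yet this compiles without a single complaint.
function isUser(potentialUser: unknown): potentialUser is User {
  return true;
}

const samples: Array<unknown> = [ null, 42, "text", [], { unrelated: true } ];

// Every sample is "recognized" as a user.
const acceptedSamplesCount: number = samples.
    filter((sample: unknown): boolean => isUser(sample)).
    length;
```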

Why is it so bad? In short, because of fundamental limitations of TypeScript. Data validation (including via type guards) happens at JavaScript runtime, when the original TypeScript code no longer exists. In the output JavaScript code, a type guard is just an ordinary JavaScript function, no different in nature from any other function that returns a boolean value.

TypeScript features such as type aliases (the type keyword) and interfaces exist only within the source TypeScript code; they are absent from the output JavaScript code, and therefore there is no way to refer to them at runtime. Theoretically, it is possible to automatically generate helper functions and/or objects from the type aliases and interfaces in the source TypeScript code and then use them for validation without manual coding, but it is unlikely that the TypeScript team will implement anything like that in the near future.

Besides the one above, type guards have several other significant problems:

  • By design, type guards only answer the question of whether the parameter is valid (and how truthful the answer is depends on the implementation); they report neither where exactly the violations were detected nor how many there are.
  • A type guard returns false at the very first mismatch between expected and actual data, while the number of violations may be arbitrarily large.
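Both limitations can be seen in the sketch below. The `User` type and the `check` helper with its counter are hypothetical and exist only to make the short-circuit behavior observable.

```typescript
// Hypothetical target type.
type User = { ID: number; email: string; isActive: boolean; };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Counts how many property checks were actually evaluated.
let evaluatedChecksCount: number = 0;

function check(condition: boolean): boolean {
  evaluatedChecksCount++;
  return condition;
}

function isUser(potentialUser: unknown): potentialUser is User {
  return isRecord(potentialUser) &&
      check(typeof potentialUser.ID === "number") &&
      check(typeof potentialUser.email === "string") &&
      check(typeof potentialUser.isActive === "boolean");
}

// All three properties violate the `User` type.
const externalData: unknown = { ID: "1", email: null, isActive: "yes" };

// The caller learns a single `false`: not which properties failed, nor how many.
const isExternalDataValid: boolean = isUser(externalData);
```

Although the sample contains three violations, only the first property check is evaluated before the guard gives up.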

Strictly speaking, these are problems only de facto because, as already mentioned above, TypeScript requires only two things of a type guard: returning a boolean value and a special annotation of the return type. In the function body, it is possible to do anything, including logging, complete checking of all properties, and so on. In reality, however, almost no one does this, and there is a strong reason for that. If you start implementing type guards with such functionality in a real (non-educational) medium or large-scale project, a new problem will quickly arise: too much boilerplate code, parts of which are almost identical. This is especially true of quality logging: there will be many uniform messages, and you will either have to write them from scratch each time or extract them into separate objects and/or files, until the question arises of extracting all this code into a library.

Moreover, in a real project objects will not be as simple as the User from the example above: they can have 20 to 30 properties, and this is not the limit; often there will be nested objects, arrays in particular, frequently with elements of the "object" type that must be validated too. The abundance of boilerplate increases the likelihood of errors due to fatigue. Therefore, while the native solution exists, it is not practical.
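To give a feeling for that boilerplate, here is a sketch of a guard that checks every property and records every mismatch. The `User` type, the function name `isUserWithReport`, and the message wording are all assumptions for this example.

```typescript
// Hypothetical target type with only three properties.
type User = { ID: number; email: string; isActive: boolean; };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isUserWithReport(
  potentialUser: unknown, errorsMessages: Array<string>
): potentialUser is User {

  if (!isRecord(potentialUser)) {
    errorsMessages.push("The value must be a non-null object.");
    return false;
  }

  const initialErrorsCount: number = errorsMessages.length;

  // Three nearly identical blocks for three properties; a real object may need dozens.
  if (typeof potentialUser.ID !== "number") {
    errorsMessages.push(`"ID" must be a number, but its type is "${ typeof potentialUser.ID }".`);
  }

  if (typeof potentialUser.email !== "string") {
    errorsMessages.push(`"email" must be a string, but its type is "${ typeof potentialUser.email }".`);
  }

  if (typeof potentialUser.isActive !== "boolean") {
    errorsMessages.push(`"isActive" must be a boolean, but its type is "${ typeof potentialUser.isActive }".`);
  }

  return errorsMessages.length === initialErrorsCount;
}

const collectedErrorsMessages: Array<string> = [];
const isExternalDataValid: boolean = isUserWithReport(
  { ID: "1", email: null, isActive: "yes" }, collectedErrorsMessages
);
```

This version reports all three mismatches, but even for a flat object with three properties the repetition is already noticeable.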

However, this does not make type guards useless. They are just not well suited to validating objects with many properties; for other value types (strings, numbers, etc.) they are not only suitable but widely used. YDEE also offers a set of type guards, many of which are used inside the library as well:
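The library's own list of guards is not reproduced here. To illustrate the kind of small guard meant, the sketch below defines local stand-ins; their names are hypothetical and are not claimed to match what YDEE exports.

```typescript
// Local stand-ins written for this example; not the library's functions.
function isString(value: unknown): value is string {
  return typeof value === "string";
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

function isNaturalNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

const rawValues: Array<unknown> = [ "tea", "", 3, -1, 2.5, null ];

// Because these are type guards, `filter` narrows the element type of the result.
const nonEmptyStrings: Array<string> = rawValues.filter(isNonEmptyString);
const naturalNumbers: Array<number> = rawValues.filter(isNaturalNumber);
```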

RawObjectDataProcessor Approach

So, since interfaces and type aliases (the type keyword) cease to exist during transpilation from TypeScript to JavaScript, there is no way at all to refer to them at JavaScript runtime; therefore, it is impossible to prove conclusively that a particular object has a certain type. Even if there is a corresponding type guard, the logic inside it may have nothing to do with checking the required properties. However, any use of the as keyword should be substantiated by something, and that something is validation. For the sake of maintainability, the validator must log in detail all mismatches between the actual and expected data, not only the first one. Under this arrangement, even if the specified validation rules do not match the real type (for example, because of a mistake caused by fatigue), practice shows that in most cases the mismatch will surface very quickly.

Thus, RawObjectDataProcessor takes upon itself the sin of using as, and in exchange requires that the specification of valid data be given in an almost declarative form. Let us look once again at the demo in light of the theory described above.

  1. First, RawObjectDataProcessor will check whether externalData is an object at all. If not, there is nothing further to validate or process.
  2. Next, RawObjectDataProcessor will check, in the object externalData, each property mentioned in the valid data specification validDataSpecification, including nested objects and arrays (a special case of objects from the ECMAScript viewpoint).

    • In addition to checking the data type, the sample valid data specification includes additional constraints for demonstration purposes. For example, the bar property must not only be a string but also have at least 5 characters.
    • For each constraint specified in the valid data specification, RawObjectDataProcessor performs a specific check. If a mismatch between actual and expected data is found during this check, the checking will not stop immediately (except for the case where the input itself is not an object); instead, a message about the mismatch will be saved into an array, which can be accessed via the value returned by the process method.
  3. If no mismatches between the actual data and the established constraints are found during validation, then RawObjectDataProcessor will take upon itself the sin of asserting, with the as keyword, that the input data has the type passed via the generic parameter (SampleType in the example above).
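The three steps above can be modeled with the toy function below. It is not the library's implementation and does not use its API: `processSketch`, `PropertySpecification`, and the message texts are hypothetical, nested objects and arrays are left out, and only the names externalData, validDataSpecification, SampleType, isRawDataInvalid, processedData, and validationErrorsMessages are taken from the text.

```typescript
type SampleType = { foo: number; bar: string; };

// Hypothetical, heavily simplified specification shape.
type PropertySpecification =
    { type: "number"; } |
    { type: "string"; minimalCharactersCount?: number; };

type ProcessingResult<ValidData> =
    { isRawDataInvalid: false; processedData: ValidData; } |
    { isRawDataInvalid: true; validationErrorsMessages: Array<string>; };

function processSketch<ValidData extends object>(
  rawData: unknown,
  propertiesSpecification: Record<string, PropertySpecification>
): ProcessingResult<ValidData> {

  // Step 1: a non-object input ends the processing immediately.
  if (typeof rawData !== "object" || rawData === null) {
    return { isRawDataInvalid: true, validationErrorsMessages: [ "The raw data is not an object." ] };
  }

  const validationErrorsMessages: Array<string> = [];

  // Step 2: check every specified property and keep going after a mismatch.
  for (const [ key, specification ] of Object.entries(propertiesSpecification)) {

    const value: unknown = Reflect.get(rawData, key);

    if (typeof value !== specification.type) {
      validationErrorsMessages.push(`"${ key }" must be of type "${ specification.type }".`);
      continue;
    }

    if (
      specification.type === "string" &&
      typeof value === "string" &&
      specification.minimalCharactersCount !== undefined &&
      value.length < specification.minimalCharactersCount
    ) {
      validationErrorsMessages.push(
        `"${ key }" must have at least ${ specification.minimalCharactersCount } characters.`
      );
    }
  }

  if (validationErrorsMessages.length > 0) {
    return { isRawDataInvalid: true, validationErrorsMessages };
  }

  // Step 3: the single place where `as` is used, after every check has passed.
  return { isRawDataInvalid: false, processedData: rawData as ValidData };
}

const validDataSpecification: Record<string, PropertySpecification> = {
  foo: { type: "number" },
  bar: { type: "string", minimalCharactersCount: 5 }
};

const externalData: unknown = JSON.parse('{ "foo": "1", "bar": "abc" }');

const processingResult: ProcessingResult<SampleType> =
    processSketch<SampleType>(externalData, validDataSpecification);
```

For this input both properties are wrong, so the result carries two messages rather than stopping at the first mismatch.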

There are the following similarities between RawObjectDataProcessor and type guards (implemented as intended, of course, but without additional functionality):

  1. Both check whether the raw data matches expectations.
  2. Neither can fully guarantee conformance to a particular type, due to fundamental limitations of TypeScript.

As for the differences, there are many more of them:

  1. RawObjectDataProcessor is designed to work only with objects (including indexed arrays), although their properties or elements can be of any JSON-compatible type.
  2. RawObjectDataProcessor returns not a boolean value but a polymorphic object. The question of whether the data is invalid is answered by the property isRawDataInvalid. When this property is false, you can access the object cast to the desired type via the property processedData; otherwise, the property validationErrorsMessages will be present instead, containing messages about all mismatches between actual and expected data.
  3. Most of the API is declarative, although additional checks or manipulations can be defined imperatively if required.
  4. It can check properties and elements not only for type but also for other constraints.
  5. If necessary, in addition to validation it can make changes to the original object. This functionality is not discouraged, because in some cases it is extremely useful (for example, if you need to convert numbers stored as strings into the type BigInt, which is currently incompatible with JSON, or to rename properties); however, it should be used with care, since it may break validation or make data marked as valid become invalid. Importantly, RawObjectDataProcessor has two strategies for working with objects: manipulating the original object (the default) and constructing a new object based on the original. These strategies are especially important when the source object must be modified in addition to being validated.
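The difference between the two strategies from the last item can be sketched with property renaming. The functions below are hypothetical helpers written for this example and say nothing about how the library implements either strategy.

```typescript
// Strategy 1 (sketch): manipulate the original object.
function renamePropertyInPlace(
  target: Record<string, unknown>, oldKey: string, newKey: string
): Record<string, unknown> {
  target[newKey] = target[oldKey];
  delete target[oldKey];
  return target;
}

// Strategy 2 (sketch): construct a new object based on the original.
function renamePropertyInCopy(
  source: Readonly<Record<string, unknown>>, oldKey: string, newKey: string
): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...source };
  copy[newKey] = copy[oldKey];
  delete copy[oldKey];
  return copy;
}

const firstRawData: Record<string, unknown> = { user_name: "Alice" };
const sameObject: Record<string, unknown> =
    renamePropertyInPlace(firstRawData, "user_name", "userName");

const secondRawData: Record<string, unknown> = { user_name: "Bob" };
const newObject: Record<string, unknown> =
    renamePropertyInCopy(secondRawData, "user_name", "userName");
```

With the first strategy, every other holder of a reference to `firstRawData` sees the renamed property, which is one reason why modifications call for care; with the second, `secondRawData` keeps its original shape.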

Finally, RawObjectDataProcessor has high-quality message templates describing mismatches between actual and expected data. Although they are documented, the messages are written so that you can understand what is wrong even without the documentation. See the localization source code and try to estimate how much time you would need to prepare such messages yourself, including making them reusable across projects.