Events Mapping
1. Introduction
As part of our event-driven architecture (EDA), we process events made up of identifiers and a dynamic payload. To ensure consistent and efficient aggregation in both Elasticsearch and OpenSearch, some use cases require the mapping of the fields used in the payload to be defined explicitly.
This document explains why this is needed, the benefits of the approach, and an example configuration.
2. Context
- Event-driven architecture: Each event consists of several identifiers (id, type, source.correlationId, ...) and a payload whose structure varies depending on the use case.
- Dynamic payload: The payload is generated dynamically based on business requirements.
- Issues with dynamic inference:
  - Aggregation inconsistencies: Automatic type inference can cause problems when aggregating data, especially when a field is interpreted differently by Elasticsearch and OpenSearch.
  - Performance and accuracy: Without an explicit mapping, some aggregations may be impossible or less efficient, particularly when they have to rely on runtime fields.
- Proposed solution: For each use case (within a workspace), explicitly define the field mapping in the YAML configuration file. This standardizes data handling and ensures consistent behavior across the Elasticsearch and OpenSearch drivers.
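To make the aggregation inconsistency concrete, here is a toy model of dynamic mapping, assuming the JSON-number rules Elasticsearch documents for dynamic field mapping (whole numbers become long, fractional numbers become float) and the fact that the first document to introduce a field fixes its type. It is not either engine's implementation; json_to_dynamic_type and infer_mapping are hypothetical names, and date/numeric string detection is deliberately left out.

```python
from typing import Any, Dict, List, Optional


def json_to_dynamic_type(value: Any) -> Optional[str]:
    """Simplified dynamic-mapping rule for one JSON value (no date or numeric detection)."""
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "text"  # the real default also adds a keyword sub-field
    raise TypeError(f"not modelled in this sketch: {value!r}")


def infer_mapping(payloads: List[Dict[str, Any]]) -> Dict[str, str]:
    """First value wins: once a field has a type, later payloads cannot change it."""
    mapping: Dict[str, str] = {}
    for payload in payloads:
        for field, value in payload.items():
            inferred = json_to_dynamic_type(value)
            if inferred is not None and field not in mapping:
                mapping[field] = inferred
    return mapping


batch = [{"cost": 3}, {"cost": 0.25}]
print(infer_mapping(batch))        # {'cost': 'long'}
print(infer_mapping(batch[::-1]))  # {'cost': 'float'}
```

The same two payloads yield different mappings depending on which one is indexed first, so two indices fed in a different order can disagree about the type of cost. Declaring the type up front removes that dependency on arrival order.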
3. Technical Explanation
- Dynamic mapping vs. explicit mapping:
  - Dynamic mapping: While it offers flexibility, it can lead to inconsistent field typing and limit aggregation capabilities, especially with heterogeneous payloads.
  - Explicit mapping: By defining the schema precisely, we control the field types (e.g., number, string, object), which makes aggregation and data analysis easier.
- Elasticsearch / OpenSearch compatibility: Explicit mapping ensures that both search drivers (Elasticsearch and OpenSearch) interpret data the same way, reducing errors such as "unknown" when executing aggregations.
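One concrete way an inferred type limits aggregations: if a numeric field ends up mapped as long, both engines by default coerce fractional values by truncating them, and a sum aggregation then works on the truncated numbers (the original values survive only in _source). The sketch below simulates that effect, assuming the default coerce behavior; index_value and sum_aggregation are hypothetical helpers written for illustration, not engine APIs, and the cost values are made up.

```python
import math
from typing import Iterable


def index_value(value: float, field_type: str) -> float:
    """Value kept for aggregations under a numeric mapping, assuming coerce is enabled.

    Only long and double are modelled here.
    """
    if field_type == "long":
        return float(math.trunc(value))  # fractional part is dropped
    if field_type == "double":
        return float(value)
    raise ValueError(f"type not modelled in this sketch: {field_type}")


def sum_aggregation(values: Iterable[float], field_type: str) -> float:
    """Sum the stored values, as a sum aggregation would."""
    return sum(index_value(v, field_type) for v in values)


costs = [3, 0.25, 0.5]  # hypothetical cost values from three usage events
print(sum_aggregation(costs, "long"))    # 3.0
print(sum_aggregation(costs, "double"))  # 3.75
```

With an explicit double mapping the three events sum to 3.75; with an inferred long mapping the fractional costs silently disappear from the total.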
4. Benefits of this approach
- Reliability and consistency: Aggregations and data analysis rely on uniform typing, which prevents errors caused by automatic inference.
- Better performance: Explicit mapping optimizes indexing and search performance, particularly for complex aggregations that would otherwise have to rely on runtime fields.
- Easy maintenance and scalability: The YAML configuration is easy to read and modify. Each workspace can have its own configuration file tailored to its use case, which simplifies evolving the schema as business needs change.
- Cross-platform compatibility: Explicit mapping keeps Elasticsearch and OpenSearch consistent, so aggregations behave identically on both platforms.
5. Configuration example
Events mapping
events:
  types:
    usage:
      schema:
        usage:
          type: object
          title: Usage
          properties:
            total_tokens:
              type: number
            completion_tokens:
              type: number
            prompt_tokens:
              type: number
            cost:
              type: number
              format: double
            firstTokenDuration:
              type: number
Explanation of the example
- events.types.usage.schema.usage: This section defines the schema for the usage event type.
- Property definitions: Each property (e.g., total_tokens, completion_tokens) is explicitly typed to ensure it is interpreted correctly during indexing and aggregation.
- Using format: The cost field is given the double format, which is crucial for precise calculations and for avoiding misinterpretation between search engines.
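This document does not describe how the platform turns the YAML schema into index mappings. Purely as an illustration, the sketch below converts the usage schema from the example (copied into a Python dict so no YAML parser is needed) into an Elasticsearch/OpenSearch-style properties mapping. The translation table, including float for a number without format and keyword for string, is a hypothetical assumption, as is the name schema_to_mapping.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical translation table: (schema type, format) -> engine field type.
TYPE_TABLE: Dict[Tuple[str, Optional[str]], str] = {
    ("number", None): "float",
    ("number", "double"): "double",
    ("string", None): "keyword",
    ("boolean", None): "boolean",
}


def schema_to_mapping(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively convert an object schema into a mapping with explicit types."""
    properties: Dict[str, Any] = {}
    for name, spec in schema.get("properties", {}).items():
        if spec["type"] == "object":
            # Nested objects keep their own explicitly typed sub-properties.
            properties[name] = {"type": "object", **schema_to_mapping(spec)}
        else:
            key = (spec["type"], spec.get("format"))
            if key not in TYPE_TABLE:
                raise ValueError(f"no explicit mapping for {name}: {key}")
            properties[name] = {"type": TYPE_TABLE[key]}
    return {"properties": properties}


# The usage schema from the YAML example, as a Python dict.
usage_schema = {
    "type": "object",
    "title": "Usage",
    "properties": {
        "total_tokens": {"type": "number"},
        "completion_tokens": {"type": "number"},
        "prompt_tokens": {"type": "number"},
        "cost": {"type": "number", "format": "double"},
        "firstTokenDuration": {"type": "number"},
    },
}

mapping = schema_to_mapping(usage_schema)
print(mapping["properties"]["cost"])          # {'type': 'double'}
print(mapping["properties"]["total_tokens"])  # {'type': 'float'}
```

Raising an error on an unknown type/format pair, instead of silently leaving the field to dynamic inference, keeps every field of the resulting mapping explicitly typed.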
6. Conclusion
By defining explicit mappings in each workspace's YAML configuration, we achieve:
- More consistent indexed data.
- Reliable, performant aggregations in Elasticsearch and OpenSearch.
- A configuration that is easy to maintain and evolve as business needs change.
This document should be used as a reference when creating or updating event schemas to ensure consistency and robustness across the overall architecture.