Use of Wrangler "mapping" jobs in Designer projects

Assignee

Reporter

Sprint

Description

One of the common use cases is mapping data coming from a source to some internal data layout. This frequently comes up as part of data ingestion processes: mapping from many different (but similar) source layouts to one common (internal) data layout. In many cases, this mapping needs to be designed by business users who do not have access to the Designer. Traditionally, this is solved by building frameworks that allow such a mapping to be specified via external configuration (often a "business user friendly" Excel file).

To improve this, we will implement the ability for these kinds of mapping jobs to be designed in Wrangler. The workflow will be as follows:

  1. Wrangler user creates a mapping using any Wrangler tools as needed.

  2. Wrangler user exports the mapping via the Export job functionality. They will get a ZIP file which they send to the IT team.

  3. IT team imports the ZIP file into a Designer project. The ZIP will contain Wrangler artefacts (wjob, wsrc, wtgt) and a generated mapping subgraph, but only the generated subgraph will be imported. It must be possible to import multiple ZIPs at once.

  4. IT team then links the imported mapping subgraph to a job as needed.
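The export/import flow in steps 2-3 could be sketched as follows. This is a minimal illustration, not the documented implementation: the function name `import_mapping_zips` and the assumption that the generated mapping subgraph is the only `.sgrf` entry in the archive are hypothetical.

```python
import zipfile
from pathlib import Path


def import_mapping_zips(zip_paths, target_dir):
    """Extract only the generated mapping subgraphs (*.sgrf) from one or more
    Wrangler export ZIPs, skipping the Wrangler artefacts (wjob, wsrc, wtgt).

    Illustrative sketch only; the real import is done by the Designer wizard.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    imported = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if name.endswith(".sgrf"):  # keep only the generated subgraph
                    out = target / Path(name).name
                    out.write_bytes(zf.read(name))
                    imported.append(out)
    return imported
```

Accepting a list of paths mirrors the requirement that multiple ZIPs can be imported at once.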

Export structure

The mapping subgraph in the exported ZIP will be created at export time and is essentially the "core" of a Wrangler job: it implements the transformation but contains neither the source nor the target. The mapping subgraph will:

  • Have exactly one required input port. This input port will publish the same metadata as the output of the source of the original Wrangler job.

  • Have exactly two output ports:

    • Port 0: required, valid records output. This port will publish the metadata as computed by Wrangler at the end of the Wrangler job.

    • Port 1: optional, rejected records output. This port will publish reject metadata that provides information about records rejected by the mapping subgraph. The reject metadata will contain:

      • string[] errorMessage

      • string[] errorColumn

      • string[] step

      • long sourceRowNumber

      • data columns: same columns as in Wrangler job output (port 0) but converted to string to allow storage of invalid data
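To make the reject metadata concrete, here is a sketch of the shape of one rejected record. The field names follow the list above; the `RejectRecord` class itself and the sample values are illustrative, not part of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RejectRecord:
    """Shape of one record on the optional reject port (port 1)."""
    errorMessage: List[str] = field(default_factory=list)
    errorColumn: List[str] = field(default_factory=list)
    step: List[str] = field(default_factory=list)
    sourceRowNumber: int = 0
    # All data columns from the job output, converted to string so that
    # invalid values can still be stored.
    data: Dict[str, str] = field(default_factory=dict)


# Hypothetical example: a record rejected because "amount" could not be parsed.
rejected = RejectRecord(
    errorMessage=["Cannot convert 'N/A' to decimal"],
    errorColumn=["amount"],
    step=["Convert amount to number"],
    sourceRowNumber=42,
    data={"customer": "ACME", "amount": "N/A"},
)
```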

The exported subgraph must contain a note which provides additional details about the export to allow investigation if needed.

Import location and structure

The mapping subgraph from the Wrangler export will be imported into the subgraph folder by default, but the user can select a different folder.

Import wizard

The import wizard will be accessible in Designer from the project context menu via Import. A new item will be created among the CloverDX import wizards - "Import Wrangler mapping jobs".
The wizard will allow the user to select one or more ZIP files with the export and will have the following options:

  • Selection of the target folder (subgraph folder by default)

  • Checkbox "Overwrite without warning" - unchecked by default. If unchecked and the imported artefacts already exist, the wizard will ask for confirmation whether to overwrite or not.
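The overwrite semantics of the checkbox could be summarized in a small decision function. This is an illustrative sketch only; `should_write` and the `ask_user` callback (standing in for the confirmation dialog) are hypothetical names, not part of the wizard's API.

```python
def should_write(target_exists: bool, overwrite_without_warning: bool, ask_user) -> bool:
    """Decide whether the import wizard writes an artefact to the target folder.

    ask_user is a callable that represents the confirmation dialog shown to
    the user when a conflicting artefact already exists.
    """
    if not target_exists:
        return True       # nothing to overwrite, write freely
    if overwrite_without_warning:
        return True       # checkbox checked: overwrite silently
    return ask_user()     # checkbox unchecked: ask for confirmation first
```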

Runtime

Subgraphs imported via the above process can be used just like any other subgraph - via the Subgraph component. They have a few basic properties that make their usage a bit simpler:

  • These subgraphs are static and do not depend on any Wrangler artefacts. All their metadata and code is embedded within the subgraph so they can be easily moved if needed.

  • They do not publish any public parameters - they are intended to be used as-is.

  • They can reference libraries installed on the Server - for example, if the Wrangler job used a Lookup step.

Steps to reproduce

None

Attachments

1

Activity

Tomas Horsky September 14, 2023 at 6:39 AM
Edited

Example of generated subgraph, with Note according to the latest requirements:

The graph can be found in the exported job archive under graph/<JobName>__mapping.sgrf.
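As a sketch, the path convention quoted in this comment could be used to locate the mapping subgraph inside an export archive. The helper `find_mapping_subgraph` below is hypothetical; only the `graph/<JobName>__mapping.sgrf` convention comes from the comment above.

```python
import io
import zipfile


def find_mapping_subgraph(archive):
    """Return the first archive member matching graph/<JobName>__mapping.sgrf,
    or None if the archive contains no mapping subgraph."""
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.startswith("graph/") and name.endswith("__mapping.sgrf"):
                return name
    return None


# Example with an in-memory archive (hypothetical job name "Invoices"):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("graph/Invoices__mapping.sgrf", "<Graph/>")
    zf.writestr("Invoices.wjob", "{}")
print(find_mapping_subgraph(buf))  # graph/Invoices__mapping.sgrf
```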

Fixed

Details

Priority

Fix versions

QA Testing

UNDECIDED

Components

Created August 9, 2023 at 10:06 AM
Updated October 3, 2023 at 8:44 AM
Resolved September 15, 2023 at 6:45 AM