Implement a step that will allow Wrangler users to split column into multiple other columns based on a delimiter.
Step name: Split column
Step label: Split column based on a delimiter
Step description: Split data in column into multiple new columns based on a delimiter.
Step parameters:
Input column: a column to split, string
Delimiter: a delimiter, can be multiple characters, NOT a regular expression (we can introduce regexp delimiter later)
Number of columns to create: integer, must be at least 1. Defines how many columns to create at maximum (see example below).
Step label in UI
Split column 'inputColumnName' to 4 columns using ','
The step will take input string, search for delimiters and return text between those delimiters in newly created columns. New columns will have names like SourceColumn part 1, SourceColumn part 2, ..., SourceColumn part N where SourceColumn is the name of the input column and the parts are numbered from 1 to number of splits.
Empty value will produce empty output. If there are fewer delimiters than number of splits in the parameters, the remaining columns will be set to null. If there are more delimiters than the number of splits, the last column will receive the remainder of the data.
Note that this behaves almost like CTL split function but it does not use regexp to define a delimiter but uses just plain string.
Examples:
Split "1,2,3,4" into columns, delim=",", columns=4
Returns data in 4 columns: "1", "2", "3", "4"
Split "1,2,3,4" into columns, delim=",", columns=3
Returns data in 3 columns: "1", "2", "3,4" (note the last one which contains two values)
Split "1,2,3,4" into columns, delim=",", columns=5
Returns data in 5 columns: "1", "2", "3", "4", (null)
Split "1,,,4" into columns, delim=",", columns=4
Returns data in 4 columns: "1", "", "", "4"
This means that the empty parts in the middle are represented by empty strings
Implement a step that will allow Wrangler users to split column into multiple other columns based on a delimiter.
Step name: Split column
Step label: Split column based on a delimiter
Step description: Split data in column into multiple new columns based on a delimiter.
Step parameters:
Input column: a column to split, string
Delimiter: a delimiter, can be multiple characters, NOT a regular expression (we can introduce regexp delimiter later)
Number of columns to create: integer, must be at least 1. Defines how many columns to create at maximum (see example below).
Step label in UI
Split column 'inputColumnName' to 4 columns using ','
The step will take input string, search for delimiters and return text between those delimiters in newly created columns. New columns will have names like SourceColumn part 1, SourceColumn part 2, ..., SourceColumn part N where SourceColumn is the name of the input column and the parts are numbered from 1 to number of splits.
Empty value will produce empty output. If there are fewer delimiters than number of splits in the parameters, the remaining columns will be set to null. If there are more delimiters than the number of splits, the last column will receive the remainder of the data.
Note that this behaves almost like CTL split function but it does not use regexp to define a delimiter but uses just plain string.
Examples:
Split "1,2,3,4" into columns, delim=",", columns=4
Returns data in 4 columns: "1", "2", "3", "4"
Split "1,2,3,4" into columns, delim=",", columns=3
Returns data in 3 columns: "1", "2", "3,4" (note the last one which contains two values)
Split "1,2,3,4" into columns, delim=",", columns=5
Returns data in 5 columns: "1", "2", "3", "4", (null)
Split "1,,,4" into columns, delim=",", columns=4
Returns data in 4 columns: "1", "", "", "4"
This means that the empty parts in the middle are represented by empty strings
Split " 1 , 2 , 3 ,4" into columns, delim=",", columns=4
Returns data in 4 columns: "1", "2", "3", "4"
This means that all the individual parts should be trimmed automatically by the step
Split "" into columns, delim=",", columns=4
Returns data in 4 columns: (null), (null), (null), (null)
All the parts are null because this is special case - the input has no delimiter and no content therefore no parts after split
Same should happen if the input value is null