Pipes Input
Input sanitization
Data enrichment systems ingest user-provided data from sources like:
- CRMs
- ATSs
- Web forms
User-provided data can contain mistakes or be invalid. Pipe0 has a robust sanitization layer that cleans data and, where possible, regenerates it.
Cleanup
The following request payload contains common errors but will be processed successfully.
```jsonc
{
  "pipes": [
    {
      "pipe_id": "company:identity@1"
    }
  ],
  "input": [
    {
      "id": 1,
      "name": "Susi Jui",
      "company_website_url": "pipe0.com", // missing protocol "https://"
      "email": "mailto:susi@pipe0.com", // "mailto:" prefix
      "personal_website_url": "wwwww.susi.com" // "wwwww" instead of "www"
    },
    {
      "id": 2,
      "name": "Tom Schmidt",
      "company_name": "Pipe0",
      "company_website_url": "not today" // invalid: expected a URL
    }
  ]
}
```
Here’s how we clean this request:
- Parse URLs into a consistent format and clean common mistakes
- Parse email addresses into a consistent format and clean common mistakes
- Fix obvious typos and remove invalid characters
- Convert between data types on demand (int → float, float → int, int → string)
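The URL and email cleanup steps above can be illustrated with a minimal Python sketch. The helpers `clean_url` and `clean_email` are hypothetical and only approximate the rules listed here; they are not pipe0's actual sanitization code, and the validity checks are deliberately simple assumptions.

```python
import re
from typing import Optional

# Hypothetical helpers that approximate the cleanup rules; not pipe0's implementation.


def clean_url(raw: str) -> Optional[str]:
    """Normalize a URL-like string, or return None if it cannot be a URL."""
    value = raw.strip()
    # Add a missing protocol: "pipe0.com" -> "https://pipe0.com".
    if not re.match(r"https?://", value, re.IGNORECASE):
        value = "https://" + value
    scheme, rest = value.split("://", 1)
    # Collapse a run of four or more "w"s into "www": "wwwww.susi.com" -> "www.susi.com".
    rest = re.sub(r"^w{4,}\.", "www.", rest, flags=re.IGNORECASE)
    host = rest.split("/", 1)[0]
    # Simple plausibility check: letters, digits, dots, hyphens and a 2+ letter TLD.
    if not re.fullmatch(r"[A-Za-z0-9.-]+\.[A-Za-z]{2,}", host):
        return None
    return f"{scheme.lower()}://{rest}"


def clean_email(raw: str) -> Optional[str]:
    """Normalize an email address, or return None if it looks invalid."""
    value = raw.strip()
    # Drop a "mailto:" prefix: "mailto:susi@pipe0.com" -> "susi@pipe0.com".
    if value.lower().startswith("mailto:"):
        value = value[len("mailto:"):]
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", value):
        return None
    local, _, domain = value.partition("@")
    # Domains are case-insensitive, so lowercase only the domain part.
    return f"{local}@{domain.lower()}"


print(clean_url("pipe0.com"))                # https://pipe0.com
print(clean_url("wwwww.susi.com"))           # https://www.susi.com
print(clean_url("not today"))                # None
print(clean_email("mailto:susi@pipe0.com"))  # susi@pipe0.com
```

Returning `None` instead of raising keeps "unfixable" values distinct from cleaned ones, which is what lets a later step decide whether the field can be regenerated.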
Regeneration
In our example, `"company_website_url": "not today"` is not a valid URL. Because `company_website_url` is an output field of `company:identity@1`, we can find the correct value. During processing, `company:identity@1` detects that `company_website_url` has an invalid format and replaces it with the correct value.
The result may look like this:
```jsonc
{
  "id": 2,
  "name": "Tom Schmidt",
  "company_name": "Pipe0",
  "company_website_url": "https://valid-url.com" // healed
}
```
Valid input values are not regenerated. Instead, they are copied from the input
to the record.
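The rule in this section (copy valid inputs, regenerate invalid ones only when the pipe outputs that field) can be sketched as follows. The function `resolve_field`, its parameters, and the `source` labels are hypothetical; the validator and the `regenerate` callback stand in for whatever pipe0 does internally.

```python
from typing import Callable, Optional


def resolve_field(
    field: str,
    value: Optional[str],
    is_valid: Callable[[str], bool],
    pipe_output_fields: set,
    regenerate: Callable[[str], str],
) -> dict:
    """Decide how a single input field ends up in the output record (sketch)."""
    # Valid input values are not regenerated; they are copied to the record.
    if value is not None and is_valid(value):
        return {"value": value, "source": "input"}
    # Invalid or missing values can be healed only if the pipe outputs this field.
    if field in pipe_output_fields:
        return {"value": regenerate(field), "source": "regenerated"}
    # Otherwise there is nothing to heal with, so the field fails.
    return {"value": None, "source": "failed"}


def looks_like_url(value: str) -> bool:
    # Hypothetical, deliberately naive URL check.
    return value.startswith("https://") and " " not in value


def lookup(field: str) -> str:
    # Hypothetical stand-in for the pipe's own lookup of the correct value.
    return "https://valid-url.com"


outputs = {"company_website_url"}
print(resolve_field("company_website_url", "not today", looks_like_url, outputs, lookup))
# {'value': 'https://valid-url.com', 'source': 'regenerated'}
print(resolve_field("company_website_url", "https://pipe0.com", looks_like_url, outputs, lookup))
# {'value': 'https://pipe0.com', 'source': 'input'}
```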
Incomplete data
It is common for input data to be incomplete.
Failing the entire task because one input object cannot be processed is impractical and annoying.
Partially missing input fields
If we find at least one input object that can be processed, pipeline validation will pass.
Let’s look at the following request payload:
```jsonc
{
  "pipes": [
    {
      "pipe_id": "company:identity@1"
    }
  ],
  "input": [
    {
      "id": 1,
      "name": "Susi Jui",
      "company_name": "Pipe0",
      "company_website_url": "pipe0.com"
    },
    { // CANNOT be processed by "company:identity@1"
      "id": 2
      // required `company_website_url` missing
    }
  ]
}
```
The pipe `company:identity@1` requires the input field `company_website_url`, which is not present in record `id=2`. In this case:
- Pipeline validation passes
- Record `id=1` is processed in full
- Record `id=2` has failed fields
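The per-record outcome described above can be sketched in a few lines. `record_status` is a hypothetical helper, and treating a field as usable only when it is present and not null is an assumption consistent with the examples on this page.

```python
def record_status(records: list, required: list) -> dict:
    """Map each record id to "processed" or "failed" (illustrative sketch)."""
    status = {}
    for record in records:
        # A record can be processed only if every required field has a value.
        usable = all(record.get(field) is not None for field in required)
        status[record["id"]] = "processed" if usable else "failed"
    return status


records = [
    {
        "id": 1,
        "name": "Susi Jui",
        "company_name": "Pipe0",
        "company_website_url": "pipe0.com",
    },
    {"id": 2},
]
print(record_status(records, ["company_website_url"]))
# {1: 'processed', 2: 'failed'}
```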
No input object has the required input fields
Let’s look at another example:
```jsonc
{
  "pipes": [
    {
      "pipe_id": "company:identity@1"
    }
  ],
  "input": [
    { // CANNOT be processed by "company:identity@1"
      "id": 1,
      "name": "Susi Jui"
    },
    { // CANNOT be processed by "company:identity@1"
      "id": 2,
      "name": "Tom Schmidt"
    }
  ]
}
```
No input object has the required field `company_website_url`, so the request fails during pipeline validation. The entire task fails before processing starts.
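Pipeline validation can be pictured as a single check that runs before any record is processed. This is a sketch under assumptions: `validate_pipeline` and `PipelineValidationError` are hypothetical names, and counting a field as present when its key exists (even with a null value) is inferred from the escape hatch described in the next section.

```python
class PipelineValidationError(Exception):
    """Hypothetical error: the whole task fails before processing starts."""


def validate_pipeline(records: list, required: list) -> None:
    """Pass if at least one record defines every required input field."""
    # Assumption: a key that is present counts, even if its value is None.
    if not any(all(field in record for field in required) for record in records):
        raise PipelineValidationError(
            f"no input object has the required fields: {', '.join(required)}"
        )


partial = [{"id": 1, "company_website_url": "pipe0.com"}, {"id": 2}]
none_usable = [{"id": 1, "name": "Susi Jui"}, {"id": 2, "name": "Tom Schmidt"}]

validate_pipeline(partial, ["company_website_url"])  # passes, returns None
try:
    validate_pipeline(none_usable, ["company_website_url"])
except PipelineValidationError as err:
    print(f"task failed: {err}")
# task failed: no input object has the required fields: company_website_url
```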
Never fail a task
In practice, dealing with failing tasks can be annoying. If you don't want to handle them, there's an escape hatch: define the expected input fields and set them to `null`. Pipeline validation then passes and the task does not fail. Instead, only individual fields fail.
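A minimal sketch of the escape hatch: before sending the request, make sure every input object defines the expected fields, filling gaps with `null`. The helper `with_null_placeholders` is hypothetical, and the list of expected fields is something you supply yourself.

```python
import json


def with_null_placeholders(records: list, expected: list) -> list:
    """Return copies of records in which every expected field exists."""
    # Existing values are kept; missing fields are added as None (JSON null).
    return [
        {**record, **{field: None for field in expected if field not in record}}
        for record in records
    ]


records = [{"id": 1, "name": "Susi Jui"}, {"id": 2, "name": "Tom Schmidt"}]
payload = {
    "pipes": [{"pipe_id": "company:identity@1"}],
    "input": with_null_placeholders(records, ["company_website_url"]),
}
print(json.dumps(payload["input"]))
# [{"id": 1, "name": "Susi Jui", "company_website_url": null}, {"id": 2, "name": "Tom Schmidt", "company_website_url": null}]
```

The original `records` list is left untouched, so you can still inspect which values were actually provided.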
Input expansion
Input expansion is an advanced concept that you can safely ignore if you don’t plan to use pipe0 for complex UIs.
When you enrich data with pipe0, you transform your “input objects” into output records. An input object may look like this:
```json
{
  "id": 2,
  "name": "Tom Schmidt"
}
```
Some interactions require you to reprocess previously processed fields. For this, it is common to transform your output records back into input objects. Doing so loses previous processing information, including metadata such as the result of a waterfall or UI widgets. If you pass a plain value to the API, it is always marked as `resolved_by:input`.
Instead of passing your input as a plain value, you can use input expansion: pass your inputs fully or partially expanded, in the same shape as a field value in the response object.
```json
{
  "id": 2,
  "name": {
    "value": "Tom Schmidt",
    "status": "completed",
    "type": "string",
    "reason": null,
    "meta": null,
    "ui": {
      "severity": "none"
    }
  }
}
```
Expanding inputs gives you more control, but it also makes you responsible for providing valid input states.
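When feeding output records back in, you can choose per field whether to flatten or expand. The sketch below assumes an output record in which `id` is a plain value and every other field is a field object like the one above; `to_input_object` and its `expand` parameter are hypothetical. Flattened fields lose their metadata and are treated as `resolved_by:input`, while expanded fields keep it.

```python
import copy
from typing import Collection


def to_input_object(record: dict, expand: Collection[str] = ()) -> dict:
    """Turn an output record back into an input object (sketch)."""
    input_object = {"id": record["id"]}
    for name, field in record.items():
        if name == "id":
            continue
        # Expanded: pass the whole field object so status, meta and ui survive.
        # Flattened: pass only the plain value; previous metadata is dropped.
        input_object[name] = copy.deepcopy(field) if name in expand else field["value"]
    return input_object


record = {
    "id": 2,
    "name": {"value": "Tom Schmidt", "status": "completed", "type": "string",
             "reason": None, "meta": None, "ui": {"severity": "none"}},
    "company_name": {"value": "Pipe0", "status": "completed", "type": "string",
                     "reason": None, "meta": None, "ui": {"severity": "none"}},
}

print(to_input_object(record))
# {'id': 2, 'name': 'Tom Schmidt', 'company_name': 'Pipe0'}
print(to_input_object(record, expand={"name"})["name"]["status"])
# completed
```

Copying expanded fields avoids accidentally mutating the stored output record when you edit the new input object.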