Joins and Unions in a Data Flow


You can add multiple sources in a data flow as joins or unions. 

  • In a join, shared fields or a condition in two sources will be used to add additional fields to rows. If fields in both sources do not match the join conditions, rows may be excluded from the joined sources. The Join Editor lets you edit the join, selecting join type and join fields.
  • In a union, all fields should match, and the union will add additional rows. For a union, select the matching fields in each source.

You can add a join or union by dragging an additional source onto the existing source, as shown in the following image.

Drag it to the Join bubble or Union bubble. If you drag it onto the canvas near the existing source, a join will be created, by default. Alternatively, you can right-click the SQL object, click Add, then click Join or Union.

The dminv source has product data, but no order data. To see the order information, you can drag the dmord synonym onto the flow canvas, as shown in the following image

The Join Editor and Sample Data panels open. You can also right-click the Join object and click Join Editor. The Join Configuration panel shows the join properties and enables you to edit them, as shown in the following image.

An inner join was automatically created based on the same field name PROD_NUM in both files. You can edit the join condition, add additional join conditions, and change the type of join. The types of joins supported are:

  • Inner join. The join fields in both sources must match, or the row will be excluded.
  • Left outer join. All rows from the host file will be retained. If a row in the target file matches, its field values will be included in the row, otherwise default values will be substituted.
  • Right outer join. All rows from the target file will be retained. If a row in the host file matches, its field values will be included in the row, otherwise default values will be substituted.
  • Full outer join. All rows from both files will be retained. Any row from both files that matches has the field values from both files, otherwise default values will be substituted.

The Join Profiler tab shows the effects of the join. There was not a perfect match between the two files, as shown in the following image.

 

The excluded count, represent product numbers that were not found in one of the files. Because of the inner join, these records will be excluded.

Click the Close button (X) at the top right of the window to return to the data flow view.

Right-click the SQL object, and click Edit.

Multi-select the fields from the joined source that you want in the flow.

Right-click and select Add to Query, then Column.