Join Editing in a Data Flow

The trip data contains station names, IDs, latitudes, and longitudes, but no zip code, city, or county information. To see the zip code and county information, you can download the station station_zip.csv file from http:/ You can then upload this file to the server using the instructions on Uploading Files in the Server Administration manual, and join it to the trip data file by dragging it onto the flow canvas, as shown in the following image

Right-click the Join object and click Join Editor. The Join Configuration panel shows the join properties and enables you to edit them, as shown in the following image.

An inner join was automatically created based on the similar field names start_station _id in the trip data file and STATION_ID in the station zip file. You can edit the join condition, add additional join conditions, and change the type of join.

The Join Profiler tab shows the effects of the join. There was not a perfect match between the two files, as shown in the following image.

The trip data file was sampled, but the station zip file is small, so the entire file was used for the profiling. In addition, stations may have been opened or closed or have been unused during the time period reflected in the data.

The gray area at the top of the first bar, and the excluded count, represent start stations that were not found in the station_zip file. For the second bar, the excluded count represents stations in the station_zip file that did not have any rides. Because of the inner join, these records will be excluded.

Click the Close button (X) at the top right of the window to return to the data flow view. Right-click the SQL object, and click Edit.

Multi-select the fields ZIP_CODE, COUNTY and CITY. Right-click and select Add to Query, then Column.

Click Show Profiling, then scroll over to see the distribution charts and values. You will see different values because of the random sampling, as shown in the following image.