Accessing Data


Applying a Filter in a Data Flow

Using Citi Bike trip data, we will restrict our analysis to rides that start in Manhattan, so we can limit the data loaded to just that borough. The geographic data identifies the county rather than the borough, so we will load only the rows for New York County, which covers the same area as Manhattan.

  1. In the COUNTY field, click the bar for New York County.

    The display changes to reflect the selection. The dark portions of the bars in each column show the proportion of rows that are selected, as shown in the following image.
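Outside the Data Prep interface, the effect of this selection can be sketched in Python. This is only an illustration of filtering on the COUNTY field, not how the product applies the filter. The sample rows, the STATION field, and the function name are hypothetical.

```python
# Hypothetical rows; only the COUNTY field name comes from the step above.
rows = [
    {"STATION": "Station A", "COUNTY": "New York"},
    {"STATION": "Station B", "COUNTY": "Kings"},
    {"STATION": "Station C", "COUNTY": "New York"},
    {"STATION": "Station D", "COUNTY": "Queens"},
]


def filter_by_county(rows, county):
    """Return only the rows whose COUNTY field equals the given county."""
    return [row for row in rows if row["COUNTY"] == county]


selected = filter_by_county(rows, "New York")

# The share of selected rows corresponds to the dark portion of each bar.
share = len(selected) / len(rows)
print(f"{len(selected)} of {len(rows)} rows selected ({share:.0%})")
```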

Editing Fields in a Data Flow

By default, all fields in a single-segment data source, or all fields from the top segment in a multi-segment data source, are automatically added to the flow. You can turn off this option in the Advanced Options dialog box.

To edit the fields in the flow, right-click the SQL object, and click Edit. The Metadata and Query panes open.

Enabling Sampling for a Data Flow

When a data source in a flow has a large volume of data, you can enable sampling for better response time. You can make decisions based on a sample, provided that the sample is representative of the entire data set. Data Prep has a built-in capability to automatically generate a random sample (with a 99% confidence level and +/- 1% margin of error).
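To give a sense of what a 99% confidence level and a 1% margin of error imply, the sketch below estimates a sample size with the standard formula for a proportion (Cochran's formula, with an optional finite population correction). This is an assumption made for illustration: the documentation does not state how Data Prep sizes its sample, and the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, proportion=0.5):
    """Estimate the rows needed to measure a proportion within +/- margin.

    proportion=0.5 is the most conservative choice (largest sample).
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction: smaller sources need fewer rows.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


unlimited = sample_size(0.99, 0.01)
limited = sample_size(0.99, 0.01, population=1_000_000)
print(unlimited, limited)
```

Under these assumptions the required sample is on the order of 16,600 rows, and it grows very little as the source gets larger, which is why sampling pays off most on high-volume data.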

To enable sampling:

Generating Sample Files for Data Preparation

Citi Bike provides data monthly as zipped comma-separated values (.csv) files that contain the following data values:

  • Trip Duration (seconds) 
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Latitude/Longitude
  • Bike ID
  • User Type (Customer = 24-hour pass or 3-day pass user. Subscriber = Annual Member.)
  • Gender (Zero=unknown; 1=male; 2=female)
  • Year of Birth 
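The values listed above can be read with Python's standard csv module, as in the rough sketch below. The header names and sample rows are hypothetical, since the actual column labels in a given monthly file may differ. Only the gender codes and the unit of trip duration are taken from the list.

```python
import csv
import io

# Hypothetical header names and rows; real files may label columns differently.
SAMPLE = """tripduration,starttime,stoptime,usertype,birth year,gender
634,2013-07-01 00:00:00,2013-07-01 00:10:34,Subscriber,1984,1
1547,2013-07-01 00:00:02,2013-07-01 00:25:49,Customer,,0
"""

# Gender codes as described in the list above.
GENDER = {"0": "unknown", "1": "male", "2": "female"}


def read_trips(text):
    """Parse trip rows, converting seconds to minutes and decoding gender."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        trips.append({
            "minutes": int(row["tripduration"]) / 60,
            "user_type": row["usertype"],
            "gender": GENDER.get(row["gender"], "unknown"),
            # Year of birth may be blank, so keep it optional.
            "birth_year": int(row["birth year"]) if row["birth year"] else None,
        })
    return trips


trips = read_trips(SAMPLE)
for trip in trips:
    print(trip)
```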

To download a ridership file:

Video: How to Edit Metadata

Depending on your version of WebFOCUS and your permissions, you may have access to additional metadata options, so you can edit previously created synonyms.

Video: How to Pivot Data

When you upload a spreadsheet or delimited file with columns of repeating values, you can pivot the columns to rows. 
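As a rough illustration of what pivoting columns to rows means, the Python sketch below turns one row per station with a column per month into one row per station and month. The field names, sample values, and function name are hypothetical, and the sketch is not how the product performs the pivot.

```python
def pivot_columns_to_rows(rows, id_fields, name_field, value_field):
    """Turn every non-identifier column into its own row."""
    result = []
    for row in rows:
        for column, value in row.items():
            if column in id_fields:
                continue
            # Copy the identifying fields, then add the column name and value.
            record = {field: row[field] for field in id_fields}
            record[name_field] = column
            record[value_field] = value
            result.append(record)
    return result


# Hypothetical wide data: one column of repeating values per month.
wide = [
    {"STATION": "A", "JAN": 120, "FEB": 95},
    {"STATION": "B", "JAN": 80, "FEB": 110},
]

long_rows = pivot_columns_to_rows(wide, ["STATION"], "MONTH", "TRIPS")
for record in long_rows:
    print(record)
```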

Video: How to Join Data

After you upload or connect to data, you can join data sources based on shared fields to enhance the data available to you.
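Conceptually, joining on a shared field matches each row in one source with the rows in another that carry the same value. The Python sketch below shows an inner join under that assumption. The field names, sample rows, and function name are hypothetical, and the product may offer other join types.

```python
def inner_join(left, right, key):
    """Combine rows from two sources that share the same value for key."""
    # Index the right-hand source by the shared field for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sources that share a STATION_ID field.
trip_rows = [
    {"TRIP": 1, "STATION_ID": 10},
    {"TRIP": 2, "STATION_ID": 20},
    {"TRIP": 3, "STATION_ID": 99},  # no matching station, so it is dropped
]
station_rows = [
    {"STATION_ID": 10, "COUNTY": "New York"},
    {"STATION_ID": 20, "COUNTY": "Kings"},
]

enriched = inner_join(trip_rows, station_rows, "STATION_ID")
for row in enriched:
    print(row)
```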