Generating Sample Files for Data Preparation


Citi Bike provides data monthly as zipped comma-separated values (.csv) files that contain the following data values:

  • Trip Duration (seconds) 
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Latitude/Longitude
  • Bike ID
  • User Type (Customer = 24-hour pass or 3-day pass user. Subscriber = Annual Member.)
  • Gender (0=unknown; 1=male; 2=female)
  • Year of Birth 

To download a ridership file:

  1. Go to http://www.citibikenyc.com/system-data.
  2. Click the link that says Download Citi Bike trip history data.
  3. Click 201907-citibike-tripdata.csv.zip to download the file.
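Before uploading, it can be useful to confirm what the downloaded archive contains. The following is a minimal Python sketch, not part of the product, that reads the first few rows of the .csv file inside the zip file using only the standard library. The function name read_trip_rows and its limit parameter are hypothetical, and the sketch assumes the file is UTF-8 encoded with a header row.

```python
import csv
import io
import zipfile


def read_trip_rows(zip_path, limit=5):
    """Return up to `limit` rows from the first .csv member of the archive.

    Each row is a dict keyed by the header names found in the file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Keep only real .csv members; skip macOS resource-fork entries.
        csv_names = [
            name for name in archive.namelist()
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX/")
        ]
        if not csv_names:
            raise ValueError("no .csv file found in " + str(zip_path))
        with archive.open(csv_names[0]) as raw:
            # Assumption: the file is UTF-8 text with a header row.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = []
            for row in csv.DictReader(text):
                if len(rows) >= limit:
                    break
                rows.append(row)
            return rows
```

For example, read_trip_rows("201907-citibike-tripdata.csv.zip", limit=3) returns the first three trips, which is enough to check the column names before uploading.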

As part of data preparation, this data will be augmented with:

  • Trip duration in minutes.
  • Age in years.
  • Alphanumeric gender values.
  • Additional date components.
  • Start Station Zip Code, City and County.
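To make the first four augmentations concrete, here is a minimal Python sketch of the calculations applied to a single trip record. It is an illustration only, not the procedure used by the product. The column names (tripduration, starttime, birth year, gender), the timestamp layout, the output field names, and the function name augment_trip are all assumptions, so check them against the header of the file you downloaded.

```python
from datetime import datetime

# Labels for the numeric gender codes listed in the source data.
GENDER_LABELS = {"0": "Unknown", "1": "Male", "2": "Female"}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def augment_trip(row):
    """Return a copy of `row` with derived fields added.

    Assumed input columns (hypothetical names): tripduration (seconds),
    starttime ("YYYY-MM-DD HH:MM:SS" plus optional fraction),
    birth year, gender.
    """
    out = dict(row)

    # Drop any fractional seconds before parsing the start time.
    start = datetime.strptime(row["starttime"][:19], "%Y-%m-%d %H:%M:%S")

    # Trip duration in minutes, from seconds.
    out["trip_minutes"] = round(int(row["tripduration"]) / 60, 2)

    # Age in years as of the trip year; None if the birth year is missing.
    birth = row.get("birth year", "")
    out["age"] = start.year - int(birth) if birth.isdigit() else None

    # Alphanumeric gender value in place of the numeric code.
    out["gender_label"] = GENDER_LABELS.get(row.get("gender", "0"), "Unknown")

    # Additional date components.
    out["start_year"] = start.year
    out["start_month"] = start.month
    out["start_day"] = start.day
    out["start_hour"] = start.hour
    out["start_weekday"] = WEEKDAYS[start.weekday()]
    return out
```

Age is computed from the trip year rather than the current year so that the result does not change depending on when the calculation is run.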

Depending on which zip file you download, your results may differ from those shown in this topic, which uses the data from July 2019.

Once you have downloaded a file, you can upload it to the server.

A supplementary file, station_zip.csv, contains the zip codes and counties of the stations. Download this file from http://techsupport.informationbuilders.com/public/station_zip.csv, and then upload it to the server.
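The supplementary file is used as a lookup keyed by station. As a rough sketch of that idea in Python, the following builds a lookup from CSV text and copies the matching station details onto a trip record. The column names used here (station_id, zip, county, and start station id) are hypothetical, because the actual layout of station_zip.csv is not described in this topic, and the function names are illustrative only.

```python
import csv
import io


def load_station_lookup(csv_text, key="station_id"):
    """Build a dict mapping each station key to its row of details.

    `key` is a hypothetical column name; use the real one from the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def add_station_details(trip, lookup, trip_key="start station id",
                        lookup_key="station_id"):
    """Return a copy of `trip` with the start station's details added.

    Added fields are prefixed with "start_station_". Trips whose station
    is not in the lookup are returned unchanged.
    """
    merged = dict(trip)
    details = lookup.get(trip.get(trip_key), {})
    for name, value in details.items():
        if name != lookup_key:
            merged["start_station_" + name] = value
    return merged
```

Leaving unmatched trips unchanged mirrors a left join, so no trips are dropped when a station is missing from the supplementary file.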