DEEP MATCH - Exploiting the Power of the MATCH command


By  
Author
Position
Director of Product Management and Support
Body

True to the form of any good language, FOCUS includes more than one way to connect multiple data sources to use their combined data for reporting, graphing, or charting. The most common connection is JOIN, which in its many variations—JOIN to UNIQUE, JOIN to MULTIPLE, DEFINE-based JOIN, WHERE-based or conditional JOIN, and recursive JOIN—allows data in one file to be connected and correlated with data in a second file. The JOIN command establishes the conditions of the connection between two files. Up to 1,024 JOINs can be established and used by TABLE, TABLEF, GRAPH, ANALYSE and even MATCH—all of which are processes rather than conditions.

The primary mechanism for the JOIN is the nested loop, illustrated below. As records are selected from file A for the given request, they are compared, one at a time, to the records in file B. File B, the cross-referenced file, and those pairs of records from host and cross-reference that meet the conditions of the JOIN, are kept  for processing. This process is repeated in a loop until there are no additional qualified records in the host file.

Nested loop Sort Merge

The MATCH command also allows us to connect data files, but its modus operandi is very different. MATCH is a process, much like TABLE or GRAPH, with its own flow and output. You can MATCH any two structures together, regardless of their file type, and that includes structures created with JOIN or static cross-references in the Master File. Before we look at the MATCH syntax, let’s examine the MATCH process.

MATCH uses a sort-merge mechanism to combine data records from two files. All of the records that meet the MATCH command conditions are selected from the first file, also referred to as the OLD file, and held in memory or the FOCSORT file. Then all of the records that meet the MATCH command conditions for the second file are selected and also placed in memory or the FOCSORT file.  (As of FOCUS 7.6/WebFOCUS 7.7, the size limits on the FOCSORT file were removed, greatly enhancing the usability of MATCH.  You are no longer restricted to 2 GB. Your limit now is the space you have available on After they are sorted, the two hold files are merged together with an additional selection based upon their source, OLD or NEW, and placed in your specified output format. This process can be iterated multiple times in a single MATCH process, involving multiple files and output selections, for up to 16 merges in a single MATCH request.

Sort-merge mechanism Match request

The Basic MATCH Request

The fields included in a MATCH request are listed following the verb in each section of the MATCH. The sequence of the verb objects is reflected in the final output. Most simply, the verb objects are listed in order, OLD file first, then NEW. Duplicate fields are repeated in the final output. Only the first field is visible by referencing the field name. Consider the following request:

MATCH FILE CAR
SUM RETAIL_COST
BY COUNTRY
BY CAR    
RUN       
FILE CAR  
PRINT RETAIL_COST
BY COUNTRY
BY CAR    
BY MODEL  
AFTER MATCH HOLD OLD-OR-NEW
END
FILENAME=HOLD    , SUFFIX=FIX     , IOTYPE=BINARY, $          
SEGMENT=HOLD, SEGTYPE=S2, $              
FIELDNAME=COUNTRY, ALIAS=E01, USAGE=A10, ACTUAL=A12, $    
FIELDNAME=CAR, ALIAS=E02, USAGE=A16, ACTUAL=A16, $
FIELDNAME=RETAIL_COST, ALIAS=E03, USAGE=D7, ACTUAL=D08, $ 
FIELDNAME=MODEL, ALIAS=E04, USAGE=A24, ACTUAL=A24, $      
FIELDNAME=RETAIL_COST, ALIAS=E05, USAGE=D7, ACTUAL=D08, $

If you need to reference both copies of RETAIL_COST in the output, you can rename the second instance of the field using the AS phrase in the request, or reference it using the MATCH-created ALIAS.

In current releases, the SET MATCHCOLUMNORDER = GROUPED switch offers new functionality that allows the generation of hierarchical structures that can be stored in FOCUS of XFOCUS files. This will be described further on. Suffice it to say that this setting allows for the automatic repositioning of fields based upon their common sort level.

Each section of the MATCH request can be subjected to its own set of selections—WHERE or IF condition—limiting the data values that are actually held in the initial sort. These conditions can be based on real fields or DEFINEs that are established before the MATCH is run using DEFINE FILE.  COMPUTES are currently not available in MATCH.

The Match Verbs: PRINT and SUM

Each section of the MATCH should have at least one verb. That’s right, multi-verb requests are supported in each part of the MATCH command. When using TABLE, the multi-verb request that is output to a FOCUS or XFOCUS file creates a multi-segment structure reflecting nested sort levels. In MATCH, this technique can also be used to create a hierarchically structured output file, highly efficient in data storage. The does require the use of the new SET MATCHCOLUMNORDER=GROUPED switch.  Along with the generation of hierarchies, this switch also removes the limits on output type, making available the extensive range of HOLD output types directly from your MATCH request.

As you might expect, SUM aggregates the data values that are selected based upon the sort fields specified, while PRINT includes each selected detail record based upon the specified sort. There are no constraints on the order in which the verbs in the two sections of the MATCH appear, so at its simplest level, your MATCH request can have four basic structures producing four distinctly different outputs.

These are the structures of the output records that you are generating. In the SUM/PRINT, a record is created with the aggregated value in the OLD file and the detail values in the NEW file. In the PRINT/SUM, the individual detail values in the OLD file each have the aggregated values from the NEW file. Take a look at this simple request producing all four variations:

MATCH FILE CAR Records in Car file
SUM/SUM MATCH output SUM/PRINT MATCH output
PRINT/SUM MATCH output PRINT/PRINT MATCH output

The MATCH Connections: The BY Fields

The sort fields that are included in each part of the MATCH request function in two ways. First, they specify the sorting levels to be used with the data selected. Second, and most importantly, the BY fields specify the fields that are to be used to connect the OLD and NEW files in the merge that follows. The values of these fields are compared, record by record, and merged into a single record that is then subjected to the AFTER MATCH conditions.

The BY fields in the OLD and NEW files must have the same name and comparable formats.  Converting field formats can be done using DEFINE FILE commands before issuing the MATCH command itself. For fields that are already comparable, such as two alpha fields with different names, using AS on the BY phrase is sufficient.

Because the BY fields are merged based upon equality, there is only one column for each BY field in the final output. This assumes that the BY fields are exactly the same for both the OLD and the NEW files. When this is not the case, each BY field will be shown in the final output. This leads to some very useful functionality based on the ability to connect data with different sorts in the same output record.

MATCH with different sorts MATCH syntax

The New MATCH Grouping: SET MATCHCOLUMNORDER - GROUPED or UNGROUPED

As mentioned above, this recently added SET switch in MATCH provides the possibility for a much broader range of direct MATCH output. For the most part, MATCH has been used with the same sort levels in both the OLD and the NEW files. The output record under the UNGROUPED setting—previous behavior—used the order of the verb objects specified in the request. It listed common BY fields, then the OLD verb objects, followed by the non-common BY fields, if any, and finally the verb objects from the NEW file. Common BY fields are included at the beginning of the record.

If you include different sorts in the OLD and NEW MATCH requests, common sorts indicating the connection are included at the beginning of the record, and differing sorts are included at the beginning of the OLD or NEW section that they fall into. So far there is no difference between the GROUPED and UNGROUPED behavior.

When MATCH requests include more than two files, and multi-verb requests are used creating hierarchies, and the sort levels for the hierarchies are not the same, you have a rare but powerful combination. The GROUPED switch automatically associates the fields with the same common sort levels together. This results in the shifting of the positions of verb objects within the output record, grouping them by sort level. The effect of this is to provide a mechanism to successfully generate and merge  hierarchies within the MATCH request that can be directly output to FOCUS or XFOCUS files, or passed on to other programs.  This new behavior was made the default behavior—GROUPED. For those with MATCH requests already in place, the SET MATCHCOLUMNORDER=UNGROUPED command returns to the previous behavior if that is required. As we will see shortly, the benefits of the new possible direct outputs from MATCH make this default setting highly desirable.

The AFTER MATCH Selections: Venn Diagrams

Following each merge in a MATCH request, the records to be placed in the output are selected based upon their origin. This is most easily visualized using Venn Diagrams:

OLD or NEW Venn diagram

OLD-OR-NEW - All records from the first and second files. This is the default if the AFTER MATCH line is omitted. (It is the UNION of the sets.)

OLD and NEW Venn diagram

OLD-AND-NEW- Only records common to both files. (It is the INTERSECTION of the sets.)

OLD not NEW Venn diagram

OLD-NOT-NEW- Records from the first file with no match in the second file.

NEW not OLD Venn diagram

NEW-NOT-OLD - Records from the second file with no match in the first file.

OLD nor NEW Venn diagram

OLD-NOR-NEW - Non-matching records from both files, records from the first file with no match in the second file, and records from the second file with no match in the first file.

OLD Venn diagram

OLD - Records from the first file with matching records in the second file.

NEW Venn diagram

NEW - Records from the second file with matching records in the first file.

This final selection determines the origin of the merged records that will be included in the output file.  Using OLD-OR-NEW ensures that all records retrieved from both files will be included in the output. To include only those records that are common to both original files, you can specify OLD-AND-NEW. To find those records that are included in only one file, but not the other, you have OLD-NOT-NEW and NEW-NOT-OLD. To use one file as a base for commonality, you can use OLD or NEW, taking only those records-shared or not-from the base file. Finally, to capture the records without commonality, use OLD-NOR-NEW.

AFTER MATCH syntax

AFTER MATCH HOLD OLD or NEW AFTER MATCH HOLD OLD and NEW
AFTER MATCH HOLD OLD nor NEW AFTER MATCH HOLD OLD not NEW
AFTER MATCH HOLD NEW not OLD AFTER MATCH HOLD NEW
AFTER MATCH HOLD OLD  

The MATCH Outputs

Prior to SET MATCHCOLUMNORDER=GROUPED, MATCH output was limited to the flat fixed file. This is still the case when the UNGROUPED value is used. This will provide a HOLD file that can be used for subsequent reporting, formatting, styling, charting, graphing, and analyzing.

Now with SET MATCHCOLUMNORDER=GROUPED, MATCH output can be directly created in any format that generates a Master File, including FOCUS/XFOCUS, comma and tab delimited files, flat files, etc. Notice in the following example that the SALES column appears in the output grouped with the RETAIL COST field, which was also sorted BY COUNTRY BY CAR. Fields that are sorted at the same level are now grouped together. This feature allows the creation of FOCUS/XFOCUS files from multi-verb MATCH requests. In current releases, MATCHCOLUMNORDER defaults to the GROUPED value.

MATCH syntax SET MATCHCOLUMNORDER=GROUPED
  SET MATCHCOLUMNORDER = UNGROUPED

Compare the GROUPED version to the UNGROUPED version shown in the previous images. This is the original behavior for MATCH output. If you have older MATCH requests that are showing unexpected behavior, you can return to the original behavior by issuing the SET MATCHCOLUMNORDER=UNGROUPED command.

When using GROUPED, you have a much wider range of match output, including FOCUS/XFOCUS, shown here from the request used in the previous image.

GROUPED

Delimited format DFIX - Master and Access File

Delimited format DFIX Master File

Delimited format DFIX Access File

Multiple Independent Sorts in the Same Output

One unique use of MATCH is the inclusion of multiple sorts based on different fields in the same output. For instance, if you need to output the SUM of REVENUE_US based on the day of the week, the Store Type, and the Product Category, you could create three independent requests that generate HOLD files.

Three different sorts

These reports can then be displayed or combined, or you can create the aggregates using MATCH, placing the data in a single output file. MATCH does not like multi-headed Master Files, so data from WF_RETAIL_LITE was dumped to an XFOCUS file first.

Three different sort syntax

Creating the Exact Output Record That You Want

If you need to create a very specific structured output from one or more sources for use by another application, then you may already be familiar with MATCH to concatenate data fields together from different outputs. Consider using MATCH to create the individual pieces you are assembling, rather than just using it for assembling.

Exact record structure

Exact record syntax

Generating Hierarchical Outputs With Multi-Verb Requests in MATCH

One of the interesting and useful characteristics of the hierarchical structure is its ability to reduce data duplication in file output while retaining data in a single file, limiting the overhead of file opens, closes, etc. FOCUS/XFOCUS files are often used as intermediate files because they can easily be created with an index-useful for creating subsequent dynamic JOINS, or for rapid retrieval of data values in lookup tables. Using multi-verb requests generates these multi-segment hierarchical structures. And with the new SET MATCHCOLUMNORDER command, you can now match on multiple layers of these hierarchies to generate correctly sorted, aggregated, and segmented output.

Hierarchical output

MULTIVERB/MULTISEGMENT XFOCUS FILE

In my next article, I will discuss define-based JOINS and the MacGyver technique.