In the first post in this series I talked about different places I could place complex calculations. I reviewed some pros and cons of doing calculations during ETL, in the universe, or on a report. In this post I am going to focus only on report calculations. I will cover the formulas versus variables debate and show a couple of tricks to make working with variables a little bit easier along the way.
This presentation covers slides 10 through 12 from my 2009 GBN presentation titled “Return of the Variables” which can be downloaded from my conference page.
Constant, Formula, or Variable?
A constant is a value that is never going to change. Constants are probably quite rare, to be honest. I could hard-code a sales tax rate into my report but it is far more likely that the tax rate will come from the database. Same for commission percentage or project deadline dates or any other data along those lines. In any case, they’re not really complex and they’re certainly not calculations so I will move on. 
That leaves formulas and variables. How are they different? How are they the same?
Both can use the full range of report engine functions. Both can be used just about anywhere on the report. Is there any reason to select one technique over the other?
Yes, yes there is.
The Four Cs
Years ago when I wrote my first Variables presentation I came up with what I called “The Four Cs” to describe why variables are better than formulas. They are:
- Convenience
A variable can be used with every report feature. I can’t build an Alerter using a formula, as shown here:

I can’t rank on a formula. In Web Intelligence I get a list of query objects and a list of variables, but not formulas. The list probably goes on from there but that’s probably enough important features to prove the point. Variables can be used in places where formulas cannot be used.
- Clarity
Simply put, a variable is a formula with a name. A name defines the purpose of the calculation. It is much easier to understand a complex calculation like this:
=((100-[Discount Rate])/100)*[List Price]*[Qty Sold]
… when that formula is also named “Sale Value After Discount” and saved as a variable.
- Complexity
Sometimes it’s hard to figure out the syntax for a complex calculation. I will often break something down into simple steps and store the results as variables. Because variables can be reused (formulas cannot since they don’t have a name and therefore cannot be referenced) I can check my work at various points and make sure I am seeing the expected results. In another post related to this conference presentation I showed an alternative way to calculate the number of business days between two dates. If you review that post you will see that the formulas get quite complex indeed!
You can bet that I didn’t come up with the final version without working carefully through each step. Using variables, of course. - Consistency
Once a variable is built it appears on the list of report “components” saved as part of the document. I have touched on this aspect of variables already in this post. Because variables appear on this list they are reusable throughout the report. Because I can reuse them throughout the report I can edit them in one place and have the changes propagate everywhere I use that particular calculation. If I spend the time and effort to create and test a complex calculation, it’s nice to know that it is reusable and will therefore provide consistent results throughout my document.
Wish List: Universe (or Universal) Variable Repository
At one point I had started writing a VBA script that would allow me to create report variables and check them into a “variable repository” that I stored in a Microsoft Access database. The idea was that once a variable was created I could then easily reuse it in other documents. This worked fine for desktop documents. Web Intelligence documents were not supported at all since the utility was based on VBA. It would be nice if we could see this feature supported directly in the universe framework.
Building Variables
Building variables in a document is fairly simple. I can click the proper button on the toolbar (shown below).

After clicking this button I am presented with the following screen that allows me to create my local calculation. (Note: this screen is from XI 3.0 and is different from what you will see in prior versions.)

This screen allows me to enter a name for my variable, set the qualification (dimension, detail, or measure), and access the full range of report functions.
There is another way that I can create a calculation; I can use the formula bar much like I can in a spreadsheet. Here I have activated the formula bar using the appropriate button and have clicked inside the edit area and started to create my formula.

Is this a problem?
Formula Challenges
For this demonstration I have created the following simple formula.
=[Revenue] / 100
I can use this formula in a report block. In fact I can copy it to multiple report blocks. The problem is, after I have done this I have three (or more) different copies of the same formula. If I click on the initial cell where I created the formula and edit it (perhaps I meant to type 1000 instead of 100 earlier) then only that instance of the formula is changed. The rest of the formulas are stored elsewhere in the document and are not affected by the edit.
This is a problem.
Converting a Formula to a Variable
If I made this mistake and want to convert my wayward formula into a variable I can do that. There is a button on the formula toolbar here:

When I click that button I am presented with the following dialog box that allows me to assign a name to my formula.

Once a formula has a name it becomes a variable and shows up in the document data tab.

Summary
This part of the presentation was probably considered to be fairly basic by most (if not all) of the folks in the room. However, I always repeat this same information every time I deliver a variables presentation. Why? Because of what happened years ago when I presented “Variables and More… Fourth Edition” at the conference in Las Vegas. I thought that by then everyone knew about the toolbar button that provided a way to convert a formula into a variable but it seems that was not the case. The very last question that I took from the audience was from a woman who wanted to know what to do with a formula she had created by mistake. I quickly created a formula and then showed how to convert it into a variable using the process I outlined here. I then found out just how many folks didn’t know about this technique, as it seemed the entire audience went “oooh, aaaah” and proceeded to give me the highest rated presentation marks that I have ever received. 
It just goes to show that anything and everything that you know is probably worth sharing. It is very likely that there is somebody, somewhere that doesn’t know what you consider to be basic information. Share those simple tricks and they are much better off. Plus you earn some good karma. 
I have discussed the MultiCube() function a number of times in my series of Variables presentations. In certain situations it allows you to fix measure objects when you have two (or more) unbalanced data providers in a full-client document. ForceMerge() is the equivalent for Web Intelligence.
So how do they work, exactly? What do I mean by unbalanced data providers? How is it fixed with these functions? Are there any special conditions to be aware of?
Merged Data Provider Review
I recently published a blog post that described unbalanced data providers. If you want the full description, I have included a link at the end of this post. If you want the executive summary, here it is:
When you have two (or more) data providers in the same document and want to merge the data, you have to link them using dimension objects. If you don’t have the same number of dimensions from each data provider, then they are unbalanced. That can cause issues.
It took me twelve hundred words (and six pictures) to provide a detailed explanation in my earlier post.
In the sample report I used for that post I had one common dimension (Year) and one unique dimension from each side (Resort and Region). I showed how I could use Year with either measure since it was a shared (and linked) dimension.

I also showed how (and explained why) trying to use the “extra” dimensions like Region or Resort caused problems. I don’t intend to repeat everything here; please use the link at the end of this post to read the prior post if you are unclear on anything so far.
What About MultiCube()?
One of the comments I got on that post suggested that MultiCube() could be used to fix the issue. My response was accurate as far as I was concerned, but later when I read it again I realized it could also be quite confusing. Here’s what I said:
…the MultiCube() function will help if data providers are unbalanced “upwards” but not “downwards” …
After reviewing my response, I realized that it was quite likely that I was the only person that understood what I meant by that statement. While at times I do enjoy talking with myself
that’s not why I have this blog. So I am going to try to explain that a bit further.
The problem with trying to use MultiCube() or the Web Intelligence equivalent ForceMerge() to solve this specific case is that the data doesn’t support the solution. There is no hierarchy or relationship between Region and Resort. (Yes, they are both related via links to the fact table but that’s not what I mean.) Resort is a place. Region is an attribute of the customer. There is a many to many relationship between regions and resorts, and that sort of relationship is essentially worthless. If I can’t determine how to put things together, how are the numbers going to make sense? Neither of these functions can create a relationship from nothing.
Yet there are some cases where I can fix unbalanced data providers. I need to talk about the data first, then I need to create a new document with the proper structure in order to show how the ForceMerge() function works.
Hierarchical Data
Here is what I consider to be a really good example of why this is a problem. Most people think of time as a really clean hierarchy. On the surface the hierarchy is simple: Months go into quarters, and quarters go into years. Yet if I create a query with the Quarter and Revenue from the Island Resorts database here is what I get.

Take a look at the data and tell me how you would break out the quarter data into different years.
Go ahead and think a bit, I can wait.
One-Mississippi.
Two-Mississippi.
You can’t do it, can you?
You don’t have enough information. As defined, with the format “Q1″ and “Q2″ and so on, there is no indication as to which year the quarter revenues are coming from. You can’t “break down” the quarter by year with the information that you have. That is what I meant by data being unbalanced “downwards” in my comment on my prior post. I can’t break data down because the information simply isn’t there.
What I can do, however, is roll up or move “upwards” via a hierarchy. That’s where the ForceMerge() function comes into play.
Introducing the ForceMerge() Function
Here is what the help text says for the ForceMerge() function.
Forces Web Intelligence to account for synchronized dimensions in measure calculations when the synchronized dimensions do not appear in the calculation context of the measure
I will explain what this means, but first I am going to build my example. My document will have two data providers from the Island Resorts Marketing universe. My first data provider has dimension values Country and Resort and the Revenue measure. My second data provider has the dimension Resort and the measure Guests. I will link the two data providers by Resort as it is the only common dimension.
Data Provider 1


Data Provider 2


Data Provider Results


Merged Dimensions


From this point I will create a merged block using all of the measures and both dimensions. Here are the results from that experiment.

Everything is fine so far. The issue becomes evident when I remove the Resort object from the block. Remember that my two data providers are merged (synchronized) by Resort. When that object is no longer present in the block, the measures roll up. Revenue is still okay because the Country object was part of the query context of that data provider. The Guests measure fails because it is now being viewed out of context. (At the risk of becoming tedious, I will point out one last time that this issue is covered in much more detail in the first post in this series.)

I have created a problem. Next, I will explain how to fix it.
ForceMerge() In Action
All of the information I need is there in the document. It just does not show up in the block. Here once again is the help text for the ForceMerge() function:
Forces Web Intelligence to account for synchronized dimensions in measure calculations when the synchronized dimensions do not appear in the calculation context of the measure
Simply put, the function tells Web Intelligence to use all of the information from the cube, rather than being limited to what is shown in the block. I will create a new variable with this formula:
=ForceMerge([Number of guests])
I will replace my “broken” measure with my new variable and observe the results.

I haven’t changed my dimension linking. I haven’t changed my data providers. I didn’t even change the block structure, other than to replace the broken measure with my corrected calculation. If you were looking for something more complex, I’m afraid that’s all there is.
The complicated part is understanding just what the function does and when to use it, rather than how to use it. I am glad this function is now available in Web Intelligence starting with XI 3.x. In my opinion, it should have been there since merging data providers was an option.
Summary
The magic of the ForceMerge() function (or if using Desktop Intelligence the MultiCube() function) is that it changes the behavior of the report engine. Normally when dimension values are removed from a block they are taken out of context. By using one of these functions I can specify that linked dimensions still have to be considered by the measures even if those dimensions no longer appear in the block structure.
Even with this magic function in my arsenal, I cannot rely on it to fix every problem that I have. If the data can be “rolled up” by the calculation process then I am fine. If the data has to be “broken down” then I am in trouble, and these functions will not fix the issue. Each resort exists in exactly one country so I can move up the hierarchy from the resort link and the results are valid.
Special Note
If there are smart measures in a universe, then ForceMerge() will not work. In order to calculate the results correctly, smart measures (using the “database delegated” projection function) need to have the block structure in place to define the query. According to the documentation, the ForceMerge() function will return a #MULTIVALUE error if used in this situation.
If there are smart measures in a universe, then ForceMerge() will not work. In order to calculate the results correctly, smart measures (using the “database delegated” projection function) need to have the block structure in place to define the query. According to the documentation, the ForceMerge() function will return a #MULTIVALUE error if used in this situation.
One of the more powerful features of Web Intelligence (Desktop Intelligence as well) is the ability to combine data from more than one source into one document. This is done by creating multiple data providers, linking them up, and using the combined results in a report block. However, everything has to work out just right. Sometimes it doesn’t. 
Merging Data Providers in Web Intelligence
Dimensions are the “keys” of a data provider. When two database tables are linked in a universe, certain columns are used to join them together. Dimensions serve this role in linking two data providers. When two (or more) data providers are built in Web Intelligence from the same universe they are automatically linked together using any common dimension objects. What happens when two different data sources are used? In that case, the report writer has to merge the information manually. And if the dimensions from each data provider are not identical, the report has what I call “unbalanced” data providers. That’s what I want to describe in this blog post.
For my sample I will join two data providers from the Island Resorts Marketing universe. I am going to structure my queries in such a way that they will result in an unbalanced scenario. For my first query I will return Resort + Year + Revenue. For the second query I will return Region + Year + Number of Guests. Resort, Year, and Region are all dimension objects, of course, and Revenue and Number of Guests are my measures. Here are the raw results from each individual data provider. (I am using Web Intelligence XI 3.0 for this demonstration.)

I have one common dimension. I also have two dimensions that are unique, one from each data provider. Since both data providers come from the same universe, the common or shared dimension value (Year) has already been linked, as shown here.

Because the Year object was already merged, I can create a report block that contains Year, Revenue, and Number of Guests. I can also do local variable calculations like Revenue per Guest using simple division.

This is what happens when everything works.
What if something doesn’t work?
Mixing Dimensions Generates Unexpected Results
This block shows what happens when I add Resort to the block shown previously.

Notice anything interesting about the data? About the Number of Guests column specifically?
When the year repeats, the Number of Guests repeats. The Revenue measure seems fine. Before I explain this, let me exchange the Region object for the Resort object and show what happens.

In this case, the Revenue column is strange and the Number of Guests looks okay. To make matters even more confusing, here’s what I get if I apply a Sum() to the Revenue column after doing a break on the Year…

The sum on the footer is identical to the values displayed in the block! Any idea what’s going on? 
Measures Do Not Sum, They Project
The secret is that Web Intelligence does not sum a column of values like Excel does. A few weeks ago I discussed the difference between SQL Aggregation and Report Projection which helps explain this. Simply put: when a measure is combined with a set of dimension values in a block, the measure will project (roll up) based on the structure of the block. This is also called the context of the measure within the block. (Please note: The use of the word “context” in this case should not be confused with the concept of a context — set of joins — in a universe design.)
Because of the way I structured my queries, the Revenue context is determined by combinations of the Resort and Year objects. The context of the Number of Guests was set by Region and Year. The Year value is the only common dimension between the two data providers, which is why the block with only the Year dimension object showed valid data. However, if I include a dimension that is not part of a measure context, a problem occurs. When Region was added to the block the Revenue was wrong. When Resort was added the Number of Guests was wrong. In each case, the measure was out of context for at least one dimension.
To repeat: When I created the block with Year as the only dimension everything was fine. Each measure includes Year as part of its context. When I added Resort, the Revenue measure was still in context (and therefore showed the correct results) but the Number of Guests object was out of context. As a result, it used the only context it could (Year) and rolled up to that value, then repeated that value each time the year appeared on the block.
What about that weird behavior of the Sum() function? It turns out that a Sum() doesn’t really add up the column values. It does a projection instead. When the block included both Year and Region, the Revenue measure was valid for only the Year therefore it projected to that value. As a result, the measure value is repeated for each different Region in the body of the block as long as the Year is the same. Adding a break on Year created a footer for the block. Since the context for the footer was Year only, that made the Revenue projection (total) for the Year the same on the footer as it was for the body.
Unbalanced Dimensions
Now that I have been through all of that, let me finally explain what I mean by “unbalanced dimensions” as used in the title of this blog post. In my data providers I have Year as a common dimension and Resort hanging off of one side and Region hanging off of the other. Web Intelligence will prevent me from adding both Resort and Region to the block at the same time, but it will allow me to add one of them. Assuming I do that (add an extra dimension) I will have two dimensions from one data provider and one from the other. Two against one is unbalanced.
Resort, Year <-> Year
Year <-> Year, Region
It doesn’t matter which side the extra dimension comes from either.
If you have more than one data provider the following rules apply.
- All common dimension values should be linked
- Measure context is defined by the dimensions in their source data provider
- A block with linked dimensions and measures works fine
- Unlinked dimensions from one data provider can be used; others will be designated as incompatible
- Measures from the data provider that includes the extra unlinked dimension will be fine
- Measures from the other data provider will repeat for each unlinked dimension value because they are at least partially out of context
With all of these complications, Web Intelligence does what it can to show the data properly, even to the point of having the footer totals be correct despite the presense of an unlinked dimension. It’s better to manage your data providers to avoid this challenge, but it’s nice to know how things work if you can’t do that. There are other techniques that can help (like creating variables to demote a dimension to a detail object, if that action is appropriate) but this post is long enough already. 
22 | 3 |
Based on the conclusion I made earlier, I can use the number of weeks and multiply by five to get the number of week days included in the range. In order to get that number I make one small modification to the formula presented above; I multiply the result by five. The result:
Truncate(DaysBetween([Start Date]; [End Date]) / 7 ; 0) * 5
All I have left to deal with are the leftover of days that don’t make up a full week. The decimal part of 2.14 is 0.14 which represents one extra day in my range. In order to know whether to count that day or not, I have to know the name (or alternatively the number) of the day my range starts on.
How Many Extra Days
There is a function called DayNumberOfWeek(). It numbers Monday as 1 and Sunday as 7 so I’m going to pretend that my calendar looks like this:
M T W T F S S
1 2 3 4 5 6 7
1 2 3 4 5 6 7
What I need to get next is the number of days that are not a full week. I used the Truncate() command earlier to get the number of full weeks. It removes the decimal part of the division result. What I need now is the decimal part, and the Mod() function can be used to get it. This function does a division and returns the remainder. The first argument will be the total number of days, and the second argument will be seven as there are seven days in a week. The result will be a number that ranges from zero (0) to six (6) because those are the only possible remainder values when dividing by sevent.
For example, Mod(15; 7) = 1. 15 / 7 is 2 with 1 left over, therefore 1 is the return value from the Mod() function. To apply that to my current problem: 15 days is made up of two full weeks and one extra day. Twenty days would be Mod(20; 7) which results in 6 remaining days. Mod(21; 7) is zero because there are exactly three weeks and no extra days.
Can I put these two pieces of information together? I can easily get the number of the day in the week, and I know how many extra days there are. I should be able to do some math…
Categories: 2009 GBN - Dallas, 2010 Mastering ... Melbourne, Recursive Data, Universe Design Dave Rathbun @ 1:38 pm
This is the first of several posts that will review my presentation “Universe Models For Recursive Data” which was originally presented at the 2009 GBN conference, then at the North Texas / Oklahoma ASUG chapter meeting, and finally at the Mastering BusinessObjects conference in Melbourne. After presenting it three times it seemed like an appropriate time to (finally) get started writing up the blog posts. As with my other presentations there is a PDF file that can be downloaded from my conference presentations page.
This post will cover slides 6 through 13 as a basic introduction of recursive data and challenges presented to universe designers.
Defining Recursive Data
Sometimes there is confusion about the distinction between hierarchical and recursive data. Hierarchical data does not present a big challenge for BusinessObjects. It can be something related to time (Year, Quarter, Month, Day), geography (Country, Region, State, City), or something more specific like an accounting structure (Business Unit, Account, Sub-Account). What makes this hierarchical structure work easily is that each element is stored in a different place. It could be in a different column in the same table (flattened) or even in different tables (snowflake). As long as I can drill from one column to another in the hierarchy everything works fine.
Self-referencing or recursive data may initially look like a hierarchy. The key difference is that all of the elements are stored in the same place. There are keys that relate one row in a table back to a different row in the same table. That’s how recursive data is different from hierarchical data.
Why is recursion is a problem for BusinessObjects? The language used “behind the curtain” is SQL, and SQL does not natively support recursion. Some database vendors offer extensions (for example the CONNECT BY PRIOR structure in Oracle) but these are not used by BusinessObjects.
How common is recursive data? It is certainly not unusual. Consider any of the following:
- Company organizational structure
Object levels: President – Vice President – Director
Object type: Person - Inventory BOM (Bill of Materials)
Object levels: Product – Assembly – Sub-Assembly – Component
Object type: Inventory item - Project Management
Object levels: Project – Task – Sub-Task
Object type: Project entry - Multi-Level Marketing (MLM)
Object levels: Founder – Recruit – Recruit Level 2
Object type: Person
In each of the above examples the type of object (or node) type is the same at any level. For example, a company organization chart is made up of people. Some people are at different levels, and there are therefore relationships from one person to another. In order to show all of the relationships from the top of the company to the bottom (or the bottom to the top) I have to keep going back to the same table. That is recursion.
Because it’s easy to think about a company organizational structure I used that example for the rest of the presentation.
Note: The Motors database is used in the standard Universe Designer training course and will not be presented in its entirety in the download package for this presentation for copyright reasons. However, I will be providing the standard HR table and all of the modified versions used in this presentation.
Example of Recursive Data Using Prestige Motors HR
A picture will help at this point. Here is a screen shot from the Prestige Motors HR universe that I built for this presentation. Notice that there are two tables in the picture, but one is an alias of the other. In other words, I am really using the same table twice.

The table on the left is the Employees table. I have aliased the table and called it Manager. The two tables are joined using the link from EMPLOYEE.EMP_MGR_ID to Manager.EMP_ID. Since this is really the same table twice, this join defines the relationship from any particular person to their immediate manager. It’s a recursive relationship from a person to a person.
Notice that in this case I have defined the join as an outer (optional) join? That’s because the top person in the company does not have a manager, and the relationship would fail in that case. I want to ensure that I return every person and their manager… even if that person does not have a manager. Here is a sample of some of the data to help show why this is important.

I can review the relationships manually if I want. I can look at the data (shown above) and determine that Pickworth works for Noakes. Davis and Ferrerez also work for Noakes. How am I making that determination? Each of those three folks has a manager ID of 101, and 101 is the employee id for Noakes.
Who does Noakes work for? The EMP_MGR_ID column is blank (null) for Noakes, which implies that he is at the top of the company organization chart.
Another way to see where people fall in the organization chart is to look at their level. Here is output from a report that I eventually will want to generate from my recursive data. It is shown in the format of a tree, with each person showing up as a node on the tree.

Noakes is at level 1. Davis, Ferrerez, and Pickworth are all at level 2. But the tree does not stop there. I have employees at level 3 and level 4 as well.
Typical Recursion Questions
This brings me to the set of questions that I need to be able to answer with my recursive data. I need to know:
Who do I work for?
Who works for me?
Who works at my same level and shares the same manager?
Who is my manger’s manager? My manager’s manager’s manager?
What is the total salary of my direct reports (people who work directly for me)?
What is the total salary of my indirect reports (people who work for people who work for me)?
Who works for me?
Who works at my same level and shares the same manager?
Who is my manger’s manager? My manager’s manager’s manager?
What is the total salary of my direct reports (people who work directly for me)?
What is the total salary of my indirect reports (people who work for people who work for me)?
I am sure there are many more questions but these should serve as a starting point. Some of the questions only require one level of the hierarchy (who works for me, or who do I work for). Those are simple enough to answer, and in fact can be answered with the simple alias structure already shown in this post. But in order to traverse the tree for multiple levels I need a solution that is a but more robust.
Next Time
In the next post of this series I want to talk about some of the different challenges encountered when working with recursive data. Once I define the challenges I will be in a position to start talking about solutions. As a preview, here are the four types of hierarchies I will be talking about:
- Clean – a hierarchy with clean data, consistent node depths, and consistent node paths
- Unbalanced- a hierarchy with inconsistent node depths
- Ragged- a hierarchy with inconsistent node paths
- Lateral- a hierarchy with sideways node paths
If it is not clear what some of those mean, don’t be too concerned; I will be defining each with examples in the next post.
Finally, here is a preview of the various solutions I will talk about:
- Universe aliases
- Flattened structures (columns or snowflake tables)
- Ancestor / Descendant model
- Depth first tree traversal
And a few that I won’t:
- Oracle CONNECT BY PRIOR
- Stored procedures
Part II of this series will talk in more detail about each of the recursive challenges. After I detail the different challenges the next post will talk about the solutions. My plans for the final post for this series are to review the impact of each solution on the native drilling functionality and then to wrap things up.
Simulating Arrays
Web Intelligence does not have a concept of arrays. Because of that, most of the solutions I have seen proposed for a Web Intelligence solution involve a massive “If” statement at about this time. I’m not going to do that.
For one thing, the “If” statement would have to be nested in order to first check the starting day, and then check the number of days to be added. It might look something like this:
if Start Date = 'Mon' and Days to Add <= 5 then Days to Add else 5
else if Start Date = 'Tue' and Days to Add <= 4 then Days to Add else 4
else if ...
In theory, that would work. However, I'm going to go back to a very old technique and ressurect it here. It involves combining data into a single string and then using offset calculations to retrieve only the value I want. Let me go back and look at the "Days to Add" table I posted a few paragraphs back. The "Monday" line looked like this:
Starting Day | Days to Add | |||||||
Day Name | Day Number | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
Mon | 1 | 1 | 2 | 3 | 4 | 5 | 5 | 5 |
If I retrieve only the "days to add" values they are 1 2 3 4 5 5 5. If I smash them together I get this 1234555. Tuesday is 1234445. Wednesday is 1233345. If I put those three days together I get this:
123455512344451233345
Each set of values is exactly seven characters long, and will never be more or less than that amount. I know that for certain. I also know that there will be seven different strings of values for Monday through Sunday. I also know that the DayNumberofWeek() function returns a value of 1 for Monday and 7 for Sunday. Can I do anything with all of this?
Offset and Index Calculation
I have already detailed how to get the number of full weeks between two days using this:
Truncate(DaysBetween([Start Date]; [End Date]) / 7 ; 0) * 5
I know how many days are left to add using this formula:
Mod(DaysBetween([Start Date]; [End Date]); 7)[
I know which day my range starts on with this formula:
DayNumberOfWeek([Start Date])
To get the number of days to add, I propose this:
ToNumber(Substr("1234555123444512333451222345111234500123450123455"; ((DayNumberOfWeek([Start Date])-1)*7)+Mod(DaysBetween([Start Date];[End Date]);7) ; 1))
Which would then make the final calculation for Business Days Between look like this:
(Truncate(DaysBetween([Start Date]; [End Date]) / 7 ; 0) * 5) + ToNumber(Substr("1234555123444512333451222345111234500123450123455"; ((DayNumberOfWeek([Start Date])-1)*7)+Mod(DaysBetween([Start Date];[End Date]);7)+1 ; 1))
Does it work? Based on my testing, yes.
But how?
If anyone reading this ever wrote COBOL code (and is willing to admit it) then you might remember that data was often stored in a stream of characters. In order to be able to retrieve the desired value , the program would calculate an offset into the string and return a set number of characters. That's what I am doing here. Despite my best efforts (and the efforts of many folks on BOB) I have never seen a single simple formula that would properly return all of the correct values for the "extra days" portion of this challenge. Since it's easy enough to calculate all of the answers, that's what I have done. I know the answer to all 49 possible cases. The nasty-looking formula used above is using the day number and the number of days and calculating the offset into the string of possible answers to retrieve the one character value that is needed. That long string? It is made of of the 49 possible answers concatenated together. Think of it as an array, and the Substr() function is looking up the answer based on an array index.
It's not really an array, of course. But it works, and it keeps me from having to create a massive "If" statement to handle all 49 potential cases.
Conclusion
The formulas developed in this blog post operate strictly on a weekend / weekday basis. No attempt was made to incorporate holidays that occur during the week. Note, however, that this same approach can handle a business week that includes Saturday (six day week) quite easily. All I would have to do is modify the matrix (the array) that I am using to store the results of "days to add" and I am finished.
Would this be better done in the database? Certainly. Every project I have worked on that required this sort of logic had a calendar table with columns to support this calculation. But if you don't, this formula does seem to work.
What about leap years? They don't matter.
Everything done here is date math, and therefore would not be affected by months of different lengths. If we ever had a "leap week" then I would be in trouble.
Help Text Bug
As an aside, the help text for the DayNumberOfWeek() function in XIR2 is incorrect. It states:
As an aside, the help text for the DayNumberOfWeek() function in XIR2 is incorrect. It states:
Web Intelligence always treats Sunday as the first day of the week
Based on my testing this is incorrect. Sunday is always day 7, and Monday is always day 1. This factors into the math used in this post. If that ever changes, the formulas provided in this post will no longer work.
I have discussed the MultiCube() function a number of times in my series of Variables presentations. In certain situations it allows you to fix measure objects when you have two (or more) unbalanced data providers in a full-client document. ForceMerge() is the equivalent for Web Intelligence.
So how do they work, exactly? What do I mean by unbalanced data providers? How is it fixed with these functions? Are there any special conditions to be aware of?
Merged Data Provider Review
I recently published a blog post that described unbalanced data providers. If you want the full description, I have included a link at the end of this post. If you want the executive summary, here it is:
When you have two (or more) data providers in the same document and want to merge the data, you have to link them using dimension objects. If you don’t have the same number of dimensions from each data provider, then they are unbalanced. That can cause issues.
It took me twelve hundred words (and six pictures) to provide a detailed explanation in my earlier post.
In the sample report I used for that post I had one common dimension (Year) and one unique dimension from each side (Resort and Region). I showed how I could use Year with either measure since it was a shared (and linked) dimension.

I also showed how (and explained why) trying to use the “extra” dimensions like Region or Resort caused problems. I don’t intend to repeat everything here; please use the link at the end of this post to read the prior post if you are unclear on anything so far.
What About MultiCube()?
One of the comments I got on that post suggested that MultiCube() could be used to fix the issue. My response was accurate as far as I was concerned, but later when I read it again I realized it could also be quite confusing. Here’s what I said:
…the MultiCube() function will help if data providers are unbalanced “upwards” but not “downwards” …
After reviewing my response, I realized that it was quite likely that I was the only person that understood what I meant by that statement. While at times I do enjoy talking with myself
that’s not why I have this blog. So I am going to try to explain that a bit further.
The problem with trying to use MultiCube() or the Web Intelligence equivalent ForceMerge() to solve this specific case is that the data doesn’t support the solution. There is no hierarchy or relationship between Region and Resort. (Yes, they are both related via links to the fact table but that’s not what I mean.) Resort is a place. Region is an attribute of the customer. There is a many to many relationship between regions and resorts, and that sort of relationship is essentially worthless. If I can’t determine how to put things together, how are the numbers going to make sense? Neither of these functions can create a relationship from nothing.
Yet there are some cases where I can fix unbalanced data providers. I need to talk about the data first, then I need to create a new document with the proper structure in order to show how the ForceMerge() function works.
Hierarchical Data
Here is what I consider to be a really good example of why this is a problem. Most people think of time as a really clean hierarchy. On the surface the hierarchy is simple: Months go into quarters, and quarters go into years. Yet if I create a query with the Quarter and Revenue from the Island Resorts database here is what I get.

Take a look at the data and tell me how you would break out the quarter data into different years.
Go ahead and think a bit, I can wait.
One-Mississippi.
Two-Mississippi.
You can’t do it, can you?
You don’t have enough information. As defined, with the format “Q1″ and “Q2″ and so on, there is no indication as to which year the quarter revenues are coming from. You can’t “break down” the quarter by year with the information that you have. That is what I meant by data being unbalanced “downwards” in my comment on my prior post. I can’t break data down because the information simply isn’t there.
What I can do, however, is roll up or move “upwards” via a hierarchy. That’s where the ForceMerge() function comes into play.
Introducing the ForceMerge() Function
Here is what the help text says for the ForceMerge() function.
Forces Web Intelligence to account for synchronized dimensions in measure calculations when the synchronized dimensions do not appear in the calculation context of the measure
I will explain what this means, but first I am going to build my example. My document will have two data providers from the Island Resorts Marketing universe. My first data provider has dimension values Country and Resort and the Revenue measure. My second data provider has the dimension Resort and the measure Guests. I will link the two data providers by Resort as it is the only common dimension.
Data Provider 1


Data Provider 2


Data Provider Results


Merged Dimensions


From this point I will create a merged block using all of the measures and both dimensions. Here are the results from that experiment.

Everything is fine so far. The issue becomes evident when I remove the Resort object from the block. Remember that my two data providers are merged (synchronized) by Resort. When that object is no longer present in the block, the measures roll up. Revenue is still okay because the Country object was part of the query context of that data provider. The Guests measure fails because it is now being viewed out of context. (At the risk of becoming tedious, I will point out one last time that this issue is covered in much more detail in the first post in this series.)

I have created a problem. Next, I will explain how to fix it.
ForceMerge() In Action
All of the information I need is there in the document. It just does not show up in the block. Here once again is the help text for the ForceMerge() function:
Forces Web Intelligence to account for synchronized dimensions in measure calculations when the synchronized dimensions do not appear in the calculation context of the measure
Simply put, the function tells Web Intelligence to use all of the information from the cube, rather than being limited to what is shown in the block. I will create a new variable with this formula:
=ForceMerge([Number of guests])
I will replace my “broken” measure with my new variable and observe the results.

I haven’t changed my dimension linking. I haven’t changed my data providers. I didn’t even change the block structure, other than to replace the broken measure with my corrected calculation. If you were looking for something more complex, I’m afraid that’s all there is.
The complicated part is understanding just what the function does and when to use it, rather than how to use it. I am glad this function is now available in Web Intelligence starting with XI 3.x. In my opinion, it should have been there since merging data providers was an option.
Summary
The magic of the ForceMerge() function (or if using Desktop Intelligence the MultiCube() function) is that it changes the behavior of the report engine. Normally when dimension values are removed from a block they are taken out of context. By using one of these functions I can specify that linked dimensions still have to be considered by the measures even if those dimensions no longer appear in the block structure.
Even with this magic function in my arsenal, I cannot rely on it to fix every problem that I have. If the data can be “rolled up” by the calculation process then I am fine. If the data has to be “broken down” then I am in trouble, and these functions will not fix the issue. Each resort exists in exactly one country so I can move up the hierarchy from the resort link and the results are valid.
Special Note
If there are smart measures in a universe, then ForceMerge() will not work. In order to calculate the results correctly, smart measures (using the “database delegated” projection function) need to have the block structure in place to define the query. According to the documentation, the ForceMerge() function will return a #MULTIVALUE error if used in this situation.
If there are smart measures in a universe, then ForceMerge() will not work. In order to calculate the results correctly, smart measures (using the “database delegated” projection function) need to have the block structure in place to define the query. According to the documentation, the ForceMerge() function will return a #MULTIVALUE error if used in this situation.
One of the more powerful features of Web Intelligence (Desktop Intelligence as well) is the ability to combine data from more than one source into one document. This is done by creating multiple data providers, linking them up, and using the combined results in a report block. However, everything has to work out just right. Sometimes it doesn’t. 
Merging Data Providers in Web Intelligence
Dimensions are the “keys” of a data provider. When two database tables are linked in a universe, certain columns are used to join them together. Dimensions serve this role in linking two data providers. When two (or more) data providers are built in Web Intelligence from the same universe they are automatically linked together using any common dimension objects. What happens when two different data sources are used? In that case, the report writer has to merge the information manually. And if the dimensions from each data provider are not identical, the report has what I call “unbalanced” data providers. That’s what I want to describe in this blog post.
For my sample I will join two data providers from the Island Resorts Marketing universe. I am going to structure my queries in such a way that they will result in an unbalanced scenario. For my first query I will return Resort + Year + Revenue. For the second query I will return Region + Year + Number of Guests. Resort, Year, and Region are all dimension objects, of course, and Revenue and Number of Guests are my measures. Here are the raw results from each individual data provider. (I am using Web Intelligence XI 3.0 for this demonstration.)

I have one common dimension. I also have two dimensions that are unique, one from each data provider. Since both data providers come from the same universe, the common or shared dimension value (Year) has already been linked, as shown here.

Because the Year object was already merged, I can create a report block that contains Year, Revenue, and Number of Guests. I can also do local variable calculations like Revenue per Guest using simple division.

This is what happens when everything works.
What if something doesn’t work?
Mixing Dimensions Generates Unexpected Results
This block shows what happens when I add Resort to the block shown previously.

Notice anything interesting about the data? About the Number of Guests column specifically?
When the year repeats, the Number of Guests repeats. The Revenue measure seems fine. Before I explain this, let me exchange the Region object for the Resort object and show what happens.

In this case, the Revenue column is strange and the Number of Guests looks okay. To make matters even more confusing, here’s what I get if I apply a Sum() to the Revenue column after doing a break on the Year…

The sum on the footer is identical to the values displayed in the block! Any idea what’s going on? 
Measures Do Not Sum, They Project
The secret is that Web Intelligence does not sum a column of values like Excel does. A few weeks ago I discussed the difference between SQL Aggregation and Report Projection which helps explain this. Simply put: when a measure is combined with a set of dimension values in a block, the measure will project (roll up) based on the structure of the block. This is also called the context of the measure within the block. (Please note: The use of the word “context” in this case should not be confused with the concept of a context — set of joins — in a universe design.)
Because of the way I structured my queries, the Revenue context is determined by combinations of the Resort and Year objects. The context of the Number of Guests was set by Region and Year. The Year value is the only common dimension between the two data providers, which is why the block with only the Year dimension object showed valid data. However, if I include a dimension that is not part of a measure context, a problem occurs. When Region was added to the block the Revenue was wrong. When Resort was added the Number of Guests was wrong. In each case, the measure was out of context for at least one dimension.
To repeat: When I created the block with Year as the only dimension everything was fine. Each measure includes Year as part of its context. When I added Resort, the Revenue measure was still in context (and therefore showed the correct results) but the Number of Guests object was out of context. As a result, it used the only context it could (Year) and rolled up to that value, then repeated that value each time the year appeared on the block.
What about that weird behavior of the Sum() function? It turns out that a Sum() doesn’t really add up the column values. It does a projection instead. When the block included both Year and Region, the Revenue measure was valid for only the Year therefore it projected to that value. As a result, the measure value is repeated for each different Region in the body of the block as long as the Year is the same. Adding a break on Year created a footer for the block. Since the context for the footer was Year only, that made the Revenue projection (total) for the Year the same on the footer as it was for the body.
Unbalanced Dimensions
Now that I have been through all of that, let me finally explain what I mean by “unbalanced dimensions” as used in the title of this blog post. In my data providers I have Year as a common dimension and Resort hanging off of one side and Region hanging off of the other. Web Intelligence will prevent me from adding both Resort and Region to the block at the same time, but it will allow me to add one of them. Assuming I do that (add an extra dimension) I will have two dimensions from one data provider and one from the other. Two against one is unbalanced.
Resort, Year <-> Year
Year <-> Year, Region
It doesn’t matter which side the extra dimension comes from either.
If you have more than one data provider the following rules apply.
- All common dimension values should be linked
- Measure context is defined by the dimensions in their source data provider
- A block with linked dimensions and measures works fine
No comments:
Post a Comment