Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Data Dashboard challenge in Notes - Better sometimes, than using an RDB

By Andrew Pollack on 07/19/2008 at 10:50 AM EDT

This challenge is about dealing with a significant volume of raw data, and reporting on in a useful way. The fundamental difference in working with data in Notes versus working with it in a relational database, is that while typically an RDB can handle more raw data, viewing that raw data through reports only gives you exactly what you ask for. Notes on the other hand presents data in "Views" which are much more organic -- particularly if you've designed them well. You don't just see what you're asking for, you see the context in which that data exists. To do that in an RDB you start having to look at very expensive data mining tools.

The Challenge
Second Signal Logo
It's time to set up a monitoring dashboard for Second Signal. The system has gotten busy enough that I want to start monitoring the total call volume at any given time across departments, the average calls per department, and the range of call duration across users.

This tool will allow me to monitor telephony use by department, by user, and by concurrency -- that is to say, at any moment, how many channels are open to the system from the telephone world at the same time. As Second Signal scales, the server costs on a per department basis really don't increase very much. There is a terrific economy of scale there. The telephony side, however, does tend to increase in line with the number of users up to a certain point. Once that tipping point is reached however, I can move to a different model of purchasing inbound telephony that is much more effectively pooled. This tool will help me move toward that point.

Asterisk, the telephony switch I'm using, produces excellent logging data as you would expect. One option I have is to simply parse that logging data to get the statistics I need. There are also API calls that can query the switch in real time to see things like call volume and status. I want the data a little differently however, so I'm doing some logging of my own as part of the call scripting. That's not really an issue. The issue is storing that raw data in a way that is most useful for producing the reports and statistics that I want.

Deciding between Notes and an RDB

Relational Databases are generally better had handling large volumes of structured data, while Notes is generally better at storing "Information" -- which is data that has already been processed into something meaningful to people. There is a myth that a database is going to be faster for storing data. That's only partly true. Storing the data isn't hard at all. Retrieving it in a meaningful way is where the performance gets more interesting. With an RDB you have to construct reports fed by queries. Those queries take longer and are harder on the server based on their complexity. In Notes, the query and report work is built into the overall performance because most people consider it all in the same scope. Its all in the "Notes Database" so its all one thing.

Where it makes sense to store data in Notes is when you can know in advance the kind of reporting and visualization you're going to want out of the system and you can pre-build that into the storage mechanism. If I just stored each phone call as a unique entity in Notes, it would end up being a very large volume of data that wasn't well suited to giving me the kinds of reports I want. In that case, an RDB would be the better choice. If I can find a way to store the data in Notes in a way that is meaningful to people and can drive views easily, then Notes will be the more effective storage choice.

Determine what you want to store

The first thing you want to do if you're going to be storing data in Notes, is to think about what aspect of that data contains the core value yet will be the hardest to bring to together in a single view entry. In my case, it isn't the specifics of a single phone call. It turns out that the single phone call record actually holds almost no value to me.

What matters to me is concurrency. In other words, what was going on at a given moment. I want to be able to see how busy the system was at any given moment and what the trend of activity is over time and over certain periods of time. By coming to that conclusion, I've just identified the data records I want to store. I want to store "Concurrency Records". I want one record which will show me the concurrency for a specific moment.

Determine the granularity to store

With concurrency, you have a tricky problem. How long a period are you watching for each measure of concurrent activity? If you stored individual call records with a start and end time, good reporting tools could give you a curve rather than a numeric chart but that's not necessary in this case. Again, it goes back to what matters to my particular use case. That's billing. Telephony billing is generally done is six second increments -- one tenth of a minute each. That means if I want meaningful data, I need it to be stored at the level of "Concurrent Activity per Six Second Interval". This will give me a view on data which tells me how busy the system was during each period for which I have to shell out money.

Fit the data storage to the platform

The most obvious way to store "Activity per Six Second Interval" into Notes would be to use a single document for each interval. Unfortunately, that would mean 5,256,000 records per year or nearly half a million documents per month. That's not a very good model for Notes, though its feasible in newer versions. Reporting on that kind of Notes database would be cumbersome. The unstructured nature of Notes really helps with that. First of all, I don't need to store any records for time periods containing no activity. Second, I don't have to store a single Note for each interval. I considered storing one document per hour but that isn't granular enough. I end up with very few documents ( 8760 per year ) but too many fields on each document.

What I decided to do is to store one document per minute with one field on the document for each period within that minute. In other words, 10 fields. This balances the overall document count (about half a million per year) with the ease of reporting I need. If each field stores the concurrency data (i.e. 5 active calls) for each 1/10 of a minute, reporting is easy and document volume isn't excessive.

Make the most out of the data you store

Before I got started writing, however, there was one more thing I wanted to consider. I need to know not just how many calls were happening during each interval, but I wanted to know who was making them. I want to be able to report on which department was using the system and also within the departments which telephone numbers were calling in. In a relational database, I would do this with a complex "Join" in the query. In Notes, I'm taking advantage of the ability to pre-build that data and keep it available. To do that, I decided to create three fields for each interval. One field would be a list of the active calls each department had open at the time. The second field would be a list of each callerid that was active during that interval. The third would be a simple count of the total number of active calls at that time. The third would is essentially just an "@Elements()" function of either of the other two fields. That could be done just as easily in a view column, however by storing it on the record I avoid that calculation for every document in every view every time the view is rebuilt. Since the data won't change, it is much better to pre-build it and store it on the document.

The resulting document fields

Interval Form One Document Per Minute
 
IntervalConcurrencyDepartmentsCallers
 
EXAMPLE3DEPT0001
DEPT0002
DEPT0003
555-555-1001
555-555-1002
555-555-1003
Document Fields
 
0-5 SecondsBlock_0_Total ConcurrencyBlock_0_DepartmentsBlock_0_CallerIds
6-11 SecondsBlock_1_Total ConcurrencyBlock_1_DepartmentsBlock_1_CallerIds
12-17 SecondsBlock_2_Total ConcurrencyBlock_2_DepartmentsBlock_2_CallerIds
18-23 SecondsBlock_3_Total ConcurrencyBlock_3_DepartmentsBlock_3_CallerIds
24-29 SecondsBlock_4_Total ConcurrencyBlock_4_DepartmentsBlock_4_CallerIds
30-35 SecondsBlock_5_Total ConcurrencyBlock_5_DepartmentsBlock_5_CallerIds
36-41 SecondsBlock_6_Total ConcurrencyBlock_6_DepartmentsBlock_6_CallerIds
42-47 SecondsBlock_7_Total ConcurrencyBlock_7_DepartmentsBlock_7_CallerIds
48-53 SecondsBlock_8_Total ConcurrencyBlock_8_DepartmentsBlock_8_CallerIds
54-59 SecondsBlock_9_Total ConcurrencyBlock_9_DepartmentsBlock_9_CallerIds


The documents will also have a few other fields to make views easier to create. For example, they will have a "TimePeriod" field which will be the date and time which is covered by the document. They'll also have that time broken down in the "Year, Month, Day, Hour, and Minute" fields which are numeric, and also MonthName and DayName text fields.

That's seven fields which could easily be calculated in view columns or selection formulas. By storing them on the document, it amounts to hundreds of thousands of formula iterations saved on each view re-index or database scan (in the case of view selection formulas).

The result of all this analysis and planning is that I'll have a data record set which exactly matches my reporting needs and is very easy for the system to keep indexed. All of this is possible without the cost and complexity of using an external relational database.

Getting the most out of a platform comes from really understanding three things. The platform, The Data, and the Desired Result. Without a full and careful examination of all three, you won't get as good a result.


There are  - loading -  comments....



Other Recent Stories...

  1. 05/05/2016Is the growing social-sourced economy the modern back door into socialism?Is the growing social-sourced economy the modern back door into socialism? I read a really insightful post a couple of days ago that suggested the use of social network funding sites like “Go Fund Me” and “Kickstarter” have come about and gained popularity in part because the existing economy in no longer serving its purpose for anyone who isn’t already wealthy. Have the traditional ways to get new ventures funded become closed to all but a few who aren’t already connected to them and so onerous as to make ...... 
  2. 04/20/2016Want to be whitelisted? Here are some sensible rules for web site advertisingAn increasing number of websites are now detecting when users have ad-blocking enabled, and refuse to show content unless you "whitelist" their site (disable your ad-blocking for them). I think that is a fair decision on their part, it's how they pay for the site. However, if you want me (and many others) to white list your site, there are some rules you should follow. If you violate these rules, I won't whitelist your site, I'll just find content elsewhere. 1. The total space taken up by advertisements ...... 
  3. 12/30/2015Fantastic new series on Syfy called “The Expanse” – for people who love traditional science fiction[] “The Expanse” is a new science fiction series being broadcast onthe Syfy channelthis winter. It’s closely based on a series of books by author James S. A. Corey beginning with “Leviathan Wakes”. There are 5 books in the “Expanse” series so far. If you’re a fan of the novels you’ll appreciate how closely the books are followed.TIP: The first five episodes are already available on Syfy.com. If you’re having trouble getting into the characters and plot, use those to get up to speed.The worlds created for ...... 
  4. 10/20/2015My suggestion is to stay away from PayAnywhere(dot)com  
  5. 08/07/2015Here is one for you VMWARE gurus - particularly if you run ESXi without fancy drive arrays 
  6. 08/06/2015The Killer of Orphans (Orphan Documents) 
  7. 06/02/2015Homeopathic Marketing: Traveler on my Android is now calling itself VERSE. Allow me to translate that for the IBM Notes community... 
  8. 03/17/2015A review of British Airways Premium Economy Service – How to destroy customer goodwill all at once 
  9. 02/26/2015There's a bug in how @TextToTime() and @ToTime() process date strings related to international standards and browser settings. 
  10. 01/21/2015Delivering two new presentations at Developer Camp (EntwicklerCamp) 2015 in Germany 
Click here for more articles.....


pen icon Comment Entry
Subject
Your Name
Homepage
*Your Email
* Your email address is required, but not displayed.
 
Your thoughts....
 
Remember Me  

Please wait while your document is saved.