Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Data Dashboard challenge in Notes - Better sometimes, than using an RDB

By Andrew Pollack on 07/19/2008 at 10:50 AM EDT

This challenge is about dealing with a significant volume of raw data, and reporting on in a useful way. The fundamental difference in working with data in Notes versus working with it in a relational database, is that while typically an RDB can handle more raw data, viewing that raw data through reports only gives you exactly what you ask for. Notes on the other hand presents data in "Views" which are much more organic -- particularly if you've designed them well. You don't just see what you're asking for, you see the context in which that data exists. To do that in an RDB you start having to look at very expensive data mining tools.

The Challenge
Second Signal Logo
It's time to set up a monitoring dashboard for Second Signal. The system has gotten busy enough that I want to start monitoring the total call volume at any given time across departments, the average calls per department, and the range of call duration across users.

This tool will allow me to monitor telephony use by department, by user, and by concurrency -- that is to say, at any moment, how many channels are open to the system from the telephone world at the same time. As Second Signal scales, the server costs on a per department basis really don't increase very much. There is a terrific economy of scale there. The telephony side, however, does tend to increase in line with the number of users up to a certain point. Once that tipping point is reached however, I can move to a different model of purchasing inbound telephony that is much more effectively pooled. This tool will help me move toward that point.

Asterisk, the telephony switch I'm using, produces excellent logging data as you would expect. One option I have is to simply parse that logging data to get the statistics I need. There are also API calls that can query the switch in real time to see things like call volume and status. I want the data a little differently however, so I'm doing some logging of my own as part of the call scripting. That's not really an issue. The issue is storing that raw data in a way that is most useful for producing the reports and statistics that I want.

Deciding between Notes and an RDB

Relational Databases are generally better had handling large volumes of structured data, while Notes is generally better at storing "Information" -- which is data that has already been processed into something meaningful to people. There is a myth that a database is going to be faster for storing data. That's only partly true. Storing the data isn't hard at all. Retrieving it in a meaningful way is where the performance gets more interesting. With an RDB you have to construct reports fed by queries. Those queries take longer and are harder on the server based on their complexity. In Notes, the query and report work is built into the overall performance because most people consider it all in the same scope. Its all in the "Notes Database" so its all one thing.

Where it makes sense to store data in Notes is when you can know in advance the kind of reporting and visualization you're going to want out of the system and you can pre-build that into the storage mechanism. If I just stored each phone call as a unique entity in Notes, it would end up being a very large volume of data that wasn't well suited to giving me the kinds of reports I want. In that case, an RDB would be the better choice. If I can find a way to store the data in Notes in a way that is meaningful to people and can drive views easily, then Notes will be the more effective storage choice.

Determine what you want to store

The first thing you want to do if you're going to be storing data in Notes, is to think about what aspect of that data contains the core value yet will be the hardest to bring to together in a single view entry. In my case, it isn't the specifics of a single phone call. It turns out that the single phone call record actually holds almost no value to me.

What matters to me is concurrency. In other words, what was going on at a given moment. I want to be able to see how busy the system was at any given moment and what the trend of activity is over time and over certain periods of time. By coming to that conclusion, I've just identified the data records I want to store. I want to store "Concurrency Records". I want one record which will show me the concurrency for a specific moment.

Determine the granularity to store

With concurrency, you have a tricky problem. How long a period are you watching for each measure of concurrent activity? If you stored individual call records with a start and end time, good reporting tools could give you a curve rather than a numeric chart but that's not necessary in this case. Again, it goes back to what matters to my particular use case. That's billing. Telephony billing is generally done is six second increments -- one tenth of a minute each. That means if I want meaningful data, I need it to be stored at the level of "Concurrent Activity per Six Second Interval". This will give me a view on data which tells me how busy the system was during each period for which I have to shell out money.

Fit the data storage to the platform

The most obvious way to store "Activity per Six Second Interval" into Notes would be to use a single document for each interval. Unfortunately, that would mean 5,256,000 records per year or nearly half a million documents per month. That's not a very good model for Notes, though its feasible in newer versions. Reporting on that kind of Notes database would be cumbersome. The unstructured nature of Notes really helps with that. First of all, I don't need to store any records for time periods containing no activity. Second, I don't have to store a single Note for each interval. I considered storing one document per hour but that isn't granular enough. I end up with very few documents ( 8760 per year ) but too many fields on each document.

What I decided to do is to store one document per minute with one field on the document for each period within that minute. In other words, 10 fields. This balances the overall document count (about half a million per year) with the ease of reporting I need. If each field stores the concurrency data (i.e. 5 active calls) for each 1/10 of a minute, reporting is easy and document volume isn't excessive.

Make the most out of the data you store

Before I got started writing, however, there was one more thing I wanted to consider. I need to know not just how many calls were happening during each interval, but I wanted to know who was making them. I want to be able to report on which department was using the system and also within the departments which telephone numbers were calling in. In a relational database, I would do this with a complex "Join" in the query. In Notes, I'm taking advantage of the ability to pre-build that data and keep it available. To do that, I decided to create three fields for each interval. One field would be a list of the active calls each department had open at the time. The second field would be a list of each callerid that was active during that interval. The third would be a simple count of the total number of active calls at that time. The third would is essentially just an "@Elements()" function of either of the other two fields. That could be done just as easily in a view column, however by storing it on the record I avoid that calculation for every document in every view every time the view is rebuilt. Since the data won't change, it is much better to pre-build it and store it on the document.

The resulting document fields

Interval Form One Document Per Minute
Document Fields
0-5 SecondsBlock_0_Total ConcurrencyBlock_0_DepartmentsBlock_0_CallerIds
6-11 SecondsBlock_1_Total ConcurrencyBlock_1_DepartmentsBlock_1_CallerIds
12-17 SecondsBlock_2_Total ConcurrencyBlock_2_DepartmentsBlock_2_CallerIds
18-23 SecondsBlock_3_Total ConcurrencyBlock_3_DepartmentsBlock_3_CallerIds
24-29 SecondsBlock_4_Total ConcurrencyBlock_4_DepartmentsBlock_4_CallerIds
30-35 SecondsBlock_5_Total ConcurrencyBlock_5_DepartmentsBlock_5_CallerIds
36-41 SecondsBlock_6_Total ConcurrencyBlock_6_DepartmentsBlock_6_CallerIds
42-47 SecondsBlock_7_Total ConcurrencyBlock_7_DepartmentsBlock_7_CallerIds
48-53 SecondsBlock_8_Total ConcurrencyBlock_8_DepartmentsBlock_8_CallerIds
54-59 SecondsBlock_9_Total ConcurrencyBlock_9_DepartmentsBlock_9_CallerIds

The documents will also have a few other fields to make views easier to create. For example, they will have a "TimePeriod" field which will be the date and time which is covered by the document. They'll also have that time broken down in the "Year, Month, Day, Hour, and Minute" fields which are numeric, and also MonthName and DayName text fields.

That's seven fields which could easily be calculated in view columns or selection formulas. By storing them on the document, it amounts to hundreds of thousands of formula iterations saved on each view re-index or database scan (in the case of view selection formulas).

The result of all this analysis and planning is that I'll have a data record set which exactly matches my reporting needs and is very easy for the system to keep indexed. All of this is possible without the cost and complexity of using an external relational database.

Getting the most out of a platform comes from really understanding three things. The platform, The Data, and the Desired Result. Without a full and careful examination of all three, you won't get as good a result.

There are  - loading -  comments....

Other Recent Stories...

  1. 01/26/2023Better Running VirtualBox or VMWARE Virtual Machines on Windows 10+ Forgive me, Reader, for I have sinned. I has been nearly 3 years since my last blog entry. The truth is, I haven't had much to say that was worthy of more than a basic social media post -- until today. For my current work, I was assigned a new laptop. It's a real powerhouse machine with 14 processor cores and 64 gigs of ram. It should be perfect for running my development environment in a virtual machine, but it wasn't. VirtualBox was barely starting, and no matter how many features I turned off, it could ...... 
  2. 04/04/2020How many Ventilators for the price of those tanks the Pentagon didn't even want?This goes WAY beyond Trump or Obama. This is decades of poor planning and poor use of funds. Certainly it should have been addressed in the Trump, Obama, Bush, Clinton, Bush, and Reagan administrations -- all of which were well aware of the implications of a pandemic. I want a military prepared to help us, not just hurt other people. As an American I expect that with the ridiculous funding of our military might, we are prepared for damn near everything. Not just killing people and breaking things, but ...... 
  3. 01/28/2020Copyright Troll WarningThere's a copyright troll firm that has automated reverse-image searches and goes around looking for any posted images that they can make a quick copyright claim on. This is not quite a scam because it's technically legal, but it's run very much like a scam. This company works with a few "clients" that have vast repositories of copyrighted images. The trolls do a reverse web search on those images looking for hits. When they find one on a site that looks like someone they can scare, they work it like ...... 
  4. 03/26/2019Undestanding how OAUTH scopes will bring the concept of APPS to your Domino server 
  5. 02/05/2019Toro Yard Equipment - Not really a premium brand as far as I am concerned 
  6. 10/08/2018Will you be at the NYC Launch Event for HCL Domino v10 -- Find me! 
  7. 09/04/2018With two big projects on hold, I suddenly find myself very available for new short and long term projects.  
  8. 07/13/2018Who is HCL and why is it a good thing that they are now the ones behind Notes and Domino? 
  9. 03/21/2018Domino Apps on IOS is a Game Changer. Quit holding back. 
  10. 02/15/2018Andrew’s Proposed Gun Laws 
Click here for more articles.....

pen icon Comment Entry
Your Name
*Your Email
* Your email address is required, but not displayed.
Your thoughts....
Remember Me  

Please wait while your document is saved.