The traditional telco (and OSS) ran at different speeds. Some tasks had to happen immediately (e.g. customers calling one another) while others took time (e.g. getting a connection to a customer’s home, which included designs, approvals, builds, etc), often weeks.
Our OSS have processes that must happen sequentially and expediently. They also have processes that must wait for dependencies, conditional events and time delays. Some roles need “fast,” others can cope with “slow.” Who wins out in this dilemma?
Even the data we rely on can transact at different speeds. For capacity planning, we’re generally interested in longer-term data. We don’t have to process at real-time. Therefore we can choose to batch process at longer cycle times and with summarised data sets. For network assurance, we’re generally interested in getting data as quick as is viable.
Today’s post is about that word, viable, and pragmatism we sometimes have to apply to our OSS.
For example, if our operations teams want to reduce network performance poll cycles from every 15 mins down to once a minute, we increase the amount of data to process by 15x. That means our data storage costs go up by 15x (assuming a flat-rate cost structure applies). The other hidden cost is that our compute and network costs also go up because we have to transfer and process 15x as much data.
The trade-off we have to make in responses to this rapid escalation of cost (when going from 15 to 1 min) is in the benefits we might derive. Can we avoid SLA (Service Level Agreement) breach costs? Can we avoid costly outages? Can we avoid damage to equipment? Can we reduce the risk of losing our carrier license?
The other question is whether our operators actually have the ability to respond to 15x as much data. Do we have enough people to respond at an increased cycle time? Do we have OSS tools that are capable of filtering what’s important and disregarding “background” activity? Do we have OSS tools that are capable of learning from every single metric (e.g. AI), at volumes the human brain could never cope with?
Does it make sense that we have a single platform for handling fast and slow processes? For example, do we use the same platform to process 1 minute-cycle performance data for long-term planning (batch-processed once daily) and quick-fire assurance (processed as fast as possible)?
If we stick to one platform, can our OSS apply data reduction techniques (eg selective discard of records) to get the benefits of speed, but with the cost reduction of slow?