Enabling a 70% Reduction in MTTR for a Global IT Company
About the customer
The customer is a leading IT services, consulting, and business solutions organization and one of India’s largest employers. It is recognized for its focus on employee management, which is built on strong workplace technology. Its HR operations are responsible for 300,000 processes that serve a massive workforce. Any delay in month-end operations would severely affect employee satisfaction as well as business performance. As the workforce grew steadily, the employee portal began to perform poorly, holding back productivity.
The customer is recognized as a preferred employer in India, with a large workforce distributed across the globe. Maintaining seamless employee management operations, even at peak periods, was a major differentiator for the company, influencing workforce engagement and productivity. However, the customer was facing frequent downtime on its employee portal towards the end of every month. High traffic volumes from 300,000 internal processes caused several unknown issues that were difficult to resolve. As a result, the mean time to repair (MTTR) was over two days, delaying critical HR operations. The company required a robust IT operations management solution that could monitor, manage, and prevent the causes of failure.
Here’s what we did:
- Consolidated data from a variety of data sources (infrastructure and software)
- Delivered Operational Intelligence across the application, integrated APIs, and underlying infrastructure
- Simplified infrastructure monitoring to maintain core application health
- Set up a platform for ad-hoc investigation into session issues and performance degradation
SmartCirqls deployed an end-to-end IT operations management solution which covered data sources across weblogs, workflows, audits, and network devices.
We consolidated data from Apache Weblogs, Windows systems, Cisco devices (routers, switches, and firewalls), Oracle audit logs, and workflow logs into a ‘Single Pane of Glass’ view to gain a holistic understanding of operational issues. This data was processed via powerful Operational Intelligence use cases to reveal key insights.
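As a rough illustration of what consolidating disparate sources can look like, the sketch below normalizes two log formats into one time-ordered event list. It is not SmartCirqls' implementation: the shared event schema, the pipe-delimited workflow log format, and the function names are all assumptions made for the example. Only the Apache Common Log Format is a real, documented layout.

```python
import re
from datetime import datetime, timezone

# Apache Common Log Format: host ident user [time] "request" status bytes
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_apache(line):
    """Map one Apache access-log line to the shared (hypothetical) event schema."""
    m = APACHE_RE.match(line)
    if m is None:
        return None  # skip lines that do not match the expected layout
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    status = int(m.group("status"))
    return {
        "source": "apache",
        "timestamp": ts.astimezone(timezone.utc),
        "host": m.group("host"),
        "severity": "error" if status >= 500 else "info",
        "message": m.group("request"),
    }


def parse_workflow(line):
    """Parse a hypothetical workflow log line: ISO time|host|level|message."""
    parts = line.split("|", 3)
    if len(parts) != 4:
        return None
    ts, host, level, message = parts
    return {
        "source": "workflow",
        "timestamp": datetime.fromisoformat(ts).astimezone(timezone.utc),
        "host": host,
        "severity": level.lower(),
        "message": message,
    }


def consolidate(streams):
    """Merge (parser, lines) pairs into one list of events sorted by time."""
    events = []
    for parser, lines in streams:
        for line in lines:
            event = parser(line)
            if event is not None:
                events.append(event)
    return sorted(events, key=lambda e: e["timestamp"])
```

Once every source shares the same fields, a single query can show, for example, that a workflow warning preceded a burst of HTTP 500 responses on the portal.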
The platform helped to identify the root causes of failures and of recurring errors. Each issue was then investigated in depth and promptly addressed, and remediation measures were taken to ensure optimal application performance even during peak periods. Finally, the customer could access employee-specific data on portal usage, process performance, and other operational details to ensure that business KPIs were delivered within its strictest SLA standards.
Challenges
- Frequent failures during peak periods
- High transaction volume for 300,000 users
- Delayed mean time to repair – over 48 hours
- Unknown issues and protracted downtimes at month-end
- Multiple disparate data sources
- Low visibility into system health and root cause of failures
Benefits
- Accelerated troubleshooting
- 70% less effort spent on root cause analysis
- Faster mean-time-to-repair – 70% improvement
- Under-utilized servers freed up through process optimization
- Real-time views of workflow trackers and employee status
- Recurring issue identification and resolution leading to application stability
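For readers who want to track the same metric, mean time to repair is the average time between detecting an incident and resolving it. The sketch below shows one way to compute it and the percentage improvement between two periods. The function names are hypothetical, and any incident timestamps passed in would be your own data, not figures from this engagement.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so that timedeltas can be summed
    )
    return total / len(incidents)


def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before` (both timedeltas)."""
    return 100.0 * (before - after) / before
```

With illustrative numbers, a baseline MTTR of 50 hours and a later MTTR of 15 hours give `percent_reduction(timedelta(hours=50), timedelta(hours=15))`, which works out to a 70% reduction.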
To learn how SmartCirqls can help your organization optimize processes through the power of Machine Data Analytics, contact us at [email protected]