Rethinking Ecommerce Performance Management

Published March 10, 2020

Complex, dynamic applications drive online businesses, and application performance has a direct impact on business KPIs because it determines end-user experience. Performance monitoring tools play a crucial role in shaping that experience, and they have evolved alongside performance management strategies. Application availability, reachability, reliability and performance are the central pillars of digital end-user experience monitoring.

The evolution of performance management has pushed the need for a more proactive monitoring mindset, especially when preparing for high-traffic events.

Ecommerce Performance Monitoring

The digital landscape is evolving continuously – everything from the way applications are built and the infrastructure they run on to the skills involved in building, deploying and maintaining them. Applications are no longer monolithic; they have transitioned to architectures built on microservices, which offer far more flexibility in building and deploying applications. Teams can work exclusively on specific features without impacting the overall application.

IT infrastructure has also evolved to support this new breed of applications, moving from on-premises data centers to multi-cloud, multi-CDN and hybrid models. Edge computing is the next step in this evolution, shifting the spotlight to edge content delivery, networking, security and edge performance monitoring.

Even with all these changes, delivering a great customer experience remains the focus, and application performance is central to maintaining it. The definition of a good end-user experience has also changed over the last few decades: there was a time when a page that loaded in 10 seconds was acceptable, but ecommerce giants like Amazon have redefined customer experience, and the acceptable page load time is now below 2 seconds.

Considering today's highly distributed and complex architectures, there is a pressing need to rethink performance monitoring so that it provides insightful data and analysis. Performance monitoring is crucial because it:

  1. Mitigates the impact on revenue
     • Every 1 second of performance improvement increases conversions by 2%
     • Every 100 ms of performance improvement grows incremental revenue by up to 1% (the sketch after this list puts rough numbers on these figures)
     • It improves SEO for entry pages and reduces bounces
  2. Protects your brand value
     • Downtime costs $8,000/minute – roughly $800,000 per incident
     • Downtime can have a significant impact on brand value
  3. Saves IT productivity
     • IT spends less time firefighting performance issues
     • Teams can focus on building, deploying and marketing products and services
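To make these revenue figures concrete, here is a minimal back-of-the-envelope sketch in Python. The store's traffic, conversion rate and order value are hypothetical assumptions; only the 2%-per-second and $8,000/minute figures come from the list above.

```python
# Back-of-the-envelope impact of performance on revenue.
# The store numbers below are hypothetical; the rates come from the post.

DOWNTIME_COST_PER_MINUTE = 8_000  # dollars per minute, as cited above

def conversion_uplift(monthly_sessions: int, conversion_rate: float,
                      avg_order_value: float, seconds_saved: float) -> float:
    """Extra monthly revenue from a 2% relative conversion lift per second saved."""
    baseline_orders = monthly_sessions * conversion_rate
    lifted_orders = baseline_orders * (1 + 0.02 * seconds_saved)
    return (lifted_orders - baseline_orders) * avg_order_value

# A hypothetical mid-size store: 500k sessions/month, 2.5% conversion, $80 orders.
print(f"Uplift from 1s faster pages: ${conversion_uplift(500_000, 0.025, 80.0, 1.0):,.0f}/month")
print(f"Cost of a 100-minute outage: ${100 * DOWNTIME_COST_PER_MINUTE:,.0f}")
```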

Preparing for High-Traffic Events

Before the Event

When prepping for a peak event such as Black Friday, there are six important areas to address:

1. Building effective strategies: Consider a multi-tenant application architecture that is resilient and delivers better performance. Invest in caching and failover strategies.

2. Caching: (i) Offload cache to CDNs, (ii) improve cache hit ratios on the CDN and tiered cache proxies, and (iii) revisit cache-busting scripts.

3. Failover implementations: (i) Delayed DC/cloud failovers, (ii) application-level failovers, and (iii) waiting room implementations to protect the customer experience.

4. Preparation and testing: Stress test the application to understand how performance varies under different traffic volumes and to identify bottlenecks. This includes testing all third-party services, from your CDN provider to the monitoring tools you use (see the load-test sketch after this list).

5. Implement performance monitoring: Build simplified, single-pane dashboards that give a clear picture of the entire delivery chain. Keep log aggregation intervals consistent across the different layers of the infrastructure.

6. Alert configuration: Identify and set up relevant alert types and severities, and map alerts to the right team so that they are addressed immediately (a minimal alert-rule sketch also follows this list).
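As a starting point for the stress-testing phase above, here is a minimal concurrent load-test sketch in Python. The target URL, request count and concurrency are placeholder assumptions; a real event prep would use a dedicated load-testing tool and production-like traffic shapes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TARGET = "https://www.example.com/"  # hypothetical; point at a staging endpoint

def timed_request(_: int) -> float:
    """Fetch the target once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urlopen(TARGET, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

# Fire 200 requests through 20 concurrent workers, then report percentiles.
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = sorted(pool.map(timed_request, range(200)))

for pct in (50, 95, 99):
    idx = min(len(latencies) - 1, int(len(latencies) * pct / 100))
    print(f"p{pct}: {latencies[idx] * 1000:.0f} ms")
```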
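And for the alert-configuration phase, here is a minimal sketch of threshold-based alert rules routed to owning teams. The rule definitions, thresholds and team names are illustrative assumptions, not any specific monitoring product's API.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str       # name of the measured metric
    threshold: float  # value above which the alert fires
    severity: str     # e.g. "warning" or "critical"
    team: str         # team that owns the response

# Hypothetical rules mapping alert types and severities to owning teams.
RULES = [
    AlertRule("p95_page_load_ms", 2_000, "warning", "frontend"),
    AlertRule("error_rate_pct", 1.0, "critical", "sre"),
]

def evaluate(samples: dict[str, float]) -> None:
    """Print a routed alert for every rule whose threshold is breached."""
    for rule in RULES:
        value = samples.get(rule.metric)
        if value is not None and value > rule.threshold:
            print(f"[{rule.severity}] {rule.metric}={value} -> notify the '{rule.team}' team")

evaluate({"p95_page_load_ms": 2_450, "error_rate_pct": 0.4})
```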

During the Event

During the event, ensure you have a support team on call and ready to act when needed. Escalation policies should be agreed on as part of the prep; this makes it easier for different teams to communicate critical issues without delay. There should also be an effective plan of action in place to handle a crisis.

After the Event

Once the event is over, it is important to conduct a retrospective analysis of the performance data and incidents from the event. This helps you answer:

  • Did everything go according to plan?
  • What could have been done better?
  • How did the infrastructure handle the traffic and load?
  • How does the stress-test data gathered during prep compare to the data from the event itself?

The performance data can also be used to benchmark different metrics that will help you prepare better for the next peak event.

Proactive Monitoring for Improved Performance

There are multiple third-party infrastructure and service providers in the industry, and the adoption of services such as multi-DNS, multi- and hybrid-cloud, multi-CDN, and others means that when everything is operating smoothly, end users get content and services delivered to them faster than ever before.

However, these developments in architecture and digital delivery come at a cost. Every additional layer in the delivery chain adds complexity, introduces visibility gaps, and reduces IT teams' ability to understand how infrastructure health is affecting the end-user experience. Whenever there is a disruption in the delivery chain, these teams are often left scrambling to identify the root cause.

Proactive monitoring essentially eliminates the blind spots created by the many components in the delivery chain. Root cause analysis becomes easier because IT teams can correlate and analyze data effectively, and the performance data helps identify and resolve bottlenecks. Third-party integrations can be monitored, and you can hold service providers accountable for any SLA breaches (a simple SLA compliance check is sketched below). Proactive monitoring is especially useful during A/B testing, as you can evaluate the performance of each component.
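To illustrate holding providers accountable, here is a minimal sketch that compares measured availability from synthetic probes against an SLA target. The probe outcomes and the 99.9% target are hypothetical.

```python
SLA_TARGET = 0.999  # hypothetical 99.9% availability promised in the provider's SLA

# Hypothetical synthetic-probe outcomes over a month: True means the probe succeeded.
probe_results = [True] * 9_985 + [False] * 15

availability = sum(probe_results) / len(probe_results)
print(f"Measured availability: {availability:.4%}")

if availability < SLA_TARGET:
    print(f"SLA breach: below the {SLA_TARGET:.1%} target - raise it with the provider")
```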

Tips for Improved Performance Monitoring

Proactive monitoring is a must when preparing for high-traffic events. We suggest a five-step process for effective performance monitoring:

1. Measure everything: Latency or downtime can be introduced at any layer of the application. Critical endpoints, microservices and tag management tools are all potential bottlenecks. Monitor every component, including every third party, so that there is end-to-end performance visibility. Measuring real user performance alongside synthetic monitoring helps correlate performance trends and understand user behavior.

2. Benchmark: Benchmarking is essential to performance monitoring. It helps you understand industry best practices and lets you evaluate multiple service providers to identify those with ideal performance. The trends from benchmarking provide insights that help improve the performance of your application.

3. Establish a baseline: A performance baseline is the expected performance of an application/service under certain conditions. With this information, we can determine:

- Expected performance when there is a surge in traffic.

- How to scale our application and services.

- How a new version of the application/service is performing compared to a previous version.

By baselining data, we will learn to:

- Look beyond averages and understand percentiles (see the percentile sketch after this list).

- Look at historical data and analyze trends.

4. Identify optimization areas: There are hundreds of performance metrics, but measuring every single one does not help. Each performance scenario calls for a set of metrics relevant to that scenario, which makes the data easier to understand and correlate without poring through unnecessary information. So identify the areas that need optimization, focus on the optimization methodology, and pick only the metrics that matter.

5. Tie to business KPIs: When trying to improve performance, start with the business KPIs, look at historical data trends and patterns, and then examine the metrics that impact those KPIs. You can then generate performance budgets to build processes that keep a focus on performance across the project lifecycle (a minimal budget check is sketched below).
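To illustrate the point about looking beyond averages in step 3, here is a minimal percentile sketch in Python. The latency samples are made up; the point is that a healthy-looking average can hide a slow tail that real users feel.

```python
import statistics

# Hypothetical page-load samples in ms: most are fast, but a tail is very slow.
samples = [800] * 90 + [6000] * 10

def percentile(data: list[int], pct: float) -> float:
    """Nearest-rank percentile of the data."""
    ordered = sorted(data)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]

print(f"mean: {statistics.mean(samples):.0f} ms")  # 1320 ms - looks acceptable
print(f"p50:  {percentile(samples, 50):.0f} ms")   # 800 ms
print(f"p95:  {percentile(samples, 95):.0f} ms")   # 6000 ms - the tail users feel
```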
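And for step 5, here is a minimal sketch of a performance budget check of the kind that could run in a build pipeline. The budget limits and the measured values are hypothetical assumptions.

```python
# A hypothetical performance budget: metric -> maximum allowed value.
BUDGET = {
    "page_weight_kb": 1_500,
    "time_to_interactive_ms": 3_000,
    "third_party_requests": 30,
}

def check_budget(measured: dict[str, float]) -> bool:
    """Return True only if every measured metric stays within its budget."""
    ok = True
    for metric, limit in BUDGET.items():
        value = measured.get(metric)
        if value is not None and value > limit:
            print(f"FAIL {metric}: {value} exceeds the budget of {limit}")
            ok = False
    return ok

# Example run with hypothetical measurements from a test build.
if not check_budget({"page_weight_kb": 1_750, "time_to_interactive_ms": 2_400}):
    raise SystemExit(1)  # fail the build when the budget is blown
```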

These five steps are essential when prepping for a peak event to ensure a great end-user experience. Remember that no matter how great the tools are, they will count for little if organizations lack visibility into the health and reliability of each of the pieces that make up the whole application.

To conclude, we believe that performance management must be treated as a year-round priority, and the performance strategies you implement should help you:

  • Gain performance visibility
  • Analyze and learn from the data
  • Implement changes and improve consistently
