Automation-centric monitoring for cloud-scale infrastructure

I just published an ACG Research market impact report on Juniper Networks’ AppFormix monitoring and automation solution for intent-driven cloud-scale infrastructure. The report examines the ramifications for data center operators of the highly dynamic, cloud-scale application deployment environments I described in the “3 D’s” of hybrid and multi-cloud application deployment.

Data center operators have access to a wealth of tools for application, infrastructure and network monitoring, provided by numerous vendors and open source initiatives. Yet the current generation of tools falls short in helping operators overcome the challenges of managing cloud-scale application deployment, which is characterized by massive scale, software-driven complexity and highly dynamic run-time environments in which workloads and resources fluctuate constantly. These operators need real-time, full stack monitoring that spans the entire environment, from the application layer down through the infrastructure and the network.

They also need tools that can remove time-consuming manual workflows from the remedial action feedback loop. Infrastructure monitoring and analytics should feed actionable insights directly to the orchestration layer to automate the process of taking action in response to anomalies or changing conditions by reallocating resources or redistributing workloads. In other words, infrastructure monitoring needs to move from operator-centric to automation-centric.

Collecting and analyzing full stack monitoring data in real time is a Big Data problem. Juniper Networks’ AppFormix takes an innovative approach to solving it: the solution utilizes the distributed computing resources inherent in cloud-scale infrastructure to perform machine learning locally on the metrics extracted from each node, significantly reducing the volume of data streamed to the central analytics engine and database.
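To make the pattern concrete, here is a minimal sketch of the general idea rather than AppFormix’s actual implementation: each node maintains its own statistical baseline for a metric (an exponentially weighted moving average in this hypothetical example) and streams only anomalies and periodic summaries to the central engine instead of every raw sample.

```python
# Minimal sketch of node-local analysis; illustrative only, not AppFormix code.
from dataclasses import dataclass

@dataclass
class LocalMetricAnalyzer:
    """Maintains a running baseline for one metric on the node itself."""
    alpha: float = 0.1        # EWMA smoothing factor
    threshold: float = 3.0    # flag samples this many std-devs from the baseline
    min_samples: int = 5      # warm-up period before anomalies are reported
    mean: float = 0.0
    var: float = 0.0
    samples: int = 0

    def observe(self, value: float):
        """Update the local baseline; return an event only if the sample is anomalous."""
        self.samples += 1
        if self.samples == 1:
            self.mean = value
            return None
        deviation = value - self.mean
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        std = self.var ** 0.5
        if self.samples > self.min_samples and std > 0 and abs(deviation) > self.threshold * std:
            # Only anomalies (plus periodic summaries, omitted here) are streamed
            # to the central analytics engine, instead of every raw sample.
            return {"value": value, "baseline": round(self.mean, 2)}
        return None

# Example: only the exceptional CPU reading leaves the node.
analyzer = LocalMetricAnalyzer()
for cpu in [22, 24, 23, 25, 24, 91]:
    event = analyzer.observe(cpu)
    if event:
        print("stream to central engine:", event)
```

The trade-off is straightforward: a small amount of local compute on every node in exchange for a large reduction in the volume of telemetry crossing the network to the central analytics engine and database.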

Providers of infrastructure monitoring solutions are busy incorporating machine learning and Big Data analytics into their products. However, in addition to its unique approach to Big Data analytics, what differentiates Juniper Networks’ AppFormix is the integration of analytics-driven, policy-based control that continuously monitors key metrics against pre-defined SLAs and automatically triggers the orchestration layer to make the adjustments necessary to assure the operator’s business objectives. The net result is automation-centric monitoring for intent-driven cloud-scale infrastructure.
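As a rough illustration of what analytics-driven, policy-based control looks like, here is a hedged sketch of one pass through a closed control loop; the policy values, metric names and orchestrator call are hypothetical placeholders, not AppFormix APIs.

```python
# Hedged sketch of analytics-driven, policy-based control: live metrics are checked
# against pre-defined SLA policies, and violations trigger the orchestration layer.
# Policy values, metric names and the orchestrator call are hypothetical placeholders.

SLA_POLICIES = {
    "api-tier": {"metric": "p95_latency_ms", "max": 200, "action": "scale_out"},
    "db-tier":  {"metric": "disk_util_pct",  "max": 80,  "action": "rebalance"},
}

def trigger_orchestrator(service: str, action: str) -> None:
    """Placeholder for a call into the orchestration layer (e.g., its REST API)."""
    print(f"orchestrator: '{action}' requested for {service}")

def enforce(policies: dict, latest_metrics: dict) -> None:
    """One pass of the control loop; in practice this runs continuously."""
    for service, policy in policies.items():
        value = latest_metrics[service][policy["metric"]]
        if value > policy["max"]:
            # SLA violated: remediation is triggered without an operator in the loop.
            trigger_orchestrator(service, policy["action"])

# Simulated metrics from the monitoring layer (streaming telemetry, agents, etc.).
enforce(SLA_POLICIES, {
    "api-tier": {"p95_latency_ms": 340},   # violates the 200 ms SLA -> scale out
    "db-tier":  {"disk_util_pct": 61},     # within policy -> no action
})
```

The operator expresses intent as an SLA policy and the loop enforces it automatically, which is exactly the shift from operator-centric to automation-centric monitoring described above.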

For more information, watch the ACG Research Hot Seat video with Sumeet Singh, AppFormix founder and VP engineering, Juniper Networks.

Countdown to ONUG Fall 2017

Looking forward to participating in the ONUG Fall 2017 conference in New York, October 17 & 18.

I’ll be one of three judges for ONUG’s new Right Stuff Innovation Awards, which will be awarded to the companies presenting in the PoC Theater who are developing innovative products and solutions most closely aligned with the guidelines published by two ONUG working groups: Monitoring & Analytics (M&A) and Software-Defined Security Services (S-DSS).

As I wrote in this blog, ONUG is now expanding its focus to helping enterprise IT managers address the challenges of hybrid multi-cloud application, infrastructure and network deployments, fostering cross-industry vendor collaboration in order to drive the development of more open, software-driven and cost-effective solutions.

I’m also moderating a Birds of a Feather session on Software-Defined Application, Infrastructure and Network Monitoring, with a strong focus on hybrid multi-cloud environments. This will be a forum for enterprise DevOps, ITOps, NetOps and SecOps users to share their experiences, challenges, concerns and recommendations with the vendor community. The session will immediately follow the Monitoring & Analytics Meetup where the M&A working group will discuss the progress ONUG is making in this operationally critical area.

Hope to see you in New York in just over a week, and if you are able to attend, be sure to check out some of the PoC Theater presentations and the vendors exhibiting in the Technology Showcase.

If you plan to attend the conference, register using discount code ACG30 to save 30%.

The “3 D’s” of hybrid and multi-cloud application deployment

While describing the challenges of enterprise IT application development in his FutureStack keynote, New Relic CEO Lew Cirne addressed the key question: “How to go fast at scale?” He pointed out that it’s not uncommon for DevOps shops to perform HUNDREDS of application deploys per DAY, while larger outfits deploy thousands. Listening to Lew describe how New Relic’s customers are rapidly developing and deploying cloud-based applications, it really hit me again that “Toto, we’re not in Kansas anymore”.

This got me thinking about the “3 D’s” of cloud application deployment:

  1. Dynamic
  2. Distributed
  3. Diverse

Let’s explore each of these and the challenges they are creating for DevOps, ITOps, SecOps and NetOps teams charged with deploying, securing, monitoring and managing hybrid and multi-cloud applications along with the underlying application and network infrastructure.

Dynamic. The basic premise of DevOps is that small, highly focused teams are working separately, but in parallel, continuously developing and deploying independent parts that make up a greater whole. This process is dynamic by its very nature, with some teams doing hundreds of deploys per day. More importantly, application run-time environments are becoming increasingly dynamic. In a Docker environment, new containers can be spun up and down in seconds, driven by the ebb and flow of application demands. In a microservices architecture, in which applications are composed of small, modular services, the interactions between the microservices themselves are inherently dynamic and unpredictable as new application capabilities are created by different combinations of the supporting microservices.
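To make the point tangible, here is a tiny illustration using the Docker SDK for Python; it assumes a local Docker daemon and uses nginx purely as a stand-in workload.

```python
# Minimal illustration of how quickly container capacity can come and go,
# using the Docker SDK for Python (assumes a local Docker daemon is running).
import docker

client = docker.from_env()

# Scale out: start a few extra workers in response to rising demand.
workers = [client.containers.run("nginx:alpine", detach=True) for _ in range(3)]

# Scale in: tear them down again seconds later when demand ebbs.
for worker in workers:
    worker.stop()
    worker.remove()
```

Now imagine an orchestrator making these decisions automatically across hundreds of services, and it becomes clear why the run-time environment is changing shape continuously.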

Distributed. Hybrid and multi-cloud environments are highly distributed, with applications and data possibly residing on-premise in legacy three-tier data centers, on-premise in private clouds built using cloud-scale architectures, or in one or more public clouds utilizing SaaS, PaaS and IaaS capabilities as well as serverless computing. In addition, the underlying cloud compute and application infrastructures are highly distributed in order to ensure high availability and easily scale compute and storage capacity on demand. The interactions between application components distributed across these different environments can be very complex, both within a given data center and over the network between data centers. We truly live in an age when “the network is the computer”.

Diverse. Application development is highly diverse, with enterprise IT developers using many different programming languages and run-time environments, including bare metal servers, virtual machines and containers. There are also multiple software frameworks used to implement these environments, and developers may mix and match various components to create their own custom stacks. Each cloud service provider offers its own set of application services, supported by its own full stack and characterized by a comprehensive set of APIs. There are also many different ways data can be stored and queried, ranging from legacy relational databases to the latest NoSQL Big Data repositories.

Combined, these “3 D’s” are creating serious challenges for enterprise operations teams and have put a premium on monitoring and analytics solutions that provide real-time visibility into what is happening at the application, infrastructure and network layers, and that correlate anomalies and events at one layer with observed behavior and conditions at another. I think it’s safe to say “we’re not in Kansas anymore”!

Returning to FutureStack, Lew closed his keynote by describing the challenge of “interconnectivity” in “3 D” environments and the use of instrumentation for “transaction tracing” to map out the flow of service execution and identify problematic services that may be negatively impacting overall performance. Lew noted that in this area New Relic is leveraging OpenTracing, an open source project hosted by the Cloud Native Computing Foundation.
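For readers who haven’t worked with transaction tracing, here is a minimal, hedged sketch using the OpenTracing Python package; it relies on the built-in no-op tracer and invented operation names, so it shows the shape of the instrumentation rather than New Relic’s implementation.

```python
# A minimal OpenTracing sketch: each service operation is wrapped in a span so the
# end-to-end flow of a transaction can be reconstructed. The no-op Tracer is used
# here; a real deployment would plug in a concrete tracer implementation instead.
import opentracing

tracer = opentracing.Tracer()  # no-op tracer; replace with a real implementation

def handle_checkout(order_id: str) -> None:
    root = tracer.start_span("checkout")
    root.set_tag("order.id", order_id)
    try:
        # Downstream calls become child spans, so slow services stand out in the trace.
        with tracer.start_span("charge-payment", child_of=root):
            pass  # call the payment service here
        with tracer.start_span("reserve-inventory", child_of=root):
            pass  # call the inventory service here
    finally:
        root.finish()

handle_checkout("A-1001")
```

Each downstream call becomes a child span of the transaction’s root span, so when the trace is reassembled, a slow payment or inventory service stands out immediately.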

The interconnectivity problem is yet another reason why the solutions that New Relic and other APM vendors are developing are so critical. If DevOps and ITOps teams don’t have the tools they need to monitor and manage large-scale deployments of highly dynamic and distributed applications across heterogeneous environments, enterprise IT won’t be able to “go fast at scale”. The result will be higher operating expenses, lost business opportunities and a serious drag on digital transformation initiatives.

FutureStack – New Relic’s customer conference

I recently attended New Relic’s FutureStack customer conference in New York City, which was a well-organized event with great content delivered by subject matter experts, including many New Relic customers. It was my first engagement with the New Relic team and a good opportunity to take an in-depth look at the world of visibility and analytics top-down from the perspective of application performance monitoring (APM).

New Relic is a fast growing leader in the APM market, with revenue of $263.5 million in fiscal 2017, up 45% from fiscal 2016. More than 16,000 customers worldwide use New Relic’s SaaS-based product suite, including 40% of the Fortune 100. Company founder and CEO Lew Cirne was a pioneer in the modern APM market, founding Wily Technology almost 20 years ago. It was refreshing to hear that Lew is still a developer at heart and takes regular week long sabbaticals to work on ideas for new products.

New Relic offers a complementary set of products that serve as a “Digital Intelligence Platform” across three inter-related domains: digital user experience, application performance and infrastructure monitoring. The company’s core technology and expertise is embodied in its APM product line, which is used to instrument applications written in the leading programming languages and running across a wide range of execution environments. In his keynote, Lew emphasized that New Relic’s approach is to “instrument everything” so that DevOps teams always have full visibility into the behavior and performance of all applications. He noted that the old rule was nothing goes into production without a full QA cycle, but the new rule is no application should be deployed without complete instrumentation.

New Relic also provides several products for monitoring user experience by instrumenting mobile applications and browsers, including synthetic monitoring solutions that can proactively detect problems before users are impacted. Last year, the company moved into infrastructure monitoring, extending beyond basic server/OS monitoring to integrate a wide range of cloud-native application services provided by AWS and Microsoft Azure. Together, the full suite of New Relic products enables development and IT operations teams to see a complete picture of application behavior and performance, from the endpoint to the execution environment and the underlying service infrastructure.

How does New Relic make sense of all the metrics and event data that are extracted using this ubiquitous instrumentation? “Applied intelligence” is the other side of the “instrument everything” coin, and this is where New Relic is doing impressive work with Big Data and real-time analytics. The company operates its own cloud infrastructure to deliver SaaS-based services to its customers. In order to be able to ingest, process and store the massive amount of metric and event data collected from customer applications, New Relic built its own high performance, multi-tenant, Big Data database from the ground up. The system currently processes on average 1.5 BILLION metrics and events per MINUTE. That’s a whole lot of data and speaks to why I believe SaaS-based analytics is the preferred approach for the vast majority of Big Data monitoring solutions, for several reasons.

First, SaaS solutions have significantly lower up front costs and can be deployed rapidly. Second, the elastic nature of the cloud allows the customer to rapidly scale monitoring, on-demand. Third, Big Data technology is a moving target and a SaaS solution shields the customer from having to deal with software updates and hardware upgrades, in addition to possible technology obsolescence. Last, and perhaps most importantly, since applications are migrating to the cloud, monitoring and analytics should follow. Given the option of a cloud-based Big Data monitoring solution, I can’t think of a good reason why mainstream enterprise IT organizations would choose to deploy on-premise.

Applied intelligence is surfaced through New Relic’s Insights product, which visualizes application data in user-customizable dashboards that were showcased by customers in the main tent session that concluded the conference. Under the hood, New Relic has employed advanced statistical analysis and other techniques for correlating data extracted from user experience, application and infrastructure monitoring.

One example is RADAR, a new Insights feature that was introduced at FutureStack. RADAR “looks ahead and behind” to automatically glean useful intelligence for “situational awareness” that might not be readily apparent to customers looking at the usual dashboards. The analytics software acts like an intelligent assistant, constantly searching for anomalies and conditions that the customer might overlook or not discover until it’s too late. Not necessarily AI in the strictest use of the term, but certainly just as helpful.

FutureStack was also a great forum for learning how many leading enterprise IT organizations are embracing DevOps for application deployments spanning hybrid and multi-cloud environments, but I’ll wrap up my thoughts on FutureStack in my next post with a closer look at this far-reaching trend and its market impact.

“Toto, we’re not in Kansas anymore”

Last week I attended the Open Networking User Group (ONUG) workshop held at NYU in Manhattan. One highlight was Lakshmi Subramanian’s presentation on the impressive and relevant work being done by the researchers in NYU’s Open Networks and Big Data Lab. Lakshmi is also spearheading industry education and training programs in networking, cloud computing, security and Big Data to help address the growing technical skills gap that enterprise IT organizations face as they embrace new application development and delivery paradigms that would have been hardly conceivable ten years ago.

ONUG co-chair Nick Lippis kicked off the workshop with an overview of the upcoming ONUG Fall 2017 event, which will be held in New York City October 17 & 18. Nick described how ONUG’s charter now extends beyond open networking to the full stack of software-defined infrastructure needed to deploy and support a myriad of enterprise IT applications in complex hybrid and multi-cloud environments. As Nick was talking, it brought to mind Dorothy’s line after the tornado drops her down in the Land of Oz: “Toto, I have a feeling we’re not in Kansas anymore.”

Faced with a bewildering array of new software technologies and cloud services, combined with the breakneck pace of innovation, there must be times when IT managers feel like they’ve suddenly landed in a metaphorical Oz, but unlike Dorothy, they don’t have magic ruby slippers to transport them safely home to Kansas. Instead, they need to acquire the skills, tools and know-how to thrive in this amazing new world.

Open networking and open compute platforms have proven to be key enablers for migrating enterprise IT applications to the cloud, but ONUG now has four active working groups whose members are collaborating to identify and map out additional user challenges and critical success factors in other areas of interest:

  • Open SD-WAN Exchange (OSE)
  • Monitoring & Analytics (M&A)
  • Software-Defined Security Services (S-DSS)
  • Hybrid Multi-Cloud (HMC)

SD-WANs promise to upend the legacy enterprise WAN model and deliver services that are more flexible, adaptable and responsive to the demands of hybrid and multi-cloud applications, while allowing enterprises to leverage ubiquitous, high-speed Internet connectivity for SaaS applications and other cloud-based services. However, with so many different vendors developing SD-WAN products and solutions, interoperability is a key concern for enterprise users, and that is the focus of the OSE working group.

The M&A working group is looking at the tools and techniques needed for application, infrastructure and network monitoring, including new technologies like software-based instrumentation, streaming telemetry, Big Data and real-time analytics. Monitoring needs to extend from the legacy on-premise data center and private enterprise WAN, to private clouds built using cloud-scale infrastructure, across multiple public cloud services and to SaaS applications. This is where I spend a lot of my time these days, and it looks nothing like Kansas to me!

The S-DSS working group is developing a security architecture framework that is intent-based and wraps security policies around workloads that are independent of the underlying compute infrastructure, portable across multiple environments and not tied to physical locations. This work is important because security will ultimately be the gating factor for large-scale hybrid and multi-cloud deployment of mission critical applications.

The focus of the HMC working group brings us back to my Dorothy analogy. This team is looking at the full spectrum of business, people, security, regulatory and technology issues that IT organizations must address in order to successfully migrate their applications to hybrid and multi-cloud environments. Most mainstream IT managers are still living happily in Kansas, but the tornado is coming and before too long they will find themselves in the Land of Oz. Hopefully the HMC working group guidelines and recommendations will help them successfully navigate the complex array of issues they will be facing.

I hope you are able to attend ONUG Fall 2017 in October. The conference features many sessions with enterprise trailblazers and thought leaders who are pushing the envelope and operationalizing hybrid and multi-cloud application deployment. There will also be a series of vendor proof-of-concept presentations and demos, as well as “Right Stuff” awards for vendors in the vanguard who are providing monitoring and security solutions that address key operational requirements as specified by the M&A and S-DSS working groups.

One last thing. You won’t see any flying monkeys at the event, but there’s always a chance Glinda, the Good Witch of the South will make an appearance.

Deeper visibility into Deepfield

I just watched today’s SDxCentral Nokia Deepfield DemoFriday webinar, featuring Nokia GM and Deepfield architect Dr. Craig Labovitz, who described the product and demonstrated some of its features. Nokia acquired Deepfield earlier this year, and is now disclosing more information about Deepfield and how it fits into Nokia’s IP/optical networks portfolio, which Craig and others described at last month’s IP Networks Reimagined announcement (see my recent blog post).

I’ve been tracking Deepfield since I launched my ACG practice over a year ago, and had been briefed by the company prior to the acquisition, but as Craig acknowledged, Deepfield had been fairly secretive about the product and its technology. So it was good to finally see a demonstration of the actual product and hear Craig describe its capabilities in more detail.

Raising Deepfield’s profile is a good move by Nokia, whose global footprint will enable it to sell the product well beyond North America, where Deepfield is deployed by many leading service providers; Deepfield also has customers in Europe.

The premise for Deepfield is straightforward:

  1. The Internet has become much more complicated in the last 10 years, with complex network topologies, particularly in the metro area, the deployment of CDNs, the explosion of streaming video, and the adoption of real-time voice and video communications. The big shift is from the Internet as a set of pipes for best-effort bit delivery to a reliable end-to-end transport mechanism for high quality content and services with assured quality and performance.
  2. But what tools are available for service providers to deal with this shift? Deepfield recognized early on that advances in network instrumentation, streaming telemetry and Big Data analytics made it feasible to build a software-only platform for network visibility & analytics that was more powerful and yet more cost-effective than solutions employing DPI probes and monitoring appliances.

I would encourage those who are interested to watch a replay of the webinar, but here are some of the highlights:

  1. Deepfield uses “connectors” to implement southbound interfaces that collect data from a disparate array of sources, including many sources of telemetry data from the network itself, service provider data from OSS/BSS, customer care and billing systems, and data from Deepfield’s “Cloud Genome”, which maintains an up-to-date map of all of the content sources, services and devices on the Internet.
  2. Deepfield supports a petabyte-scale Big Data analytics engine for multi-dimensional, real-time analytics. Craig demonstrated how the system tracks network traffic by content source, website and application type, as well as by network or CDN, and generates intuitive visualizations of traffic load using built-in reports and in response to ad-hoc queries.
  3. Deepfield supports four main use cases: real-time QoE, network engineering, customer care and network security/DDoS. These are implemented as Deepfield applications that leverage a set of northbound interfaces from the core analytics engine. Craig pointed out that these interfaces are also used to feed actionable intelligence to external systems supporting these various use cases (a simple sketch of the underlying enrich-and-aggregate pattern follows this list).
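To illustrate the general enrich-and-aggregate pattern at work here (this is emphatically not Deepfield’s code, and the data and mappings are invented), consider the following sketch, in which raw flow records are tagged with a known content source and rolled up across several dimensions:

```python
# Illustrative sketch of the general pattern (not Deepfield's implementation):
# enrich raw flow records with a map of known content sources, then aggregate
# traffic by multiple dimensions for real-time reporting.
from collections import defaultdict

# Hypothetical stand-in for a "Cloud Genome"-style map of prefixes to content sources.
CONTENT_MAP = {"203.0.113.0/24": "video-cdn-a", "198.51.100.0/24": "social-app-b"}

def classify(src_ip: str) -> str:
    """Toy prefix match; a real system resolves billions of addresses."""
    for prefix, source in CONTENT_MAP.items():
        if src_ip.startswith(prefix.rsplit(".", 1)[0]):
            return source
    return "unknown"

def aggregate(flows):
    """Roll up bytes by (content source, application, ingress router)."""
    totals = defaultdict(int)
    for flow in flows:
        key = (classify(flow["src_ip"]), flow["app"], flow["router"])
        totals[key] += flow["bytes"]
    return totals

flows = [
    {"src_ip": "203.0.113.17", "app": "video", "router": "metro-1", "bytes": 1_200_000},
    {"src_ip": "198.51.100.9", "app": "social", "router": "metro-2", "bytes": 80_000},
]
for (source, app, router), octets in aggregate(flows).items():
    print(source, app, router, octets)
```

Scale this toy example up to billions of IP addresses, tens of thousands of content sources and petabytes of telemetry, and you get a sense of why Deepfield treats this as a Big Data problem.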

It was clear from Craig’s brief demo that Deepfield’s software is a powerful tool for service providers, enabling them to gain in-depth, multi-dimensional, real-time visibility into traffic flowing across their networks and the Internet. Without the ability to gain this level of visibility, network operators would be flying blind and likely have a difficult time monitoring network performance and ensuring digital QoE for content and service delivery.

The webinar was light on implementation details, but Craig did say that the software can run on a cluster of Linux servers in a customer’s data center or can be hosted in the Amazon cloud as a SaaS-based service. Naturally, I’m keen to learn more about the full stack supporting real-time Big Data analytics and how the software is typically deployed operationally by service providers. However, it was good to gain deeper visibility into Deepfield, and I look forward to learning more.

Nokia couples cloud-scale network visibility & analytics for network automation

I attended Nokia’s IP Networks Reimagined event in June, where the company announced new 7750 SR-s IP core routers based on its new FP4 network processor chip, both impressive technical achievements in their own right.

However, what really got my attention is how Nokia is integrating the technology obtained via the Deepfield acquisition to directly couple cloud-scale network visibility with Big Data analytics for security, performance and network automation.

Deepfield’s petabyte-scale Big Data analytics engine provides visibility into tens of thousands of Internet applications and services as well as billions of IP addresses, mapping what it calls the Cloud Genome. The software is currently used by many leading service providers for DDoS protection and traffic engineering.

Nokia designed the FP4 chip so it can look anywhere inside packets to extract real-time flow telemetry data. This data, along with machine data and network state provided by Nokia’s SR OS router software, feeds the Deepfield analytics engine, which derives insights used to determine the actions taken by Nokia’s NSP, an SDN-based network automation and management platform.

Using real-time network visibility & analytics to derive actionable intelligence that drives network automation is the industry’s “holy grail”. Nokia has articulated its vision for achieving this goal, and I’m keen to learn more about how these three pieces fit together.

For more information about Deepfield, be sure to tune in to Nokia Deepfield DemoFriday at SDxCentral this Friday, July 14, where Deepfield architect and Nokia GM Dr. Craig Labovitz will demonstrate the product’s capabilities.

NETSCOUT embraces disruption by porting packet flow visibility software to Open Compute platforms

NETSCOUT recently announced new nGenius PFS 5000 network packet brokers based on off-the-shelf Open Compute platforms. Big Switch Networks blazed this trail back in 2013 with the launch of its Big Monitoring Fabric, but now NETSCOUT, which sells a family of purpose-built network packet broker platforms, is embracing disruption by porting its packet flow visibility software to white box switches, giving customers a more cost-effective, easily scalable solution for network-wide visibility.

The benefits of this approach are described in the ACG white paper I authored: “Open Compute Platforms Power Software-Driven Packet Flow Visibility”.

Note that while Big Switch’s BMF is based on the classic SDN architecture using a central controller and the OpenFlow protocol, NETSCOUT has taken a different approach with the PFS 5000, which is based on a fully distributed, mesh architecture that is self-organizing and self-healing.

It will be interesting to watch this market segment evolve as the power of switching platforms based on merchant silicon continues to increase and other network packet broker vendors embrace the disruption of Open Compute.

Visibility & analytics at the ONUG Spring 2017 conference

I was invited to speak about “A Framework for Infrastructure Visibility, Analytics and Operational Intelligence” at the Open Networking User Group’s ONUG Spring 2017 conference, held in San Francisco back in April. My presentation is up on Slideshare and ONUG has posted a video of the session.

My goal was to stimulate thinking about how we bring the power of Big Data to infrastructure monitoring and analytics by creating a common framework for tools to share visibility data from an array of sources and feed this data into a set of shared analytics engines to support various operational use cases.

It’s not economically feasible, nor is it technically desirable, for each tool to bring its own Big Data analytics stack and ingest dedicated streaming telemetry feeds. As an industry, we need to think about how we can create more commonality at the lower layers of the stack to implement lower cost solutions that facilitate data sharing and common analytics across a wide range of use cases.

On this front, ONUG has a Monitoring & Analytics initiative that is working to define user requirements and develop proof-of-concept demos for a new, comprehensive suite of tools to manage software-defined infrastructure. There was a panel at the conference that provided an update on the status of the initiative, and ONUG has posted a video of this session.

I also moderated an interesting panel discussion on Retooling for the Software-Defined Enterprise that featured Aryo Kresnadi from FedEx, Ian Flint from Yahoo and Dan Ellis from Kentik, all of whom have extensive experience using and building monitoring & analytics tools in cloud-scale environments. ONUG has also posted a video of this session, along with many others from the conference, on ONUG’s Vimeo channel.

If these topics interest you, be sure to save the date for ONUG Fall 2017, which will be held October 17 & 18 in New York City.

Cloud-scale technologies for cloud-scale infrastructure visibility & analytics

I think we can all agree that cloud-scale technologies are wonderful things, enabling hyper-agile delivery of applications and services to billions of users worldwide. Software-defined networking, virtualization, microservices, containers, open source software and Open Compute platforms are enabling cloud service providers to achieve mind-boggling economies of scale while keeping pace with insatiable user demand.

However, as telecom service providers and large-scale enterprises move to embrace cloud-scale technologies, they are proving to be both a blessing and a curse. The benefits are straightforward: rapidly deliver a broader range of applications and services at lower cost while being able to quickly respond to changing customer needs. The downside is that both service providers and enterprises need to employ new toolsets for developing, deploying and managing these applications and services.

Disaggregation and decomposition are consistent themes for cloud-scale technology. Monolithic network platforms are disaggregated, with software-driven control planes running on commodity hardware. Network functions and computing resources are virtualized and decoupled from the underlying hardware. Monolithic applications are decomposed into many microservices, each running in its own container. The business value in terms of lower hardware costs coupled with increased flexibility and agility is real, but there are added costs associated with managing all of these different piece parts.

The problem becomes obvious when service providers and enterprises try to apply existing management tools and methodologies to cloud-scale infrastructure. For all their internal complexity, monolithic platforms and applications are simpler to configure, monitor and control than multiple layers of many different software components running on virtualized infrastructure. While the industry has recently made great strides by adopting new tools for cloud-scale infrastructure configuration and orchestration, we are still playing catch-up in terms of equally effective approaches to cloud-scale visibility and analytics.

Yet here is where cloud-scale technologies come to their own rescue. Because software and hardware functions are disaggregated and decomposed, proper instrumentation at each layer and in every component gives us full visibility into the entire stack from top to bottom, while new technologies like streaming telemetry provide extremely granular, real-time visibility into the application and service delivery infrastructure.
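As a simple, vendor-neutral illustration of what streaming telemetry enables (the stream itself is simulated and all names are invented), here is a sketch in which cumulative interface counters arrive as a continuous feed and are converted into sliding-window rates in real time:

```python
# A minimal, vendor-neutral sketch of consuming streaming telemetry: interface
# counters arrive as frequent samples, and per-window rates are derived in real
# time rather than polling every few minutes. All names are illustrative.
import itertools
import time
from collections import deque

def telemetry_stream():
    """Stand-in for a gRPC/message-bus subscription delivering counter samples."""
    octets = 0
    while True:
        octets += 50_000_000  # pretend roughly 400 Mb/s of traffic
        yield {"ts": time.time(), "path": "eth0/out-octets", "value": octets}
        time.sleep(1)

def rates(stream, window_s: int = 10):
    """Convert cumulative counters into bits-per-second over a sliding window."""
    samples = deque()
    for sample in stream:
        samples.append(sample)
        while samples and sample["ts"] - samples[0]["ts"] > window_s:
            samples.popleft()
        if len(samples) >= 2:
            first, last = samples[0], samples[-1]
            bps = (last["value"] - first["value"]) * 8 / (last["ts"] - first["ts"])
            yield sample["path"], bps

for path, bps in itertools.islice(rates(telemetry_stream()), 5):
    print(f"{path}: {bps / 1e6:.0f} Mb/s")
```

In a real deployment the samples would come from a subscription spanning routers, switches, hypervisors and containers, and the same pattern scales out horizontally alongside the infrastructure it observes.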

Therefore, it’s only natural that cloud-scale visibility and analytics should be implemented on native cloud-scale platforms, leveraging the same technologies: software-defined networking, virtualization, microservices, containers, open source software and Open Compute platforms. This is especially critical when employing Big Data analytics, where the basic technologies are inherently cloud-scale, and well-suited for ingesting Big Data streaming telemetry feeds and performing real-time streaming analytics on this data.