Serverless Monitoring


Serverless Security Monitoring 16th Oct 2019


Serverless Controls Part One

AWS Lambda

In part one of a two-part series I will review and compare the Software as a Service offerings from Lumigo, Dashbird and Thundra, all of which offer management oversight of logs, metrics and traces to help the user better understand what is happening with their AWS Lambda deployments. In part two I will look at two other services that are somewhat different in that they cover other cloud providers besides AWS and offer different features. Puresec is more of a security tool than a monitoring tool, and BlueMatador covers container services such as Kubernetes as well as serverless services (functions as a service, Lambda).

TL;DR Executive Summary

Dashbird: Supports Node.js, Python and Java. Essentially a "set and forget" deployment and configuration strategy for businesses that are not constantly changing or adding functions and just want to monitor performance and look for cost savings. It is the cheapest service considered here. It will not suit businesses with a strict security compliance posture, as log data is copied from your site to the Dashbird site for aggregation. The polling and synchronous data transfers between S3 buckets (within and outside your account) will not suit businesses whose own workloads, such as image processing, also involve copying files across buckets.

Lumigo: Supports Node.js, Python and Java, with support for other AWS Lambda supported languages in development. Will be favoured by serverless development teams with a DevSecOps continuous delivery pipeline using the Serverless Framework or AWS SAM, where adding security and monitoring components to Lambda functions is what they do every day. The advanced, real-time tracing and detailed mapping will help debug and identify slow running functions. Probably the developer's choice as it is very flexible. Follow their blog for advanced techniques and insights. Lumigo now also offers a SaaS serverless monitoring and tracing service via the AWS MarketPlace.

Thundra: Supports Node.js, Python, Java, .NET and Golang. The most developed of the three options in terms of documentation, breadth of features and flexibility. Has an outstanding user interface, with options for those who use the Serverless Framework and AWS SAM and for those who do not. It is a good choice for Java or .NET teams moving to serverless and for Ops teams who are not involved in development. A really well-crafted service that has "wow" factor.

Why Use Special Services to Monitor Serverless?

So first of all, why use a dedicated serverless monitoring service at all? Established cloud logging and monitoring services such as Splunk, Datadog, Securonix and New Relic, to mention a few, already offer battle-hardened, comprehensive services, and public cloud providers supply their own cloud native tooling, such as AWS CloudWatch logs and alarms combined with X-Ray and Athena. It should be noted that some of the companies mentioned above do have serverless monitoring services.

A general introduction to monitoring cloud native deployments

Is it necessary to use another service?

When considering this question, keep in mind that monitoring infrastructure within a VPC, where multi-layered defences can be installed much like in an on-premise data centre, is different from monitoring a serverless infrastructure, where the user has no control over the environment in which functions are run apart from deciding how much memory and execution time to allocate. Function timeouts, cold starts and over provisioning of memory can lead to unacceptable error rates, delays and cost.
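To make those three risks concrete, here is a minimal sketch that reads them off the REPORT line Lambda writes to CloudWatch logs at the end of each invocation. The function name `summarise`, the field names in the result and the sample log line are my own illustration, not taken from any of the services reviewed here.

```python
import re

# Fields of the REPORT line that Lambda logs once per invocation.
# "Init Duration" only appears when the invocation was a cold start.
REPORT_RE = re.compile(
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory>\d+) MB\s+"
    r"Max Memory Used: (?P<used>\d+) MB"
    r"(?:\s+Init Duration: (?P<init>[\d.]+) ms)?"
)


def summarise(report_line, timeout_ms):
    """Return how close one invocation came to its memory and time limits."""
    match = REPORT_RE.search(report_line)
    if match is None:
        return None  # not a REPORT line
    duration = float(match["duration"])
    return {
        "duration_ms": duration,
        # A low percentage suggests memory is over provisioned.
        "memory_used_pct": round(100 * int(match["used"]) / int(match["memory"]), 1),
        # A high percentage suggests the function is at risk of timing out.
        "timeout_used_pct": round(100 * duration / timeout_ms, 1),
        "cold_start": match["init"] is not None,
    }


# Hypothetical log line for a function configured with a 3 second timeout.
line = (
    "REPORT RequestId: 3f8a\tDuration: 1500.00 ms\tBilled Duration: 1500 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 32 MB\tInit Duration: 150.10 ms"
)
summary = summarise(line, timeout_ms=3000)
print(summary)
```

A function that regularly uses a quarter of its memory but half of its timeout, as in this made-up line, is the kind of case where a monitoring service can point to both a cost saving and a reliability risk.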

AWS Lambda has its own dashboards which cover such metrics as invocation count, duration, error count and success rate, throttles, dead letter errors, and cost at an individual function level or account level. However, what if your primary interest is in Lambda activity for say, production, or purchase orders, only? Then custom metrics will be required. It should be noted that if the Lambda functions were created with CloudFormation then tags (key value pairs) such as stage: production are not automatically propagated to CloudWatch logs, the main source of monitoring data. Tags are the main way of grouping and classifying resources. One of the services we are looking at, Lumigo, has helpfully produced an open source solution for this gap.

Propagate tags used in CloudFormation, such as stage, to resources such as CloudWatch logs where propagation by CloudFormation is not automatic
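The idea behind tag propagation can be sketched in a few lines. This is not Lumigo's tool, just an illustration under my own assumptions: the function list and tag values are invented, and the only AWS fact relied on is that a function's default log group is named `/aws/lambda/<function name>`. Applying the plan would need calls to the CloudWatch Logs API, which are left out here.

```python
def log_group_for(function_name):
    # Default log group name that Lambda uses for a function.
    return f"/aws/lambda/{function_name}"


def plan_tag_propagation(functions):
    """Map each function's log group to the tags it should inherit."""
    return {log_group_for(f["name"]): dict(f["tags"]) for f in functions}


def log_groups_for_stage(plan, stage):
    """Select the log groups whose propagated tags match a stage."""
    return sorted(group for group, tags in plan.items() if tags.get("stage") == stage)


# Hypothetical functions and tags, as they might be declared in CloudFormation.
functions = [
    {"name": "create-order", "tags": {"stage": "production"}},
    {"name": "create-order-dev", "tags": {"stage": "dev"}},
]
plan = plan_tag_propagation(functions)
production_groups = log_groups_for_stage(plan, "production")
print(production_groups)
```

Once the log groups carry the same tags as the functions, slicing monitoring data by stage becomes a simple filter rather than a naming convention.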

So what should we expect from a paid-for Serverless Monitoring Service?

Everything that AWS Dashboards offer together with a more flexible way of slicing and dicing the data from CloudWatch logs. Each of the SaaS offerings we will look at has a free tier, some more generous than others, limited raw log data retention periods and a range of views of log data with the aim of better managing cost, enabling visibility of service thresholds being reached, and tracing of errors. All use encryption for data in transit and at rest.

While our three services have many similarities of aim and capability in monitoring, metrics and alerts, they go about it in different ways, which is perhaps the biggest differentiator between them apart from pricing.

Dashbird Onboarding

For it to work, Dashbird needs limited read access to your AWS account to collect the data. You give Dashbird access by using their onboarding workflow, which creates a CloudFormation template that sets up all necessary policies and roles. Dashbird polls your Lambda deployments to identify your functions and copies the data to its own bucket every ten minutes. It then polls your logs. The exact polling interval is determined by the number of Lambda functions and the number of requests they receive. Data is polled a few times a minute for a single function. Polling adheres to AWS limits, and every throttle error from the AWS API is tracked to avoid overwhelming the API. However, if there are other services using the same client APIs, then throttling might still occur.
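I have no visibility of how Dashbird implements this internally, but the general pattern of polling an API while tracking throttle errors can be sketched as follows. `ThrottledError`, `poll_with_backoff` and `fake_fetch` are hypothetical names of my own, and the backoff settings are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by a cloud API."""


def poll_with_backoff(fetch, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fetch() until it succeeds, backing off after each throttle.

    Returns the fetched data together with the number of throttles seen,
    so the caller can track how close it is to the service limit.
    """
    throttles = 0
    for attempt in range(max_attempts):
        try:
            return fetch(), throttles
        except ThrottledError:
            throttles += 1
            # Wait a random time up to an exponentially growing, capped delay
            # so that several pollers do not retry in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ThrottledError(f"gave up after {max_attempts} throttled attempts")


# Usage with a fake API that throttles the first two calls.
calls = {"count": 0}


def fake_fetch():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError()
    return ["log line"]


result, throttles = poll_with_backoff(fake_fetch, sleep=lambda seconds: None)
print(result, throttles)
```

The point to take away is that a well-behaved poller slows itself down, but it cannot see what else in your account is drawing on the same API limits.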


View Projects - groups of Lambdas - on dashboard
# Benefits of Dashbird
1 Supports Node.js, Python and Java
2 Documentation is clear, well set out and easy to follow. An overview of the services is available on YouTube
3 Generous free tier of 1 million Lambda invocations per month; the Professional tier, at $99/mo for 25 GB of ingested data or 10M invocations per month, would suit many SME users
4 Will show over and under provisioning of Lambda CPU and time, enabling the user to reduce cost and avoid timeouts
5 You can create Projects, basically a group of Lambdas, each containing as many Lambdas as you would like. The same Lambda can be assigned to multiple Projects as well. Dashbird will provide a custom metrics dashboard, as well as a central repository of errors and alerts particular for each Project
6 You can view API Gateway invocations of Lambda to see the performance of your API endpoints
7 Can be combined with AWS X-Ray to find root cause of error (tracing)
8 Email and Slack notification channels
9 Webhooks can be set up separately for each Lambda
10 Free serverless resources and email newsletter written with a bit of humour
11 Dashbird is an AWS Advanced Technology Partner and Marketplace Seller


# Considerations for Dashbird
1 Not real time, as polling and data migration are involved. It takes an average of 30 seconds for Lambda logs to display in the Dashbird interface. Also, an open list of invocation logs is not automatically updated with new incoming logs
2 Export of logs out of your environment by copying may not be acceptable for some organisations, where giving third parties cross account access to copy data may present compliance issues
3 Potential for service limit throttling where S3 buckets are already heavily used for copying data within the organisation, for example a business that receives image files and moves them to another bucket before or after processing
4 For tracing functionality, integration with AWS X-Ray is required. X-Ray is a paid-for service and the additional cost needs to be factored in. X-Ray pricing: beyond the free tier, traces recorded cost $5.00 per 1 million ($0.000005 per trace).
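As a rough guide to the extra X-Ray cost in the last point, the arithmetic can be sketched as below. The price is the figure quoted above; the free tier allowance of 100,000 recorded traces per month is my assumption and should be checked against the current X-Ray pricing page. Charges for retrieving and scanning traces are ignored.

```python
PRICE_PER_MILLION_TRACES = 5.00  # USD per 1 million traces recorded, as quoted above


def xray_monthly_cost(traces_recorded, free_traces=100_000):
    """Estimate the monthly charge for recorded traces.

    free_traces is an assumed free tier allowance, not a figure from this article.
    """
    billable = max(0, traces_recorded - free_traces)
    return billable / 1_000_000 * PRICE_PER_MILLION_TRACES


# 1.1 million recorded traces leaves 1 million billable traces.
print(xray_monthly_cost(1_100_000))
```

Bear in mind that X-Ray samples requests, so the number of traces recorded is normally lower than the number of Lambda invocations.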

Lumigo Onboarding

Lumigo takes a completely different approach to onboarding compared to Dashbird. A working familiarity with AWS Lambda and your language of choice (Node.js, Python or Java) is required. Each function to be monitored needs a wrapper, together with an IAM role that has permission to call an API. There is a CloudFormation script to make this straightforward for someone with basic coding and AWS skills. Lumigo is more of a coder's application, and the benefit is that you get complete control over which Lambda functions are monitored and a high degree of visibility of your architecture, tracing and executions. Lumigo also has a SaaS offering in the AWS MarketPlace and an introductory video in which the powerful tracing capabilities are demonstrated. The advantage of a SaaS service is that you get the latest version without needing to upgrade or change your code, which is valuable in light of the significant improvements being made to AWS Lambda such as Concurrency. Note that the Lumigo SaaS offering, which should appeal to development teams who want to save time debugging, uses a different pricing model; only the original tool is covered here.
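To show what "a wrapper around each function" means in practice, here is a minimal sketch of the pattern in Python. It is not Lumigo's library or API: the `traced` decorator and the `report` callback are hypothetical, and for simplicity the report is delivered in the same thread rather than through an asynchronous API call.

```python
import functools
import time


def traced(report):
    """Wrap a Lambda style handler and pass timing and error data to report()."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            started = time.perf_counter()
            error = None
            try:
                return handler(event, context)
            except Exception as exc:
                error = repr(exc)
                raise  # the caller still sees the original failure
            finally:
                # A real tracer would send this to its backend without
                # blocking the function; here it is handed to a callback.
                report({
                    "function": handler.__name__,
                    "duration_ms": (time.perf_counter() - started) * 1000,
                    "error": error,
                })
        return wrapper
    return decorator


# Usage: collect reports in a list instead of calling a remote API.
sent = []


@traced(sent.append)
def handler(event, context):
    return {"ok": event["id"]}


response = handler({"id": 7}, None)
print(response, sent[0]["function"], sent[0]["error"])
```

Because the wrapper lives in your code, you choose exactly which functions carry it, which is the control referred to above; the price is a code change and a redeploy for each one.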

System map of Lambda function topography

Lumigo video introduction

# Benefits of Lumigo
1 Lumigo works within your Lambda function making an asynchronous API call to pass on your transaction data
2 Works seamlessly as a fully scalable, Serverless monitoring system and troubleshooting platform in a way that a developer would expect
3 Live system map with the ability to filter for active/inactive services. Visualisation of decision path of function flow is a great way to debug
4 Works in real time
5 Visualize your application architecture
6 Highly granular view of an individual function execution
7 Easily identify unexpected behaviors like production service accessing staging service, unused “orphan” functions and more
8 Powerful research tool to identify architectural anomalies, latencies, and cost abnormalities and inefficiencies. Seems to use machine learning to understand normal usage patterns and identify anomalies. Artificial intelligence to forecast/predict future problems such as service limit throttling
9 Only monitor what you want and remove monitoring at will
10 Alerts by Slack or email
11 Blog posts by Yan Cui, of whom I am a big fan. See also: The Burning Monk; Analysing Lambda cold starts
12 Lumigo is an AWS Advanced Technology Partner
# Considerations for Lumigo
1 Involves code change and redeploy for each function that needs monitoring
2 Higher cost: the 1 million invocations limit on the lowest priced paid tier means many businesses may require a higher service limit at greater cost, though cost could be reduced by limiting which functions are monitored

I have to confess this application has “WOW” factor. The documentation, the UI, and the level of detail and presentation create a feeling of love at first sight. But we can’t do software by giddy feelings so let’s get down to business.

Thundra Onboarding

Once registered, you will be invited to set up a CloudFormation stack to create a user with access to read your functions and logs. When it comes to setting up an agent for each function to send data to Thundra via an asynchronous API call, you are spoilt for choice.

No-code, low-code and zero-overhead implementation options are available.

The no-code implementation uses a Java-based Deployer Tool installed in the user's own environment and acting as an agent. Otherwise there are Serverless Framework-based deployment options that use a wrapper around functions to push data to Thundra via an API.

If a picture is worth a thousand words, I am going to save typing and your reading time by showing some partial screenshots to demonstrate the "look and feel" of this web app. However, if this sounds like something that would be of interest, I recommend visiting the site, as the documentation and visuals are better than I can describe.

Excellent UX and Design

Close up of individual function dashboard

Trace map of function execution path

# Benefits of Thundra
1 Supports Node.js, Python, Java, .NET, Golang
2 Works seamlessly as a fully scalable, Serverless monitoring system and troubleshooting platform in a way that a developer would expect
3 Live system map with the ability to filter for active/inactive services. Visualisation of decision path of function flow is a great way to debug
4 Works in real time
5 Visualize your application architecture

What is Thundra? on Youtube

6 Highly granular view of an individual function execution
7 Powerful research tool to identify architectural anomalies, latencies, and cost abnormalities and inefficiencies
8 Only monitor what you want and remove monitoring at will
9 Alerts by Slack or email
10 Thundra is an AWS Advanced Technology Partner with DevOps Competency and AWS Marketplace Seller
# Considerations for Thundra
1 Involves code change and redeploy for each function that needs monitoring if using Serverless Framework or AWS SAM
2 Not much to add as I think it is an excellent service for most use cases
Written by Andrew Plater on 17th October 2019

Did I miss something? Want to read more?

Serverless Framework article