TL;DR: You can reduce “cold start” response times in AWS Lambda functions using the Java runtime to an acceptable level by setting the optional `MemorySize` value to 512 (megabytes) or greater.
In a recent post about building AWS Lambda functions with Scala, the 10-second “cold start” issue for Lambdas with the Java runtime surfaced. After the initial “cold start” request, the “warmed” Scala function would generally return responses in under 250 ms. That “warmed” performance is acceptable for most apps, but an endpoint that occasionally takes 10 seconds to return a simple “Hello, developer” is not, at least for use in web pages.
After some research on the “cold start” issue I learned that the Java runtime’s response time after a period of inactivity is inversely related to the Lambda’s memory size: the more memory allocated, the faster the cold start. In the above article I had left the `MemorySize` field out of the SAM configuration file (`template.yaml`), so the default value of 128 (megabytes) was used. In my experiments, explicitly setting this field to values larger than 128 resulted in faster “cold start” response times.
Measurements
To figure out how increasing the Lambda function memory size correlates with “cold start” response times, I wrote a script to deploy the example Lambda function with a selected range of memory sizes. For each deployment I measured the initial “cold start” response time and then four additional response times. I ran the script three times and computed the average (mean) response times for each memory size.
Here’s the result of the response time per `MemorySize` measurement:
Measurement Summary
Using the default memory size of 128mb (by leaving the `MemorySize` field out of the SAM template) clearly results in the greatest “cold start” response time. Just doubling it to 256mb is a 90% improvement, and 256mb should be the recommended default for AWS Lambda functions with the Java runtime. Continuing to increase the memory size provides further improvements, but nothing as dramatic as the change from 128mb to 256mb.
This concludes the results of the AWS Lambda function measurements. Read on for a detailed background on how this data was collected & clever code samples for data wrangling in Scala.
How Response Times Were Measured
In this section I’ll cover the methodology used to measure the Lambda function response times. This may be interesting to you if you want to similarly measure the performance of discrete settings in your deployed apps, script infrastructure changes for testing, and perform calculations in a Scala REPL.
Extra Credit: Do you know of an easier / shorter way to accomplish the same measurements? Please post it below!
Adding In That Missing MemorySize Field
The first step is to add the `MemorySize` field to the SAM template. This is done not only to ensure that the deployed function will have sufficient memory, but also so that we can use `sed` to alter it in the script to a series of different values for testing. Here’s the change to the template from the Zero to AWS Lambda in Scala post, with the missing field added at line 17:
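The full template lives in that post; a sketch of the relevant resource, with a placeholder logical name, handler, and artifact path of my own, looks roughly like this:

```yaml
Resources:
  HelloScalaFunction:                              # placeholder logical name
    Type: AWS::Serverless::Function
    Properties:
      Handler: example.Handler::handleRequest      # placeholder handler
      Runtime: java8
      CodeUri: target/scala-2.12/hello-scala.jar   # placeholder artifact path
      MemorySize: 512                              # the previously missing field
```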
Remember, even if you choose the default memory size of 128 (mb) for your AWS Lambda functions, it is worthwhile to explicitly call that out with the `MemorySize` field in your SAM template.
Scripting The Deployments in Zsh
Why use Zsh? It’s my default shell, it’s available on most systems, and it’s easier to break commands down across multiple lines.
Here’s what the test script does. It iterates over the selected memory sizes (an inexhaustive set; you can actually set a Lambda function’s memory size anywhere from 128mb to 3008mb in 64mb increments, but that is unnecessary for this test), deploys all of the Lambda functions in parallel, tests each one five times using the generated URL from the CloudFormation stack (see the CloudFormation Outputs), and finally undeploys all of the functions. Since everything is cleaned up, running the script a second and third time ensures that the first request per function is a “cold start”.
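The full script isn’t reproduced here, but a minimal sketch of its structure looks roughly like the following. The stack names, the `sed` edit, the Outputs lookup, and the `sam` flags are placeholders of my own, not the exact commands from the original script:

```zsh
#!/bin/zsh
# Sketch only: stack names, sed edit, and CLI flags are placeholders.
memory_sizes=(128 256 512 1024 2048 3008)

# Deploy one stack per memory size, in parallel
for mem in $memory_sizes; do
  (
    sed "s/MemorySize: [0-9]*/MemorySize: $mem/" template.yaml > template-$mem.yaml
    sam deploy \
      --template-file template-$mem.yaml \
      --stack-name hello-scala-$mem \
      --capabilities CAPABILITY_IAM   # packaging / --s3-bucket options omitted
  ) &
done
wait

# Hit each function five times; the first request per stack is the "cold start"
for mem in $memory_sizes; do
  echo "now testing hello-scala-$mem"
  url=$(aws cloudformation describe-stacks --stack-name hello-scala-$mem \
    --query "Stacks[0].Outputs[0].OutputValue" --output text)
  for i in {1..5}; do
    secs=$(curl -s -o /dev/null -w "%{time_total}" $url)
    echo "hello-scala-$mem $i ${secs}s"
  done
done

# Undeploy everything so the next run starts cold again
for mem in $memory_sizes; do
  aws cloudformation delete-stack --stack-name hello-scala-$mem &
done
wait
```

Note that curl’s `-w "%{time_total}"` reports the total request time in seconds, which is the value the later Scala parsing expects.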
For my hasty measurement test I only ran this script three times. A more thorough test of the response times would include more iterations and additional memory sizes. Another improvement would be to vary the size of the deployed code (eg, “do smaller jar files reduce load time?”).
The script output includes aws cli messages, a few status messages (eg “now testing”), and the `curl` lines, which include the stack name, iteration, and response time. It looks like this:
```
22:56 local ~/Code/aws-lambda-hello-scala on master ● > ./test.sh
... (aws cli, status, and curl timing lines follow)
```
The next step is to process the output from multiple runs of this script. I’ll need to calculate the average (mean) response time for the function per `MemorySize` parameter (helpfully included in the CloudFormation stack name), aggregated by whether the request was a “cold start” (iteration = 1) or ran on a “warmed” Java runtime (iteration > 1).
The `awk` command would be excellent for this task, along with `grep` to filter out non-curl lines and `sed` to clean them up into space-delimited numeric values. Unfortunately I don’t know `awk` at all, so I’ll do it all in Scala. Okay, okay, and because Scala is more fun for this task.
Organizing and Calculating Results in Scala
As noted right above, in lieu of traditional Unix tools I prefer to use Scala to clean, organize, collate, and summarize data. Somehow this always ends up in a spreadsheet, then in a fancy chart (see above), and eventually in a slide deck (at work), a Slack channel (also at work), or right here on my blog. If there’s a better way to build charts than pasting the data into Google Sheets, I’d love to see it!
The Scala tool I use for these jobs is the Ammonite REPL, which improves on the standard Scala REPL with syntax highlighting, easier multi-line editing, shell-like functions, and a host of other UI improvements. On OS X you can install it with `brew install ammonite-repl`, and the site includes curl-based installations for other operating systems.
I’ve saved the output from multiple runs of the test script to `timings.txt`, as seen in the previous code sample. The first step (besides starting up the REPL with `amm`) is to read the unfiltered output into a list of strings.
```scala
@ val lines = scala.io.Source.fromFile("timings.txt").getLines.toList
```
Note: I configured Ammonite to truncate the printer (the ‘P’ in “REPL”) down to 8 lines to keep the code samples at a readable length in this session: `repl.pprinter() = repl.pprinter().copy(defaultHeight = 8)`
The next step is to build a regular expression that matches the timing lines and extracts the numeric values; it will also serve to filter out non-timing lines like “now testing”. I’ll also define a class to hold the values, with a custom enum to denote “cold start” times vs “warmed” times. Finally, I’ll use the excellent `List.collect()` function to filter & parse the timing lines into a `List[Stat]` collection.
```scala
@ val curlResult = raw".*-(\d+) (\d+) ([^s]+)s".r
```
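The class, enum, and `collect` step aren’t reproduced above, but a minimal sketch looks roughly like this; the names `Timing`, `Cold`, `Warm`, and the `Stat` fields are my own choices, not necessarily those from the original REPL session:

```scala
// Sketch only: Timing, Cold, Warm, and the Stat field names are assumptions,
// not necessarily the definitions used in the original session.
@ trait Timing
@ case object Cold extends Timing
@ case object Warm extends Timing
@ case class Stat(memory: Int, timing: Timing, seconds: Double)

@ val stats: List[Stat] = lines.collect {
    case curlResult(memory, iteration, seconds) =>
      Stat(memory.toInt, if (iteration.toInt == 1) Cold else Warm, seconds.toDouble)
  }
```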
Note: If the code looks polished and succinct, it may be because I’m a wonderful coder, or possibly because I experimented in the REPL dozens of times and purged all of the previous output from the code block. It’s one of those, really. This is one of the major benefits of a REPL, and of the excellent Ammonite REPL - experiment until you get it right!
With the `List[Stat]` collection in hand, it is now possible to organize the timings by Lambda function memory size and cold/warm status. The also-excellent `List.groupBy()` function is used here to build a map of memory sizes to a map of cold/warm enums to mean durations (the line below is completed using the field names from the `Stat` sketch above):
```scala
@ val meanStatsMap = stats.groupBy(_.memory).mapValues(memStats =>
    memStats.groupBy(_.timing).mapValues(ss => ss.map(_.seconds).sum / ss.size))  // field names assumed; see the Stat sketch above
```
The final step is to export this map of maps to a tab-delimited table, which can then be easily pasted into a spreadsheet like Google Sheets.
I’ll take the outer map’s keys (the memory sizes), sort them, format the response times to two decimal places, and print out the data along with a header:
```scala
@ val header = "MemorySize\tCold Response Time (sec)\tWarm Response Time (sec)"
```
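The final printing step isn’t shown above; a minimal sketch, reusing the assumed `Cold`/`Warm` names and the `meanStatsMap` from earlier, could look like this:

```scala
// Sketch only: Cold, Warm, and meanStatsMap come from the (assumed) definitions above.
@ val rows = meanStatsMap.keys.toList.sorted.map { memory =>
    val byTiming = meanStatsMap(memory)
    f"$memory\t${byTiming(Cold)}%.2f\t${byTiming(Warm)}%.2f"
  }

@ (header :: rows).foreach(println)
```

Each printed line is tab-delimited, so the whole block pastes cleanly into a spreadsheet.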
After copying this into a new Google Sheet, I can create a chart with default settings that looks like this:
Next Steps
A more accurate measurement of response times could include:
- Running the test more than three times
- Capturing the median response times
- Capturing response time percentiles, eg p99 (you would need far more than three iterations to make this worthwhile)
That concludes the article. I hope you found the measurement explanations useful!