Black Rock Blog


Measuring AWS Lambda Response Times

TL;DR: You can reduce “cold start” response times in AWS Lambda functions (using the Java runtime) down to an acceptable amount by setting the optional MemorySize value to 512 (megabytes) or greater.


In a recent post about building AWS Lambda functions with Scala, the 10-second “cold start” issue for Lambdas on the Java runtime surfaced. After the initial “cold start” request, the “warmed” Scala function would generally return responses in under 250 ms. That “warmed” performance is acceptable for most apps, but an endpoint that occasionally takes 10 seconds to return a simple “Hello, developer” is not, at least for use in web pages.

After some research on the “cold start” issue I learned that, for the Java runtime, the response time after a period of inactivity decreases as the Lambda’s memory size increases. In that post I had left out the MemorySize field from the SAM configuration file (template.yaml), and so the default value of 128 (megabytes) was used. Explicitly setting this field to values larger than 128 in experiments resulted in faster “cold start” response times.

Measurements

To figure out how increasing the Lambda function memory size correlates to “cold start” response, I wrote a script to deploy the example Lambda function with a selected range of memory sizes. For each deployment I measured the initial “cold start” response time and then four additional response times. I ran the script three times and computed the average (mean) response times for each memory size.

Here are the mean response times measured for each MemorySize:

cold-warm-times

Measurement Summary

Using the default memory size of 128 MB (by leaving the MemorySize field out of the SAM template) clearly results in the greatest “cold start” response time. Just doubling it to 256 MB cuts the mean “cold start” time roughly in half (from 11.44 to 5.98 seconds in my runs), so it should be a recommended default for AWS Lambda functions with the Java runtime. Continuing to increase the memory size provides further improvements, but none as dramatic in absolute terms as the change from 128 MB to 256 MB.
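To put numbers on the step from one memory size to the next, here is a small sketch that can be pasted into a Scala REPL. The cold start values are the means from the results table later in this post; the names coldStartSeconds, reductionPercent and stepReductions are my own and purely illustrative.

```scala
// Mean cold start times in seconds per MemorySize, copied from the results table in this post.
val coldStartSeconds: List[(Int, Double)] = List(
  128 -> 11.44, 256 -> 5.98, 384 -> 4.53, 512 -> 3.38, 768 -> 2.48,
  1024 -> 2.10, 1536 -> 1.76, 2048 -> 1.50, 2560 -> 1.35, 3008 -> 1.36
)

// Percentage reduction in response time when going from `before` to `after`.
// A negative result means the response time got worse.
def reductionPercent(before: Double, after: Double): Double =
  (before - after) / before * 100

// Pair each memory size with the next larger one and compute the relative change.
val stepReductions: List[(Int, Int, Double)] =
  coldStartSeconds.zip(coldStartSeconds.tail).map {
    case ((memA, secA), (memB, secB)) => (memA, memB, reductionPercent(secA, secB))
  }

stepReductions.foreach { case (memA, memB, pct) =>
  println(f"$memA%4d -> $memB%4d MB: $pct%.1f%% reduction in cold start time")
}
```

With only three runs per memory size behind each mean, small differences between the larger sizes are likely within the noise.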

This concludes the results of the AWS Lambda function measurements. Read on for detailed background on how the data was collected and for clever code samples for data wrangling in Scala.


How Response Times Were Measured

In this section I’ll cover the methodology used to measure the Lambda function response times. This may be interesting to you if you want to similarly measure the performance of discrete settings in your deployed apps, script infrastructure changes for testing, and perform calculations in a Scala REPL.

Extra Credit: Do you know of an easier or shorter way to take the same measurements? Please post it below!

Adding In That Missing MemorySize Field

The first step is to add the MemorySize field to the SAM template. This is done not only to ensure that the deployed function will have sufficient memory, but also so that we can use sed in the script to change it to a series of different values for testing. Here’s the Resources section of the template from the Zero to AWS Lambda in Scala post, with the missing field added just above Timeout:

template.yaml
Resources:
  HelloScalaFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: HelloScala
      Description: A simple AWS Lambda function in Scala
      Runtime: java8
      Handler: lambda.ApiGatewayProxyHandler
      CodeUri: target/scala-2.12/hello-scala.jar
      MemorySize: 512
      Timeout: 15

Remember, even if you choose the default memory size of 128 (mb) for your AWS Lambda functions, it is worthwhile to explicitly call that out with the MemorySize field in your SAM template.

Scripting The Deployments in Zsh

Why use Zsh? It’s my default shell, available on most systems, and it’s easier to break commands down across multiple lines.

Here’s the test script. It iterates over a selection of memory sizes, deploys all the Lambda functions in parallel, tests each one five times using the generated URL from its CloudFormation stack (see the CloudFormation Outputs), and finally undeploys all the functions. The list of sizes is not exhaustive: you can set the memory size for Lambda functions anywhere from 128 MB to 3008 MB in 64 MB increments, but that is unnecessary for this test. Since everything is cleaned up at the end, the first request per function will still be a “cold start” when the script is run a second and third time.

test.sh
#!/usr/bin/env zsh

set -e

TIMEFMT=%E
CURL_ITERATIONS=5
MEMORY_SIZES=(128 256 384 512 768 1024 1536 2048 2560 3008)

# Returns a stack named with the memory size in use
memToStack() { mem=$1 && echo "lambdatest-$mem" }

# Deploys the lambda function with the given memory size (also renames function & package file to avoid conflicts)
deployWithMemSize() {
  mem=$1
  template="packaged_$mem.yaml"
  < packaged.yaml sed "s/MemorySize: .*$/MemorySize: $mem/;s/HelloScalaFunction:/HelloScalaFunction$mem:/" > $template
  nohup sam deploy --template-file $template --stack-name $(memToStack $mem) --capabilities CAPABILITY_IAM &
}

# Undeploys the lambda function with the given memory size
undeployWithMemSize() { mem=$1 && aws cloudformation delete-stack --stack-name $(memToStack $mem) }

# For each output url we've generated in a cloudformation stack, test the url repeatedly and print its stack name and performance
timeAllStacks() {
  stacks=$(
    aws cloudformation describe-stacks |
      jq -r '.Stacks[] | "\(.StackName) \(.Outputs[0].OutputValue)"' |
      grep -v null | sed 's/{name}/developer/'
  )
  echo $stacks | while read -r stack url; do
    for i in {1..$CURL_ITERATIONS}; do
      echo -n "curl $stack $i " && time curl -s -o /dev/null $url
    done
  done
}

# Deploy all the memory-sized lambda functions and wait for them to be ready
for mem in $MEMORY_SIZES; do deployWithMemSize $mem; done
wait

# Test them all
echo "now testing"
timeAllStacks

# Undeploy all the lambdas
echo "undeploying"
for mem in $MEMORY_SIZES; do undeployWithMemSize $mem; done
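
If the sed one-liner in deployWithMemSize is hard to read, the following Scala sketch performs the same two substitutions on a template string. It is only an illustration of what the script does to packaged.yaml: templateForMemory is a hypothetical helper, and the sample template is a trimmed-down stand-in for the real packaged file.

```scala
// Hypothetical helper mirroring the two sed substitutions in test.sh:
// 1. set the MemorySize value, 2. suffix the resource name with the memory size.
def templateForMemory(packaged: String, mem: Int): String =
  packaged
    .replaceAll("MemorySize: .*", s"MemorySize: $mem") // '.' stops at the end of the line
    .replaceAll("HelloScalaFunction:", s"HelloScalaFunction$mem:")

// Trimmed-down stand-in for packaged.yaml (assumption: the real file has more fields).
val packaged =
  """Resources:
    |  HelloScalaFunction:
    |    Type: AWS::Serverless::Function
    |    Properties:
    |      FunctionName: HelloScala
    |      MemorySize: 512
    |      Timeout: 15
    |""".stripMargin

val rewritten = templateForMemory(packaged, 1024)
println(rewritten)
```

Renaming the resource gives each stack its own logical function name, which matches the "avoid conflicts" comment in the script.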

For my hasty measurement test I only ran this script three times. A more thorough testing of the response times would include more iterations and additional memory sizes. Another improvement to testing would be to vary the size of the deployed code (eg, “do smaller jar files reduce load time?”).

The script output includes AWS CLI messages, a few status messages (eg “now testing”), and the curl lines with the stack name, iteration, and response time. It looks like this:

timings.txt
22:56 local ~/Code/aws-lambda-hello-scala on master ● > ./test.sh 
...
Successfully created/updated stack - lambdatest-256
now testing

curl lambdatest-384 1 4.19s
curl lambdatest-384 2 0.17s
curl lambdatest-384 3 0.16s
curl lambdatest-384 4 0.18s
curl lambdatest-384 5 0.12s
curl lambdatest-256 1 6.04s
curl lambdatest-256 2 0.17s
...

The next step is to process the output from multiple runs of this script. I’ll need to calculate the average (mean) response time for the function per MemorySize parameter (helpfully included in the CloudFormation stack name) aggregated by whether the request was a “cold start” (iteration = 1) or ran on a “warmed” Java runtime (iteration > 1).

The awk command would be excellent for this task, along with grep to filter out non-curl lines and sed to clean them up into space-delimited numeric values. Unfortunately I don’t know awk at all, so I’ll do it all in Scala. Okay, okay, and because Scala is more fun for this task.

Organizing and Calculating Results in Scala

As noted right above, in lieu of traditional Unix tools I prefer to use Scala to clean, organize, collate, and summarize data. Somehow this always ends up in a spreadsheet, and then in a fancy chart (see above), and eventually in a slide deck (at work), Slack channel (also, at work), or right here on my blog. If there’s a better way to build charts than pasting the data into Google Sheets I’d love to see it!

The Scala tool I use for these jobs is the Ammonite REPL, which improves on the standard Scala REPL with syntax highlighting, easier multi-line editing, shell-like functions, and a host of other UI improvements. On macOS you can install it with brew install ammonite-repl, and the site includes curl-based installations for other operating systems.

I’ve saved the output from multiple runs of the test script to timings.txt as seen in the previous code sample. The first step (besides starting up the REPL with amm) is to read the unfiltered output into a list of strings.

Ammonite-REPL 1/4
@ val lines = scala.io.Source.fromFile("timings.txt").getLines.toList
lines: List[String] = List(
  "8:12 local ~/Code/aws-lambda-hello-scala on master \u25cf > ./test.sh ",
  "now testing",
  "curl lambdatest-3008 1 1.33s",
  "curl lambdatest-3008 2 0.18s",
  "curl lambdatest-3008 3 0.12s",
  "curl lambdatest-3008 4 0.12s",
...

Note: I configured Ammonite to truncate the printer (the ‘P’ in “REPL”) down to 8 lines to keep the code samples at a readable length in this session: repl.pprinter() = repl.pprinter().copy(defaultHeight = 8)
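
The one-liner above never closes the file, which is harmless in a throwaway REPL session. For a longer-lived script, a minimal sketch that closes the handle could look like this, assuming Scala 2.13 or later (scala.util.Using is not available in 2.12). The readLines helper and the temporary sample file are my own additions so the snippet runs without the real timings.txt.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Reads every line of the file at `path` into a List, then closes the Source.
def readLines(path: String): List[String] =
  Using.resource(Source.fromFile(path))(_.getLines().toList)

// Demo with a throwaway file containing two lines in the timings.txt format.
val sampleFile = Files.createTempFile("timings", ".txt")
Files.write(sampleFile, "now testing\ncurl lambdatest-3008 1 1.33s\n".getBytes("UTF-8"))

val sampleLines = readLines(sampleFile.toString)
Files.delete(sampleFile)

sampleLines.foreach(println)
```

Calling toList inside the Using block matters: getLines() is lazy, so the lines have to be materialized before the Source is closed.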

The next step includes building a regular expression to match the timing lines and extract the numeric values, which I’ll also use to filter out non-timing lines like “now testing”. I’ll also define a class to hold the values with a custom enum to denote “cold start” times vs “warmed” times. Finally, I’ll use the excellent List.collect() function to filter & parse the timing lines into a List[Stat] collection.

Ammonite-REPL 2/4
@ val curlResult = raw".*-(\d+) (\d+) ([^s]+)s".r
curlResult: scala.util.matching.Regex = .*-(\d+) (\d+) ([^s]+)s

@ object RuntimeTemp extends Enumeration { type RuntimeTemp = Value; val Cold = Value("Cold"); val Warm = Value("Warm") }
defined object RuntimeTemp

@ import RuntimeTemp._
import RuntimeTemp._

@ case class Stat(memory: Int, temp: RuntimeTemp, duration: Double)
defined class Stat

@ val stats = lines.collect {
    case curlResult(mem, i, duration) => Stat(mem.toInt, if (i.toInt > 1) Warm else Cold, duration.toDouble)
  }
stats: List[Stat] = List(
  Stat(3008, Cold, 1.33),
  Stat(3008, Warm, 0.18),
  Stat(3008, Warm, 0.12),
  Stat(3008, Warm, 0.12),
  Stat(3008, Warm, 0.16),
  Stat(2048, Cold, 1.5),
...

Note: If the code looks polished and succinct, it may be because I’m a wonderful coder, or possibly because I experimented in the REPL dozens of times and purged all of the previous output from the code block. It’s one of those, really. This is one of the major benefits of a REPL, and of the excellent Ammonite REPL - experiment until you get it right!
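
To see how the regular expression and collect work together as a filter, here is a self-contained sketch that can be saved as a script (for example run with amm). It reuses the curlResult pattern from the session and feeds it lines in the format shown in timings.txt. One substitution to be aware of: a sealed trait stands in for the Enumeration, which is my choice for the sketch and not what the session above used.

```scala
// Cold/warm marker modelled as a sealed trait (a stand-in for the Enumeration in the session).
sealed trait RuntimeTemp
case object Cold extends RuntimeTemp
case object Warm extends RuntimeTemp

final case class Stat(memory: Int, temp: RuntimeTemp, duration: Double)

// Same pattern as in the session: memory size, iteration, seconds.
val curlResult = raw".*-(\d+) (\d+) ([^s]+)s".r

// A mix of status lines and timing lines, in the format the test script prints.
val exampleOutput = List(
  "Successfully created/updated stack - lambdatest-256",
  "now testing",
  "curl lambdatest-384 1 4.19s",
  "curl lambdatest-384 2 0.17s"
)

// collect keeps only the lines the pattern matches in full and converts them in one pass.
val exampleStats: List[Stat] = exampleOutput.collect {
  case curlResult(mem, i, duration) =>
    Stat(mem.toInt, if (i.toInt > 1) Warm else Cold, duration.toDouble)
}

exampleStats.foreach(println)
```

The status lines are dropped because a Regex used as a pattern has to match the whole string, so there is no need for a separate filter step.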

With the List[Stat] collection in hand, it is now possible to organize the timings by Lambda function memory size and cold/warm status. The also-excellent List.groupBy() function is used here to build a map of memory sizes to a map of cold/warm enums to durations:

Ammonite-REPL 3/4
@ val meanStatsMap = stats.groupBy(_.memory).mapValues(memStats =>
    memStats.groupBy(_.temp).mapValues(coldWarmStats =>
      coldWarmStats.map(_.duration).sum / coldWarmStats.length
    )
  )

meanStatsMap: Map[Int, Map[RuntimeTemp, Double]] = Map(
  1024 -> Map(Warm -> 0.15666666666666668, Cold -> 2.103333333333333),
  3008 -> Map(Warm -> 0.1608333333333333, Cold -> 1.3633333333333333),
  384 -> Map(Warm -> 0.21416666666666664, Cold -> 4.533333333333333),
  512 -> Map(Warm -> 0.14583333333333337, Cold -> 3.3766666666666665),
  2560 -> Map(Warm -> 0.16416666666666666, Cold -> 1.3466666666666667),
  256 -> Map(Warm -> 0.15416666666666665, Cold -> 5.976666666666667),
...
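
One caveat if you try this on a newer Scala version: in Scala 2.13, mapValues on a Map is deprecated and returns a lazy MapView, so the result type shown above will differ. A minimal sketch of the same grouping with plain map calls, which builds a strict Map on both 2.12 and 2.13, follows. To keep it self-contained the cold/warm marker is simplified to a Boolean, and meanByMemoryAndTemp is a hypothetical name.

```scala
// Simplified record for this sketch: `cold` replaces the RuntimeTemp enum.
final case class Stat(memory: Int, cold: Boolean, duration: Double)

def mean(xs: Seq[Double]): Double = xs.sum / xs.length

// memory size -> (cold? -> mean duration), built with strict map calls.
def meanByMemoryAndTemp(stats: List[Stat]): Map[Int, Map[Boolean, Double]] =
  stats.groupBy(_.memory).map { case (mem, memStats) =>
    mem -> memStats.groupBy(_.cold).map { case (cold, group) =>
      cold -> mean(group.map(_.duration))
    }
  }

// Made-up durations chosen so the means are easy to check by hand.
val sample = List(
  Stat(256, cold = true, 6.0),
  Stat(256, cold = false, 0.25),
  Stat(256, cold = false, 0.75),
  Stat(512, cold = true, 3.0)
)

val means = meanByMemoryAndTemp(sample)
println(means)
```

On 2.13 the closest equivalent that keeps the mapValues call is .view.mapValues(...).toMap.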

The final step is to export this map of maps to a tab-delimited table, which can then be easily pasted into a spreadsheet like Google Sheets.

I’ll take the first map’s keys (the memory sizes) and sort them, format the response times to two decimal places, and print out the data along with a header:

Ammonite-REPL 4/4
@ val header = "MemorySize\tCold Response Time (sec)\tWarm Response Time (sec)"
header: String = "MemorySize\tCold Response Time (sec)\tWarm Response Time (sec)"

@ val tsv = header :: meanStatsMap.keys.toList.sorted.map(mem =>
    "%d\t%.2f\t%.2f".format(mem, meanStatsMap(mem)(Cold), meanStatsMap(mem)(Warm))
  )
tsv: List[String] = List(
  "MemorySize\tCold Response Time (sec)\tWarm Response Time (sec)",
  "128\t11.44\t0.15",
  "256\t5.98\t0.15",
  "384\t4.53\t0.21",
  "512\t3.38\t0.15",
  "768\t2.48\t0.16",
...

@ tsv foreach println
MemorySize  Cold Response Time (sec)  Warm Response Time (sec)
128         11.44                     0.15
256         5.98                      0.15
384         4.53                      0.21
512         3.38                      0.15
768         2.48                      0.16
1024        2.10                      0.16
1536        1.76                      0.16
2048        1.50                      0.18
2560        1.35                      0.16
3008        1.36                      0.16

After copying this into a new Google Sheet, I can create a chart with default settings that looks like this:

cold-warm-times

Next Steps

A more accurate test to measure response times could include:

  • Running the test more than three times
  • Capturing the median response times
  • Capturing the response time percentiles, eg p99 (you would need far more than three iterations to make this worthwhile)
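
As a starting point for the last two items, here is a minimal sketch of a median and a nearest-rank percentile in Scala. The function names are hypothetical, nearest-rank is only one of several common percentile definitions, and the sample values are the four warm timings for lambdatest-384 shown in the timings.txt excerpt.

```scala
// Median of a non-empty sequence: the middle value, or the mean of the two middle values.
def median(xs: Seq[Double]): Double = {
  require(xs.nonEmpty, "median of an empty sequence is undefined")
  val sorted = xs.sorted
  val mid = sorted.length / 2
  if (sorted.length % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2
}

// Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it.
def percentile(xs: Seq[Double], p: Double): Double = {
  require(xs.nonEmpty && p > 0 && p <= 100, "need samples and 0 < p <= 100")
  val sorted = xs.sorted
  val rank = math.ceil(p / 100 * sorted.length).toInt
  sorted(math.max(rank, 1) - 1)
}

// Warm response times (iterations 2 to 5) for lambdatest-384 from the timings.txt excerpt.
val warmTimes = List(0.17, 0.16, 0.18, 0.12)

println(f"median = ${median(warmTimes)}%.3f s")
println(f"p99    = ${percentile(warmTimes, 99)}%.2f s")
```

With only four samples the p99 is simply the slowest request, which is why far more iterations are needed before percentiles say anything useful.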

That concludes the article. I hope you found the measurement explanations useful!