How to speed things up
At the Scala Days conference in Lausanne this year there were a few interesting talks, and no, I am not talking about Shapeless for Scala 3.
The first talk that I found quite interesting was titled Serverless Scala - Functions as SuperDuperMicroServices by James Ward and Josh Suereth.
The two speakers (from Google) had an interesting take on the evolution of the interpreter pattern in Scala from free monads to tagless final to Zio to the conquer of Mars in the years to come until the day the scala community will figure out the definitive approach.
It was interesting to me how the interpreter pattern could actually be used to share responsibilities within large teams by having a stream in charge of the core domain logic, one in charge of the command line interpreter, one in charge of the web API interpreter and so on by sharing the same interface and the same core logic.
The main topic of the talk was (as per title) about deploying serverless Scala application to Google Cloud Run. The top one issue with serverless JVM applications is surely around cold start and performances. Surprisingly (at least for me) there is a “simple” answer to this problem: GraalVM.
The idea is ahead-of-time compilation to produce a standalone executable (aka a native image) by using GraalVM.
An approach presented was to have a docker image like:
FROM oracle/graalvm-ce:10.0.0 as builder WORDKIR /app COPY . /app RUN gu install native-image RUN ./sbt graalvm-native-image:packageBin FROM apline:3.9.4 COPY --from=builder /app/target/graalvm-native-image/app /app CMD ["/app", "-web"]
Being new to Graal I was quite confused at that point,
and I soon realised most other people are, so it’s important to emphasize the following statement:
Graal and GraalVM are not the same thing.
Graal is a JIT (just in time) compiler. GraalVM is an umbrella for a variety of technologies including Graal and SubstrateVM (SVM).
By using GraalVM, you can effectively run your Scala application outside of the JVM. You can run it on Android and iOS for example (a more in-depth talk about this topic was given by Vojin Jovanovic titled Run Scala Faster with GraalVM on any Platform).
In the reality not every application can (or should?) run outside of the JVM. On the other hand, thank to the layered compiler architecture, it is possible to replace the traditional C2 JIT compiler with Graal while still running on the JVM.
This concept was further explored by Chris Thalinger talk about
Performance tuning Twitter services with Graal and ML.
The speaker, who is part of the #TwitterVMTeam, presented the ambitious “Bayesian optimisation as a service”, a machine learning software that automatically tunes applications running in the cloud!
According to Thalinger, Twitter currently runs the majority of its scala services using this strategy with impressive advantages including drastic reduction of CPU usage and latency up to 28%, which (especially for a company like Twitter) translate directly to money savings on the infrastructure and a better user experience.
Graal for SpringerLink
It is actually surprisingly simple to try out Graal. In fact, if you’re already on JDK11 (SpringerLink was not yet), you only need to add the following settings:
-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler
I’ve decided to go YOLO and try it out on the main link.springer.com application, responsible for serving most of the content available on the website.
The experiment was about deploying two distinct applications (one instance each) that only differ from those settings and run some load testing using a simple command line tool (siege).
The load testing was run with 25 simultaneous users,
for 1 minute and with no delays between requests:
siege -c 25 -b -t 1M https://...:
Straight to the results:
|Transactions||11412 hits||8839 hits|
|Availability||100.00 %||99.84 %|
|Elapsed time||59.92 secs||59.54 secs|
|Data transferred||79.86 MB||61.95 MB|
|Response time||0.13 secs||0.17 secs|
|Transaction rate||190.45 trans/sec||148.45 trans/sec|
|Throughput||1.33 MB/sec||1.04 MB/sec|
At first this was looking very good. Too good in fact: I was hitting an article not found page.
Please take a moment to laugh at me now.
Lesson learned: do not start load testing stuff that you’re not absolutely sure is working fine.
Despite the dramatic findings, the results were still encouraging so I run the test again on a working article page:
|Transactions||5859 hits||3964 hits|
|Availability||100.00 %||100.00 %|
|Elapsed time||59.02 secs||59.11 secs|
|Data transferred||96.62 MB||65.34 MB|
|Response time||0.25 secs||0.37 secs|
|Transaction rate||99.27 trans/sec||67.06 trans/sec|
|Throughput||1.64 MB/sec||1.11 MB/sec|
To make sure external factors (e.g. cache) didn’t play a fundamental role here, I run a second test on the same article page, considering everything was warm at this point:
|Transactions||8117 hits||5536 hits|
|Availability||100.00 %||100.00 %|
|Elapsed time||59.83 secs||59.82 secs|
|Data transferred||133.87 MB||91.36 MB|
|Response time||0.18 secs||0.27 secs|
|Transaction rate||135.67 trans/sec||92.54 trans/sec|
|Throughput||2.24 MB/sec||1.53 MB/sec|
Two obvious things to notice:
- performances of both applications improved significantly compared to the first run, and
- Graal performances were significantly better than C2 in both occasions
I did not have time to verify improvements in terms of CPU and memory, and I could not extensively test the application to prove it works as expected (probably the most important factor).
However, tests suggest that the Graal application can potential handle 47% more transaction per seconds, with a 66% faster response in average.
Surely worth a try!