Thursday, November 8, 2018

Steps to improve the performance of a Java application

1. Introduction

In this article, we will discuss several approaches that may be useful to improve the performance of a Java application. We start by defining measurable performance goals, and then look at different tools to measure and monitor application performance and to identify bottlenecks.

We'll also look at some of the common optimizations at the Java code level, as well as the best coding practices. Finally, we will discuss JVM-specific tuning tips and architecture changes to improve the performance of a Java application.

Keep in mind that performance optimization is a broad subject, and this is just a starting point for exploring it on the JVM.

2. Performance Goals

Before we start working to improve the performance of the application, we need to define and understand our non-functional requirements in key areas, such as scalability, performance, availability, etc.

Here are some performance goals frequently used for typical Web applications:

1. Average response time of the application

2. Average number of concurrent users the system must support

3. Requests expected per second during peak load

The use of metrics like these, which can be measured through different load testing tools and application monitoring, helps identify major bottlenecks and adjust performance accordingly.
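To make the first and third goals concrete, here is a minimal sketch of how they could be computed from raw measurements. The class and method names (LoadMetricsSketch, averageMillis, requestsPerSecond) are hypothetical and only for illustration; in practice a load testing tool reports these figures for us.

```java
import java.util.Arrays;

final class LoadMetricsSketch {

    // Average response time over a set of measured latencies, in milliseconds.
    static double averageMillis(long[] latenciesMillis) {
        return Arrays.stream(latenciesMillis).average().orElse(0.0);
    }

    // Throughput: completed requests divided by the length of the measurement window.
    static double requestsPerSecond(int requests, long windowMillis) {
        return requests * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        long[] samples = {100, 200, 300}; // made-up sample data
        System.out.println("avg ms: " + averageMillis(samples));
        System.out.println("req/s:  " + requestsPerSecond(500, 10_000));
    }
}
```

An average alone can hide slow outliers, so it is usually worth tracking percentiles alongside it.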

3. Sample Application

We are going to define a baseline application that we can use throughout this article. We will use a simple Spring Boot web application that manages a list of employees and exposes a REST API to add an employee and retrieve existing employees.

We will use this as a reference to run load tests and monitor different application metrics in the following sections.

4. Identifying Bottlenecks

Load testing tools and APM (Application Performance Management) solutions are commonly used to track and optimize the performance of Java applications. Running load tests for different application scenarios while simultaneously monitoring CPU, I/O, heap usage, etc. with APM tools is key to identifying bottlenecks.

Gatling is one of the best load testing tools, with excellent support for the HTTP protocol, which makes it a great choice for load testing any HTTP server.

Stackify Retrace is a mature APM solution with a rich set of features, so it is a great way to help determine the baseline of this application. One of the main components of Retrace is its code profiler, which collects runtime information without slowing down the application.

Retrace also provides widgets to monitor Memory, Threads and Classes for a running JVM-based application. In addition to application metrics, it also supports monitoring the CPU and I/O usage of the server hosting our application.

Thus, a complete monitoring tool such as Retrace covers the first part of unlocking the performance potential of your application. The second part is actually being able to reproduce real-world usage and load on your system.

This is harder to achieve than it seems, and it is also essential for understanding the current performance profile of the application. That's what we're going to focus on now.

5. Gatling Load Test

The Gatling simulation scripts are written in Scala, but the tool also comes with a useful GUI, allowing you to record scenarios. The GUI then creates the Scala script representing the simulation.

And, after running the simulation, Gatling generates useful HTML reports ready for analysis.

5.1. Define a scenario

Before launching the recorder, we need to define a scenario. It will be a representation of what happens when users browse a web application.

In our case, the scenario will start 200 users, each making 10,000 requests.

5.2. Configuring the Recorder

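Following Gatling's first steps guide, create a new Scala file named EmployeeSimulation with the following code:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class EmployeeSimulation extends Simulation {
    // Each virtual user fetches the employee list 10,000 times
    val scn = scenario("FetchEmployees").repeat(10000) {
        exec(
          http("GetEmployees-API")
            .get("http://localhost:8080/employees")
            .check(status.is(200))
        )
    }

    // Start 200 users, ramped up over 100 seconds (Gatling 3 injection DSL)
    setUp(scn.inject(rampUsers(200).during(100.seconds)))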
}

6. Monitoring the Application

To get started with using Retrace for a Java application, the first step is to sign up for a free trial on Stackify.
Next, we’ll need to configure our Spring Boot application as a Linux service. We’ll also need to install the Retrace agent on the server where our application is hosted.
Once we have started the Retrace agent and the Java application to be monitored, we can go to the Retrace dashboard and click the AddApp link. Once this is done, Retrace will start monitoring our application.

6.1. Find the Slowest Part Of Your Stack

Retrace automatically instruments our application and tracks usage of dozens of common frameworks and dependencies, including SQL, MongoDB, Redis, Elasticsearch, etc. Retrace makes it easy to quickly identify why our application is having performance problems, answering questions like:
· Is a certain SQL statement slowing us down?
· Is Redis slower all of a sudden?
· Is a specific HTTP web service down or slow?


7. Code Level Optimizations

Load testing and application monitoring are quite helpful in identifying some of the key bottlenecks in the application. At the same time, we need to follow good coding practices in order to avoid a lot of performance issues before we even start application monitoring.
Let’s look at some of the best practices in the next section.

7.1. Using StringBuilder for String Concatenation

String concatenation is a very common operation, and also an inefficient one. Simply put, the problem with using += to append Strings is that it will cause an allocation of a new String with every new operation.
Here’s, for example, a simplified but typical loop – first using raw concatenation and then, using a proper builder:
public String stringAppendLoop() {
    String s = "";
    for (int i = 0; i < 10000; i++) {
        if (s.length() > 0)
            s += ", ";
        s += "bar";
    }
    return s;
}
 
public String stringAppendBuilderLoop() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 10000; i++) {
        if (sb.length() > 0)
            sb.append(", ");
        sb.append("bar");
    }
    return sb.toString();
}
Using the StringBuilder in the code above is significantly more efficient, especially given just how common these String-based operations can be.
Before we move on, note that the current generation of JVMs does perform compile-time and/or runtime optimizations on String operations.
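For the common special case of joining items with a separator, the JDK (Java 8 and later) also offers String.join, which avoids hand-written append logic. The sketch below uses a hypothetical class name, JoinSketch, and produces the same kind of comma-separated output as the loops above.

```java
import java.util.Collections;

final class JoinSketch {

    // Joins `count` copies of "bar" with ", " between them.
    static String joined(int count) {
        return String.join(", ", Collections.nCopies(count, "bar"));
    }

    public static void main(String[] args) {
        System.out.println(joined(3));
    }
}
```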

7.2. Avoid Recursion

Recursive code logic leading to a StackOverflowError is another common scenario in Java applications.
If we cannot do away with recursive logic, a tail-recursive version is the better alternative.
Let’s have a look at a head-recursive example:
public int factorial(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}
And let’s now rewrite it as tail recursive:
private int factorial(int n, int accum) {
    if (n == 0) {
        return accum;
    } else {
        return factorial(n - 1, accum * n);
    }
}
 
public int factorial(int n) {
    return factorial(n, 1);
}
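Other JVM languages, such as Scala, already have compiler-level support for optimizing tail-recursive code, and there’s discussion around bringing this type of optimization to Java as well. Until then, the Java compiler and the JVM do not eliminate tail calls, so even the tail-recursive version above still consumes one stack frame per call.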
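Where the recursion can be removed altogether, a plain loop keeps the call stack flat regardless of input size. This is a minimal sketch with a hypothetical class name, FactorialSketch. It uses a long accumulator and Math.multiplyExact, so that overflow (beyond 20!) fails loudly instead of silently wrapping around.

```java
final class FactorialSketch {

    // Iterative factorial: constant stack depth, overflow detected explicitly.
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, (long) i); // throws ArithmeticException on overflow
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(10));
    }
}
```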

7.3. Use Regular Expressions Carefully

Regular expressions are useful in a lot of scenarios, but they do, more often than not, have a very high performance cost. It’s also important to be aware of a variety of JDK String methods that use regular expressions, such as String.replaceAll() or String.split().
If you absolutely must use regular expressions in computation-intensive code sections, it’s worth caching the Pattern reference instead of compiling repeatedly:
static final Pattern HEAVY_REGEX = Pattern.compile("(((X)*Y)*Z)*");
Using a popular library like Apache Commons Lang is also a good alternative, especially for manipulation of Strings.
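As a small illustration of caching the Pattern, the sketch below compiles a separator expression once and reuses it for every call, instead of passing a regex string to String.split() each time. The class name CsvSplitSketch and the separator expression are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class CsvSplitSketch {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a line on commas, ignoring whitespace around each comma.
    static List<String> fields(String line) {
        return Arrays.asList(COMMA.split(line.trim()));
    }

    public static void main(String[] args) {
        System.out.println(fields("alice , bob,carol"));
    }
}
```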

7.4. Avoid Creating and Destroying too Many Threads

Creating and disposing of threads is a common cause of performance issues on the JVM, as thread objects are relatively heavy to create and destroy.
If your application uses a large number of threads, using a thread pool makes a lot of sense, to allow these expensive objects to be reused.
To that end, the Java ExecutorService is the foundation here and provides a high-level API to define the semantics of the thread pool and interact with it.
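A minimal sketch of this idea is shown below, using a fixed-size pool from Executors. The class name PoolSketch and the toy task (squaring numbers) are hypothetical. The point is only that many tasks share a small, reused set of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PoolSketch {

    // Runs `tasks` small jobs on a pool of `threads` reusable worker threads.
    static int sumOfSquares(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final int n = i;
                futures.add(pool.submit(() -> n * n)); // submitted as a Callable<Integer>
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                try {
                    total += future.get(); // blocks until that task has finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 4));
    }
}
```

In a long-running application the pool would normally be created once and shared, rather than created per call as in this self-contained example.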
The Fork/Join framework from Java 7 is also well worth mentioning, as it provides tools to help speed up parallel processing by attempting to use all available processor cores. To provide effective parallel execution, the framework uses a pool of threads called the ForkJoinPool, which manages the worker threads.
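To give a feel for the Fork/Join style, here is a minimal sketch that sums an array by recursively splitting it into halves. The class name ForkJoinSumSketch and the THRESHOLD value are assumptions. A sensible threshold depends on the workload and should be measured.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinSumSketch extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // assumed cut-off for sequential work

    private final long[] data;
    private final int from;
    private final int to;

    ForkJoinSumSketch(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half here, then combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSumSketch(data, 0, data.length));
    }
}
```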

8. JVM Tuning

8.1. Heap Size Tuning

Determining the appropriate heap size of the JVM for a production system is not a straightforward exercise. The first step is to determine the predictable memory requirements by answering the following questions:

1. How many different applications are we planning to deploy to a single JVM process, for example, the number of EAR files, WAR files, jar files, etc.?
2. How many Java classes will be loaded at runtime, including third-party APIs?
3. What is the footprint required for in-memory caching, for example, internal cache data structures loaded by our application (and third-party APIs), such as data cached from a database, data read from a file, etc.?
4. How many threads will the application create?

These numbers are difficult to estimate without some real-world testing.
The most reliable way to get a good idea of what the application needs is to run a realistic load test against the application and monitor the metrics at runtime. The Gatling-based tests we discussed earlier are a great way to do that.
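While such a load test is running, the JVM's own management API can report how much heap is actually in use. The following is a minimal sketch using the standard MemoryMXBean. The class name HeapSnapshotSketch and the output format are illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapSnapshotSketch {

    // Current heap usage as reported by the running JVM.
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    static String describe(MemoryUsage usage) {
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the maximum is undefined
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / mb) + "MB";
        return "used=" + (usage.getUsed() / mb) + "MB"
                + " committed=" + (usage.getCommitted() / mb) + "MB"
                + " max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(describe(heap()));
    }
}
```

Sampling this under peak load gives a data-driven starting point for the initial and maximum heap settings (-Xms and -Xmx).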

8.2. Choose the correct garbage collector

Stop-the-world garbage collection cycles used to represent a huge problem for the responsiveness and overall performance of most client-facing Java applications.
However, the current generation of garbage collectors has mostly solved that issue and, with proper tuning and sizing, can lead to collection pauses that are barely noticeable. That said, it takes an in-depth understanding of both GC on the JVM as a whole and the specific profile of the application to get there.

Tools such as a profiler, heap dumps and verbose GC logging can certainly help. And, again, these all need to be captured under real-world load patterns, which is where the Gatling performance tests that we discussed earlier come in.
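As a lightweight complement to GC logs, the standard GarbageCollectorMXBean exposes cumulative collection counts and times for each collector in the running JVM. The sketch below (with a hypothetical class name, GcStatsSketch) simply lists them. The collector names it prints depend on which GC the JVM is using.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcStatsSketch {

    // One line per collector: name, number of collections, accumulated time in ms.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```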

9. JDBC Performance

Relational databases are another common source of performance problems in typical Java applications. To get a good response time for a complete request, we must of course examine each layer of the application and consider how the code interacts with the underlying SQL DB.

9.1. Connection Pooling

Let's start with the well-known fact that database connections are expensive. A connection pooling mechanism is a great first step in addressing this.
A quick recommendation here is HikariCP, a very lightweight (approximately 130Kb) and fast JDBC connection pooling framework.

9.2. JDBC Batching

Another aspect of the way we handle persistence is to batch operations wherever possible. JDBC batching allows us to send multiple SQL statements in a single database roundtrip.
The performance gain can be significant on both the driver and the database side. PreparedStatement is an excellent candidate for batching, and some database systems (for example, Oracle) support batching only for prepared statements.
Hibernate, on the other hand, is more flexible and allows us to switch to batching with a single configuration setting.
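With plain JDBC, batching a PreparedStatement comes down to calling addBatch() for each row and executeBatch() once at the end. The sketch below assumes a hypothetical employee table with a name column, and leaves obtaining the Connection (for example, from a connection pool) to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {

    // Hypothetical table and column, for illustration only.
    private static final String SQL = "INSERT INTO employee (name) VALUES (?)";

    // Queues one INSERT per name and sends them together; returns the number of batched statements.
    static int insertAll(Connection connection, List<String> names) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();     // queue the row instead of executing it immediately
            }
            return statement.executeBatch().length; // one call for the whole batch
        }
    }
}
```

For very large inputs it is common to flush the batch every few hundred or thousand rows rather than all at once. The right size depends on the driver and database.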

9.3. Statement Caching

Next, statement caching is another way to potentially improve the performance of our persistence layer - a lesser-known performance optimization that you can easily take advantage of.
Depending on the underlying JDBC driver, you can cache PreparedStatement instances on the client side (the driver) or on the database side (either the syntax tree or even the execution plan).

9.4. Scale-Up and Scale-Out

Database replication and partitioning are also excellent ways to increase performance, and we should take advantage of these battle-tested architectural patterns to scale the persistence layer of our enterprise application.

10. Architectural improvements

10.1. Caching

Memory prices are low and getting lower, while retrieving data from disk or over the network is still expensive. Caching is undoubtedly an aspect of application performance that we should not ignore.

Of course, introducing a stand-alone caching system into the topology of an application adds complexity to the architecture, so a good way to start leveraging caching is to make good use of the caching capabilities that already exist in the libraries and frameworks we are already using.

For example, most persistence frameworks have good caching support. Web frameworks such as Spring MVC can also take advantage of the built-in caching support in Spring, as well as powerful HTTP-level caching based on ETags.

But, once the low-hanging fruit has been picked, caching frequently accessed content in a stand-alone caching server such as Redis, Ehcache or Memcached can be a good next step, reducing the load on the database and boosting the performance of the application.
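The basic read-through pattern behind most of these caches can be sketched in a few lines with a ConcurrentHashMap. The class name ReadThroughCacheSketch and the loader function are hypothetical, and this sketch deliberately has no eviction or expiry, which any real cache needs in order to bound memory usage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ReadThroughCacheSketch<K, V> {

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database or remote lookup

    ReadThroughCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on a cache miss.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops an entry so that the next get() reloads it.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

A cache for employee lookups could then be created with something like `new ReadThroughCacheSketch<Long, String>(id -> loadNameFromDatabase(id))`, where loadNameFromDatabase is a placeholder for the expensive call.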

10.2. Scaling out

No matter how much hardware we throw at a single instance, at some point that will not be enough. Simply put, scaling up has natural limitations, and when the system hits them, scaling out is the only way to grow, evolve and simply handle more load.
Unsurprisingly, this step comes with significant complexity, but it is nevertheless the only way to scale an application past a certain point.

Support for this is good and always improving in most modern frameworks and libraries. The Spring ecosystem has a whole group of projects built specifically to address this particular area of application architecture, and most other stacks have similar support.

Finally, an additional advantage of scaling out with the help of a cluster, beyond pure Java performance, is that adding new nodes also leads to redundancy and better ways of handling failure, resulting in higher overall availability of the system.

11. Conclusion

In this article, we explored a number of different concepts around improving the performance of a Java application. We started with load testing and APM-based application and server monitoring, followed by some of the best practices for writing performant Java code.

Finally, we examined the JVM-specific tuning tips, the database side optimizations, and the architecture changes to scale our application.



