Friday, September 7, 2018

14 High-Performance Java Persistence Tips

Introduction

A high-performance data access layer requires a lot of knowledge about the inner workings of the database, JDBC, JPA, and Hibernate, and this article summarizes some of the key techniques you can use to optimize your enterprise application.

1. SQL statement logging
When you use a framework that generates statements on your behalf, you must always validate the effectiveness and efficiency of each statement. A testing-time assertion mechanism is even better because you can catch N+1 query issues before you even commit your code.
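
As a rough sketch of such a testing-time assertion, one option is Hibernate's Statistics API, which can count the JDBC statements a code path executes. The helper class below is only an illustration (not something prescribed here) and assumes hibernate.generate_statistics is enabled:

```java
import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class SqlStatementCountAssertions {

    // Asserts that the given code path issues exactly the expected number of
    // JDBC statements; requires hibernate.generate_statistics=true.
    public static void assertStatementCount(SessionFactory sessionFactory,
                                            long expectedCount,
                                            Runnable codeUnderTest) {
        Statistics statistics = sessionFactory.getStatistics();
        statistics.clear();

        codeUnderTest.run();

        long actualCount = statistics.getPrepareStatementCount();
        if (actualCount != expectedCount) {
            throw new AssertionError("Expected " + expectedCount
                + " JDBC statements but got " + actualCount + " (possible N+1 issue)");
        }
    }
}
```
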
2. Connection management
Because database connections are expensive, you must always use a connection pooling mechanism. And because the number of connections is bounded by the capabilities of the underlying database cluster, you must release connections as fast as possible.
Performance-wise, you must always measure and set the right pool size, since an oversized pool is no better than an undersized one. A tool like FlexyPool can help you find the right size, even after you have deployed the application into production.
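
For illustration, here is a minimal HikariCP configuration sketch (the JDBC URL, credentials, and pool size are placeholder values to be replaced by measured ones):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class ConnectionPoolFactory {

    public static HikariDataSource newDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app_db"); // placeholder URL
        config.setUsername("app_user");       // placeholder credentials
        config.setPassword("app_password");
        // the right value should come from measurements, e.g. with FlexyPool
        config.setMaximumPoolSize(10);
        return new HikariDataSource(config);
    }
}
```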

3. JDBC batching

JDBC batching allows us to send multiple SQL statements in a single database roundtrip. The performance gain is significant on both the driver side and the database side. PreparedStatements are very good candidates for batching, and some database systems (e.g. Oracle) support batching for prepared statements only.
Because JDBC defines a distinct API for batching (e.g. PreparedStatement.addBatch and PreparedStatement.executeBatch), if you generate the statements manually, you must know from the very beginning whether you are going to use batching or not. With Hibernate, you can switch to batching with a single configuration property.
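
The following sketch shows plain JDBC batching with addBatch/executeBatch (the post table and its column are assumptions); with Hibernate, setting the hibernate.jdbc.batch_size property enables batching for the statements it generates:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PostBatchInsert {

    public static void insertAll(Connection connection, List<String> titles) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(
                "insert into post (title) values (?)")) {
            for (String title : titles) {
                ps.setString(1, title);
                ps.addBatch(); // queue the statement instead of executing it right away
            }
            ps.executeBatch(); // send the whole batch in far fewer roundtrips
        }
    }
}
```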

4. Statement caching

Statement caching is one of the least-known performance optimizations that you can easily take advantage of. Depending on the underlying JDBC driver, you can cache PreparedStatements on the client side (the driver) or on the database side (caching either the syntax tree or even the execution plan).
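
For example, with the MySQL Connector/J driver, client-side PreparedStatement caching can be switched on through connection properties (a sketch; the URL and cache sizes are placeholder values, and other drivers use different settings):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class CachedStatementConnectionFactory {

    public static Connection openConnection() throws SQLException {
        // MySQL Connector/J client-side statement cache settings
        return DriverManager.getConnection(
            "jdbc:mysql://localhost:3306/app_db"
                + "?cachePrepStmts=true"
                + "&prepStmtCacheSize=250"
                + "&prepStmtCacheSqlLimit=2048",
            "app_user", "app_password");
    }
}
```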

5. Hibernate identifiers

If you use Hibernate, the IDENTITY generator is not a good choice because it disables JDBC batching.
The TABLE generator is even worse because it uses a separate transaction for fetching a new identifier, which can put pressure on the underlying transaction log, as well as on the connection pool, since a separate connection is required every time a new identifier is needed.
SEQUENCE is the right choice, and it has even been supported by SQL Server since the 2012 version. For SEQUENCE identifiers, Hibernate offers optimizers such as pooled and pooled-lo, which can reduce the number of database roundtrips required for fetching a new entity identifier value.
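
A minimal sketch of a SEQUENCE-based identifier follows (the entity and sequence names are hypothetical); with an allocationSize greater than one, Hibernate can apply the pooled optimizer so that a single sequence call covers a whole block of identifier values:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "post_seq")
    // allocationSize > 1 lets the pooled optimizer cut down sequence roundtrips
    @SequenceGenerator(name = "post_seq", sequenceName = "post_sequence", allocationSize = 50)
    private Long id;

    private String title;
}
```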

6. Choosing the right column types

You should always use the right column types on the database side. The more compact the column type is, the more entries can be accommodated in the database working set, and indexes will better fit into memory. For this purpose, you should take advantage of database-specific types (e.g. inet for IPv4 addresses in PostgreSQL), especially since Hibernate is very flexible when it comes to implementing a new custom Type.
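
As a small illustration, a more compact column definition can be requested right from the mapping (the entity and the smallint choice are assumptions made for this example; database-specific types such as inet would additionally need a custom Hibernate Type):

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class HttpRequestLog {

    @Id
    private Long id;

    // a 2-byte smallint is enough for an HTTP status code,
    // keeping rows and indexes smaller than a wider integer column would
    @Column(columnDefinition = "smallint")
    private short statusCode;
}
```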

7. Relationships

Hibernate comes with many relationship mapping types, but not all of them are equal in terms of efficiency. However, unlike queries, collections are less flexible since they cannot be easily paginated, meaning that we cannot use them when the number of child associations is rather high. For this reason, you should always question if a collection is really necessary. An entity query might be a better alternative in many situations.
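
For instance, instead of mapping a huge @OneToMany collection, the child records can be fetched with a paginated entity query, as in this sketch (the Post/PostComment model and the createdOn property are hypothetical):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class PostCommentRepository {

    // fetches one page of comments without mapping a child collection on Post
    public static List<PostComment> findByPost(EntityManager entityManager,
                                               Long postId, int page, int pageSize) {
        return entityManager.createQuery(
                "select pc from PostComment pc " +
                "where pc.post.id = :postId " +
                "order by pc.createdOn", PostComment.class)
            .setParameter("postId", postId)
            .setFirstResult(page * pageSize)
            .setMaxResults(pageSize)
            .getResultList();
    }
}
```
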
8. Inheritance
When it comes to inheritance, the impedance mismatch between object-oriented languages and relational databases becomes even more apparent. JPA offers SINGLE_TABLE, JOINED, and TABLE_PER_CLASS to deal with inheritance mapping, and each of these strategies has pluses and minuses.
SINGLE_TABLE performs the best in terms of SQL statements, but we lose on the data integrity side since we cannot use NOT NULL constraints.
JOINED addresses the data integrity limitation at the price of more complex statements. As long as you don't use polymorphic queries or @OneToMany associations against base types, this strategy is fine. Its true power comes from polymorphic @ManyToOne associations backed by a Strategy pattern on the data access layer side.
TABLE_PER_CLASS should be avoided since it does not render efficient SQL statements.
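
A minimal sketch of the JOINED strategy together with a polymorphic @ManyToOne association (the Topic/Post/Announcement model is only an illustration):

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;
import javax.persistence.ManyToOne;

@Entity
@Inheritance(strategy = InheritanceType.JOINED)
abstract class Topic {
    @Id
    @GeneratedValue
    private Long id;

    private String title;
}

@Entity
class Post extends Topic {
    private String content;
}

@Entity
class Announcement extends Topic {
    private java.time.LocalDate validUntil;
}

@Entity
class TopicStatistics {
    @Id
    @GeneratedValue
    private Long id;

    // polymorphic @ManyToOne against the base type; the concrete subclass
    // is resolved when the association is fetched
    @ManyToOne(fetch = FetchType.LAZY)
    private Topic topic;

    private long views;
}
```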

9. Persistence Context size

When using JPA and Hibernate, you should always mind the Persistence Context size. For this reason, you should never bloat it with tons of managed entities. By restricting the number of managed entities, we gain better memory management, and the default dirty checking mechanism is going to be more efficient as well.
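
A common way to keep the Persistence Context small during batch processing is to flush and clear it periodically, as in this sketch (the Post entity and the batch size are assumptions; the batch size should match hibernate.jdbc.batch_size):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class PostBatchImporter {

    public static void importTitles(EntityManager entityManager, List<String> titles) {
        int batchSize = 50; // placeholder value

        entityManager.getTransaction().begin();
        for (int i = 0; i < titles.size(); i++) {
            Post post = new Post();
            post.setTitle(titles.get(i));
            entityManager.persist(post);

            if (i > 0 && i % batchSize == 0) {
                // push pending changes to the database and detach the managed entities
                entityManager.flush();
                entityManager.clear();
            }
        }
        entityManager.getTransaction().commit();
    }
}
```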

10. Fetching only what’s necessary

Fetching too much data is probably the number one cause of data access layer performance issues. One common issue is that entity queries are used exclusively, even for read-only projections.
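
For read-only views, a DTO projection fetches only the columns that are actually needed. Here is a sketch using a JPQL constructor expression (the PostSummary DTO, the Post entity, and the com.example package are assumptions):

```java
package com.example;

import java.util.List;
import javax.persistence.EntityManager;

public class PostSummary {

    private final Long id;
    private final String title;

    public PostSummary(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    public Long getId() { return id; }
    public String getTitle() { return title; }
}

class PostSummaryQueries {

    // read-only projection: only id and title are fetched, no managed entities
    static List<PostSummary> findAll(EntityManager entityManager) {
        return entityManager.createQuery(
                "select new com.example.PostSummary(p.id, p.title) from Post p",
                PostSummary.class)
            .getResultList();
    }
}
```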

11. Caching
Relational database systems use many in-memory buffer structures to avoid disk access, yet database caching is very often overlooked. We can lower response times significantly by properly tuning the database engine so that the working set resides in memory and is not fetched from disk all the time.
Application-level caching is not optional for many enterprise applications. It can reduce response times while offering a read-only secondary store for when the database is down for maintenance or because of a serious system failure.
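
On the application side, here is a sketch of enabling Hibernate's second-level cache for a read-mostly entity (the Country entity is illustrative; a cache provider and the hibernate.cache.* properties must be configured as well):

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Country {

    @Id
    private Long id;

    private String name;

    // requires hibernate.cache.use_second_level_cache=true and a region factory
    // (e.g. Ehcache) so that lookups by id hit the cache instead of the database
}
```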

12. Concurrency control
The choice of transaction isolation level is of paramount importance when it comes to performance and data integrity. For multi-request web flows, to avoid lost updates, you should use optimistic locking with detached entities or an EXTENDED Persistence Context.
To avoid optimistic locking false positives, you can use versionless optimistic concurrency control or split entities based on write-based property sets.
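
A minimal sketch of version-based optimistic locking (the Product entity is hypothetical):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

    @Id
    private Long id;

    private int quantity;

    // every UPDATE carries "where version = ?"; if a concurrent transaction has
    // already changed the row, zero rows are affected and an
    // OptimisticLockException is raised instead of silently losing the update
    @Version
    private int version;
}
```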

13. Unleash database query capabilities

Just because you use JPA or Hibernate, it does not mean that you should not use native queries. You should take advantage of Window Functions, CTEs (Common Table Expressions), CONNECT BY, and PIVOT.
These constructs allow you to avoid fetching too much data just to transform it later in the application layer. If you can let the database do the processing, you can fetch just the end result, therefore, saving lots of disk I/O and networking overhead. To avoid overloading the Master node, you can use database replication and have multiple Slave nodes available so that data-intensive tasks are executed on a Slave rather than on the Master.
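
As a sketch, a native query with a window function can fetch only the three latest comments per post, leaving the ranking work to the database (the post_comment table and its columns are assumptions):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class LatestCommentsQuery {

    // the database does the ranking; only the end result crosses the network
    @SuppressWarnings("unchecked")
    public static List<Object[]> findLatestPerPost(EntityManager entityManager) {
        return entityManager.createNativeQuery(
                "select post_id, review " +
                "from ( " +
                "  select post_id, review, " +
                "         row_number() over (partition by post_id order by created_on desc) as rn " +
                "  from post_comment " +
                ") ranked " +
                "where rn <= 3")
            .getResultList();
    }
}
```
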
14. Scale up and scale out
Relational databases do scale very well.
If Facebook, Twitter, Pinterest, or StackOverflow can scale their database systems, there is a good chance you can scale an enterprise application according to its particular business requirements.

Database replication and sharding are very good ways to increase throughput, and you should totally take advantage of these battle-tested architectural patterns to scale your enterprise application.

Conclusion

A high-performance data access layer must resonate with the underlying database system. Knowing the inner workings of a relational database and the data access frameworks in use can make the difference between a high-performance enterprise application and one that barely crawls.
There are many things you can do to improve the performance of your data access layer, and I'm only scratching the surface here. If you want to read more on this particular topic, you should check out my High-Performance Java Persistence book as well. With over 450 pages, this book explains all these concepts in great detail.




