Friday, October 29, 2010

Sondasplorer - A bytecode instrumentation framework (current state)

Here is a short presentation about the current state of Sondasplorer, the framework for inspecting J2EE applications that I am developing in my very sparse free time.

Sondasplorer presentation in English
Sondasplorer presentation in Spanish

Monday, April 5, 2010

ORA-01000 and the Statement Cache

One issue I came across recently is a cursor leak in an application under high concurrency: more than 600 users and more than 30 pages served per second. Under these conditions, after 10 minutes of testing, I started to see the ORA-01000 Maximum Number of Cursors Exceeded exception frequently.

Executing the real-open-cursors query (the second query from the post below), I could see that some sessions of my application had actually reached 300 open cursors, which is the maximum number of cursors per session configured in Oracle. We can see this parameter with the following query:
  SELECT name, value FROM v$parameter WHERE name='open_cursors'
After analyzing the application source code thoroughly, I confirmed that it was releasing the database resources properly (CallableStatements in this case), so the problem had to be somewhere else. After a little debugging I realized that when a PreparedStatement was closed, the
cursor stayed open, which could mean two things: a bug in the application server or the Oracle driver, or that the application server was using some kind of statement cache... luckily, the problem was the second supposition.

The application was running on WebSphere, which has a statement cache to optimize the processing of prepared statements and callable statements, implementing the PreparedStatement interface with a proprietary IBM class. While the number of cached statements is less than the size of this cache, a close on a statement doesn't really release its
resources; it is a "soft" close. The query that shows open cursors counts the real open cursors, including these "soft" closed ones!

The problem was that the size of this cache was configured as 1000, so when the Oracle limit of 300 was reached, the application started to fail with ORA-01000 exceptions: no statement was ever really closed, because the total count was always below 1000 (in fact, always below 300 because of the Oracle limit).
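To make the interaction concrete, here is a toy simulation of a "soft-close" statement cache sitting in front of a hard cursor limit. All class and method names are invented for illustration; this is not the actual WebSphere implementation, just a sketch of the failure mode:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a "soft-close" statement cache in front of a hard cursor limit.
// All names are invented; this is not the real WebSphere implementation.
public class SoftCloseCacheDemo {

    static class CursorLimitExceeded extends RuntimeException {}

    static class StatementCache {
        private final int cacheSize;      // e.g. 1000, as in the misconfiguration
        private final int cursorLimit;    // e.g. Oracle open_cursors = 300
        private int openCursors = 0;
        private final Map<String, Boolean> cached = new LinkedHashMap<String, Boolean>();

        StatementCache(int cacheSize, int cursorLimit) {
            this.cacheSize = cacheSize;
            this.cursorLimit = cursorLimit;
        }

        // prepare() opens a real cursor unless the statement is already cached
        void prepare(String sql) {
            if (cached.containsKey(sql)) return;        // cache hit: reuse cursor
            if (openCursors >= cursorLimit) throw new CursorLimitExceeded();
            openCursors++;
            cached.put(sql, Boolean.TRUE);
        }

        // close() is "soft" while the cache is not full: the cursor stays open
        void close(String sql) {
            if (cached.size() <= cacheSize) return;     // soft close, cursor kept
            cached.remove(sql);
            openCursors--;                              // hard close
        }

        int openCursors() { return openCursors; }
    }

    public static void main(String[] args) {
        StatementCache cache = new StatementCache(1000, 300);
        boolean failed = false;
        // 600 distinct statements, each "closed" after use, still blow the limit:
        // the cache never reaches 1000 entries, so every close is a soft close
        for (int i = 0; i < 600; i++) {
            String sql = "select * from t where id=" + i;
            try {
                cache.prepare(sql);
                cache.close(sql);                       // soft close: nothing released
            } catch (CursorLimitExceeded e) {
                failed = true;                          // the ORA-01000 analogue
                break;
            }
        }
        System.out.println("openCursors=" + cache.openCursors() + " failed=" + failed);
    }
}
```

With a cache size of 1000 and a cursor limit of 300, the eviction path is never reached and the 301st distinct statement fails, which is exactly the behaviour observed in the application.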

PreparedStatements and Oracle Cursors

One thing we must always verify when an application goes to a production environment is that it uses shared resources properly. In this case, I'll talk about Oracle cursors related to PreparedStatements. Every time we execute a PreparedStatement a cursor is opened, and it is not released until an explicit close is invoked. The right way is to release the PreparedStatement, the ResultSet (if it exists) and the Connection, for example:
// Create a connection or get one from a pool
Connection con = getConnection();
...
PreparedStatement ps = null;
ResultSet rs = null;

try {
    ps = con.prepareStatement(sqlIsi);
    rs = ps.executeQuery();
} finally {
    if (rs != null)
        try { rs.close(); } catch (SQLException e) {}
    if (ps != null)
        try { ps.close(); } catch (SQLException e) {}
}

...

// Release the connection:
// close the connection or return it to the pool
con.close();
We always have to release the resources in a finally block so the release is executed even if exceptions are thrown in the try block. Doing that, we'll avoid cursor leaks and the ugly ORA-01000 Maximum Number of Cursors Exceeded.

Regarding execution speed, every time a query is invoked, Oracle has to parse it. In the case of PreparedStatements, statements with a high probability of being executed multiple times, the parsing time can be significant. This time can be reduced by caching the statements once parsed, and that's what Oracle does. Oracle maintains a cache table where it stores the parsed statements, speeding up the execution when the same ones are frequently invoked. As this cache is not unbounded, we have to write our PreparedStatements properly: they have to be parameterized. For example:
  select name,number from person where id=2451
  select name,number from person where id=?
From a functional point of view, the above statements are identical: they retrieve the information of a person from his/her id. But from a performance point of view they are very different. If that kind of statement has to be executed many times, overall execution will perform better with the second form because the cache will be working well.

In the first case, if the id is different, every time this statement is invoked it has to be parsed, because it really is a new statement:
  select name,number from person where id=2451
  select name,number from person where id=1121
  select name,number from person where id=5489
  select name,number from person where id=...
In the second case, the statement is always the same:
  select name,number from person where id=?
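A simple way to visualize why only the parameterized form benefits from the cache is to look at how many distinct statement texts each style produces. This is only a simulation of the cache-key behaviour (the "cache" is just a set of SQL strings), not a real Oracle client:

```java
import java.util.HashSet;
import java.util.Set;

// Simulates the parse-cache keys generated by literal vs parameterized SQL.
// Not a real Oracle client: the "cache" is just the set of distinct SQL texts.
public class ParseCacheKeysDemo {

    // Each distinct SQL text would cost one hard parse in a real cache
    static int distinctParses(String[] sqls) {
        Set<String> parseCache = new HashSet<String>();
        for (String sql : sqls) parseCache.add(sql);
        return parseCache.size();
    }

    public static void main(String[] args) {
        int[] ids = {2451, 1121, 5489};

        // Literal values embedded in the text: every id produces a new statement
        String[] literal = new String[ids.length];
        for (int i = 0; i < ids.length; i++)
            literal[i] = "select name,number from person where id=" + ids[i];

        // Bind variable: the text is always identical, ids travel as parameters
        String[] bound = new String[ids.length];
        for (int i = 0; i < ids.length; i++)
            bound[i] = "select name,number from person where id=?";

        System.out.println("literal parses: " + distinctParses(literal)); // 3
        System.out.println("bound parses:   " + distinctParses(bound));   // 1
    }
}
```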
We can monitor the cached statements using the following query:
  SELECT a.sid, user_name, status,
         osuser, machine, c.sql_text
  FROM v$session b, v$open_cursor a, v$sql c
  WHERE a.sid = b.sid
    AND a.address = c.address
    AND a.user_name IN (
      SELECT DISTINCT user_name FROM v$session)
    AND user_name = {user_name}
  ORDER BY user_name DESC, a.sid, c.sql_text, 3;
where {user_name} has to be replaced by the desired user. Even though the above statement queries the v$open_cursor view, this view doesn't contain the real open cursors, but the cached statements!

In order to know the real open cursors of a session, we have to use a statement like the following:
  SELECT a.value, s.username,
         s.sid, s.serial#, p.spid
  FROM v$sesstat a, v$statname b,
       v$session s, gv$session s2
  JOIN gv$process p ON p.addr = s2.paddr
    AND p.inst_id = s2.inst_id
  WHERE a.statistic# = b.statistic#
    AND s.sid = a.sid AND s.sid = s2.sid
    AND s.username = {user_name}
    AND b.name = 'opened cursors current';
It is important to remark that the CallableStatement interface extends PreparedStatement, so everything said about PreparedStatements also applies to classes implementing this interface. This means that statements like:
  BEGIN mypackage.myprocedure(param1,param2);END;
must be replaced by:
  BEGIN mypackage.myprocedure(?,?);END;
in order to improve performance by using the statement cache properly.

Friday, January 15, 2010

Sondasplorer - A bytecode instrumentation framework

Sondasplorer is the new personal project I am working on.


Sondasplorer is a framework that allows us to add probes to existing classes and methods as needed, isolating us from the complexity of modifying Java bytecode. The framework lets us use existing Sondasplorer probes or even create our own in a very easy and friendly way.

Sondasplorer is designed to be extremely fast, making it suitable for both preproduction and production environments. We can do similar things to commercial tools such as PerformaSure and Introscope!

In the Sondasplorer framework, a probe is a Java class whose purpose is to measure and/or modify some parameter at runtime. For example, we can:
  • Measure a method's total execution time
  • Measure a method's exclusive execution time
  • Record execution errors
  • Inspect the parameters methods are called with
  • Count method invocations
  • Generate a performance execution tree
  • Detect memory leaks
  • Alter the functionality of existing classes in a transparent way
  • Whatever your imagination can create
For example, there is a probe specialized in PreparedStatements that lets us measure the execution time and see which parameters they were called with!

The probes can send the collected information to remote managers, where the raw information can be processed, stored and viewed both at runtime and offline. We can build up a battery of probes as our working toolkit.

The bytecode modification (instrumentation) is performed in a dynamic and transparent way, meaning that it is not necessary to alter existing applications to add probes. We only have to indicate the classes and/or methods we want to instrument, and the framework will automatically modify the classes at the moment they are loaded.
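To give an idea of what the injected code amounts to, here is the hand-written source equivalent of a timing probe wrapped around a method. The Probe class below is invented for illustration; the framework generates the equivalent bytecode at class-load time instead of touching the source:

```java
// Hand-written equivalent of what a timing probe injects around a method.
// The Probe class is invented for illustration; a real instrumentation
// framework rewrites the bytecode at load time instead of editing the source.
public class TimingProbeDemo {

    static class Probe {
        static long totalNanos = 0;
        static int invocations = 0;

        static long enter() { return System.nanoTime(); }

        static void exit(long start) {
            totalNanos += System.nanoTime() - start;
            invocations++;
        }
    }

    // Original business method, with the calls the instrumentation would add
    static int fibonacci(int n) {
        long start = Probe.enter();              // injected at method entry
        try {
            return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
        } finally {
            Probe.exit(start);                   // injected at every exit path
        }
    }

    public static void main(String[] args) {
        int result = fibonacci(10);              // 55
        System.out.println("result=" + result
                + " invocations=" + Probe.invocations);
    }
}
```

The try/finally shape is what makes the probe see every exit path, including exceptions, which is why measuring total and exclusive time per method is possible without cooperation from the application code.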

Every Java application can be instrumented, including standalone applications and web applications running in application servers (WebSphere, Tomcat, etc.).

It is possible to enable and disable probes on demand. That allows us, for example, to enable the full trace of a probe only when it is needed and for the necessary time. This is very useful in preproduction environments where it is not feasible to enable full-trace mode from the beginning of a heavy test.

Below I have attached some captures of a probe I am currently developing, called WebProbe, which lets us analyze the performance of web applications:




As you can see, we can have multiple views of the same collected data set and, even better, we can define our own views adapted to our needs.

The features will be described in more detail in future entries.

If you find Sondasplorer interesting, any help would be really appreciated!


Wednesday, January 13, 2010

Atomicity in primitive types

Sometimes we have one thread changing the state of a primitive-type flag, and other threads reading it.

The correct way to proceed is to synchronize access to that flag using synchronized blocks, or to define it as volatile to ensure visibility between threads. But what about atomicity?

If we define a variable of a primitive type and perform read or write actions on it without synchronizing, supposing it were visible from other threads, would that operation be atomic? That is to say, if the primitive type is for example an int (4 bytes), are those 4 bytes written atomically?

The answer is almost a "yes". Reads and writes of all primitive types except double and long are guaranteed to be atomic. This differs from C/C++, where, depending on the platform, read and write actions may or may not be atomic. If we don't synchronize correctly in C/C++, we can read a partially written variable of a primitive type, with unexpected results.


What happens with double and long primitive types?

As they are 64-bit types, the JVM is allowed to treat them as two separate 32-bit read and write operations. Therefore, if they are not synchronized correctly, we could run into the same strange results that might happen in C/C++.


Any other solution besides using synchronized blocks?

Yes, we have the volatile declaration. Besides ensuring visibility, if we declare a primitive type as volatile we are also guaranteeing atomicity of read and write actions on it, including for long and double.

Remember that accessing a volatile variable is lighter than using synchronized blocks and never blocks.
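A minimal sketch of the safe pattern for a 64-bit value. The run below is deterministic because of the join; the interesting guarantee is the one volatile adds: a reader can never observe a half-written (torn) long, only the old value or the new one:

```java
// A volatile long is read and written atomically, so a concurrent reader can
// never observe a half-written (torn) 64-bit value: only 0 or the full pattern.
public class VolatileLongDemo {

    // Without volatile, a long write may legally be split into two 32-bit writes
    static volatile long sharedStamp = 0L;

    public static void main(String[] args) throws InterruptedException {
        // Distinct high and low 32-bit words, so a torn write would be visible
        final long pattern = 0xCAFEBABEDEADBEEFL;

        Thread writer = new Thread(new Runnable() {
            public void run() {
                sharedStamp = pattern;      // single atomic 64-bit write
            }
        });
        writer.start();
        writer.join();                      // happens-before the read below

        System.out.println(sharedStamp == pattern); // prints true
    }
}
```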

Friday, January 8, 2010

Volatile Fields

Volatile fields have been present for a long time, but even today they are a big unknown. In times when most machines had only one processor, although possibly necessary, we could write programs without having to deal with them. But nowadays, with the arrival of multiprocessor systems, volatile fields have become vital. I am not going into details, as there are plenty of resources on the web about them.

Let's say that volatile fields allow us to make writes to a field visible between threads. A situation where this can be useful is when a flag can be set and unset by different threads concurrently. Suppose we need to turn the debug mode on/off at runtime. We could have a global flag field, accessible by all threads, which they read to know if they have to print traces. For example:


class Global {
    static int debugLevel = 0;
    static boolean debugFlag = false;

    static void turnOnDebug(int level) {
        debugLevel = level;
        debugFlag = true;
    }

    static void turnOffDebug() {
        debugFlag = false;
    }
}

class Worker {
    void doWork() {
        ...
        if (Global.debugFlag) {
            if (Global.debugLevel >= 1) {
                System.out.println("INFO: in doWork method");
            }

            if (Global.debugLevel >= 2) {
                System.out.println("DEBUG: more detailed information");
            }
        }
        ...
    }
}


What would happen if a thread calls Global.turnOnDebug(2)?

The logical answer is that all threads would start printing traces, but what really happens is that it is not guaranteed that the other threads see debugFlag as true.

How come?

Because of field visibility: the debugFlag field can be cached by a thread without being written back to main memory. That implies that the other threads may see an outdated value of this field.

Any other surprise?

Yes: even if the other threads noticed the change of the debugFlag field, they could still see the debugLevel value as 0!

How come?

Because of a technique used to improve performance called reordering. In this case, when turnOnDebug is called, the compiler is free to execute the assignment setting debugFlag to true before setting debugLevel. Then, if just before the level is set another thread is scheduled and executes the doWork method, it could see debugFlag as true and debugLevel as 0.

Well, to solve this, we can use volatile fields. A detailed explanation using the happens-before relation can be read in JSR-133; here I will only put it into practice.

We could guard access to this field with synchronized blocks, but it is better in performance terms to define the debugFlag field as volatile:

volatile static boolean debugFlag = false;

Volatile fields are a lot lighter than synchronized blocks and are guaranteed to never block.

Why?

Reads of volatile fields are guaranteed to return the most recently written value:
the compiler ensures the field is read from main memory (not from a cache or register) each time.

Shouldn't we make debugLevel volatile as well?

We could, but it is not necessary: when a thread reads a volatile field, it is guaranteed that all writes made before the volatile write are also visible to the reader thread. That means the new value of debugLevel is sure to be visible after debugFlag is read as true, which also implies that the reordering described before cannot move the write of debugLevel past the volatile write.
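Putting it together, here is a runnable sketch of the corrected pattern: the reader spins on the volatile flag, and once it sees true, the happens-before rule guarantees it also sees debugLevel as 2, never 0:

```java
// Corrected pattern: debugFlag is volatile; debugLevel is published safely
// because it is written before the volatile write and read after the
// volatile read (happens-before, JSR-133).
public class VolatilePublishDemo {

    static int debugLevel = 0;                  // plain field
    static volatile boolean debugFlag = false;  // volatile guard

    static void turnOnDebug(int level) {
        debugLevel = level;    // 1: write the data
        debugFlag = true;      // 2: volatile write publishes it
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] seenLevel = new int[1];

        Thread reader = new Thread(new Runnable() {
            public void run() {
                while (!debugFlag) { /* spin until the volatile write is visible */ }
                seenLevel[0] = debugLevel;      // guaranteed to see 2, never 0
            }
        });
        reader.start();

        turnOnDebug(2);
        reader.join();
        System.out.println("seenLevel=" + seenLevel[0]); // prints seenLevel=2
    }
}
```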

As said before, everything described here can be expressed formally using the happens-before relation described in JSR-133 and in the Java Memory Model of recent revisions of the JLS (Java Language Specification). It applies officially since Java 1.5 (unofficially since the latest revisions of Java 1.4). Before that version, the Java Memory Model had some issues and we had to use synchronized blocks instead of the lighter volatile fields.