Stale Connection in WebSphere Application Server 6.1

A customer of mine has a very weird bug: sometimes he gets a StaleConnectionException when executing a statement against his Oracle connection. The customer runs WebSphere Application Server (WAS) 6.1 and Oracle Database 10g, and uses EJBs with an XA datasource.

Eventually we found a scenario that always raises the StaleConnectionException, and we started investigating the root cause.
First of all, WebSphere throws the StaleConnectionException as a wrapper around specific SQL exceptions received from the JDBC connection.
But why is the connection suddenly closed? Stepping through the code with the debugger showed that we get an open connection from the datasource, and then, at some later point, the connection is closed.
We tried switching to a non-XA datasource and received a lot of exceptions. This showed us that the process in question needs the XA capabilities of WAS.
We then wrote wrappers around Oracle’s XA datasource (oracle.jdbc.xa.client.OracleXADataSource) and Oracle’s pooled connection (oracle.jdbc.pool.OraclePooledConnection), and watched the debug messages they produced.
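For readers who want to try the same kind of tracing, here is a minimal sketch of one way to do it. It is not the wrapper we actually wrote: it uses a plain JDK dynamic proxy around java.sql.Connection rather than the Oracle classes, and the class name CloseTracingConnection and its wrap method are made up for illustration. It records which code called close(), which is the question we were trying to answer.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.List;

/** Hypothetical tracing helper: records every close() call on a Connection. */
final class CloseTracingConnection implements InvocationHandler {
    private final Connection target;
    private final List<String> log;

    private CloseTracingConnection(Connection target, List<String> log) {
        this.target = target;
        this.log = log;
    }

    /** Returns a Connection that forwards every call to target and logs close(). */
    static Connection wrap(Connection target, List<String> log) {
        return (Connection) Proxy.newProxyInstance(
                CloseTracingConnection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                new CloseTracingConnection(target, log));
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if ("close".equals(method.getName())) {
            log.add("close() called from " + findCaller());
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the original exception unchanged
        }
    }

    /** First stack frame that belongs neither to this class nor to the proxy class. */
    private static String findCaller() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            String name = frame.getClassName();
            boolean internal = name.equals(CloseTracingConnection.class.getName())
                    || name.contains("$Proxy");
            if (!internal) {
                return frame.toString();
            }
        }
        return "unknown";
    }
}
```

Wrapping the connection as soon as it is obtained, for example CloseTracingConnection.wrap(connection, log), and dumping the log when the exception appears should show whether application code or the container closed it.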
We still have no solution, but we see two possible explanations:
1. There is a bug in the WAS XA handler.
2. Calling statement.getConnection().close() closes the physical connection rather than the logical connection handed out by the datasource. The customer is now changing his code to drop the statement.getConnection() call and close the logical connection he received from the datasource instead.
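If the second explanation turns out to be the right one, the fix would look roughly like the sketch below. This is an assumption about the shape of the change, not the customer's code: the class LogicalConnectionExample and the method runUpdate are invented for illustration. The point is that the code keeps the connection reference it got from the datasource and closes exactly that reference, never statement.getConnection().

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/** Hypothetical DAO helper; class and method names are illustrative only. */
final class LogicalConnectionExample {

    /** Runs one update and closes the same connection handle it obtained. */
    static int runUpdate(DataSource dataSource, String sql) throws SQLException {
        // The handle returned by the datasource is the logical (pooled) connection.
        Connection connection = dataSource.getConnection();
        try {
            Statement statement = connection.createStatement();
            try {
                return statement.executeUpdate(sql);
            } finally {
                statement.close(); // release the statement before the connection
            }
        } finally {
            // Close the handle we were given, not statement.getConnection().
            connection.close();
        }
    }
}
```

The nested try/finally style is used on purpose, since it also compiles on the older Java level that WAS 6.1 ships with.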
I’ll post more details later.

WebSphere ESB Invalid Content Length

Well, it turns out I was mistaken in my previous post. Invalid Content Length can occur when using MTOM in .NET C# clients with WebSphere ESB, but that was not the case for our customer.

Invalid Content Length appears when the client closes the socket before sending the entire request. This can happen when the client process is halted while it is still sending. WESB will write Invalid Content Length to its system output log, but you can usually ignore it.
However, we still faced a problem with very large service calls (over 1MB in size, XML only, no attachments). It turns out that our synchronization code was messed up, so our cache was not correctly initialized and we received a lot of NPEs (NullPointerExceptions).
So we synchronized access to our cache, and voilà, everything works.
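I won't reproduce the customer's cache here, but a common way to end up with this kind of NPE is a lazily filled cache that one thread reads before another thread has finished filling it. Here is a minimal sketch of the usual cure, assuming that was the failure mode. GuardedCache and its contents are invented for illustration: every read goes through the same lock that performs the initialization.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache; names and contents are illustrative, not the real code. */
final class GuardedCache {
    private Map<String, String> entries; // null until first use; guarded by "this"
    private int loadCount;

    /** Readers take the same lock as the initializer, so they never see a null map. */
    synchronized String get(String key) {
        if (entries == null) {
            entries = load();
        }
        return entries.get(key);
    }

    /** How many times the cache was filled; useful to confirm it happens once. */
    synchronized int loadCount() {
        return loadCount;
    }

    private Map<String, String> load() {
        loadCount++;
        Map<String, String> loaded = new HashMap<String, String>();
        loaded.put("endpoint", "placeholder-value");
        return loaded;
    }
}
```

A single lock on every read is the simplest option and costs some throughput. That seemed an acceptable trade for us, since correctness was the problem, not speed.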
That will teach me to blog before I see everything working with my own eyes.
One last important issue. Sending large service calls can take time. A lot of time. Each C# client has a Timeout property that sets the timeout for the service call, in milliseconds. Set it generously, because otherwise you are very likely to get a timeout exception before the web service response arrives.