Sunday, November 15, 2009

HOWTO: Serialize an object between .NET and Java using custom binary formatting.

Problem: You want to send data across the network, from a .NET application to a Java application (or vice versa). The data is actually an object that you want to send over the network.

Since these are heterogeneous platforms, you cannot use runtime-specific serialization mechanisms (e.g. BinaryFormatter in .NET). You need to roll your own.

You basically have two choices. The first is to transfer the data manually: the sender converts each primitive value into a binary, over-the-wire representation, and the receiver converts that representation back into a primitive type of its own platform.
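To make the idea of an over-the-wire representation concrete, here is a minimal sketch of converting a single int to bytes and back by hand. The class name IntWire and its helper methods are made up for illustration, and the least-significant-byte-first layout is an assumption chosen to match the byte dump shown later in this article.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class IntWire {
    // Encode a 32-bit int as 4 bytes, least significant byte first.
    public static byte[] toWire(int value) {
        return new byte[] {
            (byte) value,
            (byte) (value >>> 8),
            (byte) (value >>> 16),
            (byte) (value >>> 24)
        };
    }

    // Rebuild the int from 4 bytes produced by toWire.
    public static int fromWire(byte[] b) {
        return (b[0] & 0xFF)
             | ((b[1] & 0xFF) << 8)
             | ((b[2] & 0xFF) << 16)
             | ((b[3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] wire = toWire(1);
        System.out.println(Arrays.toString(wire)); // [1, 0, 0, 0]
        System.out.println(fromWire(wire));        // 1
    }
}
```

The programs in this article do not shift bits by hand; they let ByteBuffer (Java) and BinaryReader/BinaryWriter (.NET) do the same job.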

The other option is to use a portable serialization format supported by the platform, e.g. XML serialization. You would then write code on the peer to parse the XML document from the network and convert it into your object representation. Since XML is a portable standard, you can do this fairly easily.

In this article, I will talk about the first option, i.e. how to do binary serialization manually.

In order to demonstrate this, we will work with the following object in Java.

package net.example;

public class EmployeeData {
 private int id;
 private String firstName;
 private String lastName;
 private int salary;
 
 public void setId(int id) {
  this.id = id;
 }
 public int getId() {
  return id;
 }
 public void setFirstName(String firstName) {
  this.firstName = firstName;
 }
 public String getFirstName() {
  return firstName;
 }
 public void setLastName(String lastName) {
  this.lastName = lastName;
 }
 public String getLastName() {
  return lastName;
 }
 public void setSalary(int salary) {
  this.salary = salary;
 }
 public int getSalary() {
  return salary;
 }
}

This has an equivalent object in .NET:

class Employee
{
    public int Id;
    public String FirstName;
    public String LastName;
    public int Salary;
}

As you can see, these objects are fairly equivalent in terms of schema.

For each of the platforms, the program will be similar. Each application will receive data that purports to be an Employee record. The app will deserialize the data into an Employee record. It will then send the same data back to the peer. The peer will then deserialize the data back into its equivalent Employee structure. Thus, this will demonstrate a good round-trip between the heterogeneous platforms.

First, let us look at the complete application in .NET.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.IO;

namespace netSock
{
    class Employee
    {
        public int Id;
        public String FirstName;
        public String LastName;
        public int Salary;
    }

    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                Run(args);
            }
            catch (Exception e)
            {
                Console.Error.WriteLine(e);
            }
        }

        static void Run(string[] args)
        {
            // listen on all local interfaces; the port-only constructor is obsolete
            TcpListener listener = new TcpListener(IPAddress.Any, 8080);
            listener.Start();

            while (true)
            {

                using (TcpClient client = listener.AcceptTcpClient())
                {
                    try
                    {
                        Read(client);
                    }
                    catch (Exception e)
                    {
                        Console.Error.WriteLine(e);
                    }
                }
            }
        }

        static void Read(TcpClient client)
        {
            Console.WriteLine("Got connection: {0}", DateTime.Now);
            NetworkStream ns = client.GetStream();
            BinaryReader reader = new BinaryReader(ns);

            Employee emp = new Employee();
            // first read the Id
            emp.Id = reader.ReadInt32();

            // length of first name in bytes.
            int length = reader.ReadInt32();
            
            // read the name bytes into the byte array.
            // recall that the Java side encodes the name as UTF-8, so the
            // length is a count of bytes, not of characters.
            byte[] nameArray = reader.ReadBytes(length);
            emp.FirstName = Encoding.UTF8.GetString(nameArray);

            // last name
            length = reader.ReadInt32();
            nameArray = reader.ReadBytes(length);
            emp.LastName = Encoding.UTF8.GetString(nameArray);

            // salary
            emp.Salary = reader.ReadInt32();

            Console.WriteLine(emp.Id);
            Console.WriteLine(emp.FirstName);
            Console.WriteLine(emp.LastName);
            Console.WriteLine(emp.Salary);

            System.Threading.Thread.Sleep(5);

            Console.WriteLine("Writing data...");
            // now reflect back the same structure.
            BinaryWriter bw = new BinaryWriter(ns);

            bw.Write(emp.Id);
            byte [] data = Encoding.UTF8.GetBytes(emp.FirstName);
            bw.Write(data.Length);
            bw.Write(data);
            
            data = Encoding.UTF8.GetBytes(emp.LastName);
            bw.Write(data.Length);
            bw.Write(data);

            bw.Write(emp.Salary);

            Console.WriteLine("Writing data...DONE");

            client.Client.Shutdown(SocketShutdown.Both);
            ns.Close();
        }
    }
}

This program is very simple. It uses a TcpListener to create a server application, and waits for connections. After getting a connection, it reads data from the stream and deserializes it into an Employee object. It then sends the object back to the sender by manually serializing the primitive types into the stream.

The output of this program is as follows (when the Java client connects to it):

Got connection: 11/15/2009 12:20:02 AM
1
John
Smith
2
Writing data...
Writing data...DONE

As you can see, the employee's first name is "John" and the last name is "Smith".

If you run into a problem with this, you can debug it by capturing a trace log for the application.

Now, let us look at the Java example application:

package net.example;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

public class Main {
 public static void main(String [] args)
 {
  
  int written = 0;
  EmployeeData emp = new EmployeeData();
  emp.setFirstName("John");
  emp.setLastName("Smith");
  emp.setId(1);
  emp.setSalary(2);
  
  // Create the encoder and decoder for targetEncoding
  Charset charset = Charset.forName("UTF-8");
  CharsetDecoder decoder = charset.newDecoder();
  CharsetEncoder encoder = charset.newEncoder();
  byte [] underlyingBuffer = new byte[1024];
  ByteBuffer buffer = ByteBuffer.wrap(underlyingBuffer);
  buffer.order(ByteOrder.LITTLE_ENDIAN);
  try
  {
   Socket client = new Socket("localhost", 8080);
   
   OutputStream oStream = client.getOutputStream();
   InputStream iStream = client.getInputStream();

   serialize(buffer, emp, encoder);
   
   buffer.flip();
   
   int dataToSend = buffer.remaining();
   System.out.println("# bytes = " + dataToSend);
   
   System.out.println("#Bytes in output buffer: " + written + " limit = " + buffer.limit() + " pos = " + buffer.position() + " remaining = " + buffer.remaining());
   
   int remaining = dataToSend;
   while(remaining > 0)
   {
    oStream.write(buffer.get());
    -- remaining;
   }
   
   // now the server echoes back the data.
   
   ByteBuffer readBuffer = ByteBuffer.allocate(1024);
   readBuffer.order(ByteOrder.LITTLE_ENDIAN);
   
   int db = iStream.read();
   while(db != -1)
   {
    System.out.println(db);
    readBuffer.put((byte)db);
    db = iStream.read();
   }
   
   int numberOfBytesRead = readBuffer.position();
   
   System.out.println("Number of bytes read: " + numberOfBytesRead);
   
   readBuffer.flip();

   EmployeeData rEmp = new EmployeeData();
   rEmp.setId(readBuffer.getInt());
   
   int length = readBuffer.getInt();
   System.out.println("FName Length: " + length);
   byte [] stringBuffer = new byte[length];
   readBuffer.get(stringBuffer);
   rEmp.setFirstName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   length = readBuffer.getInt();
   System.out.println("LName Length: " + length);
   stringBuffer = new byte[length];
   readBuffer.get(stringBuffer);
   rEmp.setLastName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   rEmp.setSalary(readBuffer.getInt());
   
   System.out.println("ID: " + rEmp.getId());
   System.out.println("First: " + rEmp.getFirstName());
   System.out.println("Last: " + rEmp.getLastName());
   System.out.println("Salary: " + rEmp.getSalary());
   System.out.flush();
   
   client.close();
   
  } catch(Exception e) {
   e.printStackTrace(System.err);
  } finally {
   System.out.println("Written bytes: " + written);
  }
 }
 
 private static void serialize(ByteBuffer buffer, EmployeeData emp, CharsetEncoder encoder)
 {
  // id
  buffer.putInt(emp.getId());
  
  CharBuffer nameBuffer = CharBuffer.wrap(emp.getFirstName().toCharArray());
  ByteBuffer nbBuffer = null;
  
  // length of first name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);
  } 
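  catch(CharacterCodingException e)
  {
   // keep the original cause instead of throwing an unrelated ArithmeticException
   throw new IllegalStateException("Could not encode first name", e);
  }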

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getFirstName(), nbBuffer.limit()));
  buffer.putInt(nbBuffer.limit());
  buffer.put(nbBuffer);

  // put lastname
  nameBuffer = CharBuffer.wrap(emp.getLastName().toCharArray());
  nbBuffer = null;
  
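  // length of last name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);
  } 
  catch(CharacterCodingException e)
  {
   // keep the original cause instead of throwing an unrelated ArithmeticException
   throw new IllegalStateException("Could not encode last name", e);
  }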

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getLastName(), nbBuffer.limit()));
  buffer.putInt(nbBuffer.limit());
  buffer.put(nbBuffer);
  
  // salary
  buffer.putInt(emp.getSalary());
 }
}


The Java client has a similar structure to the .NET app, except that the .NET app acts as the server, whereas the Java app is just a client. However, that difference is not material to the current topic.

The output of the Java program is as follows:

String [John] #bytes = 4
String [Smith] #bytes = 5
# bytes = 25
#Bytes in output buffer: 0 limit = 25 pos = 0 remaining = 25
1
0
0
0
4
0
0
0
74
111
104
110
5
0
0
0
83
109
105
116
104
2
0
0
0
Number of bytes read: 25
FName Length: 4
LName Length: 5
ID: 1
First: John
Last: Smith
Salary: 2
Written bytes: 0

Now, let me talk about some important details of this application. First, realize that each platform has its own convention for laying out multi-byte primitive datatypes (integers, floats, characters) as a sequence of bytes. This concept is called endianness. On the Java side, ByteBuffer and DataOutputStream default to "Big Endian", while on the .NET side BinaryReader and BinaryWriter use "Little Endian". So, when we convert data into a binary representation, we need to make sure that both sides do it the same way.
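As a small illustration of what goes wrong when the two sides disagree, the sketch below writes an int with one byte order and reads it back with another. The class and method names (EndianMismatch, writeThenRead) are hypothetical and are not part of the applications above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, for illustration only.
class EndianMismatch {
    // Write the value using writeOrder, then read the same 4 bytes using readOrder.
    public static int writeThenRead(int value, ByteOrder writeOrder, ByteOrder readOrder) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(writeOrder);
        buf.putInt(value);
        buf.flip();
        return buf.order(readOrder).getInt();
    }

    public static void main(String[] args) {
        // Both sides agree: the value survives.
        System.out.println(writeThenRead(1, ByteOrder.LITTLE_ENDIAN, ByteOrder.LITTLE_ENDIAN)); // 1
        // Sender is big-endian, receiver is little-endian: the bytes are reversed.
        System.out.println(writeThenRead(1, ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN));    // 16777216
    }
}
```

A mismatch like this does not raise an error. A length field read this way simply becomes a very large number, which is why this kind of bug tends to show up as a failed or hanging read.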

Since .NET is "Little Endian" I decided to convert everything to Little Endian. On Java, this is very simple. Since I was using ByteBuffer to convert Java primitive types into bytes, I just set the ByteOrder on the ByteBuffer to be LITTLE_ENDIAN.
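The effect of that one setting can be seen in isolation with a short sketch. ByteOrderDemo and encode are made-up names for illustration; the ByteBuffer calls are the same ones used in the client above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class ByteOrderDemo {
    // Encode one int into a fresh 4-byte buffer using the given byte order.
    public static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(value);
        return buf.array();
    }

    public static void main(String[] args) {
        // A new ByteBuffer starts out big-endian.
        System.out.println(ByteBuffer.allocate(4).order());                      // BIG_ENDIAN
        System.out.println(Arrays.toString(encode(1, ByteOrder.BIG_ENDIAN)));    // [0, 0, 0, 1]
        System.out.println(Arrays.toString(encode(1, ByteOrder.LITTLE_ENDIAN))); // [1, 0, 0, 0]
    }
}
```

The little-endian result, 1 0 0 0, is the same byte sequence that appears at the start of the client's output above.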

The second thing to remember is about strings. When you are serializing objects that contain heterogeneous datatypes, or types that vary in length, you need a way of distinguishing the end of one value from the beginning of another. In the case of strings, I chose a length prefix. So, when I serialize a string, I first write out the length of the string, and then the entire string. The receiver first reads the length of the string, and then allocates a byte buffer for that string. It then reads the actual bytes (up to the length of the string) into the array. Note that the quantity written is not the length in characters, but the length in bytes of the encoded string.
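The length-prefix scheme can be factored into a pair of helpers, sketched below. The names LengthPrefixed, putString and getString are hypothetical. The sketch also uses String.getBytes with StandardCharsets (Java 7 and later) instead of the CharsetEncoder used in the client, which is a simplification and not what the program above does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers, for illustration only.
class LengthPrefixed {
    // Write the encoded byte count first, then the encoded bytes.
    public static void putString(ByteBuffer buf, String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(encoded.length);
        buf.put(encoded);
    }

    // Read the byte count, then exactly that many bytes, and decode them.
    public static String getString(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] encoded = new byte[length];
        buf.get(encoded);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        putString(buf, "John");
        putString(buf, "Smith");
        buf.flip();
        System.out.println(buf.remaining()); // 17 (4 + 4 bytes, then 4 + 5 bytes)
        System.out.println(getString(buf));  // John
        System.out.println(getString(buf));  // Smith
    }
}
```

Because each string carries its own length, the reader never has to scan for a terminator and the two names can be read back to back.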

Which brings me to the final point. String conversion to Bytes can be done in various ways. I chose to use UTF-8 encoding. Again, the encoding used has to be the same on both the sides in order to be able to exchange the data correctly.
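To see why the encoding matters, and why the length prefix has to count bytes rather than characters, here is a minimal sketch. EncodingLengths and byteLength are made-up names, and the non-ASCII name is an assumed example that does not appear in the programs above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class EncodingLengths {
    // Number of bytes the string occupies once encoded with the given charset.
    public static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "John";
        String accented = "Jos\u00e9"; // "José": 4 characters, the last one non-ASCII

        System.out.println(byteLength(ascii, StandardCharsets.UTF_8));       // 4
        System.out.println(byteLength(accented, StandardCharsets.UTF_8));    // 5
        System.out.println(byteLength(ascii, StandardCharsets.UTF_16LE));    // 8
    }
}
```

A receiver that assumed a different encoding than the sender, or that treated the prefix as a character count, would read the wrong number of bytes and misinterpret everything that follows.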

I hope this was useful to you. In the next article, I will talk about using XML as a mechanism for transporting objects across heterogeneous platforms.