Sunday, December 6, 2009

Socket.SendFile - Implementing fast file transfer

Implementing fast file transfer with Socket.SendFile

When writing network applications, we often need to implement file transfer between two hosts. For example, imagine an FTP client that is downloading a file from, or uploading a file to, an FTP server. Similarly, you could be uploading an image file (probably a photo) as an attachment to a blog or a website like Facebook or Flickr.

Usually, file transfer is implemented as a Read/Write pattern, where you read from the source stream and write into the destination stream. Here the source stream is the stream constructed from the Socket, and the target stream is the file, or vice versa if the file is being transferred to a destination server.

The simple Read/Write pattern for file transfer is implemented as follows.
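The original listing is not reproduced here, so the following is only a minimal sketch of the Read/Write pattern, written in Java rather than .NET. The class name ReadWriteCopy and the in-memory streams that stand in for the file and the socket are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ReadWriteCopy {
    // Copies everything from source to destination through a fixed-size
    // buffer and returns the number of bytes copied.
    static long copy(InputStream source, OutputStream destination, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the source is exhausted.
        while ((read = source.read(buffer)) != -1) {
            // Only the first 'read' bytes of the buffer are valid.
            destination.write(buffer, 0, read);
            total += read;
        }
        destination.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[10000];                         // stand-in for the file contents
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the socket stream
        long copied = copy(new ByteArrayInputStream(payload), sink, 4096);
        System.out.println("Copied " + copied + " bytes");        // prints "Copied 10000 bytes"
    }
}
```

Every byte passes through the application's own buffer on its way from the source to the destination, which is the cost that a dedicated file-sending API tries to avoid.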

In the .NET Framework, there is a better way to do file uploads, which is exposed through the Socket.SendFile method. This method exposes the underlying Winsock API TransmitFile, which is considerably faster than the Read/Write pattern.
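Socket.SendFile is a .NET API, so it cannot be shown directly in Java. As a rough analogue only, Java's FileChannel.transferTo also lets the platform move file bytes to a channel without the application looping over its own buffer. The sketch below assumes a hypothetical class name, TransferToSketch, and uses an in-memory sink where a real program would pass a SocketChannel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class TransferToSketch {
    // Sends the whole file to the target channel and returns the byte count.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sendfile-sketch", ".bin");
        try {
            Files.write(tmp, new byte[1 << 20]);                      // 1 MiB test file
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket
            long sent = sendFile(tmp, Channels.newChannel(sink));
            System.out.println("Sent " + sent + " bytes");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Whether the transfer actually bypasses user-space copies depends on the operating system and on the type of the target channel, so this should not be read as a statement about how Socket.SendFile is implemented.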

To measure the difference, I wrote an application that compares the performance of the Read/Write pattern with that of the Socket.SendFile method.

Here is the test program:

A few things to note about this implementation:

1) It uses message framing to delimit file transfers, since it uses the same socket for multiple file transfers. I have used the techniques described in Serializing data from .NET to Java to do this. Even though no Java app is involved here, the techniques are the same.

2) The server just drains the incoming stream. It does not save the incoming data to a file. Since we are just interested in benchmarking the performance between the two Send implementations, we should be ok here.

3) The program, which is basically a perf harness, uses a Strategy pattern to change the SendFile method used. That way everything else remains the same, and it just changes the SendFile method to get performance numbers.
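The Strategy arrangement described in point 3 can be sketched as follows. This is not the original harness: the names SendHarness, SendStrategy, CHUNKED and BULK are hypothetical, and both strategies write to an in-memory stream rather than a socket, so the timings it prints say nothing about real network performance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SendHarness {
    // Strategy: one way of pushing a file's bytes into an output stream.
    interface SendStrategy {
        void send(byte[] file, OutputStream out) throws IOException;
    }

    // Hypothetical strategy 1: write in fixed-size chunks (Read/Write pattern).
    static final SendStrategy CHUNKED = (file, out) -> {
        int chunk = 4096;
        for (int offset = 0; offset < file.length; offset += chunk) {
            out.write(file, offset, Math.min(chunk, file.length - offset));
        }
    };

    // Hypothetical strategy 2: hand the whole payload over in a single call.
    static final SendStrategy BULK = (file, out) -> out.write(file);

    // Times one strategy; everything except the strategy stays the same.
    static long time(SendStrategy strategy, byte[] file, OutputStream out)
            throws IOException {
        long start = System.nanoTime();
        strategy.send(file, out);
        out.flush();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[1 << 20];
        System.out.println("chunked ns: " + time(CHUNKED, file, new ByteArrayOutputStream()));
        System.out.println("bulk ns:    " + time(BULK, file, new ByteArrayOutputStream()));
    }
}
```

Because the harness only sees the SendStrategy interface, swapping the send implementation does not touch the connection setup, framing, or timing code.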

Perf Comparison

The following graph shows the performance with the simple Read/Write pattern for file transfer.

[Chart: performance of file transfer with Socket.BeginSend]

The following chart shows the performance when Socket.SendFile is used.

[Chart: performance of file transfer with Socket.SendFile]

As you can see, there is a huge difference between the two, especially for the 1M file size. With Socket.SendFile, an upload takes at most 129ms, whereas with the Read/Write pattern it takes 1000ms. For smaller file sizes, there is not much of a difference.

There is a huge variance in timings for the SendFile() method for the 1M file size, but I haven't been able to figure out the reason for that yet. In any case, that variance does not change the fact that Socket.SendFile() is faster.

Sunday, November 15, 2009

How to send object from Java to .NET: Alternate implementation

In the last article, I described how to send an object from a Java application to a .NET application over a Socket connection.

https://ferozedaud.blogspot.com/2009/11/howto-serialize-data-from-object-from.html

The Java implementation relied on the ByteBuffer class to serialize the object in the LittleEndian format that the .NET code understands.

However, it turns out that using a ByteBuffer is not necessary. Java's native format is BigEndian, and so is the network byte order. The x86 platform (and .NET running on it), however, is LittleEndian. So we only need a way to convert between BigEndian and LittleEndian on the .NET side.

In .NET, the IPAddress.HostToNetworkOrder and IPAddress.NetworkToHostOrder methods are provided for doing this conversion.
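To make the conversion concrete, here is a small Java sketch (the class and method names are made up for illustration). It shows the bytes that DataOutputStream.writeInt puts on the wire, and the value a little-endian reader would get if it read those bytes without converting. On a little-endian host, that byte swap is the job NetworkToHostOrder performs for an Int32.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ByteOrderDemo {
    // Bytes of 'value' as DataOutputStream.writeInt emits them (big-endian).
    static byte[] networkOrder(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(value);
        return bytes.toByteArray();
    }

    // The value a little-endian reader sees if it reads the same four bytes
    // without converting; applying the swap again restores the original.
    static int swapByteOrder(int value) {
        return Integer.reverseBytes(value);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(networkOrder(1))); // prints [0, 0, 0, 1]
        System.out.println(swapByteOrder(1));                 // prints 16777216
        System.out.println(swapByteOrder(swapByteOrder(1)));  // prints 1
    }
}
```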

In the Java application, we can forgo the ByteBuffer and use DataInputStream/DataOutputStream directly. This makes the code more concise and easier to understand.

With these changes, the .NET application looks as follows. (I am only including the changes to the Read() method; everything else is the same.)

static void Read(TcpClient client)
        {
            Console.WriteLine("Got connection: {0}", DateTime.Now);
            NetworkStream ns = client.GetStream();
            BinaryReader reader = new BinaryReader(ns);

            Employee emp = new Employee();
            // first read the Id
            emp.Id = IPAddress.NetworkToHostOrder(reader.ReadInt32());

            // length of first name in bytes.
            int length = IPAddress.NetworkToHostOrder(reader.ReadInt32());
            
            // read the name bytes into the byte array.
            // recall that the length is the size of the UTF-8 encoded name in bytes.
            byte[] nameArray = reader.ReadBytes(length);
            emp.FirstName = Encoding.UTF8.GetString(nameArray);

            // last name
            length = IPAddress.NetworkToHostOrder(reader.ReadInt32());
            nameArray = reader.ReadBytes(length);
            emp.LastName = Encoding.UTF8.GetString(nameArray);

            // salary
            emp.Salary = IPAddress.NetworkToHostOrder(reader.ReadInt32());

            Console.WriteLine(emp.Id);
            Console.WriteLine(emp.FirstName);
            Console.WriteLine(emp.LastName);
            Console.WriteLine(emp.Salary);

            System.Threading.Thread.Sleep(5);

            Console.WriteLine("Writing data...");
            // now reflect back the same structure.
            BinaryWriter bw = new BinaryWriter(ns);

            bw.Write(IPAddress.HostToNetworkOrder(emp.Id));
            byte [] data = Encoding.UTF8.GetBytes(emp.FirstName);
            bw.Write(IPAddress.HostToNetworkOrder(data.Length));
            bw.Write(data);
            
            data = Encoding.UTF8.GetBytes(emp.LastName);
            bw.Write(IPAddress.HostToNetworkOrder(data.Length));
            bw.Write(data);

            bw.Write(IPAddress.HostToNetworkOrder(emp.Salary));

            Console.WriteLine("Writing data...DONE");

            client.Client.Shutdown(SocketShutdown.Both);
            ns.Close();
        }

As you can see, we still use BinaryReader/BinaryWriter. However, we call NetworkToHostOrder on each integer after reading it from the wire, and HostToNetworkOrder on each integer before writing it to the wire.

In the Java application, there are changes in the main() method and the serialize() method. We work directly with DataInputStream/DataOutputStream, which are similar to .NET's BinaryReader/BinaryWriter.

public static void main(String [] args)
 { 
  int written = 0;
  EmployeeData emp = new EmployeeData();
  emp.setFirstName("John");
  emp.setLastName("Smith");
  emp.setId(1);
  emp.setSalary(2);
  
  // Create the encoder and decoder for targetEncoding
  Charset charset = Charset.forName("UTF-8");
  CharsetDecoder decoder = charset.newDecoder();
  CharsetEncoder encoder = charset.newEncoder();

  try
  {
   Socket client = new Socket("localhost", 8080);
   
   OutputStream oStream = client.getOutputStream();
   InputStream iStream = client.getInputStream();

   DataOutputStream dos = new DataOutputStream(oStream);

   serializeToStream(dos, emp, encoder);

   DataInputStream dis = new DataInputStream(iStream);

   // now the server echoes back the data.
   EmployeeData rEmp = new EmployeeData();
   rEmp.setId(dis.readInt());
   
   int length = dis.readInt();
   System.out.println("FName Length: " + length);
   byte [] stringBuffer = new byte[length];
   dis.readFully(stringBuffer); // read() may return before the buffer is full
   rEmp.setFirstName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   length = dis.readInt();
   System.out.println("LName Length: " + length);
   stringBuffer = new byte[length];
   dis.readFully(stringBuffer); // read() may return before the buffer is full
   rEmp.setLastName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   rEmp.setSalary(dis.readInt());
   
   System.out.println("ID: " + rEmp.getId());
   System.out.println("First: " + rEmp.getFirstName());
   System.out.println("Last: " + rEmp.getLastName());
   System.out.println("Salary: " + rEmp.getSalary());
   System.out.flush();
   
   client.close();
   
  } catch(Exception e) {
   e.printStackTrace(System.err);
  } finally {
   System.out.println("Written bytes: " + written);
  }
 }
 
 private static void serializeToStream(DataOutputStream os, EmployeeData emp, CharsetEncoder encoder) throws IOException
 {
  // id
  os.writeInt(emp.getId());
  
  CharBuffer nameBuffer = CharBuffer.wrap(emp.getFirstName().toCharArray());
  ByteBuffer nbBuffer = null;
  
  // length of first name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);
  } 
  catch(CharacterCodingException e)
  {
   throw new ArithmeticException();
  }

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getFirstName(), nbBuffer.limit()));
  os.writeInt(nbBuffer.limit());
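  // the backing array may be longer than the encoded data, so write only the valid bytes
  os.write(nbBuffer.array(), nbBuffer.arrayOffset(), nbBuffer.limit());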

  // put lastname
  nameBuffer = CharBuffer.wrap(emp.getLastName().toCharArray());
  nbBuffer = null;
  
  // length of last name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);   
  } 
  catch(CharacterCodingException e)
  {
   throw new ArithmeticException();
  }

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getLastName(), nbBuffer.limit()));
  os.writeInt(nbBuffer.limit());
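  // the backing array may be longer than the encoded data, so write only the valid bytes
  os.write(nbBuffer.array(), nbBuffer.arrayOffset(), nbBuffer.limit());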
  
  // salary
  os.writeInt(emp.getSalary());
 }

We no longer need ByteBuffer for byte-order conversion; we were only using it because it supports the LittleEndian format. The Java app always reads and writes bytes on the wire in BigEndian format (which is Java's native format, as well as the network byte order). The .NET application does the conversion from BigEndian to LittleEndian and back.

HOWTO: Serialize data from an object from .NET to Java using custom binary formatting.

Problem: You want to send data across the network, from a .NET application to a Java application (or vice versa). The data is actually an object that you want to send over the network.

Since these are heterogeneous platforms, you cannot use the runtime-specific serialization methods (for example, BinaryFormatter in .NET). You need to roll your own.

You basically have two choices. You can manually transfer the data by converting the primitive types from one platform into a binary, over-the-wire representation. On the other side, you need to convert the binary, over-the-wire representation into a primitive type of that platform.

The other option is to use a serialization method supported by the platform, for example XmlSerialization. You would then write code on the peer to parse the XML document from the network and convert it into your object representation. Since XML is a portable standard, you can do this fairly easily.

In this article, I will be talking about the first option, i.e., how to do binary serialization manually.

In order to demonstrate this, we will work with the following object in Java.

package net.example;

public class EmployeeData {
 private int id;
 private String firstName;
 private String lastName;
 private int salary;
 
 public void setId(int id) {
  this.id = id;
 }
 public int getId() {
  return id;
 }
 public void setFirstName(String firstName) {
  this.firstName = firstName;
 }
 public String getFirstName() {
  return firstName;
 }
 public void setLastName(String lastName) {
  this.lastName = lastName;
 }
 public String getLastName() {
  return lastName;
 }
 public void setSalary(int salary) {
  this.salary = salary;
 }
 public int getSalary() {
  return salary;
 }
}

This has an equivalent object in .NET:

class Employee
    {
        public int Id;
        public String FirstName;
        public String LastName;
        public int Salary;
    }

As you can see, these objects are fairly equivalent in terms of schema.

For each of the platforms, the program will be similar. Each application will receive data that purports to be an Employee record. The app will deserialize the data into an employee record. It will then send the same data back to the peer. The peer will then deserialize the data back into its equivalent Employee structure. Thus, this will demonstrate a good round-trip between the heterogeneous platforms.

First, let us look at the complete application in .NET.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.IO;

namespace netSock
{
    class Employee
    {
        public int Id;
        public String FirstName;
        public String LastName;
        public int Salary;
    }

    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                Run(args);
            }
            catch (Exception e)
            {
                Console.Error.WriteLine(e);
            }
        }

        static void Run(string[] args)
        {
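            // the TcpListener(int port) constructor is obsolete; bind explicitly to all interfaces
            TcpListener listener = new TcpListener(IPAddress.Any, 8080);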
            listener.Start();

            while (true)
            {

                using (TcpClient client = listener.AcceptTcpClient())
                {
                    try
                    {
                        Read(client);
                    }
                    catch (Exception e)
                    {
                        Console.Error.WriteLine(e);
                    }
                }
            }
        }

        static void Read(TcpClient client)
        {
            Console.WriteLine("Got connection: {0}", DateTime.Now);
            NetworkStream ns = client.GetStream();
            BinaryReader reader = new BinaryReader(ns);

            Employee emp = new Employee();
            // first read the Id
            emp.Id = reader.ReadInt32();

            // length of first name in bytes.
            int length = reader.ReadInt32();
            
            // read the name bytes into the byte array.
            // recall that the length is the size of the UTF-8 encoded name in bytes.
            byte[] nameArray = reader.ReadBytes(length);
            emp.FirstName = Encoding.UTF8.GetString(nameArray);

            // last name
            length = reader.ReadInt32();
            nameArray = reader.ReadBytes(length);
            emp.LastName = Encoding.UTF8.GetString(nameArray);

            // salary
            emp.Salary = reader.ReadInt32();

            Console.WriteLine(emp.Id);
            Console.WriteLine(emp.FirstName);
            Console.WriteLine(emp.LastName);
            Console.WriteLine(emp.Salary);

            System.Threading.Thread.Sleep(5);

            Console.WriteLine("Writing data...");
            // now reflect back the same structure.
            BinaryWriter bw = new BinaryWriter(ns);

            bw.Write(emp.Id);
            byte [] data = Encoding.UTF8.GetBytes(emp.FirstName);
            bw.Write(data.Length);
            bw.Write(data);
            
            data = Encoding.UTF8.GetBytes(emp.LastName);
            bw.Write(data.Length);
            bw.Write(data);

            bw.Write(emp.Salary);

            Console.WriteLine("Writing data...DONE");

            client.Client.Shutdown(SocketShutdown.Both);
            ns.Close();
        }
    }
}

This program is very simple. It uses a TcpListener to create a server application, and waits for connections. After getting a connection, it reads data from the stream and deserializes it into an Employee object. It then sends the object back to the sender by manually serializing the primitive types into the stream.

The output of this program is as follows (when the java client connects to it):

Got connection: 11/15/2009 12:20:02 AM
1
John
Smith
2
Writing data...
Writing data...DONE

As you can see, the employee's first name is "John" and last name is "Smith".

If you have some problem with this, then you can debug it by getting a Tracelog for the application.

Now, let us look at the Java example application:

package net.example;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

public class Main {
 public static void main(String [] args)
 {
  
  int written = 0;
  EmployeeData emp = new EmployeeData();
  emp.setFirstName("John");
  emp.setLastName("Smith");
  emp.setId(1);
  emp.setSalary(2);
  
  // Create the encoder and decoder for targetEncoding
  Charset charset = Charset.forName("UTF-8");
  CharsetDecoder decoder = charset.newDecoder();
  CharsetEncoder encoder = charset.newEncoder();
  byte [] underlyingBuffer = new byte[1024];
  ByteBuffer buffer = ByteBuffer.wrap(underlyingBuffer);
  buffer.order(ByteOrder.LITTLE_ENDIAN);
  try
  {
   Socket client = new Socket("localhost", 8080);
   
   OutputStream oStream = client.getOutputStream();
   InputStream iStream = client.getInputStream();

   serialize(buffer, emp, encoder);
   
   buffer.flip();
   
   int dataToSend = buffer.remaining();
   System.out.println("# bytes = " + dataToSend);
   
   System.out.println("#Bytes in output buffer: " + written + " limit = " + buffer.limit() + " pos = " + buffer.position() + " remaining = " + buffer.remaining());
   
   int remaining = dataToSend;
   while(remaining > 0)
   {
    oStream.write(buffer.get());
    -- remaining;
   }
   
   // now the server echoes back the data.
   
   ByteBuffer readBuffer = ByteBuffer.allocate(1024);
   readBuffer.order(ByteOrder.LITTLE_ENDIAN);
   
   int db = iStream.read();
   while(db != -1)
   {
    System.out.println(db);
    readBuffer.put((byte)db);
    db = iStream.read();
   }
   
   int numberOfBytesRead = readBuffer.position();
   
   System.out.println("Number of bytes read: " + numberOfBytesRead);
   
   readBuffer.flip();

   EmployeeData rEmp = new EmployeeData();
   rEmp.setId(readBuffer.getInt());
   
   int length = readBuffer.getInt();
   System.out.println("FName Length: " + length);
   byte [] stringBuffer = new byte[length];
   readBuffer.get(stringBuffer);
   rEmp.setFirstName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   length = readBuffer.getInt();
   System.out.println("LName Length: " + length);
   stringBuffer = new byte[length];
   readBuffer.get(stringBuffer);
   rEmp.setLastName(decoder.decode(ByteBuffer.wrap(stringBuffer)).toString());
   
   rEmp.setSalary(readBuffer.getInt());
   
   System.out.println("ID: " + rEmp.getId());
   System.out.println("First: " + rEmp.getFirstName());
   System.out.println("Last: " + rEmp.getLastName());
   System.out.println("Salary: " + rEmp.getSalary());
   System.out.flush();
   
   client.close();
   
  } catch(Exception e) {
   e.printStackTrace(System.err);
  } finally {
   System.out.println("Written bytes: " + written);
  }
 }
 
 private static void serialize(ByteBuffer buffer, EmployeeData emp, CharsetEncoder encoder)
 {
  // id
  buffer.putInt(emp.getId());
  
  CharBuffer nameBuffer = CharBuffer.wrap(emp.getFirstName().toCharArray());
  ByteBuffer nbBuffer = null;
  
  // length of first name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);
  } 
  catch(CharacterCodingException e)
  {
   throw new ArithmeticException();
  }

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getFirstName(), nbBuffer.limit()));
  buffer.putInt(nbBuffer.limit());
  buffer.put(nbBuffer);

  // put lastname
  nameBuffer = CharBuffer.wrap(emp.getLastName().toCharArray());
  nbBuffer = null;
  
  // length of last name
  try
  {
   nbBuffer = encoder.encode(nameBuffer);   
  } 
  catch(CharacterCodingException e)
  {
   throw new ArithmeticException();
  }

  System.out.println(String.format("String [%1$s] #bytes = %2$s", emp.getLastName(), nbBuffer.limit()));
  buffer.putInt(nbBuffer.limit());
  buffer.put(nbBuffer);
  
  // salary
  buffer.putInt(emp.getSalary());
 }
}

The Java client has a similar structure to the .NET app, except that the .NET app acts as the server, whereas the Java app is just a client. However, that difference is not material to the current topic.

The output of the java program is as follows:

String [John] #bytes = 4
String [Smith] #bytes = 5
# bytes = 25
#Bytes in output buffer: 0 limit = 25 pos = 0 remaining = 25
1
0
0
0
4
0
0
0
74
111
104
110
5
0
0
0
83
109
105
116
104
2
0
0
0
Number of bytes read: 25
FName Length: 4
LName Length: 5
ID: 1
First: John
Last: Smith
Salary: 2
Written bytes: 0

Now, let me talk about some important details of this application. First, realize that each platform has its own way of laying out multi-byte primitive datatypes (integers, floats, characters) as bytes in memory. This concept is called Endianness. Java reads and writes such values in "Big Endian" order, while on the CLR (i.e., .NET on x86) the byte representation is "Little Endian". So, when we convert data into a binary representation, we need to make sure that both sides do it the same way.

Since .NET is "Little Endian", I decided to convert everything to Little Endian. On Java, this is very simple. Since I was using ByteBuffer to convert Java primitive types into bytes, I just set the ByteOrder on the ByteBuffer to LITTLE_ENDIAN.
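As a small standalone illustration (the class name LittleEndianDemo and the helper encode are not part of the application above), this is what the ByteOrder setting does to a single int. The little-endian bytes for 1 match the first four values in the client's output shown earlier.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class LittleEndianDemo {
    // Encodes a 32-bit int into four bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // ByteBuffer defaults to big-endian unless order() is called.
        System.out.println(Arrays.toString(encode(1, ByteOrder.BIG_ENDIAN)));    // prints [0, 0, 0, 1]
        System.out.println(Arrays.toString(encode(1, ByteOrder.LITTLE_ENDIAN))); // prints [1, 0, 0, 0]
    }
}
```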

The second thing to remember is about strings. When you are serializing objects that contain heterogeneous datatypes, or types that vary in length, you need a way of distinguishing the end of one value from the beginning of the next. In the case of strings, I chose a length prefix. So, when I serialize a string, I first write out the length of the string, and then the entire string. The receiver first reads the length, and then allocates a byte buffer of that size. It then reads the actual bytes (up to the length of the string) into the array. Note that the quantity written is not the length in characters, but the length in bytes of the encoded string.

Which brings me to the final point. String conversion to Bytes can be done in various ways. I chose to use UTF-8 encoding. Again, the encoding used has to be the same on both the sides in order to be able to exchange the data correctly.
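The length-prefix and encoding points can be seen together in a short sketch. It is a simplified stand-in for the application's code: the class LengthPrefixedString and its helpers are hypothetical, and it writes the length with DataOutputStream, which is big-endian, rather than with the LITTLE_ENDIAN ByteBuffer used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedString {
    // Writes the UTF-8 byte count first, then the encoded bytes.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(encoded.length); // length in bytes, not in characters
        out.write(encoded);
    }

    // Reads the byte count, then exactly that many bytes, and decodes them.
    static String readString(DataInputStream in) throws IOException {
        byte[] encoded = new byte[in.readInt()];
        in.readFully(encoded); // blocks until the whole string has arrived
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeString(out, "John");
        writeString(out, "Zo\u00eb"); // 3 characters, but 4 bytes in UTF-8

        System.out.println("bytes on the wire: " + wire.size()); // prints 16

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readString(in).length()); // prints 4
        System.out.println(readString(in).length()); // prints 3
    }
}
```

The second string shows why the prefix has to be the encoded byte count: its character count is 3, but the receiver must read 4 bytes to recover it.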

I hope this was useful to you. In the next article, I will talk about using XML as a mechanism for transporting objects across heterogeneous platforms.

Sunday, November 1, 2009

DotNet Network Library Links and HOWTOs

A lot of people have blogged about System.Net tips and tricks. In particular, former members of the System.Net team at Microsoft (Jon Cole, Durgaprasad Gorti, Malar Chinnusamy) have authored blog posts on how to do things with System.Net.

Non-Microsoft folks have also stepped up to the plate. For example, Stephen Cleary has written a very good, concise blog post on System.Net HOWTOs and FAQs.

In this post, I aim to categorize all of them and link them in one place, so that it will be easy to find for people who need information.

FAQ

Stephen Cleary's .NET TCP/IP Sockets FAQ
How to use a Network Monitor to debug your System.Net application
Creating a tracelog for your application
Proxy Configuration Best Practices in System.Net

Authentication

Primer Part-1

Message Framing

Article on preserving message boundaries
Synchronous Sample
Async sample
Another excellent sample on message framing by Stephen Cleary

Cookies

How to use CookieCollection/CookieContainer

HTTP Expect-100 Continue Caveats

The following blog post explains the consequences of having Expect100Continue=true for some servers.

Autoproxy support

http://blogs.msdn.com/mahjayar/archive/2005/08/04/447799.aspx

Socket Duplication

Socket duplication is a new feature supported by the 2.0 version of the .NET framework. The following articles describe that in more detail, along with code samples.

Socket Duplication - Part 1
Socket Duplication - Part 2

IPv6

Configuring an IPv4 server to be able to receive connections from IPv6 clients

SMTPClient

http://www.systemnetmail.com/default.aspx (FAQ for the System.Net.Mail namespace and classes)
Sending mail to a Gmail account with SMTPClient

Socket Applications: Traceroute

Part 1
Part 2
Part 3
Part 4

HOWTO send an object across the network, between a Java app and a .NET app, and back.

https://ferozedaud.blogspot.com/2009/11/howto-serialize-data-from-object-from.html

Saturday, October 24, 2009

Write your own search engine using Lucene

Recently, I have been playing around with Lucene (http://incubator.apache.org/lucene.net/). Lucene is an open source project, sponsored by the Apache Software Foundation, that gives you all the components necessary to create your own search engine.

I downloaded the latest build, which is versioned 2.0.004 and located at http://incubator.apache.org/lucene.net/download/Incubating-Apache-Lucene.Net-2.0-004-11Mar07.bin.zip.

To start off I wrote a small application that indexes all the INF files in my %WINDIR%\System32 directory, and allows me to search their contents.

Here is an example of how the application works:

Search...
You can enter....
filename|fullpath|lastmodified|contents
>> windows fullpath
Query: fullpath:windows
#hits: 8
----------------
Filename: homepage.inf
FullPath: c:\windows\system32\homepage.inf
Last-Modified: 8/10/2004 3:00:00 AM
----------------
Filename: ieuinit.inf
FullPath: c:\windows\system32\ieuinit.inf
Last-Modified: 6/29/2009 1:40:16 AM
----------------
Filename: mapisvc.inf
FullPath: c:\windows\system32\mapisvc.inf
Last-Modified: 4/14/2006 11:39:08 PM
----------------
Filename: mmdriver.inf
FullPath: c:\windows\system32\mmdriver.inf
Last-Modified: 8/10/2004 3:00:00 AM
----------------
Filename: msxmlx.inf
FullPath: c:\windows\system32\msxmlx.inf
Last-Modified: 8/6/2003 10:15:48 AM
----------------
Filename: pid.inf
FullPath: c:\windows\system32\pid.inf
Last-Modified: 6/20/2007 10:52:36 PM
----------------
Filename: $ncsp$.inf
FullPath: c:\windows\system32\$ncsp$.inf
Last-Modified: 9/23/2005 6:50:22 AM
----------------
Filename: $winnt$.inf
FullPath: c:\windows\system32\$winnt$.inf
Last-Modified: 9/27/2005 8:46:09 PM
Search...
You can enter....
filename|fullpath|lastmodified|contents
>> msxml contents
Query: contents:msxml
#hits: 1
----------------
Filename: msxmlx.inf
FullPath: c:\windows\system32\msxmlx.inf
Last-Modified: 8/6/2003 10:15:48 AM
Search...
You can enter....
filename|fullpath|lastmodified|contents
>>

Here is the code for the application.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Store;
using Lucene.Net.Util;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.QueryParsers;
using sdir = System.IO.Directory;
using LDirectory = Lucene.Net.Store.Directory;
using System.IO;

namespace searchtest
{
    class Program
    {
        static void Main(string[] args)
        {
            Analyzer analyzer = new StandardAnalyzer();
            LDirectory directory = FSDirectory.GetDirectory("/index.bin", true);
            //Directory directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            writer.SetMaxFieldLength(25000);

            String [] infs = sdir.GetFiles(@"c:\windows\system32", "*.inf");
            foreach (String inf in infs)
            {
                FileInfo fi = new FileInfo(inf);
                Document doc = new Document();
                doc.Add(new Field("filename", fi.Name, Field.Store.YES, Field.Index.TOKENIZED));
                doc.Add(new Field("fullpath", fi.FullName, Field.Store.YES, Field.Index.TOKENIZED));
                doc.Add(new Field("lastmodified", DateField.DateToString(fi.LastWriteTimeUtc), Field.Store.YES, Field.Index.TOKENIZED));

                using (StreamReader sr = new StreamReader(inf))
                {
                    String text = sr.ReadToEnd();
                    doc.Add(new Field("contents", text, Field.Store.YES,
                        Field.Index.TOKENIZED));
                    writer.AddDocument(doc);
                }
            }

            writer.Close();

            // Now search the index:
            IndexSearcher isearcher = new IndexSearcher(directory);

            while (true)
            {
                Console.WriteLine("Search...");
                Console.WriteLine("You can enter....");
                Console.WriteLine("filename|fullpath|lastmodified|contents");

                Console.Write(">> ");
                String cmd = Console.ReadLine();

                if (cmd == null || cmd.Length == 0)
                    break;

                String fieldname = "contents";
                String predicate = null;

                if (cmd.StartsWith("!"))
                {
                    int index = cmd.LastIndexOf("\"");
                    predicate = cmd.Substring(2, index-2);
                    fieldname = cmd.Substring(index + 1);
                }
                else if (cmd.StartsWith("\""))
                {
                    int index = cmd.LastIndexOf("\"");
                    if (index != -1)
                    {
                        predicate = cmd.Substring(1, index-1);
                        if (++index < cmd.Length)
                        {
                            fieldname = cmd.Substring(index);
                        }
                    }
                }
                else
                {
                    String[] tokens = cmd.Split();
                    if (tokens.Length == 2)
                    {
                        predicate = tokens[0];
                        fieldname = tokens[1];
                    }
                    else if (tokens.Length == 1)
                    {
                        predicate = tokens[0];
                    }
                    else
                    {
                        Console.WriteLine("ERROR:");
                        continue;
                    }
                }

                // Parse a simple query that searches for "text":
                QueryParser parser = new QueryParser(fieldname, analyzer);
                Query query = null;

                try
                {
                    query = parser.Parse(predicate);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e);
                    continue;
                }


                Hits hits = isearcher.Search(query);
                Console.WriteLine("Query: {0}", query.ToString());
                Console.WriteLine("#hits: {0}", hits.Length());
                // Iterate through the results:
                for (int i = 0; i < hits.Length(); i++)
                {
                    Console.WriteLine("----------------");
                    Document hitDoc = hits.Doc(i);
                    Console.WriteLine("Filename: {0}", hitDoc.Get("filename"));
                    Console.WriteLine("FullPath: {0}", hitDoc.Get("fullpath"));
                    Field f = hitDoc.GetField("lastmodified");
                    Console.WriteLine("Last-Modified: {0}", DateField.StringToDate(hitDoc.Get("lastmodified")));
                    //Console.WriteLine(hitDoc.Get("contents"));
                }
            }
            isearcher.Close();
            directory.Close(); 
        }
    }
}

Friday, October 2, 2009

NTLM auth fails with HttpWebRequest/WebClient, but passes with IE

On public newsgroups, I have seen a lot of postings where people complained that their managed code application, written with HttpWebRequest and using NTLM auth to talk to a server, would fail. However, Internet Explorer running on the same machine would work fine.

Here are some of the threads that show this problem:

http://social.msdn.microsoft.com/Forums/en-US/netfxnetcom/thread/a4aba6c5-6180-441e-ab60-95347fcdc051

In order to root cause this issue, you need to enable logging using the System.Net tracelog (http://ferozedaud.blogspot.com/2009/08/tracing-with-systemnet.html) and examine the trace. The first thing to look for is whether the client fails with a NotSupported error when trying to compose the Type3 message (using the server's Type2 response to the Type1 message previously sent by the client).

The second variable here is the operating system on both the client and server. If the OS on the client is >= Vista (for eg, any flavor of Vista or Windows7) and the OS on the server is a version before Vista, then there was a change in the way NTLM works. In vista and later operating systems, NTLM by default now requires 128bit encryption, whereas the prior OS did not.

OK. So why does IE work on the same machine, while HttpWebRequest doesn't?

The difference is the way in which both use the NTLM SSPI package.

When HttpWebRequest uses the package, it asks for the NTLMSSP_NEGOTIATE_SEAL and NTLMSSP_NEGOTIATE_SIGN capabilities. This requires encryption. Since 128-bit encryption is now required by the client OS, the server also has to support 128-bit encryption. If the server doesn't, the authentication will fail.

IE does not ask for SEAL|SIGN capabilities when composing the Type2 message. So, even if the server does not support 128bit encryption, the authentication can still work.
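To check which capabilities a client actually asked for, you can decode the flags from the NTLM token that appears after "NTLM " in the Authorization header of the trace. The following is a minimal sketch, not part of System.Net: the class and method names are hypothetical, the flag values are the ones I know from the NTLM protocol specification (MS-NLMP), and it only handles the client's first (negotiate) message, where the flags sit right after the signature and message type.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for reading NTLM negotiate flags out of a traced token.
static class NtlmFlagSketch
{
    // Flag values as defined in the NTLM protocol specification (MS-NLMP).
    public const uint NegotiateSign = 0x00000010; // NTLMSSP_NEGOTIATE_SIGN
    public const uint NegotiateSeal = 0x00000020; // NTLMSSP_NEGOTIATE_SEAL
    public const uint Negotiate128  = 0x20000000; // NTLMSSP_NEGOTIATE_128

    // Reads the flags field from a base64-encoded negotiate (Type1) message.
    // Layout: 8-byte "NTLMSSP\0" signature, 4-byte message type, 4-byte flags,
    // all integers little-endian.
    public static uint ReadType1Flags(string base64Message)
    {
        byte[] msg = Convert.FromBase64String(base64Message);
        if (msg.Length < 16
            || Encoding.ASCII.GetString(msg, 0, 7) != "NTLMSSP"
            || ReadUInt32LittleEndian(msg, 8) != 1)
        {
            throw new ArgumentException("Not an NTLM Type1 message");
        }
        return ReadUInt32LittleEndian(msg, 12);
    }

    // Lists the subset of flags that matter for the problem discussed here.
    public static string Describe(uint flags)
    {
        var names = new List<string>();
        if ((flags & NegotiateSign) != 0) names.Add("SIGN");
        if ((flags & NegotiateSeal) != 0) names.Add("SEAL");
        if ((flags & Negotiate128) != 0) names.Add("128");
        return string.Join("|", names.ToArray());
    }

    private static uint ReadUInt32LittleEndian(byte[] buffer, int offset)
    {
        return (uint)buffer[offset]
            | ((uint)buffer[offset + 1] << 8)
            | ((uint)buffer[offset + 2] << 16)
            | ((uint)buffer[offset + 3] << 24);
    }
}
```

Comparing the decoded flags of a failing HttpWebRequest call with those of a working browser session should make the difference described above visible.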

For more details, refer to this thread on stackoverflow:

http://stackoverflow.com/questions/1443617/407-authentication-required-no-challenge-sent/1482442#1482442

Note that even Windows XP/Server 2003 supports 128-bit encryption, just not out of the box. And on Windows 7/Vista, even though 128-bit is the default, it can be changed by modifying the security policy. However, that might not always be possible, especially if the machine is on a domain where the policy is administered by the Domain Admin.

Tuesday, September 29, 2009

Implementing Traceroute with System.Net - Part IV

In the previous parts, I showed how to implement traceroute using System.Net.Sockets. We used the Socket class to create a Raw socket, and use that to do a traceroute.

Implementing Traceroute with System.Net Part III
Implementing Traceroute with System.Net Part II
Implementing Traceroute with System.Net Part I

It turns out, that System.Net.NetworkInformation already has a Ping class, that encapsulates all the functionality of sending ICMP ECHO requests and processing replies. It has the necessary switches to tweak the TTL settings as well.

I wrote a small program to demonstrate how we can do a Traceroute using the Ping class.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.Net.NetworkInformation;

namespace PingClassTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Ping pinger = new Ping();
            String data = "0123456789ABCDEF";
            byte[] buffer = Encoding.ASCII.GetBytes(data);
            PingOptions options = new PingOptions();

            IPHostEntry target = Dns.GetHostEntry(args[0]);
            bool isDestination = false;
            for(int i = 1; i <= 30; i++)
            {
                String intermediateHost = null;
                for (int j = 0; j < 3; j++)
                {
                    options.Ttl = i;
                    PingReply reply = pinger.Send(target.AddressList[0], 30, buffer, options);
                    switch (reply.Status)
                    {
                        // A timed-out probe may carry no reply address, so it is left
                        // to the default branch below and printed as "*".
                        case IPStatus.TtlExpired:
                        case IPStatus.Success:
                            Console.Write("<{0}ms\t", reply.RoundtripTime);
                            if (reply.Address.Equals(target.AddressList[0]))
                            {
                                isDestination = true;
                            }
                            intermediateHost = reply.Address.ToString();
                            break;
                        default:
                            Console.Write("*\t");
                            break;

                    }
                }

                Console.WriteLine("\t{0}", intermediateHost);

                if (isDestination)
                {
                    break;
                }
            }

            pinger.Dispose();
        }
    }
}

The output of this program, when run against www.yahoo.com, is as follows:

<0ms    <0ms    <0ms            192.168.1.1
<0ms    <0ms    <0ms            98.117.116.1
<0ms    <0ms    <0ms            130.81.138.218
<0ms    <0ms    <0ms            130.81.28.166
<0ms    <0ms    <0ms            130.81.17.56
<0ms    <0ms    <0ms            130.81.17.231
<0ms    <0ms    <0ms            130.81.14.90
<0ms    <0ms    <0ms            216.115.107.57
<0ms    <0ms    <0ms            209.131.32.23
<27ms   <26ms   <26ms           209.131.36.158
Press any key to continue . . .

As you can see, this program gives the exact same functionality as the program that we wrote using the Socket class. The Ping class also supports asynchronous functionality which makes it very useful for high performance applications.
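As a rough illustration of the asynchronous side, here is a minimal sketch that sends one probe per TTL concurrently instead of walking the hops one at a time. It assumes a framework version that has Ping.SendPingAsync (the Task-based overload); the class and method names are my own, and each probe uses its own Ping instance because a single instance does not allow overlapping asynchronous sends.

```csharp
using System;
using System.Net;
using System.Net.NetworkInformation;
using System.Text;
using System.Threading.Tasks;

// Hypothetical sketch: probe every hop at once with the Ping class.
static class AsyncTracerouteSketch
{
    // Builds the options for one hop: the TTL decides how far the probe travels.
    public static PingOptions OptionsForHop(int hop)
    {
        return new PingOptions(hop, true);
    }

    // Starts one probe per TTL and returns the replies in hop order.
    public static async Task<PingReply[]> ProbeAllHopsAsync(IPAddress target, int maxHops, int timeoutMs)
    {
        byte[] payload = Encoding.ASCII.GetBytes("0123456789ABCDEF");
        var probes = new Task<PingReply>[maxHops];
        for (int hop = 1; hop <= maxHops; hop++)
        {
            probes[hop - 1] = ProbeAsync(target, timeoutMs, payload, OptionsForHop(hop));
        }
        return await Task.WhenAll(probes);
    }

    // Sends a single probe and disposes the Ping instance when the reply is in.
    private static async Task<PingReply> ProbeAsync(IPAddress target, int timeoutMs, byte[] payload, PingOptions options)
    {
        using (var pinger = new Ping())
        {
            return await pinger.SendPingAsync(target, timeoutMs, payload, options);
        }
    }
}
```

The caller would then walk the returned array in order and stop at the first reply whose Status is IPStatus.Success, since that is the hop at which the destination answered.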

Thursday, September 17, 2009

Implementing Traceroute with System.Net - Part III

In the previous part http://ferozedaud.blogspot.com/2009/09/implementing-traceroute-with-systemnet_07.html I showed a simple implementation of traceroute, that built upon my earlier implementation of Ping.

As promised, in this article, we will see how to make the utility more robust.

One of the things I didn't like about the previous implementation was that there was no verification of the received response packet. The implementation was not verifying whether the type of the response was correct: an ICMP Echo request should elicit an ICMP Echo reply (see the ICMP RFC at http://www.ietf.org/rfc/rfc0792.txt).

So, we add the code for verifying that. According to the RFC, the Echo reply message has a Type field = 0. Also, since we are using the TTL mechanism for detecting hosts along the route to the destination, we expect that every host but the final one will reply with an ICMP Time Exceeded message, which has a Type field = 11.
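To make the check concrete, here is a minimal sketch of pulling the ICMP type out of a datagram read from a raw IPv4 socket. The helper name is hypothetical and it is not the ICMP_PACKET class from my Ping implementation; it reads the IP header length from the datagram rather than assuming a fixed 20 bytes.

```csharp
using System;

// Hypothetical helper for classifying a raw ICMP response.
static class IcmpReplySketch
{
    public const byte EchoReply = 0;     // RFC 792: Echo Reply
    public const byte TimeExceeded = 11; // RFC 792: Time Exceeded

    // Returns the ICMP type of a datagram that still includes its IPv4 header.
    public static byte GetIcmpType(byte[] datagram, int length)
    {
        if (length < 1)
        {
            throw new ArgumentException("Empty datagram");
        }
        // The low nibble of the first byte is the header length in 32-bit words.
        int ipHeaderLength = (datagram[0] & 0x0F) * 4;
        if (length <= ipHeaderLength)
        {
            throw new ArgumentException("Datagram too short for an ICMP header");
        }
        // The ICMP type is the first byte after the IP header.
        return datagram[ipHeaderLength];
    }
}
```

A traceroute loop would expect TimeExceeded from every intermediate hop and EchoReply only from the destination.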

The other issue with the tool is that it does not resolve the IP addresses of the intermediate hosts to host names. In a traceroute output, we are mostly interested in the hostnames of the intermediate routers, and not just the IP addresses. So, we will add that capability as well.

With these changes, here is the changed code. I have only put the changes to the while(true) loop in my original implementation.

// Assumes ipepResponse is declared before the loop: IPEndPoint ipepResponse = null;
while (true)
{
    Console.Write("{0}", hop);
    bool allTimedOut = true;
    for (int i = 0; i < 3; i++)
    {
        ICMP_PACKET packet = ICMP_PACKET.CreateRequestPacket(111, 222, data);

        IPEndPoint epRemote = new IPEndPoint(ipTarget, 0);

        pingSocket.SetSocketOption(SocketOptionLevel.IP, SocketOptionName.IpTimeToLive, hop);

        // Reset so that each probe is timed on its own; Start() alone would
        // keep accumulating time across probes.
        stopWatch.Reset();
        stopWatch.Start();
        pingSocket.SendTo(packet.Serialize(), epRemote);

        byte[] receiveData = new byte[1024];
        int read = 0;
        try
        {
            epResponse = new IPEndPoint(0, 0);
            read = pingSocket.ReceiveFrom(receiveData, ref epResponse);
            stopWatch.Stop();

            ICMP_PACKET recvPacket = new ICMP_PACKET(receiveData, 20, read);

            ipepResponse = epResponse as IPEndPoint;

            if (recvPacket.PacketType == 11)
            {
                // this is an ICMP Time Exceeded message
                if (ipepResponse.Address.Equals(ipTarget))
                {
                    // do not expect an ICMP Time Exceeded message from the
                    // final destination.
                    Console.Error.Write("\t!!");
                }
            }
            else if (recvPacket.PacketType == 0)
            {
                // this is an ICMP Echo reply message
                // validate that this is coming from the destination host.
                if (!ipepResponse.Address.Equals(ipTarget))
                {
                    // do not expect an ICMP Echo reply message from an
                    // intermediate host.
                    Console.Error.Write("\t!!");
                }
            }

            Console.Write("\t<{0}ms", stopWatch.ElapsedMilliseconds);
            allTimedOut = false;
        }
        catch (SocketException)
        {
            // the probe timed out
            Console.Write("\t*");
        }
    }

    if (allTimedOut)
    {
        Console.WriteLine("\tRequest timed out");
    }
    else
    {
        String intermediateHost = null;
        // now try to resolve the IPAddress of the last host that answered
        try
        {
            IPHostEntry hostEntry = Dns.GetHostEntry(ipepResponse.Address);
            intermediateHost = hostEntry.HostName;
        }
        catch (SocketException)
        {
            // no reverse DNS entry; print the address only
        }

        Console.WriteLine("\t{0} {1}", ipepResponse.Address.ToString(), intermediateHost);
    }
    ++hop;

    ipepResponse = epResponse as IPEndPoint;
    if (hop > maxHops || ipepResponse.Address.Equals(ipTarget))
    {
        break;
    }
}

In the next part, I will throw some ideas out there, on how this can be extended even more.

Friday, September 11, 2009

Network Programmers Toolchest

In this post, I want to talk about some of the tools that a programmer doing network programming might want to learn, in order to do the job better.

First, let us talk about logging tools. No matter which library you use, whether it is System.Net from the .NET framework, or Commons HttpClient from the Apache Commons project, you need to have a way of enabling logging to see what is going on in the library.

If you are using System.Net, then you can enable logging using the Logging facility provided in the .NET framework.

http://ferozedaud.blogspot.com/2009/08/tracing-with-systemnet.html

For the Apache Commons HttpClient library, you can enable logging using the Log4J logging library:

http://logging.apache.org/log4j/index.html

Next, you need a way to capture network traffic in order to see what is going on in the network. Many tools exist that allow you to do that. I prefer to use Wireshark, but on Windows, Netmon is also pretty good.

http://www.wireshark.org
http://blogs.technet.com/netmon

Imagine that you are trying to write code that can simulate logging in to a website, for example Facebook or a line of business application. However, it isn't working: you tried using logging but can't seem to get to the problem. In this case, you might want to use a browser and try out the same scenario, and see what the browser is doing differently.

You can do that by using Wireshark/Netmon to sniff the browser session, but it is not very user friendly. Plus, if the session is encrypted (HTTPS) you cannot even make sense of the network sniff.

In this case, it helps to have a browser extension that can sniff the outgoing requests and incoming responses for you. I like to use Firebug on Firefox, and there are equivalent extensions for Internet Explorer as well.

http://getfirebug.com/

Here are the ones for Internet Explorer.

http://www.fiddler2.com/Fiddler2/version.asp
http://www.httpwatch.com/

Readers: Do you have any favorite tools that I have not covered here? I would like to know about them. Add a comment to this post with the information.

Thursday, September 10, 2009

The case of multiple NTLM challenges with IIS7

A reader posed a question on Stack Overflow about a problem with authentication. The scenario was as follows:

This was a typical 3-tier application, in which the middle-tier (ASP.NET web server) was making a HttpWebRequest to a backend IIS-7 server that required authentication.

When the authentication method was set to Digest/Basic/Negotiate, the server worked fine, and the authentication succeeded. However, if the auth method was set to NTLM, the server started to challenge twice.

The reader investigated this, and found that this was caused by a Microsoft Security Update http://support.microsoft.com/kb/957097. Also, he came up with a solution for the problem.

Read more about the problem and its solution at: http://www.tinyint.com/index.php/2009/08/24/401-error-on-httpwebrequest-with-ntlm-authentication/

Monday, September 7, 2009

Implementing Traceroute with System.Net: Part-II

In this part, we will implement traceroute according to the principles we discussed in the last post:

http://ferozedaud.blogspot.com/2009/09/implementing-traceroute-with-systemnet.html

First, start with the code for Ping, which is at http://blogs.msdn.com/feroze_daud/archive/2005/10/26/485372.aspx and modify the Main() method as follows:

    static void Main(string[] args)
    {

        if (args.Length != 1)
        {
            Usage();
        }



        if (0 == String.Compare(args[0], "/?", true, CultureInfo.InvariantCulture)

        || 0 == String.Compare(args[0], "-h", true, CultureInfo.InvariantCulture)

        || 0 == String.Compare(args[0], "-?", true, CultureInfo.InvariantCulture))
        {
            Usage();
        }

        string target = args[0];

        IPAddress[] heTarget = Dns.GetHostAddresses(target);

        IPAddress ipTarget = null;

        foreach (IPAddress ip in heTarget)
        {

            if (ip.AddressFamily == AddressFamily.InterNetwork)
            {
                ipTarget = ip;
                break;
            }
        }



        IPAddress[] heSource = Dns.GetHostAddresses(Dns.GetHostName());

        IPAddress ipSource = null;

        foreach (IPAddress ip in heSource)
        {
            byte[] addressBytes = ip.GetAddressBytes();
            if (addressBytes[0] == 169 && addressBytes[1] == 254)
            {
                continue;
            }
            if (ip.AddressFamily == AddressFamily.InterNetwork)
            {
                ipSource = ip;
                break;
            }
        }

        IPEndPoint epLocal = new IPEndPoint(ipSource, 0);

        Socket pingSocket = new Socket(AddressFamily.InterNetwork, SocketType.Raw, ProtocolType.Icmp);

        pingSocket.Bind(epLocal);

        byte[] data = Encoding.ASCII.GetBytes("1234567890abcdef");

        Console.WriteLine("Ping {0}({1}) with {2} bytes of data...", target, ipTarget.ToString(), data.Length);

        Console.WriteLine();

        int hop = 1;
        int maxHops = 30;

        pingSocket.ReceiveTimeout = 30;
        System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();
        EndPoint epResponse = (EndPoint)new IPEndPoint(0, 0);

        while (true)
        {
            Console.Write("{0}", hop);
            bool allTimedOut = true;
            for (int i = 0; i < 3; i++)
            {
                ICMP_PACKET packet = ICMP_PACKET.CreateRequestPacket(111, 222, data);

                IPEndPoint epRemote = new IPEndPoint(ipTarget, 0);

                pingSocket.SetSocketOption(SocketOptionLevel.IP, SocketOptionName.IpTimeToLive, hop);


                stopWatch.Start();
                pingSocket.SendTo(packet.Serialize(), epRemote);

                byte[] receiveData = new byte[1024];
                int read = 0;
                try
                {
                    epResponse = new IPEndPoint(0, 0);
                    read = pingSocket.ReceiveFrom(receiveData, ref epResponse);
                    stopWatch.Stop();
                    ICMP_PACKET recvPacket = new ICMP_PACKET(receiveData, 20, read);

                    Console.Write("\t<{0}ms", stopWatch.ElapsedMilliseconds);
                    allTimedOut = false;
                }
                catch (SocketException e)
                {
                    Console.Write("\t*");
                }
            }

            if (allTimedOut)
            {
                Console.WriteLine("\tRequest timed out");
            }
            else
            {
                Console.WriteLine("\t{0}", ((IPEndPoint)epResponse).Address.ToString());
            }
            ++hop;

            IPEndPoint ipepResponse = epResponse as IPEndPoint;
            if (hop > maxHops || ipepResponse.Address.Equals(ipTarget))
            {
                break;
            }
        }

        pingSocket.Close();
    }

As discussed earlier, we have now put a loop around the actual Socket.SendTo() call. This loop will run three times. We will continue sending the packet until we reach 30 hops max, or we have reached the destination. As you have noticed, since we did all the heavy lifting in the Ping program, all we had to do here, was to set the TimeToLive Socket option so that the routers/gateways on the path to the destination respond with a "Time Exceeded" message. When I run it on my machine, here is the output:

C:\>traceroute.exe www.yahoo.com
Ping www.yahoo.com(209.131.36.158) with 16 bytes of data...

1       <0ms    <1ms    <1ms    192.168.1.1
2       <5ms    <10ms   <14ms   <Address removed>
3       <18ms   <23ms   <27ms   <Address removed>
4       <33ms   <38ms   <43ms   <Address removed>
5       <68ms   <92ms   <116ms  <Address removed>
6       <143ms  <171ms  <197ms  <Address removed>
7       <223ms  <250ms  <276ms  <Address removed>
8       <304ms  <331ms  <365ms  <Address removed>
9       <391ms  <417ms  <445ms  <Address removed>
10      <473ms  <510ms  <538ms  <Address removed>

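Note that I have removed the actual IP addresses from the output, for privacy reasons. The times grow from one probe to the next because the Stopwatch in the code above is started again for each probe without being reset, so each value is the cumulative elapsed time rather than the round trip of a single probe; calling stopWatch.Reset() before stopWatch.Start() would report per-probe times.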

In the next part, we will discuss how we can make this utility more robust.

Friday, September 4, 2009

Implementing traceroute with System.Net: Part-I

In the last part, I started by giving links to my implementation of the Ping utility, that used System.Net.Sockets.

http://ferozedaud.blogspot.com/2009/08/implementing-traceroute-with-systemnet.html

In this part, we will talk about traceroute. Traceroute is a general purpose utility that is used to discover a network path from the source to the destination. Traceroute is similar to Ping, in that it uses the ICMP protocol to send ICMP Echo request packets to the server. However, one key difference, is that it additionally uses the IP Time To Live (TTL) mechanism to specify the lifetime of the outgoing packets.

When the TTL expires on a packet, the receiving host must send an ICMP "Time Exceeded" message to the sender. The sender looks at this packet, and gets the IPAddress/HostName of the host that responded.

So, the basic algorithm for this goes as follows:

ttl = 1
while (reply.address != dest.address)
    for i = 1 to 3
        send a packet to the host at address dest.address with TTL=ttl
        wait for reply
        if (timeout) then
            print "*"
        else
            print ipaddress of host that responded
    end
    // increase the TTL
    ttl = ttl + 1
end
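The same loop can be sketched in C# with the network part factored out, which makes the control flow easy to follow and to try out without a raw socket. This is only an illustration: the probe delegate, the class name, and the maxHops cap are my own assumptions, and the delegate stands in for the real "send an ICMP Echo request with this TTL and wait for a reply" step.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical sketch of the traceroute control loop.
static class TracerouteLoopSketch
{
    // probe(ttl) returns the address that answered a packet sent with that TTL,
    // or null if the probe timed out.
    public static List<string> Trace(IPAddress destination, int maxHops, Func<int, IPAddress> probe)
    {
        var lines = new List<string>();
        for (int ttl = 1; ttl <= maxHops; ttl++)
        {
            IPAddress responder = null;
            // Three probes per hop, as in the pseudocode.
            for (int attempt = 0; attempt < 3; attempt++)
            {
                IPAddress reply = probe(ttl);
                if (reply != null)
                {
                    responder = reply;
                }
            }

            lines.Add(ttl + "\t" + (responder == null ? "*" : responder.ToString()));

            // Stop once the destination itself has answered.
            if (responder != null && responder.Equals(destination))
            {
                break;
            }
        }
        return lines;
    }
}
```

Unlike the pseudocode, this version stops after maxHops even if the destination never answers, which avoids looping forever on an unreachable host.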

This should be pretty easy to implement, given that we have already implemented a PING client in previous episodes, that does all the heavy lifting for us, in terms of marshalling the ICMP packet from managed code to network byte order, etc.

http://blogs.msdn.com/feroze_daud/archive/2005/10/20/483088.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/23/483976.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/24/484260.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/26/485372.aspx
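For readers who do not want to dig through the older posts, here is a minimal sketch of what that marshalling involves: laying out an ICMP Echo request in network byte order and computing the Internet checksum over it. It is not the ICMP_PACKET class from those posts; the class and method names are hypothetical, and I am assuming the two numeric arguments are the identifier and the sequence number.

```csharp
using System;

// Hypothetical sketch of serializing an ICMP Echo request (RFC 792).
static class IcmpEchoSketch
{
    // Layout: type (1), code (1), checksum (2), identifier (2), sequence (2), data.
    public static byte[] BuildEchoRequest(ushort identifier, ushort sequence, byte[] data)
    {
        byte[] packet = new byte[8 + data.Length];
        packet[0] = 8; // type 8 = Echo request
        packet[1] = 0; // code is always 0 for Echo
        // Bytes 2-3 hold the checksum and stay zero while it is computed.
        packet[4] = (byte)(identifier >> 8);   // multi-byte fields go out
        packet[5] = (byte)(identifier & 0xFF); // most significant byte first
        packet[6] = (byte)(sequence >> 8);
        packet[7] = (byte)(sequence & 0xFF);
        Buffer.BlockCopy(data, 0, packet, 8, data.Length);

        ushort checksum = Checksum(packet);
        packet[2] = (byte)(checksum >> 8);
        packet[3] = (byte)(checksum & 0xFF);
        return packet;
    }

    // One's complement of the one's complement sum of all 16-bit words.
    public static ushort Checksum(byte[] buffer)
    {
        uint sum = 0;
        for (int i = 0; i < buffer.Length; i += 2)
        {
            uint high = buffer[i];
            // An odd trailing byte is padded with zero.
            uint low = (i + 1 < buffer.Length) ? buffer[i + 1] : 0u;
            sum += (high << 8) | low;
        }
        // Fold the carries back into the low 16 bits.
        while ((sum >> 16) != 0)
        {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (ushort)(~sum & 0xFFFF);
    }
}
```

A handy property of this checksum is that running it over a complete packet, checksum field included, yields zero, which is how a receiver can validate what it got.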

In the next part, we will modify the Ping utility to convert it into a Traceroute implementation.

Thursday, August 27, 2009

Tracing with System.Net

In my old blog, I had written a config file that shows how to enable trace logging for System.Net. Since it is useful, here is the config file again.
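A minimal sketch of such a config file (it goes into the app.config or web.config of the application being traced), based on the standard System.Diagnostics trace source settings. The log file name System.Net.trace.log and the choice of the two sources shown are illustrative assumptions; other System.Net sources can be added the same way.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <system.diagnostics>
    <!-- Flush after every write so the log is complete even if the process dies -->
    <trace autoflush="true" />
    <sources>
      <source name="System.Net" maxdatasize="1024">
        <listeners>
          <add name="System.Net" />
        </listeners>
      </source>
      <source name="System.Net.Sockets" maxdatasize="1024">
        <listeners>
          <add name="System.Net" />
        </listeners>
      </source>
    </sources>
    <sharedListeners>
      <!-- initializeData is the path of the log file; the name here is an example -->
      <add name="System.Net"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="System.Net.trace.log" />
    </sharedListeners>
    <switches>
      <add name="System.Net" value="Verbose" />
      <add name="System.Net.Sockets" value="Verbose" />
    </switches>
  </system.diagnostics>
</configuration>
```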

NOTE: If you are doing logging inside of the ASP.NET process, make sure to give the ASP.NET process identity WRITE permissions to the directory where you want the log to be written.

Sunday, August 23, 2009

Things I did at Microsoft: Outbound Dialing Rules for Exchange Unified Messaging

Continuing my series on things that I did at Microsoft: in this part, I talk about outbound dialing rules that I implemented for Exchange Server 2007.

Exchange Unified Messaging server provides the functionality to place outbound calls. For example, imagine a customer calling the corporate switchboard and using the AutoAttendant to look up the person he wants to talk to. Imagine that the target person is not in the US, so that a long distance call would need to be placed to reach him.

It is possible that the company might want to prevent that call and instead route the caller to an operator. This is implemented by the concept of dialing rules.

The UM administrator can configure dialing rules on the AutoAttendant, DialPlan or Mailbox Policy. When the UM server needs to place an outbound call, it consults the dialing rules, and figures out whether the call is allowed. It also figures out the effective number to be called: this number will have the correct outside line access code (for example, prepending the target number with a '9' to signify to the PBX that the number is an external number).

More details about this feature can be found in this article: http://msexchangeteam.com/archive/2007/01/29/432440.aspx which discusses how to create dialing rules.

Implementing Traceroute with System.Net - Introduction

Previously, I have shown how you can implement a Ping utility using System.Net.Sockets.

The following are the links to the four part article that describe the implementation.

http://blogs.msdn.com/feroze_daud/archive/2005/10/20/483088.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/23/483976.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/24/484260.aspx
http://blogs.msdn.com/feroze_daud/archive/2005/10/26/485372.aspx

Ping is used to figure out if a destination host is responding on the network.

With a few modifications, we can change ping so that it shows us the routes that a packet is taking through the network.

In the next part, we will look at traceroute, and see how we can use the principles in writing our own traceroute utility from scratch.

Thursday, August 13, 2009

Things I did at Microsoft - Call Answering Rules

Larry Osterman started a series of posts on his blog, where he talks about the stuff he was doing in the past. So, I thought it would be a good idea to talk about stuff I did at Microsoft.

One of the things that I am really proud of is a feature I implemented for Exchange 2010, which is called "Call Answering Rules". The best way to explain this is to think about it as the equivalent of the Exchange/Outlook rules, but for your phone. This allows you to create rules that govern how incoming calls to your phone are routed.

Some of the examples of rules you can create are:

1) Forward all calls with a given caller-id to my cell phone.

2) If my client calls and I am in a meeting, forward to voicemail.

3) If a call comes, and my Free-Busy status says "Out of office", then call two other numbers configured by me.

I implemented the rules engine for this feature. You can get more details about the feature from the technet page documentation on this feature.

Hi!

Welcome to my blog. Here, I will post my thoughts on Software, as well as code snippets and other stuff.

For my previous blog, you can see https://blogs.msdn.com/feroze_daud


Feroze's musings on Technology