Kiwi Synthesis of C# and F# Combinational Circuit Models into FPGA Circuits

The Kiwi project aims to automatically translate concurrent C# and F# programs into FPGA circuits for accelerated execution. I work with David Greaves at the University of Cambridge Computer Laboratory, and we have a prototype system that consumes .NET bytecode and converts it into VHDL or Verilog circuit descriptions. Kiwi usually takes a multi-threaded C# program and produces a sequential circuit, typically a data-path or finite state machine. However, it would also be convenient to describe combinational circuits using C# or F# language constructs and have the corresponding VHDL or Verilog generated automatically, ready for vendor tools to synthesize into efficient circuits. This is now possible in Kiwi.

Here's a very simple example of a C# program that contains a method which is intended to model a combinational circuit:

using System;
using Kiwi;

namespace TestIf2
{
    static class TestIf2Class
    {

        [Hardware]
        [Combinational]
        static int TestIf2(int x, int y)
        {
            int z;
            if (x < y)
                z = 25;
            else
                z = 42;
            return z;
        }

        static void Main(string[] args)
        {
            int p = 17;
            int q = 23;
            int r = TestIf2(p, q);
        }
    }
}

We could also have started from the F# program:

open Kiwi

[<Hardware>][<Combinational>]
let testif (x : int) (y : int) = if x < y then 25 else 42

let main args
  = let p = 17
    let q = 23
    let r = testif p q
    0

Let's ignore the Main method or function for now because it is not intended for hardware implementation. The custom attribute Hardware identifies a static method (or F# function) which should be converted into a circuit by Kiwi. The new Kiwi custom attribute Combinational indicates that this method should be implemented as a combinational circuit.
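To make the role of the two attributes concrete, here is a minimal C# sketch of how marker attributes of this kind can be declared and detected. The declarations of HardwareAttribute and CombinationalAttribute are hypothetical stand-ins, not Kiwi's actual definitions, and the reflection-based scan in AttributeScan (also a made-up name) is only an illustration of the idea; Kiwi itself works from the compiled bytecode.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for Kiwi's attributes; the real declarations
// live in the Kiwi library and may differ.
[AttributeUsage(AttributeTargets.Method)]
public sealed class HardwareAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Method)]
public sealed class CombinationalAttribute : Attribute { }

public static class Example
{
    [Hardware]
    [Combinational]
    public static int TestIf2(int x, int y) => x < y ? 25 : 42;

    // Carries no attributes, so the scan below skips it.
    public static int Helper(int x) => x + 1;
}

public static class AttributeScan
{
    // Returns the names of static methods that carry both attributes.
    public static string[] CombinationalMethods(Type type) =>
        type.GetMethods(BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic)
            .Where(m => m.IsDefined(typeof(HardwareAttribute), false)
                     && m.IsDefined(typeof(CombinationalAttribute), false))
            .Select(m => m.Name)
            .ToArray();
}
```

Calling AttributeScan.CombinationalMethods(typeof(Example)) selects TestIf2 and ignores Helper, which is the same filtering decision a synthesis front end has to make before it generates any hardware.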

The Kiwi behavioural VHDL back end takes as input the compiled binary for this program and converts the IL bytecode for the TestIf2 method (or function) into a VHDL circuit model with this interface:

entity TestIf2 is
  port (signal x : in integer ;
        signal y : in integer ;
        signal result : out integer) ;
end entity TestIf2 ;
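The mapping from method signature to entity interface is mechanical: each int parameter becomes an in port of type integer, and the return value becomes an out port named result. As a rough illustration only, the C# sketch below builds the entity text from a method signature using reflection. This is not Kiwi's code; EntitySketch, VhdlType and EntityFor are hypothetical names, and only the int type is handled.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class EntitySketch
{
    // Stand-in for the example method; only its signature matters here.
    public static int TestIf2(int x, int y) => x < y ? 25 : 42;

    // Map a .NET type to a VHDL type name. Only int is handled in this sketch.
    static string VhdlType(Type t)
    {
        if (t == typeof(int)) return "integer";
        throw new NotSupportedException("No VHDL mapping assumed for " + t.Name);
    }

    // Build the text of an entity declaration from a method signature:
    // one "in" port per parameter, plus an "out" port for the return value.
    public static string EntityFor(MethodInfo m)
    {
        var ports = m.GetParameters()
            .Select(p => "signal " + p.Name + " : in " + VhdlType(p.ParameterType))
            .Concat(new[] { "signal result : out " + VhdlType(m.ReturnType) });
        string portList = string.Join(" ;\n        ", ports);
        return "entity " + m.Name + " is\n"
             + "  port (" + portList + ") ;\n"
             + "end entity " + m.Name + " ;";
    }
}
```

Passing typeof(EntitySketch).GetMethod("TestIf2") to EntityFor should yield an entity declaration with the same three ports as the interface above.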

The VHDL implementation architecture consists of a VHDL process which is sensitive to changes in x and y. This model can be simulated using a VHDL simulator such as ModelSim:

[Image: ModelSim simulation of the generated VHDL model]

The body of the process contains a transcription of the IL bytecode into VHDL behavioural descriptions. The bytecode that we start from for our synthesis flow for the C# program shown above is:

// Code size       26 (0x1a)
.maxstack  2
.locals init ([0] int32 z,
         [1] int32 CS$1$0000,
         [2] bool CS$4$0001)
IL_0000:  nop
IL_0001:  ldarg.0
IL_0002:  ldarg.1
IL_0003:  clt
IL_0005:  ldc.i4.0
IL_0006:  ceq
IL_0008:  stloc.2
IL_0009:  ldloc.2
IL_000a:  brtrue.s   IL_0011
IL_000c:  ldc.i4.s   25
IL_000e:  stloc.0
IL_000f:  br.s       IL_0014
IL_0011:  ldc.i4.s   42
IL_0013:  stloc.0
IL_0014:  ldloc.0
IL_0015:  stloc.1
IL_0016:  br.s       IL_0018
IL_0018:  ldloc.1
IL_0019:  ret
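To see that this bytecode computes the same function as the source method, it helps to transcribe it statement by statement. The C# sketch below is a hand transcription written for this explanation, not output from Kiwi: the class IlWalkthrough, the method TestIf2FromIl, and the local names result and notLess are hypothetical, and each line is annotated with the IL offsets it stands for.

```csharp
using System;

public static class IlWalkthrough
{
    // The source-level method from the example.
    public static int TestIf2(int x, int y)
    {
        int z;
        if (x < y) z = 25; else z = 42;
        return z;
    }

    // Hand transcription of the IL listing, keeping its locals and branches.
    public static int TestIf2FromIl(int x, int y)
    {
        int z = 0;             // local 0; ".locals init" zeroes every local
        int result = 0;        // local 1 (CS$1$0000)
        bool notLess = false;  // local 2 (CS$4$0001)

        notLess = (x < y) == false;  // IL_0001..IL_0008: clt, ldc.i4.0, ceq, stloc.2
        if (notLess) goto IL_0011;   // IL_0009..IL_000a: ldloc.2, brtrue.s
        z = 25;                      // IL_000c..IL_000e
        goto IL_0014;                // IL_000f
    IL_0011:
        z = 42;                      // IL_0011..IL_0013
    IL_0014:
        result = z;                  // IL_0014..IL_0015
        return result;               // IL_0016..IL_0019: br.s to the next instruction, ldloc.1, ret
    }
}
```

Both methods return 25 when x < y and 42 otherwise, so the temporaries and branches in the bytecode add no behaviour beyond a comparison that selects between two constants.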

Although the bytecode instructions occur in a specific order, a VHDL synthesis tool can perform the appropriate dependency analysis to synthesize a circuit that is simply a comparator and a multiplexor. The XST VHDL synthesizer, a component of the Xilinx ISE tools, reports the result of synthesis as a multiplexor driven by a "less" comparison:

=========================================================================
HDL Synthesis Report

Macro Statistics
# Latches                                              : 3
 32-bit latch                                          : 3
# Comparators                                          : 1
 32-bit comparator less                                : 1

The generated latches are trimmed during a later optimization phase since they contain constant values. The screenshot from the Xilinx ISE tools below shows part of the multiplexor and comparator synthesized from the generated VHDL.

[Image: Xilinx ISE screenshot showing part of the synthesized multiplexor and comparator]

The final implementation uses only slice LUTs on a Virtex-5 FPGA, with no registers or latches:

Device Utilization Summary:

   Number of External IOBs                  96 out of 640    15%
      Number of LOCed IOBs                   0 out of 96      0%

   Number of Slice Registers                 0 out of 69120   0%
      Number used as Flip Flops              0
      Number used as Latches                 0
      Number used as LatchThrus              0

   Number of Slice LUTS                     17 out of 69120   1%
   Number of Slice LUT-Flip Flop pairs      17 out of 69120   1%

How does the circuit generated from the IL bytecode compare to a handwritten circuit? As an experiment, we wrote a direct VHDL transcription of the original C# method (or F# function) by hand:

entity TestIf is
  port (
        signal x : in integer ;
        signal y : in integer ;
        signal result : out integer) ;
end entity TestIf ;

architecture behavioural of TestIf is
begin
  
  process (x, y) 
  begin
    if x < y then
      result <= 25 ;
    else
      result <= 42 ;
    end if ;
  end process ;
  
end architecture behavioural ;

This was then synthesized using the Xilinx ISE tools and produced an implementation with the following resource utilization:

Device Utilization Summary:

   Number of External IOBs                  96 out of 640    15%
      Number of LOCed IOBs                   0 out of 96      0%

   Number of Slice Registers                 0 out of 69120   0%
      Number used as Flip Flops              0
      Number used as Latches                 0
      Number used as LatchThrus              0

   Number of Slice LUTS                     17 out of 69120   1%
   Number of Slice LUT-Flip Flop pairs      17 out of 69120   1%

This is exactly the same utilization (and implementation) as we obtained from the C# version.

Right now the combinational synthesis component of Kiwi still has several restrictions, but I hope to continue developing it to the point where it can be used as a "macro expander" for combinational circuits that are expressed in C# and then elaborated into VHDL or Verilog.