Configuring a Xilinx FPGA from ARM

Okay, it’s time I got down to my Ph.D project, which is a 2D/3D GFX accelerator. I really wonder if I’m going to pull it off. So, after playing with a simulator for a couple of months, I decided to move on to an FPGA. I doubt that a Spartan will fit everything I want, but I can definitely test plenty of stuff there.

I had this board from starterkit.ru, which has an FPGA and an AT91SAM9260, a 200 MHz ARM SoC.

As for the board, it is not the best choice. I don’t like a couple of things about it, e.g. why does it need a slot for a rarely used GSM/GPS module? As for the software, I didn’t even bother to build the bundled kernels and just used the code as a reference. A quick look at the sources revealed that the guys didn’t put any effort into development: they just put some hacks into board-sam9260ek.c, and that’s all. They didn’t even properly register a mach-type for their board. For my own work I took the latest stable kernel from the arm branch (2.6.39) and applied the at91 patches for extra hardware goodies. Now for the FPGA.

The board allows the FPGA to be placed on the external memory bus, therefore exposing some FPGA registers as plain memory. This is the very stuff I needed. However, configuring the FPGA was a pain.

The JTAG cable was slow as hell. Actually, a USB one would be better than my old LPT one, but I was too lazy to lift my ass and get one, so I had a good read of the datasheets and found some things of interest:

Xilinx FPGAs have quite a few ways to upload the configuration. One of them is Slave Serial Mode, where the host serially sends the data into the FPGA. To do so you have to set the mode bits to “1”s (in my case, by popping on the jumpers). And you need a couple of pins:

  • DIN – we’ll send the data to the FPGA with this one
  • CLK – we’ll clock the transfer with this one
  • DONE – once it gets set to “1” we’re good to go
  • PROG_B – This one resets the FPGA and clears the config
  • INIT_B – this one is not strictly needed, but it signals some useful things that we would otherwise have to handle blindly
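
To make the handshake concrete, here is a minimal user-space sketch of the sequence these pins imply: pulse PROG_B, wait for INIT_B, then shift the data out on DIN, one bit per CLK cycle, until DONE goes high. The `fake_fpga` struct and its helper functions are hypothetical stand-ins for the real chip, invented for this example. The model assumes data is sampled on the rising CLK edge and sent most significant bit first, and it ignores all real timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the FPGA side of the link; not a real API. */
struct fake_fpga {
	int prog_b, clk, din;	/* lines driven by the host */
	int init_b, done;	/* lines driven by the FPGA */
	size_t bits_seen;	/* bits sampled since the last reset */
	size_t bits_needed;	/* bitstream length the model expects */
	uint8_t last_byte;	/* last 8 sampled bits, MSB first */
};

/* Host drives PROG_B: low clears the configuration, high releases it. */
static void set_prog_b(struct fake_fpga *f, int level)
{
	f->prog_b = level;
	if (!level) {
		f->init_b = 0;
		f->done = 0;
		f->bits_seen = 0;
		f->last_byte = 0;
	} else {
		f->init_b = 1;	/* model assumption: memory clears instantly */
	}
}

/* Host drives CLK: the model samples DIN on each rising edge. */
static void set_clk(struct fake_fpga *f, int level)
{
	if (level && !f->clk && f->init_b && !f->done) {
		f->last_byte = (uint8_t)((f->last_byte << 1) | (f->din & 1));
		if (++f->bits_seen >= f->bits_needed)
			f->done = 1;
	}
	f->clk = level;
}

/* Shift one byte out, most significant bit first. */
static void send_byte(struct fake_fpga *f, uint8_t byte)
{
	int k;

	for (k = 7; k >= 0; k--) {
		f->din = (byte >> k) & 1;
		set_clk(f, 1);
		set_clk(f, 0);
	}
}

/* Full upload: returns 0 when DONE goes high, -1 otherwise. */
static int upload(struct fake_fpga *f, const uint8_t *data, size_t len)
{
	size_t i;

	assert(f != NULL && (data != NULL || len == 0));
	set_prog_b(f, 0);	/* reset and clear the configuration */
	set_prog_b(f, 1);
	if (!f->init_b)		/* FPGA never became ready */
		return -1;
	set_clk(f, 0);
	for (i = 0; i < len; i++)
		send_byte(f, data[i]);
	return f->done ? 0 : -1;
}
```

With `bits_needed` set to 16, uploading two bytes returns 0 and leaves DONE high, while uploading a single byte returns -1, which is the situation the real hardware reports by keeping DONE low.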

After having a good look at the schematics I figured out the pinouts. N.B. Remember to remove J22: it pulls PROG_B to GND, hence keeping the FPGA inactive.

  • DIN – PC7
  • PROG_B – PC9
  • CLK – PC6
  • DONE – PC4
  • INIT_B – Not connected to MCU.

INIT_B is only wired outside, so we have no way of telling whether there were errors, apart from seeing that DONE didn’t go HIGH. I would really like to have this one available. It can be wired to any GPIO line, but I decided to use something that wouldn’t affect the outputs wired to connectors. So I found the EN_GSM pin, which toggles the power of the GSM module, which is absent here. I ran a small insulated wire to it and finally got all the pins at my disposal.

Software time! I thought that the best way of programming the FPGA would be:

cat bitstream.bin > /dev/fpga0
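
The same upload can be done from a C program on the board. The sketch below is a plain user-space copy loop, nothing driver-specific; `upload_bitstream` is a name made up for this example. It assumes the device node wants the whole bitstream through a single open/close pair, the way `cat` with a redirect delivers it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy the file at src_path into dst_path (e.g. a device node) using one
 * open/close pair on the destination. Returns the number of bytes written,
 * or -1 on any error.
 */
static long upload_bitstream(const char *src_path, const char *dst_path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int in, out;

	assert(src_path != NULL && dst_path != NULL);
	in = open(src_path, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst_path, O_WRONLY);
	if (out < 0) {
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof buf)) > 0) {
		ssize_t done = 0;

		/* write() may be partial, so loop until the chunk is out */
		while (done < n) {
			ssize_t w = write(out, buf + done, (size_t)(n - done));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				close(in);
				close(out);
				return -1;
			}
			done += w;
		}
		total += n;
	}
	close(in);
	if (close(out) < 0 || n < 0)
		return -1;
	return total;
}
```

Keeping one file descriptor open for the whole transfer matters because the driver described below treats the first write after an open as the start of a new upload.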

So to do this we need… Yep! A character device. And we also have a cool thing called ‘miscdevice’ that simplifies the creation of character devices. So, a couple of hours of work and… meet the xilinx-sscu (Xilinx Slave Serial Configuration Upload) driver. It bitbangs the firmware written to /dev/fpga0, checks for error conditions, if any, and, well, just works ™
Here goes the board-specific code (arch/arm/mach-at91/board-charlene.c)

static struct xsscu_data charlene_xsscu_pdata[] = {
  {
  .name="Xilinx XC3S500E Spartan",
  .sout=AT91_PIN_PC7,
  .prog_b=AT91_PIN_PC9,
  .clk=AT91_PIN_PC6,
  .done=AT91_PIN_PC4,
  .init_b=AT91_PIN_PC10,
  },
};
 
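static int __init charlene_register_xsscu(struct xsscu_data* pdata, int count)
{
  int i, err = 0;
  struct platform_device *pdev;
  /* NOTE: the body of this loop was garbled in the published listing.
     The allocation below is a reconstruction; the at91-specific pin
     setup mentioned in the text is not shown. */
  for (i = 0; i < count; i++) {
   pdev = platform_device_alloc("xilinx-sscu", i);
   if (!pdev) {
    err = -ENOMEM;
    break;
   }
   pdev->dev.platform_data = &pdata[i];
   err = platform_device_add(pdev);
   if (err) {
    platform_device_put(pdev);
    break;
   }
  }
  if (err) printk(KERN_INFO "Registration failed: %d\n",err);
  return err;
}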

It registers one or more platform devices, depending on how many FPGAs you’ve got, and performs at91-specific initialisation of the GPIO pins (e.g. disconnects the peripheral blocks, so that these are plain GPIOs).

Now for the platform driver. Ugly in places, but hey, it works! And it supports more than one connected FPGA.
include/linux/xilinx-sscu.h

#ifndef _XILINX_SSCU
#define _XILINX_SSCU
struct xsscu_data {
	char *name;
	unsigned int clk;
	unsigned int sout;
	unsigned int init_b;
	unsigned int prog_b;
	unsigned int done;
};
 
enum {
	XSSCU_STATE_IDLE,
	XSSCU_STATE_UPLOADING,
	XSSCU_STATE_UPLOAD_DONE,
	XSSCU_STATE_DISABLED,
	XSSCU_STATE_PROG_ERROR,
};
 
struct xsscu_device_data {
	struct xsscu_data *pdata;
	int open;
	int state;
	char *read_ptr;
	char msg_buffer[128];
};
#endif

And, of course, drivers/char/xilinx-sscu.c

/*
 *  linux/drivers/char/xilinx-sscu.c
 *
 *  Copyright (C) 2011 Andrew 'Necromant' Andrianov <[email protected]>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
 
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>
 
#include <linux/platform_device.h>
#include <linux/uaccess.h>
#include <linux/io.h>
 
#include <linux/types.h>
#include <linux/cdev.h>
 
#include <linux/xilinx-sscu.h>
#include <linux/miscdevice.h>
#include <linux/gpio.h>
#include <linux/delay.h>
 
#define DRVNAME "xilinx-sscu"
#define DEVNAME "fpga"
#define DRVVER	"0.1"
 
static int g_debug;
module_param(g_debug, int, 0);	/* debug switch, set at module load time */
MODULE_PARM_DESC(g_debug, "Print lots of useless debug info.");
 
/* This delay is system specific. In my case (200 MHz ARM) I can safely
   define it to nothing to speed things up. On a faster system you may
   want to define it to something, e.g. udelay(100), if the clock gets
   too fast and screws things up. I have had no chance to check whether
   it's needed on a faster system, so I left it here to be 100% sure.
   Have fun
*/
 
#define DELAY
 
/* Wrapped in do/while so the macro is safe inside if/else bodies */
#define DBG(fmt, ...)	do { if (g_debug) \
    printk(KERN_DEBUG "%s/%s: " fmt " \n", DRVNAME, __func__, ##__VA_ARGS__); \
    } while (0)
#define INF(fmt, ...)	printk(KERN_INFO "%s: " fmt " \n", DRVNAME, ##__VA_ARGS__)
#define ERR(fmt, ...)	printk(KERN_ERR "%s: " fmt " \n", DRVNAME, ##__VA_ARGS__)
 
static inline char *xsscu_state2char(struct xsscu_device_data *dev_data)
{
	switch (dev_data->state) {
	case XSSCU_STATE_UPLOAD_DONE:
	case XSSCU_STATE_IDLE:
		if (gpio_get_value(dev_data->pdata->done))
			return "Online";
		else
			return "Unprogrammed/Error";
	case XSSCU_STATE_DISABLED:
		return "Offline";
	case XSSCU_STATE_PROG_ERROR:
		return "Bitstream error";
	default:
		return "Bug!";
	}
}
 
static int xsscu_open(struct inode *inode, struct file *file)
{
	struct miscdevice *misc;
	struct xsscu_device_data *dev_data;
	misc = file->private_data;
	dev_data = misc->this_device->platform_data;
	if (dev_data->open)
		return -EBUSY;
	dev_data->open++;
	DBG("Device %s opened", dev_data->pdata->name);
	sprintf(dev_data->msg_buffer,
		"DEVICE:\t%s\nINIT_B:\t%d\nDONE:\t%d\nSTATE:\t%s\n",
		dev_data->pdata->name,
		gpio_get_value(dev_data->pdata->init_b),
		gpio_get_value(dev_data->pdata->done),
		xsscu_state2char(dev_data)
	    );
	dev_data->read_ptr = dev_data->msg_buffer;
	return 0;
}
 
static int send_clocks(struct xsscu_data *p, int c)
{
 
	while (c--) {
		gpio_direction_output(p->clk, 0);
		DELAY;
		gpio_direction_output(p->clk, 1);
		DELAY;
		if (1 == gpio_get_value(p->done))
			return 0;
	}
	return 1;
}
 
static inline void xsscu_dbg_state(struct xsscu_data *p)
{
	DBG("INIT_B: %d | DONE: %d",
	    gpio_get_value(p->init_b), gpio_get_value(p->done));
}
 
static int xsscu_release(struct inode *inode, struct file *file)
{
	struct miscdevice *misc;
	struct xsscu_device_data *dev_data;
	int err = 0;
	misc = file->private_data;
	dev_data = misc->this_device->platform_data;
	dev_data->open--;
	switch (dev_data->state) {
	case XSSCU_STATE_UPLOADING:
		err = send_clocks(dev_data->pdata, 10000);
		dev_data->state = XSSCU_STATE_UPLOAD_DONE;
		break;
	case XSSCU_STATE_DISABLED:
		err = 0;
		break;
	}
 
	if (err) {
		ERR("DONE not HIGH or other programming error");
		dev_data->state = XSSCU_STATE_PROG_ERROR;
	}
	xsscu_dbg_state(dev_data->pdata);
	DBG("Device closed");
	/* We must still close the device, hence return ok */
	return 0;
}
 
static ssize_t xsscu_read(struct file *filp, char *buffer,
			  size_t length,
			  loff_t *offset)
{
	struct miscdevice *misc;
	struct xsscu_device_data *dev_data;
	int bytes_read = 0;
	misc = filp->private_data;
	dev_data = misc->this_device->platform_data;
 
	if (*dev_data->read_ptr == 0)
		return 0;
	while (length && *dev_data->read_ptr) {
		put_user(*(dev_data->read_ptr++), buffer++);
		length--;
		bytes_read++;
	}
	return bytes_read;
}
 
static int xsscu_reset_fpga(struct xsscu_data *p)
{
	int i = 50;
	DBG("Resetting FPGA...");
	gpio_direction_output(p->prog_b, 0);
	mdelay(1);
	gpio_direction_output(p->prog_b, 1);
	while (i--) {
		xsscu_dbg_state(p);
		if (gpio_get_value(p->init_b) == 1)
			return 0;
		mdelay(1);
	}
	ERR("FPGA reset failed");
	return 1;
}
 
static ssize_t xsscu_write(struct file *filp,
			   const char __user *buff, size_t len, loff_t *off)
{
	struct miscdevice *misc;
	struct xsscu_device_data *dev_data;
	char cmd[7];
	unsigned char byte;
	size_t i = 0;
	int k;
	misc = filp->private_data;
	dev_data = misc->this_device->platform_data;

	if (*off == 0) {
		/* buff is a user-space pointer: copy before inspecting it */
		if (len >= sizeof(cmd) &&
		    !copy_from_user(cmd, buff, sizeof(cmd)) &&
		    memcmp(cmd, "disable", sizeof(cmd)) == 0) {
			DBG("Disabling FPGA");
			gpio_direction_output(dev_data->pdata->prog_b, 0);
			dev_data->state = XSSCU_STATE_DISABLED;
			goto all_written;
		} else if (xsscu_reset_fpga(dev_data->pdata) != 0)
			return -EIO;
		/* Wait a little bit before starting to clock the FPGA,
		   as the datasheet suggests */
		mdelay(1);
		gpio_direction_output(dev_data->pdata->clk, 0);
		dev_data->state = XSSCU_STATE_UPLOADING;
	}
	/* bitbang data, most significant bit first */
	while (i < len) {
		/* fetch each byte from user space safely */
		if (get_user(byte, buff + i))
			return -EFAULT;
		for (k = 7; k >= 0; k--) {
			gpio_direction_output(dev_data->pdata->sout,
					      (byte >> k) & 1);
			gpio_direction_output(dev_data->pdata->clk, 1);
			DELAY;
			gpio_direction_output(dev_data->pdata->clk, 0);
			DELAY;
		}
		i++;
	}
all_written:
	*off += len;
	return len;
}
 
static const struct file_operations xsscu_fileops = {
	.owner = THIS_MODULE,
	.write = xsscu_write,
	.read = xsscu_read,
	.open = xsscu_open,
	.release = xsscu_release,
	.llseek = no_llseek,
};
 
static int xsscu_create_miscdevice(struct platform_device *p, int id)
{
	struct miscdevice *mdev;
	struct xsscu_device_data *dev_data;
	char *nm;
	int err;
	mdev = kzalloc(sizeof(struct miscdevice), GFP_KERNEL);
	if (!mdev) {
		ERR("Misc device allocation failed");
		return -ENOMEM;
	}
	nm = kzalloc(64, GFP_KERNEL);
	if (!nm) {
		err = -ENOMEM;
		goto freemisc;
	}
	dev_data = kzalloc(sizeof(struct xsscu_device_data), GFP_KERNEL);
	if (!dev_data) {
		err = -ENOMEM;
		goto freenm;
	}
 
	snprintf(nm, 64, "fpga%d", id);
	mdev->name = nm;
	mdev->fops = &xsscu_fileops;
	mdev->minor = MISC_DYNAMIC_MINOR;
	err = misc_register(mdev);
	if (err)
		goto freedata;
	mdev->this_device->platform_data = dev_data;
	dev_data->pdata = p->dev.platform_data;

	return 0;

	/* unwind allocations in reverse order on failure */
freedata:
	kfree(dev_data);
freenm:
	kfree(nm);
freemisc:
	kfree(mdev);

	return err;
}
 
static int xsscu_probe(struct platform_device *p)
{
	int err;
	int id;
	struct xsscu_data *pdata = p->dev.platform_data;
	/* some id magic */
	if (p->id == -1)
		id = 0;
	else
		id = p->id;
	DBG("Probing xsscu platform device with id %d", p->id);
	if (!pdata) {
		ERR("Missing platform_data, sorry dude");
		return -EINVAL;
	}
	/* claim gpio pins */
	err = gpio_request(pdata->clk, "xilinx-sscu-clk") +
	    gpio_request(pdata->done, "xilinx-sscu-done") +
	    gpio_request(pdata->init_b, "xilinx-sscu-init_b") +
	    gpio_request(pdata->prog_b, "xilinx-sscu-prog_b") +
	    gpio_request(pdata->sout, "xilinx-sscu-sout");
	if (err) {
		ERR("Failed to claim required GPIOs, bailing out");
		return err;
	}
 
	gpio_direction_input(pdata->init_b);
	gpio_direction_input(pdata->done);
 
	err = xsscu_create_miscdevice(p, id);
	if (!err)
		INF("FPGA Device %s registered as /dev/fpga%d", pdata->name,
		    id);
	return err;
}
 
static struct platform_driver xsscu_driver = {
	.probe = xsscu_probe,
	.driver = {
		   .name = DRVNAME,
		   .owner = THIS_MODULE,
		   }
};
 
static int __init xsscu_init(void)
{
	INF("Xilinx Slave Serial Configuration Upload Driver " DRVVER);
	return platform_driver_register(&xsscu_driver);
}
 
static void __exit xsscu_cleanup(void)
{
	/* Normally you would not like to unload this driver. */
}
 
module_init(xsscu_init);
module_exit(xsscu_cleanup);
 
MODULE_AUTHOR("Andrew 'Necromant' Andrianov <[email protected]>");
MODULE_DESCRIPTION("Xilinx Slave Serial BitBang Uploader driver");
MODULE_LICENSE("GPL");
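
Stripped of the GPIO work, the driver above is a small state machine. The sketch below restates the transitions as a pure function that can be compiled and tested in user space. The event names (`EV_WRITE_DISABLE`, `EV_WRITE_DATA`, `EV_RELEASE`) and `xsscu_next_state` are hypothetical, introduced only for this illustration; the state names mirror the enum in xilinx-sscu.h, and the model assumes the FPGA reset in the first write succeeded.

```c
#include <assert.h>

/* Same states as the enum in include/linux/xilinx-sscu.h */
enum xsscu_state {
	XSSCU_STATE_IDLE,
	XSSCU_STATE_UPLOADING,
	XSSCU_STATE_UPLOAD_DONE,
	XSSCU_STATE_DISABLED,
	XSSCU_STATE_PROG_ERROR,
};

/* Hypothetical event names, used only in this sketch */
enum xsscu_event {
	EV_WRITE_DISABLE,	/* first write starts with "disable" */
	EV_WRITE_DATA,		/* first write carries anything else */
	EV_RELEASE,		/* the device file is closed */
};

/*
 * done_high is 1 if DONE went high during the trailing clocks that
 * are sent on release, 0 otherwise.
 */
static enum xsscu_state xsscu_next_state(enum xsscu_state s,
					 enum xsscu_event ev, int done_high)
{
	assert(done_high == 0 || done_high == 1);
	switch (ev) {
	case EV_WRITE_DISABLE:
		return XSSCU_STATE_DISABLED;	/* PROG_B held low */
	case EV_WRITE_DATA:
		return XSSCU_STATE_UPLOADING;	/* reset done, shifting bits */
	case EV_RELEASE:
		if (s != XSSCU_STATE_UPLOADING)
			return s;		/* nothing in flight */
		return done_high ? XSSCU_STATE_UPLOAD_DONE
				 : XSSCU_STATE_PROG_ERROR;
	}
	return s;
}
```

Note that the error state is only reached on close, and the release handler still returns 0, so a failed upload shows up in the status text and the kernel log rather than as a failing `cat`.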

Now, by reading /dev/fpga0 we can learn some info about the FPGA and its state (when DONE is HIGH, we’re ready to work).
Writing something there resets the FPGA and sends the bitstream. To reset the FPGA we can just write something that is NOT a bitstream there, e.g. echo “reset” > /dev/fpga0
The FPGA will get reset.
Well, that’s about all. The remaining stuff is how to use it in a makefile on the host PC. Well, ssh to the rescue:

cat ./bitstream.bin | ssh board "cat > /dev/fpga0"
ssh board "cat /dev/fpga0"
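
If the makefile needs to act on the result, the status text can be parsed rather than eyeballed. Here is a minimal sketch that assumes the `KEY:<tab>value` layout produced by the sprintf() in xsscu_open(); `status_get_int` is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find a line of the form "KEY:\t<integer>" in text and store the integer
 * in *out. Returns 0 on success, -1 if the key is missing or its value is
 * not a number.
 */
static int status_get_int(const char *text, const char *key, int *out)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL && out != NULL);
	klen = strlen(key);
	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 &&
		    line[klen] == ':' && line[klen + 1] == '\t') {
			const char *val = line + klen + 2;
			char *end;
			long v = strtol(val, &end, 10);

			if (end == val)		/* no digits after the tab */
				return -1;
			*out = (int)v;
			return 0;
		}
		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

Looking up `DONE` and checking for 1 gives a simple pass/fail signal after an upload.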

It is also faster than I expected. Configuring my XC3S500E takes about 6 or 7 seconds.
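
As a rough sanity check on that figure, the sketch below converts upload time into an effective bit rate. It assumes an uncompressed XC3S500E bitstream of 2,270,208 bits, a number quoted from the Spartan-3E datasheet that is worth double-checking against your own .bin file size; `effective_kbps` is just an illustrative helper.

```c
#include <assert.h>

/* Assumed size of an uncompressed XC3S500E bitstream, in bits */
#define XC3S500E_BITS 2270208UL

/* Effective rate in kbit/s for an upload of `bits` bits taking `seconds`. */
static double effective_kbps(unsigned long bits, double seconds)
{
	assert(seconds > 0.0);
	return (double)bits / seconds / 1000.0;
}
```

Under that assumption, 6 to 7 seconds works out to somewhere between roughly 320 and 380 kbit/s, i.e. about 3 microseconds per bit-banged bit.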
Now, it's time to clean up the code and send it upstream.
LKML: https://lkml.org/lkml/2011/11/19/119

6 thoughts on “Configuring a Xilinx FPGA from ARM”

  1. JTAG slow? What planet are we on? Here on Earth I configure an XC3S200 in well under a second using a USB Platform cable. A serial port, for goodness sake, complete with more code to deal with on an ARM? Whatever works for you, I guess.

    1. @arm7.developer My results are for the XC3S500E. With bitstream compression enabled that drops to about 4-6 seconds. The only JTAG cable I have around is a pretty old LPT one, so it’s supposed to be slow, I guess.
      If it’s slower than a USB one, then so be it. I’ll post new results and an updated driver soon.
      The biggest win of this solution is the ability to reconfigure at runtime without shutting down the system or messing with cables. Actually, my drivers can now do an fpga_request_firmware(“fw.bin”) on load, and free the FPGA on unload. That lets me quickly switch between different designs depending on whatever I need. Basically, an insmod mydev.ko does everything, so I can just forget there’s an FPGA around.
      Initially I wanted to go with the parallel upload method, but that is not possible with the layout starterkit.ru implemented (some data pins are shared with the RAM bus, so there is no way it can work without screwing things up).

  2. Hello.
    Really good post.
    I am preparing a similar project: configuring an FPGA from an ARM9 (Linux kernel 3.0.35).
    So I am referring to your post.
    Can I ask some questions?
    ———————————————————————————————————————————————————————-
    Here goes the board-specific code (arch/arm/mach-at91/board-charlene.c)

    static struct xsscu_data charlene_xsscu_pdata[] = {
    }

    ———————————————————————————————————————————————————————-
    For the first example code, which file should the code go in?
    miscdevice.h, board-charlene.c, or something else? Sorry, I am a beginner with Linux.

    If I follow your post, can I make a character device for my Linux system?
    xilinx-sscu.c, xilinx-sscu.h, Makefile
