Dlink DNS-327L: How you should NOT make a NAS

TL;DR: Ported Debian to the Dlink DNS-327L with a fresh kernel; my branch (dlink-dns327l) is on github. Inside the stock firmware there lies madness!

Now, the long read:

You might have noticed that my dull engineering blog has been lagging recently, occasionally throwing the nginx 500 error. That’s because of the ongoing cursed hardware adventures I’ve been having for the past 2 weeks.

But let’s start from the very beginning, and it will be a very long story. It all began with an HDD crashing in my RAID-1. Since the motherboard serving as the NAS storage box was pretty old and laggy, I decided I would get a nice and shiny small NAS box as well. The criteria were simple: cheap, small, supports NFS and RAID-1.
So I picked a Dlink DNS-327L and a spare HDD. I knew the box might become a Debian/OpenWRT target once the warranty was void. Having dealt with some dlink hardware in the past (or, better to say, having raised that hardware from the dead), I ditched the stock power brick and used my own. I wanted to live on stock firmware for a few months, just to make sure that no hardware was faulty…

Yep, I found use for that free sticker HaD sent me!

And the great adventure begins. First, it wanted to reformat my disk. No way it would just work with an existing mirror. Okay, time for data maneuvers. I had three 1 TiB drives: the one with the data, the one with a copy of the data and a bunch of reallocated sectors (failing), and a blank one. Initially I wanted to make a degraded RAID-1, copy over all the data, and then rebuild it.

Seems as simple as: mdadm --create /dev/md0 -l raid1 -f -n 1 /dev/sda1

Not if you are in dlink’s web UI. It doesn’t allow you to create a degraded RAID. WTF? Format on the PC? But how does that cursed firmware want it to be formatted? At that point I wasn’t really sure it even used mdadm internally.

In the end, I took the failing drive and the spare one, formatted them as RAID-1, and then copied over the data. The partitioning is really weird. They make a GPT partition table and create 4 volumes:

537 MB (that is, 512 MiB) is reserved on each drive as a volume; the two are then mirrored with mdadm and afterwards… formatted and used as swap (HOLY **** WHY?). Another GiB goes to ext4, not mirrored and left empty. WHY? Then comes our precious data, followed by a spare 1 GiB unformatted volume that nothing could mount. Whatever remains after those three is our new data volume. The listing below shows the partition table:

Model: ATA ST31000528AS (scsi)
Disk /dev/sdc: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name                Flags
 1      1049kB  538MB   537MB   linux-swap(v1)  Linux swap
 4      538MB   1612MB  1074MB  ext4            Linux/Windows data
 2      1612MB  999GB   998GB   ext4            Linux/Windows data
 3      999GB   1000GB  1074MB                  Linux/Windows data
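
The odd-looking 537MB and 1074MB figures are just power-of-two sizes printed in decimal units. A quick sketch of the arithmetic, assuming the volumes are exactly 512 MiB and 1 GiB and that parted rounds to the nearest megabyte (the helper name mib_to_mb is made up for this example):

```shell
#!/bin/sh
# Convert a size in MiB (2^20 bytes) to decimal MB (10^6 bytes),
# rounded to the nearest MB, using integer arithmetic only.
mib_to_mb() {
    bytes=$(( $1 * 1024 * 1024 ))
    echo $(( (bytes + 500000) / 1000000 ))
}

mib_to_mb 512    # the swap volume: prints 537
mib_to_mb 1024   # each of the two 1 GiB volumes: prints 1074
```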

Transferring 700 GiB of porn (ahem, data) over the gigabit network with a lot of small files took roughly 18 hours. Once done, I pulled out the dying drive, put in the one I had copied the data from, and started the RAID rebuild procedure. In two hours it was done, yet the status was still ‘degraded’ and it suggested another rebuild. WTF?
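
For scale, those two numbers work out to an average of about 11 MiB/s, far below the roughly 119 MiB/s raw line rate of gigabit Ethernet. A small sketch of the calculation (avg_mib_per_s is a name invented for this example; it only needs awk):

```shell
#!/bin/sh
# Average throughput in MiB/s for a transfer of $1 GiB that took $2 hours.
avg_mib_per_s() {
    awk -v gib="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", gib * 1024 / (hours * 3600) }'
}

avg_mib_per_s 700 18   # prints 11.1
```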

After the third rebuild, the voices in my head told me it was yet another firmware bug. I don’t even dare to speak of the totally ugly design, freezing/crashing JS, sluggishness and a recommendation to use IE. A quick look around and we know that it’s PHP + mysql inside with lighttpd as the webserver… And the mysql port is wide open on the network (Thanks, nmap). Great.

While the RAID was rebuilding I poked around the web UI and found a few weird things: data about ongoing operations is received via ajax, but in XML rather than the JSON everybody uses these days. I don’t know what they did there, but on a 1.2 GHz ARM core their UI was slower than a PHP script running on an MB77.07, and that’s only a 324 MHz ARM1176!

Yep, and they also do a full restart of all NFS services once you add a network share. And if the service restart takes longer than the timeout, lighttpd crashes.
So, firmware update? The firmware flashed by default is 1.0, the dlink ftp has 1.03b, and my bug’s listed. Downloading, unpacking, uploading to the web UI… BOOM:

413 – request entity too large.

I decided to try again later from work, so I forwarded the port and gave it a spin remotely. I don’t know how, but it worked. And after rebuilding the RAID it went live! Finally! It was a victory, right up until the next reboot.

After rebooting the NAS (I needed to put it on the shelf somewhere) I saw a nice message stating that my RAID was degraded. Again. The problem was the «Enable 1.5Gbps phy» jumper on one of the drives (a wild guess, because after removing it and rebuilding again it all worked).

Updating may have fixed that bug, but it added a dozen more. The web UI became totally unusable (if you can call that excuse for a web interface usable at all). Luckily I figured it out quickly: resetting settings to factory defaults after a firmware upgrade is a must. They never managed to get configuration migration right.

So, I set up all the NFS shares, tried to boot my odroid-x2 servers that use NFS for data, and saw a WONDERFUL permission problem. Everything’s 777. Fixing the permissions (thanks to some bash magic and my ‘backup’ on the dying drive) helped only until the next reboot. After playing around a little I found out that the damned box does a chmod -Rf 777 /mnt/ on all your data on every single reboot. And should you plug in an external drive with something other than FAT or NTFS, you’ll get a chmod -R 777 there for FREE! Looks like all Dlink NAS boxes do that. Facepalm.
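
Restoring permissions from a backup boils down to copying the mode bits from one tree onto another. A minimal sketch of how such a fix could look (this is not the original script: restore_modes is a hypothetical helper, and it assumes both trees share the same layout, a stat that understands -c as in GNU coreutils or BusyBox, and file names without newlines). It copies modes only, not ownership:

```shell
#!/bin/sh
# restore_modes REFERENCE_TREE TARGET_TREE
# For every path in REFERENCE_TREE that also exists in TARGET_TREE,
# copy the permission bits from the reference onto the target.
# The body runs in a subshell so its variables stay local.
restore_modes() (
    ref=$1
    target=$2
    # List reference paths relative to its root, skipping symlinks.
    ( cd "$ref" && find . ! -type l ) | while IFS= read -r rel; do
        # Only touch real files/directories present on both sides.
        if [ -e "$target/$rel" ] && [ ! -L "$target/$rel" ]; then
            mode=$(stat -c '%a' "$ref/$rel") && chmod "$mode" "$target/$rel"
        fi
    done
)
```

Called as, say, restore_modes /mnt/backup /mnt/data (both paths are placeholders), it leaves files that exist only on one side untouched.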


At that moment I had a very difficult question at hand: either go and get my money back, or void the warranty and make my own firmware with blackjack and hookers (well, Debian and ajenti). The hardware inside is quite tasty: Marvell Armada 370 (armv7 + VFP, no NEON) @ 1.2 GHz, 512 MiB DDR3, 128 MiB NAND, USB 3.0, 2x SATA and the rest of the usual interfaces. Looking at the upstream sources at kernel.org and OpenWRT, mvebu is supported.

I picked the latter. Screw the warranty. Besides, something inside their firmware was just asking for a small rant (sorry, code review). I started by taking the thing apart and taking random photos.

The board, TOP view. The SoC itself is a Marvell 88F6707, with 512 MiB DDR3, 128 MiB NAND and a Gigabit PHY. On the PCIe bus there’s an xHCI (USB 3.0) controller from NEC (no datasheet in the wild; the closest one says it also has 2 EHCI ports). The USB 2.0 ports integrated into the SoC aren’t wired anywhere. On the SATA power rails you can see some capacitors missing.

IMG_20141018_160029

There’s a small MCU from Weltrend connected to /dev/ttyS1. It is battery-backed and takes care of RTC, WOL, etc. It also has a thermal sensor and manages the fan. The power button and power LED are connected to it.

IMG_20141018_160102

IMG_20141018_160128

On the bottom side you see 2 unpopulated footprints for DDR3 memory. You can get a max of 1 GiB of RAM on this board, but we only have 512 MiB. I didn’t take the risk of soldering on new RAM, since there was a very high probability of damaging the soldering on the other side.

IMG_20141018_160141

No warranty-void stickers anywhere. If you are careful when opening, wash rosin with acetone and leave no traces – your warranty wouldn’t be void after all. Just don’t forget to put on a good pokerface ;). I started with UART. We see it clearly on the pads, settings are 115200 8n1.

IMG_20141018_160302

After adding the following wizardry onto the PCB I finally accessed u-boot.

IMG_20141018_164702

u-boot has tftp, usb and a lot of other things, but usb won’t work: it only supports the EHCI inside the chip, not the xHCI on PCIe. There’s also the 128 MiB NAND chip onboard; our excuse for a firmware lives there.
At first I booted the stock firmware with the UART console open. You can log in there with your admin password. Next I backed up the root filesystem to a USB flash drive and gave it a quick review. A few of the most ‘tasty’ places are below:

1. Loading userlist. Didn’t notice those unlink() in PHP documentation…

<?php
if (!empty($_FILES)) {
 //echo $_FILES['file']['tmp_name'];
 system("rm -f /tmp/import_users");
 move_uploaded_file($_FILES['fileToUpload']['tmp_name'],"/tmp/import_users");
}
?>

2. At the end of this script there’s the answer to the question “Why so slow?”. That’s how a true Jedi fights race conditions!

<?php
//if(!isset($_REQUEST['name'])) throw new Exception('Name required');
//if(!preg_match('/^[-a-z0-9_][-a-z0-9_.]*$/i', $_REQUEST['name'])) throw new Exception('Name error');
//
//if(!isset($_REQUEST['index'])) throw new Exception('Index required');
//if(!preg_match('/^[0-9]+$/', $_REQUEST['index'])) throw new Exception('Index error');
//
//if(!isset($_FILES['file'])) throw new Exception('Upload required');
//if($_FILES['file']['error'] != 0) throw new Exception('Upload error');
 
//201308 Sean Add for upload security (Consumer Storage Security Vurnubility)
$ip = gethostbyaddr($_SERVER['SERVER_ADDR']);
$result = @stripslashes( @join( @file( "http://".$ip."/cgi-bin/nas_sharing.cgi?cmd=71&uuid=".$_SERVER['REMOTE_ADDR']),"" ));
$equal = strcmp($result, "success");
if ($equal != 0)
{
        header("HTTP/1.1 302 Found");
        exit();
}
//201308 Sean Add for upload security (Consumer Storage Security Vurnubility)
 
 
$path = str_replace('//','/',$_REQUEST['folder']);
$filename = str_replace('\\','',$_REQUEST['name']);
$target =  $path . $filename . '-' . $_REQUEST['index'];
 
//$target =  $_REQUEST['folder'] . $_REQUEST['name'] . '-' . $_REQUEST['index'];
 
move_uploaded_file($_FILES['file']['tmp_name'], $target);
 
 
//$handle = fopen("/tmp/debug.txt", "w+");
//fwrite($handle, $_FILES['file']['tmp_name']); 
//fwrite($handle, "\n"); 
//fwrite($handle, $target); 
//fclose($handle); 
 
// Might execute too quickly.
sleep(1);
 
?>

And you’ll find tons of stuff like this in there. But let’s leave the code review to the people who do web development on a regular basis and stick to hardware.

I started by cloning a fresh kernel from kernel.org, namely 3.17.1. Then I figured out where all those LEDs and buttons are wired to and wrote a dts file, which you can find in my github repo in the branch dlink-dns327l. mvebu_defconfig works nicely as a start.

Next we need to decide where the root filesystem will come from. Since 128 MiB of NAND isn’t much, we have either the USB option or the SATA option. I picked the USB option.

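Note: for this CPU (armv7 + vfpv3 WITHOUT NEON SIMD) we can use either debian armel or raspbian. I assumed Debian armhf needs NEON to function properly; its documented baseline, however, is ARMv7 with VFPv3-D16 and NEON is optional, so armhf may be worth a try as well.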
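
Whether a given board has NEON or only VFP can be read off the Features line of /proc/cpuinfo. A small sketch of such a check (cpu_has_feature is a made-up helper name; the optional second argument exists only so the function can be pointed at a saved copy of cpuinfo):

```shell
#!/bin/sh
# Succeeds if feature $1 appears as a whole word on the first "Features"
# line of the cpuinfo-style file $2 (defaults to /proc/cpuinfo).
cpu_has_feature() (
    file=${2:-/proc/cpuinfo}
    grep -m 1 '^Features' "$file" \
        | tr -s ' \t' '\n\n' \
        | grep -x -- "$1" >/dev/null
)

if cpu_has_feature neon; then
    echo "NEON available"
else
    echo "no NEON"
fi
```

Matching whole words matters here, since a plain substring search for vfp would also hit vfpv3.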

After the initial setup is done, we need to set up the Weltrend MCU so that it controls our fan. Luckily the firmware of the MCU is the same as in the DNS320L, so we can just use dns320l-daemon (http://www.aboehler.at/hg/dns320l-daemon). The temperature map looks a bit off, but you can always compensate for that in the config. My /etc/dns320l-daemon.ini:

[Serial]
Port = /dev/ttyS1
NumberOfRetries = 3
 
[Daemon]
ServerPort = 57367
ServerAddr = 127.0.0.1
SyncTimeOnStartup = 0
SyncTimeOnShutdown = 0
 
[GPIO]
SysfsGpioDir = /sys/class/gpio
PollTime = 100
 
[Fan]
PollTime = 10
TempLow  = 30
TempHigh = 36
Hysteresis = 2

I also threw together this init script for it:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          dns320l-daemon
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: DNS320L daemon
# Description:       Fan and rtc daemon
#                    
### END INIT INFO
 
# Author: Andrew 'Necromant' Andrianov <spam [at] ncrmnt [dot] org>
#
# Do NOT "set -e"
 
# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="NAS Fan and RTC daemon"
NAME=dns320l-daemon
DAEMON=/usr/sbin/$NAME
DAEMON_ARGS=""
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
 
# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0
 
# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME
 
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
 
# Define LSB log_* functions.
# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
# and status_of_proc is working.
. /lib/lsb/init-functions
 
 
dns_cmd()
{
{ echo "${*}"; sleep 1; } |  telnet localhost 57367
}
 
#
# Function that starts the daemon/service
#
do_start()
{
        # Return
        #   0 if daemon has been started
        #   1 if daemon was already running
        #   2 if daemon could not be started
        start-stop-daemon --start --quiet --exec $DAEMON --test > /dev/null \
                || return 1
        start-stop-daemon --start --quiet --exec $DAEMON -- \
                $DAEMON_ARGS \
                || return 2
        dns_cmd hctosys
 
        # Add code here, if necessary, that waits for the process to be ready
        # to handle requests from services started subsequently which depend
        # on this one.  As a last resort, sleep for some time.
}
 
#
# Function that stops the daemon/service
#
do_stop()
{
        dns_cmd systohc
        killall -9 dns320l-daemon
        return 0
}
 
#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
        #
        # If the daemon can reload its configuration without
        # restarting (for example, when it is sent a SIGHUP),
        # then implement that here.
        #
        start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
        return 0
}
 
case "$1" in
  start)
        [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
        do_start
        case "$?" in
                0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
                2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
  stop)
        [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
        do_stop
        case "$?" in
                0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
                2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
        esac
        ;;
  status)
        status_of_proc "$DAEMON" "$NAME" && exit 0 || exit $?
        ;;
  #reload|force-reload)
        #
        # If do_reload() is not implemented then leave this commented out
        # and leave 'force-reload' as an alias for 'restart'.
        #
        #log_daemon_msg "Reloading $DESC" "$NAME"
        #do_reload
        #log_end_msg $?
        #;;
  restart|force-reload)
        #
        # If the "reload" option is implemented then remove the
        # 'force-reload' alias
        #
        log_daemon_msg "Restarting $DESC" "$NAME"
        do_stop
        case "$?" in
          0|1)
                do_start
                case "$?" in
                        0) log_end_msg 0 ;;
                        1) log_end_msg 1 ;; # Old process is still running
                        *) log_end_msg 1 ;; # Failed to start
                esac
                ;;
          *)
                # Failed to stop
                log_end_msg 1
                ;;
        esac
        ;;
  *)
        #echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
        exit 3
        ;;
esac

Next, the LEDs. The HDD LEDs are paired white + amber: white works for activity indication, amber is plain GPIO. I wanted the amber LEDs to indicate when an HDD gets its first reallocated sector (that’s when I normally replace the HDD). Easily done with a script in /etc/cron.hourly/:

#!/bin/bash
 
# A more or less hacky way to get drives in the manner they are PHYSICALLY wired
# in the system. We can't use /dev/sdX names, because they may swap in respect to
# which initializes first. Numbering starts with '1'
 
get_drive()
{
        DRIVE=`find /sys/|grep ata${1}|grep events_async|cut -d"/" -f 12`
        echo "/dev/$DRIVE"
}
 
check_drive()
{
        DISK=`get_drive $1`
        REALLOC=`smartctl -A $DISK|grep Reallocated_Sector_Ct | awk '{print $10}'`
        if [ "$REALLOC" != "0" ]; then
                echo default-on > /sys/class/leds/$2/trigger
                echo "Yikes! Drive $DISK has $REALLOC reallocs!"
        else
                echo "Drive $DISK is operational"
                echo none > /sys/class/leds/$2/trigger
        fi
}
 
check_drive 1 dns327l:amber:sata-l
check_drive 2 dns327l:amber:sata-r
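
The parsing step of that script can be tried without any drives attached. The sketch below isolates it into a filter that reads smartctl -A output on stdin; realloc_count is a name invented here, and the sample attribute line is made up for illustration rather than taken from a real drive. Matching the attribute name in column 2 is a bit stricter than a plain grep:

```shell
#!/bin/sh
# Print the raw Reallocated_Sector_Ct value (10th column) from
# `smartctl -A` output on stdin. Prints nothing if the attribute is missing.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; exit }'
}

# Illustrative sample line in the usual smartctl attribute-table layout.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12'
printf '%s\n' "$sample" | realloc_count   # prints 12
```

On a live system the same filter would sit behind the real tool, e.g. smartctl -A /dev/sda | realloc_count.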

That should do it. However, nobody said you shouldn’t set up smartd properly as well.

As for the web interface, I picked ajenti as the one and only sane solution.


And it all worked until I got my first hard freeze after some 3 to 4 hours of uptime. The first bad thing was DLink’s hardware initialization script, which I had overlooked. It took me a little while to figure it out:

#!/bin/sh
 
echo "hardware init"
 
# enable usb power
mem_rw -w -t 1 -o 0x18100 -v 0x2010
 
#for SPI clock
mem_rw -w -t 1 -o 0x1100c -v 0xfb
 
#/* hardware request phy */
mem_rw -w -t 1 -o 0x184e0 -v 0xa8a
 
mem_rw -w -t 2 -o 22 -v 0x2
mem_rw -w -t 2 -o 25 -v 0x77
mem_rw -w -t 2 -o 24 -v 0x5747
mem_rw -w -t 2 -o 22 -v 0
 
# modify for hw sata eye
mem_rw -w -t 1 -o 0xA2834 -v 0xc92a
mem_rw -w -t 1 -o 0xA283c -v 0xaa2b
 
mem_rw -w -t 1 -o 0xA4834 -v 0xc92a
mem_rw -w -t 1 -o 0xA483c -v 0xaa2b

The weird folk at dlink made their own /dev/mem called /dev/REG, on top of which works a tool called mem_rw. The first line turns on USB power by setting the GPIO register. (GPIO subsystem? Nah! That’s for girls!)
Although, on the other hand, had they shoved that into the kernel instead of u-boot, I might never have known any of it. At the time of writing there’s no GPL source drop from dlink.

The next line sets the i2c speed. Why did they comment it as SPI? Dunno. But I guess the functional reference won’t lie.

The last 4 lines are also simple – they set the mux registers and set LEDs to indicate “SATA activity”.

The thing in between is black magic. I only started to understand it after spending some time with strace, objdump and the functional reference the next evening. The black magic with my comments follows:

# Write IO config reg, set RGMII voltage to 1.8 volt from 3.3 volts.
mem_rw -w -t 1 -o 0x184e0 -v 0xa8a
 
# The following is written to eth PHY register. 
# Datasheet is confidential, so I could only partially
# decipher it looking at linux kernel driver code.  
#
# Select page #2
mem_rw -w -t 2 -o 22 -v 0x2
 
# BLACK MAGIC HAPPENS
mem_rw -w -t 2 -o 25 -v 0x77
mem_rw -w -t 2 -o 24 -v 0x5747
 
# Select page 0
mem_rw -w -t 2 -o 22 -v 0

So first we set the RGMII voltage level to 1.8 volts instead of 3.3, then write something to the ethernet PHY registers on page 2. Since the datasheet is proprietary and confidential and I have no access to it, I can only make a wild guess that it involves some magic to tell the PHY we’re using a different voltage. Maybe some kind of line calibration for a different logic level, I’ve no idea. Anyway, since doing it from userspace looks like a totally bad idea and I don’t want to mess up the kernel driver with any of this, I came up with the following magic that can be done at the u-boot prompt. Just put it in an environment variable called voodoo and do ‘run voodoo’ before the actual system boot. Something like this:

voodoo=mw.l 0xd00184e0 0xa8a; phyWrite 0 16 2; phyWrite 0 19 77; phyWrite 0 18 5747; sleep 1; phyRead 0 19; phyRead 0 18; phyWrite 0 16 0;
bootcmd=run voodoo; nand read.e 0xa00000 0x500000 0x400000; bootm 0xa00000
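
The numbers in the voodoo line match the mem_rw calls once you account for notation: mem_rw takes PHY register numbers in decimal (22, 25, 24), while the u-boot line spells the same registers as bare hex (16, 19, 18), and the 0x184e0 offset becomes an absolute address. A sketch of the conversion follows; both helper names are made up, and the 0xd0000000 register base is inferred from the mw.l address above rather than taken from a datasheet:

```shell
#!/bin/sh
# Format a decimal PHY register number as bare hex (no 0x prefix),
# the way the u-boot line above writes it.
to_uboot_hex() {
    printf '%x\n' "$1"
}

# Absolute address of an SoC register given the offset that mem_rw takes.
# ASSUMPTION: internal registers are mapped at 0xd0000000 under u-boot.
reg_addr() {
    printf '0x%x\n' $(( 0xd0000000 + $1 ))
}

to_uboot_hex 22     # page select register: prints 16
to_uboot_hex 25     # prints 19
to_uboot_hex 24     # prints 18
reg_addr 0x184e0    # IO config register: prints 0xd00184e0
```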

This solved some of the issues; however, freezes are still frequent with the kernel.org kernel, every 2 to 5 hours. The stock kernel doesn’t have this problem, and you can even boot debian with it.

So right now we have two options to run a proper linux userspace on DNS-327L.
1. With Stock kernel:

  • idmapd will fail to start and nfs will have some problems, looks like they’ve omitted some features needed for nfs (sic!)
  • No /sys/class/leds on stock kernel
  • No way to boot stock kernel with root on usb flash: kernel disables usb power rail on boot
  • Doesn’t boot over nfs.
  • No watchdog driver compiled in
  • Likely no kernel updates from Dlink whatsoever
  • Slightly faster hdparm benchmarks compared to upstream (or is it my imagination?)
  • No sources

2. Upstream kernel:

  • Occasional freezes every several hours. Partially solved by hardware watchdog (See my watchdog.conf below). These look power-related – more frequent when using a different power brick (How?)
  • Can boot from usb flash
  • Simple to update the kernel

That’s all for now, but I guess I will be getting back to this hardware when either Dlink releases the GPL source code (finally) or when I have a spare minute. For now I’ve settled on booting debian with the stock kernel.

Remaining mysteries:

  • DNS-320L had a GPIO line from weltrend micro to SoC that indicated that power button has been pressed. I couldn’t pinpoint that GPIO line on DNS-327L. Is it there?
  • Mysterious i2c device with address 0x13. The stock software seems to call i2cset/i2cget internally to interface with it
  • Freezes! (A total showstopper)
  • GPIO lines for buttons RESET and USB BACKUP don’t want to bind to gpio-keys driver, causing infinite probe deferral. Likely an upstream bug

My watchdog.conf:

#ping                   = 172.31.14.1
#ping                   = 172.26.1.255
#interface              = eth0
#file                   = /var/log/messages
#change                 = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1             = 24
#max-load-5             = 18
#max-load-15            = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory             = 1

#repair-binary          = /usr/sbin/repair
#repair-timeout         = 
#test-binary            = 
#test-timeout           = 

watchdog-device = /dev/watchdog
watchdog-timeout = 60
interval = 15

# Defaults compiled into the binary
#temperature-device     =
#max-temperature        = 120

# Defaults compiled into the binary
#admin                  = root
#interval               = 1
#logtick                = 1
#log-dir                = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 20

# Check if syslogd is still running by enabling the following line
#pidfile                = /var/run/syslogd.pid   

5 thoughts on “Dlink DNS-327L: How you should NOT make a NAS”

  1. 700GB in 18H => 11MByte/S, Did you check for 1 Gbit netadapter?
    I get 60MByte with Realtek

    By the way: There’s a “Funz” add on out there for 320 – Did you check it for working with 327L?

  2. Hello, do you know how one can use the NFS? Still trying with different -o but get no result at all

  3. hi, thanks for review
    at the moment I wondering whether to buy D-Link DNS-327L or ZYXEL NSA325 v2
    Fine! After reading your article, definitely not D-Link. thanks
    that Zyxel is also not perfect, I will look on other models too

  4. Hmm, so was reading some good stuff about this until I found your blog post.

    Should I stay well clear of D-Link stuff then.
    Basically just want a network accessible drive to store all my family photos and films, that I can access from my laptop and maybe stream to my TV.
    Is all the stuff you mentioned above a deal breaker, because for the price there is nothing similar on the market that I can find unless I go well over £100.
    Cheers

  5. I should have found your blog before i bought it. Regret it now. Hated the mobile app. No improvement, bad UIs. Used WD before and it gave much much ^99999 better user exp. Dlink is probably ran as a cash cow that just aim for profit, not product quality.
