Gdium linux kernel support status

Apr 16th, 2010 | Posted by yajin | Filed under kernel, loongson

After several days of work, the 2.6.34-rc2 kernel now runs on the Gdium; everything works except sound. Of course, most of the code comes from the work of Mandriva and Philippe.

I will clean up the code and get sound working in the next few days. It seems the sm501 sound driver needs a hardcoded 8051 firmware to work. Damn it. Once that is done, I will send the patches to the loongson-dev mailing list and eventually get them merged into linux-loongson-community and the linux-mips mainline.

I will keep moving. Please wait.

PS: The Linux kernel repository for the Gdium is here.

kill-bill:~# uname -a
Linux kill-bill 2.6.34-rc2 #24 PREEMPT Fri Apr 16 21:01:51 CST 2010 mips64 GNU/Linux
kill-bill:~# cat /proc/cpuinfo
system type : dexxon-gdium-2f-10inches
processor : 0
cpu model : ICT Loongson-2 V0.3 FPU V0.1
BogoMIPS : 598.01
wait instruction : no
microsecond timers : yes
tlb_entries : 64
extra interrupt vector : no
hardware watchpoint : yes, count: 0, address/irw mask: []
ASEs implemented :
shadow register sets : 1
core : 0
VCED exceptions : not available
VCEI exceptions : not available

Install debian lenny on yeeloong 8089/8101

Mar 27th, 2010 | Posted by yajin | Filed under loongson

NOTICE/TIPS:
[For anyone who wants to install Debian 6.0, there is an easier way. See the following link:
http://www.anheng.com.cn/loongson/install/readme.txt (in Chinese).]

Yesterday I installed Debian lenny on a Yeeloong 8101, the 10.1-inch notebook based on the Loongson 2F CPU, for a friend. I then found that there is little English documentation describing how to do this, so I am writing the process down for anyone who is interested in installing Debian on the Yeeloong. There are many ways to do it; I chose the Debian network installer. Please make sure you have an Internet connection first.

1. First download the kernel and initrd to your PC.

wget http://dev.lemote.com/drupal/sites/default/files/kernel-2.6.27-LM8089.tar.gz
wget http://dev.lemote.com/drupal/sites/default/files/initrd_yl_netboot.gz

2. Decompress kernel on your PC.

tar zxvf kernel-2.6.27-LM8089.tar.gz

You will get the kernel vmlinux and the directory named lib. The lib directory contains all the kernel modules.

3. Format your USB disk with an ext2 partition and copy vmlinux, the lib directory and initrd_yl_netboot.gz to it.

4. Insert the USB disk into your netbook and boot it.

5. Enter the PMON command line.

There are two ways to enter the command line of PMON (the Yeeloong bootloader). One is to press DEL while booting. The other is to press C when you see the boot menu.

Use the following commands to load the kernel and initrd, which contains the debian network installer.

load /dev/fs/ext2@usb0/vmlinux
initrd /dev/fs/ext2@usb0/initrd_yl_netboot.gz

Please be patient. The initrd command may take more than 5 minutes to finish.

Sometimes the PMON bootloader hangs when you boot with a USB disk inserted. I do not know why. The workaround is to boot into the default Linux system, insert the USB disk and then reboot. Alternatively, you can load the kernel and initrd over TFTP.

Finally, use the following command to launch the Debian network installer.

g console=tty no_auto_cmd

Then just install Debian as usual.

6. Install debian lenny

After the installer has configured DHCP, it will complain that "no kernel modules were found" and ask "continue the install without loading kernel modules?". Choose Yes (the default answer is No) to continue.

In the Partition disks step, it will complain that "The current kernel doesn't support the Logical Volume Manager. You may need to load the lvm-mod modules" and the background turns red. Do not be scared. Just click Continue. :)

After that, everything goes as it should. At the end, however, the Debian installer will say "no installable kernel was found in the defined APT sources.... Continue without installing a kernel". Do not click Yes too quickly. We need to copy the kernel and all the modules into the new system first. Please make sure that the USB disk is still plugged into the notebook. Press ALT+F2 to activate a console, then mount the USB disk and copy the kernel and modules.

mount /dev/sda1 /target/mnt
cp /target/mnt/vmlinux /target/boot
cp -rf /target/mnt/lib/modules /target/lib/

Then press ALT+F1 to return to the Debian installer and click Yes to continue the installation.

7. Install Desktop environment

You can install LXDE or gnome as your desktop. I prefer LXDE because it is light.

apt-get install lxde

Install the X server driver.

wget http://www.anheng.com.cn/loongson2f/lenny/xorg-server/xserver-xorg-video-siliconmotion_2.2.8-lemote.r04_mipsel.deb
dpkg -i xserver-xorg-video-siliconmotion_2.2.8-lemote.r04_mipsel.deb

Change the xorg.conf according to this link.

8. Troubleshooting

(1) My wifi does not work

If you check the kernel log with dmesg, you will see "rtl8187: rtl8187_open process failed because radio off". Press FN+F5 to turn on the wifi first. A message like "rtl8187: SCI interrupt Methord Will Turn Radio On" will then appear on your console.

(2) My sound does not work

Install alsa-utils first, then use alsamixer to adjust the volume.

(3) Oops, I forgot to copy the kernel to my newly installed system and cannot boot it now. What should I do?

You can load the kernel over TFTP.

UNSW Advanced OS about L4

Jan 22nd, 2010 | Posted by yajin | Filed under Micro-kernel

For anyone interested in operating systems and micro-kernels, especially L4.

Part 1:

Part 2:


performance of loongson 2f

Oct 30th, 2009 | Posted by yajin | Filed under loongson

Some friends asked me about the performance of the Loongson 2F. They wanted to know whether it can surpass the Marvell Sheeva CPU. Well, I cannot just say it is better or worse without giving benchmark data.
Since there is a published benchmark result for the Marvell Sheeva CPU, we can run the same benchmark program, nbench, on the Loongson 2F.
Machine: gdium
OS: Debian squeeze
Kernel: Linux

1. gcc-4.3.4
CFLAGS = -s -static -Wall -O3

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          358.24  :       9.19  :       3.02
STRING SORT         :          33.041  :      14.76  :       2.29
BITFIELD            :      5.5164e+07  :       9.46  :       1.98
FP EMULATION        :          47.402  :      22.75  :       5.25
FOURIER             :          4721.1  :       5.37  :       3.02
ASSIGNMENT          :          7.0534  :      26.84  :       6.96
IDEA                :          1597.4  :      24.43  :       7.25
HUFFMAN             :          575.17  :      15.95  :       5.09
NEURAL NET          :          4.2065  :       6.76  :       2.84
LU DECOMPOSITION    :          107.28  :       5.56  :       4.01

==========ORIGINAL BYTEMARK RESULTS========
INTEGER INDEX : 16.297
FLOATING-POINT INDEX: 5.864
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==========LINUX DATA BELOW============
CPU                 :
L2 Cache            :
OS                  : Linux 2.6.24-gdium-1
C compiler          : gcc version 4.3.4 (Debian 4.3.4-5)
libc                : libc-2.9.so
MEMORY INDEX        : 3.156
INTEGER INDEX       : 4.918
FLOATING-POINT INDEX: 3.252
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

2. gcc-4.4
CFLAGS = -s -static -Wall -O3 -fomit-frame-pointer -funroll-loops
CFLAGS += -march=loongson2f  -mtune=loongson2f  -mabi=n32

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          366.08  :       9.39  :       3.08
STRING SORT         :          46.686  :      20.86  :       3.23
BITFIELD            :       4.764e+07  :       8.17  :       1.71
FP EMULATION        :            90.2  :      43.28  :       9.99
FOURIER             :          5171.9  :       5.88  :       3.30
ASSIGNMENT          :          11.094  :      42.21  :      10.95
IDEA                :          1726.9  :      26.41  :       7.84
HUFFMAN             :             605  :      16.78  :       5.36
NEURAL NET          :           9.761  :      15.68  :       6.60
LU DECOMPOSITION    :          215.64  :      11.17  :       8.07

==========ORIGINAL BYTEMARK RESULTS========
INTEGER INDEX       : 20.035
FLOATING-POINT INDEX: 10.100
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==========LINUX DATA BELOW============
CPU                 :
L2 Cache            :
OS                  : Linux 2.6.24-gdium-1
C compiler          : gcc-4.4
libc                : libc-2.9.so
MEMORY INDEX        : 3.922
INTEGER INDEX       : 5.997
FLOATING-POINT INDEX: 5.602
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

qemu mips msub instruction emulation bug

Oct 15th, 2009 | Posted by yajin | Filed under ARM/MIPS, emulation

It has been a long time since my last post. I am now working on porting Android to MIPS, and I want to run Android on a MIPS emulator.

The problem is that when I run mips-android on qemu, it hangs while executing the init program in the initramfs root file system. I debugged init with remote gdb and found that the cause is that pa_workspace is not initialized.

The function ashmem_create_region opens /dev/ashmem and returns the fd on success. Here, however, it returns -1 with errno 19 (ENODEV, "No such device").

fd = ashmem_create_region("system_properties", size);

So the question is: who is responsible for creating /dev/ashmem?

In fact, Android uses a udev-style mechanism to create the devices in /dev when executing the function device_init. The full code path for creating a device is as follows:

device_init->coldboot->do_coldboot->write(fd, "add\n", 4)->handle_device_fd->handle_device_event->make_device

The function parse_event parses the uevent message and then passes the uevent to handle_device_event. However, I found that the uevent message is a little weird. I used remote gdb to dump it.

0x7ff3d250:     "add@/class/tty/console"
0x7ff3d267:     "ACTION=add"
0x7ff3d272:     "DEVPATH=/class/tty/console"
0x7ff3d28d:     "SUBSYSTEM=tty"
0x7ff3d29b:     "MAJOR=+"
0x7ff3d2a3:     "MINOR=/"
0x7ff3d2ab:     "SEQNUM=31+"
0x7ff3d2b6:     ""
0x7ff3d2b7:     ""
0x7ff3d2b8:     ""

You see, in the message MAJOR is "+". This confuses parse_event, so the corresponding device is never created.

It looks like the kernel passes a wrong uevent message to user space. So the question is: who messed up the uevent message?

Then I recalled that there were some weird messages when booting the Linux kernel.

Primary instruction cache .kB, VIPT, 2-way, linesize 1* bytes.
Primary data cache .kB, 2-way, VIPT, no aliases, linesize 1* bytes

You see, the instruction cache size is ".kB" and the line size is "1*" bytes, which are not valid numbers at all.

So I suspected that something goes wrong in the kernel when formatting numbers, and I used remote gdb again, this time to debug the kernel.

r4k_cache_init->probe_pcache->printk->vprintk->vscnprintf->vsnprintf->number->put_dec->put_dec_trunc

The function put_dec_trunc takes an unsigned int in [0, 99999] as input and outputs the number as a string. But I found that when the input is 2, the output is '.', not the expected '2'. So maybe this function is the bad boy.

In the following function I assume q=2.

277 static char* put_dec_trunc(char *buf, unsigned q)
278 {
279         unsigned d3, d2, d1, d0;
280         d1 = (q>>4) & 0xf;            /*d1=0*/
281         d2 = (q>>8) & 0xf;            /*d2=0*/
282         d3 = (q>>12);                 /*d3=0*/
283
284         d0 = 6*(d3 + d2 + d1) + (q & 0xf);  /*d0=2*/
285         q = (d0 * 0xcd) >> 11;              /*q=0*/
286         d0 = d0 - 10*q;                     /*d0=2*/
287         *buf++ = d0 + '0'; /* least significant digit */
288         d1 = q + 9*d3 + 5*d2 + d1;   /*d1=0*/
289         if (d1 != 0) {               /* it is false so we won't get into here.*/
290                 q = (d1 * 0xcd) >> 11;
291                 d1 = d1 - 10*q;
292                 *buf++ = d1 + '0'; /* next digit */
293
294                 d2 = q + 2*d2;
295                 if ((d2 != 0) || (d3 != 0)) {
296                         q = (d2 * 0xd) >> 7;
297                         d2 = d2 - 10*q;
298                         *buf++ = d2 + '0'; /* next digit */
299
300                         d3 = q + 4*d3;
301                         if (d3 != 0) {
302                                 q = (d3 * 0xcd) >> 11;
303                                 d3 = d3 - 10*q;
304                                 *buf++ = d3 + '0';  /* next digit */
305                                 if (q != 0)
306                                         *buf++ = q + '0';  /* most sign. digit */
307                         }
308                 }
309         }    
310         return buf;
311 }

/*  MIPS uses a0/a1 to pass arguments.
 *  a0= address of buf
 *  a1= q = 2
 */
8015a040 <put_dec_trunc>:
8015a040:    00051202     srl    v0,a1,0x8        /* v0= q>>8*/
8015a044:    00053102     srl    a2,a1,0x4        /* a2= q>>4*/
8015a048:    30c6000f     andi    a2,a2,0xf        /* a2= (q>>4) & 0xf = d1 in line 280*/
8015a04c:    3048000f     andi    t0,v0,0xf        /* t0 = (q>>8) & 0xf = d2 in line 281*/
8015a050:    00054b02     srl    t1,a1,0xc        /* t1= (q>>12) = d3 in line 282*/
8015a054:    00c81021     addu    v0,a2,t0        
8015a058:    00491021     addu    v0,v0,t1          /* v0 = d1+d2+d3 in line 284*/
8015a05c:    24030006     li    v1,6
8015a060:    70433802     mul    a3,v0,v1          /*a3= 6*(d3 + d2 + d1)*/
8015a064:    30a5000f     andi    a1,a1,0xf         /*a1= q & 0xf*/
8015a068:    24020009     li    v0,9               /*v0=9*/
8015a06c:    00e56021     addu    t4,a3,a1          /*t4= 6*(d3 + d2 + d1) + (q & 0xf) = d0 in line 284*/
8015a070:    240b00cd     li    t3,205            /*t3= 0xcd*/
8015a074:    71223802     mul    a3,t1,v0          /*a3= 9*d3 in line 288. Apparently gcc has reordered the code.*/   
8015a078:    718b1802     mul    v1,t4,t3          /*v1= (d0*0xcd)*/
8015a07c:    24020005     li    v0,5
8015a080:    00e62821     addu    a1,a3,a2          /*a1= 9*d3 + d1 in line 288*/
8015a084:    71023002     mul    a2,t0,v0          /*a2= 5*d2 in line 288*/
8015a088:    00031ac2     srl    v1,v1,0xb         /*v1= (d0*0xcd)>>11 in line 285*/
8015a08c:    240a000a     li    t2,10             /*t2=10*/
8015a090:    01800013     mtlo    t4                /*put t4->lo. lo=t4= 6*(d3 + d2 + d1) + (q & 0xf) = d0 in line 284 */
8015a094:    706a0004     msub    v1,t2             /*hilo = hilo - v1*t2 = d0 - 10*q in line 286*/
8015a098:    00c51021     addu    v0,a2,a1
8015a09c:    00002812     mflo    a1                /*lo->a1*/    
8015a0a0:    00431821     addu    v1,v0,v1
8015a0a4:    24a20030     addiu    v0,a1,48         
8015a0a8:    00803821     move    a3,a0
8015a0ac:    a0820000     sb    v0,0(a0)

We only need to look at the instruction at 0x8015a094, which is msub. The definition of msub is as follows:

(HI,LO) = (HI,LO) - (GPR[RS]*GPR[RT])

So after executing the instruction at 0x8015a094, HI/LO should be 0/2. But qemu produces 0xffffffff/0xfffffffe, which is -2. Maybe this is the problem.

Next I needed to find out how qemu emulates the msub instruction.
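To make the difference concrete, here is a minimal C model of the two operand orders. It is only a sketch under stated assumptions: the function names (msub_correct, msub_buggy, low_digit) are made up for illustration, HI is assumed to be 0 before the mtlo, and low_digit only mirrors lines 280-287 of put_dec_trunc. With the operands swapped, a digit d comes out as '0' - d instead of '0' + d, which would explain why 2 prints as '.' and the 6 of a 16-byte line size prints as '*'. The same arithmetic maps 5 to '+' and 1 to '/', which is consistent with MAJOR=+ and MINOR=/ in the uevent dump if the console device uses major 5 and minor 1.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSUB: (HI,LO) <- (HI,LO) - rs*rt, signed 32x32 -> 64 bits. */
uint64_t msub_correct(uint64_t hilo, int32_t rs, int32_t rt)
{
    return hilo - (uint64_t)((int64_t)rs * (int64_t)rt);
}

/* Operand order of the faulty emulation: rs*rt - (HI,LO). */
uint64_t msub_buggy(uint64_t hilo, int32_t rs, int32_t rt)
{
    return (uint64_t)((int64_t)rs * (int64_t)rt) - hilo;
}

/* Least significant decimal digit of q, computed the way the compiled
 * put_dec_trunc does it: mtlo d0, then msub quot,10, then add '0'.
 * Pass buggy != 0 to use the swapped operand order. */
char low_digit(unsigned q, int buggy)
{
    assert(q <= 99999);                       /* input range of put_dec_trunc */

    unsigned d1 = (q >> 4) & 0xf;
    unsigned d2 = (q >> 8) & 0xf;
    unsigned d3 = q >> 12;
    unsigned d0 = 6 * (d3 + d2 + d1) + (q & 0xf);
    unsigned quot = (d0 * 0xcd) >> 11;        /* d0 / 10 */

    uint64_t hilo = d0;                       /* mtlo d0; HI assumed to be 0 */
    hilo = buggy ? msub_buggy(hilo, (int32_t)quot, 10)
                 : msub_correct(hilo, (int32_t)quot, 10);

    return (char)((uint32_t)hilo + '0');      /* mflo, then addiu 48 */
}
```

Calling low_digit(2, 0) gives the expected digit, while low_digit(2, 1) reproduces the garbled character from the boot log.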

    case OPC_MSUB:
        {
            TCGv r_tmp1 = tcg_temp_new(TCG_TYPE_I64);
            TCGv r_tmp2 = tcg_temp_new(TCG_TYPE_I64);
            TCGv r_tmp3 = tcg_temp_new(TCG_TYPE_I64);

            tcg_gen_ext32s_tl(t0, t0);
            tcg_gen_ext32s_tl(t1, t1);
            tcg_gen_ext_tl_i64(r_tmp1, t0);
            tcg_gen_ext_tl_i64(r_tmp2, t1);
            tcg_gen_mul_i64(r_tmp1, r_tmp1, r_tmp2);  /* r_tmp1 = gpr[rs]*gpr[rt] */
            gen_load_LO(t0, 0);                /* t0 <- lo */
            gen_load_HI(t1, 0);                /* t1 <- hi */
            tcg_gen_extu_tl_i64(r_tmp2, t0);   /* r_tmp2 = lo extended to 64 bits */
            tcg_gen_extu_tl_i64(r_tmp3, t1);   /* r_tmp3 = hi extended to 64 bits */
            tcg_gen_shli_i64(r_tmp3, r_tmp3, 32);
            tcg_gen_or_i64(r_tmp2, r_tmp2, r_tmp3);
            tcg_temp_free(r_tmp3);
            tcg_gen_sub_i64(r_tmp1, r_tmp1, r_tmp2); /* r_tmp1 = r_tmp1 - r_tmp2 = gpr[rs]*gpr[rt] - HI/LO */
            tcg_temp_free(r_tmp2);
            tcg_gen_trunc_i64_tl(t0, r_tmp1);
            tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);
            tcg_gen_trunc_i64_tl(t1, r_tmp1);
            tcg_temp_free(r_tmp1);
            tcg_gen_ext32s_tl(t0, t0);
            tcg_gen_ext32s_tl(t1, t1);
            gen_store_LO(t0, 0);
            gen_store_HI(t1, 0);
        }
        opn = "msub";
        break;

You see, qemu emulates the msub instruction incorrectly. It computes gpr[rs]*gpr[rt] - HI/LO and puts the result into HI/LO, which differs from the definition of msub. I patched the qemu code and it works.

BTW: the MIPS32 4K™ Processor Core Family Software User's Manual (document MD00016) gives an incorrect operation for the msub instruction on page 253.

Operation:
   temp ← (HI || LO) - (GPR[rs] * GPR[rt])
   HI ← temp[63..32]
   LO ← temp[31..0]

Maybe this misled the qemu developers. The latest qemu version has fixed this bug, as we can see from the emulation of this instruction in qemu-svn-20091014.

    case OPC_MSUB:
        {
            TCGv_i64 t2 = tcg_temp_new_i64();
            TCGv_i64 t3 = tcg_temp_new_i64();

            tcg_gen_ext_tl_i64(t2, t0);
            tcg_gen_ext_tl_i64(t3, t1);
            tcg_gen_mul_i64(t2, t2, t3);  /* t2 = GPR[RS]*GPR[RT] */
            tcg_gen_concat_tl_i64(t3, cpu_LO[0], cpu_HI[0]);  /* t3 = HI/LO */
            tcg_gen_sub_i64(t2, t3, t2); /* t2 = HI/LO - GPR[RS]*GPR[RT] */
            tcg_temp_free_i64(t3);
            tcg_gen_trunc_i64_tl(t0, t2);
            tcg_gen_shri_i64(t2, t2, 32);
            tcg_gen_trunc_i64_tl(t1, t2);
            tcg_temp_free_i64(t2);
            tcg_gen_ext32s_tl(cpu_LO[0], t0);
            tcg_gen_ext32s_tl(cpu_HI[0], t1);
        }
        opn = "msub";
        break;

See this link for more information.

So finding this bug was really not easy. I had to dig, dig and dig from userland to the Linux kernel and then to qemu until I caught this nasty qemu bug. Thanks to remote gdb and the gdb stub in qemu, which made life easier.

The following is the patch for qemu.

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 3dded6c..0a1b461 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -2193,7 +2193,11 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
             tcg_gen_shli_i64(r_tmp3, r_tmp3, 32);
             tcg_gen_or_i64(r_tmp2, r_tmp2, r_tmp3);
             tcg_temp_free(r_tmp3);
-            tcg_gen_sub_i64(r_tmp1, r_tmp1, r_tmp2);
+            /* msub means HI/LO = HI/LO - GPR[RS]*GPR[RT],
+             * not HI/LO = GPR[RS]*GPR[RT] - HI/LO
+             */
+            //tcg_gen_sub_i64(r_tmp1, r_tmp2, r_tmp2);
+            tcg_gen_sub_i64(r_tmp1, r_tmp2, r_tmp1);
             tcg_temp_free(r_tmp2);
             tcg_gen_trunc_i64_tl(t0, r_tmp1);
             tcg_gen_shri_i64(r_tmp1, r_tmp1, 32);


qemu internal part 3: memory watchpoint

Jul 15th, 2009 | Posted by yajin | Filed under emulation, qemu

Qemu has an amazing feature: the memory watchpoint. It can watch memory reads, writes or both. When the guest OS or application touches a memory region watched by qemu, a registered function is called, and you can do whatever you want in this function. The gdb stub in qemu uses it to implement the memory watch command.

The implementation of memory watchpoints in qemu is tricky. From the last article on qemu internals, we know that when emulating a memory access, qemu needs to distinguish normal RAM reads/writes from memory mapped I/O reads/writes. If the access hits a memory mapped I/O address, qemu dispatches it to the registered I/O emulation functions. Qemu uses the same mechanism to implement memory watchpoints. When a watched memory address is accessed, qemu dispatches the access to the registered memory watch functions, no matter whether the address is normal guest RAM or memory mapped I/O! Qemu does all the magic in these memory watch functions.

In the following, I will use an example to explain the whole process of how qemu implements the memory watchpoint.

80103c60 <memcpy>:
80103c60:       00801021        move    v0,a0
80103c64 <__copy_user>:
80103c64:       2cca0004        sltiu   t2,a2,4
80103c68:       30890003        andi    t1,a0,0x3
80103c6c:       15400068        bnez    t2,80103e10 <__copy_user+0x1ac>
80103c70:       30a80003        andi    t0,a1,0x3
80103c74:       1520003d        bnez    t1,80103d6c <__copy_user+0x108>
80103c78:       00000000        nop
80103c7c:       15000046        bnez    t0,80103d98 <__copy_user+0x134>
80103c80:       00064142        srl     t0,a2,0x5
80103c84:       11000017        beqz    t0,80103ce4 <__copy_user+0x80>
80103c88:       30d8001f        andi    t8,a2,0x1f
80103c8c:       00000000        nop
80103c90:       8ca80000        lw      t0,0(a1)

These asm lines are objdumped from the Linux 2.6.30 kernel for MIPS Malta. Assume that I want to watch memory accesses to the virtual address 0x804cd000 (swapper_pg_dir in the Linux kernel).

First I insert the watchpoint into the cpu.

cpu_watchpoint_insert(env, 0x804cd000, 4, BP_GDB | BP_MEM_ACCESS,
                        NULL);

Then I need to register the VM state change callback function.

qemu_add_vm_change_state_handler(spy_vm_state_change, NULL);

If register a1=0x804cd000, the guest Linux kernel touches the watched memory region when pc is 0x80103c90. Qemu then dispatches this access to the registered memory watch function, even though it is a normal guest RAM access. The memory watch functions in qemu are in the arrays watch_mem_read/watch_mem_write.

exec.c

static CPUReadMemoryFunc *watch_mem_read[3] = {
    watch_mem_readb,
    watch_mem_readw,
    watch_mem_readl,
};

static CPUWriteMemoryFunc *watch_mem_write[3] = {
    watch_mem_writeb,
    watch_mem_writew,
    watch_mem_writel,
};

The function watch_mem_readl first calls the function check_watchpoint.

exec.c

2622 static uint32_t watch_mem_readl(void *opaque, target_phys_addr_t addr)
2623 {
2624     check_watchpoint(addr & ~TARGET_PAGE_MASK, ~0x3, BP_MEM_READ);
2625     return ldl_phys(addr);
2626 }

2563 static void check_watchpoint(int offset, int len_mask, int flags)
2564 {
2565     CPUState *env = cpu_single_env;
2566     target_ulong pc, cs_base;
2567     TranslationBlock *tb;
2568     target_ulong vaddr;
2569     CPUWatchpoint *wp;
2570     int cpu_flags;
2571
2572     if (env->watchpoint_hit) {
2573         /* We re-entered the check after replacing the TB. Now raise
2574          * the debug interrupt so that is will trigger after the
2575          * current instruction. */
2576         cpu_interrupt(env, CPU_INTERRUPT_DEBUG);
2577         return;
2578     }
2579     vaddr = (env->mem_io_vaddr & TARGET_PAGE_MASK) + offset;
2580     TAILQ_FOREACH(wp, &env->watchpoints, entry) {
2581         if ((vaddr == (wp->vaddr & len_mask) ||
2582              (vaddr & wp->len_mask) == wp->vaddr) && (wp->flags & flags)) {
2583             wp->flags |= BP_WATCHPOINT_HIT;
2584             if (!env->watchpoint_hit) {
2585                 env->watchpoint_hit = wp;
2586                 tb = tb_find_pc(env->mem_io_pc);
2587                 if (!tb) {
2588                     cpu_abort(env, "check_watchpoint: could not find TB for "
2589                               "pc=%p", (void *)env->mem_io_pc);
2590                 }
2591                 cpu_restore_state(tb, env, env->mem_io_pc, NULL);
2592                 tb_phys_invalidate(tb, -1);
2593                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
2594                     env->exception_index = EXCP_DEBUG;
2595                 } else {
2596                     cpu_get_tb_cpu_state(env, &pc, &cs_base, &cpu_flags);
2597                     tb_gen_code(env, pc, cs_base, cpu_flags, 1);
2598                 }
2599                 cpu_resume_from_signal(env, NULL);
2600             }
2601         } else {
2602             wp->flags &= ~BP_WATCHPOINT_HIT;
2603         }
2604     }
2605 }

When check_watchpoint is executed for the first time, env->watchpoint_hit is NULL. It then checks whether the address is a watched address. If so, it sets the flag BP_WATCHPOINT_HIT in wp->flags (line 2583) and sets env->watchpoint_hit to wp. Then it finds and invalidates the current translation block (lines 2586-2592). If the flag BP_STOP_BEFORE_ACCESS is not set in wp, qemu translates the code starting from the current pc (lines 2596-2597) and resumes the guest instruction emulation (line 2599). The function cpu_resume_from_signal jumps to line 256 in cpu-exec.c and reruns the emulation from the lw instruction (pc=0x80103c90).

cpu-exec.c

255     for(;;) {
256         if (setjmp(env->jmp_env) == 0) {
257             env->current_tb = NULL;
258             /* if an exception is pending, we execute it here */
259             if (env->exception_index >= 0) {
260                 if (env->exception_index >= EXCP_INTERRUPT) {
261                     /* exit request from the cpu execution loop */
262                     ret = env->exception_index;
263                     if (ret == EXCP_DEBUG)
264                         cpu_handle_debug_exception(env);
265                     break;
266                 } else {
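The address test on lines 2581-2582 of check_watchpoint can be tried in isolation. The following is a minimal sketch, not qemu code: the toy_ names are invented, and it assumes that a watchpoint of length len is stored with len_mask = ~(len - 1), the same way the access width is passed in (for example ~0x3 for a 4-byte load).

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the two CPUWatchpoint fields used by the test. */
struct toy_watchpoint {
    uint32_t vaddr;      /* watched guest virtual address */
    uint32_t len_mask;   /* assumed to be ~(len - 1) */
};

struct toy_watchpoint toy_wp_make(uint32_t vaddr, uint32_t len)
{
    /* Only naturally aligned power-of-two lengths are modelled. */
    assert(len == 1 || len == 2 || len == 4 || len == 8);
    assert((vaddr & (len - 1)) == 0);

    struct toy_watchpoint wp = { vaddr, ~(len - 1) };
    return wp;
}

/* Same condition as lines 2581-2582: the access either covers the start of
 * the watched range, or falls inside the watched range. */
int toy_wp_matches(const struct toy_watchpoint *wp,
                   uint32_t access_vaddr, uint32_t access_len_mask)
{
    return access_vaddr == (wp->vaddr & access_len_mask) ||
           (access_vaddr & wp->len_mask) == wp->vaddr;
}
```

With a 4-byte watchpoint on 0x804cd000, the lw at pc=0x80103c90 matches through the first clause, a byte access to 0x804cd002 matches through the second one, and a word access to 0x804cd004 does not match.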

Why does qemu need to invalidate the current translation block and regenerate the code? Because this memory access (pc=0x80103c90) is in the middle of a translation block. If we want to rerun this instruction, we need to regenerate the code starting from it. Moreover, before invalidating the translation block, qemu needs to sync the cpu state back to the guest cpu (cpu_restore_state), because the cpu state in the middle of a translation block differs from the actual cpu state. Understanding this process requires some knowledge of binary translation. If you find it hard to understand, just ignore it.

Now qemu reruns the guest OS from pc=0x80103c90. Because the memory address is watched, qemu calls watch_mem_readl->check_watchpoint again. But this time env->watchpoint_hit is not NULL (qemu set it in the last call), so check_watchpoint calls cpu_interrupt and returns. Then watch_mem_readl calls ldl_phys to fetch the value from guest RAM. The call to cpu_interrupt in check_watchpoint sets the CPU_INTERRUPT_DEBUG flag in env->interrupt_request.

Then qemu runs normally, as if nothing had happened. Because CPU_INTERRUPT_DEBUG has been set in env->interrupt_request, the main loop of the cpu emulation will return.

cpu-exec.c

if (interrupt_request & CPU_INTERRUPT_DEBUG) {
    env->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
    env->exception_index = EXCP_DEBUG;
    cpu_loop_exit();
}

void cpu_loop_exit(void)
{
    /* NOTE: the register at this point must be saved by hand because
       longjmp restore them */
    regs_to_env();
    longjmp(env->jmp_env, 1);
}

The function cpu_loop_exit does a longjmp to line 256 in cpu-exec.c. Because env->exception_index is EXCP_DEBUG, execution breaks out of the loop in the function cpu_exec, and cpu_exec returns to main_loop in vl.c.

vl.c

ret = cpu_exec(env);

/* ... */

if (unlikely(ret == EXCP_DEBUG)) {
    gdb_set_stop_cpu(cur_cpu);
    vm_stop(EXCP_DEBUG);
}

It calls gdb_set_stop_cpu and then vm_stop to stop qemu. When the VM state changes, qemu calls the callback functions registered with qemu_add_vm_change_state_handler, so the function spy_vm_state_change is called.

In sum, when a watched memory address is accessed, the memory watch function is called, and it calls check_watchpoint. check_watchpoint sets env->watchpoint_hit to the current watchpoint and reruns the guest OS/application from the current pc. Then the memory watch function is called again and calls check_watchpoint once more. This time, check_watchpoint just sets the flag in env->interrupt_request, which tells the cpu to interrupt the emulation. Qemu then returns to main_loop and stops the VM. Finally it calls the registered VM change state callback functions.
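The two-pass control flow summarized above can be imitated with plain setjmp/longjmp. This is a toy model under heavy simplification, not qemu's code: every toy_ name is hypothetical, there is no translation block handling, and the "guest" consists of a single watched load. It only shows why the check runs twice while the load completes once.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { TOY_EXCP_DEBUG = 1 };

struct toy_cpu {
    int exception_index;   /* -1 means no exception pending */
    int watchpoint_hit;    /* models env->watchpoint_hit != NULL */
    int interrupt_debug;   /* models CPU_INTERRUPT_DEBUG being pending */
    int checks;            /* how often the watchpoint check ran */
    int loads_done;        /* how often the watched load completed */
    jmp_buf jmp_env;
};

/* First call: record the hit and restart the instruction.
 * Second call: only request a debug stop and let the load proceed. */
void toy_check_watchpoint(struct toy_cpu *env)
{
    env->checks++;
    if (env->watchpoint_hit) {
        env->interrupt_debug = 1;
        return;
    }
    env->watchpoint_hit = 1;
    longjmp(env->jmp_env, 1);           /* like cpu_resume_from_signal */
}

int toy_cpu_exec(struct toy_cpu *env)
{
    assert(env != NULL);
    for (;;) {
        if (setjmp(env->jmp_env) == 0) {
            if (env->exception_index >= 0)
                return env->exception_index;   /* leave the execution loop */
            if (env->interrupt_debug) {
                env->interrupt_debug = 0;
                env->exception_index = TOY_EXCP_DEBUG;
                longjmp(env->jmp_env, 1);      /* like cpu_loop_exit */
            }
            toy_check_watchpoint(env);         /* the watched load */
            env->loads_done++;
        }
    }
}
```

Starting from a struct with exception_index set to -1 and everything else zero, toy_cpu_exec returns TOY_EXCP_DEBUG after two checks and one completed load. All state lives behind the env pointer rather than in local variables, so nothing is lost across the longjmp.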

qemu internal part 2: softmmu

Jul 10th, 2009 | Posted by yajin | Filed under emulation, qemu

Qemu uses softmmu to speed up finding the mapping between guest physical addresses and host virtual addresses, and the mapping between guest I/O regions and qemu I/O emulation functions. In this article, I assume the guest page size is 4K.

1. the two level guest physical page descriptor table

Qemu uses a two-level guest physical page descriptor table to maintain the guest memory space and MMIO space. The table is pointed to by l1_phys_map. Bits [31:22] of the guest physical address index the first-level entry and bits [21:12] index the second-level entry. Each entry of the second-level table is a PhysPageDesc.

exec.c

typedef struct PhysPageDesc {
    /* offset in host memory of the page + io_index in the low bits */
    ram_addr_t phys_offset;
    ram_addr_t region_offset;
} PhysPageDesc;

If the memory region is RAM, bits [31:12] of phys_offset give the offset of this page in the emulated physical memory. If the memory region is memory mapped I/O, bits [11:3] of phys_offset give the index into the io_mem_write/io_mem_read arrays. When this memory region is accessed, the functions at that index in io_mem_write/io_mem_read are called.
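The bit fields described in this section can be written down as a few helpers. This is only an illustrative sketch for a 32-bit guest physical address with 4K pages; the helper names are invented here and do not appear in qemu.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [31:22] of the guest physical address: first-level index. */
unsigned l1_index(uint32_t paddr)
{
    return paddr >> 22;
}

/* Bits [21:12] of the guest physical address: second-level index. */
unsigned l2_index(uint32_t paddr)
{
    return (paddr >> 12) & 0x3ff;
}

/* RAM case: bits [31:12] of phys_offset are the page offset in guest RAM. */
uint32_t ram_page_offset(uint32_t phys_offset)
{
    return phys_offset & ~0xfffu;
}

/* MMIO case: bits [11:3] of phys_offset select the I/O handler slot. */
unsigned io_index(uint32_t phys_offset)
{
    return (phys_offset >> 3) & 0x1ff;
}

/* Rebuild the page-aligned address from both indices (sanity helper). */
uint32_t page_from_indices(unsigned l1, unsigned l2)
{
    assert(l1 < 1024 && l2 < 1024);
    return ((uint32_t)l1 << 22) | ((uint32_t)l2 << 12);
}
```

For example, the guest physical address 0x00403abc lands in first-level slot 1 and second-level slot 3, and page_from_indices(1, 3) gives back the page address 0x00403000.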

2. register the guest physical memory

The function cpu_register_physical_memory is used to register a guest memory region. If phys_offset is IO_MEM_RAM, the region is guest RAM. If phys_offset > IO_MEM_ROM, the region is MMIO space.

static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
                                                ram_addr_t size,
                                                ram_addr_t phys_offset)
{
    cpu_register_physical_memory_offset(start_addr, size, phys_offset, 0);
}

The function cpu_register_physical_memory_offset first looks up the PhysPageDesc in the table l1_phys_map using the given guest physical address. If the entry is found, qemu updates it. If not, qemu creates a new entry, fills in its value and inserts it into the table.

In the Malta emulation, the following code registers the Malta RAM space.

hw/mips_malta.c

cpu_register_physical_memory(0, ram_size, IO_MEM_RAM);

3. register the mmio space

Before registering MMIO space with cpu_register_physical_memory, qemu uses the function cpu_register_io_memory to register the I/O emulation functions in the arrays io_mem_write/io_mem_read.

exec.c

int cpu_register_io_memory(int io_index,
                           CPUReadMemoryFunc **mem_read,
                           CPUWriteMemoryFunc **mem_write,
                           void *opaque);

This function returns the index into the arrays io_mem_write/io_mem_read, and this index is passed to the function cpu_register_physical_memory via the parameter phys_offset.

hw/mips_malta.c

malta = cpu_register_io_memory(0, malta_fpga_read,
                                   malta_fpga_write, s);

cpu_register_physical_memory(base, 0x900, malta);
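The register-then-dispatch pattern can be sketched with a small handler table. Everything below is hypothetical (the toy_ names, the single-register device, one handler per slot instead of three per access width), and two details are assumptions rather than facts taken from this article: that the returned value is the slot index shifted into bits [11:3], and that the first four slots are reserved for the RAM/ROM style entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IO_SHIFT   3     /* slot index sits in bits [11:3] of phys_offset */
#define TOY_IO_ENTRIES 512
#define TOY_IO_FIRST   4     /* assumption: low slots are reserved */

typedef uint32_t (*toy_read_fn)(void *opaque, uint32_t addr);
typedef void (*toy_write_fn)(void *opaque, uint32_t addr, uint32_t val);

toy_read_fn  toy_io_read[TOY_IO_ENTRIES];
toy_write_fn toy_io_write[TOY_IO_ENTRIES];
void        *toy_io_opaque[TOY_IO_ENTRIES];
unsigned     toy_io_next = TOY_IO_FIRST;

/* Store the callbacks in a free slot and return the encoded phys_offset. */
uint32_t toy_register_io(toy_read_fn rd, toy_write_fn wr, void *opaque)
{
    unsigned idx = toy_io_next++;
    assert(idx < TOY_IO_ENTRIES);
    toy_io_read[idx] = rd;
    toy_io_write[idx] = wr;
    toy_io_opaque[idx] = opaque;
    return (uint32_t)idx << TOY_IO_SHIFT;
}

/* Dispatch an access on an MMIO page to the registered callbacks. */
uint32_t toy_mmio_read(uint32_t phys_offset, uint32_t addr)
{
    unsigned idx = (phys_offset >> TOY_IO_SHIFT) & (TOY_IO_ENTRIES - 1);
    assert(toy_io_read[idx] != NULL);
    return toy_io_read[idx](toy_io_opaque[idx], addr);
}

void toy_mmio_write(uint32_t phys_offset, uint32_t addr, uint32_t val)
{
    unsigned idx = (phys_offset >> TOY_IO_SHIFT) & (TOY_IO_ENTRIES - 1);
    assert(toy_io_write[idx] != NULL);
    toy_io_write[idx](toy_io_opaque[idx], addr, val);
}

/* A device with one register, standing in for something like the FPGA. */
struct toy_dev { uint32_t reg; };

uint32_t toy_dev_read(void *opaque, uint32_t addr)
{
    (void)addr;
    return ((struct toy_dev *)opaque)->reg;
}

void toy_dev_write(void *opaque, uint32_t addr, uint32_t val)
{
    (void)addr;
    ((struct toy_dev *)opaque)->reg = val;
}
```

The opaque pointer is what lets one pair of callbacks serve several device instances, which is the role the argument s plays in the malta call above.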

4. softmmu

Given the guest virtual address, how does qemu find the corresponding host virtual address? First qemu needs to translate the guest virtual address to guest physical address. Then qemu needs to find the PhysPageDesc entry in table l1_phys_map and get the phys_offset. At last qemu should add phys_offset to phys_ram_base to get the host virtual address.

Qemu uses a softmmu model to speed up this process. Its main idea is storing the offset of guest virtual address to host virtual address in a TLB table. When translating the guest virtual address to host virtual address, it will search this TLB table firstly. If there is an entry in the table, then qemu can add this offset to guest virtual address to get the host virtual address directly. Otherwise, it needs to search the l1_phys_map table and then fill the corresponding entry to the TLB table. The index of this TLB table is bits [19:12] of guest virtual address and there is no asid field in tlb entry. This means the TLB table needs to be flushed in process switch!

This TLB table works much like a traditional hardware TLB. For a MIPS CPU, however, there is another MMU model in qemu. Unlike x86, MIPS hardware does NOT walk a page table. Instead it uses a hardware TLB which is NOT transparent to software. That is a topic I may explain in another article. What we need to understand here is that the softmmu model in this article is not the MMU model of the MIPS CPU itself.

Moreover, besides speeding up the translation of guest virtual addresses to host virtual addresses, this softmmu model also speeds up the dispatch of I/O emulation functions by guest virtual address. In this case, the index of the I/O emulation functions in io_mem_write/io_mem_read is stored in iotlb.

The format of a TLB entry is as follows:

cpu-defs.h

176     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
177     target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE];  

108 typedef struct CPUTLBEntry {
109     /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
110        bit TARGET_PAGE_BITS-1..4  : Nonzero for accesses that should not
111                                     go directly to ram.
112        bit 3                      : indicates that the entry is invalid
113        bit 2..0                   : zero
114     */
115     target_ulong addr_read;
116     target_ulong addr_write;
117     target_ulong addr_code;
124     target_phys_addr_t addend;
131 } CPUTLBEntry;

The fields addr_read/addr_write/addr_code store the guest virtual address of the TLB entry and serve as its tag. The field addend is the offset from the guest virtual address to the host virtual address. Adding this value to the guest virtual address gives the host virtual address.

addend = host_virtual_address - guest_virtual_address

host_virtual_address = phys_ram_base (qemu variable) + guest_physical_address - guest_physical_address_base (0 in MIPS)

The iotlb stores the index of I/O emulation function in io_mem_write/io_mem_read.

The functions __ldb_mmu/__ldw_mmu/__ldl_mmu either translate the guest virtual address to a host virtual address or dispatch the access to the I/O emulation functions.

softmmu_template.h

86 DATA_TYPE REGPARM glue(glue(__ld, SUFFIX), MMUSUFFIX)(target_ulong addr,
87                                                       int mmu_idx)
88 {
89     DATA_TYPE res;
90     int index;
91     target_ulong tlb_addr;
92     target_phys_addr_t addend;
93     void *retaddr;
94
95     /* test if there is match for unaligned or IO access */
96     /* XXX: could done more in memory macro in a non portable way */
97     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
98  redo:
99     tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
100     if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
101         if (tlb_addr & ~TARGET_PAGE_MASK) {
102             /* IO access */
103             if ((addr & (DATA_SIZE - 1)) != 0)
104                 goto do_unaligned_access;
105             retaddr = GETPC();
106             addend = env->iotlb[mmu_idx][index];
107             res = glue(io_read, SUFFIX)(addend, addr, retaddr);
108         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
109             /* slow unaligned access (it spans two pages or IO) */
110         do_unaligned_access:
111             retaddr = GETPC();
112 #ifdef ALIGNED_ONLY
113             do_unaligned_access(addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
114 #endif
115             res = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(addr,
116                                                          mmu_idx, retaddr);
117         } else {
118             /* unaligned/aligned access in the same page */
119 #ifdef ALIGNED_ONLY
120             if ((addr & (DATA_SIZE - 1)) != 0) {
121                 retaddr = GETPC();
122                 do_unaligned_access(addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
123             }
124 #endif
125             addend = env->tlb_table[mmu_idx][index].addend;
126             res = glue(glue(ld, USUFFIX), _raw)((uint8_t *)(long)(addr+addend));
127         }
128     } else {
129         /* the page is not in the TLB : fill it */
130         retaddr = GETPC();
131 #ifdef ALIGNED_ONLY
132         if ((addr & (DATA_SIZE - 1)) != 0)
133             do_unaligned_access(addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
134 #endif
135         tlb_fill(addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
136         goto redo;
137     }
138     return res;
139 }

This function first computes the index into the TLB table and compares the guest virtual address with the address stored in that TLB entry (lines 97-100). If the two addresses match, the guest virtual address hits the TLB entry, and qemu then determines whether it is an MMIO address or a RAM address. For an MMIO address, qemu gets the index of the I/O emulation functions from env->iotlb and calls them (lines 101-107). An access that crosses a page boundary takes the slow path (lines 108-116). For a RAM address within one page, qemu adds addend to the guest virtual address to get the host virtual address (lines 117-127). If no TLB entry matches, qemu fetches the entry from the table l1_phys_map and inserts it into the TLB table (line 135), then retries the lookup.
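The decision made on lines 100-128 can be isolated into a small classifier. This is a sketch under assumptions: 4 KiB pages, a 32-bit guest address, bit 3 as the invalid flag (as in the CPUTLBEntry comment), and hypothetical names (classify_load, AccessKind). It leaves out the alignment check that the real code performs on I/O accesses.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS        12
#define PAGE_SIZE        (1u << PAGE_BITS)
#define PAGE_MASK        (~(PAGE_SIZE - 1))
#define TLB_INVALID_MASK (1u << 3)

typedef enum {
    ACCESS_MISS,         /* tag mismatch or invalid entry: fill the TLB */
    ACCESS_IO,           /* low bits of the tag are set: call an I/O handler */
    ACCESS_SPANS_PAGES,  /* RAM, but the access crosses a page boundary */
    ACCESS_RAM           /* RAM within one page: addr + addend */
} AccessKind;

static AccessKind classify_load(uint32_t addr, uint32_t tlb_addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    /* Keeping the invalid bit in the comparison makes an invalid entry
       differ from every page-aligned address, so it can never hit. */
    if ((addr & PAGE_MASK) != (tlb_addr & (PAGE_MASK | TLB_INVALID_MASK)))
        return ACCESS_MISS;

    /* Any other low bit marks a page that must not go directly to RAM. */
    if (tlb_addr & ~PAGE_MASK)
        return ACCESS_IO;

    /* The last byte of the access falls into the next page. */
    if ((addr & ~PAGE_MASK) + size - 1 >= PAGE_SIZE)
        return ACCESS_SPANS_PAGES;

    return ACCESS_RAM;
}
```

Storing the flags in the low bits of the tag means one comparison settles both "is this the right page" and "is this entry valid", which keeps the common RAM path short.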

5. an example

When fetching code from guest memory, the whole code path is as follows:

cpu_exec->tb_find_fast->tb_find_slow->get_phys_addr_code->(if tlb not match)ldub_code(softmmu_header.h)->__ldl_mmu(softmmu_template.h)->tlb_fill->cpu_mips_handle_mmu_fault->tlb_set_page->tlb_set_page_exec


qemu internal part 1: the code path of memory load emulation

Jul 8th, 2009 | Posted by yajin | Filed under emulation, qemu

In qemu, ‘target’ has two different meanings. The first is the emulated machine architecture. For example, when emulating a MIPS machine on x86, the target is MIPS and the host is x86. In tcg (tiny code generator), however, target means the architecture of the generated binary. When emulating MIPS on x86, the tcg target is x86 because tcg generates x86 binary code.

This article is based on qemu version 0.10.5, and the emulated target machine is little-endian MIPS. I will summarize the code path of the MIPS lw instruction emulation in qemu.

The function decode_opc decodes every fetched instruction before tcg generates the target binary.

target-mips/translate.c

7566 static void decode_opc (CPUState *env, DisasContext *ctx)

7960     case OPC_LB ... OPC_LWR: /* Load and stores */
7961     case OPC_SB ... OPC_SW:
7962     case OPC_SWR:
7963     case OPC_LL:
7964     case OPC_SC:
7965          gen_ldst(ctx, op, rt, rs, imm);
7966          break;

It will call function gen_ldst which is also in target-mips/translate.c.

target-mips/translate.c

973 static void gen_ldst (DisasContext *ctx, uint32_t opc, int rt,
974                       int base, int16_t offset)

1046     case OPC_LW:
1047         op_ldst_lw(t0, ctx);
1048         gen_store_gpr(t0, rt);
1049         opn = "lw";
1050         break;

Function op_ldst_lw will generate the target binary which fetches the value from the emulated guest memory and gen_store_gpr will store this value to the emulated cpu’s general register rt.

Function op_ldst_lw is generated by the macro OP_LD.

target-mips/translate.c

901 #define OP_LD(insn,fname)                                        \
902 static inline void op_ldst_##insn(TCGv t0, DisasContext *ctx)    \
903 {                                                                \
904     tcg_gen_qemu_##fname(t0, t0, ctx->mem_idx);                  \
905 }

910 OP_LD(lw,ld32s);
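The token pasting used by OP_LD can be tried in isolation. The sketch below keeps the same macro shape but replaces the tcg calls with plain C loaders so that it can run on its own. Every demo_ name is hypothetical and not part of qemu.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loaders standing in for tcg_gen_qemu_ld8s / tcg_gen_qemu_ld8u. */
static int64_t demo_ld8s(const uint8_t *p) { return (int8_t)p[0]; }  /* sign-extend */
static int64_t demo_ld8u(const uint8_t *p) { return p[0]; }          /* zero-extend */

/* Same shape as OP_LD: `insn` is pasted into the name of the generated
   wrapper and `fname` into the name of the function it calls. */
#define DEMO_OP_LD(insn, fname)                                  \
static inline int64_t demo_op_ldst_##insn(const uint8_t *p)      \
{                                                                \
    return demo_##fname(p);                                      \
}

DEMO_OP_LD(lb, ld8s)    /* defines demo_op_ldst_lb()  */
DEMO_OP_LD(lbu, ld8u)   /* defines demo_op_ldst_lbu() */
```

Each macro invocation expands to a complete function definition, so one line per instruction is enough to produce the whole family of op_ldst_ wrappers.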

So op_ldst_lw is a function that calls tcg_gen_qemu_ld32s, which writes the opcode (INDEX_op_qemu_ld32u) to gen_opc_ptr and its arguments to gen_opparam_ptr.

tcg/tcg-op.h

1793 static inline void tcg_gen_qemu_ld32s(TCGv ret, TCGv addr, int mem_index)
1794 {
1795 #if TARGET_LONG_BITS == 32
1796     tcg_gen_op3i_i32(INDEX_op_qemu_ld32u, ret, addr, mem_index);
1797 #else
1798     tcg_gen_op4i_i32(INDEX_op_qemu_ld32u, TCGV_LOW(ret), TCGV_LOW(addr),
1799                      TCGV_HIGH(addr), mem_index);
1800     tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
1801 #endif
1802 }

99 static inline void tcg_gen_op3i_i32(int opc, TCGv_i32 arg1, TCGv_i32 arg2,
100                                     TCGArg arg3)
101 {
102     *gen_opc_ptr++ = opc;
103     *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
104     *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
105     *gen_opparam_ptr++ = arg3;
106 }
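The two output streams can be modelled with a pair of arrays and cursors. This is only a sketch: the buffer sizes, the opcode numbers and the demo_ names are assumptions for illustration, not qemu's definitions.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_OP_END = 0, DEMO_OP_QEMU_LD32U = 1 };   /* hypothetical opcode numbers */

#define DEMO_MAX_OPS 64

static uint16_t  demo_opc_buf[DEMO_MAX_OPS];        /* stream of opcodes */
static uintptr_t demo_opparam_buf[DEMO_MAX_OPS * 3];/* stream of their arguments */
static uint16_t  *demo_opc_ptr = demo_opc_buf;      /* plays the role of gen_opc_ptr */
static uintptr_t *demo_opparam_ptr = demo_opparam_buf; /* role of gen_opparam_ptr */

/* Append one three-argument op: the opcode goes to one buffer,
   its arguments go to the other, and both cursors advance. */
static void demo_gen_op3(int opc, uintptr_t a1, uintptr_t a2, uintptr_t a3)
{
    assert(demo_opc_ptr < demo_opc_buf + DEMO_MAX_OPS);
    *demo_opc_ptr++ = (uint16_t)opc;
    *demo_opparam_ptr++ = a1;
    *demo_opparam_ptr++ = a2;
    *demo_opparam_ptr++ = a3;
}
```

A later pass can walk the opcode buffer in order and consume the matching number of arguments for each opcode, which is how the back end pairs them up again.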

The path by which tcg generates the target binary code is as follows.

cpu_gen_code->tcg_gen_code->tcg_gen_code_common->tcg_reg_alloc_op->tcg_out_op

tcg/i386/tcg-target.c

856 static inline void tcg_out_op(TCGContext *s, int opc,
857                               const TCGArg *args, const int *const_args)

1041     case INDEX_op_qemu_ld32u:
1042         tcg_out_qemu_ld(s, args, 2);
1043         break;

431 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
432                             int opc)

508 #if TARGET_LONG_BITS == 32
509     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EDX, mem_index);
510 #else
511     tcg_out_mov(s, TCG_REG_EDX, addr_reg2);
512     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_ECX, mem_index);
513 #endif
514     tcg_out8(s, 0xe8);
515     tcg_out32(s, (tcg_target_long)qemu_ld_helpers[s_bits] -
516               (tcg_target_long)s->code_ptr - 4);

In line 514, tcg outputs 0xe8, which is the opcode of the x86 call instruction. The generated code calls one of the functions in the array qemu_ld_helpers. The arguments to these functions are passed in registers EAX, EDX and ECX.
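The operand written in lines 515-516 is a displacement relative to the end of the call instruction, which is why 4 is subtracted after the opcode byte has been emitted. A small sketch of that arithmetic follows, with hypothetical helper names (emit_call, call_target) and the assumption that the target lies within a signed 32-bit distance of the code buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emit "call rel32" at code_ptr and return the address after it. */
static uint8_t *emit_call(uint8_t *code_ptr, uintptr_t target)
{
    *code_ptr++ = 0xe8;                       /* opcode byte */
    /* code_ptr now points at the 4-byte operand; the CPU adds the operand
       to the address of the next instruction, which is code_ptr + 4. */
    int32_t disp = (int32_t)(target - ((uintptr_t)code_ptr + 4));
    memcpy(code_ptr, &disp, sizeof(disp));    /* host byte order, as on x86 */
    return code_ptr + 4;
}

/* Recover the call target from an emitted instruction. */
static uintptr_t call_target(const uint8_t *insn)
{
    int32_t disp;
    assert(insn[0] == 0xe8);
    memcpy(&disp, insn + 1, sizeof(disp));
    return (uintptr_t)(insn + 5) + (uintptr_t)(intptr_t)disp;
}
```

Encoding and then decoding gives back the original target for both forward and backward calls.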

tcg/i386/tcg-target.c

413 static void *qemu_ld_helpers[4] = {
414     __ldb_mmu,
415     __ldw_mmu,
416     __ldl_mmu,
417     __ldq_mmu,
418 };

These functions __ldb_mmu/__ldw_mmu are defined in softmmu_template.h.

softmmu_template.h

DATA_TYPE REGPARM glue(glue(__ld, SUFFIX), MMUSUFFIX)(target_ulong addr,
                                                      int mmu_idx)

In sum, gen_ldst writes the opcode (INDEX_op_qemu_ld32u) to gen_opc_ptr, and tcg_out_op generates the target binary according to that opcode. For the lw instruction, it generates x86 code that calls the functions in softmmu_template.h.

a little progress on qemu-loongson

Jun 10th, 2009 | Posted by yajin | Filed under emulation

Hi guys, it has been about one month since my last blog entry. These days I am really busy preparing for the GRE and TOEFL tests. Moreover, I have to work to support myself, so I can spend less time on qemu-loongson.

Anyway, there has been some progress these days.

  • Rewrote the GPIO I2C emulation for gdium. It is now cleaner than before.
  • Added st4180 rtc emulation to qemu
  • Added stds75 temperature sensor emulation to qemu
  • Changed the uart emulation a little to satisfy pmon’s uart probing process
  • Fixed a small bug in pflash_cfi02.c
  • Fixed a gdb stub bug in qemu to support mips64

The following is the uart output of qemu-loongson.

    kill-bill:/home/root/sdd/gdium/qemu-loongson/mips64el-softmmu# ./qemu-system-mips64el -M gdium -pflash gzrom.bin.gdb -nographic -S -s
    Register sst39vf040  size 80000  at offset 08800000 addr 1fc00000 'pflash0' 80
    devfn 70
    unassigned_mem_readl Unassigned mem read 000000001fbffffc
    unassigned_mem_readl Unassigned mem read 000000001fbffffc
    new_sm502_mm_io 7000000 pci_mem _base 10000000
    PMON2000 MIPS Initializing. Standby...
    PRID=00006302
    enable register space of MEMORY
    DDR2 config begin_whd
    DIMM read
    0000008000000008read DIMM number of rows
    read number of cols
    module data width
    DIMM SIZE=20000000
    cols rows:
    04030940DDR2 config end
    DDR2 DLL locked
    00000004
    disable register space of MEMORY
    jlliu : rom speed reg : 0x00000f8c
    Init SDRAM Done!
    Sizing caches...
    Init caches...
    godson2 caches found
    Init caches done, cfg = 00030932
    Copy PMON to execute location...
    copy text section done.
    Copy PMON to execute location done.
    sp=80ffc000...............new_sm502_mm_io 6000000 pci_mem _base 10000000
    cmd 7
    mmio 6000000
    FREQ
    FREI
    DONE
    DEVI
    ENVI
    MAPV
    nvram=bfc00000
    NVRAM is invalid!
    NVRAM@bfc00000
    STDV
    80100000: heap is already above this point
    SBDD
    P12PCIH

what's the difference between these two definitions

Apr 23rd, 2009 | Posted by yajin | Filed under kernel

I am writing this article because some people were discussing this question on CLF. The question is: what is the difference between the following two definitions?

A. const char temp[]="abc";

B. const char *temp="abc";

You may have your own answer already. But wait a moment, let me write some test cases first and you can see whether your answer is right or not. :)

(1) Test case 1

const char temp[]="abc";
int main()
{
temp[0]='c';
printf("temp %s \n",temp);
}
debian:~# gcc -o test test.c
test.c: In function `main':
test.c:8: error: assignment of read-only location `temp[0]'

(2) Test case 2

const char temp[]="abc";
char temp1[]="def";
int main()
{
temp = temp1;
printf("temp %s \n",temp);
}
debian:~# gcc -o test test.c
test.c: In function `main':
test.c:8: error: assignment of read-only variable `temp'

(3) Test case 3

const char* temp="abc";
char temp1[]="def";
int main()
{
temp = temp1;
printf("temp %s \n",temp);
}
debian:~# gcc -o test test.c
debian:~# ./test
temp def

(4) Test case 4

const char* temp="abc";
int main()
{
temp[0] = 'd';
printf("temp %s \n",temp);
}
debian:~# gcc -o test test.c
test.c: In function `main':
test.c:8: error: assignment of read-only location `*temp'

So definition A makes temp an array of const char: you can neither modify its elements nor assign to temp itself. Definition B makes temp a pointer to a const string: you cannot change the string's contents through temp, but you can change temp itself to point elsewhere.
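The same difference also shows up in code that compiles. The sketch below uses hypothetical names (arr, ptr, other and the three small helpers) and only does what both definitions allow: taking sizes and, for the pointer, reseating it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char arr[] = "abc";    /* definition A: the array holds 'a','b','c','\0' */
static const char *ptr  = "abc";    /* definition B: a pointer to a string literal */
static const char other[] = "def";

/* sizeof sees the whole array for A ... */
static size_t size_of_a(void) { return sizeof arr; }

/* ... but only the pointer itself for B. */
static size_t size_of_b(void) { return sizeof ptr; }

/* B can be pointed at another string; A cannot be assigned to. */
static const char *reseat_b(void)
{
    ptr = other;
    return ptr;
}
```

If the pointer itself should be fixed too, it can be declared as const char *const temp = "abc"; then both the assignment in test case 3 and the write in test case 4 are rejected.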