Quote:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Queuing for hope.
Lottery is a tax on the math-challenged, but I do think it is worth buying once in a while — when I perceive the odds are better.
(The $10 million draw may seem a lot, but it usually ends up $2 to $3 million due to the shared pot. In other words, don't go heavy on this draw.)
I don't buy often, though. My annual gambling budget is S$30 — $20 for 4D and $10 for Toto.
And that means I can't visit the IR! :cry:
PCSX treats the screen as 386 x 262 at 60 Hz, with VSync lasting 22 scan lines. As a result, it programs these values into the counters:
Counter | Rate | |
---|---|---|
Pixel clock | 5 | VSync / (386 * 262) |
Horizontal scan | 1973 | VSync / 262 |
Total vertical scan, NTSC | 564,480 | 33,868,800 / 60 |
Vertical blanking | 47,388 | 22 HSync |
This is pretty straightforward. PCSX uses 517,092 for VSync instead of the
correct 517,080 because it divides dumbly. However, it has a BIAS
value of 2 that makes all the values 2x smaller.
I'm not able to find the PSX's video clock frequency. Should it drive the digital display at 320x240 (with suitable overscan) or the analogue NTSC at 4.5 MHz?
A quick google search shows that NTSC is 286 x 262.5 at 59.94 Hz, with vertical blanking lasting 19.5 scan lines. (This is looking at the NTSC interlaced fields. If we look at frames, then it is 286 x 525 at 29.97 Hz, with 39 scan lines.)
PCSX should program these values instead:
Counter | Rate | |
---|---|---|
Pixel clock | 7.5264075 | VScan / (286 * 262.5) |
Horizontal scan | 2152.5526 | VScan / 262.5 |
Total vertical scan, NTSC | 565,045.05 | 33,868,800 / 59.94 |
Vertical blanking | 41,974.775 | 19.5 HSync |
Fractional bits must be used especially at low multipliers to retain accuracy.
So a fight broke out in a wet market because a woman accused a fishmonger of molesting her and the fishmonger's wife took offence.
How did the shoppers react?
Quote of the year:
I like this picture the best. It shows Singaporeans gawking as one united people regardless of race, language or religion with the SAF citizen-soldier head and shoulders above everyone else.
:clap:
The PSX has 4 root counters (timers):
PCSX keeps a CPU cycle count. It basically assumes 1-cycle per instruction.
When a timer event has occurred, PCSX records the current cycle and the target cycle to trigger the next event. Because the counters count in their own units, PCSX uses a multiplier to convert the target count to cycles.
Typical multipliers:
System clock | 1 |
1/8 system clock | 8 |
Pixel clock | VSync multiplier / (386 * 262) |
Horizontal retrace | VSync multiplier / 262 |
Vertical retrace (NTSC) | 33,868,800 / 60 - 22 HSync |
We can tell that the video screen size is 386x262 including the overscan.
The VSync multiplier is fixed. On NTSC, it is 517,092. It means the VSync will be triggered every so many CPU cycles. It is equal to 240 HSync. The VSync "retrace" lasts for 22 HSync.
When a timer event is triggered, PCSX sets the corresponding counter
interrupt bit at address 0x1f801070 which subsequently triggers
psxException()
in R3000A.c.
PCSX creates a 5th counter for asychronous SPU with a multiplier of 49,152,
meaning it will be called 689 times per second. It is only used if the SPU
plug-in supports SPU_async()
.
When a counter is updated, PCSX will look for the first counter event to be triggered and check against that. This allows it to do just one comparison instead of 4 (or 5). Every cycle counts because timer events are checked on every taken branch (something like 1 in 10-20 instructions).
There are two configurable hacks to make the timers trigger faster.
In theory, everything should work fine. However, PCSX uses a divisor of 2
(the so-called BIAS
) when it converts timer counts to CPU cycles.
This has the effect of triggering the timer events 2 times faster. Strangely,
everything still seems to work fine.
PCSX handles the root counters well, except for the fact that it does not take care to preserve as much precision as possible during conversion of units. The multipliers are also integers, so errors can add up over time.
For example, if PCSX calculates the VSync interval accurately, it should be triggered every 517,080 cycles. It is 12 cycles off every second.
A page entry has the value 0 for invalid memory regions. I would just point these pages to a dummy page:
pg = addr >> 16; pg_base = mem_r_lut[pg]; return pg_base[addr & 0xffff];
There is no special case. It works because we can map the write LUT invalid pages elsewhere, so our dummy read page is untouched.
If we need to detect h/w registers and special memory regions:
pg = addr >> 16; pg_base = mem_r_lut[pg]; if((UINT) pg_base >= 8) return pg_base[addr & 0xffff]; else return (*sp_mem_r_handlers[(UINT) pg_base]) (addr);
A LUT entry can either be a valid page or 0-7 for special memory handlers. (This is no faster than PCSX, but is more generic.)
There are still two problems with this modified code:
I would do it differently if I were to handle the PSX memory map from scratch.
Memory access can make or break an emulator. Emulators must be fast to be useful. If resources are constrained, then fast means being efficient.
Memory access is typically a 1-cycle event on a 32-bit CPU — we can assume a cache hit rate of 0.95. In other words, we have to be as efficient as possible.
We have already seen PSX's huge and non-contiguous 4 GB memory map. Let's see how PCSX handles it.
PCSX first allocates 2 MB for the RAM, 512 kB for the BIOS and 64 kB each for the parallel ports (unused) and the h/w registers.
It then allocates two 64k-entry LUTs (Look-Up Tables) — 256 kB each.
Each entry points to a 64 kB page or is NULL
for an invalid
memory region.
One LUT is used for read access and the other write access. This allows memory regions to be made read-only. However, this is only used for the BIOS.
The LUTs are very sparse: the read LUT only has 394 entries, and the write LUT has 286 entries (8 fewer).
Memory access is straightforward. Use the top 16 bits as the index into the LUT, get the page's base address and then add the bottom 16-bit offset to it. This is basically software-based paging.
pg = addr >> 16; // optional if known/assumed to be memory if(pg == 0x1f80) return read_hw_reg(addr); pg_base = mem_r_lut[pg]; if(pg_base != NULL) return pg_base[addr & 0xffff]; else return 0;
Two conditionals are needed. In many cases, PCSX assumes it must be a memory access, so it skips the first check. That still leaves one check.
The recompiler will fold away the page lookup for constant memory
locations. However, an unscientific profiling shows that only 27% of the
LW
instruction can be folded away.
Write is very similar, but PCSX needs to clear the recompiler's code memory and handle some special memory addresses.
pg = addr >> 16; // optional if known/assumed to be memory if(pg == 0x1f80) write_hw_reg(addr); pg_base = mem_w_lut[pg]; if(pg_base != NULL) { pg_base[addr & 0xffff] = value; invalidate_code(addr); } else { handle_special_mem(addr); }
PCSX's method is acceptable, but it can be made slightly more efficient. A known mem read can be done without any conditional checks. However, the main problem is that a mem write always writes to two memory locations.
From an online forum:
Greece, those clowns racked up $350 billion of external public debt with a population of 11 million?!
Holy calamari Batman!! That's $31,000 per Greek. What *******s!!
Who could be so foolish?
Wait, lemme check OUR numbers... $13 trillion / 300 million = $43,000 per American.
Holy flat screen Batman!!
And that doesn't even take into account the 3 trillion we have parked off balance sheet at the FED in some Enron-on-steroids scheme.
Haircuts are coming. A little at a time, or all at once — but they are coming. Doesn't mean the worlds gotta end, but I don't know if there is enough duct tape around to keep this game going.
Sit back and relax. Do not panic, everything is under control. :-D
At a random point in time in PCSX:
ld | 16.5% |
st | 8.0% |
br | 12.9% |
jal | 0.76% |
lui | 5.9% |
arth | 12.8% |
bit | 6.2% |
shift | 4.1% |
nop | 20.8% |
Others | 11.8% |
Load/store make up almost 25% of the instructions. Fast memory access is vital.
br
, used for conditionals and loops, is very common.
jal
are function calls. Not that common.
lui
is an indication that a 32-bit integer is needed. It is
quite a common occurrence, but MIPS has no direct way to load it.
arth
is add, sub. bit
is and, or, nor.
shift
is shift left and right.
nop
is surprisingly common. It must be due to the load and
branch delay slot.
Basically, a few key instructions make up the bulk of the instruction stream.
I feel the MIPS architecture has two major flaws:
nop
The net effect is that some common operations are effectively 2 cycles and 8 bytes.
However, it is not fair to criticise the MIPS architecture because it dates back to 1981! The processor complexity back then was different. Too bad we still have to live with the flaws 30 years down the road.
Data gathered with an instrumented PCSX. I implemented simple counters that are designed to have the least possible overhead.
I used the interpreter mode as my counting method is not accurate with the recompiler (when its peephole optimizations work).
Test | instr | COP0 | GTE | GPU | CD | SPU | MDEC |
---|---|---|---|---|---|---|---|
FF 6 normal | 8.1M | 1.4k | 0 | 2.5k | 2.8k | 1.0k | 0 |
FF 7 normal | 16.5M | 2.0k | 460k | 9.9k | 0 | 425 | 0 |
FF 8 normal | 13.7M | 2.2k | 271k | 539 | 0 | 131 | 0 |
FF 6 movie | 26.7M | 2.2k | 0 | 6.2k | 6.1k | 2.2k | 47 |
FF 7 movie | 27.9M | 8.6k | 183k | 14.5k | 9.0k | 0 | 49 |
FF 8 movie | 29.1M | 7.9k | 5.2k | 16.2k | 48.4k | 1.8k | 45 |
instr is the number of MIPS instructions per second, including COP0 and GTE (COP2).
GPU, CD, SPU and MDEC counts the number of requests/transactions per second.
The interpreter can emulate around 32M instructions/s on my CULV 1.2 GHz Core2Duo CPU. A drop indicates how heavily the other sub-systems are using the CPU.
COP0 is Memory Management and System Interrupt related. They are not as rare as I thought they would be.
GTE (COP2): complex and slow 3D instructions. Expect a slowdown when they show up.
GPU: will almost always be high, especially for a 2D scene, but it is usually fine because we have a separate GPU/core to take care of it.
CD: non-stop usually indicates streamed music. Random and one-off numbers usually means loading.
SPU: will almost always be present. Low throughput. Complexity is unknown.
MDEC: usually shows up during movies only. Contrary to my expectations, it does not seem to be computationally intensive at all.
When the sub-systems don't seem to be heavily loaded and yet the #instructions is low, it usually means there is some heavy memory transfer. For example, for FF 6 normal, I suspect the game is blitting to the screen on every VSync... my system can only manage 20-25 fps for it.
PCSX is a very simple program. It is written in a very modular fashion: each source file corresponds to one aspect or sub-system of the PSX. The core emulator is implemented in PsxHw.c, PsxInterpreter.c and PsxMem.c.
Once the user starts a game, PCSX executes psxCpu->Execute()
,
an infinite loop, on the main thread and never returns. The way it works is
that the emulator engine calls SysUpdate()
to handle the GUI on
every emulated VSync (60 Hz for NTSC).
PsxMem.c allocates memory for the various PSX memory regions and creates two 64k-entry LUTs (Look-Up Tables) to map the huge non-contiguous PSX memory. It uses one for read access and the other for write access. An entry is either 0 or points to a 64 kB page.
PsxInterpreter.c emulates the MIPS R3000 instructions.
It does not emulate load delay slot. It emulates branch delay slot for load
and co-processor instructions. GTE instructions are handled in gte.c. The core interrupt handling is done in R3000A.c. The tests are done in psxBranchTest()
on every taken branch.
PsxHw.c handles the h/w registers. It maps GPU and SPU registers directly to the plug-in function calls. Their DMAs are handled in psxDma.c.
MDEC is handled in Mdec.c.
CD-ROM requires additional handling in CdRom.c, which interfaces to the CD-ROM plug-in. It also uses Decode_XA.c to decode XA data.
Serial I/O is handled in Sio.c. It handles memory cards internally and interfaces to the Pad plug-in.
PsxCounters.c implements root counters. They are
updated in psxRcntUpdate()
, called by
psxBranchTest()
.
plugins.c loads the plug-ins. The functions are stored in function pointers and used by other files directly.
PsxBios.c contains an emulated BIOS.
Misc.c contains the CD-ROM and EXE file loader, and the core save state functionality, among others.
iR3000A.c is a direct replacement for PsxInterpreter.c. iGte.h replaces gte.c.
ix86.c contains the x86 code emitter functions.
The recompiler allocates its own PSX code memory (2.5 MB) and uses a 64k-entry LUT to map the non-contiguous PSX memory. The code memory points into a 8 MB native instruction store. When the store is filled up (down to the last 64 kB), it is reset and all recompiled instructions are lost.
The recompiler recompiles the instructions standalone up to a branch or 500 instructions, whichever comes first. It does some peephole optimization and constant propagation within a basic block.
The recompiler allows the code memory to be invalidated using
psxCpu->Clear()
. This is called by DMA writes.
psxMemWrite8/16/32()
also zeroes out the code memory directly
to support self-modifying code. However, this is likely not to work due to the
way the recompiled instructions are executed.
I am looking at PCSX, an open source PSX emulator. Its last official release was May 2003. Its team moved on to PCSX2, a PS2 emulator. A few groups forked the project, but they made very few changes to the core emulation.
The forked projects:
I have been thinking how to start. I can think of a few things to do:
There are two better PSX emulators: ePSXe and pSX. Both are closed source.
ePSXe 1.7.0 was last released in May 2008, a full 5 years after its 1.6.0 release in 2003. It uses the same PSEmu Pro based plug-in system as PCSX.
pSX 1.13 was last released in August 2007. It does not use any plug-ins.
Not many people are interested in PSX emulation any more. Its hay day is over.
The Sony Playstation 1 specs, mainly taken from Wiki:
CPU | MIPS R3000A (33.8688 MHz). 30 MIPS. |
Bus | 132 MB/s. |
GTE | Geometry Transformation Engine. 66 MIPS. 1.5M flat shaded or 500k textured polygon/s. |
MDEC | Data Decompression Engine. Decompressing images and video. 80 MIPS. Capable of 320x240 30 fps or 640x480 15 fps. |
GPU | Graphics Processing Unit. 360,000 flat-shaded or 180,000 texture-mapped and light-sourced polygon/s. 1 MB VRAM. |
SPU | Sound Processing Unit. ADPCM source. Max 24 channels. Max sampling rate 44.1 kHz. 512 kB RAM. |
CD-ROM | XA Mode 2, CD-DA, 300 kB/s. |
Memory card | 128 kB. |
Very impressive for a 1994 console.
The PSX can be emulated properly in a present day system (1.5 GHz onwards); it is a matter of efficiency.
We can get by without a powerful graphics card, because the 3D processing is done using the GTE, which is a co-processor to the CPU.
We can recompile instructions from one CPU architecture to another, achieving near native speed. Are we done? No, there is an even harder problem to solve: memory access.
There are three things that make it difficult to emulate a memory map:
An emulator needs to do these:
If the emulator does dynamic recompilation, it also needs to detect any changes to the code memory. This means additional checks.
It is easier to understand if we look at a real system, because memory maps are usually very system specific. Take for example Sony Playstation 1's memory map:
0x0000_0000 | 0x001f_ffff | 2 MB | RAM | |
0x1f00_0000 | 0x1f00_ffff | 64 kB | Parallel port | |
0x1f80_0000 | 0x1f80_03ff | 1 kB | Scratchpad | |
0x1f80_1000 | 0x1f80_2fff | 8 kB | H/W registers | |
0x8000_0000 | 0x801f_ffff | 2 MB | RAM mirror (cached) | |
0xa000_0000 | 0xa01f_ffff | 2 MB | RAM mirror (uncached) | |
0xbfc0_0000 | 0xbfc7_ffff | 512 kB | BIOS |
It is trivial to map a sparse 4 GB memory region in a 64-bit OS (provided it allows multiple mapping to the same underlying page — which we need for the mirrored regions).
We cannot do that on a 32-bit OS; we need to do everything in software. We have to define at least 3 regions, check which region is being accessed and handle it appropriately, so a general purpose memory access will always be at sub-native speed.
The above is the "official" memory map. However, PCSX, an open source PSX emulator, mirrors the first 2 MB memory at 0x0020_0000, 0x0040_000 and 0x0060_0000 too.
It also makes the BIOS read only.
There must be a reason why they are doing these.
My department is shifting to another building. The new cubes will be smaller — 50% smaller.
(My current cube is 48 sq feet, already downsized from its original 64 sq feet. The new cube is a mere 24 sq feet.)
I decided to change the way I work after looking at the mockup cube. I have 3 PCs, 2 LCD monitors and one set of keyboard/mouse. One of the PCs is already headless and I VPN in 99% of the time.
Now, I'll make the secondary PC headless too and use Remote Desktop to connect to it. I can remove one monitor and reclaim some desk space.
I will also be able to place the two headless PCs further away — perhaps not even in my cube!
I left home at 7:45 am. Just before I joined PIE, I saw the EMAS display "Massive Jam on PIE (to Tuas)". The EMAS is usually right and I do not want to be in a MASSIVE jam. I decided to take Farrer road instead.
Bad choice! Farrer road had the worse jam I have ever seen. After being stuck in the jam for a while and still at the peripheral of Toa Payoh, I decided to come back to Toa Payoh for my breakfast. Time: 8:30 am.
I'm going to be late for work.
I hope this is not a new trend. Just last week, I wanted to "detour" to Boon Lay to pack my lunch before I head for work. I left home at 8:15 am. I found a massive jam waiting for me at Eng Neo that supposedly originated at Clementi Ave 6. That's 10 km of jam! I decided to abort my plan and exit at Eng Neo. It still took me until 9:50 am to reach office!
There are a few reasons why I like to go to office early. This is one of them: avoiding jams. Jams are such a huge waste of time. Peak hours always have some sort of jam. However, there is also a good chance of a super-jam, usually due to an accident.
However, I can't seem to wake up as early these days. Some of my colleagues are already remarking that I'm coming to office later and later. (I'm known to reach the office early.)
(Note: the info here is based on an open source MIPS R3000A emulator.)
It is pretty easy to emulate the MIPS branch instructions, even with the branch delay slot. However, we need to handle the special case where (a) the branch is taken and (b) the delay instruction writes to a register used in the first target instruction.
First, we check if the delay instruction matches:
Delay Instr | Cond |
---|---|
COP0 | op == 0x10 |
COP2: MFC2 or CFC2 | op == 0x12 /*COP2*/ && func == 0 && (rs == 0 /*MFC2*/ || rs == 2 /*CFC2*/) |
LB, LH, LWL, LW, LBU, LHU, LWR | op >= 0x20 && op <= 0x26 |
LWC2 | op == 0x32 |
If so, we handle them specially if reg
(the rt
register in the delay instruction) is used in the target instruction:
Target Instr | Cond | Effect |
---|---|---|
SLL (excluding NOP), SRL, SRA | rd == reg && rt == reg | Delay R/W |
rt == reg | Delay R | |
rd == reg | Delay W | |
JR | rs == reg | Delay R |
JALR | rd == reg && rs == reg | Delay R/W |
rs == reg | Delay R | |
rd == reg | Delay W | |
SLLV, SRLV, SRAV ADD, ADDU, SUB, SUBU AND, OR, XOR, NOR SLT, SLTU |
rd == reg && (rt == reg || rs == reg) | Delay R/W |
rt == reg || rs == reg | Delay R | |
rd == reg | Delay W | |
MFHI, MFLO | rd == reg | Delay W |
MTHI, MTLO | rs == reg | Delay R |
MULT, MULTU, DIV, DIVU | rt == reg || rs == reg | Delay R |
BLTZ, BGEZ, BLTZAL, BGEZAL | rs == reg | Delay R |
JAL | reg == 31 | Delay W |
BEQ, BNE | rs == reg || rt == reg | Delay R |
BLEZ, BGTZ | rs == reg | Delay R |
ADDI, ADDIU, SLTI, SLTIU ANDI, ORI, XORI |
rt == reg && rs == reg | Delay R/W |
rs == reg | Delay R | |
rt == reg | Delay W | |
LUI | rt == reg | Delay W |
COP0: MFC0 or CFC0 | rt == reg | Delay W |
COP0: MTC0 or CTC0 | rt == reg | Delay R |
COP2: MFC2, CFC2, MTC2 or CTC2 | rt == reg | Delay R |
LWL, LWR | rt == reg | Delay R/W |
rs == reg | Delay R | |
LB, LH, LW, LBU, LHU | rt == reg && rs == reg | Delay R/W |
rs == reg | Delay R | |
rt == reg | Delay W | |
SB, SH, SWL, SW, SWR | rt == reg || rs == reg | Delay R |
LWC2, SWC2 | rs == reg | Delay R |
Delay R |
The branch delay instruction is executed. The new rd is saved and
restored to its original value.
The first target instruction is executed. The |
Delay W | Normal behaviour |
Delay R/W | The branch delay instruction is skipped |
The MIPS architecture has the branch delay slot feature. It was thought to
be a good idea in the 80s to reduce pipeline stalls. Unfortunately, it was not
as useful as it seemed and it contained nop
most of the time. (It
is quite bad as it is a 4-byte penalty.)
Also, it did not stand up to the test of time and now it is just a pain in the ass, but we are stuck with it for backward compatibility.
How it works:
instr1 beq label1 instr2 instr3 label1: instr4
Branch taken:
instr1 instr2 instr4
Branch not taken:
instr1 instr2 instr3 instr4
Notice that instr2
is always executed. I usually treat the code
like this (for a CPU without branch delay slot):
instr1 instr2 beq label1 instr3 label1: instr4
This is not true if there are dependencies between instr2
and
instr4
, but compilers usually avoid this case.
(Note: the info here is based on an open source MIPS R3000A emulator.)
We need to do dynamic recompilation to get some decent speed. We can simply translate
addi rs rt imm
to
mov eax, vm.regs[rs] add eax, imm mov vm.regs[rt], eax
(MASM syntax; I'm very old school.)
This assumes vm.regs[]
evaulates to an absolute memory
location.
We don't even need the dispatcher If we put this in an execution buffer.
We only need 3 instructions to emulate addi
.
Now we are talking.
(How about vm.pc
and vm.cycles
? We can update
them just once per basic block.)
We can emit 10 different code sequences if we want to:
1 | rt == 0 | (nop) |
2 | rs.state == CONST |
rt.state = CONST; rt.val = rs.val + imm |
3 | rt == rs, imm = 1 | inc vm.regs[rt] |
4 | rt == rs, imm = -1 | dec vm.regs[rt] |
5 | rt == rs, imm = 0 | (nop) |
6 | rt == rs, else | add vm.regs[rt], imm |
7 | imm = 1 |
mov eax, vm.regs[rs] inc eax mov vm.regs[rt], eax |
8 | imm = -1 |
mov eax, vm.regs[rs] dec eax mov vm.regs[rt], eax |
9 | imm = 0 |
mov eax, vm.regs[rs] mov vm.regs[rt], eax |
10 | else |
mov eax, vm.regs[rs] add eax, imm mov vm.regs[rt], eax |
Many of the cases are not really necessary anymore: we can remove the special handling for imm = +/-1. (They save space, but are not any faster and may in fact be slower.)
Note that the operation is folded in some cases. If we want to get
fanciful, we can even track register aliasing (case 9), or even real register
allocation (eax
has the value of vm.regs[rt]
in some
cases).
In general, I prefer a dumb recompiler and a separate flow analyzer that is able to work across emulated instructions.
The dispatcher:
while(vm.run_flag) { vm.code = mem(vm.pc); vm.pc += 4; ++vm.cycle; (*op_tbl[vm.code >> 26]) (); }
The opcode table:
void (*op_tbl[64]) () = { ... instr_addi, /* entry 8 */ ... };
And the instruction:
// Rt = Rs + Im void instr_addi() { int rt = (vm.code >> 16) & 0x1f; if(rt == 0) return; int rs = (vm.code >> 21) & 0x1f; int imm = (S16) vm.code; vm.regs[rt] = vm.regs[rs] + imm; }
This code is extracted from an open source MIPS R3000A emulator.
I estimated 25 real instructions are needed to emulate addi
,
not to mention a ton of memory accesses. The dispatcher itself already needs
around 12 instructions. If the original CPU runs at 33.9 MHz, it takes a 850
MHz CPU to emulate it at full speed. And addi
is one of the
simpler instructions. Other instructions require even more decoding.
Also, this is just the CPU. We need to emulate other parts of the system too. Obviously, we need to do much better than this.
Want to create your own virtual machine?
#define INSTR_HLT 0 #define INSTR_MOV 1 int mem[1024]; int *pc = mem; int run_flag = 1; void exec() { int src, dest; while(run_flag) { switch(*pc++) { case INSTR_MOV: dest = *pc++; src = *pc++; mem[dest] = mem[src]; break; case INSTR_HLT: run_flag = 0; break; } } }
This is of course terribly inefficient, but it'll work. This is just to show what a virtual machine consists of:
Do we need to use a switch
statement? No, the dispatcher can
look like this:
while(run_flag) (*instr_tbl[*pc++]) ();
Such VMs are never efficient. Main reasons:
We'll take a close look at performance in the future.
Now, why do we want to write our own VM?
Simple: to get true write-once-run-everywhere functionality!
Plus, it is a way to add very versatile scripting to our application.
A comparison of the rewards from the big three oil companies.
Assumptions:
Caltex | 1 Thanks! point per litre.
1400 points for $40 petrol voucher. |
589 points ($1000 / (1.93 * 0.88))
Equivalent to $16.80 petrol voucher. |
ExxonMobil | 1 Smiles point for every $1.25 spent.
750 points for $30 petrol voucher. |
800 points ($1000 / 1.25)
Equivalent to $32 petrol voucher. |
Shell | 1 Escape point per litre.
600 points for $20 petrol voucher. |
589 points ($1000 / (1.93 * 0.88))
Equivalent to $19.60 petrol voucher. |
As you can see, ExxonMobil has the best reward based on the current assumptions. The higher the price, the better the reward. However, if the price is very low, say $1/litre, then Shell has the best reward, and Caltex and ExxonMobil are about the same.
Traditionally, Microsoft bans modded XBox 360 every November. Last year, they started on 30 October and lasted a few weeks. The ban was slightly different from previous years.
First, it was on a much larger scale. A news article said 600,000 consoles were banned in North America and another 400,000 worldwide. This is 5% of their Live membership.
Second, the consoles were not just banned from XBox Live, but also access to their own HD! (You still can play games using the DVD drive.) The functionality was much reduced.
Tip: never buy a second hand XBox 360.
In previous years, Microsoft targetted several aspects:
The latest hacked firmware resolved all that and was thought to be undetectable.
Until now. So many consoles were banned that people began to suspect the firmware was detectable after all. It is still unknown how Microsoft did it, but these facts are clear:
(There are very few samples for the last case, so its accuracy is in doubt.)
The detection actually started several months back. It was done even in offline use. Once the console was flagged, it would be banned. Nothing could save it (using the original flash, playing only original games).
The best recourse is not to connect to XBox Live so that the console would not receive the ban signal that would render the HD useless. Keep the console for offline use only. But how to tell if the console is banned or not? Apparently you can call and ask for warranty support. They'll tell you if your console is tampered with or not — tampered consoles have their warranty voided.
This will work until the next "wave" of games that require the latest update to run. The update contains the blacklist, so the marked consoles will fail to work then.
So, can Microsoft detect the firmware?
My guess is yes, Microsoft can detect the firmware reliably. The simplest I can think of is the timing changes. A big change would indicate a modded firmware. Is it sufficient to get you banned? Yes, because a modded firmware is considered tampering.
Originally, I thought Microsoft wanted to be really sure, so the second flag (at least one copied game) must be tripped. But no, just the firmware itself was sufficient: some people were banned just by swapping drives — of different makes with original firmware. Unfortunately for them, this is easily detectable.
How would Microsoft detect copied games?
There were bad game rips that gave the game away, but some people had religiously followed the "stealth" instructions and still got banned. Plus, some people claimed to be banned playing just one game. It is clear: Microsoft is able to detect a single copied game.
My guess: perhaps there is a unique serial no. embedded in the out-of-band data. (DVDs are known to have that.) If more than one is in use at the same time, it is obvious the game is copied. (A non-unique serial no. will work too, just not as well.)
Microsoft wins this round.