My Rambling Thoughts

Tales of Japan first impressions


Folk Tales of Japan

Given that the books were sold by Amazon.jp, I thought they would be shipped by express delivery. :lol: But then, it was 'free shipping'.

The books are smaller than I thought. I thought they were 4 x 6" — close, 4.4 x 7".

Each story is abridged from the original, followed by the author's commentary.

I'll say it is acceptable, but that's all.

Should I have bought all four books? No. This is why I wanted to buy one book to try first. :lol:

S$32 for one book vs S$88 for all four. What's your choice?

What-if redesign of 8087

First, register-based instead of stack-based.

There are 8 FP registers. Each is 48-bit (4-bit type, 12-bit exponent and 32-bit mantissa in two's-complement representation and no implicit bit). The type indicates if it is a normal number, denormal, infinity, NaN, integer or pointer. This is obviously not IEEE 754 format.
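To make the 48-bit layout concrete, here is a minimal Python sketch that packs and unpacks the three fields. Only the field widths come from the design above. The field order (type on top, then exponent, then mantissa), the numeric type codes and the choice of a two's-complement exponent are all assumptions made for illustration.

```python
# Hypothetical type codes: the numbering is an assumption, only the six kinds are given.
TYPE_NORMAL, TYPE_DENORMAL, TYPE_INFINITY, TYPE_NAN, TYPE_INTEGER, TYPE_POINTER = range(6)


def to_signed(value, bits):
    """Interpret an unsigned bit field as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def pack_fp48(type_code, exponent, mantissa):
    """Pack into an assumed layout: [47:44] type, [43:32] exponent, [31:0] mantissa."""
    if not 0 <= type_code < 16:
        raise ValueError("type must fit in 4 bits")
    if not -(1 << 11) <= exponent < (1 << 11):
        raise ValueError("exponent must fit in 12 signed bits")
    if not -(1 << 31) <= mantissa < (1 << 31):
        raise ValueError("mantissa must fit in 32 signed bits")
    return (type_code << 44) | ((exponent & 0xFFF) << 32) | (mantissa & 0xFFFFFFFF)


def unpack_fp48(word):
    """Split a 48-bit word back into (type, exponent, mantissa)."""
    type_code = (word >> 44) & 0xF
    exponent = to_signed((word >> 32) & 0xFFF, 12)
    mantissa = to_signed(word & 0xFFFFFFFF, 32)
    return type_code, exponent, mantissa


word = pack_fp48(TYPE_NORMAL, -3, -0x40000000)
print(hex(word), unpack_fp48(word))
```

The sketch deliberately stops at the fields; how exponent and mantissa combine into a value is not pinned down here.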

Just like the redesigned 8086, it allows memory operands as source and MOV is used to write to memory.

It supports 32-bit single-precision float directly. There is no need to move it to a register first.

; fp src
fadd fp0, fp1       ; 48f + 48f -> 48f
fmov a, fp0         ; 48f -> 32f, default rounding

; mem src
fadd fp0, b         ; 48f + 32f -> 48f
fmov.rz a, fp0      ; 48f -> 32f, truncate

Operations can be rounded to single-precision directly. This allows repeatable calculations in case of register spillage.

; a = b + c + d
fmov fp0, b         ; 32f -> 48f
fadd fp0, c         ; 48f + 32f -> 48f
fadd fp0, d         ; 48f + 32f -> 48f
fmov a, fp0         ; 48f -> 32f, default rounding

; a = b + c + d (single precision)
fmov fp0, b         ; 32f -> 48f
fadd.r fp0, c       ; 48f + 32f -> 48f (default rounding to 32f)
fadd.r fp0, d       ; 48f + 32f -> 48f (default rounding to 32f)
fmov a, fp0         ; 48f -> 32f, default rounding
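The difference between the two sequences can be shown with a small Python sketch. Python's 64-bit float stands in for the 48-bit register here (an assumption; the real widths differ), and `round_to_f32` is a helper I made up to mimic what `.r` would do after each add.

```python
import struct


def round_to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_wide(b, c, d):
    """Keep wide intermediates and round once on the final store."""
    return round_to_f32(b + c + d)


def sum_rounded_each_step(b, c, d):
    """Round to single precision after every add, like fadd.r."""
    acc = round_to_f32(b + c)
    acc = round_to_f32(acc + d)
    return acc


# 2**24 is where the spacing between adjacent 32-bit floats becomes 2.
b, c, d = 16777216.0, 1.0, 1.0
print(sum_wide(b, c, d))               # 16777218.0
print(sum_rounded_each_step(b, c, d))  # 16777216.0
```

Neither answer is wrong. The point is that the second one does not change if an intermediate is spilled to a 32-bit memory slot midway, because every intermediate is already a single-precision value.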

Conversion operations can specify rounding mode. This is useful for float-to-int conversion (truncate towards zero) when the default is round-to-nearest.

fcnvint dx:ax, fp0     ; convert to 32-bit int (implicit .rz)
fcnvint.r dx:ax, fp0   ; convert to 32-bit int w/ default rounding

Rounding modes are: nearest even (.re), zero (.rz), up (.ru), down (.rd). .r uses the default (normally set to .re).
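As a rough illustration of what the four modes mean for float-to-int conversion, here is a Python sketch. The function name mirrors the hypothetical `fcnvint` mnemonic; what happens when the result does not fit in 32 bits is not specified above, so the sketch ignores it.

```python
import math


def fcnvint(x, mode="rz"):
    """Convert a float to an integer using the named rounding mode."""
    if mode == "re":
        return round(x)        # Python's round() rounds halves to even
    if mode == "rz":
        return math.trunc(x)   # towards zero
    if mode == "ru":
        return math.ceil(x)    # towards +infinity
    if mode == "rd":
        return math.floor(x)   # towards -infinity
    raise ValueError(f"unknown rounding mode: {mode}")


for mode in ("re", "rz", "ru", "rd"):
    print(mode, fcnvint(2.5, mode), fcnvint(-2.5, mode))
```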

Only basic arithmetic operations are provided, plus FMULADD (this must be in, no matter what!), FRECIPROCAL and FSQRT.

Special values

Denormals use the full mantissa. The difference from normal numbers is that the top bit is 0. Exponent is set to -2^11.

Infinity uses the top bit of the mantissa as the sign. The rest of the mantissa is not used, but should be set to 0. Exponent is set to 2^11 - 1.

For NaN, exponent and mantissa are not used, but they should be set to 0.

Integer and Pointer types are for "NaN-boxing". Integer allows 44-bit integer to be stored in exponent and mantissa. Pointer allows 44-bit pointer. It is separate so that second-level decoding is not needed.
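A quick Python sketch of the boxing idea: the 4-bit type selects the kind and the remaining 44 bits carry the payload. The type codes and the unsigned payload are assumptions for illustration only.

```python
# Hypothetical type codes; only the existence of the two types is given.
TYPE_INTEGER = 4
TYPE_POINTER = 5

PAYLOAD_BITS = 44  # 12-bit exponent + 32-bit mantissa
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def box(type_code, payload):
    """Store a 44-bit payload under a 4-bit type tag."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 44 bits")
    return (type_code << PAYLOAD_BITS) | payload


def unbox(word):
    """Return (type, payload) from a 48-bit word."""
    return word >> PAYLOAD_BITS, word & PAYLOAD_MASK


def is_pointer(word):
    """One look at the type field is enough; no second-level decoding."""
    return (word >> PAYLOAD_BITS) == TYPE_POINTER


boxed = box(TYPE_POINTER, 0x123456789AB)
print(is_pointer(boxed), hex(unbox(boxed)[1]))
```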

Double precision

The problem with floating-point math is that we cannot use single precision floating-point instructions as building blocks to obtain higher precision floating-point numbers — well, we can, but they are not in double precision format.

We either need to support it natively, or have floating-point emulation friendly instructions such as CMOV (conditional move), CNTLZ (count leading zeros) and SHLD/SHRD (shift left/right double).
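To show why a count-leading-zeros instruction helps emulation, here is a minimal Python sketch of the normalization step a software floating-point routine performs after, say, a subtraction. `int.bit_length()` stands in for CNTLZ, and the 64-bit mantissa width is just an example.

```python
def count_leading_zeros(value, width):
    """Number of zero bits above the highest set bit in a width-bit field."""
    return width - value.bit_length()


def normalize(mantissa, exponent, width=64):
    """Shift the mantissa left until its top bit is set, adjusting the exponent.

    The represented value, mantissa * 2**exponent, is unchanged.
    """
    if not 0 <= mantissa < (1 << width):
        raise ValueError("mantissa must fit in the given width")
    if mantissa == 0:
        return 0, exponent  # zero has no leading one to align
    shift = count_leading_zeros(mantissa, width)
    return mantissa << shift, exponent - shift


print(normalize(0x00F0, 10, width=16))  # (61440, 2), i.e. mantissa 0xF000
```

Without CNTLZ, the shift count has to be found with a loop or a table lookup. The wide shift itself is where SHLD/SHRD come in on a 16-bit machine.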

An eye on the future

Vectorization, different formats (64-bit double precision, various 16-bit half-prec, various 8-bit, Posits).

Revelation on interstellar travel

After listening to many AI Richard Feynman videos on why aliens cannot visit us due to the immense distances involved (where even light speed is slow), it suddenly dawned on me that we are boxing ourselves into 'reality'; that is why it is impossible.

Long story short, we have not found aliens because we are doing it wrong. In our reality, it seems obvious to use radio waves and to use the HI line as an indicator of intelligent life.

But no advanced civilizations do it this way. It is simply too inefficient — primitive, even.

What is the way to communicate — and travel? It is through higher dimensions. If a civilization does not know how to do this, they are not qualified. Such as us.

Once we break through, we may find the Universe to be quite boisterous!

In unrelated news, the third part of FF7 Remake was announced: FF7 Revelation. Many people were sure it was going to be called Reunion.

Book time!


Folk Tales of Japan

I happened upon some skits on YouTube by Kyota Ko and they were entertaining — and educational. He did the skits to encourage people to buy his books. I intended to pick up one book — maybe two — first, but ended up buying all four.


Murdoku: 80 Murder Mystery Logic Puzzles (Vol 1)

While on Amazon.sg, this book was recommended to me. It is a logic puzzle book. It merges Sudoku-style number placement with logic grid mysteries. It piqued my curiosity. Let's buy one and see first. I bought this from Blackwell because it was cheaper there.

(This is the UK edition. The US edition has a purple cover and is out of stock.)


Murdle Murder Mystery Logic Puzzles (Book 1)

Also recommended at the same time, this is a collection of grid-based murder-mystery logic puzzles. Again, let's try one book first.


A Thousand Miles of Wind, the Sky at Dawn: Part 1 (Book 5)

Twelve Kingdoms is being re-translated by Seven Seas Entertainment. I knew about this, but I didn't plan to buy — I already have the Tokyopop edition.

A Thousand Miles of Wind, the Sky at Dawn: Part 1 was just released — Jun 2026! This is the most 'happening' arc. I decided to buy it and try. If it is done well, I'll buy Part 2 as well (Sep 2026). I bought this from Blackwell.

I may be interested to get Shadow of the Moon, Shadow of the Sea, but only Part 2. It covers the second half of the protagonist's journey in the unfamiliar fantasy world.

Brief notes on 8087

The 8087 was designed by a numerical analysis expert and served as working proof for the IEEE 754 floating-point spec. It was revolutionary. It was released in 1980; the spec was ratified in 1985.

Before the 8087, floating-point math had proprietary formats (limiting interoperability), lacked accuracy and consistency (rounding and precision) and was mostly emulated (so ultra-slow).

The 8087 supports IEEE 754 single precision (32-bit) and double precision (64-bit) formats. Internally, it uses a stack-based 8-deep set of 80-bit FP registers. Each FP register has 1 sign bit, 15-bit exponent and 64-bit mantissa (no implicit bit).
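For reference, here is a small Python sketch that decodes the 80-bit register format into a Python float. The function name is mine. Since a Python float is only 64-bit, the result loses the low mantissa bits, and values beyond double range are not handled (`math.ldexp` raises `OverflowError` for them).

```python
import math

EXPONENT_BIAS = 16383  # bias of the 15-bit exponent field


def decode_x87_extended(raw):
    """Decode an 80-bit extended value, given as an integer, to a float."""
    sign = -1.0 if (raw >> 79) & 1 else 1.0
    exponent = (raw >> 64) & 0x7FFF
    mantissa = raw & 0xFFFFFFFFFFFFFFFF  # bit 63 is the explicit integer bit

    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the fraction bits are zero, else NaN.
        if mantissa & 0x7FFFFFFFFFFFFFFF == 0:
            return sign * math.inf
        return math.nan
    if exponent == 0:
        exponent = 1  # zero and denormals use the minimum exponent

    # The mantissa is an integer scaled by 2**63, hence the extra -63.
    return sign * math.ldexp(mantissa, exponent - EXPONENT_BIAS - 63)


print(decode_x87_extended(0x3FFF8000000000000000))  # 1.0
print(decode_x87_extended(0xC000C000000000000000))  # -3.0
```

Note how 1.0 carries its leading 1 in bit 63 of the mantissa. In the 32-bit and 64-bit memory formats that bit is implicit.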

Most 8087 instructions operate on ToS (Top-of-Stack). Programmers were used to explicit operands and were unfamiliar with stack-based operations. It was a struggle to write efficient code.

The biggest issue with the 8087 is its buggy stack architecture. Due to misalignment between the design and hardware teams, the hardware does not automatically spill an overflowed stack to a memory-based virtual stack. It is handled as an exception, which is complex and slow. Software works around it by not overflowing the stack in the first place.

Because of this, it gives unpredictable, inconsistent results depending on whether the calculations are done entirely in 80-bit registers or spilled into 64-bit FP in memory (with less precision) midway, which in turn depends on the compiler and optimization level.

At one point, it was thought the reliance on ST(0) made it impossible to pipeline FP operations — because they all used ST(0). But Pentium proved it was possible to do register renaming with FXCH and achieved pipelined FP operations. It was a breakthrough. From that point, x86 FP became competitive in speed with RISC CPUs.

The second issue is that explicit synchronization is needed using FWAIT. This is needed 99% of the time, so assemblers insert FWAIT automatically before each FP instruction. This is not needed from the 80287 onwards, as the CPU waits for the FPU automatically.

The 8087 runs in parallel with the 8086. It is slow compared to integer operations, e.g. FADD takes 70 – 100 cycles, so it is possible to run many x86 instructions before executing the next FP operation. The question is, how many programs made use of this?

The third is emulation. Unlike the 80286, the 8086 does not raise a Coprocessor Absent exception that would have allowed transparent software emulation. This means the executable does not contain FP instructions directly, but must use emulator-transformed code that calls emulated functions if the 8087 is absent, or is modified to 8087 code if present. This is an excellent use of self-modifying code.

The technique is pretty clever. The compiler emits actual 8087 code and marks each instruction as requiring a fixup (relocation); the linker then transforms them into emulator calls using the fixup (which is an addition) if emulator support is needed.

(This technique does require FWAIT before each FP instruction to patch properly.)
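Here is a worked example of the fixup-as-addition idea, as a rough sketch. It assumes the DOS-era convention, as I recall it from Microsoft- and Borland-style toolchains, that the emulator is reached through INT 34h to INT 3Bh, one vector per ESC opcode D8h to DFh. The helper names are made up for illustration.

```python
FWAIT = 0x9B       # opcode byte placed before each FP instruction
INT_OPCODE = 0xCD  # opcode byte of the INT n instruction


def fixup_for(esc_opcode, int_vector):
    """Value to add to the 16-bit word (FWAIT, ESC) to get (INT, vector)."""
    original = FWAIT | (esc_opcode << 8)      # little-endian word: 9B D8
    target = INT_OPCODE | (int_vector << 8)   # little-endian word: CD 34
    return (target - original) & 0xFFFF


def apply_fixup(code, offset, fixup):
    """Add the fixup to the 16-bit word at offset, as a linker would."""
    word = int.from_bytes(code[offset:offset + 2], "little")
    patched = (word + fixup) & 0xFFFF
    return code[:offset] + patched.to_bytes(2, "little") + code[offset + 2:]


fixup = fixup_for(0xD8, 0x34)
print(hex(fixup))                                            # 0x5c32
print(apply_fixup(bytes([0x9B, 0xD8, 0xC1]), 0, fixup).hex())  # cd34c1
```

Under that assumption, one constant covers all eight ESC opcodes, because the opcode and the vector step together. This also shows why the FWAIT byte is needed: FWAIT plus the ESC byte form the two-byte word that becomes the two-byte INT instruction, while the remaining byte is left for the emulator to decode.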

Side note. 8087 also supports 64-bit integer and 18-digit BCD operations. These are obsolete today.
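For the curious, the 18-digit BCD format is a 10-byte packed layout: nine bytes hold two decimal digits each (least significant pair first) and the top bit of the tenth byte is the sign. A small Python sketch of the encoding, with function names of my own choosing:

```python
def to_packed_bcd(value):
    """Encode an integer as 10-byte x87 packed BCD (18 digits plus sign)."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    digits = abs(value)
    out = bytearray(10)
    for i in range(9):
        low = digits % 10
        digits //= 10
        high = digits % 10
        digits //= 10
        out[i] = (high << 4) | low  # two digits per byte, low pair first
    if value < 0:
        out[9] = 0x80               # sign lives in bit 7 of the last byte
    return bytes(out)


def from_packed_bcd(data):
    """Decode 10-byte packed BCD back to an integer."""
    value = 0
    for byte in reversed(data[:9]):
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -value if data[9] & 0x80 else value


print(to_packed_bcd(1234).hex())  # 34120000000000000000
```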