Magic Voice Speech Module

Bit Format: The Structure of the Frames


The T6721 speech chip in the Magic Voice uses a speech synthesis method called Linear Predictive Coding (LPC). Voiced sounds (for example, "e") are described by 10 parameters (K1-K10, the filter coefficients), while unvoiced sounds (for example, "s") need only four. Besides the number of parameters, speech quality also depends on the resolution, i.e. the number of bits used for each parameter. The bits for amplitude (energy), frequency (pitch) and the K parameters add up to the frame length.

In addition to voiced and unvoiced frames, there are repeat frames. A repeat frame transmits only new values for energy and pitch; the coefficients are treated as unchanged. Frames that merely produce silence or mark the end of speech output are special cases.

The following tables show the structure of these frames.

Frame Bit Allocation

As a first example, here is the frame structure of the TMS 5100 speech chip from Texas Instruments.

49 Bits/Frame

LPC 10 - Speak & Spell (Texas Instruments, TMS 5100, 5200 and 5220)

Fields are listed from MSB (left) to LSB (right).

FRAME          Energy  Repeat Bit  Pitch   K1-K2   K3-K7   K8-K10
Bits           4       1           5       5 each  4 each  3 each
Voiced         xxxx                x xxxx  x xxxx  xxxx    xxx
Unvoiced       xxxx                0 0000  x xxxx  xxxx (K3-K4 only)
Repeat         xxxx                x xxxx
Silent         0000
End of Speech  1111

Energy: Energy = 0: Silence, Energy = 15: End of Speech (Stop Code)
Pitch: Pitch = 0: Unvoiced
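
As a minimal sketch of how the two notes above could be applied, the following Python function classifies a frame from its first three fields. The function name and the order of the checks are illustrative choices. That a set repeat bit marks a repeat frame is an assumption based on the field's name; the notes do not spell it out.

```python
def classify_tms5100_frame(energy: int, repeat_bit: int, pitch: int) -> str:
    """Classify a frame from its 4-bit energy, 1-bit repeat and 5-bit pitch fields."""
    if not (0 <= energy <= 15 and repeat_bit in (0, 1) and 0 <= pitch <= 31):
        raise ValueError("field value out of range")
    if energy == 0:
        return "silent"          # Energy = 0: silence
    if energy == 15:
        return "end of speech"   # Energy = 15: stop code
    if repeat_bit == 1:
        return "repeat"          # assumption: a set repeat bit marks a repeat frame
    if pitch == 0:
        return "unvoiced"        # Pitch = 0: unvoiced
    return "voiced"


# A few hand-made header values, one per frame type.
for fields in [(0, 0, 0), (15, 0, 0), (7, 1, 12), (7, 0, 0), (7, 0, 12)]:
    print(fields, "->", classify_tms5100_frame(*fields))
```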

Here is the frame structure of the Toshiba T6721A, with no guarantee that all of it is correct.

48 Bits/Frame (nonlinear transformation)

LPC 10 - Magic Voice Speech Module (Toshiba, T6721A)

Fields are listed from MSB (left) to LSB (right).

FRAME          Energy  Pitch   K1-K2   K3-K7   K8-K10
Bits           4       5       5 each  4 each  3 each
Voiced         xxxx    x xxxx  x xxxx  xxxx    xxx
Unvoiced       xxxx    0000    x xxxx  xxxx (K3-K4 only)
Repeat
Silent         0001
End of Speech  0000

96 Bits/Frame (linear transformation)

LPC 10 - Magic Voice Speech Module (Toshiba, T6721A)

Fields are listed from MSB (left) to LSB (right).

FRAME          Energy    Pitch     K1-K3         K4-K6      K7-K10
Bits           7         7         10 each       8 each     7 each
Voiced         xxx xxxx  xxx xxxx  xx xxxx xxxx  xxxx xxxx  xxx xxxx
Unvoiced       xxx xxxx  000 0000  xx xxxx xxxx  xxxx xxxx (K4 only)
Repeat         xxx xxxx  xxx xxxx
Silent         000 0001
End of Speech  000 0000

Energy: Energy = 0: End of Speech
Pitch: Pitch = 0: Unvoiced (Bit 6 = 1: Repeat?)
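
The frame length of the 96-bit format can be checked by adding up the field widths. The sketch below uses the widths that appear in the worked ZERO example later in this document (7 bits each for energy and pitch, 10 bits for K1-K3, 8 bits for K4-K6, 7 bits for K7-K10). The names `WIDTHS_96` and `bits_used` are hypothetical, and like the table itself the result comes without guarantee.

```python
# Field widths in bits for the 96-bit T6721A frame format.
WIDTHS_96 = {
    "Energy": 7, "Pitch": 7,
    "K1": 10, "K2": 10, "K3": 10,
    "K4": 8, "K5": 8, "K6": 8,
    "K7": 7, "K8": 7, "K9": 7, "K10": 7,
}


def bits_used(fields):
    """Sum the widths of the named fields."""
    return sum(WIDTHS_96[name] for name in fields)


# A voiced frame carries all 12 fields.
voiced_fields = list(WIDTHS_96)
# An unvoiced frame carries energy, pitch and only the four parameters K1-K4.
unvoiced_fields = ["Energy", "Pitch", "K1", "K2", "K3", "K4"]

print("voiced:", bits_used(voiced_fields), "bits")
print("unvoiced:", bits_used(unvoiced_fields), "bits")
```

The voiced total is the 96 bits that give the format its name. The unvoiced total only counts the fields that carry information in such a frame.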

Speech data from C64 memory to the voice chip

The following explains the form in which speech data is transferred from memory to the T6721 synthesizer. This knowledge matters if you want to go the other way and store self-generated speech data in the C64's memory.

Example: Speech data for the word ZERO

In the C64's memory, the word ZERO is stored as follows (96 bits/frame):

(4A) D1 FE D7 C1 66 10 7D B2 13 31 72 B3 A2 CC C4 1E 99 55 AB D6 3D 27 7C D7 D4 E1 C0 1A 16 75 78 85 25 59 A6 D9 2E B5 4A 76 A9 59 B5 7E AE 75 CB 6C 59 D9 96 E5

This data (52 bytes = 416 bits) is then decompressed and sent nibble by nibble (4 bits at a time) to the speech synthesizer, or rather first to the 4-bit FIFO (CD40105), in the following form.

SAY "ZERO" (29 frames x 96 bits/frame = 2784 bits; padded with "0" bits = 4176 bits)

0x000,
0x003, 0x02B, 0x227, 0x3DE, 0x02B, 0x035, 0x032, 0x027, 0x07D, 0x005, 0x006, 0x006,
0x008, 0x02C, 0x2B4, 0x326, 0x35A, 0x0F8, 0x004, 0x02D, 0x016, 0x00B, 0x00D, 0x004,
0x008, 0x02C, 0x2B4, 0x326, 0x35A, 0x0F8, 0x004, 0x02D, 0x016, 0x00B, 0x00D, 0x004,
0x008, 0x02C, 0x3B7, 0x321, 0x37D, 0x0DC, 0x0F4, 0x003, 0x007, 0x008, 0x013, 0x009,
0x00A, 0x02C, 0x3B7, 0x321, 0x37D, 0x0DC, 0x0F4, 0x003, 0x007, 0x008, 0x013, 0x009,
0x008, 0x02B, 0x3B7, 0x321, 0x37D, 0x0DC, 0x0F4, 0x003, 0x007, 0x008, 0x013, 0x009,
0x00C, 0x02A, 0x353, 0x396, 0x329, 0x0F2, 0x001, 0x02B, 0x002, 0x001, 0x003, 0x003,
0x023, 0x029, 0x353, 0x396, 0x329, 0x0F2, 0x001, 0x02B, 0x002, 0x001, 0x003, 0x003,
0x053, 0x028, 0x34D, 0x004, 0x316, 0x013, 0x01E, 0x04B, 0x07C, 0x002, 0x07A, 0x07D,
0x07E, 0x027, 0x3C9, 0x3BB, 0x2AE, 0x007, 0x030, 0x03E, 0x010, 0x072, 0x002, 0x07B,
0x07E, 0x026, 0x3C9, 0x3BB, 0x2AE, 0x007, 0x030, 0x03E, 0x010, 0x072, 0x002, 0x07B,
0x043, 0x025, 0x27E, 0x3B3, 0x3BF, 0x0F6, 0x015, 0x02A, 0x01A, 0x079, 0x001, 0x000,
0x012, 0x024, 0x254, 0x0EA, 0x013, 0x017, 0x000, 0x0E1, 0x015, 0x003, 0x004, 0x003,
0x00A, 0x025, 0x230, 0x1A2, 0x308, 0x0FD, 0x022, 0x01F, 0x016, 0x001, 0x000, 0x07C,
0x012, 0x025, 0x230, 0x1A2, 0x308, 0x0FD, 0x022, 0x01F, 0x016, 0x001, 0x000, 0x07C,
0x036, 0x026, 0x2D3, 0x123, 0x28F, 0x044, 0x019, 0x0FE, 0x018, 0x071, 0x000, 0x002,
0x066, 0x027, 0x317, 0x0F9, 0x306, 0x003, 0x03A, 0x0FB, 0x01C, 0x074, 0x079, 0x006,
0x07E, 0x028, 0x297, 0x13C, 0x3FB, 0x008, 0x00E, 0x0F7, 0x029, 0x079, 0x07F, 0x00B,
0x066, 0x028, 0x2EB, 0x0A6, 0x3D5, 0x016, 0x010, 0x01C, 0x012, 0x071, 0x004, 0x004,
0x053, 0x028, 0x279, 0x14F, 0x05F, 0x00E, 0x0EF, 0x02B, 0x01A, 0x073, 0x008, 0x000,
0x053, 0x027, 0x279, 0x14F, 0x05F, 0x00E, 0x0EF, 0x02B, 0x01A, 0x073, 0x008, 0x000,
0x036, 0x026, 0x279, 0x14F, 0x05F, 0x00E, 0x0EF, 0x02B, 0x01A, 0x073, 0x008, 0x000,
0x023, 0x025, 0x294, 0x09C, 0x131, 0x004, 0x0DF, 0x013, 0x026, 0x07C, 0x000, 0x004,
0x017, 0x024, 0x294, 0x09C, 0x131, 0x004, 0x0DF, 0x013, 0x026, 0x07C, 0x000, 0x004,
0x00C, 0x022, 0x229, 0x10E, 0x092, 0x0F1, 0x0F9, 0x023, 0x01F, 0x073, 0x000, 0x000,
0x006, 0x020, 0x24B, 0x1C5, 0x025, 0x0CC, 0x002, 0x031, 0x01F, 0x06C, 0x000, 0x005,
0x002, 0x01F, 0x21B, 0x194, 0x018, 0x0DF, 0x000, 0x030, 0x016, 0x079, 0x005, 0x07C,
0x002, 0x01D, 0x229, 0x10E, 0x092, 0x0F1, 0x0F9, 0x023, 0x01F, 0x073, 0x000, 0x000,
0x001, 0x01D, 0x21B, 0x194, 0x018, 0x0DF, 0x000, 0x030, 0x016, 0x079, 0x005, 0x07C,
0x000
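
The listing has a regular shape: a single 0x000, then 29 lines of 12 values each, then a closing 0x000. A minimal sketch of splitting such a stream into frames follows. To keep it short, the sample holds only the first three frames of the listing plus a closing 0x000, so it is a truncated stand-in for the real stream. `split_frames` is a hypothetical helper, not firmware code.

```python
# Leading 0x000, the first three frames of the ZERO listing, closing 0x000.
SAMPLE = [
    0x000,
    0x003, 0x02B, 0x227, 0x3DE, 0x02B, 0x035, 0x032, 0x027, 0x07D, 0x005, 0x006, 0x006,
    0x008, 0x02C, 0x2B4, 0x326, 0x35A, 0x0F8, 0x004, 0x02D, 0x016, 0x00B, 0x00D, 0x004,
    0x008, 0x02C, 0x2B4, 0x326, 0x35A, 0x0F8, 0x004, 0x02D, 0x016, 0x00B, 0x00D, 0x004,
    0x000,
]


def split_frames(stream, params_per_frame=12):
    """Strip the leading and trailing 0x000 and cut the rest into frames."""
    if len(stream) < 2 or stream[0] != 0x000 or stream[-1] != 0x000:
        raise ValueError("stream must start and end with 0x000")
    body = stream[1:-1]
    if len(body) % params_per_frame != 0:
        raise ValueError("stream does not hold a whole number of frames")
    return [body[i:i + params_per_frame]
            for i in range(0, len(body), params_per_frame)]


frames = split_frames(SAMPLE)
for number, frame in enumerate(frames, start=1):
    energy, pitch, *coefficients = frame
    # Compare K1..K10 with the previous frame, if there is one.
    same_k = number > 1 and coefficients == frames[number - 2][2:]
    print(f"frame {number}: energy=0x{energy:03X} pitch=0x{pitch:03X} "
          f"same K1-K10 as previous frame: {same_k}")
```

In the full listing, several consecutive lines share the same K1-K10 values and differ at most in energy and pitch. That is what one would expect if repeat frames are expanded before transmission, although the source does not say so explicitly.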

The firmware first sends 0x000. This stops and resets the synthesizer and, indirectly via the APD output (pin 8), also resets the hardware (CD40105 and LA05-124). The stream ends with 0x000 as well.

Now let's take a closer look, using the first line as an example. We write the nibbles out as the individual bits of each parameter, following the 96 bits/frame format.

0x003, 0x02B, 0x227, 0x3DE, 0x02B, 0x035, 0x032, 0x027, 0x07D, 0x005, 0x006, 0x006

Eng: 7 Bit, Value = 0x003 =       000 0011
Pit: 7 Bit, Value = 0x02B =       010 1011
K01:10 Bit, Value = 0x227 = 10 0010 0111
K02:10 Bit, Value = 0x3DE = 11 1101 1110
K03:10 Bit, Value = 0x02B = 00 0010 1011
K04: 8 Bit, Value = 0x035 =      0011 0101
K05: 8 Bit, Value = 0x032 =      0011 0010
K06: 8 Bit, Value = 0x027 =      0010 0111
K07: 7 Bit, Value = 0x07D =       111 1101
K08: 7 Bit, Value = 0x005 =       000 0101
K09: 7 Bit, Value = 0x006 =       000 0110
K10: 7 Bit, Value = 0x006 =       000 0110
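
The breakdown above can be reproduced mechanically. The following sketch formats each value of the first line with its field width and groups the bits in fours from the right, as in the listing. `FIELDS`, `VALUES` and `to_bits` are names chosen for this illustration only.

```python
# (name, width in bits) for the 12 parameters of a 96-bit frame.
FIELDS = [
    ("Eng", 7), ("Pit", 7),
    ("K01", 10), ("K02", 10), ("K03", 10),
    ("K04", 8), ("K05", 8), ("K06", 8),
    ("K07", 7), ("K08", 7), ("K09", 7), ("K10", 7),
]

# First line of the ZERO listing.
VALUES = [0x003, 0x02B, 0x227, 0x3DE, 0x02B, 0x035,
          0x032, 0x027, 0x07D, 0x005, 0x006, 0x006]


def to_bits(value, width):
    """Render value as a width-bit binary string, grouped in fours from the right."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"0x{value:03X} does not fit in {width} bits")
    raw = format(value, f"0{width}b")
    groups = []
    while raw:
        groups.insert(0, raw[-4:])
        raw = raw[:-4]
    return " ".join(groups)


for (name, width), value in zip(FIELDS, VALUES):
    print(f"{name}: {width:2d} Bit, Value = 0x{value:03X} = {to_bits(value, width):>12}")
```

The range check in `to_bits` also confirms that every value in the first line fits into the width assigned to its field.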

The speech synthesizer has an internal 10-bit register, so three nibbles are always transferred for each parameter. Parameters narrower than 10 bits are therefore padded with leading zeros. In the C64's memory, however, only the necessary number of bits is stored, so the data is held there in correspondingly compressed form.
The 12 parameters (Energy, Pitch, K1 - K10) once more, shown one after another as a bit sequence:

000 0011 / 010 1011 / 10 0010 0111 / 11 1101 1110 / 00 0010 1011 / 0011 0101 / 0011 0010 / 0010 0111 / 111 1101 / 000 0101 / 000 0110 / 000 0110
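
A minimal sketch of the two representations follows: the 12 parameters packed back to back into one 96-bit frame, and the same parameters each expanded to three nibbles for the synthesizer. The helper names are hypothetical, and sending the most significant nibble first is an assumption. The source only says that three nibbles are sent per parameter.

```python
WIDTHS = [7, 7, 10, 10, 10, 8, 8, 8, 7, 7, 7, 7]   # Energy, Pitch, K1..K10
VALUES = [0x003, 0x02B, 0x227, 0x3DE, 0x02B, 0x035,
          0x032, 0x027, 0x07D, 0x005, 0x006, 0x006]


def pack_frame(values, widths):
    """Concatenate the fields, first field in the most significant bits."""
    frame = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit its field")
        frame = (frame << width) | value
    return frame


def to_nibbles(value):
    """Split a value into three 4-bit nibbles (assumed order: most significant first)."""
    return [(value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF]


frame = pack_frame(VALUES, WIDTHS)
packed_bits = format(frame, "096b")                      # 96-character bit string
padded = [nibble for value in VALUES for nibble in to_nibbles(value)]

print(packed_bits)
print(len(packed_bits), "bits packed,", len(padded) * 4, "bits as padded nibbles")
```

Each frame therefore grows from 96 to 144 bits on its way to the synthesizer, purely through the leading zeros.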

How do 416 bits of speech data become 2784 bits of data for the synthesizer? In the program code of the Magic Voice module, you will frequently find a call to the subroutine FETCH_UNPACKED_PARAMETER followed by decoding tables. ...
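
The sizes quoted in this section can be cross-checked with a little arithmetic. The sketch below only recomputes the figures given in the text (52 bytes, 29 frames, 12 parameters per frame, three nibbles per parameter) and says nothing about how the decompression itself works. The constant names are chosen for this illustration.

```python
STORED_BYTES = 52        # ZERO as stored in C64 memory
FRAMES = 29              # frames sent to the synthesizer
PARAMS_PER_FRAME = 12    # Energy, Pitch, K1..K10
BITS_PER_FRAME = 96
NIBBLES_PER_PARAM = 3    # each parameter is sent as three 4-bit nibbles

stored_bits = STORED_BYTES * 8                                     # 416
frame_bits = FRAMES * BITS_PER_FRAME                               # 2784
padded_bits = FRAMES * PARAMS_PER_FRAME * NIBBLES_PER_PARAM * 4    # 4176
nibbles_sent = padded_bits // 4                                    # 1044

print("stored in memory:   ", stored_bits, "bits")
print("frame data:         ", frame_bits, "bits")
print("sent incl. padding: ", padded_bits, "bits =", nibbles_sent, "nibbles")
print("expansion factor:   ", round(frame_bits / stored_bits, 2))
```

The stored form comes to roughly 14 bits per frame on average, so the decoding step does far more than strip padding.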
