Document converted to plain ASCII for inclusion in Wotsit's Format

-----------------------------------------------------
DELUSION DIGITAL MUSIC FILEFORMAT V0.16B1 (25/12/93)
-----------------------------------------------------

Ask us for probz

EXTENSION IS ".DMF"

Header [DDMF]:
--------------
ID              "DDMF"                          4 BYTES
VERSION         current is 04                   1 BYTE
TRACKER NAME    ex. "XTRACKER" ;-)              8 BYTES
SONG NAME       ex. "my first DMF"              30 BYTES
COMPOSER NAME   ex. "COSMIC"                    20 BYTES
DATE            ex. "27 12 93"                  3 BYTES
                Day,Month,Year

InfoHeader [INFO]:
------------------
ID              "INFO"                          4 BYTES
INFO_SIZE       Jump Bytes to next Block        1 LONGINT

Composer Message [CMSG]:
------------------------
ID              "CMSG"                          4 BYTES
MSG_SIZE        Jump Bytes to next Block        1 LONGINT
Filler                                          1 BYTE
Message         array of char                   N BYTES

Sequencer [SEQU]:
-----------------
ID              "SEQU"                          4 BYTES
SEQU_SIZE       Jump Bytes to next Block        1 LONGINT
SEQU LOOP START                                 1 WORD
SEQU LOOP END                                   1 WORD
SEQUENCER                                       (SEQU_SIZE/2 WORDS)-1 !!!

Pattern [PATT]:
---------------
ID              "PATT"                          4 BYTES
PATT_SIZE       Jump Bytes to next Block        1 LONGINT
MAX PATTERN     1-1024                          1 WORD
MAX TRACKS      Tracks required to play         1 BYTE
                this piece of Music =<16

For 1 to MAX PATTERN:
  TRACK ENTRYS  Tracks (max.32)                 1 BYTE
  BEAT          HI|LOW                          1 BYTE
                 |  |
 Ticks per Beat -+  +- Beats per Measure
  MAX TICK      Ticks the Pattern is            1 WORD
                long. (max. 512)
  JMP_SIZE      Bytes to Jump to next           1 LONGINT
                Pattern

  Track Datastream:
  (( ) * MAX Tracks)
  (( ) * MAX Tracks)
  ...

GLOBAL TRACK
-------------
EFFECT          the global effect               1 BYTE
DATA            the data for gl. eff.           1 BYTE
                only stored if EFFECT>0

Separate general track for: Speed, Delays, Beat/Tick Change, Flags,
General Volume

INFO BYTE
----------
XXXXXXXX = Info Byte
||||||||
|||||||x  not used
||||||1   Effect VOLUME stored / 0 not stored
|||||1    Effect NOTE stored / 0 not stored
||||1     Effect INSTRUMENT stored / 0 not stored
|||1      Volume stored / 0 not stored
||1       Note stored / 0 not stored
|1        Instrument stored / 0 not stored
1         Counter to next Info Byte / 0 not stored, next Info Byte in 1 Tick

If a bit in the Info Byte is set, one data byte is stored for that entry;
for effects, 2 data bytes (effect number and effect data) are stored.
Counter is a count in ticks until the next Info Byte. If the Counter bit in
the Info Byte is 0, an Info Byte is present again for the next tick.

Effect groups:
INSTRUMENT: Jump Position, Loop Control, Reverse, Scratch, Filter
NOTE:       Portamento, Tremolo, Vibrato, Arpeggio
VOLUME:     Set, Slide, Tremolo, Vibrato, Arpeggio, Stereo

So at most 3 effects can be triggered at the same time, each in a different
group. The maximum size of a track entry is 11 bytes (Info=0FEh).

Def.: Note
  0       = no change
  1-108   = Note in semitone steps, C0=1 to H8=108 (H is the German name
            for B). This corresponds to a MIDI note with 1 added.
  109-128 = not defined
  129-236 = Set Note Buffer
            The note is stored in the Note Buffer and not played; a note
            that is currently playing is not changed. The MSBit is cleared,
            and the notes then correspond to notes 1-108.
            The Note Buffer is used as a 2nd parameter for note effects,
            e.g. for tone portamentos, where the effect data is already
            taken up by the slide rate.
            The Note Buffer could also be used to bracket out notes, i.e.
            to try out how the piece sounds without that note ;-)
  237-254 = not defined
  255     = Note Off

Def.: Volume
  0       = no change
  1 - 255 = Volume (255=Max Volume, linear Scale)

For 1 to MAX TICKS:
  Global Effect Nr.                             1 BYTE
  (Effect Data)                                 1 BYTE
  For 1 to MAX TRACKS
    PatternEntry:
    -------------
    Info Byte                                   1 BYTE
    (Counter Byte)                              1 BYTE
    (Instrument Byte)                           1 BYTE
    (Note Byte)                                 1 BYTE
    (Volume Byte)                               1 BYTE
    (INSTRUMENT Effect Word)                    2 BYTES
    (NOTE Effect Word)                          2 BYTES
    (VOLUME Effect Word)                        2 BYTES
  END MAX TRACKS
END MAX TICKS
END MAX PATTERN

Instrument [INST]:
-------------------
If this block is not present, the instrument numbers in the pattern point
directly to the samples in the [SMPI] block.

ID              "INST"                          4 BYTES
INSTR_SIZE      Jump Bytes to next Block        1 LONGINT
MAX INSTR       max. 255                        1 BYTE
NAME            the Instrumentname              30 BYTES
INSTR TYPE                                      1 BYTE
  xxxxxxXX = Instrument Type
  xxxxxx00 = Sample from [SMPI] Block
  xxxxxx01 = Midi Device, Midi Keyboard
  xxxxxx10 = FM Instrument ;-)))))))
  xxxxxx11 = Not Defined
  xxxxxXxx = 1 = valid Attack Envelope, 0 = not valid
  xxxxXxxx = 1 = Sustain ON, 0 = Sustain OFF
  XXXXxxxx = not used
RANGE ENTRYS    Number of Range Definition      1 BYTE
                Entrys
For 1 to RANGE ENTRYS
  RANGE DEFINITION
    SMPI NR       Nr. of the sample in the      1 BYTE
                  [SMPI] Block that is played
                  for this range
    RANGE_Length  Number of semitone steps      1 BYTE
                  this entry applies to
  END RANGE DEFINITION
ENVELOP         6 Point Envelope
                not yet exactly defined ;-)

SampleInfo [SMPI]:
------------------
ID              "SMPI"                          4 BYTES
SMPI_SIZE       Jump Bytes to next Block        1 LONGINT
MAX SAMPLES     max. 250                        1 BYTE
For 1 to MAX SAMPLES:
  NAME_LENGTH   length of NameBlock             1 BYTE
  NAME          the samplename                  NAME_LENGTH BYTES
  LENGTH        length of Sample                1 LONGINT
  LOOP_START    start of the loop               1 LONGINT
  LOOP_END      end of the loop                 1 LONGINT
  FREQUENCY     frequency for C-3               1 WORD
  VOLUME        Instrument Volume               1 BYTE
                0 = don't change current Volume
                1 - 255 = Volume (255=Max Volume, linear Scale)
  TYPE          sample type                     1 BYTE
    xxxxxxx0 = not looped
    xxxxxxx1 = looped
    xxxxxx0x = 8BIT
    xxxxxx1x = 16BIT (not yet supported)
    xxxxXXxx = Pack Type
    xxxx00xx = Unpacked signed
    xxxx01xx = Pack Type 0
    xxxx10xx = Pack Type 1
    xxxx11xx = Pack Type 2
    xXXXxxxx = not defined.
    0xxxxxxx = --> stored in dmf
    1xxxxxxx = --> stored in library (bib.)
  FILLER        not defined ;-)                 1 WORD
                should be zero
  CRC32_ID      checksum to identify            1 DWORD
                equal Samples in library (bib.)
END MAX SAMPLES

SampleData [SMPD]:
-------------------
ID              "SMPD"                          4 BYTES
SMPD_SIZE       Jump Bytes to next Block        1 LONGINT
SAMPLELENGTH    Jump Bytes to next Entry        1 LONGINT
SAMPLE DATA     Data of Sample                  SAMPLELENGTH Bytes
                Stream:

End [ENDE]:
ID              "ENDE"

------- END DDMF ------------------------------------------------------

The DMF format is thus divided into the following blocks:

[DDMF]  Format identification. The almighty DELUSION DIGITAL MUSIC FORMAT ;-)
[INFO]  Info, stored only when needed. Definition missing...
[SEQU]  Sequencer, as long as the block
[PATT]  Pattern data. Any number of tracks is possible for each pattern;
        otherwise too much overhead would be stored when, e.g., a pattern
        uses only 4 of 16 tracks.
[SMPI]  Info on the samples.
[SMPD]  Data of the samples. Should be stored after the SMPI block.
[ENDE]  Last block in the file ;-)


SoundFont(R) Technical Specification
Version 2.00a
October 18, 1995

0 About This Document

0.1 Revision History

Revision  Issue Date  Comments
2.00a     10/18/95    First publicly released draft

0.2 Disclaimers

THIS SPECIFICATION IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION, OR SAMPLE. A LICENSE IS HEREBY GRANTED TO COPY, REPRODUCE, AND DISTRIBUTE THIS SPECIFICATION FOR INTERNAL USE ONLY. NO OTHER LICENSE EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY OTHER INTELLECTUAL PROPERTY RIGHTS IS GRANTED OR INTENDED HEREBY. AUTHORS OF THIS SPECIFICATION DISCLAIM ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF PROPRIETARY RIGHTS, RELATING TO IMPLEMENTATION OF INFORMATION IN THIS SPECIFICATION. AUTHORS OF THIS SPECIFICATION ALSO DO NOT WARRANT OR REPRESENT THAT SUCH IMPLEMENTATION(S) WILL NOT INFRINGE ON SUCH RIGHTS.
This preliminary document is being distributed solely for the purpose of review and solicitation of comments. It will be updated periodically. No products should rely on the content of this version of the document.

SoundFont(R) is a registered trademark of E-mu Systems, Inc. E-mu Systems licenses a "SoundFont Compatibility" logo for a nominal fee; please contact E-mu's SoundFont administrator by FAX at (408) 439-0392 for more information. Users of the information contained herein should refer to files conforming to the specification as "SoundFont Compatible," with appropriate acknowledgement of trademark ownership.

0.3 Comments

Please send comments via e-mail to soundfont@emu.com

0.4 Table of Contents

0 About This Document
  0.1 Revision History
  0.2 Disclaimers
  0.3 Comments
  0.4 Table of Contents
  0.5 Illustrations
1 Introduction
  1.1 Scope and Intended Purpose of this Document
  1.2 Document Organization
  1.3 SoundFont 2 Objectives
  1.4 SoundFont 1.x
  1.5 Future Enhancements to the SoundFont 2 Standard
2 Terms and Abbreviations
  2.1 Data Structure Terminology
  2.2 Synthesizer Terminology
  2.3 Parameter Terminology
3 RIFF Structure
  3.1 General RIFF File Structure
  3.2 The SoundFont 2 Chunks and Subchunks
  3.3 Redundancy and Error Handling in the RIFF structure
4 SoundFont 2 RIFF File Format
  4.1 SoundFont 2 RIFF File Format Level 0
  4.2 SoundFont 2 RIFF File Format Level 1
  4.3 SoundFont 2 RIFF File Format Level 2
  4.4 SoundFont 2 RIFF File Format Level 3
  4.5 SoundFont 2 RIFF File Format Type Definitions
5 The INFO-list Chunk
  5.1 The ifil Subchunk
  5.2 The isng Subchunk
  5.3 The INAM Subchunk
  5.4 The irom Subchunk
  5.5 The iver Subchunk
  5.6 The ICRD Subchunk
  5.7 The IENG Subchunk
  5.8 The IPRD Subchunk
  5.9 The ICOP Subchunk
  5.10 The ICMT Subchunk
  5.11 The ISFT Subchunk
6 The sdta-list Chunk
  6.1 Sample Data Format in the smpl Subchunk
  6.2 Sample Data Looping Rules
7 The pdta-list Chunk
  7.1 The HYDRA Data Structure
  7.2 The PHDR Subchunk
  7.3 The PBAG Subchunk
  7.4 The PMOD Subchunk
  7.5 The PGEN Subchunk
  7.6 The INST Subchunk
  7.7 The IBAG Subchunk
  7.8 The IMOD Subchunk
  7.9 The IGEN Subchunk
  7.10 The SHDR Subchunk
8 Enumerators
  8.1 Generator Enumerators
    8.1.1 Kinds of Generator Enumerators
    8.1.2 Generator Enumerators Defined
    8.1.3 Generator Summary
  8.2 Source Enumerators
  8.3 Transform Enumerators
  8.4 Default Modulators
  8.5 Precedence and Absolute and Relative values.
9 Parameters and Synthesis Model
  9.1 Synthesis Model
    9.1.1 Wavetable Oscillator
    9.1.2 Sample Looping
    9.1.3 Lowpass Filter
    9.1.4 Final Gain Amplifier
    9.1.5 Effects Sends
    9.1.6 Low Frequency Oscillators
    9.1.7 Envelope Generators
    9.1.8 Modulation Interconnection Summary
  9.2 MIDI Functions
  9.3 Parameter Units
  9.4 On Implementation Accuracy
10 Error Handling
  10.1 Structural Errors
  10.2 Unknown Chunks
  10.3 Unknown Enumerators
  10.4 Illegal Parameter Values
  10.5 Unusual Values
  10.6 Missing Required Parameter or Terminator
  10.7 Illegal enumerator
11 Silicon SoundFonts
12 Glossary

0.5 Illustrations

Figure 1 - Ideal Filter Response       Section 9.1.3
Figure 2 - Modulation Structure        Section 9.1.8

1 Introduction

1.1 Scope and Intended Purpose of this Document

This document is the definitive source for the SoundFont 2 standard. This document should provide complete and accurate information to allow any user to correctly construct and interpret SoundFont 2 compatible banks. This document is not intended to provide any information on the design or implementation of music synthesizers.

1.2 Document Organization

This document is organized such that sections 1 and 2 give introductory information about the SoundFont 2 standard. Both new and seasoned musical engineers will get value from the review of terminology provided in section 2. Sections 3 through 8 provide increasingly detailed descriptions of the SoundFont 2 standard data structures. The sections will ultimately serve as reference, but can be scanned in order to provide sufficient detail for any level of understanding.
Section 9 deals with the Synthesis model supported by the SoundFont standard, and will be of interest to anyone involved with the synthesis engine or bank creation. Section 10 specifies error handling when dealing with SoundFont compatible banks, and will be of interest primarily to programmers using the SoundFont standard. The alphabetical glossary in section 12 can be used as a reference for any unfamiliar or confusing terminology.

1.3 SoundFont 2 Objectives

The SoundFont 2 standard is intended to provide an extensible, portable, universal interchange format for wavetable synthesizer "samples" and articulation data. The standard is made extensible largely by the use of enumerated "generators" and "modulators" so that additional function units can be added as requirements dictate. The standard is made portable and universal by the use of precisely defined and hardware independent parameters, as well as by specific practices designed to provide support to a broad range of technologies.

1.4 SoundFont 1.x

The SoundFont standard was originally released in its 1.0 embodiment with the Creative Technology AWE32 product using the EMU8000 music synthesis chip. This proprietary format proved very successful, but experience brought a number of refinements. These initially were performed in an upward compatible manner to revision 1.5. However, due to increasing demand for a public downloadable sound interchange format, Creative Technology determined that a public disclosure of the SoundFont format would be in its best interest. Because there were still more improvements required, many of which could not be supported in a completely compatible manner, Creative decided to combine public disclosure with the step to a revised format. The result is the SoundFont 2 standard.

There are several key enhancements contained in the SoundFont 2 standard. The first is the use of relative parameters in the Preset level.
This allows instruments to be adjusted without altering their self-consistency, providing easy and effective user editing of instruments. The second is an improvement in the data structures associated with the samples themselves, again providing key information which will allow the sound designer to re-use samples with a minimum of difficulty. An increased specificity in the rules for sample data produces enhanced portability across various sound engines. Finally, the addition of modulators produces a robust structure which can express all the typical functions of current and future wavetable synthesizers.

1.5 Future Enhancements to the SoundFont 2 Standard

The SoundFont 2 standard is designed to allow for enhancements based on future wavetable synthesis technology capabilities by additional enumerations of generators and modulators. This will be done as required in an upwardly compatible manner. Suggestions for additions can be made via e-mail to soundfont@emu.com. In general, our policy for updating the specification will be based on consumer need, rather than technological idealism. It is our expectation to maintain bidirectional compatibility within the SoundFont 2 standard for some years.

2 Terms and Abbreviations

The following sections introduce terms used within this specification in a logical order. They are provided both as an introduction to readers unfamiliar with wavetable synthesis implementation details, as well as a review and reference for the expert. These and other terms and abbreviations can also be found arranged alphabetically for reference in the glossary at the end of this specification.

2.1 Data Structure Terminology

bag - A SoundFont data structure element containing a list of layers (preset bag) or splits (instrument bag).

big endian - Refers to the organization in memory of bytes within a word such that the most significant byte occurs at the lowest address. Contrast "little endian."

byte - A data structure element of eight bits without definition of meaning to those bits.

BYTE - A data structure element of eight bits which contains an unsigned value from 0 to 255.

case-insensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as identical. Contrast "case-sensitive."

case-sensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as distinct. Contrast "case-insensitive."

CHAR - A data structure of eight bits which contains a signed value from -128 to +127.

chunk - The top level division of a RIFF file.

doubleword - A data structure element of 32 bits without definition of meaning to those bits.

DWORD - A data structure of 32 bits which contains an unsigned value from zero to 4,294,967,295.

enumerated - Said of a data element whose symbols correspond to particular assigned functions.

global - Refers to parameters which affect all associated structures. See "global layer" and "global split."

global layer - A layer whose generators and modulators affect all other layers within the preset.

global split - A split whose generators and modulators affect all other splits within the instrument.

header - A data structure element which describes several aspects of a SoundFont element.

hydra - A. A nine-headed mythical beast. B. The nine "pdta" subchunks which make up the SoundFont articulation data.

instrument - In the SoundFont standard, a collection of splits which represents the sound of a single musical instrument or sound effect set.

instrument split - A sample and associated articulation data defined to play over certain key numbers and velocities. Also simply called a split.

layer - A subset of a preset containing generators, modulators, and an instrument. Also termed "preset layer."

level - In the SoundFont structure, this refers either to the preset and layers (the preset level) or the instrument and splits (the instrument level).

little endian - A method of ordering bytes within larger words in memory in which the least significant byte is at the lowest address. Contrast "big endian."

orphan - Said of a data structure which under normal circumstances is referenced by a higher level, but in this particular instance is no longer linked. Specifically, it is an instrument which is not referenced by any preset layer, or a sample which is not referenced by any instrument split.

preset - A keyboard full of sound. Typically the collection of samples and articulation data associated with a particular MIDI preset number.

preset layer - A subset of a preset containing generators, modulators, and an instrument. Also simply termed a "layer."

record - A single instance of a data structure.

RIFF - Acronym for Resource Interchange File Format. The recommended form for interchange files such as SoundFont compatible files within Microsoft operating systems.

SHORT - A data structure element of sixteen bits which contains a signed value from -32,768 to +32,767.

split - A sample and associated articulation data defined to play over certain key numbers and velocities. Also called an instrument split.

subchunk - A division of a RIFF file below that of the chunk.

terminator - A data structure element indicating the final element in a sequence.

WORD - A data structure of 16 bits which contains an unsigned value from zero to 65,535.

word - A data structure element of 16 bits without definition of meaning to those bits.

2.2 Synthesizer Terminology

articulation - The process of modulation of amplitude, pitch, and timbre to produce an expressive musical note.

artifact - A (typically undesirable) sonic event which is recognizable as not being present in the original sound.

attack - That phase of an envelope or sound during which the amplitude increases from zero to a peak value.

attenuation - A decrease in volume or amplitude of a signal.

AWE32 - The original Creative Technology Sound Blaster product which contained an EMU8000 wavetable synthesizer and supported the SoundFont standard.

balance - A form of stereo volume control in which both left and right channels are at maximum when the control is centered, and which attenuates only the opposite channel when taken to either extreme.

bank - A collection of presets. See also MIDI bank.

chorus - An effects processing algorithm which involves cyclically shifting the pitch of a signal and remixing it with itself to produce a time varying comb filter, giving a perception of motion and fullness to the resulting sound.

cutoff frequency - The frequency of a filter function at which the attenuation reaches a specified value.

data points - The individual values comprising a sample. Sometimes also called sample points. Contrast "sample."

decay - The portion of an envelope or sound during which the amplitude declines from a peak to steady state value.

delay - The portion of an envelope or LFO function which elapses from a key-on event until the amplitude becomes non-zero.

DC gain - The degree of amplification or attenuation a system presents to a static or zero frequency signal.

digital audio - Audio represented as a sequence of quantized values spaced evenly over time. The values are called "sample data points."

downloadable - Said of samples which are loaded from a file into RAM, in contrast to samples which are maintained in ROM.

dry - Refers to audio which has not received any effects processing such as reverb or chorus.

EMU8000 - A wavetable synthesizer chip designed by E-mu Systems for use in Creative Technology products.

envelope - A time varying signal which typically controls the pitch, volume, and/or filter cutoff frequency of a note, and comprises multiple phases including attack, decay, sustain, and release.

flat - A. Said of a tone that is lower in pitch than another reference tone. B. Said of a frequency response that does not deviate significantly from a single fixed gain over the audio range.

interpolator - A circuit or algorithm which computes intermediate points between existing sample data points. This is of particular use in the pitch shifting operation of a wavetable synthesizer, in which these intermediate points represent the output samples of the waveform at the desired pitch transposition.

key number - See MIDI key number.

LFO - Acronym for Low Frequency Oscillator. A slow periodic modulation source.

linear coding - The most common method of encoding amplitudes in digital audio in which each step is of equal size.

loop - In wavetable synthesis, a portion of a sample which is repeated many times to increase the duration of the resulting sound.

loop points - The sample data points at which a loop begins and ends.

lowpass - Said of a filter which attenuates high frequencies but does not attenuate low frequencies.

MIDI - Acronym for Musical Instrument Digital Interface. The standard protocol for sending performance information to a musical synthesizer.

MIDI bank - A group of up to 128 presets selected by a MIDI "change bank" command.

MIDI continuous controller - A construct in the MIDI protocol.

MIDI key number - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies the key of the musical instrument keyboard to which the command refers.

MIDI pitch bend - A special MIDI construct akin to the MIDI continuous controllers which controls the realtime value of the pitch of all notes played in a MIDI channel.

MIDI preset - A "preset" selected to be active in a particular MIDI channel by a MIDI "change preset" command.

MIDI velocity - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies the speed with which the key was pressed or released.

mono - Short for "monophonic." Indicates a sound comprising only one channel or waveform. Contrast with "stereo."

oscillator - In wavetable synthesis, the wavetable interpolator is considered an oscillator.

pan - Short for "panorama." This is the control of the apparent azimuth of a sound source over 180 degrees from left to right. It is generally implemented by varying the volume at the left and right speakers.

pitch - The perceived value of frequency. Generally can be used interchangeably with frequency.

pitch shift - A change in pitch. Wavetable synthesis relies on interpolators to cause pitch shift in a sample to produce the notes of the scale.

pole - A mathematical term used in filter transform analysis. Traditionally in synthesis, a pole is equated with a rolloff of 6dB per octave, and the rolloff of a filter is specified in "poles."

Preditor - E-mu Systems' proprietary SoundFont 2.00 compatible bank editing software.

preset - A keyboard full of sound. Typically the collection of samples and articulation data associated with a particular MIDI preset number.

Q - A mathematical term used in filter transform analysis. Indicates the degree of resonance of the filter. In synthesis terminology, it is synonymous with resonance.

release - The portion of an envelope or sound during which the amplitude declines from a steady state to zero value or inaudibility.

resonance - Describes the aspect of a filter in which particular frequencies are given significantly more gain than others. The resonance can be measured in dB above the DC gain.

resonant frequency - The frequency at which resonance reaches its maximum.

reverb - Short for reverberation. In synthesis, a synthetic signal processor which adds artificial spaciousness and ambience to a sound.

sample - This term is often used both to indicate a "sample data point" and to indicate a collection of such points comprising a digital audio waveform. The latter meaning is exclusively used in this specification.

soft - The pedal on a piano, so named because it causes the damper to be lowered in such a way as to soften the timbre and loudness of the notes. In MIDI, continuous controller #67 which behaves in a similar manner.

sostenuto - The pedal on a piano which causes the dampers on all keys depressed to be held until the pedal is released. In MIDI, continuous controller #66 which behaves in a similar manner.

sustain - The pedal on a piano which prevents all dampers on keys as they are depressed from being released. In MIDI, continuous controller #64 which behaves in a similar manner.

SoundFont - A registered trademark of E-mu Systems, Inc, indicating files produced by E-mu which conform to the SoundFont standard file format.

stereo - Literally indicating three dimensions. In this specification, the term is used to mean two channel stereophonic, indicating that the sound is composed of two independent audio channels, dubbed left and right. Contrast monophonic.

synthesis engine - The hardware and software associated with the signal processing and modulation path for a particular synthesizer.

synthesizer - A device capable of producing ideally arbitrary musical sound.

tremolo - A periodic change in amplitude of a sound, typically produced by applying a low frequency oscillator to the final volume amplifier.

triangular - A waveform which ramps upward to a positive limit, then downward at the opposite slope to the symmetrically negative limit periodically.

unpitched - Said of a sound which is not characterized by a perceived frequency. This would be true of noise-like musical instruments and of many sound effects.

velocity - In synthesis, the speed with which a keyboard key is depressed, typically proportional to the impact delivered by the musician. See also MIDI velocity.

vibrato - A periodic change in the pitch of a sound, typically produced by applying a low frequency oscillator to the oscillator pitch.

volume - The loudness or amplitude of a sound, or the control of this parameter.

wavetable - A music synthesis technique wherein musical sounds are recorded or computed mathematically and stored in a memory, then played back at a variable rate to produce the desired pitch. Additional timbral adjustments are often made to the sound thus produced using amplifiers, filters, and effect processing such as reverb and chorus.

2.3 Parameter Terminology

absolute - Describes a parameter which gives a definitive real-world value. Contrast to relative.

additive - Describes a parameter which is to be numerically added to another parameter.

attenuation - A decrease in volume or amplitude of a signal.

cent - A unit of pitch ratio corresponding to the twelve hundredth root of two, or one hundredth of a semitone, approximately 1.000577790.

centibel - A unit of amplitude ratio corresponding to the two hundredth root of ten, or one tenth of a decibel, approximately 1.011579454.

cutoff frequency - The frequency of a filter function at which the attenuation reaches a specified value.

decibel - A unit of amplitude ratio corresponding to the twentieth root of ten, approximately 1.122018454.

octave - A factor of two in ratio, typically applied to pitch or frequency.

pitch - The perceived value of frequency. Generally can be used interchangeably with frequency.

pitch shift - A change in pitch. Wavetable synthesis relies on interpolators to cause pitch shift in a sample to produce the notes of the scale.

relative - Describes a parameter which merely indicates an offset from an otherwise established value. Contrast to absolute.

resonance - Describes the aspect of a filter in which particular frequencies are given significantly more gain than others. The resonance can be measured in dB above the DC gain.

sample rate - The frequency, in Hertz, at which sample data points are taken when recording a sample.

semitone - A unit of pitch ratio corresponding to the twelfth root of two, or one twelfth of an octave, approximately 1.059463094.

sharp - Said of a tone that is higher in pitch than another reference tone.

timecent - A unit of duration ratio corresponding to the twelve hundredth root of two, or one twelve hundredth of an octave, approximately 1.000577790.

3 RIFF Structure

3.1 General RIFF File Structure

The RIFF (Resource Interchange File Format) is a tagged file structure developed for multimedia resource files, and is described in some detail in the Microsoft Windows 3.1 SDK Multimedia Programmer's Reference. The tagged-file structure is useful because it helps prevent compatibility problems which can occur as the file definition changes over time. Because each piece of data in the file is identified by a standard header, an application that does not recognize a given data element can skip over the unknown information.

A RIFF file is constructed from a basic building block called a "chunk." In 'C' syntax, a chunk is defined:

typedef DWORD FOURCC;      // Four-character code

typedef struct {
    FOURCC ckID;           // A chunk ID identifies the type of data within the chunk.
    DWORD  ckSize;         // The size of the chunk data in bytes, excluding any pad byte.
    BYTE   ckDATA[ckSize]; // The actual data plus a pad byte if req'd to word align.
};

Two types of chunks, the "RIFF" and "LIST" chunks, may contain nested chunks called subchunks as their data.

The ordering requirements of chunks and subchunks within a RIFF file are not well documented in the RIFF file format. In SoundFont 2.0, the order of the subchunks within the INFO chunk is arbitrary, but for consistency it is recommended that the subchunks be ordered as presented in this document. The order of all other chunks and subchunks is strictly defined and must be maintained as presented in this document.
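
The chunk layout described in section 3.1 can be walked with a few lines of C. The following is only a minimal sketch, not part of the specification: the names RiffChunk, read_le32 and riff_next_chunk are hypothetical, the file is assumed to be fully loaded into memory, and the size field is read as little endian, which is how RIFF stores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one chunk; only ckID and ckSize
 * are names taken from the chunk definition in the text. */
typedef struct {
    char           ckID[4]; /* four-character code, not NUL terminated */
    uint32_t       ckSize;  /* size of the data, excluding any pad byte */
    const uint8_t *data;    /* points into the caller's buffer */
} RiffChunk;

/* Read a little-endian DWORD independent of the host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the chunk that starts at buf[*offset]. On success, fill *out,
 * advance *offset past the data and its pad byte (if any) and return 1.
 * Return 0 if the header or the data would run past the end of the buffer.
 */
static int riff_next_chunk(const uint8_t *buf, size_t len, size_t *offset,
                           RiffChunk *out)
{
    assert(buf != NULL && offset != NULL && out != NULL);
    size_t pos = *offset;
    if (pos > len || len - pos < 8)
        return 0;                      /* no room for ckID + ckSize */
    memcpy(out->ckID, buf + pos, 4);
    out->ckSize = read_le32(buf + pos + 4);
    pos += 8;
    if (out->ckSize > len - pos)
        return 0;                      /* data truncated */
    out->data = buf + pos;
    pos += out->ckSize;
    if ((out->ckSize & 1u) && pos < len)
        pos++;                         /* skip the word-alignment pad byte */
    *offset = pos;
    return 1;
}
```

Calling riff_next_chunk repeatedly with the same offset variable steps through sibling chunks; a reader that does not recognize a ckID can simply ignore the returned chunk and continue, which is the skipping behaviour the text describes.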

3.2 The SoundFont 2 Chunks and Subchunks

A SoundFont 2 compatible RIFF file comprises three chunks: an INFO-list chunk containing a number of required and optional subchunks describing the file, its history, and its intended use, an sdta-list chunk comprising a single subchunk containing any referenced digital audio samples, and a pdta-list chunk containing nine subchunks which define the articulation of the digital audio data.

The SoundFont 2 standard allows that the subchunks within the INFO-list chunk may appear in arbitrary order. However, the order of the three chunks, and the order of the subchunks within the pdta-list chunk, is fixed.

The SoundFont 2 specification requires that implementations ignore unknown subchunks within the INFO-list chunk. Note, however, that until such subchunks become defined in the specification, inclusion of additional INFO-list subchunks will preclude the file from conforming to the SoundFont standard.

A detailed description of the SoundFont 2 RIFF structure is provided in Section 4.

3.3 Redundancy and Error Handling in the RIFF structure

The RIFF file structure contains redundant information regarding the length of the file and the length of the chunks and subchunks. This fact enables any reader of a SoundFont compatible file to determine if the file has been damaged by loss of data. If any such loss is detected, the SoundFont compatible file is termed "structurally unsound" and in general should be rejected. SoundFont compatible software developers may produce utilities to recover data from structurally unsound files, producing with or without user assistance a corrected and structurally sound SoundFont 2 compatible file.
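
The redundancy described in section 3.3 can be exploited by comparing the sizes stored in the headers against the actual file length. The sketch below is one possible top-level check under stated assumptions, not the specification's own procedure: the function names are hypothetical, the whole file is assumed to be in memory, the form type 'sfbk' and the LIST types INFO, sdta and pdta are taken from the file format described in section 4, and trailing bytes after the RIFF chunk are treated as an error. Subchunks inside each LIST are not examined here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian DWORD independent of the host byte order. */
static uint32_t sf_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 1 if the top-level sizes agree with the file length and the
 * three LIST chunks appear in the fixed order INFO, sdta, pdta.
 * Return 0 if the file looks structurally unsound.
 */
static int sf_is_structurally_sound(const uint8_t *file, size_t len)
{
    static const char *const lists[3] = { "INFO", "sdta", "pdta" };
    assert(file != NULL);

    if (len < 12 || memcmp(file, "RIFF", 4) != 0 ||
        memcmp(file + 8, "sfbk", 4) != 0)
        return 0;
    /* The RIFF size counts everything after the 8-byte chunk header. */
    if ((size_t)sf_read_le32(file + 4) != len - 8)
        return 0;

    size_t pos = 12;
    for (int i = 0; i < 3; i++) {
        if (len - pos < 12 || memcmp(file + pos, "LIST", 4) != 0 ||
            memcmp(file + pos + 8, lists[i], 4) != 0)
            return 0;
        uint32_t size = sf_read_le32(file + pos + 4);
        /* A LIST holds at least its 4-byte type and must fit in the file. */
        if (size < 4 || size > len - pos - 8)
            return 0;
        pos += 8 + (size_t)size + (size & 1u); /* include any pad byte */
        if (pos > len)
            return 0;
    }
    /* The three LIST chunks must account for the whole RIFF payload. */
    return pos == len;
}
```

A loader might call this before parsing anything else and reject the file on a 0 result; a recovery utility of the kind mentioned above would instead use the point of disagreement as a hint for where data was lost.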
4 SoundFont 2 RIFF File Format

4.1 SoundFont 2 RIFF File Format Level 0

-> RIFF ('sfbk' ; RIFF form header
    {
     ; Supplemental Information
     ; The Sample Binary Data
     ; The Preset, Instrument, and Sample Header data
    }
    )

4.2 SoundFont 2 RIFF File Format Level 1

-> LIST ('INFO'
    {
       ; Refers to the version of the Sound Font RIFF file
       ; Refers to the target Sound Engine
       ; Refers to the Sound Font Bank Name
    [] ; Refers to the Sound ROM Name
    [] ; Refers to the Sound ROM Version
    [] ; Refers to the Date of Creation of the Bank
    [] ; Sound Designers and Engineers for the Bank
    [] ; Product for which the Bank was intended
    [] ; Contains any Copyright message
    [] ; Contains any Comments on the Bank
    [] ; The SoundFont tools used to create and alter the bank
    }
    )

-> LIST ('sdta' { [

-> LIST ('pdta'
    {
     ; The Preset Headers
     ; The Preset Index list
     ; The Preset Modulator list
     ; The Preset Generator list
     ; The Instrument Names and Indices
     ; The Instrument Index list
     ; The Instrument Modulator list
     ; The Instrument Generator list
     ; The Sample Headers
    }
    )

4.3 SoundFont 2 RIFF File Format Level 2

-> ifil() ; e.g. 2.00
-> isng(szSoundEngine:ZSTR) ; e.g. "EMU8000"
-> irom(szROM:ZSTR) ; e.g. "1MGM"
-> iver() ; e.g. 2.08
-> INAM(szName:ZSTR) ; e.g. "General MIDI"
-> ICRD(szDate:ZSTR) ; e.g. "July 15, 1995"
-> IENG(szName:ZSTR) ; e.g. "John Q. Engineer"
-> IPRD(szProduct:ZSTR) ; e.g. "SBAWE32"
-> ICOP(szCopyright:ZSTR) ; e.g. "Copyright (c) 1995 E-mu Systems, Inc."
-> ICMT(szComment:ZSTR) ; e.g. "This is a comment"
-> ISFT(szTools:ZSTR) ; e.g. "Preditor 2.00a:Preditor 2.00a"
-> smpl() ; 16 bit Linearly Coded Digital Audio Data
-> phdr()
-> pbag()
-> pmod()
-> pgen()
-> inst()
-> ibag()
-> imod()
-> igen()
-> shdr()

4.4 SoundFont 2 RIFF File Format Level 3

-> struct sfVersionTag {
       WORD wMajor;
       WORD wMinor;
   };

-> struct sfPresetHeader {
       CHAR  achPresetName[20];
       WORD  wPreset;
       WORD  wBank;
       WORD  wPresetBagNdx;
       DWORD dwLibrary;
       DWORD dwGenre;
       DWORD dwMorphology;
   };

-> struct sfPresetBag {
       WORD wGenNdx;
       WORD wModNdx;
   };

-> struct sfModList {
       SFModulator sfModSrcOper;
       SFGenerator sfModDestOper;
       SHORT       modAmount;
       SFModulator sfModAmtSrcOper;
       SFTransform sfModTransOper;
   };

-> struct sfGenList {
       SFGenerator   sfGenOper;
       genAmountType genAmount;
   };

-> struct sfInst {
       CHAR achInstName[20];
       WORD wInstBagNdx;
   };

-> struct sfInstBag {
       WORD wInstGenNdx;
       WORD wInstModNdx;
   };

-> struct sfInstModList {
       SFModulator sfModSrcOper;
       SFGenerator sfModDestOper;
       SHORT       modAmount;
       SFModulator sfModAmtSrcOper;
       SFTransform sfModTransOper;
   };

-> struct sfInstGenList {
       SFGenerator   sfGenOper;
       genAmountType genAmount;
   };

-> struct sfSample {
       CHAR         achSampleName[20];
       DWORD        dwStart;
       DWORD        dwEnd;
       DWORD        dwStartloop;
       DWORD        dwEndloop;
       DWORD        dwSampleRate;
       BYTE         byOriginalKey;
       CHAR         chCorrection;
       WORD         wSampleLink;
       SFSampleLink sfSampleType;
   };

4.5 SoundFont 2 RIFF File Format Type Definitions

The SFModulator, SFGenerator, and SFTransform types are all enumeration types whose values are defined in subsequent sections.

The genAmountType is a union which allows signed 16 bit, unsigned 16 bit, and two unsigned 8 bit fields:

    typedef struct {
        BYTE byLo;
        BYTE byHi;
    } rangesType;

    typedef union {
        rangesType ranges;
        SHORT      shAmount;
        WORD       wAmount;
    } genAmountType;

The SFSampleLink is an enumeration type which describes both the type of sample (mono, stereo left, etc.) and whether the sample is located in RAM or ROM memory:

    typedef enum {
        monoSample      = 1,
        rightSample     = 2,
        leftSample      = 4,
        linkedSample    = 8,
        RomMonoSample   = 0x8001,
        RomRightSample  = 0x8002,
        RomLeftSample   = 0x8004,
        RomLinkedSample = 0x8008
    } SFSampleLink;

5 The INFO-list Chunk

The INFO-list chunk in a SoundFont 2 compatible file contains three mandatory and a variety of optional subchunks as defined below. The INFO-list chunk gives basic information about the SoundFont compatible bank contained in the file.

5.1 The ifil Subchunk

The ifil subchunk is a mandatory subchunk identifying the SoundFont specification version level to which the file complies. It is always four bytes in length, and contains data according to the structure:

    struct sfVersionTag {
        WORD wMajor;
        WORD wMinor;
    };

The WORD wMajor contains the value to the left of the decimal point in the SoundFont specification version; the WORD wMinor contains the value to the right of the decimal point. For example, version 2.11 would be implied if wMajor=2 and wMinor=11.

These values can be used by applications which read SoundFont compatible files to determine if the format of the file is usable by the program. Within a fixed wMajor, the only changes to the format will be the addition of Generator, Source and Transform enumerators, and additional info subchunks. These are all defined as being ignored if unknown to the program. Consequently, many applications can be designed to be fully upward compatible within a given wMajor. In the case of editors or other programs in which all enumerators should be known, the value of wMinor may be of consequence. Generally the application program will either accept the file as usable (possibly with appropriate transparent translation), reject the file as unusable, or warn the user that there may be uneditable data in the file.

If the ifil subchunk is missing, or its size is not four bytes, the file should be rejected as structurally unsound.
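The accept, reject, or warn decision described for the ifil subchunk might be sketched as follows. This is an illustration only: parse_version, check_ifil, the SfVersionVerdict enumeration and the known_minor parameter are hypothetical names, and the policy of accepting only wMajor equal to 2 is an assumption about one particular reader, not a rule taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the four-byte sfVersionTag record. */
typedef struct {
    uint16_t wMajor;
    uint16_t wMinor;
} sfVersionTag;

/* Hypothetical verdicts matching "accept", "reject" and "warn". */
typedef enum {
    SF_REJECT = 0,
    SF_ACCEPT = 1,
    SF_ACCEPT_WARN = 2
} SfVersionVerdict;

/* Decode an ifil or iver payload. Returns 0 if the size is not exactly
 * four bytes, which the text treats as grounds for rejection (ifil) or
 * for ignoring the field (iver). */
int parse_version(const uint8_t *data, uint32_t ckSize, sfVersionTag *out)
{
    assert(data != NULL && out != NULL);
    if (ckSize != 4)
        return 0;
    out->wMajor = (uint16_t)(data[0] | (data[1] << 8));  /* little endian */
    out->wMinor = (uint16_t)(data[2] | (data[3] << 8));
    return 1;
}

/* Assumed policy for a reader written against version 2.known_minor:
 * another major version is unusable; a newer minor version is usable but
 * may contain enumerators this reader cannot edit. */
SfVersionVerdict check_ifil(const sfVersionTag *v, uint16_t known_minor)
{
    assert(v != NULL);
    if (v->wMajor != 2)
        return SF_REJECT;
    return (v->wMinor > known_minor) ? SF_ACCEPT_WARN : SF_ACCEPT;
}
```

A playback-only program could treat SF_ACCEPT_WARN the same as SF_ACCEPT, since unknown enumerators are defined as ignorable; an editor would more likely surface the warning to the user.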
5.2 The isng Subchunk

The isng subchunk is a mandatory subchunk identifying the wavetable sound engine for which the file was optimized. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. The default isng field is the eight bytes representing "EMU8000" as seven ASCII characters followed by a zero byte.

The ASCII should be treated as case-sensitive. In other words, "emu8000" is not the same as "EMU8000".

The isng string can be optionally used by chip drivers to vary their synthesis algorithms to emulate the target sound engine.

If the isng subchunk is missing, not terminated in a zero valued byte, or its contents are an unknown sound engine, the field should be ignored and EMU8000 assumed.

5.3 The INAM Subchunk

The INAM subchunk is a mandatory subchunk providing the name of the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical INAM subchunk would be the fourteen bytes representing "General MIDI" as twelve ASCII characters followed by two zero bytes.

The ASCII should be treated as case-sensitive. In other words, "General MIDI" is not the same as "GENERAL MIDI".

The INAM string is typically used for the identification of banks even if the file names are altered.

If the INAM subchunk is missing, or not terminated in a zero valued byte, the field should be ignored and the user supplied with an appropriate error message if the name is queried. If the file is re-written, a valid name should be placed in the INAM field.

5.4 The irom Subchunk

The irom subchunk is an optional subchunk identifying a particular wavetable sound data ROM to which any ROM samples refer. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even.
A typical irom field would be the six bytes representing "1MGM" as four ASCII characters followed by two zero bytes.

The ASCII should be treated as case-sensitive. In other words, "1mgm" is not the same as "1MGM".

The irom string is used by drivers to verify that the ROM data referenced by the file is available to the sound engine.

If the irom subchunk is missing, not terminated in a zero valued byte, or its contents are an unknown ROM, the field should be ignored and the file assumed to reference no ROM samples. If ROM samples are accessed, any accesses to such instruments should be terminated and not sound. A file should not be written which attempts to access ROM samples without both irom and iver present and valid.

5.5 The iver Subchunk

The iver subchunk is an optional subchunk identifying the particular wavetable sound data ROM revision to which any ROM samples refer. It is always four bytes in length, and contains data according to the structure:

    struct sfVersionTag {
        WORD wMajor;
        WORD wMinor;
    };

The WORD wMajor contains the value to the left of the decimal point in the ROM version; the WORD wMinor contains the value to the right of the decimal point. For example, version 1.36 would be implied if wMajor=1 and wMinor=36.

The iver subchunk is used by drivers to verify that the ROM data referenced by the file is located in the exact locations specified by the sound headers.

If the iver subchunk is missing, not four bytes in length, or its contents indicate an unknown or incorrect ROM, the field should be ignored and the file assumed to reference no ROM samples. If ROM samples are accessed, any accesses to such instruments should be terminated and not sound. Note that for ROM samples to function correctly, both iver and irom must be present and valid. A file should not be written which attempts to access ROM samples without both irom and iver present and valid.
5.6 The ICRD Subchunk

The ICRD subchunk is an optional subchunk identifying the creation date of the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical ICRD field would be the twelve bytes representing "May 1, 1995" as eleven ASCII characters followed by a zero byte.

Conventionally, the format of the string is "Month Day, Year" where Month is initially capitalized and is the conventional full English spelling of the month, Day is the date in decimal followed by a comma, and Year is the full decimal year. Thus the field should conventionally never be longer than 32 bytes.

The ICRD string is provided for library management purposes.

If the ICRD subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied. If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.

5.7 The IENG Subchunk

The IENG subchunk is an optional subchunk identifying the names of any sound designers or engineers responsible for the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical IENG field would be the twelve bytes representing "Tim Swartz" as ten ASCII characters followed by two zero bytes.

The IENG string is provided for library management purposes.

If the IENG subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied. If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.
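The string rule shared by the INFO subchunks (a bounded, even byte count ending in one or two zero terminators) can be checked with a small helper. The function below is a sketch with a hypothetical name, zstr_is_valid. It assumes the strictest reading of the rule: exactly one terminator when that already makes the count even, otherwise exactly two. A tolerant reader might accept additional trailing zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Check an INFO string payload of `size` bytes against the rule
 * "max_size or fewer bytes including one or two terminators of value
 * zero, so as to make the total byte count even".
 * Returns 1 if the payload conforms, 0 otherwise. */
int zstr_is_valid(const uint8_t *data, size_t size, size_t max_size)
{
    size_t n = 0;

    assert(data != NULL);
    if (size == 0 || size > max_size || (size & 1u))
        return 0;                   /* empty, too long, or odd length */
    if (data[size - 1] != 0)
        return 0;                   /* not terminated in a zero byte */
    while (data[n] != 0)
        n++;                        /* n = count of text characters */
    /* Only the one or two terminators may follow the text. */
    return (size - n) <= 2;
}
```

The limit is passed in because most subchunks allow 256 bytes while ICMT allows 65,536; for example, zstr_is_valid(payload, ckSize, 256) would suit an ICRD or IENG field.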
5.8 The IPRD Subchunk

The IPRD subchunk is an optional subchunk identifying any specific product for which the SoundFont compatible bank is intended. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical IPRD field would be the eight bytes representing "SBAWE32" as seven ASCII characters followed by a zero byte.

The ASCII should be treated as case-sensitive. In other words, "sbawe32" is not the same as "SBAWE32".

The IPRD string is provided for library management purposes.

If the IPRD subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied. If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.

5.9 The ICOP Subchunk

The ICOP subchunk is an optional subchunk containing any copyright assertion string associated with the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical ICOP field would be the 40 bytes representing "Copyright (c) 1995 E-mu Systems, Inc." as 38 ASCII characters followed by two zero bytes.

The ICOP string is provided for intellectual property protection and management purposes.

If the ICOP subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied. If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.

5.10 The ICMT Subchunk

The ICMT subchunk is an optional subchunk containing any comments associated with the SoundFont compatible bank.
It contains an ASCII string of 65,536 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical ICMT field would be the 40 bytes representing "This space unintentionally left blank." as 38 ASCII characters followed by two zero bytes.

The ICMT string is provided for any non-scatological uses.

If the ICMT subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied. If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.

5.11 The ISFT Subchunk

The ISFT subchunk is an optional subchunk identifying the SoundFont compatible tools used to create and most recently modify the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even. A typical ISFT field would be the thirty bytes representing "Preditor 2.00a:Preditor 2.00a" as twenty-nine ASCII characters followed by a zero byte.

The ASCII should be treated as case-sensitive. In other words, "Preditor" is not the same as "PREDITOR".

Conventionally, the tool name and revision control number are included first for the creating tool and then for the most recent modifying tool. The two strings are separated by a colon. The string should be produced by the creating program with a null modifying tool field (e.g. "Preditor 2.00a:"), and each time a tool modifies the bank, it should replace the modifying tool field with its own name and revision control number.

The ISFT string is provided primarily for error tracing purposes.

If the ISFT subchunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully copied as an ASCII string, the field should be ignored and, if the file is re-written, should not be copied.
If the field's contents do not appear meaningful but can be faithfully reproduced, this should be done.

6 The sdta-list Chunk

The sdta-list chunk in a SoundFont 2 compatible file contains a single optional smpl subchunk which contains all the RAM based sound data associated with the SoundFont compatible bank. The smpl subchunk is of arbitrary length, and contains an even number of bytes.

6.1 Sample Data Format in the smpl Subchunk

The smpl subchunk, if present, contains one or more "samples" of digital audio information in the form of linearly coded sixteen bit, signed, little endian (least significant byte first) words. Each sample is followed by a minimum of forty-six zero valued sample data points. These zero valued data points are necessary to guarantee that any reasonable upward pitch shift using any reasonable interpolator can loop on zero data at the end of the sound.

6.2 Sample Data Looping Rules

Within each sample, one or more loop point pairs may exist. The locations of these points are defined within the pdta-list chunk, but the sample data points themselves must comply with certain practices in order for the loop to be compatible across multiple platforms.

The loops are defined by "equivalent points" in the sample. This means that there are two sample data points which are logically equivalent, and a loop occurs when these points are spliced atop one another. In concept, the loop end point is never actually played during looping; instead the loop start point follows the point just prior to the loop end point. Because of the bandlimited nature of digital audio sampling, an artifact free loop will exhibit virtually identical data surrounding the equivalent points.

In actuality, because of the various interpolation algorithms used by wavetable synthesizers, the data surrounding both the loop start and end points may affect the sound of the loop. Hence both the loop start and end points must be surrounded by continuous audio data.
For example, even if the sound is programmed to continue to loop throughout the decay, sample data points must be provided beyond the loop end point. This data will typically be identical to the data at the start of the loop. A minimum of eight valid data points are required to be present before the loop start and after the loop end.

The eight data points (four on each side) surrounding the two equivalent loop points should also be forced to be identical. By forcing the data to be identical, all interpolation algorithms are guaranteed to properly reproduce an artifact-free loop.

7 The pdta-list Chunk

7.1 The HYDRA Data Structure

The articulation data within a SoundFont 2 compatible file is contained in nine subchunks, named "hydra" after the mythical nine-headed beast. The structure has been designed for interchange purposes; it is optimized neither for run-time synthesis nor for on-the-fly editing. It is reasonable and proper for SoundFont compatible client programs to translate to and from the hydra structure as they read and write SoundFont compatible files.

7.2 The PHDR Subchunk

The PHDR subchunk is a required subchunk listing all presets within the SoundFont compatible file. It is always a multiple of thirty-eight bytes in length, and contains a minimum of two records, one record for each preset and one for a terminal record, according to the structure:

    struct sfPresetHeader {
        CHAR  achPresetName[20];
        WORD  wPreset;
        WORD  wBank;
        WORD  wPresetBagNdx;
        DWORD dwLibrary;
        DWORD dwGenre;
        DWORD dwMorphology;
    };

The ASCII character field achPresetName contains the name of the preset expressed in ASCII, with unused terminal characters filled with zero valued bytes. Preset names are case-sensitive. A unique name should always be assigned to each preset in the SoundFont compatible bank to enable identification. However, if a bank is read containing the erroneous state of presets with identical names, the presets should not be discarded.
They should either be preserved as read or, preferably, uniquely renamed.

The WORD wPreset contains the MIDI Preset Number and the WORD wBank contains the MIDI Bank Number which apply to this preset. Note that the presets are not ordered within the SoundFont compatible bank. Presets should have a unique set of wPreset and wBank numbers. However, if two presets have identical values of both wPreset and wBank, the first occurring preset in the PHDR chunk is the active preset, but any others with the same wBank and wPreset values should be maintained so that they can be renumbered and used at a later time. The special case of a General MIDI percussion bank is handled conventionally by a wBank value of 128. If the value in either field is not a valid MIDI value of zero through 127, or 128 for wBank, the preset cannot be played but should be maintained.

The WORD wPresetBagNdx is an index to the preset's layer list in the PBAG subchunk. Because the preset layer list is in the same order as the preset header list, the preset bag indices will be monotonically increasing with increasing preset headers. The size of the PBAG subchunk in bytes will be equal to four times the terminal preset's wPresetBagNdx plus four. If the preset bag indices are non-monotonic or if the terminal preset's wPresetBagNdx does not match the PBAG subchunk size, the file is structurally defective and should be rejected at load time. All presets except the terminal preset must have at least one layer; any preset with no layers should be ignored.

The DWORDs dwLibrary, dwGenre and dwMorphology are reserved for future implementation in a preset library management function and should be preserved as read, and created as zero.

The terminal sfPresetHeader record should never be accessed, and exists only to provide a terminal wPresetBagNdx with which to determine the number of layers in the last preset.
All other values are conventionally zero, with the exception of achPresetName, which can optionally be "EOP" indicating end of presets.

If the PHDR subchunk is missing, contains fewer than two records, or its size is not a multiple of 38 bytes, the file should be rejected as structurally unsound.

7.3 The PBAG Subchunk

The PBAG subchunk is a required subchunk listing all preset layers within the SoundFont compatible file. It is always a multiple of four bytes in length, and contains one record for each preset layer plus one record for a terminal layer according to the structure:

    struct sfPresetBag {
        WORD wGenNdx;
        WORD wModNdx;
    };

The first layer in a given preset is located at that preset's wPresetBagNdx. The number of layers in the preset is determined by the difference between the next preset's wPresetBagNdx and the current wPresetBagNdx.

The WORD wGenNdx is an index to the preset layer's list of generators in the PGEN subchunk, and the wModNdx is an index to its list of modulators in the PMOD subchunk. Because both the generator and modulator lists are in the same order as the preset header and layer lists, these indices will be monotonically increasing with increasing preset layers. The size of the PMOD subchunk in bytes will be equal to ten times the terminal preset's wModNdx plus ten and the size of the PGEN subchunk in bytes will be equal to four times the terminal preset's wGenNdx plus four. If the generator or modulator indices are non-monotonic or do not match the size of the respective PGEN or PMOD subchunks, the file is structurally defective and should be rejected at load time.

If a preset has more than one layer, the first layer may be a global layer. A global layer is determined by the fact that the last generator in the list is not an Instrument generator. All generator lists must contain at least one generator with one exception - if a global layer exists for which there are no generators but only modulators.
The modulator lists can contain zero or more modulators.

If a layer other than the first layer lacks an Instrument generator as its last generator, that layer should be ignored. A global layer with no modulators and no generators should also be ignored.

If the PBAG subchunk is missing, or its size is not a multiple of four bytes, the file should be rejected as structurally unsound.

7.4 The PMOD Subchunk

The PMOD subchunk is a required subchunk listing all preset layer modulators within the SoundFont compatible file. It is always a multiple of ten bytes in length, and contains zero or more modulators plus a terminal record according to the structure:

    struct sfModList {
        SFModulator sfModSrcOper;
        SFGenerator sfModDestOper;
        SHORT       modAmount;
        SFModulator sfModAmtSrcOper;
        SFTransform sfModTransOper;
    };

The preset layer's wModNdx points to the first modulator for that preset layer, and the number of modulators present for a preset layer is determined by the difference between the next higher preset layer's wModNdx and the current preset layer's wModNdx. A difference of zero indicates there are no modulators in this preset layer.

The sfModSrcOper is a value of one of the SFModulator enumeration type values. Unknown or undefined values are ignored. This value indicates the source of data for the modulator.

The sfModDestOper is a value of one of the SFGenerator enumeration type values. Unknown or undefined values are ignored. This value indicates the destination of the modulator.

The SHORT modAmount is a signed value indicating the degree to which the source modulates the destination. A zero value indicates there is no fixed amount.

The sfModAmtSrcOper is a value of one of the SFModulator enumeration type values. Unknown or undefined values are ignored. This value indicates that the degree to which the source modulates the destination is to be controlled by the specified modulation source.

The sfModTransOper is a value of one of the SFTransform enumeration type values.
Unknown or undefined values are ignored. This value indicates that a transform of the specified type will be applied to the modulation source before application to the modulator.

The terminal record conventionally contains zero in all fields, and is always ignored.

A modulator is defined by its sfModSrcOper, its sfModDestOper, and its sfModAmtSrcOper. All modulators within a layer must have a unique set of these three enumerators. If a second modulator is encountered with the same three enumerators as a previous modulator within the same layer, the first modulator will be ignored.

Modulators in the PMOD subchunk act as additively relative modulators with respect to those in the IMOD subchunk. In other words, a PMOD modulator can increase or decrease the amount of an IMOD modulator.

In SoundFont 2.00, no modulators have yet been defined, and the PMOD subchunk will always consist of ten zero valued bytes.

If the PMOD subchunk is missing, or its size is not a multiple of ten bytes, the file should be rejected as structurally unsound.

7.5 The PGEN Subchunk

The PGEN subchunk is a required subchunk containing a list of preset layer generators for each preset layer within the SoundFont compatible file. It is always a multiple of four bytes in length, and contains one or more generators for each preset layer (except a global layer containing only modulators) plus a terminal record according to the structure:

    struct sfGenList {
        SFGenerator   sfGenOper;
        genAmountType genAmount;
    };

where the types are defined:

    typedef struct {
        BYTE byLo;
        BYTE byHi;
    } rangesType;

    typedef union {
        rangesType ranges;
        SHORT      shAmount;
        WORD       wAmount;
    } genAmountType;

The sfGenOper is a value of one of the SFGenerator enumeration type values. Unknown or undefined values are ignored. This value indicates the type of generator being indicated.

The genAmount is the value to be assigned to the specified generator. Note that this can be of three formats.
Certain generators specify a range of MIDI key numbers or MIDI velocities, with a minimum and maximum value. Other generators specify an unsigned WORD value. Most generators, however, specify a signed 16 bit SHORT value.

The preset layer's wGenNdx points to the first generator for that preset layer. Unless the layer is a global layer, the last generator in the list is an "Instrument" generator, whose value is a pointer to the instrument associated with that layer. If a "key range" generator exists for the preset layer, it is always the first generator in the list for that preset layer. If a "velocity range" generator exists for the preset layer, it will only be preceded by a key range generator. If any generators follow an Instrument generator, they will be ignored.

A generator is defined by its sfGenOper. All generators within a layer must have a unique sfGenOper enumerator. If a second generator is encountered with the same sfGenOper enumerator as a previous generator within the same layer, the first generator will be ignored.

Generators in the PGEN subchunk are applied relative to generators in the IGEN subchunk in an additive manner. In other words, PGEN generators increase or decrease the value of an IGEN generator.

If the PGEN subchunk is missing, or its size is not a multiple of four bytes, the file should be rejected as structurally unsound. If a key range generator is present and not the first generator, it should be ignored. If a velocity range generator is present, and is preceded by a generator other than a key range generator, it should be ignored. If a non-global list does not end in an Instrument generator, that layer should be ignored. If the Instrument generator value is equal to or greater than the terminal instrument, the file should be rejected as structurally unsound.

7.6 The INST Subchunk

The INST subchunk is a required subchunk listing all instruments within the SoundFont compatible file.
It is always a multiple of twenty-two bytes in length, and contains a minimum of two records, one record for each instrument and one for a terminal record, according to the structure:

    struct sfInst {
        CHAR achInstName[20];
        WORD wInstBagNdx;
    };

The ASCII character field achInstName contains the name of the instrument expressed in ASCII, with unused terminal characters filled with zero valued bytes. Instrument names are case-sensitive. A unique name should always be assigned to each instrument in the SoundFont compatible bank to enable identification. However, if a bank is read containing the erroneous state of instruments with identical names, the instruments should not be discarded. They should either be preserved as read or, preferably, uniquely renamed.

The WORD wInstBagNdx is an index to the instrument's split list in the IBAG subchunk. Because the instrument split list is in the same order as the instrument list, the instrument bag indices will be monotonically increasing with increasing instruments. The size of the IBAG subchunk in bytes will be four greater than four times the terminal (EOI) instrument's wInstBagNdx. If the instrument bag indices are non-monotonic or if the terminal instrument's wInstBagNdx does not match the IBAG subchunk size, the file is structurally defective and should be rejected at load time. All instruments except the terminal instrument must have at least one split; any instrument with no splits should be ignored.

The terminal sfInst record should never be accessed, and exists only to provide a terminal wInstBagNdx with which to determine the number of splits in the last instrument. All other values are conventionally zero, with the exception of achInstName, which should be "EOI" indicating end of instruments.

If the INST subchunk is missing, contains fewer than two records, or its size is not a multiple of 22 bytes, the file should be rejected as structurally unsound.
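The load-time checks described for the INST subchunk (record size, minimum record count, monotonic bag indices, and agreement with the IBAG size) could be combined as in the sketch below. The name check_inst is hypothetical, and the function reads fields by byte offset instead of overlaying a C struct, on the assumption that the compiler might pad struct sfInst beyond its 22 on-disk bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SF_INST_REC_SIZE 22u   /* CHAR achInstName[20] + WORD wInstBagNdx */
#define SF_BAG_REC_SIZE   4u   /* two WORD indices per bag record */

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Validate an INST payload against the size of the IBAG payload.
 * Returns the number of instruments (terminal record excluded), or -1
 * if the data is structurally unsound. */
int check_inst(const uint8_t *inst, size_t inst_size, size_t ibag_size)
{
    size_t i, records;
    uint16_t prev = 0, ndx = 0;

    assert(inst != NULL);
    if (inst_size % SF_INST_REC_SIZE != 0)
        return -1;                  /* not a multiple of 22 bytes */
    records = inst_size / SF_INST_REC_SIZE;
    if (records < 2)
        return -1;                  /* need one instrument plus terminal */
    for (i = 0; i < records; i++) {
        ndx = le16(inst + i * SF_INST_REC_SIZE + 20);
        if (i > 0 && ndx < prev)
            return -1;              /* bag indices must not decrease */
        prev = ndx;
    }
    /* ndx now holds the terminal record's wInstBagNdx. */
    if (ibag_size != SF_BAG_REC_SIZE * (size_t)ndx + SF_BAG_REC_SIZE)
        return -1;                  /* IBAG size disagrees with indices */
    return (int)(records - 1);
}
```

Instruments whose bag index equals that of the following record have no splits; this sketch leaves them in place, since the text asks for such instruments to be ignored, not for the file to be rejected.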
All instruments present in the INST subchunk are typically referenced by a preset layer. However, a file containing any "orphaned" instruments need not be rejected. SoundFont compatible applications can optionally ignore or filter out these orphaned instruments based on user preference.

7.7 The IBAG Subchunk

The IBAG subchunk is a required subchunk listing all instrument splits within the SoundFont compatible file. It is always a multiple of four bytes in length, and contains one record for each instrument split plus one record for a terminal split according to the structure:

    struct sfInstBag {
        WORD wInstGenNdx;
        WORD wInstModNdx;
    };

The first split in a given instrument is located at that instrument's wInstBagNdx. The number of splits in the instrument is determined by the difference between the next instrument's wInstBagNdx and the current wInstBagNdx.

The WORD wInstGenNdx is an index to the instrument split's list of generators in the IGEN subchunk, and the wInstModNdx is an index to its list of modulators in the IMOD subchunk. Because both the generator and modulator lists are in the same order as the instrument and split lists, these indices will be monotonically increasing with increasing splits. The size of the IMOD subchunk in bytes will be equal to ten times the terminal instrument's wInstModNdx plus ten and the size of the IGEN subchunk in bytes will be equal to four times the terminal instrument's wInstGenNdx plus four. If the generator or modulator indices are non-monotonic or do not match the size of the respective IGEN or IMOD subchunks, the file is structurally defective and should be rejected at load time.

If an instrument has more than one split, the first split may be a global split. A global split is determined by the fact that the last generator in the list is not a sampleID generator. All generator lists must contain at least one generator with one exception - if a global split exists for which there are no generators but only modulators.
The modulator lists can contain zero or more modulators. If a split other than the first split lacks a sampleID generator as its last generator, that split should be ignored. A global split with no modulators and no generators should also be ignored.

If the IBAG subchunk is missing, or its size is not a multiple of four bytes, the file should be rejected as structurally unsound.

7.8 The IMOD Subchunk

The IMOD subchunk is a required subchunk listing all instrument split modulators within the SoundFont compatible file. It is always a multiple of ten bytes in length, and contains zero or more modulators plus a terminal record, according to the structure:

    struct sfModList
    {
        SFModulator sfModSrcOper;
        SFGenerator sfModDestOper;
        SHORT modAmount;
        SFModulator sfModAmtSrcOper;
        SFTransform sfModTransOper;
    };

The split's wInstModNdx points to the first modulator for that split, and the number of modulators present for a split is determined by the difference between the next higher split's wInstModNdx and the current split's wInstModNdx. A difference of zero indicates there are no modulators in this split.

The sfModSrcOper is a value of one of the SFModulator enumeration type values. Unknown or undefined values are ignored. This value indicates the source of data for the modulator.

The sfModDestOper is a value of one of the SFGenerator enumeration type values. Unknown or undefined values are ignored. This value indicates the destination of the modulator.

The SHORT modAmount is a signed value indicating the degree to which the source modulates the destination. A zero value indicates there is no fixed amount.

The sfModAmtSrcOper is a value of one of the SFModulator enumeration type values. Unknown or undefined values are ignored. This value indicates that the degree to which the source modulates the destination is to be controlled by the specified modulation source.

The sfModTransOper is a value of one of the SFTransform enumeration type values. Unknown or undefined values are ignored.
This value indicates that a transform of the specified type will be applied to the modulation source before application to the modulator.

The terminal record conventionally contains zero in all fields, and is always ignored.

A modulator is defined by its sfModSrcOper, its sfModDestOper, and its sfModAmtSrcOper. All modulators within a split must have a unique set of these three enumerators. If a second modulator is encountered with the same three enumerators as a previous modulator within the same split, the first modulator will be ignored.

Modulators in the IMOD subchunk are absolute. This means that an IMOD modulator replaces, rather than adding to, a default modulator.

In SoundFont 2.00, no modulators have yet been defined, and the IMOD subchunk will always consist of ten zero valued bytes.

If the IMOD subchunk is missing, or its size is not a multiple of ten bytes, the file should be rejected as structurally unsound.

7.9 The IGEN Subchunk

The IGEN subchunk is a required subchunk containing a list of split generators for each instrument split within the SoundFont compatible file. It is always a multiple of four bytes in length, and contains one or more generators for each split (except a global split containing only modulators) plus a terminal record, according to the structure:

    struct sfInstGenList
    {
        SFGenerator sfGenOper;
        genAmountType genAmount;
    };

where the types are defined as in the PGEN subchunk above.

The genAmount is the value to be assigned to the specified generator. Note that this can be of three formats. Certain generators specify a range of MIDI key numbers or MIDI velocities, with a minimum and maximum value. Other generators specify an unsigned WORD value. Most generators, however, specify a signed 16 bit SHORT value.

The split's wInstGenNdx points to the first generator for that split. Unless the split is a global split, the last generator in the list is a "sampleID" generator, whose value is a pointer to the sample associated with that split.
If a "key range" generator exists for the split, it is always the first generator in the list for that split. If a "velocity range" generator exists for the split, it will only be preceded by a key range generator. If any generators follow a sampleID generator, they will be ignored.

A generator is defined by its sfGenOper. All generators within a split must have a unique sfGenOper enumerator. If a second generator is encountered with the same sfGenOper enumerator as a previous generator within the same split, the first generator will be ignored.

Generators in the IGEN subchunk are absolute in nature. This means that an IGEN generator replaces, rather than adding to, the default value for the generator.

If the IGEN subchunk is missing, or its size is not a multiple of four bytes, the file should be rejected as structurally unsound. If a key range generator is present and not the first generator, it should be ignored. If a velocity range generator is present, and is preceded by a generator other than a key range generator, it should be ignored. If a non-global list does not end in a sampleID generator, the split should be ignored. If the sampleID generator value is equal to or greater than the terminal sampleID, the file should be rejected as structurally unsound.

7.10 The SHDR Subchunk

The SHDR subchunk is a required subchunk listing all samples within the smpl subchunk and any referenced ROM samples. It is always a multiple of forty-six bytes in length, and contains one record for each sample plus a terminal record, according to the structure:

    struct sfSample
    {
        CHAR achSampleName[20];
        DWORD dwStart;
        DWORD dwEnd;
        DWORD dwStartloop;
        DWORD dwEndloop;
        DWORD dwSampleRate;
        BYTE byOriginalPitch;
        CHAR chPitchCorrection;
        WORD wSampleLink;
        SFSampleLink sfSampleType;
    };

The ASCII character field achSampleName contains the name of the sample expressed in ASCII, with unused terminal characters filled with zero valued bytes. Sample names are case-sensitive.
A unique name should always be assigned to each sample in the SoundFont compatible bank to enable identification. However, if a bank is read containing the erroneous state of samples with identical names, the samples should not be discarded. They should either be preserved as read or, preferably, uniquely renamed.

The DWORD dwStart contains the index, in sample data points, from the beginning of the sample data field to the first data point of this sample.

The DWORD dwEnd contains the index, in sample data points, from the beginning of the sample data field to the first of the set of 46 zero valued data points following this sample.

The DWORD dwStartloop contains the index, in sample data points, from the beginning of the sample data field to the first data point in the loop of this sample.

The DWORD dwEndloop contains the index, in sample data points, from the beginning of the sample data field to the first data point following the loop of this sample. Note that this is the data point "equivalent to" the first loop data point, and that to produce portable artifact-free loops, the eight proximal data points surrounding both the Startloop and Endloop points should be identical.

The values of dwStart, dwEnd, dwStartloop, and dwEndloop must all be within the range of the sample data field included in the SoundFont compatible bank or referenced in the sound ROM. Also, to allow a variety of hardware platforms to be able to reproduce the data, the samples have a minimum length of 48 data points, a minimum loop size of 32 data points, and a minimum of 8 valid points prior to dwStartloop and after dwEndloop. Thus dwStart must be less than dwStartloop-7, dwStartloop must be less than dwEndloop-31, and dwEndloop must be less than dwEnd-7. If these constraints are not met, the sound may optionally not be played if the hardware cannot support artifact-free playback for the parameters given.
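The three inequalities above can be checked directly. The following is a minimal sketch, not a definitive implementation: the SfSampleBounds type and the function name are hypothetical, total_points is assumed to be the number of data points in the sample data field, and the range test is simplified to require only that dwEnd not exceed that count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the sfSample header holding only the four
 * address fields discussed above. */
typedef struct {
    uint32_t dwStart;
    uint32_t dwEnd;
    uint32_t dwStartloop;
    uint32_t dwEndloop;
} SfSampleBounds;

/* Returns 1 if the addresses satisfy the minimum-length constraints,
 * 0 if the sample may optionally not be played. */
static int sf_sample_bounds_ok(const SfSampleBounds *s, uint32_t total_points)
{
    uint64_t start, end, loop_start, loop_end;

    assert(s != NULL);
    /* Widen to 64 bits and compare using additions, so that
     * "dwStartloop - 7" can never wrap around below zero. */
    start = s->dwStart;
    end = s->dwEnd;
    loop_start = s->dwStartloop;
    loop_end = s->dwEndloop;

    if (end > total_points)            /* simplified range check */
        return 0;
    if (!(start + 7 < loop_start))     /* dwStart < dwStartloop-7 */
        return 0;
    if (!(loop_start + 31 < loop_end)) /* dwStartloop < dwEndloop-31 */
        return 0;
    if (!(loop_end + 7 < end))         /* dwEndloop < dwEnd-7 */
        return 0;
    return 1;
}
```

Taken together, the three tests imply the 48 data point minimum sample length: 8 points before the loop, at least 32 in the loop, and 8 after it.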
The DWORD dwSampleRate contains the sample rate, in Hertz, at which this sample was acquired or to which it was most recently converted. Values of greater than 50000 or less than 400 may not be reproducible by some hardware platforms and should be avoided. A value of zero is illegal. If an illegal or impractical value is encountered, the nearest practical value should be used.

The BYTE byOriginalPitch contains the MIDI key number of the recorded pitch of the sample. For example, a recording of an instrument playing middle C (261.62 Hz) should receive a value of 60. This value is used as the default "root key" for the sample, so that in the example, a MIDI key-on command for note number 60 would reproduce the sound at its original pitch. For unpitched sounds, a conventional value of 255 should be used. Values between 128 and 254 are illegal. Whenever an illegal value or a value of 255 is encountered, the value 60 should be used.

The CHAR chPitchCorrection contains a pitch correction in cents which should be applied to the sample on playback. The purpose of this field is to compensate for any pitch errors during the sample recording process. The correction value is that of the correction to be applied. For example, if the sound is 4 cents sharp, a correction bringing it 4 cents flat is required, thus the value should be -4.

The value in sfSampleType is an enumeration with eight defined values: monoSample = 1, rightSample = 2, leftSample = 4, linkedSample = 8, RomMonoSample = 32769, RomRightSample = 32770, RomLeftSample = 32772, and RomLinkedSample = 32776. It can be seen that this is encoded such that bit 15 of the 16 bit value is set if the sample is in ROM, and reset if it is included in the SoundFont compatible bank. The four LS bits of the word are then exclusively set indicating mono, right, left, or linked.

If the sound is flagged as a ROM sample and no valid "irom" subchunk is included, the file is structurally defective and should be rejected at load time.
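The bit layout of sfSampleType described above lends itself to a small decoder. The sketch below is illustrative only: the enum and function names are hypothetical, and treating any value outside the eight defined ones as invalid is an assumption, since the text does not say how undefined values should be handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numeric values are the four LS-bit codes
 * listed above, with 0 reserved here for "not a defined value". */
typedef enum {
    SF_KIND_INVALID = 0,
    SF_KIND_MONO    = 1,
    SF_KIND_RIGHT   = 2,
    SF_KIND_LEFT    = 4,
    SF_KIND_LINKED  = 8
} SfSampleKind;

/* Bit 15 set means the sample data lives in ROM. */
static int sf_sample_is_rom(uint16_t type)
{
    return (type & 0x8000u) != 0;
}

/* Decode the four LS bits; exactly one of them must be set and
 * bits 4..14 must be clear for the value to be one of the eight
 * defined enumerators. */
static SfSampleKind sf_sample_kind(uint16_t type)
{
    if ((type & 0x7FF0u) != 0)
        return SF_KIND_INVALID;
    switch (type & 0x000Fu) {
    case 1: return SF_KIND_MONO;
    case 2: return SF_KIND_RIGHT;
    case 4: return SF_KIND_LEFT;
    case 8: return SF_KIND_LINKED;
    default: return SF_KIND_INVALID;
    }
}

/* Returns 0 if the file should be rejected: an undefined type
 * (assumption), or a ROM sample without a valid irom subchunk. */
static int sf_sample_type_ok(uint16_t type, int has_valid_irom)
{
    assert(has_valid_irom == 0 || has_valid_irom == 1);
    if (sf_sample_kind(type) == SF_KIND_INVALID)
        return 0;
    return !sf_sample_is_rom(type) || has_valid_irom;
}
```

For instance, RomRightSample (32770, or 0x8002) decodes as a right sample held in ROM.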
If sfSampleType indicates a mono sample, then wSampleLink is undefined and its value should be conventionally zero, but will be ignored regardless of value. If sfSampleType indicates a left or right sample, then wSampleLink is the sample header index of the associated right or left stereo sample respectively. Both samples should be played entirely synchronously, with their pitch controlled by the right sample's generators. All non-pitch generators should apply as normal; in particular the panning of the individual samples to left and right should be accomplished via the pan generator. Left-right pairs should always be found within the same instrument. Note also that no instrument should be designed in which it is possible to activate more than one instance of a particular stereo pair. The linked sample type is not currently fully defined in the SoundFont 2 specification, but will ultimately support a circularly linked list of samples using wSampleLink.

The terminal sample record is never referenced, and is conventionally entirely zero with the exception of achSampleName, which should be "EOS" indicating end of samples. All samples present in the smpl subchunk are typically referenced by an instrument; however, a file containing any "orphaned" samples need not be rejected. SoundFont compatible applications can optionally ignore or filter out these orphaned samples according to user preference.

If the SHDR subchunk is missing, or its size is not a multiple of 46 bytes, the file should be rejected as structurally unsound.

8 Enumerators

8.1 Generator Enumerators

8.1.1 Kinds of Generator Enumerators

Five kinds of Generator Enumerators exist: Index Generators, Range Generators, Substitution Generators, Sample Generators, and Value Generators.

An Index Generator's amount is an index into another data structure. The only two Index Generators are instrument and sampleID.
A Range Generator defines a range of note-on parameters outside of which the layer or split is undefined. Two Range Generators are currently defined, keyRange and velRange.

Substitution Generators are generators which substitute a value for a note-on parameter. Two Substitution Generators are currently defined, overridingKeyNumber and overridingVelocity.

Sample Generators are generators which directly affect a sample's properties. These generators are undefined at the layer level. The currently defined Sample Generators are the eight address offset generators, the sampleModes generator, the Overriding Root Key generator and the Exclusive Class generator.

Value Generators are generators whose value directly affects a signal processing parameter. Most generators are value generators.

8.1.2 Generator Enumerators Defined

The following is an exhaustive list of SoundFont 2.00 generators and their strict definitions:

0 startAddrsOffset
The offset, in sample data points, beyond the Start sample header parameter to the first sample data point to be played for this instrument. For example, if Start were 7 and startAddrsOffset were 2, the first sample data point played would be sample data point 9.

1 endAddrsOffset
The offset, in sample data points, beyond the End sample header parameter to the last sample data point to be played for this instrument. For example, if End were 17 and endAddrsOffset were -2, the last sample data point played would be sample data point 15.

2 startloopAddrsOffset
The offset, in sample data points, beyond the Startloop sample header parameter to the first sample data point to be repeated in the loop for this instrument. For example, if Startloop were 10 and startloopAddrsOffset were -1, the first repeated loop sample data point would be sample data point 9.
3 endloopAddrsOffset
The offset, in sample data points, beyond the Endloop sample header parameter to the sample data point considered equivalent to the Startloop sample data point for the loop for this instrument. For example, if Endloop were 15 and endloopAddrsOffset were 2, sample data point 17 would be considered equivalent to the Startloop sample data point, and hence sample data point 16 would effectively precede Startloop during looping.

4 startAddrsCoarseOffset
The offset, in 32768 sample data point increments, beyond the Start sample header parameter to the first sample data point to be played in this instrument. This parameter is added to the startAddrsOffset parameter. For example, if Start were 5, startAddrsOffset were 3 and startAddrsCoarseOffset were 2, the first sample data point played would be sample data point 65544.

5 modLfoToPitch
This is the degree, in cents, to which a full scale excursion of the Modulation LFO will influence pitch. A positive value indicates a positive LFO excursion increases pitch; a negative value indicates a positive excursion decreases pitch. Pitch is always modified logarithmically, that is, the deviation is in cents, semitones, and octaves rather than in Hz. For example, a value of 100 indicates that the pitch will first rise one semitone, then fall one semitone.

6 vibLfoToPitch
This is the degree, in cents, to which a full scale excursion of the Vibrato LFO will influence pitch. A positive value indicates a positive LFO excursion increases pitch; a negative value indicates a positive excursion decreases pitch. Pitch is always modified logarithmically, that is, the deviation is in cents, semitones, and octaves rather than in Hz. For example, a value of 100 indicates that the pitch will first rise one semitone, then fall one semitone.

7 modEnvToPitch
This is the degree, in cents, to which a full scale excursion of the Modulation Envelope will influence pitch.
A positive value indicates an increase in pitch; a negative value indicates a decrease in pitch. Pitch is always modified logarithmically, that is, the deviation is in cents, semitones, and octaves rather than in Hz. For example, a value of 100 indicates that the pitch will rise one semitone at the envelope peak.

8 initialFilterFc
This is the cutoff and resonant frequency of the lowpass filter in absolute cent units. The lowpass filter is defined as a second order resonant pole pair whose pole frequency in Hz is defined by the Initial Filter Cutoff parameter. When the cutoff frequency exceeds 20 kHz and the Q (resonance) of the filter is zero, the filter does not affect the signal.

9 initialFilterQ
This is the height above DC gain in centibels which the filter resonance exhibits at the cutoff frequency. A value of zero or less indicates the filter is not resonant; the gain at the cutoff frequency (pole angle) may be less than zero when zero is specified. The filter gain at DC is also affected by this parameter such that the gain at DC is reduced by half the specified gain. For example, for a value of 100, the filter gain at DC would be 5 dB below unity gain, and the height of the resonant peak would be 10 dB above the DC gain, or 5 dB above unity gain. Note also that if initialFilterQ is set to zero or less and the cutoff frequency exceeds 20 kHz, then the filter response is flat and unity gain.

10 modLfoToFilterFc
This is the degree, in cents, to which a full scale excursion of the Modulation LFO will influence filter cutoff frequency. A positive number indicates a positive LFO excursion increases cutoff frequency; a negative number indicates a positive excursion decreases cutoff frequency. Filter cutoff frequency is always modified logarithmically, that is, the deviation is in cents, semitones, and octaves rather than in Hz. For example, a value of 1200 indicates that the cutoff frequency will first rise one octave, then fall one octave.
11 modEnvToFilterFc
This is the degree, in cents, to which a full scale excursion of the Modulation Envelope will influence filter cutoff. A positive number indicates an increase in cutoff frequency; a negative number indicates a decrease in filter cutoff. Filter cutoff is always modified logarithmically, that is, the deviation is in cents, semitones, and octaves rather than in Hz. For example, a value of 1200 indicates that the cutoff frequency will rise one octave at the envelope attack peak.

12 endAddrsCoarseOffset
The offset, in 32768 sample data point increments, beyond the End sample header parameter to the last sample data point to be played in this instrument. This parameter is added to the endAddrsOffset parameter. For example, if End were 65536, endAddrsOffset were -3 and endAddrsCoarseOffset were -1, the last sample data point played would be sample data point 32765.

13 modLfoToVolume
This is the degree, in centibels, to which a full scale excursion of the Modulation LFO will influence volume. A positive number indicates a positive LFO excursion increases volume; a negative number indicates a positive excursion decreases volume. Volume is always modified logarithmically, that is, the deviation is in decibels rather than in linear amplitude. For example, a value of 100 indicates that the volume will first rise ten dB, then fall ten dB.

14 unused1
Unused, reserved. Should be ignored if encountered.

15 chorusEffectsSend
This is the degree, in 0.1% units, to which the audio output of the note is sent to the chorus effects processor. A value of 0% or less indicates no signal is sent from this note; a value of 100% or more indicates the note is sent at full level. Note that this parameter has no effect on the amount of this signal sent to the "dry" or unprocessed portion of the output. For example, a value of 250 indicates that the signal is sent at 25% of full level (attenuation of 12 dB from full level) to the chorus effects processor.
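The address offset generators defined so far (0 through 4 and 12) all combine in the same way: the sample header value, plus the fine offset, plus 32768 times the coarse offset. A minimal sketch of that arithmetic follows; the function name is hypothetical, and returning -1 for a negative result is an assumption made here for illustration, not behaviour defined by the specification.

```c
#include <stdint.h>

#define SF_COARSE_STEP 32768 /* data points per coarse offset unit */

/* Combine a sample header address (Start, End, Startloop or Endloop)
 * with its fine and coarse offset generators. The sum is computed in
 * 64 bits so that negative offsets cannot wrap around. Returns -1 if
 * the result would fall before the start of the sample data field. */
static int64_t sf_effective_address(uint32_t header_value,
                                    int16_t fine_offset,
                                    int16_t coarse_offset)
{
    int64_t addr = (int64_t)header_value
                 + (int64_t)fine_offset
                 + (int64_t)coarse_offset * SF_COARSE_STEP;
    return addr < 0 ? -1 : addr;
}
```

With the figures used in the generator 4 example, sf_effective_address(5, 3, 2) gives 65544; with those of generator 12, sf_effective_address(65536, -3, -1) gives 32765.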
16 reverbEffectsSend
This is the degree, in 0.1% units, to which the audio output of the note is sent to the reverb effects processor. A value of 0% or less indicates no signal is sent from this note; a value of 100% or more indicates the note is sent at full level. Note that this parameter has no effect on the amount of this signal sent to the "dry" or unprocessed portion of the output. For example, a value of 250 indicates that the signal is sent at 25% of full level (attenuation of 12 dB from full level) to the reverb effects processor.

17 pan
This is the degree, in 0.1% units, to which the "dry" audio output of the note is positioned to the left or right output. A value of -50% or less indicates the signal is sent entirely to the left output and not sent to the right output; a value of +50% or more indicates the note is sent entirely to the right and not sent to the left. A value of zero places the signal centered between left and right. For example, a value of -250 indicates that the signal is sent at 75% of full level to the left output and 25% of full level to the right output.

18 unused2
Unused, reserved. Should be ignored if encountered.

19 unused3
Unused, reserved. Should be ignored if encountered.

20 unused4
Unused, reserved. Should be ignored if encountered.

21 delayModLFO
This is the delay time, in absolute timecents, from key on until the Modulation LFO begins its upward ramp from zero value. A value of 0 indicates a 1 second delay. A negative value indicates a delay less than one second and a positive value a delay longer than one second. The most negative number (-32768) conventionally indicates no delay. For example, a delay of 10 msec would be 1200log2(.01) = -7973.

22 freqModLFO
This is the frequency, in absolute cents, of the Modulation LFO's triangular period. A value of zero indicates a frequency of 8.176 Hz. A negative value indicates a frequency less than 8.176 Hz; a positive value a frequency greater than 8.176 Hz.
For example, a frequency of 10 mHz would be 1200log2(.01/8.176) = -11610.

23 delayVibLFO
This is the delay time, in absolute timecents, from key on until the Vibrato LFO begins its upward ramp from zero value. A value of 0 indicates a 1 second delay. A negative value indicates a delay less than one second; a positive value a delay longer than one second. The most negative number (-32768) conventionally indicates no delay. For example, a delay of 10 msec would be 1200log2(.01) = -7973.

24 freqVibLFO
This is the frequency, in absolute cents, of the Vibrato LFO's triangular period. A value of zero indicates a frequency of 8.176 Hz. A negative value indicates a frequency less than 8.176 Hz; a positive value a frequency greater than 8.176 Hz. For example, a frequency of 10 mHz would be 1200log2(.01/8.176) = -11610.

25 delayModEnv
This is the delay time, in absolute timecents, between key on and the start of the attack phase of the Modulation Envelope. A value of 0 indicates a 1 second delay. A negative value indicates a delay less than one second; a positive value a delay longer than one second. The most negative number (-32768) conventionally indicates no delay. For example, a delay of 10 msec would be 1200log2(.01) = -7973.

26 attackModEnv
This is the time, in absolute timecents, from the end of the Modulation Envelope Delay Time until the point at which the Modulation Envelope value reaches its peak. Note that the attack is "convex"; the curve is nominally such that when applied to a decibel or semitone parameter, the result is linear in amplitude or Hz respectively. A value of 0 indicates a 1 second attack time. A negative value indicates a time less than one second; a positive value a time longer than one second. The most negative number (-32768) conventionally indicates instantaneous attack. For example, an attack time of 10 msec would be 1200log2(.01) = -7973.
27 holdModEnv
This is the time, in absolute timecents, from the end of the attack phase to the entry into decay phase, during which the envelope value is held at its peak. A value of 0 indicates a 1 second hold time. A negative value indicates a time less than one second; a positive value a time longer than one second. The most negative number (-32768) conventionally indicates no hold phase. For example, a hold time of 10 msec would be 1200log2(.01) = -7973.

28 decayModEnv
This is the time, in absolute timecents, for a 100% change in the Modulation Envelope value during decay phase. For the Modulation Envelope, the decay phase linearly ramps toward the sustain level. If the sustain level were zero, the Modulation Envelope Decay Time would be the time spent in decay phase. A value of 0 indicates a 1 second decay time for a zero sustain level. A negative value indicates a time less than one second; a positive value a time longer than one second. For example, a decay time of 10 msec would be 1200log2(.01) = -7973.

29 sustainModEnv
This is the decrease in level, expressed in 0.1% units, over which the Modulation Envelope value ramps during the decay phase. For the Modulation Envelope, the sustain level is properly expressed in percent of full scale. Because the volume envelope sustain level is expressed as an attenuation from full scale, the sustain level is analogously expressed as a decrease from full scale. A value of 0 indicates the sustain level is full level; this implies a zero duration of decay phase regardless of decay time. A positive value indicates a decay to the corresponding level. Values less than zero are to be interpreted as zero; values above 1000 are to be interpreted as 1000. For example, a sustain level which corresponds to an absolute value 40% of peak would be 600.

30 releaseModEnv
This is the time, in absolute timecents, for a 100% change in the Modulation Envelope value during release phase.
For the Modulation Envelope, the release phase linearly ramps toward zero from the current level. If the current level were full scale, the Modulation Envelope Release Time would be the time spent in release phase until zero value were reached. A value of 0 indicates a 1 second release time for a release from full level. A negative value indicates a time less than one second; a positive value a time longer than one second. For example, a release time of 10 msec would be 1200log2(.01) = -7973.

31 keynumToModEnvHold
This is the degree, in timecent per keynumber units, to which the hold time of the Modulation Envelope is decreased by increasing MIDI key number. The hold time at key number 60 is always unchanged. The unit scaling is such that a value of 100 provides a hold time which tracks the keyboard; that is, an upward octave causes the hold time to halve. For example, if the Modulation Envelope Hold Time were -7973 = 10 msec and the Key Number to Mod Env Hold were 50 when key number 36 was played, the hold time would be 20 msec.

32 keynumToModEnvDecay
This is the degree, in timecent per keynumber units, to which the decay time of the Modulation Envelope is decreased by increasing MIDI key number. The decay time at key number 60 is always unchanged. The unit scaling is such that a value of 100 provides a decay time which tracks the keyboard; that is, an upward octave causes the decay time to halve. For example, if the Modulation Envelope Decay Time were -7973 = 10 msec and the Key Number to Mod Env Decay were 50 when key number 36 was played, the decay time would be 20 msec.

33 delayVolEnv
This is the delay time, in absolute timecents, between key on and the start of the attack phase of the Volume Envelope. A value of 0 indicates a 1 second delay. A negative value indicates a delay less than one second; a positive value a delay longer than one second. The most negative number (-32768) conventionally indicates no delay.
For example, a delay of 10 msec would be 1200log2(.01) = -7973.

34 attackVolEnv
This is the time, in absolute timecents, from the end of the Volume Envelope Delay Time until the point at which the Volume Envelope value reaches its peak. Note that the attack is "convex"; the curve is nominally such that when applied to the decibel volume parameter, the result is linear in amplitude. A value of 0 indicates a 1 second attack time. A negative value indicates a time less than one second; a positive value a time longer than one second. The most negative number (-32768) conventionally indicates instantaneous attack. For example, an attack time of 10 msec would be 1200log2(.01) = -7973.

35 holdVolEnv
This is the time, in absolute timecents, from the end of the attack phase to the entry into decay phase, during which the Volume Envelope value is held at its peak. A value of 0 indicates a 1 second hold time. A negative value indicates a time less than one second; a positive value a time longer than one second. The most negative number (-32768) conventionally indicates no hold phase. For example, a hold time of 10 msec would be 1200log2(.01) = -7973.

36 decayVolEnv
This is the time, in absolute timecents, for a 100% change in the Volume Envelope value during decay phase. For the Volume Envelope, the decay phase linearly ramps toward the sustain level, causing a constant dB change for each time unit. If the sustain level were -100 dB, the Volume Envelope Decay Time would be the time spent in decay phase. A value of 0 indicates a 1 second decay time for a zero sustain level. A negative value indicates a time less than one second; a positive value a time longer than one second. For example, a decay time of 10 msec would be 1200log2(.01) = -7973.

37 sustainVolEnv
This is the decrease in level, expressed in centibels, over which the Volume Envelope value ramps during the decay phase. For the Volume Envelope, the sustain level is best expressed in cB of attenuation from full scale.
A value of 0 indicates the sustain level is full level; this implies a zero duration of decay phase regardless of decay time. A positive value indicates a decay to the corresponding level. Values less than zero are to be interpreted as zero; conventionally 1000 indicates full attenuation. For example, a sustain level which corresponds to an absolute value 12 dB below peak would be 120.

38 releaseVolEnv
This is the time, in absolute timecents, for a 100% change in the Volume Envelope value during release phase. For the Volume Envelope, the release phase linearly ramps toward zero from the current level, causing a constant dB change for each time unit. If the current level were full scale, the Volume Envelope Release Time would be the time spent in release phase until -100 dB attenuation were reached. A value of 0 indicates a 1 second release time for a release from full level. A negative value indicates a time less than one second; a positive value a time longer than one second. For example, a release time of 10 msec would be 1200log2(.01) = -7973.

39 keynumToVolEnvHold
This is the degree, in timecent per keynumber units, to which the hold time of the Volume Envelope is decreased by increasing MIDI key number. The hold time at key number 60 is always unchanged. The unit scaling is such that a value of 100 provides a hold time which tracks the keyboard; that is, an upward octave causes the hold time to halve. For example, if the Volume Envelope Hold Time were -7973 = 10 msec and the Key Number to Vol Env Hold were 50 when key number 36 was played, the hold time would be 20 msec.

40 keynumToVolEnvDecay
This is the degree, in timecent per keynumber units, to which the decay time of the Volume Envelope is decreased by increasing MIDI key number. The decay time at key number 60 is always unchanged. The unit scaling is such that a value of 100 provides a decay time which tracks the keyboard; that is, an upward octave causes the decay time to halve.
For example, if the Volume Envelope Decay Time were -7973 = 10 msec and the Key Number to Vol Env Decay were 50 when key number 36 was played, the decay time would be 20 msec.

41 instrument
This is the index into the INST subchunk providing the instrument to be used for the current layer. A value of zero indicates the first instrument in the list. The value should never exceed two less than the size of the instrument list. The instrument enumerator is the terminal generator for PGEN layers. As such, it should only appear in the PGEN subchunk, and it must appear as the last generator enumerator in all but the global layer.

42 reserved1
Unused, reserved. Should be ignored if encountered.

43 keyRange
This is the minimum and maximum MIDI key number values for which this preset, layer, instrument or split is active. The LS byte indicates the highest and the MS byte the lowest valid key. The keyRange enumerator is optional, but when it does appear, it must be the first generator in the preset layer or instrument split generator list.

44 velRange
This is the minimum and maximum MIDI velocity values for which this preset, layer, instrument or split is active. The LS byte indicates the highest and the MS byte the lowest valid velocity. The velRange enumerator is optional, but when it does appear, it must be preceded only by keyRange in the preset layer or instrument split generator list.

45 startloopAddrsCoarseOffset
The offset, in 32768 sample data point increments, beyond the Startloop sample header parameter to the first sample data point to be repeated in this instrument's loop. This parameter is added to the startloopAddrsOffset parameter. For example, if Startloop were 5, startloopAddrsOffset were 3 and startloopAddrsCoarseOffset were 2, the first sample data point in the loop would be sample data point 65544.

46 keynum
This enumerator forces the MIDI key number to effectively be interpreted as the value given. This generator can only appear at the instrument level.
Valid values are from 0 to 127.

47 velocity
This enumerator forces the MIDI velocity to effectively be interpreted as the value given. This generator can only appear at the instrument level. Valid values are from 0 to 127.

48 initialAttenuation
This is the attenuation, in centibels, by which a note is attenuated below full scale. A value of zero indicates no attenuation; the note will be played at full scale. For example, a value of 60 indicates the note will be played at 6 dB below full scale for the note.

49 reserved2
Unused, reserved. Should be ignored if encountered.

50 endloopAddrsCoarseOffset
The offset, in 32768 sample data point increments, beyond the Endloop sample header parameter to the sample data point considered equivalent to the Startloop sample data point for the loop for this instrument. This parameter is added to the endloopAddrsOffset parameter. For example, if Endloop were 5, endloopAddrsOffset were 3 and endloopAddrsCoarseOffset were 2, sample data point 65544 would be considered equivalent to the Startloop sample data point, and hence sample data point 65543 would effectively precede Startloop during looping.

51 coarseTune
This is a pitch offset, in semitones, which should be applied to the note. A positive value indicates the sound is reproduced at a higher pitch; a negative value indicates a lower pitch. For example, a Coarse Tune value of -4 would cause the sound to be reproduced four semitones flat.

52 fineTune
This is a pitch offset, in cents, which should be applied to the note. It is additive with coarseTune. A positive value indicates the sound is reproduced at a higher pitch; a negative value indicates a lower pitch. For example, a Fine Tuning value of -5 would cause the sound to be reproduced five cents flat.

53 sampleID
This is the index into the SHDR subchunk providing the sample to be used for the current split. A value of zero indicates the first sample in the list.
The value should never exceed two less than the size of the sample list. The sampleID enumerator is the terminal generator for IGEN splits. As such, it should only appear in the IGEN subchunk, and it must appear as the last generator enumerator in all but the global split.

54 sampleModes
This enumerator indicates a value which gives a variety of Boolean flags describing the sample for the current instrument split. The sampleModes should only appear in the IGEN subchunk, and should not appear in the global split. The two LS bits of the value indicate the type of loop in the sample: 0 indicates a sound reproduced with no loop, 1 indicates a sound which loops continuously, 2 is unused but should be interpreted as indicating no loop, and 3 indicates a sound which loops for the duration of key depression then proceeds to play the remainder of the sample.

55 reserved3
Unused, reserved. Should be ignored if encountered.

56 scaleTuning
This parameter represents the degree to which MIDI key number influences pitch. A value of zero indicates that MIDI key number has no effect on pitch; a value of 100 represents the usual tempered semitone scale.

57 exclusiveClass
This parameter provides the capability for a key depression in a given instrument to terminate the playback of other instruments. This is particularly useful for percussive instruments such as a hihat cymbal. An exclusive class value of zero indicates no exclusive class; no special action is taken. Any other value indicates that when this note is initiated, any other sounding note with the same exclusive class value should be rapidly terminated. The exclusive class generator can only appear at the instrument level. The scope of the exclusive class is the entire preset. In other words, any other instrument split within the same preset holding a corresponding exclusive class will be terminated.
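As an illustration of the sampleModes loop flags described under generator 54, the two LS bits might be decoded as in the minimal sketch below. The names sf_loop_type and sf_decode_loop_type are hypothetical and not part of the specification, and the sketch assumes the generator amount has already been read into an unsigned 16 bit value.

```c
#include <stdint.h>

/* Loop behaviour carried in the two least significant bits of sampleModes.
 * The type and function names are illustrative only, not spec identifiers. */
typedef enum {
    SF_LOOP_NONE,           /* values 0 and 2: sound reproduced with no loop   */
    SF_LOOP_CONTINUOUS,     /* value 1: sound loops continuously               */
    SF_LOOP_UNTIL_RELEASE   /* value 3: loops while the key is depressed, then
                               plays the remainder of the sample               */
} sf_loop_type;

static sf_loop_type sf_decode_loop_type(uint16_t sample_modes)
{
    /* Only the two LS bits select the loop type; other bits are ignored here. */
    switch (sample_modes & 0x3u) {
    case 1:  return SF_LOOP_CONTINUOUS;
    case 3:  return SF_LOOP_UNTIL_RELEASE;
    default: return SF_LOOP_NONE;   /* 0, and 2 which is unused */
    }
}
```

A loader would typically call sf_decode_loop_type once per instrument split and cache the result alongside the loop points.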
58 overridingRootKey
This parameter represents the MIDI key number at which the sample is to be played back at its original sample rate. If not present, or if present with a value of -1, then the sample header parameter Original Key is used in its place. If it is present in the range 0-127, then the indicated key number will cause the sample to be played back at its sample header Sample Rate. For example, if the sample were a recording of a piano middle C (Original Key = 60) at a sample rate of 22.050 kHz, and Root Key were set to 69, then playing MIDI key number 69 (A above middle C) would cause a piano note of pitch middle C to be heard.

59 unused5
Unused, reserved. Should be ignored if encountered.

60 endOper
Unused, reserved. Should be ignored if encountered. Unique name provides value to end of defined list.

8.1.3 Generator Summary

The following table gives the ranges and default values for all SoundFont 2.00 defined generators.

#   Name                         Unit        Abs Zero   Min     Useful   Max    Useful   Default  Value
0   startAddrsOffset+            smpls       0          0       None     *      *        0        None
1   endAddrsOffset+              smpls       0          *       *        0      None     0        None
2   startloopAddrsOffset+        smpls       0          *       *        *      *        0        None
3   endloopAddrsOffset+          smpls       0          *       *        *      *        0        None
4   startAddrsCoarseOffset+      32k smpls   0          0       None     *      *        0        None
5   modLfoToPitch                cent fs     0          -12000  -10 oct  12000  10 oct   0        None
6   vibLfoToPitch                cent fs     0          -12000  -10 oct  12000  10 oct   0        None
7   modEnvToPitch                cent fs     0          -12000  -10 oct  12000  10 oct   0        None
8   initialFilterFc              cent        8.176 Hz   1500    20 Hz    13500  20 kHz   13500    Open
9   initialFilterQ               cB          0          0       None     960    96 dB    0        None
10  modLfoToFilterFc             cent fs     0          -12000  -10 oct  12000  10 oct   0        None
11  modEnvToFilterFc             cent fs     0          -12000  -10 oct  12000  10 oct   0        None
12  endAddrsCoarseOffset+        32k smpls   0          *       *        0      None     0        None
13  modLfoToVolume               cB fs       0          -960    -96 dB   960    96 dB    0        None
15  chorusEffectsSend            0.1%        0          0       None     1000   100%     0        None
16  reverbEffectsSend            0.1%        0          0       None     1000   100%     0        None
17  pan                          0.1%        Center     -500    Left     +500   Right    0        Center
21  delayModLFO                  timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
22  freqModLFO                   cent        8.176 Hz   -16000  1 mHz    4500   100 Hz   0        8.176 Hz
23  delayVibLFO                  timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
24  freqVibLFO                   cent        8.176 Hz   -16000  1 mHz    4500   100 Hz   0        8.176 Hz
25  delayModEnv                  timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
26  attackModEnv                 timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
27  holdModEnv                   timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
28  decayModEnv                  timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
29  sustainModEnv                -0.1%       attk peak  0       100%     1000   0%       0        attk pk
30  releaseModEnv                timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
31  keynumToModEnvHold           tcent/key   0          -1200   -oct/ky  1200   oct/ky   0        None
32  keynumToModEnvDecay          tcent/key   0          -1200   -oct/ky  1200   oct/ky   0        None
33  delayVolEnv                  timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
34  attackVolEnv                 timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
35  holdVolEnv                   timecent    1 sec      -12000  1 msec   5000   20 sec   -12000   <1 msec
36  decayVolEnv                  timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
37  sustainVolEnv                cB attn     attk peak  0       0 dB     1440   144 dB   0        attk pk
38  releaseVolEnv                timecent    1 sec      -12000  1 msec   8000   100 sec  -12000   <1 msec
39  keynumToVolEnvHold           tcent/key   0          -1200   -oct/ky  1200   oct/ky   0        None
40  keynumToVolEnvDecay          tcent/key   0          -1200   -oct/ky  1200   oct/ky   0        None
43  keyRange                     MIDI ky#    key# 0     0       lo key   127    hi key   0-127    full kbd
44  velRange                     MIDI vel    0          0       min vel  127    mx vel   0-127    all vels
45  startloopAddrsCoarseOffset+  smpls       0          *       *        *      *        0        None
46  keynum+                      MIDI ky#    key# 0     0       lo key   127    hi key   -1       None
47  velocity+                    MIDI vel    0          1       min vel  127    mx vel   -1       None
48  initialAttenuation           cB          0          0       0 dB     1440   144 dB   0        None
50  endloopAddrsCoarseOffset+    smpls       0          *       *        *      *        0        None
51  coarseTune                   semitone    0          -120    -10 oct  120    10 oct   0        None
52  fineTune                     cent        0          -99     -99 cent 99     99 cent  0        None
54  sampleModes+                 Bit Flags   Flags      **      **       **     **       0        No Loop
56  scaleTuning                  cent/key    0          0       none     1200   oct/ky   100      semitone
57  exclusiveClass+              arbitrary#  0          1       --       127    --       0        None
58  overridingRootKey+           MIDI ky#    key# 0     0       lo key   127    hi key   -1       None

*  Range depends on values of start, loop, and end points in sample header.
** Range has discrete values based on bit flags.
+  This generator is only valid at the instrument split level.

8.2 Source Enumerators

In SoundFont 2.00, no modulators have yet been defined, hence no Source Enumerators have yet been established. Future Source Enumerators will include MIDI key number, MIDI key velocity, MIDI pitch bend, MIDI channel pressure, and various MIDI continuous controllers.

8.3 Transform Enumerators

In SoundFont 2.00, no modulators have yet been defined, hence no Transform Enumerators have yet been established. Future Transform Enumerators will include a linear, or "null" transform, as enumerator 0, and other transforms as required to implement the current default modulators.

8.4 Default Modulators

In SoundFont 2.00, although no modulators have yet been defined, certain default modulation paths are expected to be implemented as standard. In a future revision to SoundFont 2, these default modulators will be implemented within the modulator structure, allowing variations from the defaults. The current default modulators are described below:

8.4.1 MIDI Key Velocity to Initial Attenuation

The MIDI key velocity is passed through a "MIDI velocity to dB" transform to change it to the standard dB volume attenuation curve for MIDI velocity. A mathematical model for this transform will be included in a future version of this specification. The transformed result is then added to the initial attenuation generator.

8.4.2 MIDI Key Velocity to Filter Cutoff

The MIDI key velocity is passed through a "MIDI velocity to Fc" transform. A mathematical model for this transform will be included in a future version of this specification. The resulting value is added to the initial filter cutoff. This default modulator does not occur unless the volume envelope attack time is less than 7 msec.
8.4.3 MIDI Channel Pressure to Vibrato LFO to Pitch

The MIDI Channel Pressure value is divided by 128 to give a scaled value from zero to 127/128. This value is then multiplied by 18 and added to the Vibrato LFO to Pitch generator to give an increase in vibrato depth of 18 cents full scale.

8.4.4 MIDI Continuous Controller 1 to Vibrato LFO to Pitch

The MIDI Continuous Controller 1 value is divided by 128 to give a scaled value from zero to 127/128. This value is then multiplied by 18 and added to the Vibrato LFO to Pitch generator to give an increase in vibrato depth of 18 cents full scale.

8.4.5 MIDI Continuous Controller 7 to Initial Attenuation

The MIDI Continuous Controller 7 value is passed through a "MIDI velocity to dB" transform to change it to the standard dB volume attenuation curve for MIDI velocity. A mathematical model for this transform will be included in a future version of this specification. The transformed result is then added to the initial attenuation generator.

8.4.6 MIDI Continuous Controller 10 to Pan Position

The MIDI Continuous Controller 10 value is added to -64, and the result divided by 64 to produce a scaled value ranging from -1 to +63/64. This value is then multiplied by 1000 and added to the pan generator to give a change in pan position from -100% to +100%.

8.4.7 MIDI Continuous Controller 11 to Initial Attenuation

The MIDI Continuous Controller 11 value is passed through a "MIDI expression" transform to change it to the standard dB volume attenuation curve for MIDI CC11. A mathematical model for this transform will be included in a future version of this specification. The transformed result is then added to the initial attenuation generator.

8.4.8 MIDI Continuous Controller 91 to Reverb Effects Send

The MIDI Continuous Controller 91 value is divided by 128 to give a scaled value from zero to 127/128.
This value is then multiplied by 1000 and added to the reverbEffectsSend generator to give a change in reverb send from zero to +100%.

8.4.9 MIDI Continuous Controller 93 to Chorus Effects Send

The MIDI Continuous Controller 93 value is divided by 128 to give a scaled value from zero to 127/128. This value is then multiplied by 1000 and added to the chorusEffectsSend generator to give a change in chorus send from zero to +100%.

8.5 Precedence and Absolute and Relative Values

Most SoundFont generators are available at both the Instrument and Preset Levels, as well as having a default value. Generators at the Instrument Level are considered "absolute" and determine an actual physical value for the associated synthesis parameter, which is used instead of the default. For example, a value of 1200 for the attackVolEnv generator would produce an absolute time of 1200 timecents or 2 seconds of attack time for the volume envelope, instead of the default value of -12000 timecents or 1 msec.

Generators at the Preset Level are instead considered "relative" and additive to all the default or instrument level generators within the Preset Layer. For example, a value of 2400 timecents for the attackVolEnv generator in a preset layer containing an instrument with two splits, one with the default attackVolEnv and one with an absolute attackVolEnv generator value of 1200 timecents, would cause the default split to actually have a value of -9600 timecents or 4 msec, and the other to have a value of 3600 timecents or 8 seconds attack time.

There are some generators which are not available at the Preset Level. These are:

#   Name
0   startAddrsOffset
1   endAddrsOffset
2   startloopAddrsOffset
3   endloopAddrsOffset
4   startAddrsCoarseOffset
12  endAddrsCoarseOffset
45  startloopAddrsCoarseOffset
46  keynum
47  velocity
50  endloopAddrsCoarseOffset
54  sampleModes
57  exclusiveClass
58  overridingRootKey

If these generators are encountered in the Preset Level, they should be ignored.
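The precedence rules of section 8.5 can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function names and the has_inst / has_preset flags are hypothetical, and the sketch covers only ordinary additive generators. Range generators (keyRange, velRange) and the index generators (instrument, sampleID) are assumed to be handled elsewhere.

```c
#include <stdbool.h>

/* Generators which section 8.5 lists as not available at the Preset Level. */
static bool sf_gen_ignored_at_preset_level(int gen)
{
    switch (gen) {
    case 0: case 1: case 2: case 3: case 4: case 12:
    case 45: case 46: case 47: case 50: case 54: case 57: case 58:
        return true;
    default:
        return false;
    }
}

/* Effective value of one generator for one instrument split.
 *   default_value : default taken from the Generator Summary
 *   inst_value    : absolute value from the split, used if has_inst
 *   preset_value  : relative value from the preset layer, used if has_preset
 * All names here are illustrative assumptions. */
static int sf_effective_value(int gen, int default_value,
                              bool has_inst, int inst_value,
                              bool has_preset, int preset_value)
{
    /* An instrument level generator replaces the default outright. */
    int value = has_inst ? inst_value : default_value;

    /* A preset level generator is added, unless it must be ignored. */
    if (has_preset && !sf_gen_ignored_at_preset_level(gen))
        value += preset_value;

    return value;
}
```

With attackVolEnv (generator 34, default -12000), a preset value of 2400 gives -9600 for a split without its own attackVolEnv and 3600 for a split whose absolute value is 1200, matching the example in the text. Any resulting out of range value would still need the handling described in section 10.5.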
9 Parameters and Synthesis Model

The SoundFont 2 standard has been established with the intent of providing support for an expanding base of wavetable based synthesis models. The model supported by the SoundFont 2 specification originates with the EMU8000 wavetable synthesizer chip. The description below of the underlying synthesis model and the associated parameters is provided to allow mapping of this synthesis model onto other hardware platforms.

9.1 Synthesis Model

The SoundFont 2 specification Synthesis Model comprises a wavetable oscillator, a dynamic lowpass filter, an enveloping amplifier, and programmable sends to pan, reverb, and chorus effects units. An underlying modulation engine comprises two low frequency oscillators (LFOs) and two envelope generators with appropriate routing amplifiers.

9.1.1 Wavetable Oscillator

The SoundFont 2 specification Wavetable oscillator model is capable of playing back a sample at an arbitrary sampling rate with an arbitrary pitch shift. In practice, the upward pitch shift (downward sample rate conversion) will be limited to a maximum value, typically at least two octaves. The pitch is described in terms of an initial pitch shift which is based on the sample's sampling rate, the root key at which the sample should be unshifted on the keyboard, the coarse, fine, and correction tunings, the effective MIDI key number, and the keyboard scale factor. All modulations in pitch are in octaves, semitones, and cents.

9.1.2 Sample Looping

The wavetable oscillator plays a digital sample which is described in terms of a start point, end point, and two points describing a loop. The sound can be flagged as unlooped, in which case the loop points are ignored. If the sound is looped, it can be played in two ways. If it is flagged as "loop during release", the sound is played from the start point through the loop, and loops until the note becomes inaudible.
If not, the sound is played from the start point through the loop, and loops until the key is released. At this point, the next time the loop end point is reached, the sound continues through the loop end point and plays until the end point is reached, at which time audio is terminated.

9.1.3 Lowpass Filter

The synthesis model contains a resonant lowpass filter, which is characterized by a dynamic cutoff frequency and a fixed resonance (Q). Because there is tremendous variation within the industry as to filter implementations, this filter is idealized rather than being specified as a particular realization. The filter is idealized at zero resonance as having a flat passband to the cutoff frequency, then a rolloff at 6 dB per octave above that frequency. The resonance, when non-zero, comprises a peak at the cutoff frequency, superimposed on the above response. The resonance is measured as a dB ratio of the resonant peak to the DC gain. The DC gain at any resonance is half of the resonance value below the DC gain at zero resonance; hence the peak height is half the resonance value above DC gain at zero resonance. All modulations in cutoff frequency are in octaves, semitones, and cents.

Figure 1 - Ideal Filter Response

9.1.4 Final Gain Amplifier

The final gain amplifier is a multiplier on the filter output, which is controlled by an initial gain in dB. This is added to the volume envelope. Additional modulation can also be added. The gain is always specified in dB.

9.1.5 Effects Sends

The output of the final gain amplifier can be routed into the effects unit. This unit causes the sound to be located (panned) in the stereo field, and a degree of reverberation and chorus to be added. The pan is specified in terms of percentage left and right, which also could be considered as an azimuth angle. The reverb and chorus sends are specified as a percentage of the signal amplitude to be sent to these units, from 0% to 100%.
9.1.6 Low Frequency Oscillators

The synthesis model provides for two low frequency oscillators (LFOs) for modulating pitch, filter cutoff, and amplitude. The "vibrato" LFO is only capable of modulating pitch. The "modulation" LFO can modulate any of the three parameters. An LFO is defined as having a delay period during which its value remains zero, followed by a triangular waveshape ramping linearly to positive one, then downward to negative one, then upward again to positive one, etc. Each parameter can be modulated to a varying degree, either positively or negatively, by the associated LFO. Modulations of pitch and cutoff are in octaves, semitones, and cents, while modulations of amplitude are in dB. The degree of modulation is specified in cents or dB for the full scale positive LFO excursion.

9.1.7 Envelope Generators

The synthesis model provides for two envelope generators. The volume envelope generator controls the final gain amplifier and hence determines the volume contour of the sound. The modulation envelope can control pitch and/or filter cutoff. An envelope generates a control signal in six phases. When key-on occurs, a delay period begins during which the envelope value is zero. The envelope then rises in a convex curve to a value of one during the attack phase. When a value of one is reached, the envelope enters a hold phase during which it remains at one. When the hold phase ends, the envelope enters a decay phase during which its value decreases linearly to a sustain level. When the sustain level is reached, the envelope enters sustain phase, during which the envelope stays at the sustain level. Whenever a key-off occurs, the envelope immediately enters a release phase during which the value linearly ramps from the current value to zero. When zero is reached, the envelope value remains at zero. Modulation of pitch and filter cutoff are in octaves, semitones, and cents.
These parameters can be modulated to a varying degree, either positively or negatively, by the modulation envelope. The degree of modulation is specified in cents for the full scale attack peak. The volume envelope operates in dB, with the attack peak providing a full scale output, appropriately scaled by the initial volume. The zero value, however, is actually zero gain. The implementation in the EMU8000 provides for 96 dB of amplitude control. When 96 dB of attenuation is reached in the final gain amplifier, an abrupt jump to zero gain (infinite dB of attenuation) occurs. In a 16 bit system, this jump is inaudible.

9.1.8 Modulation Interconnection Summary

The following diagram shows the interconnections expressed in the SoundFont 2 specification synthesis model:

Figure 2 - Modulation Structure

9.2 MIDI Functions

The response to certain MIDI commands is defined within the MIDI spec, and is not considered to be part of the SoundFont 2 specification. For completeness, the expected response is given here.

MIDI key number to Pitch - Relative to the Root Key as determined by the sample header byOriginalPitch and the Instrument Split overridingRootKey, keynumber varies the pitch by one cent times the scaleTuning generator per keynumber.

MIDI Pitch Bend - Changes the pitch of all notes in the MIDI channel as specified by the sensitivity set by the MIDI synthesizer mode or by using MIDI Registered Parameters.

MIDI CC64 Sustain - ACTIVE when greater than or equal to 64. When the sustain function is active, all notes in the key-on state remain in the key-on state regardless of whether a key-off command for the note arrives. The key-off commands are stored, and when sustain becomes inactive, all stored key-off commands are executed.

MIDI CC66 Soft - ACTIVE when greater than or equal to 64. When active, all new key-ons are modulated in such a way to make the note sound "soft." This typically affects initial attenuation and filter cutoff in a pre-defined manner.
MIDI CC67 Sostenuto - ACTIVE when greater than or equal to 64. When sostenuto becomes active, all notes currently in the key-on state remain in the key-on state until the sostenuto becomes inactive. All other notes behave normally. Notes maintained by sostenuto in key-on state remain in key-on state even if sustain is switched on and off.

9.3 Parameter Units

The units with which SoundFont generators are described are all well defined. The strict definitions appear below:

ABSOLUTE SAMPLE DATA POINTS - A numeric index of 16 bit sample data point words as stored in ROM or supplied in the smpl-ck, indexing the first sample data point word of memory or the chunk as zero.

RELATIVE SAMPLE DATA POINTS - A count of 16 bit sample data point words based on an absolute sample data point reference. A negative value implies a relative count toward the beginning of the data.

ABSOLUTE SEMITONES - An absolute logarithmic measure of frequency based on a reference of MIDI key numbers. A semitone is 1/12 of an octave, and value 69 is 440 Hz (A-440). Negative values and values above 127 are allowed.

RELATIVE SEMITONES - A relative logarithmic measure of frequency ratio based on units of 1/12 of an octave, which is the twelfth root of two, approximately 1.059463094.

ABSOLUTE CENTS - An absolute logarithmic measure of frequency based on a reference of MIDI key number scaled by 100. A cent is 1/1200 of an octave, and value 6900 is 440 Hz (A-440). Negative values and values above 12700 are allowed.

RELATIVE CENTS - A relative logarithmic measure of frequency ratio based on units of 1/1200 of an octave, which is the twelve hundredth root of two, approximately 1.000577790.

ABSOLUTE CENTIBELS - An absolute measure of the attenuation of a signal, based on a reference of zero being no attenuation. A centibel is a tenth of a decibel, or a ratio in signal amplitude of the two hundredth root of 10, approximately 1.011579454.

RELATIVE CENTIBELS - A relative measure of the attenuation of a signal.
A centibel is a tenth of a decibel, or a ratio in signal amplitude of the two hundredth root of 10, approximately 1.011579454.

ABSOLUTE TIMECENTS - An absolute measure of time, based on a reference of zero being one second. A timecent represents a ratio in time of the twelve hundredth root of two, approximately 1.000577790.

RELATIVE TIMECENTS - A relative measure of time ratio, based on a unit size of the twelve hundredth root of two, approximately 1.000577790.

ABSOLUTE PERCENT - An absolute measure of gain, based on a reference of unity. In SoundFont 2, absolute percent is measured in 0.1% units, so a value of zero is 0% and a value of 1000 is 100%.

RELATIVE PERCENT - A relative measure of gain difference. In SoundFont 2, relative percent is measured in 0.1% units. When the gain goes below zero, zero is assumed; when the gain exceeds 100%, 100% is used.

9.4 On Implementation Accuracy

While the SoundFont 2 standard is well defined, it must be recognized that there is a large variety of practices and features within the wavetable music synthesis industry that are not conducive to exact implementation of the specification as defined. Some examples of impediments include the order of interpolation of sample data points, the exact shape and number of segments of envelopes, the filter implementation, and the details of the implementation of loops. Additionally, all real implementations are likely to have less accuracy than the SoundFont 2 standard itself. The units for the standard have been chosen to exceed the accuracy required for high fidelity applications. It should be recognized that in rendering a SoundFont 2 compatible file, a best practical reproduction is all that is expected. As such, implementers of SoundFont 2 compatible rendering engines will have to determine, based on their own perceptual criteria, the degree to which their implementation meets the standard. Approximations may take a variety of forms.
In many cases, the resolution of the rendering engine will be less than that of the corresponding SoundFont unit. Also, it will frequently be the case that a line segment approximation will be made to a continuous curve. In the case of filters, the order of the filter may vary from the SoundFont 2 standard, and an optimum audible equivalent will have to be heuristically constructed. All such problems are left to the ingenuity of the implementers.

10 Error Handling

10.1 Structural Errors

Structural Errors are errors which are determined from the implicit redundancy of the SoundFont RIFF file structure, and indicate that the structure is not intact. Examples are incorrect lengths for the chunks or subchunks, pointers out of valid range, or missing required chunks or subchunks for which no error correction procedure exists. In all cases, files should be checked for structural errors at load time, and if any are found the files should be rejected. Separate tools or options can be used to "repair" structurally defective files, but these tools should validate that the reconstructed file is not only a valid SoundFont compatible bank but also complies with the intended timbral results in all cases.

10.2 Unknown Chunks

In parsing the RIFF structure, unknown but well formed chunks or subchunks may be encountered. Unknown chunks within the INFO-list chunk should simply be ignored. Other unknown chunks or subchunks are illegal and should be treated as structural errors.

10.3 Unknown Enumerators

Unknown enumerators may be encountered in Generators, Modulator Sources, or Transforms. This is to be expected if the ifil field exceeds the specification to which the application was written. Even if unexpected, unknown enumerators should simply cause the associated Generator or Modulator to be ignored.

10.4 Illegal Parameter Values

Some SoundFont parameters are defined for only a limited range of the possible values which can be expressed in their field.
If the value of the field is not in the defined range, the parameter has an illegal value. Illegal values may be detected either at load or at run time. If detected at load time, the file may optionally be rejected as structurally unsound. If detected at run time, the default value for the parameter should be used if the parameter is required, or the entire Generator or Modulator ignored if it is optional. Certain parameters may have more specific procedures for illegal values as expressed elsewhere in this specification.

10.5 Out-of-range Values

SoundFont parameters have a specified minimum and maximum useful range that spans the perceptually relevant values for the associated sonic property. When the parameter value exceeds this useful range, the parameter is said to have an out of range value. Out of range values can result from two distinct causes. An out of range value can be actually present as a SoundFont generator value, or the out of range value can be the result of the summation of instrument and preset values. Out of range values should be handled by substituting the nearest perceptually relevant or realizable value. SoundFont compatible banks should not be created with out of range values in the instrument generators. While it is acceptable practice to create SoundFont banks which produce out of range values as a result of summation, it is undesirable and should be avoided where practical.

10.6 Missing Required Parameter or Terminator

Certain parameters and terminators are required by the SoundFont specification. If these are missing, the file is technically not within specification. If such a problem is detected at load time, the file may optionally be rejected as structurally unsound. If detected at run time, the instrument or split for which the required parameter is missing should simply be ignored. If this causes no sound, the corresponding key-on event is ignored.

10.7 Illegal Enumerator

Certain enumerators are illegal in certain contexts.
For example, key and velocity ranges must be the first generators in a layer or split, instruments are not allowed in splits, and sampleIDs are not allowed in layers. If such a problem is detected at load time, the file may optionally be rejected as structurally unsound. If detected at run time, the enumerator should simply be ignored.

11 Silicon SoundFonts

11.1 Silicon SoundFont Overview

A "Silicon SoundFont Bank" is an implementation of a SoundFont compatible bank realized in non-volatile memory with slight format additions. On initialization of a system using a Silicon SoundFont, the host processor navigates the Silicon SoundFont ROM format in sample memory space, determines the number of SoundFont Banks installed, and, when appropriate, reads the articulation data of the SoundFont files out of the Preset Data Chunks into its local RAM. The sample headers in the Silicon SoundFont point to the sample address offsets relative to the start of the Sample Chunk in the SoundFont compatible bank. The loader adds the appropriate offset to the sample addresses as part of its data management. Then, the system operates like any other SoundFont compatible system.

The format of a Silicon SoundFont file intended to be burned into non-volatile memory is a hybrid between a standard ROM header and a modification of the standard SoundFont compatible bank file format. The ROM header contains data used for diagnostic tests, a ROM name, a size, checksum information, and a sine wave sample to test audio outputs of a circuit. This is the first block of data found in the SoundFont ROM (address 0). The structure of the data contained in the ROM header is shown below.

Because sample memory space is word oriented, the endian nature of the resulting word reads is processor independent. However, the organization of bytes within a word, or words within a doubleword, may vary depending on both the way the data has been encoded in the ROM and the endian nature of the processor.
To handle all eventualities, it is recommended that the initialization software recognize and adapt for endian variations.

11.2 Silicon SoundFont ROM Header Format

    typedef struct romHdrType {
        DWORD romRsrc;                  // unused
        DWORD romByteSize;              // ROM size in bytes
        CHAR  interleaveIndex;          // for use in case of interleaved ROMs
        CHAR  revision[3];              // for revision control
        CHAR  id[4];                    // matched with the IROM chunk in SF file format
        SHORT checksum;                 // to check ROM integrity
        SHORT checksum2sComplement;     // for updating checksum variable w/o changing
                                        // file checksum value
        CHAR  bankFormat;               // unused
        CHAR  product[16];              // product name (either system or SoundFont)
        BYTE  sampleCompType;           // indicates type of sample precompensation used
        CHAR  filler1[2];               // future use
        CHAR  style[16];                // sound library style
        CHAR  copyright[80];            // copyright notice
        DWORD sampleStart;              // beginning byte address of the SoundFont bank
        DWORD sineWaveStart;            // beginning byte address of the sine wave sample
        DWORD filler2[124];             // future use
        SHORT sineWave[SINEWAVESIZE];   // sine wave sample data
    } romHdr;

12 Glossary

absolute - Describes a parameter which gives a definitive real-world value. Contrast to relative.
additive - Describes a parameter which is to be numerically added to another parameter.
articulation - The process of modulation of amplitude, pitch, and timbre to produce an expressive musical note.
artifact - A (typically undesirable) sonic event which is recognizable as not being present in the original sound.
attack - That phase of an envelope or sound during which the amplitude increases from zero to a peak value.
attenuation - A decrease in volume or amplitude of a signal.
AWE32 - The original Creative Technology Sound Blaster product which contained an EMU8000 wavetable synthesizer and supported the SoundFont standard.
bag - A SoundFont data structure element containing a list of layers (preset bag) or splits (instrument bag).
balance - A form of stereo volume control in which both left and right channels are at maximum when the control is centered, and which attenuates only the opposite channel when taken to either extreme.
bank - A collection of presets. See also MIDI bank.
bidirectional compatibility - Simultaneous upward and downward compatibility. This refers to the fact that a properly designed SoundFont compatible program can appropriately handle files written to either a lower or higher revision of the specification.
big endian - Refers to the organization in memory of bytes within a word such that the most significant byte occurs at the lowest address. Contrast "little endian."
byte - A data structure element of eight bits without definition of meaning to those bits.
BYTE - A data structure element of eight bits which contains an unsigned value from 0 to 255.
case-insensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as identical. Contrast "case-sensitive."
case-sensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as distinct. Contrast "case-insensitive."
cent - A unit of pitch ratio corresponding to the twelve hundredth root of two, or one hundredth of a semitone, approximately 1.000577790.
centibel - A unit of amplitude ratio corresponding to the two hundredth root of ten, or one tenth of a decibel, approximately 1.011579454.
CHAR - A data structure of eight bits which contains a signed value from -128 to +127.
chorus - An effects processing algorithm which involves cyclically shifting the pitch of a signal and remixing it with itself to produce a time varying comb filter, giving a perception of motion and fullness to the resulting sound.
chunk - The top level division of a RIFF file.
convex - A curve which is bowed in such a way that it is steeper on its lower portion.
cutoff frequency - The frequency of a filter function at which the attenuation reaches a specified value.
data points - The individual values comprising a sample. Sometimes also called sample points. Contrast "sample."
decay - The portion of an envelope or sound during which the amplitude declines from a peak to steady state value.
decibel - A unit of amplitude ratio corresponding to the twentieth root of ten, approximately 1.122018454.
delay - The portion of an envelope or LFO function which elapses from a key-on event until the amplitude becomes non-zero.
destination - The generator to which a modulator is applied.
DC gain - The degree of amplification or attenuation a system presents to a static or zero frequency signal.
digital audio - Audio represented as a sequence of quantized values spaced evenly over time. The values are called "sample data points."
doubleword - A data structure element of 32 bits without definition of meaning to those bits.
downloadable - Said of samples which are loaded from a file into RAM, in contrast to samples which are maintained in ROM.
dry - Refers to audio which has not received any effects processing such as reverb or chorus.
DWORD - A data structure of 32 bits which contains an unsigned value from zero to 4,294,967,295.
EMU8000 - A wavetable synthesizer chip designed by E-mu Systems for use in Creative Technology products.
envelope - A time varying signal which typically controls the pitch, volume, and/or filter cutoff frequency of a note, and comprises multiple phases including attack, decay, sustain, and release.
enumerated - Said of a data element whose symbols correspond to particular assigned functions.
extensible - Said of a format whose feature set can be expanded without impact on existing function.
flat - A. Said of a tone that is lower in pitch than another reference tone. B. Said of a frequency response that does not deviate significantly from a single fixed gain over the audio range.
generator - In the SoundFont standard, a parameter which directly affects sound reproduction. Contrast with "modulator."
global - Refers to parameters which affect all associated structures. See "global layer" and "global split."
global layer - A layer whose generators and modulators affect all other layers within the preset.
global split - A split whose generators and modulators affect all other splits within the instrument.
header - A data structure element which describes several aspects of a SoundFont element.
hydra - A. A nine-headed mythical beast. B. The nine "pdta" subchunks which make up the SoundFont articulation data.
instrument - In the SoundFont standard, a collection of splits which represents the sound of a single musical instrument or sound effect set.
instrument split - A sample and associated articulation data defined to play over certain key numbers and velocities. Also simply called a split.
interpolator - A circuit or algorithm which computes intermediate points between existing sample data points. This is of particular use in the pitch shifting operation of a wavetable synthesizer, in which these intermediate points represent the output samples of the waveform at the desired pitch transposition.
key number - See MIDI key number.
layer - A subset of a preset containing generators, modulators, and an instrument. Also termed "preset layer."
level - In the SoundFont structure, this refers either to the preset and layers (the preset level) or the instrument and splits (the instrument level).
LFO - Acronym for Low Frequency Oscillator. A slow periodic modulation source.
linear coding - The most common method of encoding amplitudes in digital audio in which each step is of equal size.
little endian - A method of ordering bytes within larger words in memory in which the least significant byte is at the lowest address. Contrast "big endian."
loop - In wavetable synthesis, a portion of a sample which is repeated many times to increase the duration of the resulting sound.
loop points - The sample data points at which a loop begins and ends.
lowpass - Said of a filter which attenuates high frequencies but does not attenuate low frequencies.
monotonic - Continuously increasing or decreasing. Said of a sequence which never reverses direction.
MIDI - Acronym for Musical Instrument Digital Interface. The standard protocol for sending performance information to a musical synthesizer.
MIDI bank - A group of up to 128 presets selected by a MIDI "change bank" command.
MIDI continuous controller - A construct in the MIDI protocol.
MIDI key number - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies the key of the musical instrument keyboard to which the command refers.
MIDI pitch bend - A special MIDI construct akin to the MIDI continuous controllers which controls the realtime value of the pitch of all notes played in a MIDI channel.
MIDI preset - A "preset" selected to be active in a particular MIDI channel by a MIDI "change preset" command.
MIDI velocity - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies the speed with which the key was pressed or released.
modulator - In the SoundFont standard, a set of parameters which affect a particular generator. Contrast with "generator."
mono - Short for "monophonic." Indicates a sound comprising only one channel or waveform. Contrast with "stereo."
octave - A factor of two in ratio, typically applied to pitch or frequency.
orphan - Said of a data structure which under normal circumstances is referenced by a higher level, but in this particular instance is no longer linked. Specifically, it is an instrument which is not referenced by any preset layer, or a sample which is not referenced by any instrument split.
oscillator - In wavetable synthesis, the wavetable interpolator is considered an oscillator.
pan - Short for "panorama." This is the control of the apparent azimuth of a sound source over 180 degrees from left to right. It is generally implemented by varying the volume at the left and right speakers.
pitch - The perceived value of frequency. Generally can be used interchangeably with frequency.
pitch shift - A change in pitch. Wavetable synthesis relies on interpolators to cause pitch shift in a sample to produce the notes of the scale.
pole - A mathematical term used in filter transform analysis. Traditionally in synthesis, a pole is equated with a rolloff of 6dB per octave, and the rolloff of a filter is specified in "poles."
Preditor - E-mu Systems' proprietary SoundFont 2.00 compatible bank editing software.
preset - A keyboard full of sound. Typically the collection of samples and articulation data associated with a particular MIDI preset number.
preset layer - A subset of a preset containing generators, modulators, and an instrument. Also simply termed a "layer."
proximal - Closest to. Proximal sample data points are the data points closest in either direction to the named point.
Q - A mathematical term used in filter transform analysis. Indicates the degree of resonance of the filter. In synthesis terminology, it is synonymous with resonance.
RAM - Random Access Memory. Conventionally, this term implies read-write memory. Contrast "ROM."
record - A single instance of a data structure.
relative - Describes a parameter which merely indicates an offset from an otherwise established value. Contrast to absolute.
release - The portion of an envelope or sound during which the amplitude declines from a steady state to zero value or inaudibility.
resonance - Describes the aspect of a filter in which particular frequencies are given significantly more gain than others. The resonance can be measured in dB above the DC gain.
resonant frequency - The frequency at which resonance reaches its maximum.
reverb - Short for reverberation. In synthesis, a synthetic signal processor which adds artificial spaciousness and ambience to a sound.
RIFF - Acronym for Resource Interchange File Format. The recommended form for interchange files such as SoundFont compatible files within Microsoft operating systems.
ROM - Acronym for Read Only Memory. A memory whose contents are fixed at manufacture, and hence cannot be written by the user. Contrast with RAM.
sample - This term is often used both to indicate a "sample data point" and to indicate a collection of such points comprising a digital audio waveform. The latter meaning is exclusively used in this specification.
sample rate - The frequency, in Hertz, at which sample data points are taken when recording a sample.
semitone - A unit of pitch ratio corresponding to the twelfth root of two, or one twelfth of an octave, approximately 1.059463094.
sharp - Said of a tone that is higher in pitch than another reference tone.
SHORT - A data structure element of sixteen bits which contains a signed value from -32,768 to +32,767.
soft - The pedal on a piano, so named because it causes the damper to be lowered in such a way as to soften the timbre and loudness of the notes. In MIDI, continuous controller #67 which behaves in a similar manner.
sostenuto - The pedal on a piano which causes the dampers on all keys depressed to be held until the pedal is released. In MIDI, continuous controller #66 which behaves in a similar manner.
sustain - The pedal on a piano which prevents all dampers on keys as they are depressed from being released. In MIDI, continuous controller #64 which behaves in a similar manner.
SoundFont - A registered trademark of E-mu Systems, Inc, indicating files produced by E-mu which conform to the SoundFont standard file format.
source - In a SoundFont modulator, the enumerator indicating the particular realtime value which the modulator will transform, scale, and add to the destination generator.
split - A sample and associated articulation data defined to play over certain key numbers and velocities. Also called an instrument split.
stereo - Literally indicating three dimensions. In this specification, the term is used to mean two channel stereophonic, indicating that the sound is composed of two independent audio channels, dubbed left and right. Contrast monophonic.
subchunk - A division of a RIFF file below that of the chunk.
synthesis engine - The hardware and software associated with the signal processing and modulation path for a particular synthesizer.
synthesizer - A device capable of producing ideally arbitrary musical sound.
terminator - A data structure element indicating the final element in a sequence.
timecent - A unit of duration ratio corresponding to the twelve hundredth root of two, or one twelve hundredth of an octave, approximately 1.000577790.
transform - In a SoundFont modulator, the enumerator indicating the particular transfer function through which the source will be passed prior to scaling and addition to the destination generator.
tremolo - A periodic change in amplitude of a sound, typically produced by applying a low frequency oscillator to the final volume amplifier.
triangular - A waveform which ramps upward to a positive limit, then downward at the opposite slope to the symmetrically negative limit periodically.
unpitched - Said of a sound which is not characterized by a perceived frequency. This would be true of noise-like musical instruments and of many sound effects.
velocity - In synthesis, the speed with which a keyboard key is depressed, typically proportional to the impact delivered by the musician. See also MIDI velocity.
vibrato - A periodic change in the pitch of a sound, typically produced by applying a low frequency oscillator to the oscillator pitch.
volume - The loudness or amplitude of a sound, or the control of this parameter.
wavetable - A music synthesis technique wherein musical sounds are recorded or computed mathematically and stored in a memory, then played back at a variable rate to produce the desired pitch. Additional timbral adjustments are often made to the sound thus produced using amplifiers, filters, and effect processing such as reverb and chorus.
WORD - A data structure of 16 bits which contains an unsigned value from zero to 65,535.
word - A data structure element of 16 bits without definition of meaning to those bits.

SoundFont 2.00a Technical Specification - Page 1 - Printed 12/4/95 at 13:32

Document converted to plain ASCII for inclusion in Wotsit's Format

[ASCII-art banner, not recoverable; legible text: "COMPOSER", "93!"]

Format Specifications!! by Daniel Potter
-----------------------------------------------------------------------------

Well, as you probably know, this is a pretty exciting program already. However, it would be nothing to a lot of people who would use it in their games, demos, etc, if I did not include the format specs. Besides that, I intend that this format, although it is CERTAINLY not the most efficient (it is basically a dump of the internals of the composer's editing mem), will perhaps serve as a standard 16 channel format, with ease of use on the level of a MOD.
Remember that this format is for EDITING purposes (storing EVERYTHING you're working on) so it may include information not completely necessary. You can even see into the last moments of creation of the song through some of these variables :) You may process this info as you see fit, such as the scrolltext, which is not even supported in the current version of the composer. You could simply display it on the screen, or you could be creative, and have a scroller at the top of the screen while it's playing (that's the idea, for things like musicdisks).

Farandole .FAR file (16 channel tracker) format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Header

Note that with the way the file magic(s) are set up, you can see the name of the song by TYPEing it from DOS. For example, if the song name was "StarLit MoonRise" then you would see when you typed it out:

    FARþStarLit MoonRise

In case you're teeming over with all the anxiety of all the wasted information in the header, just think back to the last time you saw a tracker that saved every bit of info about what you were doing last. Think of it as a project file in Borland C.
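As a rough illustration of the TYPE trick, here is a sketch of how a loader might check the magic and pull out the song name. It assumes the fourth magic byte is 254 (0xFE, printed as þ in this text) and that the 40-byte name is followed by the bytes 13,10,26, as listed in the header layout that follows. The name far_read_title is made up for this example; it is not taken from FARLOAD.CPP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: checks the first 47 bytes of a .FAR file and copies
   out the 40-byte song name. Returns 1 on success, 0 if the data is too
   short or the magic / end-of-file chars do not match. */
static int far_read_title(const unsigned char *buf, size_t len, char title[41])
{
    /* Assumption: the 4th magic byte is 254 (0xFE). */
    static const unsigned char magic[4] = { 'F', 'A', 'R', 0xFE };
    static const unsigned char eof_chars[3] = { 13, 10, 26 };

    assert(buf != NULL && title != NULL);
    if (len < 4 + 40 + 3)
        return 0;
    if (memcmp(buf, magic, 4) != 0)
        return 0;
    if (memcmp(buf + 44, eof_chars, 3) != 0)
        return 0;

    memcpy(title, buf + 4, 40);   /* name field is copied as stored */
    title[40] = '\0';
    return 1;
}
```

The name is returned exactly as stored; trimming any padding is left to the caller, since the padding character is not stated here.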
len      desc
-----------------------------------------------------------------------------
4        "FARþ" (file magic)
40       Song name
3        13,10,26 (bytes) (end of file chars)
2        Remaining length of header in bytes
1        Version number, major/minor upper/lower nybbles (0x10)
16       Channel ON/OFF map
1        Current editing octave
1        Current editing voice
1        Current editing row
1        Current editing pattern
1        Current editing order
1        Current editing sample
1        Current editing volume
1        Current top of screen display (top row visible)
1        Current editing area of the screen (0=sample,1=pattern,2=order)
1        Current tempo (default tempo)
16       Panning map, from 0-F for each channel
1        MarkTop (block)
1        MarkBot (block)
1        Grid granularity (default 4)
1        Edit Mode
2        Song text length
(above)  Song text of length STLen (field above)
256      Order bytes
1        Number of patterns stored in the file
1        Length of order used
1        LoopTo location
2*256    Length in bytes of each pattern - Determine number of rows stored for
         this pattern with this formula: Rows=((PatSize-2)/(16*4))
         The -2 will be explained below.

If the file is of a newer format than this one, then there might be extra stuff down here. The original header here will NEVER have anything new inserted before this space, to maintain a somewhat compatible file. The original header, described above, will always be 869 bytes long+SongText len. So you should seek up HdLen-(869+STLen) bytes after reading the header in case there is more.

Patterns

len        desc
----------------------------------------------------------------------------
1          Break location in the pattern (length in rows)
1          Tempo for this pattern. ** THIS VALUE IS *NOT* USED ANYMORE!!! **
           DO NOT SUPPORT IT!!
Rows*16*4  Pattern data-
           len  desc
           --------------------------------------------
           1    Note value - (Octave*12+Note)+1 or 0 for no note
           1    Instrument/sample value
           1    Volume - reversed byte. MSN is stored as LSN, LSN as MSN. This is for compatibility purposes. Basically, the lower nybble is the major volume adjust, upper nybble for minor adjust.
           1    Effect, upper nybble is effect, lower nybble is parameter

Currently no provisions are made in this format to remove unused channels from the file.

Sample Map

This is an array of 0-7 (8 bytes) which is a set of 64 flags. Each flag corresponds to a sample, and these flags are packed into bytes. If the bit is set, the sample record IS stored in the file. Otherwise, it is not (and therefore should NOT be read). You can check to see if it's present like this:

    if (SMap[SampleNumber/8] & (1<<(SampleNumber%8)))
        ReadSample(SampleNumber);

Now that I think about it, I wonder why I didn't just store all the samples that are used up to the last used one? Who knows.. I was tired that night :)

Samples/records

All samples are stored just like they are in FSM format on disk. Each one is header-data-header-data, etc. Here is the header format:

len    desc
---------------------------------------------------------------------------
32     Name of sample
4      Length of sample (currently only support up to 64k samples)
1      Finetune (also not supported right now)
1      Volume ... yet another unsupported feature
4      Repeat Start
4      Repeat End - If the sample is looping, this should be set to the repeat end value. Otherwise, it should be set to the length of the sample.
1      Sample Type byte
         1<<0   8/16 bit (8=0 16=1)
1      Looping mode byte
         1<<3   On=looped, Off=not looped
(len)  Sample data in SIGNED format

Info on playing-

Here are how you generate the various FX:

FEKT            Hex#    How!
----------------------------------------------------------------------------
Tempo           0xf?    Notes per second is 32/Tempo.

Pitch Adjust    0x1?    Add ?*4 values to the value you're sending to the GUS. This is based on 16 channels. If you're using more or less, then you will have to calculate the pitch through this proportion:
                            x/16 = ?/chn
                        which simplifies to chn*x=16*? or
                            x = (16*?)/chn
                        where ? is the amount, chn is the # of channels and x is the amount you add to the pitch value. Note that this effect and the one below are CUMULATIVE.

Pitch Adjust    0x2?    Do the same as above, except subtract from the val

Fine Tempo up   0xe?    Add this number to the current interrupt calls per second. Sorry, I could not figure out any other way to do it. My tempos are based on a system of 128/Tempo for finer control of other things, so this value would be added to that number instead of 32/x. So again, solve the proportion.

Fine Tempo dn   0xd?    Same as above, but subtract from tempo

Fine Tempo cnl  0xe? or Cancel fine tempo; revert ints/sec to normal value
                0xd?    for current tempo

Port to Note    0x3?    Slide from current pitch to the pitch specified on the line where the command is issued. The parameter tells in how many rows the pitch should have gotten to the destination. You can use this equation to figure a standard increment:
                            P / (intSpeed*?)
                        Where P is the pitch, intSpeed is the interrupt speed, and ? is the effect parameter. Of course an integer is not enough precision to store the increment most of the time.

Retrigger       0x4?    Repeat the current note ? times in this bar. If a drum is issued as the note, and the parameter is 0x42 then the drum should be played 2 times that bar, in evenly spaced intervals.

Set VibDepth    0x5?    Set vibrato depth. Actually, in Farandole this value is used to generate a new sin table; perhaps not the most efficient way to do it, but what the hell. The table is generated using this equation:
                            f(x)=sin(2*pi*f*t)*a
                        ..where a is the value for the effect and f=1.

Vibrato note    0x6?    Vibrato this note. Although it goes away if you stop using it, this effect when used repeatedly actually just tells FAR to continue the previous vibrato, which may span several notes depending on how large it is.

Vibrato Sust    0x9?    Is the same as above, but it doesn't stop until you reach a 0x90 command

VolSld Up       0x7?    Pushes the volume up one notch (0-F)

VolSld Dn       0x8?    Same as above, but it goes down

Port To Vol     0xA?    This uses the same method as the Port-To-Note command, but it acts on volume

Note Offset     0xC?    Pretend that you're doing an F-Retrigger command (0x4F). What you do is blank out all the notes in the retrig except the one specified here.

* Info specific to above commands in Farandole: The way I handle it is like this: My interrupt is based around 128 times a second, so I generate a table where the x domain is 0..127. You then use the ? value from the 0x6 command to skip through the table, where the table increment is ?*6. You should keep looping through the table until the vibrato commands go away. See FARTRAK.CPP for more details.

For more info, please see the sample code, FARTRAK.CPP (again, like FARLOAD.CPP this is code straight from the composer. So if there is a discrepancy, the code is correct, not this doc.)

Farandole .FSM Sample/instrument format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Header

This format is almost identical to the one described above for samples in .FAR files. Note also that this format is set up like .FAR, where you can type out the file to see the long name for it.

len    desc
----------------------------------------------------------------------------
4      "FSMþ" - File magic
32     Sample name
3      (10,13,26)
4      Length of sample (currently only support up to 64k samples)
1      Finetune (also not supported right now)
1      Volume ... yet another unsupported feature
4      Repeat Start
4      Repeat End - If the sample is looping, this should be set to the repeat end value. Otherwise, it should be set to the length of the sample.
1      Sample Type byte
         1<<0   8/16 bit (8=0 16=1)
1      Looping mode byte
         1<<3   On=looped, Off=not looped
(len)  Sample data in SIGNED format

Farandole .USM Sample/instrument format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(len)  Sample data in UNSIGNED format

Farandole .FPT Pattern format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

len        desc
---------------------------------------------------------------------------
4          "FPTþ" File magic
32         Pattern Name
3          (10,13,26)
2          PatStore array length (PatSize) (Total remaining length of file)
1          Break Location
1          Unused
PatSize-2  Pattern in raw format (just like in .FAR file)

Farandole .F2R Linear module (2.0) format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(file imported)
------------------------------------------------------------------------------

F2R (Farandole Form2.0) linear-layout digital music specifications.
By Daniel Potter/Digital Infinity.

This is the internal format we use for writing demos. Currently there are these versions of the F2R playing code:

A) A full C++ version that is rather slow, but plays all effects correctly.
B) A full ASM version that is fast, although in real mode, and plays almost no effects
C) A full ASM version in protected mode that has been extremely optimized and plays almost no effects (working on it).

These will all be available when I see that they are fit for the public eye..

You will see in a moment why I call it a linear-layout format. Everything can be read easily in one pass without mixing up too many variables. (Ex: sample data is stored with sample headers..)

Header A
--------
len    description
---    -----------
3      'F2R' - file magic
3      Composer magic. Only existing one right now is 'FAR' (farandole)
40     Song name in ASCIIZ
2      Songtext length (in bytes)
STLen  Songtext (length in previous field)
1      Song version. Current version is 0x20 (2.0)
1      Number of channels. Probably not more than 16, but up to 256.
1      Default tempo, in ticks per second.
NChan  Default panning for each channel (length in NOC field above)
1      Number of samples saved in file.

Sample Structures
-----------------
len    description
---    -----------
32     Sample name in ASCIIZ
4      Sample length (PC dword)
1      FineTune. Not currently supported.
1      Volume. Also has no purpose currently.
4      Repeat START (PC dword)
4      Repeat END (PC dword) (note that this is NOT repeat LENGTH)
1      Sample type. bit 0=1->16 bit data
Len    Sample data in signed format (length=SLen field above)

This structure repeats for the number of samples stored in the file.

Header B
--------
len    description
---    -----------
3      SectionID - 'JDC' - (see below comment)
1      Order length
1      Number of patterns stored in file
1      Loop To value (order index)
128    Order table. Blank entries padded with 0xFF

Pattern Structure
-----------------
len    description
---    -----------
3      SectionID - 'JDC'
2      Number of events stored in this pattern
4      Length of pattern in bytes (starting with next byte)

What remains is an event for each thing that is to happen on any channel. This eliminates the need for saving blank data, and thus this is currently the most efficient digital format out. Here's the format of each event:

len    description
---    -----------
1      Event type. Each bit denotes a bit of information included:
         bit  description
         ---  -----------
         0    New note pitch
         1    New instrument value
         2    Start a new note
         3    New volume
         4    Effect (normal effect)
         5    Extended effect
1      Channel

Each of the following is included only if the appropriate bit is set:

1      ET0-(Octave*12)+Note
1      ET1-Sample number
1      ET3-Volume (0-FF)
2      ET4-effect #+effect data 1
1      ET5-effect data 2

1      Eventtick - number of ticks to wait before processing next event

The above structure repeats for NumEvents (in pattern header) and the entire pattern structure continues until all patterns are saved.

Effects are standard Farandole Composer effects.
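The event layout above can be decoded roughly as follows. This is only a sketch under stated assumptions, not code from the F2R players: it assumes the optional fields appear in the order listed (ET0, ET1, ET3, ET4, ET5), that the two ET4 bytes are the effect number followed by effect data 1, and that bit 2 carries no data byte. The names F2REvent and f2r_decode_event are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one F2R event. */
typedef struct {
    uint8_t type;     /* event type bit mask                    */
    uint8_t channel;
    uint8_t note;     /* valid if bit 0 set: (Octave*12)+Note   */
    uint8_t sample;   /* valid if bit 1 set                     */
    uint8_t volume;   /* valid if bit 3 set                     */
    uint8_t effect;   /* valid if bit 4 set (assumed 1st byte)  */
    uint8_t data1;    /* valid if bit 4 set (assumed 2nd byte)  */
    uint8_t data2;    /* valid if bit 5 set                     */
    uint8_t tick;     /* ticks to wait before the next event    */
} F2REvent;

/* Decodes one event from p[0..len). Returns the number of bytes consumed,
   or 0 if the buffer is too short for the fields the type byte announces. */
static size_t f2r_decode_event(const uint8_t *p, size_t len, F2REvent *ev)
{
    size_t need, i = 2;
    uint8_t t;

    assert(p != NULL && ev != NULL);
    if (len < 3)                      /* type + channel + Eventtick */
        return 0;
    t = p[0];
    need = 2 + (t & 1) + ((t >> 1) & 1) + ((t >> 3) & 1)
             + 2 * ((t >> 4) & 1) + ((t >> 5) & 1) + 1;
    if (len < need)
        return 0;

    memset(ev, 0, sizeof *ev);
    ev->type = t;
    ev->channel = p[1];
    if (t & 0x01) ev->note = p[i++];
    if (t & 0x02) ev->sample = p[i++];
    /* bit 2 (start a new note) is a flag only; nothing to read */
    if (t & 0x08) ev->volume = p[i++];
    if (t & 0x10) { ev->effect = p[i++]; ev->data1 = p[i++]; }
    if (t & 0x20) ev->data2 = p[i++];
    ev->tick = p[i++];
    return i;
}
```

A pattern reader would call this NumEvents times, advancing by the returned byte count each time and stopping if it ever returns 0.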
Here is a list in case you do not know them:

    0- No effect
    1- Slide pitch up
    2- Slide pitch down
    3- Slide to pitch *
    4- Retrigger
    5- Set vibrato amplitude
    6- Vibrato current note with given wavelength
    7- Volume slide up
    8- Volume slide down
    9- Sustained vibrato
    A- Slide to volume *
    B- Set panning
    C- Note offset
    D- Fine tempo down
    E- Fine tempo up
    F- Set tempo

    *-extended effect (ET5)

For more info on how these work and how they are implemented, please see the Farandole documentation, FORMATS.DOC. In the case of the extended effects, the second parameter is what is being slid to, i.e. for a pitch slide, parameter two is the pitch to slide to, and for a vol slide, it is the volume to slide to.

Note about SectionID: I had tremendous trouble debugging originally because, since all the data in the file practically looked like garbage, there was no way to tell what was going on. What this does is provide the program a way to gauge if the file is valid. If you ever read a section ID and it is not 'JDC' verbatim you should stop reading the file and declare it invalid.

This format was provided as a service to the general demo/music/game community. It may be used for any purpose, however if you use my format I would like to at least be greeted or credited or something.. whatever you feel is appropriate.

Good luck!
Daniel Potter/DI
Apr 13, 1994

----------------------------------------------------------------------------

FSM Format
Intel byte order

Information from File Format List 2.0 by Max Maischein.

--------!-CONTACT_INFO----------------------

If you notice any mistakes or omissions, please let me know! It is only with YOUR help that the list can continue to grow. Please send all changes to me rather than distributing a modified version of the list.

This file has been authored in the style of the INTERxxy.* file list by Ralf Brown, and uses almost the same format.

Please read the file FILEFMTS.1ST before asking me any questions. You may find that they have already been addressed.
Max Maischein

Max Maischein, 2:244/1106.17
Max_Maischein@spam.fido.de
corion@informatik.uni-frankfurt.de
Corion on #coders@IRC

--------!-DISCLAIMER------------------------

DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information contained in this list to the best of my ability, but I cannot be held responsible for any problems caused by use or misuse of the information, especially for those file formats foreign to the PC, like AMIGA or SUN file formats. If a piece of information is marked "guesswork" or undocumented, you should check it carefully to make sure your program will not break with an unexpected value (and please let me know whether or not it works the same way).

Information marked with "???" is known to be incomplete or guesswork.

Some file formats were not released by their creators, others are regarded as proprietary, which means that if your programs deal with them, you might be looking for trouble. I don't care about this.

--------------------------------------------

The .FSM files are samples to be used for module style music with the Farandole Composer. Currently only samples of up to 64K length are supported, although the header reserves a dword for the sample size.

OFFSET  Count  TYPE   Description
0000h       4  char   ID='FSM',254
0004h      32  char   ASCII name of sample
0024h       3  char   ID=10,13,26
0027h       1  dword  Length of sample (<=64K)
002Bh       1  byte   Fine tune value for sample (currently unsupported)
002Ch       1  byte   Sample volume (currently unsupported)
002Dh       1  dword  Start of sample loop
0031h       1  dword  End of sample loop. If the sample is not set to loop (see below) this should be set to the end of the sample.
0035h       1  byte   Sample type, bitmapped
                        0   - 8-bit/16-bit sample
                        1-7 - reserved
0036h       1  byte   Loop mode, ?bit mapped?
                        0-2 - reserved
                        3   - loop off/loop on
                        4-7 - reserved
0037h       ?  byte   Sample data in signed format

EXTENSION:FSM
OCCURENCES:PC
PROGRAMS:Farandole Composer
REFERENCE:
SEE ALSO:FAR,USM
VALIDATION:

8SVX IFF 8-Bit Sampled Voice
============================

1. Introduction
---------------

This is the IFF supplement for FORM "8SVX". An 8SVX is an IFF "data section" or "FORM" (which can be an IFF file or a part of one) containing a digitally sampled audio voice consisting of 8-bit samples. A voice can be a one-shot sound or - with repetition and pitch scaling - a musical instrument.

The 8SVX format is designed for playback hardware that uses 8-bit samples attenuated by a volume control for good overall signal-to-noise ratio. So a FORM 8SVX stores 8-bit samples and a volume level.

A similar data format (or two) will be needed for higher resolution samples (typically 12 or 16 bits). Properly converting a high resolution sample down to 8 bits requires one pass over the data to find the minimum and maximum values and a second pass to scale each sample into the range -128 to 127 (signed byte). So it's reasonable to store higher resolution data in a different FORM type and convert between them.

For instruments, FORM 8SVX can record a repeating waveform optionally preceded by a startup transient waveform. These two recorded signals can be pre-synthesized or sampled from an acoustic instrument. For many instruments, this representation is compact. FORM 8SVX is less practical for an instrument whose waveform changes from cycle to cycle like a plucked string, where a long sample is needed for accurate results.

FORM 8SVX can store an "envelope" or "amplitude contour" to enrich musical notes. A future voice FORM could also store amplitude, frequency, and filter modulations.

FORM 8SVX is geared for relatively simple musical voices, where one waveform per octave is sufficient, the waveforms for the different octaves follow a factor-of-two size rule, and one envelope is adequate for all octaves.
You could store a more general voice as a LIST containing one or more FORMs
8SVX per octave. A future voice FORM could go beyond one "one-shot" waveform
and one "repeat" waveform per octave.

2. Standard Data and Property Chunks
------------------------------------

FORM 8SVX stores all the waveform data in one body chunk "BODY". It stores
playback parameters in the required header chunk "VHDR". "VHDR" and any
optional property chunks "NAME", "(c) ", and "AUTH" must all appear before
the BODY chunk. Any of these properties may be shared over a LIST of FORMs
8SVX by putting them in a PROP 8SVX.

- Background

There are two ways to use FORM 8SVX: as a one-shot sampled sound or as a
sampled musical instrument that plays "notes" (as for a MOD or SMUS format).
Storing both kinds of sounds in the same kind of FORM makes it easy to play a
one-shot as an instrument or vice-versa.

A one-shot sound is a series of audio data samples with a nominal playback
rate and amplitude. The recipient program can optionally adjust or modulate
the amplitude and playback data rate.

For musical instruments, the idea is to store a sampled (or pre-synthesized)
waveform that will be parameterized by pitch, duration, and amplitude to play
each "note". The creator of the FORM 8SVX can supply a waveform per octave
over a range of octaves for this purpose. The intent is to perform a pitch by
selecting the closest octave's waveform and scaling the playback data rate.
An optional "one-shot" waveform supplies an arbitrary startup transient, then
a "repeat" waveform is iterated as long as necessary to sustain the note.

A FORM 8SVX can also store an envelope to modulate the waveform. Envelopes
are mostly useful for variable-duration notes but could be used for one-shot
sounds too.

The FORM 8SVX standard has some restrictions. For example, each octave of
data must be twice as long as the next higher octave. Most sound driver
software and hardware imposes additional restrictions. E.g.
the Amiga sound hardware requires an even number of samples in each one-shot
and repeat waveform.

- Required Property VHDR

The required property "VHDR" holds a Voice8Header structure as defined in
these C declarations and following documentation. This structure holds the
playback parameters for the sampled waveforms in the BODY chunk (see below):

    #define ID_8SVX MakeID('8','S','V','X')
    #define ID_VHDR MakeID('V','H','D','R')

    /* A fixed-point value, 16 bits to the left of the point and 16 to the
       right. A Fixed is a number of 2^16-ths, i.e. 65536ths */
    typedef LONG Fixed;

    #define Unity 0x10000L   /* Unity = Fixed 1.0 = maximum volume */

    /* sCompression: Choice of compression algorithm applied to the samples */
    #define sCmpNone     0   /* not compressed */
    #define sCmpFibDelta 1   /* Fibonacci-Delta encoding */

    typedef struct {
        ULONG oneShotHiSamples,   /* # samples in the high octave 1-shot part */
              repeatHiSamples,    /* # samples in the high octave repeat part */
              samplesPerHiCycle;  /* # samples/cycle in high octave, else 0 */
        UWORD samplesPerSec;      /* data sampling rate */
        UBYTE ctOctave,           /* # octaves of waveforms */
              sCompression;       /* data compression technique used */
        Fixed volume;             /* playback volume from 0 to Unity */
    } Voice8Header;

A FORM 8SVX holds waveform data for one or more octaves, each containing a
one-shot part and a repeat part. The fields 'oneShotHiSamples' and
'repeatHiSamples' tell the number of audio samples in the two parts of the
highest frequency octave. Each successive (lower frequency) octave contains
twice as many data samples in both its one-shot and repeat parts. One of
these two parts can be empty across all octaves.

The field 'samplesPerHiCycle' tells the number of samples/cycle in the
highest frequency octave of data, or else 0 for "unknown". Each successive
octave contains twice as many samples/cycle. This field is needed to compute
the data rate for a desired playback pitch. Actually, 'samplesPerHiCycle' is
an average number of samples/cycle.
If the one-shot part contains pitch bends, store the samples/cycle of the
repeat part in 'samplesPerHiCycle'. The division
'repeatHiSamples'/'samplesPerHiCycle' should yield an integer number of
cycles.

The field 'samplesPerSec' gives the sound sampling rate. A program may adjust
this to achieve frequency shifts or vary it dynamically to achieve pitch
bends and vibrato.

The field 'ctOctave' tells how many octaves of data are stored in the BODY
chunk.

The field 'sCompression' indicates the compression scheme, if any, that was
applied to the entire set of data samples stored in the BODY chunk. Note that
the whole series of data samples is compressed as a unit.

The field 'volume' gives an overall playback volume for the waveforms (all
octaves). It lets the 8-bit data samples use the full range -128 through 127
for good signal-to-noise ratio. The playback program should multiply this
value by a "volume control" and perhaps by a playback envelope.

- Optional Text Chunks NAME, "(c) ", AUTH, ANNO

Several text chunks may be included in a FORM 8SVX to keep ancillary
information.

The optional property "NAME" names the voice (or instrument), for instance
"tubular bells".

The optional property "(c) " holds a copyright notice for the voice. The
Chunk ID "(c) " serves as the copyright characters.

The chunk types "NAME", "(c) ", and "AUTH" are property chunks. Putting more
than one NAME (or other) property in a FORM is redundant. A property should
be shorter than 256 characters.

The optional data chunk "ANNO" holds any text annotations typed in by the
author. An ANNO chunk is not a property chunk, so you can put more than one
in a FORM 8SVX. You can make it any length up to 2^31 - 1 characters.

Syntactically, each of these chunks contains an array of 8-bit ASCII
characters in the range " " (SP, hex 20) through "~" (tilde, hex 7E), just
like a standard "TEXT" chunk. The chunk's 'ckSize' field holds the count of
characters.
    #define ID_NAME      MakeID('N','A','M','E')
    #define ID_Copyright MakeID('(','c',')',' ')
    #define ID_AUTH      MakeID('A','U','T','H')
    #define ID_ANNO      MakeID('A','N','N','O')

Remember to store a zero-value pad byte after odd-length chunks.

- Optional Data Chunks ATAK and RLSE

The optional data chunks ATAK and RLSE together give a piecewise-linear
"envelope" or "amplitude-contour". This contour may be used to modulate the
sound during playback. It's especially useful for playing musical notes of
variable durations. Playback programs may ignore the supplied envelope or
substitute another.

    #define ID_ATAK MakeID('A','T','A','K')
    #define ID_RLSE MakeID('R','L','S','E')

    typedef struct {
        UWORD duration;   /* segment duration in milliseconds, > 0 */
        Fixed dest;       /* destination volume factor */
    } EGPoint;

ATAK and RLSE chunks contain an EGPoint array, a piecewise-linear envelope.
The envelope defines a function of time returning Fixed volume. It's used to
scale the nominal volume specified in the Voice8Header.

To explain the meaning of these chunks, we'll overview the envelope
generation algorithm. Start at 0 volume, step through the ATAK contour, then
hold at the sustain level (the last ATAK EGPoint's dest), and then step
through the RLSE contour. Begin the release at the desired note stop time
minus the total duration of the release contour. Remember to multiply the
envelope function by the nominal voice header volume and by any desired note
volume.

Note: The number of EGPoints in either an ATAK or RLSE chunk is
ckSize/sizeof(EGPoint).

- Data Chunk BODY

The BODY chunk contains the audio data samples.

    #define ID_BODY MakeID('B','O','D','Y')
    typedef character BYTE;   /* 8bit signed number */

The BODY contains data samples grouped by octave. Within each octave are
one-shot and/or repeat portions.

In general, the BODY has 'ctOctave' octaves of data. The highest frequency
octave comes first, comprising the fewest samples as given by
'oneShotHiSamples'+'repeatHiSamples'.
Each successive octave contains twice as many samples as the previous octave.
The number of samples in the BODY chunk is

    ((2^0) + (2^1) + ... + (2^(ctOctave-1))) * (oneShotHiSamples + repeatHiSamples).

To avoid playback 'clicks', the beginning and end of the one-shot portion
should be at about the same level.

3. FORM 8SVX File Format Layout
-------------------------------

FORM    4-byte ID
size    4-byte Size (rest of file after next 4-bytes)
8SVX    4-byte ID

VHDR    4-byte ID = Voice8Header
size    4-byte sizeof(Voice8Header)
data    sizeof(Voice8Header) bytes of data
 .
 .
 .      Any other optional chunks (ATAK,RLSE,NAME,ANNO,etc.)

BODY    4-byte ID = Sampled Data
size    4-byte data size
data    size(BODY) bytes of data

all odd-length chunks are zero-padded


+----------------------------------------------------------------------------+
|        n-Factor's === DIGITRAKKER === FILE-FORMAT DESCRIPTION ===          |
+----------------------------------------------------------------------------+
--------------------------- by prodatron/n-Factor ----------------------------

This file contains information about the songmodule-format "MDL", the
instrument-format "IST" and the old sample-format "SPL". If you have
problems or questions about these formats which are not answered in here,
just contact me.

+----------------------------------------------------------------------------+
|                      THE SONGMODULE-FORMAT (MDL) V1.1                      |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     004     "DMDL"; the four letters mark the mdl-format
004     001     version; the current version is 11h (=1.1)
005     ???
                the different data-blocks are stored at this position

Some words to the format version-number:
- if the low-nibble increases, there are extensions in the format, but old
  loaders should be able to load the new modules (or most of them...)
- if the high-nibble increases, there are changes in the format which make
  old loaders unable to read the new songfiles

The MDL-songmodule-format is subdivided into the following blocks:

"IN"      infoblock; contains most songparameters, like speed, length etc.
"ME"      songmessage; contains the songinformation from the composer
"PA"  c   pattern; contains the length, names and tracklists for every pattern
"TR"      tracks; contains all the tracks for the pattern
"II"  n   instruments; contains all information for the used instruments
"VE"  n   volume-envelopes; contains the construction of all used vol-envelopes
"PE"  n   panning-envelopes; the same for the used pan-envelopes
"FE"  n1  frequency-envelopes; ...used frq-envelopes (LFO)
"IS"  c   sampleinfos; contains information for every used sample
"SA"      samples; contains the sample data

[[ c  = blockstructure changes from v0.0 to v1.0 ]]
[[ n  = new in version 1.0 ]]
[[ n1 = new in version 1.1 ]]

The sequence of the blocks in a file is not fixed, so they can be stored in
any order. Digitrakker uses the sequence described above.

The structure for every block is the same:

Offset  Length  Description
000     002     "xx"; block-ID (example: "IN" for infoblock)
002     004     blocklength; this dword contains the length of the FOLLOWING
                data.
006     ???     data for this block (blocklength bytes)...

The next block will be at offset (006 + blocklength).
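
As an illustration only, the block layout described above can be walked with a
few lines of Python. This is a sketch, not Digitrakker's loader: the names
iter_mdl_blocks and split_version are made up here, and the little-endian
byte order of the blocklength dword is an assumption (the format comes from
the PC; the text does not state the byte order).

```python
import struct


def iter_mdl_blocks(data):
    """Yield (block_id, payload) for every block in an MDL file image."""
    if data[:4] != b"DMDL":
        raise ValueError("missing DMDL signature")
    pos = 5  # blocks start right after the 4-byte ID and the version byte
    while pos + 6 <= len(data):
        block_id = data[pos:pos + 2].decode("ascii", "replace")
        # Assumption: the blocklength dword is stored little-endian.
        (length,) = struct.unpack_from("<I", data, pos + 2)
        yield block_id, data[pos + 6:pos + 6 + length]
        pos += 6 + length  # next block sits at 006 + blocklength


def split_version(version):
    """Return (high nibble, low nibble) of the version byte."""
    return version >> 4, version & 0x0F
```

A loader would typically collect the blocks into a dictionary first, since
the order of the blocks in the file is not fixed.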
+----------------------------------------------------------------------------+
|                          The Song-Infoblock (IN)                           |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "IN"; infoblock-ID
002     004     blocklength
006     032     songname; name of the songmodule (filled with spaces [32])
038     020     composername; name of the song-composer
058     002     songlength; Digitrakker supports up to 255 songpositions
060     002     songrepeat
062     001     mainvolume (001-255)
063     001     song-speed (001-255)
064     001     beats per minute (004-255)
065     032     channel-information:
                bit 0-6 - panposition (0=left,127=right)
                bit 7   - 0=channel on, 1=channel off
                [number of channels = last active channel]
097     ???     sequencer; contains the number of the pattern for every
                songposition
???     ???     the names for every channel (8 chars for one name);
                length = 8 * number of channels

+----------------------------------------------------------------------------+
|                            The Songmessage (ME)                            |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "ME"; songmessage-ID
002     004     blocklength
006     ???     songmessage; every line is closed with the CR-char (13).
                A 0-byte stands at the end of the whole text.

+----------------------------------------------------------------------------+
|                           The Patterndatas (PA)                            |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "PA"; patterndata-ID
002     004     blocklength
006     001     number of patterns; values from 1 to 255 are possible
007     ???     the datablocks for all saved patterns

The structure of one pattern-datablock:
000     001     number of used channels (0-32)
001     001     patternlength-1 (0-255 for 1-256 lines)
002     016     pattern-name (filled with [32])
018     ???     tracksequencing-list; length = 2 * number of channels in
                this pattern

The tracksequencing-lists describe which track is used as which voice in the
pattern. The first word in this list is the number of the track at voice 0.
The second is track for voice 1 and so on...
As every track is saved independently, it is possible to save some disc space
by this method: If the song contains equal tracks at several positions in the
patterns, these duplicate tracks will be saved only once. Track 0 is not
saved and represents an empty track.

+----------------------------------------------------------------------------+
|                            The Trackdatas (TR)                             |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "TR"; trackdata-ID
002     004     blocklength
006     002     number of tracks
008     ???     track-datablocks; every trackdatablock is stored in this way:
                Ofs.000  Len.002  length of the track data
                    002      ???  data for this track

Every track consists of 1-256 notepositions and every noteposition contains
6 bytes:
  byte 0 - note-value; 1 = C-0, 2 = C#0,..., 120 = B-9,
           0 = nothing (---), 255 = key off (^^^)
  byte 1 - sample-number; 1-255; 0 = nothing
  byte 2 - volume; 1-255; 0 = no volume change
  byte 3, low nibble  - number of the first effect-command
  byte 3, high nibble - number of the second effect-command
                        (commands "g"-"l" get the numbers 1-6)
  byte 4 - databyte for the first effect-command
  byte 5 - databyte for the second effect-command

Digitrakker stores the tracks in a packed way. The structure of this VERY
effective (!!!) packformat is the following:

         bit 76543210
byte 0 -     xxxxxxyy

if yy = 00 -> x+1 is the number of the empty notepositions which are
              following.
if yy = 01 -> the last noteposition will be repeated x+1 times.
if yy = 10 -> the noteslot from position x is copied to the current position.
if yy = 11 -> the following data will be put in the current noteslot:
              bit 2 = 1 -> note
              bit 3 = 1 -> sample
              bit 4 = 1 -> volume
              bit 5 = 1 -> effectcommand numbers
              bit 6 = 1 -> databyte for effect 1
              bit 7 = 1 -> databyte for effect 2

To find out the number of notepositions in a track you should decrease a
counter (startvalue: length of the packed track data) while depacking.
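
The packformat can be sketched in Python as follows, taking the upper six
bits of each control byte as the value x and the lower two bits as yy. This
is a hedged illustration rather than Digitrakker's own depacker: the name
unpack_track and the tuple layout are invented here, and it is an assumption
that the data bytes of a "yy = 11" entry follow in the order of their flag
bits (note first, databyte for effect 2 last). Instead of decreasing a
counter, the sketch simply reads until the packed data is used up.

```python
# One noteposition: (note, sample, volume, effect numbers, data 1, data 2)
EMPTY = (0, 0, 0, 0, 0, 0)


def unpack_track(packed):
    """Depack one track into a list of 6-field notepositions."""
    rows = []
    pos = 0
    while pos < len(packed):
        control = packed[pos]
        pos += 1
        x, yy = control >> 2, control & 3
        if yy == 0:
            # x+1 empty notepositions follow
            rows.extend([EMPTY] * (x + 1))
        elif yy == 1:
            # repeat the last noteposition x+1 times
            last = rows[-1] if rows else EMPTY
            rows.extend([last] * (x + 1))
        elif yy == 2:
            # copy the noteslot from position x
            rows.append(rows[x])
        else:
            # bits 2-7 of the control byte (bits 0-5 of x) flag the fields
            row = [0] * 6
            for field in range(6):
                if x & (1 << field):
                    row[field] = packed[pos]
                    pos += 1
            rows.append(tuple(row))
    return rows
```

The returned list holds only the notepositions that were actually stored, so
a caller still has to extend it to the length the pattern needs.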
Every depacked track has to be padded with 0-values up to position 256. When
you copy the track into a pattern, just take the number of positions you need
for the pattern (if the pattern has a length of 64 positions, only take the
first 64 positions from the track).

+----------------------------------------------------------------------------+
|                            The Instruments (II)                            |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "II"; instrument-ID
002     004     blocklength
006     001     number of saved instruments; values from 0 to 255 are
                possible
007     ???     the datablocks for all used instruments

The structure of one instrument-datablock:
000     001     instrument-number; (1-255)
001     001     number of samples in instrument; (1-16)
002     032     instrument-name; the name of the instrument (filled with
                [32])
034     ???     this block contains the infos for all defined samples in the
                instrument; every info consists of 14 bytes, so
                length = 14 * number of defined samples

The structure of one instrument-sample datablock:
000     001     sample-number; (1-255)
001     001     playrange-end (0-119,0='c-0'); the last note for this sample;
                a higher note uses one of the next samples
002     001     volume (1-255)
003     001     bit 0-5 -> volumeenvelope-number (0-63)
                bit 6   -> flag, if volume is used
                bit 7   -> flag, if volumeenvelope is used
004     001     panning (0-127)
005     001     bit 0-5 -> panningenvelope-number (0-63)
                bit 6   -> flag, if panning is used
                bit 7   -> flag, if panningenvelope is used
006     002     fadeout-speed (0-65535)
008     001     vibrato-speed (0-255)
009     001     vibrato-depth (0-255)
010     001     vibrato-sweep (0-255)
011     001     vibrato-form (0-2)
012     001     ** reserved ** (should be set to <0>)
013     001     bit 0-5 -> frequencyenvelope-number (0-63)
                bit 6   -> ** reserved ** (should be set to <0>)
                bit 7   -> flag, if frequencyenvelope is used

+----------------------------------------------------------------------------+
|                         The Volume-Envelopes (VE)                          |
+----------------------------------------------------------------------------+
Offset  Length  Description
000     002     "VE"; volume-envelope-ID
002     004     blocklength
006     001     number of saved vol-envelopes (0-64)
007     ???     volume-envelope data; a datablock contains 33 bytes, so
                length = 33 * number of saved vol-envelopes

The structure of one envelope-datablock:
000     001     envelope-number; (0-63)
001     030     the positions of the 15 points are stored here; the first
                byte is the x-distance from the last point (1-255; 0 means
                that no more points are defined; take 1 for the first
                point), the second byte is the y-position (0-63)
031     001     bit 0-3 -> sustain-point (0-14)
                bit 4   -> flag, if sustain is on
                bit 5   -> flag, if loop is on
                bit 6-7 -> ** reserved ** (should be set to <0>)
032     001     bit 0-3 -> loop-start (0-14)
                bit 4-7 -> loop-end (0-14)

+----------------------------------------------------------------------------+
|                         The Panning-Envelopes (PE)                         |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "PE"; panning-envelope-ID
002     004     blocklength
006     001     number of saved pan-envelopes (0-64)
007     ???     panning-envelope data; a datablock contains 33 bytes, so
                length = 33 * number of saved pan-envelopes

see at "VE" for the description of an envelope-datablock

+----------------------------------------------------------------------------+
|                        The Frequency-Envelopes (FE)                        |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "FE"; frequency-envelope-ID
002     004     blocklength
006     001     number of saved frq-envelopes (0-64)
007     ???     frequency-envelope data; a datablock contains 33 bytes, so
                length = 33 * number of saved frq-envelopes

see at "VE" for the description of an envelope-datablock

+----------------------------------------------------------------------------+
|                         The Sample-Infoblocks (IS)                         |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "IS"; sampleinfo-ID
002     004     blocklength
006     001     number of saved samples; values from 0 to
                255 are possible
007     ???     sample-infoblocks; an infoblock for one sample contains 59
                bytes, so length = 59 * number of saved samples

The structure of one sample-infoblock:
000     001     sample-number; (1-255)
001     032     sample-name; the name of the sample (filled with [32])
033     008     filename of the sample
041     004     C-4 sample-frequency in hz
045     004     sample-length
049     004     sample-repeatstart
053     004     sample-repeatlength; if this value is set to 0, the sample
                will not loop
057     001     ** not used ** (this was the volume in old v0.0-modules)
058     001     infobyte:
                bit 0   -> 0=8 bit sample, 1=16 bit sample
                bit 1   -> 0=forward looping, 1=bidirectional looping
                bit 2,3 -> packmethod (0=not packed, 1=8bit packing,
                           2=16bit packing, 3=not defined)
                bit 4-7 -> ** reserved ** (should be set to <0>)

+----------------------------------------------------------------------------+
|                            The Sampledatas (SA)                            |
+----------------------------------------------------------------------------+

Offset  Length  Description
000     002     "SA"; sampledata-ID
002     004     blocklength
006     ???     sample data; samples are stored in numeric sequence

Unpacked samples are stored in signed form. Packmethod (1) is designed for
8 bit samples, packmethod (2) for 16 bit samples. Method (3) isn't defined in
this version. A packed sample begins with a dword which contains the length
of the following datastream.

The description of the sample-packmethod (1) [8bit packing]:...
----------------------------------------------------------------
This method is based on the huffman-algorithm. It's an easy form, but very
fast and effective on samples.
The packed sample is a bit-datastream:

      Byte 0    Byte 1    Byte 2    Byte 3
Bit   76543210  fedcba98  nmlkjihg  ....rqpo

A packed byte is stored in the following form:

  xxxx10..0s => byte = xxxx + (number of <0>-bits between s and 1) * 16 - 8 ;
                if s=1 then byte = byte xor 255

If there are no <0>-bits between the first bit (sign) and the <1>-bit, you
have the following form:

  xxx1s => byte = xxx ; if s=1 then byte = byte xor 255

To depack one byte, you have to use the following algorithm:

+----------------------------------------------------------------------------+
        read bit
        sign = bit
        read bit
        if bit = 1
          then read [3bits]
               byte = [3bits]
               goto next
          else byte = 8
  loop: read bit
        if bit = 0
          then byte = byte + 16
               goto loop
          else read [4bits]
               byte = byte + [4bits]
  next: if sign = 1
          then byte = byte xor 255
+----------------------------------------------------------------------------+

Two examples:

   xxxx  s
   1001101   = ( 9 + 1 * 16 - 8 ) xor 255 = 238

   xxx s
   01010     = 2

Note that the depacked bytes are delta values. To convert them to real data
use this algorithm:

   oldbyte = 0
   for sampleposition = 1 to samplelength
     newbyte = byte [sampleposition] + oldbyte
     byte [sampleposition] = newbyte
     oldbyte = newbyte
   next sampleposition

The description of the sample-packmethod (2) [16bit packing]:...
----------------------------------------------------------------
This works like method (1), but it only crunches every 2nd byte (the
highbytes of 16 bit samples). So when you depack 16 bit samples, you have to
read 8 bits from the data-stream first. They represent the lowbyte of the
sample-word. Then depack the highbyte in the way described for method (1).
Only the highbytes are delta-values, so take the lowbytes as they are. Go on
this way for the whole sample!
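
The bit-level steps of method (1), the delta step, and the low-byte handling
of method (2) can be sketched in Python as follows. This is a hedged
illustration, not Digitrakker's own code: the names BitReader,
unpack_delta_byte, unpack_8bit and unpack_16bit are invented here, the stream
passed in is assumed to start after the leading length dword, and the 16-bit
routine assumes that the output words are wanted low byte first.

```python
class BitReader:
    """Reads bits starting with bit 0 of byte 0, as in the diagram above."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read(self, nbits):
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def unpack_delta_byte(reader):
    """Depack one delta value following the boxed algorithm."""
    sign = reader.read(1)
    if reader.read(1):
        value = reader.read(3)
    else:
        value = 8
        while reader.read(1) == 0:
            value += 16
        value += reader.read(4)
    if sign:
        value ^= 255
    return value & 0xFF


def unpack_8bit(stream, sample_length):
    """Method (1): every depacked byte is a delta to the previous sample."""
    reader = BitReader(stream)
    out = bytearray()
    old = 0
    for _ in range(sample_length):
        old = (old + unpack_delta_byte(reader)) & 0xFF
        out.append(old)
    return bytes(out)


def unpack_16bit(stream, sample_length):
    """Method (2): raw lowbyte, then a delta-packed highbyte per word."""
    reader = BitReader(stream)
    out = bytearray()
    old_high = 0
    for _ in range(sample_length):
        low = reader.read(8)
        old_high = (old_high + unpack_delta_byte(reader)) & 0xFF
        out += bytes([low, old_high])  # assumption: low byte first
    return bytes(out)
```

With the two example codes packed back to back into the bytes 4Dh 05h,
unpack_8bit returns the sample bytes 238 and 240 (238 + 2).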
+----------------------------------------------------------------------------+
|                     ** Differences to older formats **                     |
+----------------------------------------------------------------------------+

Changes from v0.0 to v1.0:

- block "PN" (patternnames) doesn't exist in v1.0-modules (patternnames are
  now stored in block "PA")
  the old v0.0 structure of the "PN"-block:
  000     002     "PN"; patternnames-ID
  002     004     blocklength
  006     ???     the names for every pattern (16 chars for one name);
                  length = 16 * number of patterns

- structure of block "PA" changes completely
  the old v0.0 structure of the "PA"-block:
  000     002     "PA"; patterndata-ID
  002     004     blocklength
  006     001     number of patterns; values from 1 to 255 are possible
  007     ???     tracksequencing-list for the used patterns;
                  length = 64 * number of patterns (32 words with the
                  tracknumbers for every pattern)

- new blocks in v1.0-modules: "II" (instruments), "VE" (volume-envelopes)
  and "PE" (panning-envelopes)

- volumebyte (byte 57) in the sample-datablocks (block "IS") isn't used;
  the C-4 sample-frequency increases from a word (2 bytes) to a dword
  (4 bytes), so one whole sample-infoblock has a length of 59 bytes

Extension from v1.0 to v1.1:

- new block: "FE" (frequency-envelopes)

+----------------------------------------------------------------------------+
|                      THE INSTRUMENT-FORMAT (IST) V0.1                      |
+----------------------------------------------------------------------------+

The IST-format has the same structure as the MDL-format:

Offset  Length  Description
000     004     "DIST"; the four letters mark the ist-format
004     001     version; the current version is 01h (=0.1)
005     ???
                the different data-blocks are stored at this position

The IST-instrument-format is subdivided into the following blocks:

"II"  instruments; contains all information for the saved instrument
"VE"  volume-envelopes; contains the construction of all vol-envelopes for
      this instrument
"PE"  panning-envelopes; the same for the pan-envelopes
"FE"  frequency-envelopes (new in v0.1); the same for the frq-envelopes
"IS"  sampleinfos; contains information for every used sample
"SA"  samples; contains the sample data

The structures for the several blocks are the same as in the mdl-format. The
instrument-infoblock ("II") contains one instrument only.

+----------------------------------------------------------------------------+
|                        THE SAMPLE-FORMAT (SPL) V0.0                        |
+----------------------------------------------------------------------------+

Here comes the description of the old sample-format "SPL", which was used in
older tracker-versions (V2.0-V2.2). This format isn't supported any longer in
Digitrakker; that means you can read it, but you can't save samples in this
form. The reason for this step was the fact that there are too many
sample-formats, and it makes no sense to introduce a new one, because the
existing IFF-format contains nearly all infos you need for a
Digitrakker-sample.

Offset  Length  Description
000     004     "DSPL"; the four letters mark the spl-format
004     001     version; the current version is 0
005     032     sample-name; the name of the sample (filled with [32])
037     008     filename of the sample
045     002     C-4 sample-frequency in hz (00000-65535)
047     004     sample-length
051     004     sample-repeatstart
055     004     sample-repeatlength; if this value is set to 0, the sample
                will not loop
059     001     sample-volume (1-255)
060     001     infobyte:
                bit 0   -> 0=8 bit sample, 1=16 bit sample
                bit 1   -> 0=forward looping, 1=bidirectional looping
                bit 2,3 -> packmethod (0=not packed, methods 2 and 3 don't
                           exist in this version)
                bit 4-7 -> not used (should be set to 0)
061     ???     sample data...
                (see above)


MIDI SAMPLE DUMP STANDARD

1) INTRODUCTION

The MIDI SDS was adopted in January 1986 by the MIDI Manufacturers
Association and the Japanese MIDI Standards Committee. The SDS defines the
standard method for transfer of sound sample data between MIDI-equipped
devices.

Sample dumps may be accomplished with either an 'open loop' or 'closed loop'
system. The open loop method simply involves the straight dump of all sample
data from its source to the destination, with no timeouts, packet
acknowledgements, or any other form of handshaking, much in the manner of a
sysex bulk dump, usually initiated at the source. The closed loop method
allows the use of handshaking messages between the dump source and
destination, and usually places the dump process under the control of the
slave, to allow it time to process the incoming data as necessary.

As with any standard, it cannot be assumed that a device adheres to it
unless the accompanying documentation specifically indicates it. Even then,
it is best to check its conformity with non-critical data.

2) SPEC: SAMPLE DUMP FORMATS

DUMP HEADER:

    F0 7E cc 01 ss ss ee ff ff ff gg gg gg hh hh hh ii ii ii jj F7

where
    cc       = channel number
    ss ss    = sample number (LSB first)
    ee       = sample format (number of significant bits; 8->28)
    ff ff ff = sample period (1/sample rate) in nanoseconds (LSB first)
    gg gg gg = sample length, in words (LSB first)
    hh hh hh = sustain loop start point (word number) (LSB first)
    ii ii ii = sustain loop end point (word number) (LSB first)
    jj       = loop type (00:forwards only; 01:alternating)

DATA PACKET:

    F0 7E cc 02 kk <120 bytes> mm F7

where
    cc = channel number
    kk = running packet count (00->7F)
    mm = checksum (XOR of 7E, cc, 02, kk, <120 bytes>)

The total size of a data packet is 127 bytes. This is to avoid overflow of
the MIDI input buffer of a device that may want to receive an entire packet
before processing it. A data packet consists of its own header, a packet
number, 120 bytes of data, a checksum, and an EOX.
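
The data packet layout and its checksum can be sketched in Python as
follows. This is only an illustration under stated assumptions: the function
names sds_checksum and build_data_packet are made up here, and the final
"& 7Fh" is a defensive addition of this sketch (the XOR of 7-bit values is
already a 7-bit value).

```python
def sds_checksum(channel, packet_number, payload):
    """XOR of 7E, cc, 02, kk and the 120 data bytes."""
    checksum = 0x7E ^ channel ^ 0x02 ^ packet_number
    for value in payload:
        checksum ^= value
    return checksum & 0x7F


def build_data_packet(channel, packet_number, payload):
    """Assemble F0 7E cc 02 kk <120 bytes> mm F7 (127 bytes in total)."""
    if len(payload) != 120:
        raise ValueError("a data packet carries exactly 120 data bytes")
    if any(value > 0x7F for value in payload):
        raise ValueError("MIDI data bytes must have the msb cleared")
    kk = packet_number & 0x7F  # the count wraps from 7F back to 00
    body = bytes([0x7E, channel, 0x02, kk]) + bytes(payload)
    return b"\xF0" + body + bytes([sds_checksum(channel, kk, payload)]) + b"\xF7"
```

A receiver can run sds_checksum over the same fields of an incoming packet
and compare the result with the mm byte.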
The packet number begins at 00 and increments with each new packet. It
resets to 00 after it reaches 7F, and continues counting. The packet number
is used by the receiver to distinguish between a new data packet and a resend
of a previous packet. The packet number is followed by 120 bytes of data,
which form 60, 40, or 30 words (MSB first for multiword samples), depending
on the length of a single data sample.

Each data byte holds seven bits, with the msb in each byte set to 0, in order
to conform to the requirements of MIDI data transmission. Information is left
justified within the 7-bit bytes, and unused bits are filled with 0.

Example: Assume a data point in the memory of a 16-bit sampler, with the
value 87E5. In binary, that would be

    1000 0111 1110 0101

and would be encoded as the following MIDI data stream:

    01000011 01111001 00100000

The checksum is the running XOR of all the data after the SYSEX byte, up to
but not including the checksum itself.

3) SPEC: SAMPLE DUMP MESSAGES

DUMP REQUEST:

    F0 7E cc 03 ss ss F7

where
    cc    = channel number
    ss ss = sample number requested (LSB first)

Upon receiving the request, the sampler checks the sample number to see if it
is within legal range. If it is not, the request is ignored. If it is, the
sample dump is started. One packet at a time is sent, under control of the
handshaking messages outlined below.

HANDSHAKING MESSAGES:

For all below:
    cc = channel number
    pp = packet number

Packet numbers are included in the handshaking messages to accommodate
machines that have the intelligence to re-transmit specific packets after an
entire dump is finished, or if synchronization is lost.

ACK    : F0 7E cc 7F pp F7
    Means last packet was received correctly (checksum OK, etc), please send
    next one. Packet number is packet being acknowledged as correct.

NAK    : F0 7E cc 7E pp F7
    Means last packet not received correctly, please send again. Packet
    number is packet being rejected.

CANCEL : F0 7E cc 7D pp F7
    Means abort dump immediately.
    Packet number is packet on which abort occurs.

WAIT   : F0 7E cc 7C pp F7
    Means pause dump indefinitely, until next message is sent. Allows the
    unit receiving the dump to perform other functions (disk access, etc),
    before receiving the remainder of the dump. The next message it sends
    (e.g. ACK, CANCEL) will determine if the dump continues or aborts.

4) DUMP PROCEDURE: MASTER (DUMP SOURCE)

Once a dump has been requested, either via MIDI or through the front panel,
the DUMP HEADER is sent. After sending the header, the master must time out
for at least two seconds, to allow the receiver to decide if it will accept
this sample (has enough memory, etc). If it receives a CANCEL within this
time, it should abort immediately. If it receives an ACK, it will start
sending packets immediately. If it receives a WAIT, it pauses until another
message is received, and then processes that message normally. If nothing is
received within the timeout, an open loop is assumed, and the dump starts
with the first packet.

After sending each packet, the master should time out for at least 20
milliseconds and watch its MIDI In. If an ACK is received, it sends the next
packet immediately. If it receives an NAK, and the packet number matches the
number of the last packet sent, it resends that packet. If the packet numbers
don't match, and the device is incapable of sending packets out of order, the
NAK will be ignored. If a WAIT is received, the master should watch its MIDI
In port indefinitely for another ACK, NAK, or CANCEL message, which it should
then process normally. If no messages are received within 20 milliseconds of
the transmission of a packet, the master may assume an open loop
configuration, and send the next packet.

This process continues until there are fewer than 121 data bytes to send. The
final packet will still consist of 120 bytes, regardless of how many
significant bytes actually remain, and the unused bytes will be filled with
zeroes.
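
The left-justified word packing from the format section and the zero-padded
final packet can be sketched in Python as follows. This is a hedged
illustration: the names encode_sample_word and split_into_packets are
invented here, and the rule "bytes per word = sample bits divided by 7,
rounded up" is inferred from the 60, 40, or 30 words per packet mentioned
earlier.

```python
def encode_sample_word(value, bits):
    """Encode one sample of `bits` significant bits as 7-bit MIDI bytes."""
    nbytes = (bits + 6) // 7               # 8-14 bits: 2, 15-21: 3, 22-28: 4
    shifted = value << (nbytes * 7 - bits) # left-justify, low bits become 0
    return bytes(
        (shifted >> (7 * (nbytes - 1 - i))) & 0x7F  # most significant first
        for i in range(nbytes)
    )


def split_into_packets(data):
    """Cut encoded data into 120-byte payloads, zero-filling the last one."""
    packets = []
    for start in range(0, len(data), 120):
        chunk = data[start:start + 120]
        packets.append(chunk + bytes(120 - len(chunk)))
    return packets
```

For the 16-bit value 87E5 the first function yields the three bytes
43 79 20, which matches the binary example given for the data packet.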
The receiver should handshake after receiving the last packet.


5) DUMP PROCEDURE: SLAVE (DUMP DESTINATION)

When receiving a sample dump, a device should keep a running checksum
during reception. If its checksum matches the checksum in the data
packet, it will send an ACK and wait for the next packet. If it does not
match, it will send an NAK containing the number of the packet that
caused the error, and wait for the next packet. If, after sending an NAK,
the packet number of the next packet doesn't match the previous packet
number (the one that was NAK'd), and the unit is not capable of accepting
packets out of order, the error is ignored and the dump continues as if
the checksums had matched.

If a receiver runs out of memory before the dump is completed, it should
send a CANCEL to stop the dump.


6) SDS OVERVIEW

SAMPLE DUMP DATA FORMAT: DUMP HEADER:

        Sysex ID: Universal Non-Real Time
        Channel Number
        Sub ID: Header
        Sample Number (2 bytes, LSB first)
        Sample Format
        Sample Period (3 bytes, LSB first)
        Sample Length (3 bytes, LSB first)
        Sustain Loop Start Point (3 bytes, LSB first)
        Sustain Loop End Point (3 bytes, LSB first)
        Loop Type
        Eox

SAMPLE DUMP DATA FORMAT: DATA PACKET:

        Sysex ID: Universal Non-Real Time
        Channel Number
        Sub ID: Data Packet
        Packet Number
        Sample Data (120 bytes)
        Checksum
        Eox

SAMPLE DUMP MESSAGES: DUMP REQUEST:

        Sysex ID: Universal Non-Real Time
        Channel Number
        Sub ID: Dump Request
        Sample Number (2 bytes, LSB first)
        Eox

SAMPLE DUMP MESSAGES: HANDSHAKING FLAGS:

        Sysex ID: Universal Non-Real Time
        Channel Number
        Sub ID: ACK or NAK or CANCEL or WAIT
        Packet Number
        Eox


From: Harald Zappe
Date: Wed, 6 Apr 94 19:48:56 +0200
Subject: oktafmt.txt (? final)

Thanks to all of those mentioned below for the additional infos and
sources. It looks pretty complete now.

The effects seem to be complete. All, VT, Multiplayer and the Amiga
Oktalyzer 1.1 Player use the same values. But I didn't see the effect 12
anywhere (Arp 5), as mentioned by Peter Kunath.
-----------------------------------------------------------------

[C.3.3] Oktalyzer
---------

Thanks to
  Frank Seide (seide@pfa.philips.de) for the first hints,
  Bryan Ford (baford@schirf.cs.utah.edu) for most of the detailed
    comments below, the effects, and the (GPL) free source code of his
    Multiplayer,
  the Vangelis Team, which is Juan Carlos Arevalo
    (jarevalo@moises.ls.fi.upm.es), Felix Sanz, and Luis Crespo for the
    Freeware sources of the Vangelis Tracker,
  Armin Sander for the (?)first Oktalyzer Player on an Amiga,
  Peter Kunath (kunath@informatik.tu-muenchen.de) for several hints, and
  Jamal Hannah (jamal@gnu.ai.mit.edu) for coordinating us all.

There are two different "Oktalyzer" formats. The following description
only refers to the IFF-like style. The other one (a memory dump model)
seems to have no popularity.

(All numbers below are given in hex unless marked with a trailing 't'
for decimal.)

Byte order: MSB first

offset |bytes| contents   | meaning
-------+-----+------------+-------------------------------------------------
000000 |  8  | "OKTASONG" | char Magic[8]
       |     |            |   /* If you support different music file types
       |     |            |      check these letters. */
-------+-----+------------+-------------------------------------------------
       |     |            | Channel_Modes {
000008 |  4  | "CMOD"     |   char chunk_name[4]
00000C |  4  | 8          |   long chunk_len
000010 |  8  |            |   short chan_flags[4]
       |     |            |   /* 0: normal (Amiga) sound channel */
       |     |            |   /* 1: 'tied' or 'split' channel: two
       |     |            |         sounds are played through this channel
       |     |            |         at the same time (mixed at run time) */
       |     |            |   /* eg. 0 1 0 1 => 6 channel: 1: normal,
       |     |            |          2/3: tied, 4: normal, 5/6: tied */
       |     |            | }
-------+-----+------------+-------------------------------------------------
       |     |            | Sample_directory {
000018 |  4  | "SAMP"     |   char chunk_name[4]
00001C |  4  | (00000480) |   long Sample_dir_len /*==chunk_len*/
       |     |            |   /* Nr_of_samples = Sample_dir_len / 32 */
       |     |            |
000020 | 20t |            |   char Sample_Name[20]        \
000034 |  4  |            |   unsigned long Sample_Len     )
000038 |  2  |            |   unsigned short Repeat_Start (   up to 36x
00003A |  2  |            |   unsigned short Repeat_Len    >  (or more?)
00003C |  1  | (00)       |   char pad1                   (
00003D |  1  | (40)       |   unsigned char Volume         )
00003E |  2  | (0001)     |   short pad2                  /
...    |     |            | /*
       |     |            | If 'Repeat_Len' is zero, it is a simple one-shot
       |     |            | sample: ignore 'Repeat_Start', just play the whole
       |     |            | 'Sample_Len' bytes and stop the sound.
       |     |            | If 'Repeat_Len' is nonzero, it is a repeating
       |     |            | sample consisting of three parts: attack,
       |     |            | sustain, and release. (Most other tracker formats
       |     |            | don't support release.) The attack part starts at
       |     |            | 0 and ends at Repeat_Start-1, the sustain part
       |     |            | starts at 'Repeat_Start' and ends at
       |     |            | Repeat_Start+Repeat_Len-1, and the release part
       |     |            | starts at Repeat_Start+Repeat_Len and ends at
       |     |            | Sample_Len-1.
       |     |            | The attack part should be played once, followed by
       |     |            | the repeat part an arbitrary number of times until
       |     |            | another note is played or a "release" command is
       |     |            | seen. If the "release" command is seen, then
       |     |            | switch to the release part of the sample when the
       |     |            | current repeat run is finished, and only play it
       |     |            | once, followed by silence.
       |     |            | 'Volume' is the default volume for notes played
       |     |            | with this sample: 0 to 64 (0x40) inclusive.
       |     |            | */
       |     |            | }
-------+-----+------------+-------------------------------------------------
       |     |            | Speed {
0004A0 |  4  | "SPEE"     |   char chunk_name[4]
0004A4 |  4  | 2          |   long chunk_len
0004A8 |  2  | (3)        |   short AmigaVBLDivisor /* InitialTempo */
       |     |            | }
-------+-----+------------+-------------------------------------------------
       |     |            | Song_Length {
0004AA |  4  | "SLEN"     |   char chunk_name[4]
0004AE |  4  | 2          |   long chunk_len
0004B2 |  2  | (60t)      |   short value
       |     |            |   /* it specifies the number of different
       |     |            |      patterns this module has.
       |     |            |      (can be used as counter for the "PBOD"
       |     |            |      chunks) */
       |     |            | }
-------+-----+------------+-------------------------------------------------
       |     |            | Num_Pattern_Positions { /* "PatternLength" */
0004B4 |  4  | "PLEN"     |   char chunk_name[4]
0004B8 |  4  | 2          |   long chunk_len
0004BC |  2  | (15t)      |   short num_positions
       |     |            |   /* it specifies the number of entries in
       |     |            |      the pattern table (see "PATT" below) */
       |     |            | }
-------+-----+------------+- - - - - - - - - - - - - - - - - - - - - - - - -
       |     |            | Pattern_Positions {
0004BE |  4  | "PATT"     |   char chunk_name[4]
0004C2 |  4  | (128t ?)   |   long chunk_len
       |     |            |   /* (it seems that the length of this chunk
       |     |            |      is always set to 128) */
0004C6 | 128t|            |   byte position[*]
       |     |            |   /* zero *is* a valid value in this field.
       |     |            |      it means that pattern number 0 should
       |     |            |      be played. the number of valid positions
       |     |            |      is specified by the "PLEN" chunk. */
       |     |            | }
=======+=====+============+=================================================
       |     |            | Pattern1 {
000546 |  4  | "PBOD"     |   char chunk_name[4]           )
00054A |  4  | (0702      |   long chunk_len              (   up to 64
       |     |  or 0602)  |                                >  patterns are
00054E |  2  | (64t)      |   short num_pattern_lines     (   supported
000550 | ... |            |   byte Pattern1_Line[*]        )
       |     |            |   /* see below */
       |     |            | }
...    |     |            |
=======+=====+============+=================================================
       |     |            | Sample1 {
0..... |  4  | "SBOD"     |   char chunk_name[4]           )  up to 255* is
0..... |  4  |            |   long chunk_len              (_  possible but
0..... | ... |            |   byte sample_data[*]          (  mostly limited
...    |     |            |   /* 8 bit signed data */      )  to 36*
       |     |            | }
...    |     |            |
=======+=====+============+=================================================

Values in parentheses are examples and may vary.

(If you choose the faster method of checking the chunk types using a
'long' value, don't forget to swap the byte order on LSB-first systems.)

There are 36 effects, instruments and notes. In the original Oktalyzer
editor they are entered using the 10 digits and the 26 letters, that's
why 36.

____
A pattern line (PBOD chunk) looks as follows:

After the 16-bit num_pattern_lines are that many lines of pattern data,
each line containing four bytes for each active channel. For example, in
a 6-channel module, each line is 24 bytes. The four bytes of one channel
are:

        unsigned char newnote,
        unsigned char instrument,  /* sample */
        unsigned char effect,
        unsigned char data         /* effect parameter */

If newnote is nonzero, start playing a different note. There are 36
pitches, 1-36 (see pertab below). Set the current channel's volume to the
sample's volume. 'instrument' indicates which sample to use.

Whether or not newnote is nonzero, process 'effect' and 'data' (see
effects below).

___
Oktalyzer uses the following period table, which is the same as for
ST/NT/PT-Mod-Files. (converted to C actually, but the same numbers)

static short pertab[] = {
/*  C     C#    D     D#    E     F     F#    G     G#    A     A#    B  */
  0x358,0x328,0x2FA,0x2D0,0x2A6,0x280,0x25C,0x23A,0x21A,0x1FC,0x1E0,0x1C5,
  0x1AC,0x194,0x17D,0x168,0x153,0x140,0x12E,0x11D,0x10D, 0xFE, 0xF0, 0xE2,
   0xD6, 0xCA, 0xBE, 0xB4, 0xAA, 0xA0, 0x97, 0x8F, 0x87, 0x7F, 0x78, 0x71
};

The extended octaves 0 and 4 which might be found in other formats are
not used here.

____
The Oktalyzer format defines the following effects (decimal):

  1  Portamento down: decrease period of current sample by 'data',
     once every 50Hz clock tick.
  2  Portamento up: increase period of current sample by 'data',
     once every 50Hz clock tick.
 10  Arpeggio 3: Change note every 50Hz tick between L,N,H
 11  Arpeggio 4: Change note every 50Hz tick between N,H,N,L
 12  Arpeggio 5: Change note every 50Hz tick between H,H,N
       N = normal note being played in this channel (1-36)
       L = normal note number minus upper four bits of 'data'.
       H = normal note number plus lower four bits of 'data'.
 13  Decrease note number by 'data' once per tick.
 17  Increase note number by 'data' once per tick.
 21  Decrease note number by 'data' once per line.
 30  Increase note number by 'data' once per line.
 15  Amiga low-pass filter control: 'data' indicates the new setting.
 25  Position jump: Instead of going to the next line after this one,
     jump to the beginning of pattern number 'data'.
 27  Release: start playing the release phase of the currently playing
     sample.
 28  Set speed (number of 50Hz ticks between advancing lines) to 'data'.
 31  Volume control:
       If 'data' <= 0x40, set the volume of this channel to 'data'.
       If 0x41 <= 'data' <= 0x50, decrease volume by 'data' - 0x40
         every 50Hz clock tick (fast fade out).
       If 0x51 <= 'data' <= 0x60, increase volume by 'data' - 0x50
         every 50Hz clock tick (fast fade in).
       If 0x61 <= 'data' <= 0x70, decrease volume by 'data' - 0x60
         at the beginning of every line (slow fade out).
       If 0x71 <= 'data' <= 0x80, increase volume by 'data' - 0x70
         at the beginning of every line (slow fade in).

-- There seems to be much room for future extensions, eg. panning.

... now it's complete?
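
As a rough illustration of the ranges listed for effect 31, the 'data'
byte could be decoded along these lines. The names okt_when,
okt_decode_volume and okt_apply_step are invented for this sketch, and
clamping the result to 0..0x40 is an assumption based on the volume
range given for samples; the text does not say what a player does when a
fade runs past either end.

```c
/* When the decoded change takes effect (hypothetical names). */
enum okt_when { OKT_ONCE, OKT_EVERY_TICK, OKT_EVERY_LINE };

/* Decode the 'data' byte of effect 31 (volume control).
 * For OKT_ONCE, *value is the new volume (0..0x40).
 * Otherwise *value is a signed step: negative fades out, positive
 * fades in. Returns 0 on success, -1 if 'data' is above 0x80. */
int okt_decode_volume(unsigned char data, enum okt_when *when, int *value)
{
    if (data <= 0x40) {
        *when = OKT_ONCE;        *value = data;
    } else if (data <= 0x50) {
        *when = OKT_EVERY_TICK;  *value = -(data - 0x40);
    } else if (data <= 0x60) {
        *when = OKT_EVERY_TICK;  *value = data - 0x50;
    } else if (data <= 0x70) {
        *when = OKT_EVERY_LINE;  *value = -(data - 0x60);
    } else if (data <= 0x80) {
        *when = OKT_EVERY_LINE;  *value = data - 0x70;
    } else {
        return -1;               /* outside the documented ranges */
    }
    return 0;
}

/* Apply one fade step. Assumption: volume is clamped to 0..0x40. */
int okt_apply_step(int volume, int step)
{
    volume += step;
    if (volume < 0)    volume = 0;
    if (volume > 0x40) volume = 0x40;
    return volume;
}
```

A player would call okt_decode_volume once when the line is read, then
call okt_apply_step on every 50Hz tick or at the start of every line,
depending on the decoded okt_when value.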
(HZ)
-----------------------------------------------------------------

Harald
--
zappe@gaea.sietec.de          everything is relative
Harald Zappe                  nothing is for infinity
work: +49-30-386-28328/29     quantity is not quality
home: +49-30-ASK-ME


SBStudio II (C) 1991-94 Henning Hellström
All rights reserved

Program documentation
The SBStudio file format
-------------------------------------------------------

The basic idea
--------------

When I created SBStudio II, I also created the need for a new file format
to support the new features. The format I came up with has many
advantages. It is easy to make loaders and savers for it, it has few
limitations, it takes up less diskspace than most other formats and it is
very easy to upgrade.

The greatest advantage is that the format is built up of data blocks,
each starting with a four byte text ID and a doubleword saying the length
of the block. This makes it possible for loaders to skip data they don't
support. This is also what makes the format so easy to upgrade.

This file will cover the new format in great detail, and I hope it will
become a new standard in music file formats.

Some background information
---------------------------

The new format actually consists of three file types; PACKAGES (.PAC),
SONGS (.SON) and SOUNDS (.SOU). A PACKAGE is really a SONG file with the
needed SOUND files attached to it.

As mentioned before, the format is built up of data blocks starting with
a four byte identifier. This identifier is followed by a doubleword
saying the length of the data block, *excluding* these first eight bytes.
This makes it easy to find and read wanted elements from the file without
doing heavy calculations.

Let's say you want to find a data block called 'TEST':

  1. Ignore the first 8 bytes, start at byte 9.
  2. Read four bytes.
  3. If these four bytes are 'TEST', go to step 7.
  4. If these four bytes are 'END ', the file doesn't contain the wanted
     block. Terminate.
  5. Read a doubleword and add it to the read pointer.
  6. Go to step 2.
  7. Process data.

Please note that the files always start with a file identifier with a
block length of file size - 8, and end with an 'END ' identifier with a
block length of zero.

I think you get the picture now, let's get down to business!

The file format
---------------

This current format version is v1.04. Here is a list of the identifiers
you should expect to find in a v1.04 file.

'Block length' represents the doubleword immediately following the block
identifier. 'DWord' means long integer or 4 bytes. 'Word' means integer
or 2 bytes.

Package
-------

Identifier      : 'PACG'
Location        : At the beginning of a PACKAGE.
Block length    : File size - 8.
Block structure : None.

Identifier      : 'PAIN'
Location        : Usually after the 'PACG' block.
Block length    : Expect anything.
Block structure : Word - Package version.
                  Word - SBStudio version that saved the package.
                         Other savers should write 0 here.
                  Word - Number of sounds in package (may be 0).

Song
----

Identifier      : 'SONG'
Location        : At the beginning of a SONG file or inside a PACKAGE,
                  usually after the 'PAIN' block. Represents the start
                  of a song structure.
Block length    : File size - 8 if it's at the beginning of a SONG file,
                  0 if it's inside a PACKAGE file.
Block structure : None.

Identifier      : 'SONA'
Location        : In a song structure.
Block length    : Expect anything.
Block structure : The name of the song. This block is not needed.

Identifier      : 'SOOR'
Location        : In a song structure.
Block length    : Expect anything.
Block structure : Block length/2 words, saying the playback order of
                  the song sheets. This block is not needed.

Identifier      : 'SOIN'
Location        : In a song structure.
Block length    : Expect anything.
Block structure : Byte - Base speed, usually 6.
                  Byte - Base BPM, usually 125.
                  Word - Number of sheets in song, must be at least 1.
                  Byte - Number of channels used in song. 4-16 channels
                         is normal for v1.04.
                  Byte - Number of lines in sheet. Should always be 64.
                  Byte - Number of bytes per channel cell. Should always
                         be 5.
                  Byte - Sheet packing: Bit 0 - 0 = Unpacked.
                                                1 = Packed.
                  Byte * channels - Pan positions for each channel.
                                    Pan range is 0h-Fh.

Identifier      : 'SOSH'
Location        : In a song structure.
Block length    : Expect anything.
Block structure : This block contains one sheet. Read the chapter
                  'The sheet format' later in this file for details on
                  the sheet structure.

Sound
-----

Identifier      : 'SND '
Location        : At the beginning of a SOUND file or inside a PACKAGE,
                  usually after the song structure. Represents the start
                  of a sound structure, which contains one sound.
Block length    : File size - 8 if it's at the beginning of a SOUND
                  file, 0 if it's inside a PACKAGE file.
Block structure : None.

Identifier      : 'SNNA'
Location        : In a sound structure.
Block length    : Expect anything.
Block structure : The name of the sound.

Identifier      : 'SNIN'
Location        : In a sound structure.
Block length    : Expect anything.
Block structure : Word  - Sound number, only used in PACKAGE.
                  Word  - Reserved.
                  Byte  - Fine tuning.
                  Word  - Sound volume, 0-16384.
                  Word  - Sound type: Bit 0 - 1=PCM/0=Other.
                                      Bit 1 - 1=16bit/0=8bit.
                          Format version 1.04 only supports PCM sounds.
                  DWord - Sound loop start.
                  DWord - Sound loop end.
                  Byte  - Sound packing: Bit 0 - 0 = Unpacked.
                                                 1 = Packed.
                          Format version 1.04 only supports unpacked
                          sounds.

Identifier      : 'SNDT'
Location        : In a sound structure.
Block length    : Sample length.
Block structure : This block contains one sampled sound.

All
---

Identifier      : 'END '
Location        : At the end of all PACKAGE, SONG and SOUND files.
Block length    : 0
Block structure : None.

The sheet format
----------------

The sheet is where the song notes are stored. SBStudio v2.05 limits the
total number of different sheets to 64, but a song structure may contain
up to 65535 sheets.

The sheet consists of 5 bytes per channel, 64 times.
NOTE: This MAY change in future versions, but let's say it won't for now.
Check the values in the 'SOIN' block of the song structure to be sure.

The 5 bytes represent one note. This is the format:

  Byte 0 - Note number 1-48, 0 = No note.
           1 = C-1, 2 = C#1, 3 = D-1 ... 48 = B-4.
  Byte 1 - Sound number 1-99, 0 = No change.
  Byte 2 - Volume 1-65, 0 = No change.
  Byte 3 - Command 00h-0Fh.
  Byte 4 - Command parameter 00h-FFh.

Read the documentation part 'Programming the sheet' for details on what
the different commands do.

When writing a loader, keep in mind that support for more octaves, sounds
and commands may be added to future versions of the format.

The packed sheet format
-----------------------

SBStudio v2.05 saves all sheets in a packed format. The packed format is
very simple, but may sometimes dramatically reduce the file size.

When loading a sheet, you should always assume it is type 1 packed. This
will make your loader compatible with both type 0 and 1 sheets, and
reduce the number of instructions needed. Because more packing types may
come in the future, you should always check the 'sheet packing' byte in
the 'SOIN' block to see what type of packing is used.

Here is the format description. In a type 1 packed sheet, byte 0 or 2 in
the 5 byte channel cell may contain a special byte. The special bytes
are:

  Value   Meaning
  ----------------------------------------------
  0FDh    End of channel cell. Next byte is the
          first byte of the next channel cell.
  0FEh    End of sheet row. Next byte is the
          first byte of the next row.
  0FFh    End of sheet. You are finished!
  ----------------------------------------------

Last words
----------

This should be all you need to know to write your own loader or saver for
the new format. If you run into problems, please drop me a message in the
SBStudio conference at the SoundServer MBBS in Oslo. Read the doc part
'About SBStudio' for details.

Good luck!
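
The seven-step block search from 'Some background information' might be
written in C roughly as follows, working on a file that has already been
read into memory. This is a sketch, not SBStudio's own loader: the names
sbs_read_dword and sbs_find_block are invented here, and the doubleword
is assumed to be stored low byte first (Intel order), which the text
above does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: block lengths are little-endian doublewords. */
uint32_t sbs_read_dword(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Look for the block named `id` (4 characters). Returns the offset of
 * the block's data, just past its 8-byte header, and stores the block
 * length in *len if len is not NULL. Returns -1 if an 'END ' block or
 * the end of the buffer is reached first. */
long sbs_find_block(const uint8_t *buf, size_t size, const char *id,
                    uint32_t *len)
{
    size_t pos = 8;                  /* step 1: skip the file identifier */

    while (pos + 8 <= size) {
        uint32_t n = sbs_read_dword(buf + pos + 4);

        if (memcmp(buf + pos, id, 4) == 0) {        /* step 3 */
            if (len)
                *len = n;
            return (long)(pos + 8);                 /* step 7 */
        }
        if (memcmp(buf + pos, "END ", 4) == 0)      /* step 4 */
            return -1;
        if (n > size - pos - 8)                     /* truncated file */
            return -1;
        pos += 8 + (size_t)n;                       /* steps 5 and 6 */
    }
    return -1;
}
```

Because a 'SONG' or 'SND ' block inside a PACKAGE has a block length of
0, the same loop walks straight into the blocks of the song or sound
structure without any special handling.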
-------------------------------------------------------


PAT Format
Intel byte order

Information from File Format List 2.0 by Max Maischein.

--------!-CONTACT_INFO----------------------

If you notice any mistakes or omissions, please let me know! It is only
with YOUR help that the list can continue to grow. Please send all
changes to me rather than distributing a modified version of the list.

This file has been authored in the style of the INTERxxy.* file list by
Ralf Brown, and uses almost the same format.

Please read the file FILEFMTS.1ST before asking me any questions. You may
find that they have already been addressed.

         Max Maischein

Max Maischein, 2:244/1106.17
Max_Maischein@spam.fido.de
corion@informatik.uni-frankfurt.de
Corion on #coders@IRC

--------!-DISCLAIMER------------------------

DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information
contained in this list to the best of my ability, but I cannot be held
responsible for any problems caused by use or misuse of the information,
especially for those file formats foreign to the PC, like AMIGA or SUN
file formats. If a piece of information is marked "guesswork" or
undocumented, you should check it carefully to make sure your program
will not break with an unexpected value (and please let me know whether
or not it works the same way).

Information marked with "???" is known to be incomplete or guesswork.

Some file formats were not released by their creators, others are
regarded as proprietary, which means that if your programs deal with
them, you might be looking for trouble. I don't care about this.

--------------------------------------------

The GF1 Patch files are multipart sound files for the Gravis Ultrasound
sound card to emulate MIDI sounds in high quality. Each Patch can consist
of many samples (for example, a string ensemble consists of Violin,
Viola, Cello, Bass) which are played depending on the note to play.
A patch can also contain a part to be played before the loop and a part
to be played after the tone has been released.

OFFSET  Count TYPE   Description
0000h      12 char   ID='GF1PATCH110'
000Ch      10 char   Manufacturer ID
0018h      60 char   Description of the contained Instruments
                     or copyright of manufacturer.
0054h       1 byte   Number of instruments in this patch
0055h       1 byte   Number of voices for sample
0056h       1 byte   Number of output channels
                     (1=mono,2=stereo)
0057h       1 word   Number of waveforms
0059h       1 word   Master volume for all samples
005Bh       1 dword  Size of the following data
0060h      36 byte   reserved

Following this header, the instruments with their headers follow. An
instrument header contains the name and other data about one instrument
contained within the patch.

OFFSET  Count TYPE   Description
0000h       1 word   Instrument number. ?Maybe the MIDI instrument
                     number?. In the Gravis patches, this is 0, in
                     other patches, I found random values.
0002h      16 char   ASCII name of the instrument.
0012h       1 dword  Size of the whole instrument in bytes.
0016h       1 byte   Layers. Needed for whatever.
0017h      40 byte   reserved

About the patch, I don't know anything. Maybe somebody could enlighten
me. Each patch record has the following format :

OFFSET  Count TYPE   Description
0000h       7 char   Wave file name
0007h       1 byte   Fractions
0008h       1 dword  Wave size.
                     Size of the wave digital data
000Ch       1 dword  Start of wave loop
0010h       1 dword  End of wave loop
0012h       1 word   Sample rate of the wave
0014h       1 word   Minimum frequency to play the wave
0016h       1 word   Maximum frequency to play the wave
0018h       1 dword  Original sample rate of the wave data
001Ch       1 int    Fine tune value for the wave
001Eh       1 byte   Stereo balance, values unknown**
001Fh       6 byte   Filter envelope rate
0025h       6 byte   Filter envelope offset
002Bh       1 byte   Tremolo sweep
002Ch       1 byte   Tremolo rate
002Dh       1 byte   Tremolo depth
002Fh       1 byte   Vibrato sweep
0030h       1 byte   Vibrato rate
0031h       1 byte   Vibrato depth
0032h       1 byte   Wave data, bitmapped
                       0 - 8/16 bit wave data
                       1 - signed/unsigned data
                       2 - de/enable looping
                       3 - no/has bidirectional looping
                       4 - loop forward/backward
                       5 - Turn envelope sustaining off/on
                       6 - Dis/Enable filter envelope
                       7 - reserved
0033h       1 int    Frequency scale, whatever that means
0035h       1 word   Frequency scale factor
0037h      36 byte   Reserved

EXTENSION:PAT
OCCURENCES:PC
PROGRAMS:Patch Maker
SEE ALSO:VOC,WAVe


;----------------------------------------------------------------------------
; DisorderTracker2 file FORMAT STARTS HERE
;----------------------------------------------------------------------------

well here it comes... I am converting this from the source code now, as I
type, so I hope it is right! this is (c) statix 1995... and I accept no
responsibility for errors in here, if there is a problem contact me -->
statix@sv.span.com...

here is a list of fields in a .PLM file, I assume you know some coding??

name         length (bytes)  description
=======================================================================
ID           4               marker, always "PLM" then character 26
headersize   1               number of bytes in header, including ID etc
version      1               version code of file format, 10h, I think
songname     48              ASCIIZ string
channels     1               number of channels
flags        1               flags byte, ignore this!
maxvol       1               maximum volume for vol slides, normally 40h
amplify      1               soundblaster amplify, 40h=no amplify
initbpm      1               starting bpm of song, normally 125
initspeed    1               starting speed of song, normally 6
initpan      32              starting pan positions, always 32,
                             0=left, 15=right
numsamps     1               number of samples in file
numpats      1               number of patterns in file
numorders    2               number of orders in file
padding      1               ignore...

; now seek "headersize" bytes from the beginning of the file, (normally no
; change) now:

orderlist    4*numorders     a list of orders, format coming later...
patternlist  4*numpats       a list of file offsets of the patterns
                             (dwords)
samplelist   4*numsamps      a list of file offsets of the samples
                             (dwords)

; now read in the patterns, which are uncompressed.
; for each pattern, seek to the place in the file in the patternlist, and the
; format is below. (NOTE: if the patternlist has a position of 0, skip to next
; pattern)

patternsize  4 (dword)       number of bytes this pattern takes in file
len          byte            # rows
wid          byte            # channels
col          byte            colour
name         25 bytes        name, asciiz string

;now the notes in the pattern, stored a row at a time, with "wid" channels
;in each row. the note format is:

pitch        byte            hi nybble is octave, lo nybble is note
                             (C=0, C#=1, D=2 etc). 0=blank
sample       byte            sample number. 0=blank
vol          byte            volume. 0ffh=blank. NOTE! big volumes >64
                             are allowed!
cmd          byte            command number
nfo          byte            command info

; now read in the samples, which are uncompressed.
; for each sample, seek to the place in the file in the samplelist, and the
; format is below. (NOTE: if the samplelist has a position of 0, skip to next
; sample)

;in fact each sample is stored in the .PLM file as a complete .PLS sample file.
;The .PLS format is as follows:

id           4 bytes         always "PLS" then character 26
headersize   1 byte          size of header in bytes, including ID etc
version      1 byte
fullname     32 byte         ; NOT asciiz
filename     12 byte         ; ditto
pan          byte            ; default pan, 0..f, >f=none
vol          byte            ; default vol 0..40h
flags        byte            ; 1 = 16 bit , 0=8 bit
c4spd        word            ; c4spd (as for S3M)
gusloc       dword           ; posn in gusram (not used in file)
loopst       dword           ; loopstart
loopen       dword           ; loopend
len          dword           ; data size IN BYTES
data         lots of bytes   ; unsigned data

; default pan changes the pan on that channel when the sample is CHANGED
; to that sample, experiment in dt2 to see what I mean...
; note that default volume does NOT work like MOD or S3M default volume.
; default volume actually multiplies the volume field for that sample, so
; a default volume of 40h plays the sample as written in the file. a
; default volume of 20h always plays the sample half as loud as written in
; the file. This is a better system, I think... (more logical?!?)

; finally I will explain the format of the orderlist. Each one is 4 bytes,
; and is:

x            word            ; starting position of pattern
y            byte            ; channel number of first channel of pattern
pattern      byte            ; number of pattern

; I think that will make sense, if you see how dt2 works... note that it
; is very important to remember that patterns can overlap. In this case,
; the pattern with the higher x takes priority. If they have the same x,
; then it is the one with the higher pattern number. You can see this in
; dt2 by dragging patterns around on the overview screen.
; of course, this system with x and y for each pattern is what makes dt2
; unique, and also so hard to play with a standard player!!!

hope that helps, and I have not made any errors... any questions, just
email.
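
A few of the rules above (the pitch byte, the multiplying default volume
and the overlap priority of orders) could be expressed in C as sketched
below. All names here (plm_decode_pitch, plm_effective_volume, struct
plm_order, plm_order_wins) are made up for the illustration, and
rounding the scaled volume down with integer division is an assumption;
the text only gives the 40h and 20h cases.

```c
#include <stdint.h>

/* Split a pattern pitch byte: high nybble is the octave, low nybble is
 * the note (C=0, C#=1, D=2 etc). Returns 0 for a blank cell (pitch 0),
 * 1 otherwise. */
int plm_decode_pitch(uint8_t pitch, int *octave, int *note)
{
    if (pitch == 0)
        return 0;
    *octave = pitch >> 4;
    *note = pitch & 0x0F;
    return 1;
}

/* The sample's default volume multiplies the volume written in the
 * pattern; 40h means "as written". Assumption: result is truncated. */
int plm_effective_volume(int note_vol, int default_vol)
{
    return note_vol * default_vol / 0x40;
}

/* One 4-byte orderlist entry, in the field order given above. */
struct plm_order {
    uint16_t x;        /* starting position of pattern     */
    uint8_t  y;        /* first channel used by pattern    */
    uint8_t  pattern;  /* number of pattern                */
};

/* Nonzero if order `a` takes priority over order `b` where the two
 * overlap: higher x wins, then the higher pattern number. */
int plm_order_wins(const struct plm_order *a, const struct plm_order *b)
{
    if (a->x != b->x)
        return a->x > b->x;
    return a->pattern > b->pattern;
}
```

With a default volume of 20h, a note written at volume 40h comes out at
20h, matching the "half as loud" case described above.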
;---------------------------------------------------------------------------
; END OF DT2 FORMAT
;---------------------------------------------------------------------------

ps If you want to spread that format, go ahead but please only spread
with *ALL* of it, starting from the line that says "format starts
here..." thankyou...

---------------------------------------------------------------------------

With formats like PLM, with the identifier at the start of the file, it's
very easy to rip them, just extract everything after the marker..


S3I Format
Intel byte order

Information from File Format List 2.0 by Max Maischein.

--------------------------------------------

This is the Digiplayer/ST3.0 digital sample file format. The sample files
include information about the loop of the instrument. The AdLib
instruments have another format listed below.

OFFSET  Count TYPE   Description
0000h       1 byte   ID=01h
0001h      12 char   DOS filename
000Dh       1 byte   reserved (0)
000Eh       1 word   Paragraph offset of the raw sample data
                     from beginning of file.
0010h       1 dword  Sample length in bytes
0014h       1 dword  Start of sample loop
0018h       1 dword  End of sample loop
001Ch       1 byte   Playback volume of sample
001Dh       1 byte   ??? "DSK" what ever that means
001Eh       1 byte   Pack type
                       0 - unpacked
                       1 - DP30ADPCM 1
001Fh       1 byte   Flags (bitmapped)
                       0 - loop on/off
                       1 - stereo sample (length bytes for left
                           channel, then another length bytes for
                           right channel!)
                       2 - 16-Bit samples (in Intel byte order)
0020h       1 dword  C2 frequency
0024h       1 dword  reserved
0028h       1 word   reserved
002Ah       1 word   ID=512
002Ch       1 dword  ?? Date of last modification ??
                     (see table 0009)
0030h      28 char   ASCIIZ Sample name
004Ch       4 char   ID='SCRS'
0050h       ? byte   Raw sample data

Here follows the AdLib instrument format for which I don't know the
extension (yet) :

OFFSET  Count TYPE   Description
0000h       1 byte   Instrument type
                       2 - melodic instrument
                       3 - bass drum
                       4 - snare drum
                       5 - tom tom
                       6 - cymbal
                       7 - hihat
0001h      12 char   DOS file name
000Dh       3 byte   reserved
0010h       1 byte   Modulator description (bitmapped)
                       0-3 - frequency multiplier
                         4 - scale envelope
                         5 - sustain
                         6 - pitch vibrato
                         7 - volume vibrato
0011h       1 byte   Carrier description (same as modulator)
0012h       1 byte   Modulator miscellaneous (bitmapped)
                       0-5 - 63-volume
                         6 - MSB of levelscale
                         7 - LSB of levelscale
0013h       1 byte   Carrier description (same as modulator)
0014h       1 byte   Modulator attack / decay byte (bitmapped)
                       0-3 - Decay
                       4-7 - Attack
0015h       1 byte   Carrier description (same as modulator)
0016h       1 byte   Modulator sustain / release byte (bitmapped)
                       0-3 - Release count
                       4-7 - 15-Sustain
0017h       1 byte   Carrier description (same as modulator)
0018h       1 byte   Modulator wave select
0019h       1 byte   Carrier wave select
001Ah       1 byte   Modulator feedback byte (bitmapped)
                         0 - additive synthesis on/off
                       1-7 - modulation feedback
001Bh       1 byte   reserved
001Ch       1 byte   Instrument playback volume
001Dh       1 byte   ??? "DSK"
001Eh       1 word   reserved
0020h       1 dword  C2 frequency
0024h      12 byte   reserved
0030h      28 char   ASCIIZ Instrument name
004Ch       4 char   ID='SCRI'

EXTENSION:S3I,SMP
OCCURENCES:PC
PROGRAMS:ScreamTracker 3.0
SEE ALSO:MTM,S3M,STM


This Document release Date: 11/8/93 (ver 1.0 of "SSS-form.txt")

THE STUDIO SESSION SONG FILE FORMAT (Editor version 1.0)
--------------------------------------------------------

Format created by: Steve Capps, Mark Zimmer, Tom Hedges, Ed Bogas,
Nick Borelli, Ty Roberts, and Neil Cormia of Bogas Software in 1986.

This hacked-together description by: Jamal Hannah

There are 12 fields and 6 tracks to this format.. on old Macs, since you
use complex waveforms for these sounds, you'll probably have to simulate
the multiple sound channels by combining them on the fly with some fancy
math.
I think there is actually a Mac Toolbox call to do this in the original
"Sound Driver" chapter of Inside Macintosh.

File Signatures (Macintosh Only)
  Type:    'XSNG' (sometimes 'DSNG' or 'JSNG')
  Creator: 'XPRT'

Offset  Field            Length
0       Tempo            2       range: 10-450
2       unused           2       should be nul ($0000)
4       TimeSignature    2       decimal ranges of each byte: 1-32,1-32
6       Pascal string names of Instrument Files begin here, each
        followed by 2 nul bytes. ($00 $00)
??      unused           1       (should be $00.. if there are no
                                 instruments, this will come right
                                 after TimeSignature)
??      unused           64
??      Data for track 1, terminated by $B0
??      Data for track 2, terminated by $B0
??      Track 3     "
??      Track 4     "
??      Track 5     "
??      Track 6     "

Instruments are implicitly numbered from 01 onward, starting with the
first one listed. Each name is the exact name of a Studio Session
Instrument file, which should be on the same volume or directory.

Track data consists of the following commands, which represent notes and
other components on a musical staff:

COMMANDS (with fields and field names):

ending          $C0 xx       (endingNumber)
timeSignature   $BD xx xx    (timeSigTop,timeSigBottom)
barLine         $BA
newInstrument   $B9 xxxx     (instrumentNumber)
dashedBarLine   $B5
keySignature    $B4 xx       (keyMode)
tempoChange     $B3 xxxx     (tempoSpeed)
repeatBarEnd    $B2
repeatBarStart  $B1 xxxx     (numRepeats)
coda            $B0
musicalNote     xx xx xx     (pitch, unit#, slurStatus)

NOTE UNITS:

A "musicalNote" is really either a rest or a note. Normally a unit is a
rest, but if it has anything in the pitch field (mentioned above) then it
is a note. A rest always has $00 in the pitch and slurStatus fields.
unit32     $03    1/32nd rest or note
unit32_3   $02    1/32nd rest or note triplet
unit32_2   none
unit32_1   none
unit16     $06    1/16th rest or note
unit16_3   $04    1/16th rest or note triplet
unit16_2   none
unit16_1   $09    1/16th rest or note, dotted
unit8      $0C    1/8th rest or note
unit8_3    $08    1/8th rest or note triplet
unit8_2    $15    1/8th rest or note, double-dotted
unit8_1    $12    1/8th rest or note, dotted
unit4      $18    1/4th rest or note
unit4_3    $10    1/4th rest or note triplet
unit4_2    $2A    1/4th rest or note, double-dotted
unit4_1    $24    1/4th rest or note, dotted
unit2      $30    1/2 rest or note
unit2_3    $20    1/2 rest or note triplet
unit2_2    $54    1/2 rest or note, double-dotted
unit2_1    $48    1/2 rest or note, dotted
unit1      $60    Whole rest or note
unit1_3    $40    Whole rest or note, triplet
unit1_2    $A8    Whole rest or note, double-dotted
unit1_1    $90    Whole rest or note, dotted

(A triplet is a rest/note with a little 3 over it, with playing length
multiplied by 2/3 (shortened). A double-dotted unit's length is multiplied
by 1 3/4, and a single dotted unit length is multiplied by 1 1/2... if I
am wrong, a good book on musical notation can clear this up!)

PITCH:

Pitch values range from "C0" (lower C, at the bottom of the scale) up to
"C6" (upper C, at the top of the scale), and are represented by the
numbers $01-$2B (1-43).

Accidentals: If the pitch value has $40 added to it, then it is "flat".
If it is "sharp", it has $80 added.

SLUR STATUS:

The slur is that little curvy line that links two notes together and
"slurs" them together at playtime. Here are the byte values:

$00   No slur on this note
$01   Slur start/line curving from this note toward one on the right
$02   Slur end/line coming from a note to the left
$03   Slur joint (two slur lines coming from both left and right)

(There are other codes that go here too.. sometimes a note has a little
letter floating over it. I still have yet to decode these variations.)

BAR LINE:

This byte represents the vertical bar line that marks the end of a measure.
I'm not sure what a dashed bar line is for!

ENDING:

An "ending" command is followed by the number of times the ending is
repeated, ranging from 1-10 ($01-$0A).

REPEAT BAR:

The command byte is followed by a word-length number representing the
number of times to repeat the following notes. The repeated section is
terminated by the next instance of a "repeatBarEnd" command.

TIME SIGNATURE:

The "timeSignature" command is followed by bytes representing the top and
bottom of the time signature, respectively.

KEY SIGNATURE:

The key signature command is followed by a code selecting one of the
following modes for the notes that follow:

Value   Key             Number to add to pitch-bytes of notes following
$00     C Major         $00
$01     G Major         $00
$02     D Major         $80
$03     A Major         $80
$04     E Major         $80
$05     B Major         $80
$06     F Sharp Major   $80
$07     C Sharp Major   $80
$08     F Major         $00
$09     B flat Major    $00
$0A     E flat Major    $00
$0B     A flat Major    $00
$0C     D flat Major    $00
$0D     G flat Major    $40

(Note: don't add this value if the note already has a sharp or a flat.)

TEMPO CHANGE:

A tempo change command byte is followed by a word-length value
representing the new tempo speed, ranging (in decimal) from 10-450.

CODA:

A coda is a byte representing the end of a track. If a track is empty,
the coda byte holds its place. Since the file ends with the end of the
6th track, there will always be at least one $B0 at the end of the file.

SUPER STUDIO SESSION SONG FILE FORMAT (Editor version 2.1)
----------------------------------------------------------

The only real difference between version 1 and 2 is the addition of two
more tracks, for a total of eight. The two extra tracks are represented
just like the first six, with $B0 bytes as ending markers. There is also
one additional command:

VOLUME:

The volume command is represented by (hex) byte $BF, with a word-length
value following, and 3 bytes after that which I haven't figured out.
Volume always defaults to "fff" (loudest). "ppp" is the softest volume.
BF xxxx xx xx xx   Volume
BF 0000 58 88 9A   ppp
BF 0001 12 40 80   pp
BF 0002 12 40 80   p
BF 0003 12 40 80   mp
BF 0004 58 88 9A   mf
BF 0005 12 40 80   f
BF 0006 58 88 9A   ff
BF 0007 58 88 9A   fff

Editor version 2 also places an (incremental) number above most bar
lines, but this has nothing to do with the file format.

(SUPER) STUDIO SESSION INSTRUMENT FILES (From Bogus Prod. Docs)
---------------------------------------------------------------

The format of instrument files is very simple. The samples are eight bit
unsigned samples (silence = 128). There is an eight byte header with the
following format, followed by the samples themselves.

No. Bytes   Description
2           Loop Start: byte offset of loop start
2           Loop End: byte offset of loop end
1           Recorded pitch: #37 is middle C
1           0: reserved
2           Length in bytes
n           The samples

Note: If loopback is used, there must be at least 370 samples after the
loop end. For more information, look at "Flute mid" in SoundEdit(tm) and
then look at the binary version of the file with any file utility.

Note: Some older files are compressed on disk, so they won't follow the
above description, and some files have "0" for the pitch, which implies
middle C.

(Special Note: This file format is basically the original Macintosh
"Sound Cap" and SoundEdit recorded instrument format (Type/Creator:
'DEWF'/'FSSC' or 'DEWF'/'SFX!' respectively). - JH )

The SoundEdit manual has a good explanation of sampling techniques.
SoundEdit will create Super Studio Session(tm) or Jam Session(tm)
instrument files or convert them from most other formats.

Bogas Productions
751 Laurel Street, #213
San Carlos, California 94070
Phone: (415) 592-5129
Fax: (415) 592-5196
(April, 1992)

---

STM Format
Intel byte order

Information from File Format List 2.0 by Max Maischein.

--------!-CONTACT_INFO----------------------
If you notice any mistakes or omissions, please let me know! It is only
with YOUR help that the list can continue to grow.
Please send all changes to me rather than distributing a modified version
of the list.

This file has been authored in the style of the INTERxxy.* file list by
Ralf Brown, and uses almost the same format.

Please read the file FILEFMTS.1ST before asking me any questions. You may
find that they have already been addressed.

Max Maischein

Max Maischein, 2:244/1106.17
Max_Maischein@spam.fido.de
corion@informatik.uni-frankfurt.de
Corion on #coders@IRC
--------!-DISCLAIMER------------------------
DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information
contained in this list to the best of my ability, but I cannot be held
responsible for any problems caused by use or misuse of the information,
especially for those file formats foreign to the PC, like AMIGA or SUN
file formats. If information is marked "guesswork" or undocumented, you
should check it carefully to make sure your program will not break with
an unexpected value (and please let me know whether or not it works the
same way).

Information marked with "???" is known to be incomplete or guesswork.

Some file formats were not released by their creators, others are
regarded as proprietary, which means that if your programs deal with
them, you might be looking for trouble. I don't care about this.
--------------------------------------------

The ScreamTracker 1.0 format was the module format used by the
ScreamTracker before version 2.0.
OFFSET              Count TYPE   Description
0000h                  20 char   ASCIIZ song name
0014h                   8 char   Tracker name
001Ch                   1 byte   ID=1Ah
001Dh                   1 byte   File type
                                 1 - song (contains no samples)
                                 2 - module (contains samples)
001Eh                   1 byte   Major version number
001Fh                   1 byte   Minor version number
0020h                   1 byte   Playback tempo
0021h                   1 byte   Number of patterns
                                 ="PAT"
0022h                   1 byte   Global playback volume
0023h                  13 byte   reserved
0030h                  31 rec    Instrument data
                       12 char   ASCIIZ instrument name
                        1 byte   ID=0
                        1 byte   Instrument disk
                        1 word   reserved
                        1 word   Sample length in bytes
                        1 word   Sample loop start
                        1 word   Sample loop end
                        1 byte   Sample playback volume
                        1 byte   reserved
                        1 word   C3 frequency in Hz
                        1 dword  reserved
                        1 word   length in paragraphs
                                 (only for modules, in songs: reserved)
03D0h                  64 byte   Pattern orders
0410h          4*64*"PAT" rec    Pattern data. Each pattern consists of
                                 64 rows, each 4 channels. The channels
                                 are stored from left to right, row by row.
                        1 byte   Note byte :
                                   251 - last 3 bytes not stored, all bytes 0
                                   252 - last 3 bytes not stored, note -0-,
                                         whatever that means.
                                   253 - last 3 bytes not stored, note ...
                                   254 - undefined (reserved for run-time)
                                   255 - undefined (reserved for run-time)
                                   otherwise bit mapped :
                                   0-3 : note (c=0,c#=1...)
                                   4-7 : octave
                        1 byte   Only valid if above byte < 251, bit mapped
                                   0-2 : lower bits of note volume
                                   3-7 : instrument number
                        1 byte   bit mapped
                                   0-3 : Effect command in ProTracker format
                                         seems to be overlapped by volume
                                         bits...
                                   4-6 : upper bits of volume
                        1 byte   command data in ProTracker format
0410h+                  ? byte   Raw sample data padded to 16 byte
 4*64*4*"PAT"                    boundaries.

EXTENSION:STM
OCCURRENCES:PC
PROGRAMS:ScreamTracker 1.0
REFERENCE:
SEE ALSO:S3M,MOD

Mysterious's ULTRA TRACKER File Format
by FreeJack of The Elven Nation

(some additional info on the new format (V1.4/5) by MAS -> * marked)

I've done my best to document the file format of Ultra Tracker (UT). If
you find any errors please contact me. The file format has stayed
consistent through the first four public releases.
At the time of this writing, Ultra Tracker is up to version 1.3.
(* With version V1.4/5 there are some changes to the format. *)

Thanks go to:
  SoJa of YLYSY for help translating stuff.
  Marc André Schallehn: thanks for putting out this GREAT program. Also
  thanks for the info on 16bit samples.

With all this crap out of the way, let's get to the format.

Sample Structure :
______________________________________________________________________________
00h  Samplename : 32 bytes  (Sample name)
20h  DosName    : 12 bytes  (when you load a sample into UT, it records
                             the file name here)
2Ch  LoopStart  : dbl word  (loop start point)
30h  LoopEnd    : dbl word  (loop end point)
34h  SizeStart  : dbl word  (see below)
38h  SizeEnd    : dbl word  (see below)
3Ch  volume     : byte      (UT uses a logarithmic volume setting, 0-255)
                            (* V1.4: uses linear Volume ranging from
                             0-255 *)
3Dh  Bidi Loop  : byte      (see below)
3Eh  FineTune   : word      (Fine tune setting, uses full word value)
______________________________________________________________________________

8 Bit Samples :

SizeStart :
  The SizeStart is the starting offset of the sample. This seems to tell
  UT how to load the sample into the Gus's onboard memory. All the files
  I have worked with start with a value of 32 for the first sample, and
  the previous SizeEnd value for all samples after that. (See Example
  below.) If the previous sample was 16bit, then
  SizeStart = (Last SizeEnd * 2)

SizeEnd :
  Like the SizeStart, SizeEnd seems to tell UT where to load the sample
  into the Gus's onboard memory. SizeEnd equals SizeStart + the length of
  the sample.

Example :
  If a UT file had 3 samples, 1st 12000 bytes, 2nd 5600 bytes, 3rd 8000
  bytes, the SizeStart and SizeEnd would look like this:

  Sample   SizeStart   SizeEnd
  1st         32        12032
  2nd      12032        17632
  3rd      17632        25632

***Note***
  Samples may NOT cross 256k boundaries. If a sample is too large to fit
  into the remaining space, its SizeStart will equal the start of the
  next 256k boundary.
  UT does keep track of the free space at the top of the 256k boundaries,
  and will load a sample in there if it will fit.

  Example : EndSize = 252144
  If the next sample was 12000 bytes, its SizeStart would be 262144, not
  252144. Note that this leaves 10000 bytes unused. If any of the
  following samples could fit between 252144 and 262144, its SizeStart
  would be 252144. Say that 2 samples after the 12000 byte sample we had
  a sample that was only 5000 bytes long. Its SizeStart would be 252144
  and its SizeEnd would be 257144. This also applies to 16 Bit Samples.

16 Bit Samples :

  16 bit samples are handled a little differently than 8 bit samples.
  The SizeStart variable is calculated by dividing the offset (last
  SizeEnd) by 2. The SizeEnd variable equals
  SizeStart + (SampleLength / 2). If the first sample is 16bit, then
  SizeStart = 16.

  Example :
    sample1 = 8bit, 1000 bytes
    sample2 = 16bit, 5000 bytes

    sample1  SizeStart = 32
             SizeEnd   = 1032  (32 + 1000)
    sample2  SizeStart = 516   (offset (1032) / 2)
             SizeEnd   = 3016  (516 + (5000/2))

***Note***
  If a 16bit sample is loaded into banks 2, 3, or 4, the SizeStart
  variable will be
    (offset / 2) + 262144  (bank 2)
    (offset / 2) + 524288  (bank 3)
    (offset / 2) + 786432  (bank 4)
  The SizeEnd variable will be
    SizeStart + (SampleLength / 2) + 262144  (bank 2)
    SizeStart + (SampleLength / 2) + 524288  (bank 3)
    SizeStart + (SampleLength / 2) + 786432  (bank 4)

BiDi Loop : (Bidirectional Loop)

  UT takes advantage of the Gus's ability to loop a sample in several
  different ways. By setting the Bidi Loop, the sample can be played
  forward or backwards, looped or not looped. The Bidi variable also
  tracks the sample resolution (8 or 16 bit).

  The following table shows the possible values of the Bidi Loop.
  Bidi = 0  : No looping, forward playback, 8bit sample
  Bidi = 4  : No looping, forward playback, 16bit sample
  Bidi = 8  : Loop sample, forward playback, 8bit sample
  Bidi = 12 : Loop sample, forward playback, 16bit sample
  Bidi = 24 : Loop sample, reverse playback, 8bit sample
  Bidi = 28 : Loop sample, reverse playback, 16bit sample
______________________________________________________________________________

Event Structure:
______________________________________________________________________________
0  Note         : byte  (See note table below)
1  SampleNumber : byte  (Sample Number)
2  Effect1      : nib   (Effect1)
2  Effect2      : nib   (Effect2)
3  Effect2Data  : byte
4  Effect1Data  : byte

The High order byte of EffectVar is the Effect variable for Effect1.
The Low order byte of EffectVar is the Effect variable for Effect2.

***(Note)***
UT uses a form of compression on repetitive events. Say we read in the
first byte: if it = $FC then this signifies a repeat block. The next byte
is the repeat count, followed by the event structure to repeat. If the
first byte read does NOT = $FC then this is the note of the event. So
repeat blocks will be 7 bytes long :

   RepFlag      : byte ($FC)
   RepCount     : byte
0  note         : byte
1  samplenumber : byte
2  effect1      : nib
   effect2      : nib
3  effectVar    : word

Repeat blocks do NOT bridge patterns.
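
The repeat-block rule above can be sketched in Python roughly as follows.
This is only an illustration, assuming the packed event data of one
pattern has already been read into a bytes object; the names
expand_events, REPEAT_FLAG and EVENT_SIZE are made up for this example
and do not come from Ultra Tracker. The two effect nibbles are left
packed in one byte, because the text does not say which nibble holds
Effect1.

```python
REPEAT_FLAG = 0xFC   # first byte of a repeat block
EVENT_SIZE = 5       # note, sample number, effect nibbles, Effect2Data, Effect1Data


def expand_events(data: bytes) -> list:
    """Expand packed event data into a flat list of 5-field event tuples."""
    events = []
    pos = 0
    while pos < len(data):
        count = 1
        if data[pos] == REPEAT_FLAG:
            # Repeat block: flag byte, repeat count, then one event.
            if pos + 2 > len(data):
                raise ValueError("truncated repeat block")
            count = data[pos + 1]
            pos += 2
        chunk = data[pos:pos + EVENT_SIZE]
        if len(chunk) != EVENT_SIZE:
            raise ValueError("truncated event")
        # tuple(bytes) yields the five fields as integers.
        events.extend([tuple(chunk)] * count)
        pos += EVENT_SIZE
    return events
```

Since repeat blocks do not bridge patterns, a reader would presumably
call such a function once per pattern (or per channel of a pattern)
rather than on the whole pattern area at once.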
______________________________________________________________________________

Note Table:
______________________________________________________________________________
note value of 0 = pause
C-0 to B-0    1 to 12
C-1 to B-1   13 to 24
C-2 to B-2   25 to 36
C-3 to B-3   37 to 48
C-4 to B-4   49 to 60
______________________________________________________________________________

Offset            Bytes     Type          Description
______________________________________________________________________________
0                 15        byte          ID block : should contain
                                          'MAS_UTrack_V001'
                                          (* V1.4: 'MAS_UTrack_V002')
                                          (* V1.5: 'MAS_UTrack_V003')
15                32        AsciiZ        Song Title
47                1         reserved      This byte is reserved and always
                                          contains 0;
                                          (* V1.4: jump-value: reserved * 32;
                                             space between is used for song
                                             text; [reserved * 32] = RES ! )
48+RES            1         byte          Number of Samples (NOS)
49+RES            NOS * 64  SampleStruct  Sample Struct (see Sample Structure)

Patt_Seq = 49 + (NOS * 64) + RES

Patt_Seq          256       byte          Pattern Sequence
Patt_Seq+256      1         byte          Number Of Channels (NOC) Base 0
Patt_Seq+257      1         byte          Number Of Patterns (NOP) Base 0
                                          (* V1.5: PAN-Position Table
                                             Length: NOC * 1byte
                                             [0 left] - [0F right] )
NOC+Patt_Seq+258  varies    EventStruct   Pattern Data (See Event Structure)
______________________________________________________________________________
The remainder of the file is the raw sample data. (signed)
______________________________________________________________________________

That should about cover it. If you have any questions, feel free to
e-mail me at freejack@shell.portal.com

I can also be contacted on The UltraSound Connection (813) 787-8644.
The UltraSound Connection is a BBS dedicated to the Gravis Ultrasound
Card.

Also I'm the author of Ripper and Gvoc. If anyone has any questions or
problems, please contact me.

Creative Voice (VOC) file format
--------------------------------
~From: galt@dsd.es.com

(byte numbers are hex!)
HEADER (bytes 00-19)
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

- ---------------------------------------------------------------

HEADER:
=======
byte #     Description
------     ------------------------------------------
00-12      "Creative Voice File"
13         1A (eof to abort printing of file)
14-15      Offset of first datablock in .voc file (std 1A 00
           in Intel Notation)
16-17      Version number (minor,major) (VOC-HDR puts 0A 01)
18-19      1's Comp of Ver. # + 1234h (VOC-HDR puts 29 11)

- ---------------------------------------------------------------

DATA BLOCK:
===========

Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
NOTE: Terminator Block is an exception -- it has only the TYPE byte.

TYPE  Description   Size (3-byte int)   Info
----  -----------   -----------------   -----------------------
00    Terminator    (NONE)              (NONE)
01    Sound data    2+length of data    *
02    Sound continue length of data     Voice Data
03    Silence       3                   **
04    Marker        2                   Marker# (2 bytes)
05    ASCII         length of string    null terminated string
06    Repeat        2                   Count# (2 bytes)
07    End repeat    0                   (NONE)
08    Extended      4                   ***

*Sound Info Format:
---------------------
00    Sample Rate
01    Compression Type
02+   Voice Data

**Silence Info Format:
----------------------------
00-01 Length of silence - 1
02    Sample Rate

***Extended Info Format:
---------------------
00-01 Time Constant: Mono:   65536 - (256000000/sample_rate)
                     Stereo: 65536 - (256000000/(2*sample_rate))
02    Pack
03    Mode: 0 = mono
            1 = stereo

Marker#           -- Driver keeps the most recent marker in a status byte
Count#            -- Number of repetitions + 1
                     Count# may be 1 to FFFE for 0 - FFFD repetitions
                     or FFFF for endless repetitions
Sample Rate       -- SR byte = 256-(1000000/sample_rate)
Length of silence -- in units of sampling cycle
Compression Type  -- of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels) [interesting--
                                 this isn't in the developer's manual]

AVR (Audio Visual Research) sound format
----------------------------------------
version 1.0 - Atari ST/STE
format - developed by 2-BIT systems (Microdeal) - source : ST mag #42, pages 26, by Sebastien Mougey - 0xnnnn are hexadecimal values

offset  type   length  name           comments
--------------------------------------------------------------------------------
0       char   4       ID             format ID == "2BIT"
4       char   8       name           sample name (unused space filled with 0)
12      short  1       mono/stereo    0=mono, -1 (0xffff)=stereo
                                      With stereo, samples are alternated,
                                      the first voice is the left :
                                      (LRLRLRLRLRLRLRLRLR...)
14      short  1       resolution     8, 12 or 16 (bits)
16      short  1       signed or not  0=unsigned, -1 (0xffff)=signed
18      short  1       loop or not    0=no loop, -1 (0xffff)=loop on
20      short  1       MIDI note      0xffnn, where 0<=nn<=127
                                      0xffff means "no MIDI note defined"
22      byte   1       Replay speed   Frequency in the Replay software
                                      0=5.485 kHz, 1=8.084 kHz, 2=10.971 kHz,
                                      3=16.168 kHz, 4=21.942 kHz, 5=32.336 kHz,
                                      6=43.885 kHz, 7=47.261 kHz
                                      -1 (0xff)=no defined frequency
23      byte   3       sample rate    in Hertz
26      long   1       size           in bytes (2*bytes in stereo)
30      long   1       loop begin     0 for no loop
34      long   1       loop size      equal to 'size' for no loop
38      byte   26      reserved       filled with 0
64      byte   64      user data
128     bytes  ?       sample data    (12 bits samples are coded on 16 bits :
                                      0000 xxxx xxxx xxxx)
-------------------------------------------------------------------------------

Example:
--------
0    "2BIT"
4    "lovebeat"
12   0x0000       mono
14   0x0010       16 bits
16   0xffff       signed
18   0xffff       loop on
20   0xffff       no MIDI note
22   0xf0         Replay freq
23   0x007441     freq = 29.761 kHz
26   0x00012624   size = 75300 samples
30   0x000001d1   loop begin = 465
34   0x000119f0   loop end = 72176
38   0000 00000000 "AVR by P. Segerdahl "
64   Converted with "Zero-X" written by Peter Segerdahl, 1994 Sweden
128  0x0000 0x0001 0xfff6 0xfff7 ...
0x24CC0 0xFFB3 0xFFE7 0x0087 0x0065

file size = 128 bytes header + 75300*16 bits = 0x24cc8 bytes

----------------------------------------------------------------

from gravis PATCHKIT:

/*
   Original:          February 1993
   Last beta:         15 April 1993
   Last modification: 30 June 1993
   By:                Francois Dion (dionf@ere.umontreal.ca)
   Bugs:              ?
   Missing:           ?
*/

A .pat file is divided into several sections. In order you would find:

+----------------+
|                |  In the header you find the version and
|     Header     |  all the other information that can be
|                |  found in the header structure. Only one
+----------------+  header is to be found in a .pat file.
+----------------+
|                |  This info can be found in the inst
|  Instrument 0  |  structure. There can be 65536 instruments
|                |  in a .pat, but that many can't be reached
+----------------+  with the memory on the GUS.
+----------------+
|                |  Layer structure. There can be up to
|    Layer 0     |  four layers numbered 0-3. It doesn't
|                |  seem to work properly with the current
+----------------+  driver to have more than 1 layer.
+----------------+
|                |  Wave structure. There can be 1-16
|  Waveheader 1  |  waveheaders.
|                |
+----------------+
+----------------+
|                |  This is the wavesample data. It
|  Wavesample 1  |  can be 8 bit, 16 bit, signed or
|                |  unsigned, depending on the mode
+----------------+  flag in the waveheader 1. *
        .
        .
        .
+----------------+
|                |  You can use 1 wave but if you use
|  Waveheader n  |  more (up to n=16) you would
|                |  simply put the header and wave
+----------------+  one after the other.
        .
        .
        .
+----------------+
|                |  Place where another layer would
|    Layer x     |  be if needed. In layer 1, 2 and 3
|                |  you can either put waveheaders and
+----------------+  wavesamples like in layer 0, or
                    turn on a flag so that the layer
                    uses the wavesample of the previous
                    layer. You would then only need to
                    put the waveheaders.
        .
        .
        .
+----------------+
|                |  This would be where you would put another
|  Instrument t  |  instrument if needed. It is a good idea to
|                |  not put more than 1 instrument in each .pat
+----------------+

Each block is of fixed length, which could be modified in a new version
of the .pat format, so it is better to code in relation to the structs.
For example, if you want the size of the layer block, you would do a
sizeof( struct layer) in C. There is one exception to the fixed length:
the wavesample block. It can be anything from non-existent to several
bytes long to about 1 Mb (for now, the Windows driver doesn't correctly
handle a wavesample over 64 Kb, and it looks like 256 Kb is the physical
limit). Its size is contained in the waveheader block (struct wave).

For more information on the way data is organised and what type of
parameters are available, refer to patch.h.

* The wavesample MUST be in Intel format when 16 bits. An Intel word is
  stored YY XX (ex: 25371 is stored in the .pat as 1B 63) while a Motorola
  word is stored XX YY (ex: 25371 is stored on a Macintosh as 63 1B). This
  is important to note when you take aiff samples or raw samples from a
  Macintosh or most sampler synths. You can use the -x option of sox to
  invert the two bytes or you can use wsd2snd (M. Chen) which is faster.

/*
   Original:          11 October 1992
   Last modification: 15 April 1993
   By:                Francois Dion (dionf@ere.umontreal.ca)
   Bugs:              none
   Note:
   All the information contained here was found independently by members
   of the gusdev list. The only information from Gravis patch.h is for
   channels, scale_freq and scale_factor.

   It is preferable to use patch.h from Gravis to avoid confusion.
   This patch.h is mostly distributed because it is largely commented, a
   feature that is clearly lacking in the sdk.
*/

#define ENVELOPES 6

/* Note: these typedefs give 16-bit words only where int is 16 bits wide
   (presumably the DOS compilers of the time). */
typedef unsigned char byte;
typedef unsigned int word;
typedef unsigned long int lwrd;

struct header
{
  char version[12];     /* Null terminated string. "GF1PATCH100" for
                           v.1.00 and "GF1PATCH110" for v.1.10 */
  char id[10];          /* Null terminated string. always ID#000002 */
  char description[60]; /* Null terminated string. Gravis uses for
                           copyright notice, but can be anything */
  byte instrum;         /* number of instruments in the patch */
  char voices;          /* number of voices (typically 14) */
  char channels;        /* number of wav channels that can be played
                           concurrently to the patch */
  word waves;           /* Total number of waveforms for all the .pat */
  word volume;          /* Master volume */
  lwrd size;            /* Memory that the patch takes in the DRAM? */
  char reserved[36];
};

struct inst
{
  word id;              /* Instrument id: 0-65535 */
  char name[16];        /* Name of instrument. Gravis doesn't seem to use
                           it, but I think it's a good thing to do. You
                           can have better names than GM */
  lwrd size;            /* Number of bytes for the instrument with header.
                           To skip to next instrument or get eof if no
                           other instrument is present */
  char layers;          /* Number of layers in instrument: 1-4 */
  char reserved[40];
};

struct layer
{
  char previous;        /* If !=0 the wavesample to use is from the
                           previous layer. The waveheader is still needed
                           (what would be the point anyway) */
  char id;              /* Layer id: 0-3 */
  lwrd size;            /* data size in bytes in the layer, without the
                           header. to skip to next layer for example:
                           lseek( handle, temp_layer.size, SEEK_CUR); */
  char samples;         /* number of wavesamples */
  char reserved[40];
};

struct wave
{
  char name[7];         /* null terminated string. name of the wave. I use
                           high, low etc... or root note as names */
  byte fractions;       /* Start loop point fraction in 4 bits + End loop
                           point fraction in the 4 other bits. It is used
                           when the loop point should be between two
                           samples */
  lwrd size;            /* total size of wavesample. limited to 65535 now
                           by the drivers, not the card. */
  lwrd start_loop;      /* start loop position in the wavesample */
  lwrd end_loop;        /* end loop position in the wavesample */
  word sample_rate;     /* Rate at which the wavesample has been sampled */
  lwrd low_freq;        /* check note.h for the correspondence. */
  lwrd high_freq;       /* same thing */
  lwrd root_freq;       /* same thing */
  int tune;             /* fine tune. -512 to +512, EXCLUDING 0 because it
                           is a multiplier. 512 is one octave off, and 1
                           is a neutral value */
  byte balance;         /* Balance: 0-15. 0 = full left, 15 = full right,
                           7 = approximately center (a little left
                           offset), 8 = approximately center (a little
                           right offset) */
  byte env_rate[ ENVELOPES ];   /* attack rates */
  byte env_offset[ ENVELOPES ]; /* attack volumes
        (The original comment holds an ASCII sketch of a deliberately
        bogus envelope, just to show the concept: six consecutive
        segments labelled "attack rate 0" to "attack rate 5", and the
        amplitudes numbered 0 1 2 3 4 5.) */
  byte tremolo_sweep;   /* tremolo sweep */
  byte tremolo_rate;    /* tremolo rate */
  byte tremolo_depth;   /* tremolo depth */
  byte vibrato_sweep;   /* vibrato sweep */
  byte vibrato_rate;    /* vibrato rate (lfo) */
  byte vibrato_depth;   /* vibrato depth */
                        /* no selectable waveform for the lfo: sine */
  char modes;           /* bit 0: 8/16 bit */
                        /* bit 1: Signed/Unsigned */
                        /* bit 2: off/on looping */
                        /* bit 3: off/on bidirectional looping */
                        /* bit 4: off/on backward looping */
                        /* bit 5: off/on sustaining (3rd point in env.) */
                        /* bit 6: off/on envelopes */
                        /* bit 7: off/on clamped release (6th point, env) */
  int scale_freq;       /* scale frequency ? */
  word scale_factor;    /* scale factor ? */
  char reserved[36];
};
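
As a rough illustration of "coding in relation to the structs", here is
a Python sketch that unpacks the leading header block of a .pat file.
It rests on assumptions that the text above does not state: that the
struct is stored packed (no padding), little-endian, with word = 16 bits
and lwrd = 32 bits. The names HEADER, _text and parse_header are made up
for this example; the two char fields are read as unsigned bytes.

```python
import struct

# Assumed on-disk layout of `struct header`: packed, little-endian,
# word = 16 bits, lwrd = 32 bits.
HEADER = struct.Struct("<12s10s60sBBBHHI36s")


def _text(field: bytes) -> str:
    """Decode a null-terminated char array."""
    return field.split(b"\0", 1)[0].decode("ascii", errors="replace")


def parse_header(raw: bytes) -> dict:
    """Unpack the header block found at the start of the raw file bytes."""
    (version, ident, description, instruments, voices,
     channels, waves, volume, size, _reserved) = HEADER.unpack_from(raw)
    return {
        "version": _text(version),
        "id": _text(ident),
        "description": _text(description),
        "instruments": instruments,
        "voices": voices,
        "channels": channels,
        "waves": waves,
        "volume": volume,
        "size": size,
    }
```

Under these assumptions HEADER.size plays the role of
sizeof( struct header ), so the first instrument block would start at
that offset. The same approach should carry over to the inst, layer and
wave structs, but check the result against real files before relying on
it.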