Documentation/sound/designs/compress-offload.rst

=========================
ALSA Compress-Offload API
=========================

Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs, and DSPs are also integrated in
audio codecs. Processing compressed data on such DSPs results in a
dramatic reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because of the lack of a generic API available in the mainline
kernel.

Rather than breaking compatibility by changing the ALSA PCM interface,
a new 'Compressed Data' API is introduced to provide a control and
data-streaming interface for audio DSPs.

The design of this API was inspired by two years of experience with
the Intel Moorestown SoC; many corrections were required to upstream
the API into the mainline kernel instead of the staging tree and to
make it usable by others.


Requirements
============
The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that new formats will be added as audio compression
  technology advances.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support multichannel AAC but only
  stereo HE-AAC. Likewise, WMA10 level M3 may require too much memory
  and too many CPU cycles. The new API needs to provide a generic way
  of listing these supported configurations.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.
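
The first requirement can be illustrated with a small sketch. The
structure and helpers below are hypothetical and are not part of the
Compressed Data API; they assume each compressed frame is described by
its payload size in bytes and the number of samples it decodes to.
Summing bytes says nothing reliable about duration, whereas summing
samples gives an exact result:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical frame descriptor: payload size and decoded samples. */
    struct frame_info {
        uint32_t bytes;    /* compressed payload size, varies per frame */
        uint32_t samples;  /* samples per channel produced by the frame */
    };

    /* Total compressed bytes for a run of frames. */
    uint64_t total_bytes(const struct frame_info *f, size_t n)
    {
        uint64_t sum = 0;

        for (size_t i = 0; i < n; i++)
            sum += f[i].bytes;
        return sum;
    }

    /* Duration in milliseconds (rounded down), derived from sample
     * counts only, so it does not depend on the frame sizes. */
    uint64_t duration_ms(const struct frame_info *f, size_t n, uint32_t rate)
    {
        uint64_t samples = 0;

        for (size_t i = 0; i < n; i++)
            samples += f[i].samples;
        return samples * 1000 / rate;
    }

For instance, two 417-byte frames and a single 834-byte frame, each
decoding to 1152 samples, both add up to 834 bytes, yet at 44100 Hz
they last 52 ms and 26 ms respectively.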


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.
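
The byte-only ring buffer and the absence of rewinds can be sketched
as follows. The names are hypothetical and do not describe the kernel
implementation; positions are modelled as monotonically increasing
byte counters, so no operation moves the write position backwards and
queued data can only be discarded as a whole:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical ring buffer made of 'fragments' fragments of
     * 'fragment_size' bytes each. */
    struct byte_ring {
        uint32_t fragment_size;  /* bytes per fragment */
        uint32_t fragments;      /* number of fragments */
        uint64_t app_pos;        /* total bytes committed by user space */
        uint64_t hw_pos;         /* total bytes consumed by the DSP */
    };

    static uint64_t ring_size(const struct byte_ring *r)
    {
        return (uint64_t)r->fragment_size * r->fragments;
    }

    /* Free space, in bytes, that can be written without overwriting
     * data the DSP has not consumed yet. */
    uint64_t ring_avail(const struct byte_ring *r)
    {
        return ring_size(r) - (r->app_pos - r->hw_pos);
    }

    /* Commit 'bytes' from user space; writes that do not fit are refused. */
    bool ring_write(struct byte_ring *r, uint64_t bytes)
    {
        if (bytes > ring_avail(r))
            return false;
        r->app_pos += bytes;
        return true;
    }

    /* The DSP consumes up to 'bytes' of committed data. */
    void ring_consume(struct byte_ring *r, uint64_t bytes)
    {
        uint64_t queued = r->app_pos - r->hw_pos;

        r->hw_pos += bytes < queued ? bytes : queued;
    }

    /* No rewind: the only way to invalidate committed data is to drop
     * everything that is still queued. */
    void ring_drop_all(struct byte_ring *r)
    {
        r->hw_pos = r->app_pos;
    }

With four 4096-byte fragments, writing 10000 bytes leaves room for
6384 more, so a further 8000-byte write is refused until the DSP has
consumed some data or all queued data is dropped.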

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying
  the codecs on a capture stream returns encoders; querying a playback
  stream returns decoders.

get_codec_caps
  For each codec, this routine returns a list of capabilities. The
  intent is to make sure all the capabilities correspond to valid
  settings, and to minimize the risks of configuration failures. For
  example, for a complex codec such as AAC, the number of channels
  supported may depend on a specific profile. If the capabilities were
  exposed with a single descriptor, it may happen that a specific
  combination of profiles/channels/formats is not supported. Likewise,
  embedded DSPs have limited memory and CPU cycles, so it is likely
  that some implementations make the list of capabilities dynamic and
  dependent on existing workloads. In addition to codec settings, this
  routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Deviations
  from the requested settings should remain the exception.

get_timestamp
  The timestamp becomes a multiple-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed. All these values can be used to determine
  the average bitrate, figure out if the ring buffer needs to be
  refilled, or estimate the delay due to decoding/encoding/I/O on the
  DSP.
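
As a sketch of how the get_timestamp counters might be combined, the
helpers below compute an approximate average bitrate and the delay
between decoding and rendering. The structure is hypothetical: it only
mirrors the three counters described above plus the sampling rate, and
is not the kernel's timestamp structure:

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical timestamp holding the counters listed above. */
    struct tstamp_sketch {
        uint64_t bytes_transferred;  /* compressed bytes sent to the DSP */
        uint64_t samples_processed;  /* samples decoded/encoded by the DSP */
        uint64_t samples_rendered;   /* samples played out or captured */
        uint32_t rate;               /* sampling rate in Hz */
    };

    /* Approximate average bitrate in bits per second. Bytes that were
     * transferred but not decoded yet make this an estimate. */
    uint64_t avg_bitrate_bps(const struct tstamp_sketch *t)
    {
        if (t->samples_processed == 0)
            return 0;
        return t->bytes_transferred * 8 * t->rate / t->samples_processed;
    }

    /* Delay introduced by processing on the DSP, in milliseconds. */
    uint64_t dsp_delay_ms(const struct tstamp_sketch *t)
    {
        return (t->samples_processed - t->samples_rendered) * 1000 / t->rate;
    }

With 160000 bytes transferred, 441000 samples processed and 436590
samples rendered at 44100 Hz, these helpers report 128000 bit/s and a
100 ms delay.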

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)
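
The bitmask descriptors and the rationale behind get_codec_caps can be
illustrated with the AAC/HE-AAC case from the requirements
(multichannel AAC, but stereo-only HE-AAC). The profile bits,
structure and helper below are hypothetical; each descriptor describes
one valid combination, so a configuration is accepted only if a single
descriptor covers it:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical profile bits for one codec. */
    #define PROFILE_LC    (1u << 0)
    #define PROFILE_HE    (1u << 1)
    #define PROFILE_HEV2  (1u << 2)

    /* Hypothetical descriptor: profiles sharing the same channel limit. */
    struct codec_desc {
        uint32_t profiles;      /* bitmask of profiles */
        uint32_t max_channels;  /* maximum channels for these profiles */
    };

    /* True if one descriptor allows 'profile' with 'channels'. */
    bool config_supported(const struct codec_desc *d, size_t n,
                          uint32_t profile, uint32_t channels)
    {
        for (size_t i = 0; i < n; i++)
            if ((d[i].profiles & profile) && channels <= d[i].max_channels)
                return true;
        return false;
    }

With the two descriptors ``{ PROFILE_LC, 6 }`` and
``{ PROFILE_HE | PROFILE_HEV2, 2 }``, six-channel LC is accepted while
six-channel HE is rejected, which a single merged descriptor could not
express.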

State Machine
=============

The compressed audio stream state machine is described below ::

                                        +----------+
                                        |          |
                                        |   OPEN   |
                                        |          |
                                        +----------+
                                             |
                                             |
                                             | compr_set_params()
                                             |
                                             v
         compr_free()                  +----------+
  +------------------------------------|          |
  |                                    |   SETUP  |
  |          +-------------------------|          |<-------------------------+
  |          |       compr_write()     +----------+                          |
  |          |                              ^                                |
  |          |                              | compr_drain_notify()           |
  |          |                              |        or                      |
  |          |                              |     compr_stop()               |
  |          |                              |                                |
  |          |                         +----------+                          |
  |          |                         |          |                          |
  |          |                         |   DRAIN  |                          |
  |          |                         |          |                          |
  |          |                         +----------+                          |
  |          |                              ^                                |
  |          |                              |                                |
  |          |                              | compr_drain()                  |
  |          |                              |                                |
  |          v                              |                                |
  |    +----------+                    +----------+                          |
  |    |          |    compr_start()   |          |        compr_stop()      |
  |    | PREPARE  |------------------->|  RUNNING |--------------------------+
  |    |          |                    |          |                          |
  |    +----------+                    +----------+                          |
  |          |                            |    ^                             |
  |          |compr_free()                |    |                             |
  |          |              compr_pause() |    | compr_resume()              |
  |          |                            |    |                             |
  |          v                            v    |                             |
  |    +----------+                   +----------+                           |
  |    |          |                   |          |         compr_stop()      |
  +--->|   FREE   |                   |  PAUSE   |---------------------------+
       |          |                   |          |
       +----------+                   +----------+

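
The diagram above can be expressed as a C transition function. This is
a minimal sketch with hypothetical enum and function names, not the
ALSA core implementation; only the arrows drawn above are encoded, so
transitions the diagram does not draw (for example further
compr_write() calls once data has been queued) are not modelled and
are reported as invalid by this sketch:

.. code-block:: c

    #include <stdbool.h>

    /* States of the diagram above. */
    enum compr_state_sketch {
        ST_OPEN, ST_SETUP, ST_PREPARE, ST_RUNNING, ST_PAUSE, ST_DRAIN, ST_FREE
    };

    /* Operations labelling the arrows (hypothetical names). */
    enum compr_op_sketch {
        OP_SET_PARAMS, OP_WRITE, OP_START, OP_PAUSE, OP_RESUME,
        OP_STOP, OP_DRAIN, OP_DRAIN_NOTIFY, OP_FREE
    };

    /* Apply 'op' to '*s'. Returns false and leaves '*s' unchanged when
     * the diagram has no arrow for 'op' leaving the current state. */
    bool compr_transition(enum compr_state_sketch *s, enum compr_op_sketch op)
    {
        enum compr_state_sketch next;

        switch (*s) {
        case ST_OPEN:
            if (op != OP_SET_PARAMS)
                return false;
            next = ST_SETUP;
            break;
        case ST_SETUP:
            if (op == OP_WRITE)
                next = ST_PREPARE;
            else if (op == OP_FREE)
                next = ST_FREE;
            else
                return false;
            break;
        case ST_PREPARE:
            if (op == OP_START)
                next = ST_RUNNING;
            else if (op == OP_FREE)
                next = ST_FREE;
            else
                return false;
            break;
        case ST_RUNNING:
            if (op == OP_PAUSE)
                next = ST_PAUSE;
            else if (op == OP_DRAIN)
                next = ST_DRAIN;
            else if (op == OP_STOP)
                next = ST_SETUP;
            else
                return false;
            break;
        case ST_PAUSE:
            if (op == OP_RESUME)
                next = ST_RUNNING;
            else if (op == OP_STOP)
                next = ST_SETUP;
            else
                return false;
            break;
        case ST_DRAIN:
            if (op == OP_DRAIN_NOTIFY || op == OP_STOP)
                next = ST_SETUP;
            else
                return false;
            break;
        default:  /* ST_FREE has no outgoing arrow */
            return false;
        }
        *s = next;
        return true;
    }

Walking OPEN, SETUP, PREPARE, RUNNING, PAUSE, RUNNING, DRAIN, SETUP and
finally FREE succeeds step by step, while calling start directly from
OPEN is refused.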

Gapless Playback
================
When playing through an album, the decoders have the ability to skip
the encoder delay and padding and directly move from one track content
to another. The end user can perceive this as gapless playback, since
there is no silence while switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect
gapless is difficult to reach with all types of compressed data, but
works fine with most music content. The decoder needs to know the
encoder delay and encoder padding, so this information must be passed
to the DSP. This metadata is extracted from ID3/MP4 headers and is not
present by default in the bitstream, hence the need for a new
interface to pass this information to the DSP. Also, the DSP and
userspace need to switch from one track to another and start using the
data of the second track.

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be
  used by the decoder to strip the silence. This needs to be set before
  the data in the track is written.

set_next_track
  This routine tells the DSP that metadata and write operations sent
  after this call correspond to the subsequent track.

partial drain
  This is called when the end of file is reached. The userspace can
  inform the DSP that EOF is reached, and the DSP can then start
  skipping the encoder padding. Also, the next data written will belong
  to the next track.
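
The effect of the set_metadata values can be shown with a small
calculation. This sketch assumes the delay and padding are expressed
in samples; the structure and helper are hypothetical and only
illustrate how many decoded samples of a track remain audible once the
encoder delay is skipped at the start and the padding at the end:

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical per-track metadata, as conveyed by set_metadata. */
    struct gapless_meta {
        uint32_t encoder_delay;    /* samples to skip at track start */
        uint32_t encoder_padding;  /* samples to skip at track end */
    };

    /* Decoded samples that should actually be rendered for a track. */
    uint64_t samples_to_render(uint64_t decoded, const struct gapless_meta *m)
    {
        uint64_t trim = (uint64_t)m->encoder_delay + m->encoder_padding;

        return decoded > trim ? decoded - trim : 0;
    }

For a track decoding to 115200 samples with a delay of 576 samples and
1000 samples of padding, 113624 samples are rendered.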

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to the second track

(Note: the order of partial_drain and the write for the next track can
also be reversed.)
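
Two ordering rules follow from the gapless sequence: a track's
metadata is set before any of its data is written, and set_next_track
comes before the next track's metadata and data. The step names and
checker below are hypothetical and only validate a recorded sequence
of calls against these two rules:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical identifiers for the steps of the gapless sequence. */
    enum gapless_step {
        STEP_SET_METADATA,   /* set_metadata for the current track */
        STEP_WRITE,          /* write data of the current track */
        STEP_START,          /* trigger start */
        STEP_NEXT_TRACK,     /* set_next_track */
        STEP_PARTIAL_DRAIN   /* partial_drain */
    };

    /* Return true if 'steps' respects both ordering rules. */
    bool gapless_order_ok(const enum gapless_step *steps, size_t n)
    {
        bool have_meta = false;  /* metadata set for the current track */
        bool have_data = false;  /* data written for the current track */

        for (size_t i = 0; i < n; i++) {
            switch (steps[i]) {
            case STEP_SET_METADATA:
                if (have_data)  /* too late: data already written */
                    return false;
                have_meta = true;
                break;
            case STEP_WRITE:
                if (!have_meta)
                    return false;
                have_data = true;
                break;
            case STEP_NEXT_TRACK:  /* following steps belong to a new track */
                have_meta = false;
                have_data = false;
                break;
            case STEP_START:
            case STEP_PARTIAL_DRAIN:
                break;  /* no ordering rule checked for these steps */
            }
        }
        return true;
    }

Both the listed sequence and the variant in which partial_drain
follows the first write of the next track pass this check, while
setting new metadata after data was written, without a preceding
set_next_track, is rejected.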

Gapless Playback SM
===================

For gapless playback, we move from the running state to partial drain
and back, along with setting of metadata and signalling for the next
track ::


                                        +----------+
                compr_drain_notify()    |          |
              +------------------------>|  RUNNING |
              |                         |          |
              |                         +----------+
              |                              |
              |                              |
              |                              | compr_next_track()
              |                              |
              |                              V
              |                         +----------+
              |    compr_set_params()   |          |
              |             +-----------|NEXT_TRACK|
              |             |           |          |
              |             |           +--+-------+
              |             |              | |
              |             +--------------+ |
              |                              |
              |                              | compr_partial_drain()
              |                              |
              |                              V
              |                         +----------+
              |                         |          |
              +------------------------ | PARTIAL_ |
                                        |  DRAIN   |
                                        +----------+

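
The gapless loop drawn above can be sketched as a transition function.
The names are hypothetical; only the arrows of this diagram are
encoded, and the NEXT_TRACK self-loop is named after its
compr_set_params() label:

.. code-block:: c

    #include <stdbool.h>

    enum gapless_state { G_RUNNING, G_NEXT_TRACK, G_PARTIAL_DRAIN };

    enum gapless_op {
        G_OP_NEXT_TRACK,     /* compr_next_track() */
        G_OP_SET_PARAMS,     /* self-loop label in NEXT_TRACK */
        G_OP_PARTIAL_DRAIN,  /* compr_partial_drain() */
        G_OP_DRAIN_NOTIFY    /* compr_drain_notify() */
    };

    /* Apply 'op'; return false and keep '*s' if no such arrow exists. */
    bool gapless_transition(enum gapless_state *s, enum gapless_op op)
    {
        switch (*s) {
        case G_RUNNING:
            if (op != G_OP_NEXT_TRACK)
                return false;
            *s = G_NEXT_TRACK;
            return true;
        case G_NEXT_TRACK:
            if (op == G_OP_SET_PARAMS)
                return true;  /* self-loop: stays in NEXT_TRACK */
            if (op != G_OP_PARTIAL_DRAIN)
                return false;
            *s = G_PARTIAL_DRAIN;
            return true;
        case G_PARTIAL_DRAIN:
            if (op != G_OP_DRAIN_NOTIFY)
                return false;
            *s = G_RUNNING;
            return true;
        }
        return false;
    }

One round of next track, set_params (as labelled), partial drain and
drain notification brings the stream back to RUNNING, ready for the
following track.
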
Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, regardless of whether the input is PCM or compressed.

- Multichannel IEC encoding. Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  this may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.