| 1 | The MSI Driver Guide HOWTO |
| 2 | Tom L Nguyen tom.l.nguyen@intel.com |
| 3 | 10/03/2003 |
| 4 | Revised Feb 12, 2004 by Martine Silbermann |
| 5 | email: Martine.Silbermann@hp.com |
| 6 | Revised Jun 25, 2004 by Tom L Nguyen |
| 7 | Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com> |
| 8 | Copyright 2003, 2008 Intel Corporation |
| 9 | |
| 10 | 1. About this guide |
| 11 | |
| 12 | This guide describes the basics of Message Signaled Interrupts (MSIs), |
| 13 | the advantages of using MSI over traditional interrupt mechanisms, how |
| 14 | to change your driver to use MSI or MSI-X and some basic diagnostics to |
| 15 | try if a device doesn't support MSIs. |
| 16 | |
| 17 | |
| 18 | 2. What are MSIs? |
| 19 | |
| 20 | A Message Signaled Interrupt is a write from the device to a special |
| 21 | address which causes an interrupt to be received by the CPU. |
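As a rough illustration only, the following userspace sketch models that idea: the platform programs an address/data pair into the device, and the device signals an interrupt by writing the data to the address. Every name prefixed with toy_ and the doorbell address are invented for this example; it is not kernel code.

```c
#include <stdint.h>

/* Hypothetical model of the address/data pair programmed into the device. */
struct toy_msi_msg {
	uint64_t address;	/* where the device writes */
	uint32_t data;		/* what the device writes */
};

#define TOY_MSI_DOORBELL 0xfee00000ULL	/* assumed doorbell address */

static int toy_last_vector = -1;	/* vector most recently delivered */

/* Platform side: a write that hits the doorbell becomes an interrupt. */
static void toy_bus_write(uint64_t address, uint32_t data)
{
	if (address == TOY_MSI_DOORBELL)
		toy_last_vector = (int)(data & 0xff);
	/* any other address would be an ordinary memory write */
}

/* Device side: signalling an interrupt is just a bus write. */
static void toy_device_raise(const struct toy_msi_msg *msg)
{
	toy_bus_write(msg->address, msg->data);
}
```

Calling toy_device_raise() with a message whose address is the doorbell updates toy_last_vector; a write to any other address leaves it untouched.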
| 22 | |
| 23 | The MSI capability was first specified in PCI 2.2 and was later enhanced |
| 24 | in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X |
| 25 | capability was also introduced with PCI 3.0. It supports more interrupts |
| 26 | per device than MSI and allows interrupts to be independently configured. |
| 27 | |
| 28 | Devices may support both MSI and MSI-X, but only one can be enabled at |
| 29 | a time. |
| 30 | |
| 31 | |
| 32 | 3. Why use MSIs? |
| 33 | |
| 34 | There are three reasons why using MSIs can give an advantage over |
| 35 | traditional pin-based interrupts. |
| 36 | |
| 37 | Pin-based PCI interrupts are often shared amongst several devices. |
| 38 | To support this, the kernel must call each interrupt handler associated |
| 39 | with an interrupt, which leads to reduced performance for the system as |
| 40 | a whole. MSIs are never shared, so this problem cannot arise. |
| 41 | |
| 42 | When a device writes data to memory, then raises a pin-based interrupt, |
| 43 | it is possible that the interrupt may arrive before all the data has |
| 44 | arrived in memory (this becomes more likely with devices behind PCI-PCI |
| 45 | bridges). In order to ensure that all the data has arrived in memory, |
| 46 | the interrupt handler must read a register on the device which raised |
| 47 | the interrupt. PCI transaction ordering rules require that all the data |
| 48 | arrive in memory before the value may be returned from the register. |
| 49 | Using MSIs avoids this problem as the interrupt-generating write cannot |
| 50 | pass the data writes, so by the time the interrupt is raised, the driver |
| 51 | knows that all the data has arrived in memory. |
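The ordering argument can be pictured with a small userspace model, given here as a sketch under simplifying assumptions: the path from the device to memory is treated as a single FIFO of posted writes, and the MSI is simply the last write queued. The toy_ names are invented.

```c
#include <stddef.h>

#define TOY_FIFO_DEPTH 8

enum toy_write_kind { TOY_WRITE_DATA, TOY_WRITE_MSI };

/* Hypothetical bridge: posted writes leave in the order they entered. */
struct toy_fifo {
	enum toy_write_kind slot[TOY_FIFO_DEPTH];
	size_t head, tail;
};

static int toy_post(struct toy_fifo *f, enum toy_write_kind kind)
{
	if (f->tail - f->head == TOY_FIFO_DEPTH)
		return -1;	/* FIFO full */
	f->slot[f->tail++ % TOY_FIFO_DEPTH] = kind;
	return 0;
}

/*
 * Drain the FIFO until the MSI write is delivered.  Returns the number
 * of data writes that reached memory before the interrupt was raised,
 * or -1 if no MSI write was queued.
 */
static int toy_drain_until_msi(struct toy_fifo *f)
{
	int data_in_memory = 0;

	while (f->head != f->tail) {
		if (f->slot[f->head++ % TOY_FIFO_DEPTH] == TOY_WRITE_MSI)
			return data_in_memory;
		data_in_memory++;
	}
	return -1;
}
```

Because the MSI write sits behind the data writes in the same queue, every data write queued before it has been drained by the time the interrupt is seen.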
| 52 | |
| 53 | PCI devices can only support a single pin-based interrupt per function. |
| 54 | Often drivers have to query the device to find out what event has |
| 55 | occurred, slowing down interrupt handling for the common case. With |
| 56 | MSIs, a device can support more interrupts, allowing each interrupt |
| 57 | to be specialised to a different purpose. One possible design gives |
| 58 | infrequent conditions (such as errors) their own interrupt which allows |
| 59 | the driver to handle the normal interrupt handling path more efficiently. |
| 60 | Other possible designs include giving one interrupt to each packet queue |
| 61 | in a network card or each port in a storage controller. |
| 62 | |
| 63 | |
| 64 | 4. How to use MSIs |
| 65 | |
| 66 | PCI devices are initialised to use pin-based interrupts. The device |
| 67 | driver has to set up the device to use MSI or MSI-X. Not all machines |
| 68 | support MSIs correctly, and for those machines, the APIs described below |
| 69 | will simply fail and the device will continue to use pin-based interrupts. |
| 70 | |
| 71 | 4.1 Include kernel support for MSIs |
| 72 | |
| 73 | To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI |
| 74 | option enabled. This option is only available on some architectures, |
| 75 | and it may depend on some other options also being set. For example, |
| 76 | on x86, you must also enable X86_UP_APIC or SMP in order to see the |
| 77 | CONFIG_PCI_MSI option. |
| 78 | |
| 79 | 4.2 Using MSI |
| 80 | |
| 81 | Most of the hard work is done for the driver in the PCI layer. It simply |
| 82 | has to request that the PCI layer set up the MSI capability for this |
| 83 | device. |
| 84 | |
| 85 | 4.2.1 pci_enable_msi |
| 86 | |
| 87 | int pci_enable_msi(struct pci_dev *dev) |
| 88 | |
| 89 | A successful call allocates ONE interrupt to the device, regardless |
| 90 | of how many MSIs the device supports. The device is switched from |
| 91 | pin-based interrupt mode to MSI mode. The dev->irq number is changed |
| 92 | to a new number which represents the message signaled interrupt; |
| 93 | consequently, this function should be called before the driver calls |
| 94 | request_irq(), because an MSI is delivered via a vector that is |
| 95 | different from the vector of a pin-based interrupt. |
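The call order described above might look like the following sketch. So that it compiles outside the kernel, pci_enable_msi() and request_irq() are replaced by userspace stand-ins: their names and argument order follow the kernel API, but the bodies, the IRQ numbers and every toy_ name are invented. The fallback to a shared pin-based interrupt is one possible policy, not the only one.

```c
#include <errno.h>

/* Userspace stand-ins for kernel types and flags (values are assumptions). */
typedef int irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);
#define IRQ_HANDLED	1
#define IRQF_SHARED	0x80

struct pci_dev {
	unsigned int irq;	/* pin-based IRQ until MSI is enabled */
	int toy_msi_works;	/* can this toy platform deliver MSIs? */
};

static int pci_enable_msi(struct pci_dev *dev)
{
	if (!dev->toy_msi_works)
		return -EINVAL;
	dev->irq = 40;		/* dev->irq now names the MSI */
	return 0;
}

static unsigned int toy_requested_irq;
static unsigned long toy_requested_flags;

static int request_irq(unsigned int irq, irq_handler_t handler,
		       unsigned long flags, const char *name, void *dev_id)
{
	(void)handler; (void)name; (void)dev_id;
	toy_requested_irq = irq;
	toy_requested_flags = flags;
	return 0;
}

static irqreturn_t foo_interrupt(int irq, void *dev_id)
{
	(void)irq; (void)dev_id;
	return IRQ_HANDLED;
}

/* The part a driver would actually contain. */
static int foo_driver_setup_irq(struct pci_dev *pdev)
{
	unsigned long flags = 0;

	/* Enable MSI first: on success pdev->irq changes. */
	if (pci_enable_msi(pdev))
		flags = IRQF_SHARED;	/* still pin-based, may be shared */

	return request_irq(pdev->irq, foo_interrupt, flags, "foo", pdev);
}
```

If request_irq() were called before pci_enable_msi(), the handler would be attached to the pin-based interrupt number and would never see the MSI.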
| 96 | |
| 97 | 4.2.2 pci_enable_msi_range |
| 98 | |
| 99 | int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) |
| 100 | |
| 101 | This function allows a device driver to request any number of MSI |
| 102 | interrupts within the specified range from 'minvec' to 'maxvec'. |
| 103 | |
| 104 | If this function returns a positive number, it indicates the number of |
| 105 | MSI interrupts that have been successfully allocated. In this case |
| 106 | the device is switched from pin-based interrupt mode to MSI mode and |
| 107 | dev->irq is updated to the lowest of the new interrupts assigned to it. |
| 108 | The other interrupts assigned to the device are in the range dev->irq |
| 109 | to dev->irq + returned value - 1. The device driver can use the returned |
| 110 | number of successfully allocated MSI interrupts to further allocate |
| 111 | and initialize device resources. |
| 112 | |
| 113 | If this function returns a negative number, it indicates an error and |
| 114 | the driver should not attempt to request any more MSI interrupts for |
| 115 | this device. |
| 116 | |
| 117 | This function should be called before the driver calls request_irq(), |
| 118 | because MSI interrupts are delivered via vectors that are different |
| 119 | from the vector of a pin-based interrupt. |
| 120 | |
| 121 | It is ideal if drivers can cope with a variable number of MSI interrupts; |
| 122 | there are many reasons why the platform may not be able to provide the |
| 123 | exact number that a driver asks for. |
| 124 | |
| 125 | Some devices cannot operate with just any number of MSI interrupts |
| 126 | within a range. See section 4.3.1.3 for an idea of how to handle |
| 127 | such devices for MSI-X; the same logic applies to MSI. |
| 128 | |
| 129 | 4.2.2.1 Maximum possible number of MSI interrupts |
| 130 | |
| 131 | The typical usage of MSI interrupts is to allocate as many vectors as |
| 132 | possible, likely up to the limit returned by the pci_msi_vec_count() function: |
| 133 | |
| 134 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |
| 135 | { |
| 136 | 	return pci_enable_msi_range(pdev, 1, nvec); |
| 137 | } |
| 138 | |
| 139 | Note that the value of the 'minvec' parameter is 1. As 'minvec' is inclusive, |
| 140 | a value of 0 would be meaningless and could result in an error. |
| 141 | |
| 142 | Some devices require a minimum number of MSI interrupts. |
| 143 | In this case the function could look like this: |
| 144 | |
| 145 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |
| 146 | { |
| 147 | 	return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec); |
| 148 | } |
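The two ideas above, capping the request at pci_msi_vec_count() and enforcing a driver minimum, can be combined. The sketch below does so against userspace stand-ins: the stand-in bodies, the toy_ fields, the IRQ number 64 and the value chosen for FOO_DRIVER_MINIMUM_NVEC are all assumptions made for illustration, and the stand-in ignores the power-of-two rule that real MSI allocation follows.

```c
#include <errno.h>

struct pci_dev {
	unsigned int irq;
	int toy_capable;	/* vectors the device advertises */
	int toy_available;	/* vectors the toy platform can spare */
};

/* Stand-in: reports what the device advertises, or an error. */
static int pci_msi_vec_count(struct pci_dev *dev)
{
	return dev->toy_capable > 0 ? dev->toy_capable : -EINVAL;
}

/* Stand-in: grants as many vectors as it can within [minvec, maxvec]. */
static int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
{
	int n = maxvec < dev->toy_available ? maxvec : dev->toy_available;

	if (n < minvec)
		return -ENOSPC;
	dev->irq = 64;		/* lowest of the newly assigned interrupts */
	return n;
}

#define FOO_DRIVER_MINIMUM_NVEC 2	/* hypothetical driver minimum */

static int foo_driver_enable_msi(struct pci_dev *pdev, int wanted)
{
	int max = pci_msi_vec_count(pdev);

	if (max < 0)
		return max;		/* no usable MSI capability */
	if (wanted > max)
		wanted = max;		/* never ask for more than advertised */
	if (wanted < FOO_DRIVER_MINIMUM_NVEC)
		return -ERANGE;

	return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, wanted);
}
```

A positive return value tells the caller how many per-vector resources to set up; a negative one leaves the device in pin-based mode.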
| 149 | |
| 150 | 4.2.2.2 Exact number of MSI interrupts |
| 151 | |
| 152 | If a driver is unable or unwilling to deal with a variable number of MSI |
| 153 | interrupts, it can request a particular number of interrupts by passing |
| 154 | that number to pci_enable_msi_range() as both the 'minvec' and 'maxvec' |
| 155 | parameters: |
| 156 | |
| 157 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |
| 158 | { |
| 159 | 	return pci_enable_msi_range(pdev, nvec, nvec); |
| 160 | } |
| 161 | |
| 162 | Note that, unlike pci_enable_msi_exact(), which can also be used to |
| 163 | enable a particular number of MSI interrupts, pci_enable_msi_range() |
| 164 | returns either a negative errno or 'nvec' (not a negative errno or 0, as |
| 165 | pci_enable_msi_exact() does). |
| 166 | |
| 167 | 4.2.2.3 Single MSI mode |
| 168 | |
| 169 | The most common example of the request type described above is |
| 170 | enabling the single MSI mode for a device. It can be done by passing |
| 171 | two 1s as 'minvec' and 'maxvec': |
| 172 | |
| 173 | static int foo_driver_enable_single_msi(struct pci_dev *pdev) |
| 174 | { |
| 175 | 	return pci_enable_msi_range(pdev, 1, 1); |
| 176 | } |
| 177 | |
| 178 | Note that, unlike pci_enable_msi(), which can also be used to |
| 179 | enable the single MSI mode, pci_enable_msi_range() returns either a |
| 180 | negative errno or 1 (not a negative errno or 0, as pci_enable_msi() |
| 181 | does). |
| 182 | |
| 183 | 4.2.3 pci_enable_msi_exact |
| 184 | |
| 185 | int pci_enable_msi_exact(struct pci_dev *dev, int nvec) |
| 186 | |
| 187 | This variation on the pci_enable_msi_range() call allows a device driver to |
| 188 | request exactly 'nvec' MSIs. |
| 189 | |
| 190 | If this function returns a negative number, it indicates an error and |
| 191 | the driver should not attempt to request any more MSI interrupts for |
| 192 | this device. |
| 193 | |
| 194 | In contrast with pci_enable_msi_range(), pci_enable_msi_exact() |
| 195 | returns zero on success, which indicates that the MSI interrupts have been |
| 196 | successfully allocated. |
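The difference in return conventions is easy to get wrong, so here is a small sketch of it. Both functions are userspace stand-ins with invented bodies; the exact variant is written in terms of the range variant only to make the convention visible, and the toy_available field is hypothetical.

```c
#include <errno.h>

struct pci_dev {
	int toy_available;	/* vectors the toy platform can spare */
};

/* Stand-in: returns the number of vectors granted, or a negative errno. */
static int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
{
	int n = maxvec < dev->toy_available ? maxvec : dev->toy_available;

	return n < minvec ? -ENOSPC : n;
}

/* Stand-in: returns 0 on success, or a negative errno. */
static int pci_enable_msi_exact(struct pci_dev *dev, int nvec)
{
	int rc = pci_enable_msi_range(dev, nvec, nvec);

	return rc < 0 ? rc : 0;
}

/* A caller of the exact variant must test for zero, not for 'nvec'. */
static int foo_driver_enable_four(struct pci_dev *pdev)
{
	int rc = pci_enable_msi_exact(pdev, 4);

	if (rc)
		return rc;	/* negative errno */
	return 4;		/* the count is known to the caller already */
}
```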
| 197 | |
| 198 | 4.2.4 pci_disable_msi |
| 199 | |
| 200 | void pci_disable_msi(struct pci_dev *dev) |
| 201 | |
| 202 | This function should be used to undo the effect of pci_enable_msi_range(). |
| 203 | Calling it restores dev->irq to the pin-based interrupt number and frees |
| 204 | the previously allocated MSIs. The interrupts may subsequently be assigned |
| 205 | to another device, so drivers should not cache the value of dev->irq. |
| 206 | |
| 207 | Before calling this function, a device driver must always call free_irq() |
| 208 | on any interrupt for which it previously called request_irq(). |
| 209 | Failure to do so results in a BUG_ON(), leaving the device with |
| 210 | MSI enabled and thus leaking its vector. |
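The required teardown order can be sketched as follows. free_irq() and pci_disable_msi() are userspace stand-ins here; the toy_ fields are invented bookkeeping, and the assert() merely plays the role that BUG_ON() plays in the kernel.

```c
#include <assert.h>

struct pci_dev {
	unsigned int irq;
	unsigned int toy_pin_irq;	/* pin-based IRQ to restore */
	int toy_handlers;		/* handlers still registered */
	int toy_msi_enabled;
};

/* Stand-in: drops one registered handler. */
static void free_irq(unsigned int irq, void *dev_id)
{
	struct pci_dev *dev = dev_id;

	(void)irq;
	dev->toy_handlers--;
}

/* Stand-in: refuses to run while a handler is still registered. */
static void pci_disable_msi(struct pci_dev *dev)
{
	assert(dev->toy_handlers == 0);
	dev->toy_msi_enabled = 0;
	dev->irq = dev->toy_pin_irq;	/* back to the pin-based number */
}

static void foo_driver_teardown_irq(struct pci_dev *pdev)
{
	free_irq(pdev->irq, pdev);	/* 1: release the handler */
	pci_disable_msi(pdev);		/* 2: only then leave MSI mode */
}
```

Note that pdev->irq is read before pci_disable_msi() runs; afterwards it names the pin-based interrupt again.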
| 211 | |
| 212 | 4.2.5 pci_msi_vec_count |
| 213 | |
| 214 | int pci_msi_vec_count(struct pci_dev *dev) |
| 215 | |
| 216 | This function can be used to retrieve the number of MSI vectors the |
| 217 | device requested (via the Multiple Message Capable register). The MSI |
| 218 | specification only allows the returned value to be a power of two, |
| 219 | up to a maximum of 2^5 (32). |
| 220 | |
| 221 | If this function returns a negative number, it indicates the device is |
| 222 | not capable of sending MSIs. |
| 223 | |
| 224 | If this function returns a positive number, it indicates the maximum |
| 225 | number of MSI interrupt vectors that could be allocated. |
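The power-of-two rule follows from how the count is encoded. The sketch below decodes it from a raw Message Control value, assuming the Multiple Message Capable field occupies bits 3:1 and holds log2 of the vector count. The mask name and value are believed to match include/uapi/linux/pci_regs.h but should be treated as an assumption; toy_msi_vec_count() is a hypothetical helper, not the kernel function.

```c
#include <stdint.h>

/* Assumed layout: bits 3:1 of Message Control = Multiple Message Capable. */
#define PCI_MSI_FLAGS_QMASK	0x000e

/* Decode the advertised vector count from a Message Control value. */
static int toy_msi_vec_count(uint16_t msgctl)
{
	return 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
}
```

Field values 0 to 5 decode to 1, 2, 4, 8, 16 and 32 vectors; the helper does not reject the two remaining encodings, which a real implementation would have to treat as invalid.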
| 226 | |
| 227 | 4.3 Using MSI-X |
| 228 | |
| 229 | The MSI-X capability is much more flexible than the MSI capability. |
| 230 | It supports up to 2048 interrupts, each of which can be controlled |
| 231 | independently. To support this flexibility, drivers must use an array of |
| 232 | `struct msix_entry': |
| 233 | |
| 234 | struct msix_entry { |
| 235 | 	u16 vector; /* kernel uses to write alloc vector */ |
| 236 | 	u16 entry; /* driver uses to specify entry */ |
| 237 | }; |
| 238 | |
| 239 | This allows for the device to use these interrupts in a sparse fashion; |
| 240 | for example, it could use interrupts 3 and 1027 and yet allocate only a |
| 241 | two-element array. The driver is expected to fill in the 'entry' value |
| 242 | in each element of the array to indicate for which entries the kernel |
| 243 | should assign interrupts; it is invalid to fill in two entries with the |
| 244 | same number. |
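The sparse example from the text (entries 3 and 1027 in a two-element array) might be set up as in this sketch. The struct is repeated so the example is self-contained; foo_fill_entries() and foo_entries_have_duplicate() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry {
	u16 vector;	/* kernel uses to write alloc vector */
	u16 entry;	/* driver uses to specify entry */
};

/* Fill 'entries' from the list of MSI-X table indices the driver wants. */
static void foo_fill_entries(struct msix_entry *entries,
			     const u16 *wanted, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		entries[i].entry = wanted[i];
		entries[i].vector = 0;	/* filled in by the kernel later */
	}
}

/* Returns 1 if two elements name the same table entry, which is invalid. */
static int foo_entries_have_duplicate(const struct msix_entry *entries,
				      size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (entries[i].entry == entries[j].entry)
				return 1;
	return 0;
}
```

The array length follows the number of interrupts in use, not the highest table index.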
| 245 | |
| 246 | 4.3.1 pci_enable_msix_range |
| 247 | |
| 248 | int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, |
| 249 | 			  int minvec, int maxvec) |
| 250 | |
| 251 | Calling this function asks the PCI subsystem to allocate any number of |
| 252 | MSI-X interrupts within the specified range from 'minvec' to 'maxvec'. |
| 253 | The 'entries' argument is a pointer to an array of msix_entry structs |
| 254 | which should be at least 'maxvec' entries in size. |
| 255 | |
| 256 | On success, the device is switched into MSI-X mode and the function |
| 257 | returns the number of MSI-X interrupts that have been successfully |
| 258 | allocated. In this case the 'vector' member in entries numbered from |
| 259 | 0 to the returned value - 1 is populated with the interrupt number; |
| 260 | the driver should then call request_irq() for each 'vector' that it |
| 261 | decides to use. The device driver is responsible for keeping track of the |
| 262 | interrupts assigned to the MSI-X vectors so it can free them again later. |
| 263 | The device driver can use the returned number of successfully allocated MSI-X |
| 264 | interrupts to further allocate and initialize device resources. |
| 265 | |
| 266 | If this function returns a negative number, it indicates an error and |
| 267 | the driver should not attempt to allocate any more MSI-X interrupts for |
| 268 | this device. |
| 269 | |
| 270 | This function, in contrast with pci_enable_msi_range(), does not adjust |
| 271 | dev->irq. The device will not generate interrupts for this interrupt |
| 272 | number once MSI-X is enabled. |
| 273 | |
| 274 | Device drivers should normally call this function once per device |
| 275 | during the initialization phase. |
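Putting the pieces together, an initialization path might look like the sketch below: fill in 'entry', enable MSI-X, then call request_irq() on each populated 'vector' and remember how many were obtained. All kernel functions are replaced by userspace stand-ins with invented bodies; FOO_MAX_VECS, struct foo_adapter and the toy_ names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

typedef uint16_t u16;
typedef int irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

struct msix_entry { u16 vector; u16 entry; };
struct pci_dev { int toy_available; /* vectors the toy platform has */ };

static int toy_irqs_held;	/* handlers currently registered */

static int pci_enable_msix_range(struct pci_dev *dev,
				 struct msix_entry *entries,
				 int minvec, int maxvec)
{
	int i, n = maxvec < dev->toy_available ? maxvec : dev->toy_available;

	if (n < minvec)
		return -ENOSPC;
	for (i = 0; i < n; i++)
		entries[i].vector = (u16)(100 + i);
	return n;
}

static void pci_disable_msix(struct pci_dev *dev) { (void)dev; }

static int request_irq(unsigned int irq, irq_handler_t handler,
		       unsigned long flags, const char *name, void *dev_id)
{
	(void)irq; (void)handler; (void)flags; (void)name; (void)dev_id;
	toy_irqs_held++;
	return 0;
}

static void free_irq(unsigned int irq, void *dev_id)
{
	(void)irq; (void)dev_id;
	toy_irqs_held--;
}

static irqreturn_t foo_interrupt(int irq, void *dev_id)
{
	(void)irq; (void)dev_id;
	return 1;	/* stands for IRQ_HANDLED */
}

#define FOO_MAX_VECS 8	/* hypothetical driver limit */

struct foo_adapter {
	struct pci_dev *pdev;
	struct msix_entry msix_entries[FOO_MAX_VECS];
	int num_vecs;
};

static int foo_driver_setup_msix(struct foo_adapter *adapter)
{
	int i, rc, nvec;

	/* Ask for table entries 0..FOO_MAX_VECS-1. */
	for (i = 0; i < FOO_MAX_VECS; i++)
		adapter->msix_entries[i].entry = (u16)i;

	nvec = pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
				     1, FOO_MAX_VECS);
	if (nvec < 0)
		return nvec;

	/* Only elements 0..nvec-1 carry a valid 'vector'. */
	for (i = 0; i < nvec; i++) {
		rc = request_irq(adapter->msix_entries[i].vector,
				 foo_interrupt, 0, "foo", adapter);
		if (rc)
			goto unwind;
	}
	adapter->num_vecs = nvec;	/* remembered for teardown */
	return 0;

unwind:
	while (--i >= 0)
		free_irq(adapter->msix_entries[i].vector, adapter);
	pci_disable_msix(adapter->pdev);
	return rc;
}
```

The unwind path releases handlers before disabling MSI-X, the same order that teardown must follow.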
| 276 | |
| 277 | It is ideal if drivers can cope with a variable number of MSI-X interrupts; |
| 278 | there are many reasons why the platform may not be able to provide the |
| 279 | exact number that a driver asks for. |
| 280 | |
| 281 | Some devices cannot operate with just any number of MSI-X interrupts |
| 282 | within a range. For example, a network adapter might need, say, |
| 283 | four vectors for each queue it provides. The number of MSI-X interrupts |
| 284 | allocated should then be a multiple of four. In this case |
| 285 | pci_enable_msix_range() cannot be used alone to request MSI-X interrupts |
| 286 | (since it can allocate any number within the range, without any notion of |
| 287 | the multiple of four), and the device driver should implement custom logic |
| 288 | to request the required number of MSI-X interrupts. |
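One possible shape for such custom logic, for the multiple-of-four case, is sketched below: round the bounds to multiples of four, then ask for an exact count and step down by four while the platform reports -ENOSPC. pci_enable_msix_exact() is a userspace stand-in with an invented body; FOO_VECS_PER_QUEUE, FOO_MAX_VECS and struct foo_adapter are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry { u16 vector; u16 entry; };
struct pci_dev { int toy_available; /* vectors the toy platform has */ };

#define FOO_VECS_PER_QUEUE	4
#define FOO_MAX_VECS		16	/* a multiple of FOO_VECS_PER_QUEUE */

struct foo_adapter {
	struct pci_dev *pdev;
	struct msix_entry msix_entries[FOO_MAX_VECS];
};

/* Stand-in: returns 0 only if exactly 'nvec' vectors can be granted. */
static int pci_enable_msix_exact(struct pci_dev *dev,
				 struct msix_entry *entries, int nvec)
{
	int i;

	if (nvec > dev->toy_available)
		return -ENOSPC;
	for (i = 0; i < nvec; i++)
		entries[i].vector = (u16)(100 + i);
	return 0;
}

/* Returns the number of vectors enabled (a multiple of four) or an errno. */
static int foo_driver_enable_msix_queues(struct foo_adapter *adapter,
					 int minvec, int maxvec)
{
	const int q = FOO_VECS_PER_QUEUE;
	int rc;

	if (maxvec > FOO_MAX_VECS)
		maxvec = FOO_MAX_VECS;
	minvec = (minvec + q - 1) / q * q;	/* round up */
	maxvec = maxvec / q * q;		/* round down */
	if (minvec < q)
		minvec = q;			/* at least one queue */
	if (minvec > maxvec)
		return -ERANGE;

	for (; maxvec >= minvec; maxvec -= q) {
		rc = pci_enable_msix_exact(adapter->pdev,
					   adapter->msix_entries, maxvec);
		if (rc != -ENOSPC)		/* success or fatal error */
			return rc < 0 ? rc : maxvec;
	}
	return -ENOSPC;
}
```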
| 289 | |
| 290 | 4.3.1.1 Maximum possible number of MSI-X interrupts |
| 291 | |
| 292 | The typical usage of MSI-X interrupts is to allocate as many vectors as |
| 293 | possible, likely up to the limit returned by the pci_msix_vec_count() function: |
| 294 | |
| 295 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |
| 296 | { |
| 297 | 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |
| 298 | 				     1, nvec); |
| 299 | } |
| 300 | |
| 301 | Note that the value of the 'minvec' parameter is 1. As 'minvec' is inclusive, |
| 302 | a value of 0 would be meaningless and could result in an error. |
| 303 | |
| 304 | Some devices require a minimum number of MSI-X interrupts. |
| 305 | In this case the function could look like this: |
| 306 | |
| 307 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |
| 308 | { |
| 309 | 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |
| 310 | 				     FOO_DRIVER_MINIMUM_NVEC, nvec); |
| 311 | } |
| 312 | |
| 313 | 4.3.1.2 Exact number of MSI-X interrupts |
| 314 | |
| 315 | If a driver is unable or unwilling to deal with a variable number of MSI-X |
| 316 | interrupts, it can request a particular number of interrupts by passing |
| 317 | that number to pci_enable_msix_range() as both the 'minvec' and 'maxvec' |
| 318 | parameters: |
| 319 | |
| 320 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |
| 321 | { |
| 322 | 	return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |
| 323 | 				     nvec, nvec); |
| 324 | } |
| 325 | |
| 326 | Note that, unlike pci_enable_msix_exact(), which can also be used to |
| 327 | enable a particular number of MSI-X interrupts, pci_enable_msix_range() |
| 328 | returns either a negative errno or 'nvec' (not a negative errno or 0, as |
| 329 | pci_enable_msix_exact() does). |
| 330 | |
| 331 | 4.3.1.3 Specific requirements to the number of MSI-X interrupts |
| 332 | |
| 333 | As noted above, some devices cannot operate with just any number of |
| 334 | MSI-X interrupts within a range. For example, assume a device that is |
| 335 | only capable of sending a number of MSI-X interrupts that is a power of |
| 336 | two. A routine that enables MSI-X mode for such a device might look like this: |
| 337 | |
| 338 | /* |
| 339 |  * Assume 'minvec' and 'maxvec' are non-zero |
| 340 |  */ |
| 341 | static int foo_driver_enable_msix(struct foo_adapter *adapter, |
| 342 | 				  int minvec, int maxvec) |
| 343 | { |
| 344 | 	int rc; |
| 345 | |
| 346 | 	minvec = roundup_pow_of_two(minvec); |
| 347 | 	maxvec = rounddown_pow_of_two(maxvec); |
| 348 | |
| 349 | 	if (minvec > maxvec) |
| 350 | 		return -ERANGE; |
| 351 | |
| 352 | retry: |
| 353 | 	rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |
| 354 | 				   maxvec, maxvec); |
| 355 | 	/* |
| 356 | 	 * -ENOSPC is the only error code allowed to be analyzed |
| 357 | 	 */ |
| 358 | 	if (rc == -ENOSPC) { |
| 359 | 		if (maxvec == 1) |
| 360 | 			return -ENOSPC; |
| 361 | |
| 362 | 		maxvec /= 2; |
| 363 | |
| 364 | 		if (minvec > maxvec) |
| 365 | 			return -ENOSPC; |
| 366 | |
| 367 | 		goto retry; |
| 368 | 	} |
| 369 | |
| 370 | 	return rc; |
| 371 | } |
| 372 | |
| 373 | Note how the return value of pci_enable_msix_range() is analyzed for a |
| 374 | fallback: any error code other than -ENOSPC indicates a fatal error and |
| 375 | should not be retried. |
| 376 | |
| 377 | 4.3.2 pci_enable_msix_exact |
| 378 | |
| 379 | int pci_enable_msix_exact(struct pci_dev *dev, |
| 380 | 			  struct msix_entry *entries, int nvec) |
| 381 | |
| 382 | This variation on the pci_enable_msix_range() call allows a device driver to |
| 383 | request exactly 'nvec' MSI-X interrupts. |
| 384 | |
| 385 | If this function returns a negative number, it indicates an error and |
| 386 | the driver should not attempt to allocate any more MSI-X interrupts for |
| 387 | this device. |
| 388 | |
| 389 | In contrast with pci_enable_msix_range(), pci_enable_msix_exact() |
| 390 | returns zero on success, which indicates that the MSI-X interrupts have been |
| 391 | successfully allocated. |
| 392 | |
| 393 | Another version of a routine that enables MSI-X mode for a device with |
| 394 | the specific requirements described in section 4.3.1.3 might look like this: |
| 395 | |
| 396 | /* |
| 397 |  * Assume 'minvec' and 'maxvec' are non-zero |
| 398 |  */ |
| 399 | static int foo_driver_enable_msix(struct foo_adapter *adapter, |
| 400 | 				  int minvec, int maxvec) |
| 401 | { |
| 402 | 	int rc; |
| 403 | |
| 404 | 	minvec = roundup_pow_of_two(minvec); |
| 405 | 	maxvec = rounddown_pow_of_two(maxvec); |
| 406 | |
| 407 | 	if (minvec > maxvec) |
| 408 | 		return -ERANGE; |
| 409 | |
| 410 | retry: |
| 411 | 	rc = pci_enable_msix_exact(adapter->pdev, |
| 412 | 				   adapter->msix_entries, maxvec); |
| 413 | |
| 414 | 	/* |
| 415 | 	 * -ENOSPC is the only error code allowed to be analyzed |
| 416 | 	 */ |
| 417 | 	if (rc == -ENOSPC) { |
| 418 | 		if (maxvec == 1) |
| 419 | 			return -ENOSPC; |
| 420 | |
| 421 | 		maxvec /= 2; |
| 422 | |
| 423 | 		if (minvec > maxvec) |
| 424 | 			return -ENOSPC; |
| 425 | |
| 426 | 		goto retry; |
| 427 | 	} else if (rc < 0) { |
| 428 | 		return rc; |
| 429 | 	} |
| 430 | |
| 431 | 	return maxvec; |
| 432 | } |
| 433 | |
| 434 | 4.3.3 pci_disable_msix |
| 435 | |
| 436 | void pci_disable_msix(struct pci_dev *dev) |
| 437 | |
| 438 | This function should be used to undo the effect of pci_enable_msix_range(). |
| 439 | It frees the previously allocated MSI-X interrupts. The interrupts may |
| 440 | subsequently be assigned to another device, so drivers should not cache |
| 441 | the value of the 'vector' elements over a call to pci_disable_msix(). |
| 442 | |
| 443 | Before calling this function, a device driver must always call free_irq() |
| 444 | on any interrupt for which it previously called request_irq(). |
| 445 | Failure to do so results in a BUG_ON(), leaving the device with |
| 446 | MSI-X enabled and thus leaking its vector. |
| 447 | |
| 448 | 4.3.4 The MSI-X Table |
| 449 | |
| 450 | The MSI-X capability specifies a BAR and offset within that BAR for the |
| 451 | MSI-X Table. This address is mapped by the PCI subsystem, and should not |
| 452 | be accessed directly by the device driver. If the driver wishes to |
| 453 | mask or unmask an interrupt, it should call disable_irq() / enable_irq(). |
| 454 | |
| 455 | 4.3.5 pci_msix_vec_count |
| 456 | |
| 457 | int pci_msix_vec_count(struct pci_dev *dev) |
| 458 | |
| 459 | This function can be used to retrieve the number of entries in the device's |
| 460 | MSI-X table. |
| 461 | |
| 462 | If this function returns a negative number, it indicates the device is |
| 463 | not capable of sending MSI-Xs. |
| 464 | |
| 465 | If this function returns a positive number, it indicates the maximum |
| 466 | number of MSI-X interrupt vectors that could be allocated. |
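The table size is encoded in the MSI-X Message Control register as "number of entries minus one". The sketch below decodes it, assuming the Table Size field occupies the low 11 bits; the mask name and value are believed to match include/uapi/linux/pci_regs.h but should be treated as an assumption, and toy_msix_vec_count() is a hypothetical helper rather than the kernel function.

```c
#include <stdint.h>

/* Assumed layout: bits 10:0 of MSI-X Message Control = Table Size - 1. */
#define PCI_MSIX_FLAGS_QSIZE	0x07FF

/* Decode the number of MSI-X table entries from a Message Control value. */
static int toy_msix_vec_count(uint16_t control)
{
	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;
}
```

An 11-bit field that stores the size minus one yields a range of 1 to 2048 entries, which is where the upper limit quoted in section 4.3 comes from.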
| 467 | |
| 468 | 4.4 Handling devices implementing both MSI and MSI-X capabilities |
| 469 | |
| 470 | If a device implements both MSI and MSI-X capabilities, it can |
| 471 | run in either MSI mode or MSI-X mode, but not both simultaneously. |
| 472 | This is a requirement of the PCI spec, and it is enforced by the |
| 473 | PCI layer. Calling pci_enable_msi_range() when MSI-X is already |
| 474 | enabled or pci_enable_msix_range() when MSI is already enabled |
| 475 | results in an error. If a device driver wishes to switch between MSI |
| 476 | and MSI-X at runtime, it must first quiesce the device, then switch |
| 477 | it back to pin-interrupt mode, before calling pci_enable_msi_range() |
| 478 | or pci_enable_msix_range() and resuming operation. This is not expected |
| 479 | to be a common operation but may be useful for debugging or testing |
| 480 | during development. |
| 481 | |
| 482 | 4.5 Considerations when using MSIs |
| 483 | |
| 484 | 4.5.1 Choosing between MSI-X and MSI |
| 485 | |
| 486 | If your device supports both MSI-X and MSI capabilities, you should use |
| 487 | the MSI-X facilities in preference to the MSI facilities. As mentioned |
| 488 | above, MSI-X supports any number of interrupts between 1 and 2048. |
| 489 | In contrast, MSI is restricted to a maximum of 32 interrupts (and |
| 490 | must be a power of two). In addition, the MSI interrupt vectors must |
| 491 | be allocated consecutively, so the system might not be able to allocate |
| 492 | as many vectors for MSI as it could for MSI-X. On some platforms, MSI |
| 493 | interrupts must all be targeted at the same set of CPUs whereas MSI-X |
| 494 | interrupts can all be targeted at different CPUs. |
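A driver that follows this preference might try MSI-X first, then MSI, and finally stay with the pin-based interrupt. The sketch below shows one such fallback chain against userspace stand-ins; the stand-in bodies, the toy_ fields, the IRQ numbers and the foo_ names are all invented for illustration.

```c
#include <errno.h>
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry { u16 vector; u16 entry; };

struct pci_dev {
	unsigned int irq;
	int toy_msix_available;	/* 0 means no usable MSI-X */
	int toy_msi_available;	/* 0 means no usable MSI */
};

static int pci_enable_msix_range(struct pci_dev *dev,
				 struct msix_entry *entries,
				 int minvec, int maxvec)
{
	int i, n = maxvec < dev->toy_msix_available ?
		   maxvec : dev->toy_msix_available;

	if (n < minvec)
		return -ENOSPC;
	for (i = 0; i < n; i++)
		entries[i].vector = (u16)(100 + i);
	return n;
}

static int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
{
	int n = maxvec < dev->toy_msi_available ?
		maxvec : dev->toy_msi_available;

	if (n < minvec)
		return -ENOSPC;
	dev->irq = 64;
	return n;
}

#define FOO_MAX_VECS 8	/* hypothetical driver limit */

enum foo_irq_mode { FOO_IRQ_PIN, FOO_IRQ_MSI, FOO_IRQ_MSIX };

struct foo_adapter {
	struct pci_dev *pdev;
	struct msix_entry msix_entries[FOO_MAX_VECS];
	int num_vecs;
};

static enum foo_irq_mode foo_driver_pick_irq_mode(struct foo_adapter *adapter)
{
	int i, rc;

	for (i = 0; i < FOO_MAX_VECS; i++)
		adapter->msix_entries[i].entry = (u16)i;

	rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries,
				   1, FOO_MAX_VECS);
	if (rc > 0) {
		adapter->num_vecs = rc;
		return FOO_IRQ_MSIX;
	}

	rc = pci_enable_msi_range(adapter->pdev, 1, FOO_MAX_VECS);
	if (rc > 0) {
		adapter->num_vecs = rc;
		return FOO_IRQ_MSI;
	}

	adapter->num_vecs = 1;	/* the single pin-based interrupt */
	return FOO_IRQ_PIN;
}
```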
| 495 | |
| 496 | 4.5.2 Spinlocks |
| 497 | |
| 498 | Most device drivers have a per-device spinlock which is taken in the |
| 499 | interrupt handler. With pin-based interrupts or a single MSI, it is not |
| 500 | necessary to disable interrupts (Linux guarantees the same interrupt will |
| 501 | not be re-entered). If a device uses multiple interrupts, the driver |
| 502 | must disable interrupts while the lock is held. If the device sends |
| 503 | a different interrupt, the driver will deadlock trying to recursively |
| 504 | acquire the spinlock. Such deadlocks can be avoided by using |
| 505 | spin_lock_irqsave() or spin_lock_irq() which disable local interrupts |
| 506 | and acquire the lock (see Documentation/DocBook/kernel-locking). |
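The hazard and its fix can be modelled in userspace, under heavy simplification. In this sketch a single flag stands for the CPU's interrupt-enable state, the two macros imitate only the calling convention of spin_lock_irqsave() and spin_unlock_irqrestore(), and the handlers take the adapter directly instead of the usual (irq, dev_id) arguments. None of this is the kernel implementation; the toy_ and foo_ names are invented.

```c
#include <assert.h>

/* Toy single-CPU model: 1 = interrupts enabled, 0 = disabled. */
static int toy_irqs_enabled = 1;

typedef struct { int locked; } spinlock_t;

/* Imitations of the kernel macros; the assert marks a would-be deadlock. */
#define spin_lock_irqsave(lock, flags)				\
	do {							\
		(flags) = (unsigned long)toy_irqs_enabled;	\
		toy_irqs_enabled = 0;				\
		assert(!(lock)->locked);			\
		(lock)->locked = 1;				\
	} while (0)

#define spin_unlock_irqrestore(lock, flags)			\
	do {							\
		(lock)->locked = 0;				\
		toy_irqs_enabled = (int)(flags);		\
	} while (0)

struct foo_adapter {
	spinlock_t lock;
	int rx_events;
	int err_events;
	int toy_err_pending;	/* error vector raised but held off */
};

/* Handler for the device's second (error) vector. */
static void foo_err_interrupt(struct foo_adapter *a)
{
	unsigned long flags;

	spin_lock_irqsave(&a->lock, flags);
	a->err_events++;
	spin_unlock_irqrestore(&a->lock, flags);
}

/* The toy device raises its error vector; it runs only if IRQs are on. */
static void toy_raise_err(struct foo_adapter *a)
{
	if (toy_irqs_enabled)
		foo_err_interrupt(a);
	else
		a->toy_err_pending = 1;
}

/* Handler for the first (receive) vector. */
static void foo_rx_interrupt(struct foo_adapter *a)
{
	unsigned long flags;

	spin_lock_irqsave(&a->lock, flags);
	a->rx_events++;
	toy_raise_err(a);	/* second vector fires while the lock is held */
	spin_unlock_irqrestore(&a->lock, flags);

	if (a->toy_err_pending) {	/* delivered once IRQs are back on */
		a->toy_err_pending = 0;
		foo_err_interrupt(a);
	}
}
```

Had the receive handler taken the lock without disabling interrupts, the error handler would have run immediately and found the lock already held, which is the recursive acquisition described above.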
| 507 | |
| 508 | 4.6 How to tell whether MSI/MSI-X is enabled on a device |
| 509 | |
| 510 | Using 'lspci -v' (as root) may show some devices with "MSI", "Message |
| 511 | Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities |
| 512 | has an 'Enable' flag which is followed by either "+" (enabled) |
| 513 | or "-" (disabled). |
| 514 | |
| 515 | |
| 516 | 5. MSI quirks |
| 517 | |
| 518 | Several PCI chipsets or devices are known not to support MSIs. |
| 519 | The PCI stack provides three ways to disable MSIs: |
| 520 | |
| 521 | 1. globally |
| 522 | 2. on all devices behind a specific bridge |
| 523 | 3. on a single device |
| 524 | |
| 525 | 5.1. Disabling MSIs globally |
| 526 | |
| 527 | Some host chipsets simply don't support MSIs properly. If we're |
| 528 | lucky, the manufacturer knows this and has indicated it in the ACPI |
| 529 | FADT table. In this case, Linux automatically disables MSIs. |
| 530 | Some boards don't include this information in the table and so we have |
| 531 | to detect them ourselves. The complete list of these is found near the |
| 532 | quirk_disable_all_msi() function in drivers/pci/quirks.c. |
| 533 | |
| 534 | If you have a board which has problems with MSIs, you can pass pci=nomsi |
| 535 | on the kernel command line to disable MSIs on all devices. It would be |
| 536 | in your best interests to report the problem to linux-pci@vger.kernel.org |
| 537 | including a full 'lspci -v' so we can add the quirks to the kernel. |
| 538 | |
| 539 | 5.2. Disabling MSIs below a bridge |
| 540 | |
| 541 | Some PCI bridges are not able to route MSIs between busses properly. |
| 542 | In this case, MSIs must be disabled on all devices behind the bridge. |
| 543 | |
| 544 | Some bridges allow you to enable MSIs by changing some bits in their |
| 545 | PCI configuration space (especially the Hypertransport chipsets such |
| 546 | as the nVidia nForce and Serverworks HT2000). As with host chipsets, |
| 547 | Linux mostly knows about them and automatically enables MSIs if it can. |
| 548 | If you have a bridge unknown to Linux, you can enable |
| 549 | MSIs in configuration space using whatever method you know works, then |
| 550 | enable MSIs on that bridge by doing: |
| 551 | |
| 552 | echo 1 > /sys/bus/pci/devices/$bridge/msi_bus |
| 553 | |
| 554 | where $bridge is the PCI address of the bridge you've enabled (e.g. |
| 555 | 0000:00:0e.0). |
| 556 | |
| 557 | To disable MSIs, echo 0 instead of 1. Changing this value should be |
| 558 | done with caution as it could break interrupt handling for all devices |
| 559 | below this bridge. |
| 560 | |
| 561 | Again, please notify linux-pci@vger.kernel.org of any bridges that need |
| 562 | special handling. |
| 563 | |
| 564 | 5.3. Disabling MSIs on a single device |
| 565 | |
| 566 | Some devices are known to have faulty MSI implementations. Usually this |
| 567 | is handled in the individual device driver, but occasionally it's necessary |
| 568 | to handle this with a quirk. Some drivers have an option to disable use |
| 569 | of MSI. While this is a convenient workaround for the driver author, |
| 570 | it is not good practice, and should not be emulated. |
| 571 | |
| 572 | 5.4. Finding why MSIs are disabled on a device |
| 573 | |
| 574 | From the above three sections, you can see that there are many reasons |
| 575 | why MSIs may not be enabled for a given device. Your first step should |
| 576 | be to examine your dmesg carefully to determine whether MSIs are enabled |
| 577 | for your machine. You should also check your .config to be sure you |
| 578 | have enabled CONFIG_PCI_MSI. |
| 579 | |
| 580 | Then, 'lspci -t' gives the list of bridges above a device. Reading |
| 581 | /sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1) |
| 582 | or disabled (0). If 0 is found in any of the msi_bus files belonging |
| 583 | to bridges between the PCI root and the device, MSIs are disabled. |
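The msi_bus check can be scripted. The sketch below is a small userspace helper that reads each msi_bus file on the path to a device and reports whether any of them contains 0. The caller is assumed to supply the list of files, for example /sys/bus/pci/devices/0000:00:0e.0/msi_bus for each bridge found with 'lspci -t'; both function names are invented, and unreadable files are skipped rather than treated as disabled.

```c
#include <stdio.h>

/* Returns the 0 or 1 stored in the file, or -1 if it cannot be read. */
static int toy_read_msi_bus(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Returns 0 if any bridge on the path has msi_bus set to 0, else 1.
 * 'paths' lists the msi_bus files of the bridges between the PCI root
 * and the device.
 */
static int toy_msi_allowed(const char *const *paths, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (toy_read_msi_bus(paths[i]) == 0)
			return 0;
	return 1;
}
```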
| 584 | |
| 585 | It is also worth checking the device driver to see whether it supports MSIs. |
| 586 | For example, it may contain calls to pci_enable_msi_range() or |
| 587 | pci_enable_msix_range(). |