Skip to content
  1. Jan 17, 2018
    • Nicholas Piggin's avatar
      powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Nicholas Piggin authored
      
      
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed, is waking from deep idle
      states. The flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      because in the first case, the TLB should not acquire new corrupted
      TLB entries as part of sleep/wake (though they may be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account, it is
        always done as a hash flushn For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
      and move it later in the boot process after, the MMU registers are set
      up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle states.
      This was not be done as a separate step because booting secondaries
      uses the same cpu_restore as idle restore, which needs the TLB flush.
      
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      d4748276
    • Nicholas Piggin's avatar
      powerpc: System reset avoid interleaving oops using die synchronisation · 4552d128
      Nicholas Piggin authored
      The die() oops path contains a serializing lock to prevent oops
      messages from being interleaved. In the case of a system reset
      initiated oops (e.g., qemu nmi command), __die was being called
      which lacks that synchronisation and oops reports could be
      interleaved across CPUs.
      
      A recent patch 4388c9b3 ("powerpc: Do not send system reset
      request through the oops path") changed this to __die to avoid
      the debugger() call, but there is no real harm to calling it twice
      if the first time fell through. So go back to using die() here.
      This was observed to fix the problem.
      
      Fixes: 4388c9b3
      
       ("powerpc: Do not send system reset request through the oops path")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      4552d128
  2. Jan 16, 2018
  3. Jan 03, 2018
  4. Dec 22, 2017
  5. Dec 20, 2017
    • Ram Pai's avatar
      powerpc: capture the PTE format changes in the dump pte report · 7e436355
      Ram Pai authored
      
      
      The H_PAGE_F_SECOND,H_PAGE_F_GIX are not in the 64K main-PTE.
      capture these changes in the dump pte report.
      
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      7e436355
    • Ram Pai's avatar
      powerpc: use helper functions to get and set hash slots · a8548686
      Ram Pai authored
      
      
      replace redundant code in __hash_page_4K() and flush_hash_page()
      with helper functions pte_get_hash_gslot() and pte_set_hidx()
      
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a8548686
    • Ram Pai's avatar
      powerpc: Swizzle around 4K PTE bits to free up bit 5 and bit 6 · 273b4936
      Ram Pai authored
      
      
      We need PTE bits 3 ,4, 5, 6 and 57 to support protection-keys,
      because these are the bits we want to consolidate on across all
      configuration to support protection keys.
      
      Bit 3,4,5 and 6 are currently used on 4K-pte kernels. But bit 9
      and 10 are available. Hence we use the two available bits and
      free up bit 5 and 6. We will still not be able to free up bit 3
      and 4. In the absence of any other free bits, we will have to
      stay satisfied with what we have :-(. This means we will not
      be able to support 32 protection keys, but only 8. The bit
      numbers are big-endian as defined in the ISA3.0
      
      This patch does the following change to 4K PTE.
      
      H_PAGE_F_SECOND (S) which occupied bit 4 moves to bit 7.
      H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also moves
      to bit 8,9, 10 respectively.
      H_PAGE_HASHPTE (H) which occupied bit 8 moves to bit 4.
      
      Before the patch, the 4k PTE format was as follows
      
       0 1 2 3 4  5  6  7  8 9 10....................57.....63
       : : : : :  :  :  :  : : :                      :     :
       v v v v v  v  v  v  v v v                      v     v
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x|B|S |G |I |X |H| | |x|x|................| |x|x|x|
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      
      After the patch, the 4k PTE format is as follows
      
       0 1 2 3 4  5  6  7  8 9 10....................57.....63
       : : : : :  :  :  :  : : :                      :     :
       v v v v v  v  v  v  v v v                      v     v
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x|B|H |  |  |S |G|I|X|x|x|................| |.|.|.|
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      
      The patch has no code changes; just swizzles around bits.
      
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      273b4936
    • Ram Pai's avatar
      powerpc: shifted-by-one hidx value · 7b84947c
      Ram Pai authored
      
      
      0xf is considered invalid hidx value. It indicates absence of a backing
      HPTE. A PTE is initialized to 0xf either
      a) when it is new it is newly allocated to hold 4k-backing-HPTE
      	or
      b) Any time it gets demoted to a 4k-backing-HPTE
      
      This patch shifts the representation by one-modulo-0xf; i.e hidx 0 is
      represented as 1, 1 as 2,... , and 0xf as 0. This convention lets us
      initialize the secondary-part of the PTE to all zeroes. PTEs are anyway
      zero'd when allocated. We do not have to zero them again; thus saving on
      the initialization.
      
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      7b84947c
    • Ram Pai's avatar
      powerpc: Free up four 64K PTE bits in 64K backed HPTE pages · bf9a95f9
      Ram Pai authored
      
      
      Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
      in the 64K backed HPTE pages. This along with the earlier
      patch will entirely free up the four bits from 64K PTE.
      The bit numbers are big-endian as defined in the ISA3.0
      
      This patch does the following change to 64K PTE backed
      by 64K HPTE.
      
      H_PAGE_F_SECOND (S) which occupied bit 4 moves to the
      	second part of the pte to bit 60.
      H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also
      	moves to the second part of the pte to bit 61,
       	62, 63, 64 respectively
      
      since bit 7 is now freed up, we move H_PAGE_BUSY (B) from
      bit 9 to bit 7.
      
      The second part of the PTE will hold
      (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
      NOTE: None of the bits in the secondary PTE were not used
      by 64k-HPTE backed PTE.
      
      Before the patch, the 64K HPTE backed 64k PTE format was
      as follows
      
       0 1 2 3 4  5  6  7  8 9 10...........................63
       : : : : :  :  :  :  : : :                            :
       v v v v v  v  v  v  v v v                            v
      
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x| |S |G |I |X |x|B| |x|x|................|x|x|x|x| <- primary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      | | | | |  |  |  |  | | | | |..................| | | | | <- secondary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'
      
      After the patch, the 64k HPTE backed 64k PTE format is
      as follows
      
       0 1 2 3 4  5  6  7  8 9 10...........................63
       : : : : :  :  :  :  : : :                            :
       v v v v v  v  v  v  v v v                            v
      
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x| |  |  |  |B |x| | |x|x|................|.|.|.|.| <- primary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      | | | | |  |  |  |  | | | | |..................|S|G|I|X| <- secondary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'
      
      The above PTE changes is applicable to hugetlbpages aswell.
      
      The patch does the following code changes:
      
      a) moves the H_PAGE_F_SECOND and H_PAGE_F_GIX to 4k PTE
      	header since it is no more needed b the 64k PTEs.
      b) abstracts out __real_pte() and __rpte_to_hidx() so the
      	caller need not know the bit location of the slot.
      c) moves the slot bits to the secondary pte.
      
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      bf9a95f9
    • Ram Pai's avatar
      powerpc: Free up four 64K PTE bits in 4K backed HPTE pages · 9d2edb18
      Ram Pai authored
      
      
      Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
      in the 4K backed HPTE pages.These bits continue to be used
      for 64K backed HPTE pages in this patch, but will be freed
      up in the next patch. The bit numbers are big-endian as
      defined in the ISA3.0
      
      The patch does the following change to the 4k HTPE backed
      64K PTE's format.
      
      H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
      		below)
      V0 which occupied bit 4 is not used anymore.
      V1 which occupied bit 5 is not used anymore.
      V2 which occupied bit 6 is not used anymore.
      V3 which occupied bit 7 is not used anymore.
      
      Before the patch, the 4k backed 64k PTE format was as follows
      
       0 1 2 3 4  5  6  7  8 9 10...........................63
       : : : : :  :  :  :  : : :                            :
       v v v v v  v  v  v  v v v                            v
      
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x|B|V0|V1|V2|V3|x| | |x|x|................|x|x|x|x| <- primary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      |S|G|I|X|S |G |I |X |S|G|I|X|..................|S|G|I|X| <- secondary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'
      
      After the patch, the 4k backed 64k PTE format is as follows
      
       0 1 2 3 4  5  6  7  8 9 10...........................63
       : : : : :  :  :  :  : : :                            :
       v v v v v  v  v  v  v v v                            v
      
      ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
      |x|x|x| |  |  |  |  |x|B| |x|x|................|.|.|.|.| <- primary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
      |S|G|I|X|S |G |I |X |S|G|I|X|..................|S|G|I|X| <- secondary pte
      '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'
      
      the four bits S,G,I,X (one quadruplet per 4k HPTE) that
      cache the hash-bucket slot value, is initialized to
      1,1,1,1 indicating -- an invalid slot. If a HPTE gets
      cached in a 1111 slot(i.e 7th slot of secondary hash
      bucket), it is released immediately. In other words,
      even though 1111 is a valid slot value in the hash
      bucket, we consider it invalid and release the slot and
      the HPTE. This gives us the opportunity to determine
      the validity of S,G,I,X bits based on its contents and
      not on any of the bits V0,V1,V2 or V3 in the primary PTE
      
      When we release a HPTE cached in the 1111 slot
      we also release a legitimate slot in the primary
      hash bucket and unmap its corresponding HPTE. This
      is to ensure that we do get a HPTE cached in a slot
      of the primary hash bucket, the next time we retry.
      
      Though treating 1111 slot as invalid, reduces the
      number of available slots in the hash bucket and may
      have an effect on the performance, the probabilty of
      hitting a 1111 slot is extermely low.
      
      Compared to the current scheme, the above scheme
      reduces the number of false hash table updates
      significantly and has the added advantage of releasing
      four valuable PTE bits for other purpose.
      
      NOTE:even though bits 3, 4, 5, 6, 7 are not used when
      the 64K PTE is backed by 4k HPTE, they continue to be
      used if the PTE gets backed by 64k HPTE. The next
      patch will decouple that aswell, and truely release the
      bits.
      
      This idea was jointly developed by Paul Mackerras,
      Aneesh, Michael Ellermen and myself.
      
      4K PTE format remains unchanged currently.
      
      The patch does the following code changes
      a) PTE flags are split between 64k and 4k header files.
      b) __hash_page_4K() is reimplemented to reflect the
       above logic.
      
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9d2edb18