On Wed, Nov 11, 2020 at 8:16 PM Nick Desaulniers
<ndesaulniers(a)google.com> wrote:
On Wed, Nov 11, 2020 at 3:57 AM Magnus Karlsson
<magnus.karlsson(a)gmail.com> wrote:
>
> On Wed, Nov 11, 2020 at 2:38 AM kernel test robot <lkp(a)intel.com> wrote:
> >
> > Hi Magnus,
> >
> > I love your patch! Perhaps something to improve:
> >
> > [auto build test WARNING on bpf-next/master]
> >
> > url: https://github.com/0day-ci/linux/commits/Magnus-Karlsson/xsk-i40e-Tx-perf...
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > config: powerpc64-randconfig-r025-20201110 (attached as .config)
> > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4d81c8adb6ed9840257f6cb6b93f60856d422a15)
^ Note: clang
> > reproduce (this is a W=1 build):
> > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # install powerpc64 cross compiling tool for clang build
> > # apt-get install binutils-powerpc64-linux-gnu
> > # https://github.com/0day-ci/linux/commit/b016bbeac6692a93e61b28efa430d6464...
> > git remote add linux-review https://github.com/0day-ci/linux
> > git fetch --no-tags linux-review Magnus-Karlsson/xsk-i40e-Tx-performance-improvements/20201110-190310
> > git checkout b016bbeac6692a93e61b28efa430d64645032b5e
> > # save the attached .config to linux build tree
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp(a)intel.com>
> >
> > All warnings (new ones prefixed by >>):
> >
> > >> drivers/net/ethernet/intel/i40e/i40e_xsk.c:417:13: warning: unknown pragma ignored [-Wunknown-pragmas]
> > #pragma GCC unroll 4
> > ^
> > 1 warning generated.
>
> And I was hoping that unknown pragmas would be ignored, but that will
> obviously not be the case with -Wunknown-pragmas added. The unrolling
> of this inner loop where the code spends most of its time gives me
> nearly 1 Mpps extra in performance which is substantial, so I would
> like to get this unrolled in some way, but without the warning. Need
> some advice please. Here are some options that come to mind:
>
> #1: Suppress unknown pragma warnings in this file only by adding
> CFLAGS_i40e_xsk.o += -Wno-unknown-pragmas (or whatever that option
> might be) in the Makefile
>
> #2: Force the compiler to loop-unroll the loop with for example a
> switch statement with four cases that all fall through. This will make
> the code less readable.
>
> #3: Manually loop-unroll the loop. This will make the code even less
> readable than #2.
#4 support both compilers. Note Clang's syntax is slightly different
here; it doesn't accept GCC specific pragmas, and uses a slightly
different form:
https://clang.llvm.org/docs/LanguageExtensions.html#loop-unrolling .
If you wrap that in a macro based on `#ifdef __clang__`, that should
do the trick.
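
For illustration, the wrapper suggested above could look roughly like the sketch below. The macro name `loop_unrolled_for` and the toy function `batch_bytes` are hypothetical stand-ins chosen for this example, not code from the patch; the `__GNUC__ >= 8` check assumes `#pragma GCC unroll` is only available from GCC 8 onward. Clang is tested first because it also defines `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>

#define PKTS_PER_BATCH 4

/* Hypothetical wrapper: the macro name is illustrative, not from the patch. */
#if defined(__clang__)
#define loop_unrolled_for _Pragma("clang loop unroll_count(4)") for
#elif defined(__GNUC__) && __GNUC__ >= 8
#define loop_unrolled_for _Pragma("GCC unroll 4") for
#else
#define loop_unrolled_for for	/* no unroll hint available */
#endif

/* Toy stand-in for the Tx batch loop: sums the lengths of one batch. */
static unsigned int batch_bytes(const unsigned int *len)
{
	unsigned int total = 0;
	unsigned int i;

	assert(len != NULL);

	/* Expands to the compiler-specific pragma followed by a plain for. */
	loop_unrolled_for (i = 0; i < PKTS_PER_BATCH; i++)
		total += len[i];

	return total;
}
```

With this shape the loop body stays readable and neither compiler sees a pragma it does not understand, so -Wunknown-pragmas should stay quiet.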
Yes, that did the trick. Tried it out with the compiler explorer at
https://godbolt.org/ and it compiles nicely even for clang-powerpc64.
Will spin a v3.
Thank you: Magnus
> >
> > I prefer #1 as I like to keep the code readable, but you might have
> > other better suggestions on how to tackle this.
> >
> > Thanks: Magnus
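
For comparison, option #2 (a switch statement whose four cases all fall through) might be sketched as follows. This is only one reading of that suggestion: `batch_bytes_switch` and its summing body are hypothetical stand-ins for the real descriptor setup, and the sketch assumes the order in which the four entries are handled does not matter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in: handles up to four entries with no loop at all. */
static unsigned int batch_bytes_switch(const unsigned int *len, unsigned int n)
{
	unsigned int total = 0;

	assert(len != NULL && n <= 4);

	switch (n) {
	case 4:
		total += len[3];
		/* fall through */
	case 3:
		total += len[2];
		/* fall through */
	case 2:
		total += len[1];
		/* fall through */
	case 1:
		total += len[0];
		break;
	default:
		break;
	}

	return total;
}
```

The body is repeated once per case, which is the readability cost mentioned for this option.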
> >
> > > vim +417 drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > >
> > > 408
> > > 409  static void i40e_xmit_pkt_batch(struct i40e_ring *xdp_ring, struct xdp_desc *desc,
> > > 410                                  unsigned int *total_bytes)
> > > 411  {
> > > 412          u16 ntu = xdp_ring->next_to_use;
> > > 413          struct i40e_tx_desc *tx_desc;
> > > 414          dma_addr_t dma;
> > > 415          u32 i;
> > > 416
> > > > 417  #pragma GCC unroll 4
> > > 418          for (i = 0; i < PKTS_PER_BATCH; i++) {
> > > 419                  dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc[i].addr);
> > > 420                  xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc[i].len);
> > > 421
> > > 422                  tx_desc = I40E_TX_DESC(xdp_ring, ntu++);
> > > 423                  tx_desc->buffer_addr = cpu_to_le64(dma);
> > > 424                  tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC |
> > > 425                                                            I40E_TX_DESC_CMD_EOP,
> > > 426                                                            0, desc[i].len, 0);
> > > 427
> > > 428                  *total_bytes += desc[i].len;
> > > 429          }
> > > 430
> > > 431          xdp_ring->next_to_use = ntu;
> > > 432  }
> > > 433
> > >
> > > ---
> > > 0-DAY CI Kernel Test Service, Intel Corporation
> > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> >
>
>
>
> --
> Thanks,
> ~Nick Desaulniers