Adding UNUSED elements to a C/C++ structure speeds up and slows down code execution

Keywords: execution, code, unused, elements, adding, structure      Updated: 2023-10-16

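I wrote the following struct for use in an Arduino software PWM library I am making, to PWM up to 20 pins at once (on an Uno) or 70 pins (on a Mega).

As written, the ISR portion of the code (eRCaGuy_SoftwarePWMupdate()), which processes an array of this struct, takes 133us to run. Strangely, though, if I uncomment the "byte flags1;" line (in the struct), the ISR now takes 158us to run, even though flags1 is not used anywhere. Then, if I also uncomment "byte flags2;" so that both flags are uncommented, the run time drops back to where it was before (133us).

Why is this happening!? And how do I fix it? (That is, for this particular function I want consistently fast code, not inexplicably fickle code.) Adding one byte slows the code down dramatically, yet adding two changes nothing at all.

I am trying to optimize the code (and I also need to add another feature that requires a single byte for flags), but I don't understand why adding one unused byte slows the code down by 25us while adding two unused bytes does not change the run time at all.

I need to understand this to make sure my optimizations work consistently.

In the .h file (my original struct, using a C-style typedef'ed struct):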

typedef struct softPWMpin //global struct
{
  //VOLATILE VARIABLES (WILL BE ACCESSED IN AND OUTSIDE OF ISRs)
  //for pin write access:
  volatile byte pinBitMask;
  volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
  //for PWM output:
  volatile unsigned long resolution;
  volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
                          //ex: if resolution is 256, topValue is 255
                          //if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
                          //if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
  //byte flags1;
  //byte flags2;
  //NON-VOLATILE VARIABLES (WILL ONLY BE ACCESSED INSIDE AN ISR, OR OUTSIDE AN ISR, BUT NOT BOTH)
  unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
} softPWMpin_t; 

In the .h file (new, using a C++-style struct... to see whether it makes any difference):

struct softPWMpin //global struct
{
  //VOLATILE VARIABLES (WILL BE ACCESSED IN AND OUTSIDE OF ISRs)
  //for pin write access:
  volatile byte pinBitMask;
  volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
  //for PWM output:
  volatile unsigned long resolution;
  volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
                          //ex: if resolution is 256, topValue is 255
                          //if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
                          //if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
  //byte flags1;
  //byte flags2;
  //NON-VOLATILE VARIABLES (WILL ONLY BE ACCESSED INSIDE AN ISR, OR OUTSIDE AN ISR, BUT NOT BOTH)
  unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
}; 

In the .cpp file (here I create the array of structs, and here is the update function, which is called at a fixed rate from an ISR via a timer interrupt):

//static softPWMpin_t PWMpins[MAX_NUMBER_SOFTWARE_PWM_PINS]; //C-style, old, MAX_NUMBER_SOFTWARE_PWM_PINS = 20; static to give it file scope only
static softPWMpin PWMpins[MAX_NUMBER_SOFTWARE_PWM_PINS]; //C++-style, old, MAX_NUMBER_SOFTWARE_PWM_PINS = 20; static to give it file scope only
//This function must be placed within an ISR, to be called at a fixed interval
void eRCaGuy_SoftwarePWMupdate()
{
  //Forced nonatomic block (ie: interrupts *enabled*)
  byte SREG_old = SREG; //[1 clock cycle]
  interrupts(); //[1 clock cycle] turn interrupts ON to allow *nested interrupts* (ex: handling of time-sensitive timing, such as reading incoming PWM signals or counting Timer2 overflows)
  {    
    //first, increment all counters of attached pins (ie: where the value != PIN_NOT_ATTACHED)
    //pinMapArray
    for (byte pin=0; pin<NUM_DIGITAL_PINS; pin++)
    {
      byte i = pinMapArray[pin]; //[2 clock cycles: 0.125us]; No need to turn off interrupts to read this volatile variable here since reading pinMapArray[pin] is an atomic operation (since it's a single byte)
      if (i != PIN_NOT_ATTACHED) //if the pin IS attached, increment counter and decide what to do with pin...
      {
        //Read volatile variables ONE time, all at once, to optimize code (volatile variables take more time to read [I know] since their values can't be recalled from registers [I believe]).
        noInterrupts(); //[1 clock cycle] turn off interrupts to read non-atomic volatile variables that could be updated simultaneously right now in another ISR, since nested interrupts are enabled here
        unsigned long resolution = PWMpins[i].resolution;
        unsigned long PWMvalue = PWMpins[i].PWMvalue;
        volatile byte* p_PORT_out = PWMpins[i].p_PORT_out; //[0.44us raw: 5 clock cycles, 0.3125us]
        interrupts(); //[1 clock cycle]
        //handle edge cases FIRST (PWMvalue==0 and PWMvalue==topValue), since if an edge case exists we should NOT do the main case handling below
        if (PWMvalue==0) //the PWM command is 0% duty cycle
        {
          fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,LOW); //write LOW [1.19us raw: 17 clock cycles, 1.0625us]
        }
        else if (PWMvalue==resolution-1) //the PWM command is 100% duty cycle
        {
          fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,HIGH); //write HIGH [0.88us raw; 12 clock cycles, 0.75us]
        }
        //THEN handle main cases (PWMvalue is > 0 and < topValue)
        else //(0% < PWM command < 100%)
        {
          PWMpins[i].counter++; //not volatile
          if (PWMpins[i].counter >= resolution)
          {
            PWMpins[i].counter = 0; //reset
            fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,HIGH);
          }
          else if (PWMpins[i].counter>=PWMvalue)
          {
            fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,LOW); //write LOW [1.18us raw: 17 clock cycles, 1.0625us]
          }
        }
      }
    }
  }
  SREG = SREG_old; //restore interrupt enable status
}

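Update (5/4/2015, 8:58 PM):

I tried changing the alignment via the aligned attribute. My compiler is GCC.

Here is how I modified the struct in the .h file to add the attribute (it is on the very last line). Note that I also changed the order of the struct members to go from largest to smallest: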

struct softPWMpin //C++ style
{
  volatile unsigned long resolution;
  volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
                          //ex: if resolution is 256, topValue is 255
                          //if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
                          //if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
  unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
  volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
  volatile byte pinBitMask;
  // byte flags1;
  // byte flags2;
} __attribute__ ((aligned));

Source: https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/type-attributes.html

Here is what I have tried so far:

__attribute__ ((aligned));  
__attribute__ ((aligned(1)));  
__attribute__ ((aligned(2)));  
__attribute__ ((aligned(4)));  
__attribute__ ((aligned(8)));  

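None of them seems to fix the problem I see when I add one flag byte. With the flag byte left commented out, the aligned(2) through aligned(8) variants make the run time longer than 133us, while aligned(1) makes no difference (the run time stays at 133us), which implies that this is what already happens when no attribute is added at all. In addition, even when I use the alignment options 2, 4, and 8, sizeof(PWMvalue) still returns the exact number of bytes in the struct, with no additional padding.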

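...still don't know what's going on...

Update, 11:02 PM:

(See the comments below.) The optimization level definitely has an effect. For example, when I changed the compiler optimization level from -Os to -O2, the base case stayed at 133us (as before), uncommenting flags1 gave me 120us (vs. 158us), and uncommenting both flags1 and flags2 gave me 132us (vs. 133us). This still does not answer my question, but I have at least learned that optimization levels exist and how to change them.

Summary of the paragraph above: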

Processing time (of the eRCaGuy_SoftwarePWMupdate() function)
Optimization   No flags     w/flags1     w/flags1+flags2
Os             133us        158us        133us
O2             132us        120us        132us
Memory Use (bytes: flash/global vars SRAM/sizeof(softPWMpin)/sizeof(PWMpins))
Optimization   No flags          w/flags1          w/flags1+flags2
Os             4020/591/15/300   3950/611/16/320   4020/631/17/340
O2             4154/591/15/300   4064/611/16/320   4154/631/17/340

Update (5/5/2015, 4:05 PM):

  • Just updated the tables above with more details.
  • Added resources below.

Resources:

Sources on GCC compiler optimization levels:
- https://gcc.gnu.org/onlinedocs/gcc/optimize-options.html
- https://gcc.gnu.org/onlinedocs/gnat_ugn/optimization-levels.html
- http://www.rapidtables.com/code/linux/gcc/gcc-o.htm

How to change compiler settings in the Arduino IDE:
- http://www.instructables.com/id/arduino-ide-16x-compiler-optimisation-faster-faster/

Information on structure packing:
- http://www.catb.org/esr/structure-packing/

Data alignment:
- http://www.songho.ca/misc/alignment/dataalign.html

Writing efficient C code for 8-bit Atmel AVR microcontrollers:
- AVR035 Efficient C Coding for AVR - doc1497 - http://www.atmel.com/images/doc1497.pdf
- AVR4027 Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers - doc8453 - http://www.atmel.com/images/images/doc8453.pdf

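Other information that may help in solving my problem:

For no flags (flags1 and flags2 commented out) and -Os optimization.
Build preferences (from the buildprefs.txt file, in the folder where Arduino puts the compiled code):
For me: "C:\Users\Gabriel\AppData\Local\Temp\build8427371380606368699.tmp"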

build.arch = AVR
build.board = AVR_UNO
build.core = arduino
build.core.path = C:\Program Files (x86)\Arduino\hardware\arduino\avr\cores\arduino
build.extra_flags = 
build.f_cpu = 16000000L
build.mcu = atmega328p
build.path = C:\Users\Gabriel\AppData\Local\Temp\build8427371380606368699.tmp
build.project_name = software_PWM_fade13_speed_test2.cpp
build.system.path = C:\Program Files (x86)\Arduino\hardware\arduino\avr\system
build.usb_flags = -DUSB_VID={build.vid} -DUSB_PID={build.pid} '-DUSB_MANUFACTURER={build.usb_manufacturer}' '-DUSB_PRODUCT={build.usb_product}'
build.usb_manufacturer = 
build.variant = standard
build.variant.path = C:\Program Files (x86)\Arduino\hardware\arduino\avr\variants\standard
build.verbose = true
build.warn_data_percentage = 75
compiler.S.extra_flags = 
compiler.S.flags = -c -g -x assembler-with-cpp
compiler.ar.cmd = avr-ar
compiler.ar.extra_flags = 
compiler.ar.flags = rcs
compiler.c.cmd = avr-gcc
compiler.c.elf.cmd = avr-gcc
compiler.c.elf.extra_flags = 
compiler.c.elf.flags = -w -Os -Wl,--gc-sections
compiler.c.extra_flags = 
compiler.c.flags = -c -g -Os -w -ffunction-sections -fdata-sections -MMD
compiler.cpp.cmd = avr-g++
compiler.cpp.extra_flags = 
compiler.cpp.flags = -c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD
compiler.elf2hex.cmd = avr-objcopy
compiler.elf2hex.extra_flags = 
compiler.elf2hex.flags = -O ihex -R .eeprom
compiler.ldflags = 
compiler.objcopy.cmd = avr-objcopy
compiler.objcopy.eep.extra_flags = 
compiler.objcopy.eep.flags = -O ihex -j .eeprom --set-section-flags=.eeprom=alloc,load --no-change-warnings --change-section-lma .eeprom=0
compiler.path = {runtime.ide.path}/hardware/tools/avr/bin/
compiler.size.cmd = avr-size

Some assembly (-Os, no flags):

00000328 <_Z25eRCaGuy_SoftwarePWMupdatev>:
 328:   8f 92           push    r8
 32a:   9f 92           push    r9
 32c:   af 92           push    r10
 32e:   bf 92           push    r11
 330:   cf 92           push    r12
 332:   df 92           push    r13
 334:   ef 92           push    r14
 336:   ff 92           push    r15
 338:   0f 93           push    r16
 33a:   1f 93           push    r17
 33c:   cf 93           push    r28
 33e:   df 93           push    r29
 340:   0f b7           in  r16, 0x3f   ; 63
 342:   78 94           sei
 344:   20 e0           ldi r18, 0x00   ; 0
 346:   30 e0           ldi r19, 0x00   ; 0
 348:   1f e0           ldi r17, 0x0F   ; 15
 34a:   f9 01           movw    r30, r18
 34c:   e8 5a           subi    r30, 0xA8   ; 168
 34e:   fe 4f           sbci    r31, 0xFE   ; 254
 350:   80 81           ld  r24, Z
 352:   8f 3f           cpi r24, 0xFF   ; 255
 354:   09 f4           brne    .+2         ; 0x358 <_Z25eRCaGuy_SoftwarePWMupdatev+0x30>
 356:   67 c0           rjmp    .+206       ; 0x426 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfe>
 358:   f8 94           cli
 35a:   90 e0           ldi r25, 0x00   ; 0
 35c:   18 9f           mul r17, r24
 35e:   f0 01           movw    r30, r0
 360:   19 9f           mul r17, r25
 362:   f0 0d           add r31, r0
 364:   11 24           eor r1, r1
 366:   e4 59           subi    r30, 0x94   ; 148
 368:   fe 4f           sbci    r31, 0xFE   ; 254
 36a:   c0 80           ld  r12, Z
 36c:   d1 80           ldd r13, Z+1    ; 0x01
 36e:   e2 80           ldd r14, Z+2    ; 0x02
 370:   f3 80           ldd r15, Z+3    ; 0x03
 372:   44 81           ldd r20, Z+4    ; 0x04
 374:   55 81           ldd r21, Z+5    ; 0x05
 376:   66 81           ldd r22, Z+6    ; 0x06
 378:   77 81           ldd r23, Z+7    ; 0x07
 37a:   04 84           ldd r0, Z+12    ; 0x0c
 37c:   f5 85           ldd r31, Z+13   ; 0x0d
 37e:   e0 2d           mov r30, r0
 380:   78 94           sei
 382:   41 15           cp  r20, r1
 384:   51 05           cpc r21, r1
 386:   61 05           cpc r22, r1
 388:   71 05           cpc r23, r1
 38a:   51 f4           brne    .+20        ; 0x3a0 <_Z25eRCaGuy_SoftwarePWMupdatev+0x78>
 38c:   18 9f           mul r17, r24
 38e:   d0 01           movw    r26, r0
 390:   19 9f           mul r17, r25
 392:   b0 0d           add r27, r0
 394:   11 24           eor r1, r1
 396:   a4 59           subi    r26, 0x94   ; 148
 398:   be 4f           sbci    r27, 0xFE   ; 254
 39a:   1e 96           adiw    r26, 0x0e   ; 14
 39c:   4c 91           ld  r20, X
 39e:   3b c0           rjmp    .+118       ; 0x416 <_Z25eRCaGuy_SoftwarePWMupdatev+0xee>
 3a0:   46 01           movw    r8, r12
 3a2:   57 01           movw    r10, r14
 3a4:   a1 e0           ldi r26, 0x01   ; 1
 3a6:   8a 1a           sub r8, r26
 3a8:   91 08           sbc r9, r1
 3aa:   a1 08           sbc r10, r1
 3ac:   b1 08           sbc r11, r1
 3ae:   48 15           cp  r20, r8
 3b0:   59 05           cpc r21, r9
 3b2:   6a 05           cpc r22, r10
 3b4:   7b 05           cpc r23, r11
 3b6:   51 f4           brne    .+20        ; 0x3cc <_Z25eRCaGuy_SoftwarePWMupdatev+0xa4>
 3b8:   18 9f           mul r17, r24
 3ba:   d0 01           movw    r26, r0
 3bc:   19 9f           mul r17, r25
 3be:   b0 0d           add r27, r0
 3c0:   11 24           eor r1, r1
 3c2:   a4 59           subi    r26, 0x94   ; 148
 3c4:   be 4f           sbci    r27, 0xFE   ; 254
 3c6:   1e 96           adiw    r26, 0x0e   ; 14
 3c8:   9c 91           ld  r25, X
 3ca:   1c c0           rjmp    .+56        ; 0x404 <_Z25eRCaGuy_SoftwarePWMupdatev+0xdc>
 3cc:   18 9f           mul r17, r24
 3ce:   e0 01           movw    r28, r0
 3d0:   19 9f           mul r17, r25
 3d2:   d0 0d           add r29, r0
 3d4:   11 24           eor r1, r1
 3d6:   c4 59           subi    r28, 0x94   ; 148
 3d8:   de 4f           sbci    r29, 0xFE   ; 254
 3da:   88 85           ldd r24, Y+8    ; 0x08
 3dc:   99 85           ldd r25, Y+9    ; 0x09
 3de:   aa 85           ldd r26, Y+10   ; 0x0a
 3e0:   bb 85           ldd r27, Y+11   ; 0x0b
 3e2:   01 96           adiw    r24, 0x01   ; 1
 3e4:   a1 1d           adc r26, r1
 3e6:   b1 1d           adc r27, r1
 3e8:   88 87           std Y+8, r24    ; 0x08
 3ea:   99 87           std Y+9, r25    ; 0x09
 3ec:   aa 87           std Y+10, r26   ; 0x0a
 3ee:   bb 87           std Y+11, r27   ; 0x0b
 3f0:   8c 15           cp  r24, r12
 3f2:   9d 05           cpc r25, r13
 3f4:   ae 05           cpc r26, r14
 3f6:   bf 05           cpc r27, r15
 3f8:   40 f0           brcs    .+16        ; 0x40a <_Z25eRCaGuy_SoftwarePWMupdatev+0xe2>
 3fa:   18 86           std Y+8, r1 ; 0x08
 3fc:   19 86           std Y+9, r1 ; 0x09
 3fe:   1a 86           std Y+10, r1    ; 0x0a
 400:   1b 86           std Y+11, r1    ; 0x0b
 402:   9e 85           ldd r25, Y+14   ; 0x0e
 404:   80 81           ld  r24, Z
 406:   89 2b           or  r24, r25
 408:   0d c0           rjmp    .+26        ; 0x424 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfc>
 40a:   84 17           cp  r24, r20
 40c:   95 07           cpc r25, r21
 40e:   a6 07           cpc r26, r22
 410:   b7 07           cpc r27, r23
 412:   48 f0           brcs    .+18        ; 0x426 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfe>
 414:   4e 85           ldd r20, Y+14   ; 0x0e
 416:   80 81           ld  r24, Z
 418:   90 e0           ldi r25, 0x00   ; 0
 41a:   50 e0           ldi r21, 0x00   ; 0
 41c:   40 95           com r20
 41e:   50 95           com r21
 420:   84 23           and r24, r20
 422:   95 23           and r25, r21
 424:   80 83           st  Z, r24
 426:   2f 5f           subi    r18, 0xFF   ; 255
 428:   3f 4f           sbci    r19, 0xFF   ; 255
 42a:   24 31           cpi r18, 0x14   ; 20
 42c:   31 05           cpc r19, r1
 42e:   09 f0           breq    .+2         ; 0x432 <_Z25eRCaGuy_SoftwarePWMupdatev+0x10a>
 430:   8c cf           rjmp    .-232       ; 0x34a <_Z25eRCaGuy_SoftwarePWMupdatev+0x22>
 432:   0f bf           out 0x3f, r16   ; 63
 434:   df 91           pop r29
 436:   cf 91           pop r28
 438:   1f 91           pop r17
 43a:   0f 91           pop r16
 43c:   ff 90           pop r15
 43e:   ef 90           pop r14
 440:   df 90           pop r13
 442:   cf 90           pop r12
 444:   bf 90           pop r11
 446:   af 90           pop r10
 448:   9f 90           pop r9
 44a:   8f 90           pop r8
 44c:   08 95           ret

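This is almost certainly an alignment issue. Judging by the size of the struct, your compiler seems to be packing it automatically.

The LDR instruction loads a 4-byte value into a register and operates on 4-byte boundaries. If it needs to load from a memory address that is not on a 4-byte boundary, it actually performs two loads and combines them to obtain the value at that address.

For example, if you want to load a 4-byte value at 0x02, the processor will perform two loads, because 0x02 does not fall on a 4-byte boundary.

Suppose we have the following memory starting at address 0x00, and we want to load the 4-byte value at 0x02 into register r0: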

Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|
Value   | 12 | 34 | 56 | 78 | 90 | AB | CD | EF | 12 |
------------------------------------------------------
r0: 00 00 00 00

It will first load the 4 bytes at 0x00, since that is the 4-byte segment containing 0x02, and store the 2 bytes at 0x02 and 0x03 in the register:

Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value   | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 1  |           **   ** |
------------------------------------------------------
r0: 56 78 00 00

It will then load the 4 bytes at 0x04, which is the next 4-byte segment, and store the 2 bytes at 0x04 and 0x05 in the register:

Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value   | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 2                      | **   **           |
------------------------------------------------------
r0: 56 78 90 AB

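As you can see, every time you want to access the value at 0x02, the processor actually has to split your instruction into two operations. If you instead want to access the value at 0x04, the processor can do it in a single operation: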

Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value   | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 1                      | **   **   **   ** |
------------------------------------------------------
r0: 90 AB CD EF

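In your example, with both flags1 and flags2 commented out, your struct has a size of 15. This means that every second struct in your array will start at an odd address, so none of its pointer or long members will be properly aligned.

By introducing one of the flags variables, your struct's size increases to 16, which is a multiple of 4. This ensures that all of your structs start on a 4-byte boundary, so you likely won't run into alignment issues.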


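There may be a compiler flag that can help you with this, but in general it is good to be aware of how your structs are laid out. Alignment is a tricky problem, and only compilers that conform to the current standards have well-defined behavior.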