将未使用的元素添加到C/C 结构加快并减慢代码执行
Adding UNUSED elements to C/C++ structure speeds up and slows down code execution
我编写了以下结构,用于在我制作的arduino软件pwm库中使用,以一次(在uno上)或70个引脚或70个引脚(在A上)巨型)。
如书面,代码的ISR部分(ercaguy_softwarepwmupdate()),处理此结构的数组,需要 133US才能运行。但是,很奇怪,如果我删除"字节flags1"行;(在结构中),尽管FLAGS1尚未在任何地方使用,但ISR现在需要158US 运行。然后,如果我不注册"字节旗2";因此,现在都没有注册两个标志,运行时会回到以前的位置(133US)。
为什么这会发生!?我该如何解决?(即:对于此特定功能,我想确保始终如一的快速代码,而不是莫名其妙的代码)。添加一个字节会大大减慢代码,但添加两个根本没有任何更改。
我正在尝试优化代码(我也需要添加另一个功能,需要一个字节来for flags),但是我不明白为什么添加一个未使用的字节将代码慢于25US,而是添加了两个未使用的代码字节根本不会更改运行时间。
我需要理解这一点,以确保我的优化始终如一。
在.h文件(我的原始结构,使用c-style typedef'ed struct):
typedef struct softPWMpin //global struct
{
//VOLATILE VARIBLES (WILL BE ACCESSED IN AND OUTSIDE OF ISRs)
//for pin write access:
volatile byte pinBitMask;
volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
//for PWM output:
volatile unsigned long resolution;
volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
//ex: if resolution is 256, topValue is 255
//if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
//if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
//byte flags1;
//byte flags2;
//NON-VOLATILE VARIABLES (WILL ONLY BE ACCESSED INSIDE AN ISR, OR OUTSIDE AN ISR, BUT NOT BOTH)
unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
} softPWMpin_t;
在.h文件(新的,使用C 样式结构....查看是否有任何区别。)
struct softPWMpin //global struct
{
//VOLATILE VARIBLES (WILL BE ACCESSED IN AND OUTSIDE OF ISRs)
//for pin write access:
volatile byte pinBitMask;
volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
//for PWM output:
volatile unsigned long resolution;
volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
//ex: if resolution is 256, topValue is 255
//if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
//if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
//byte flags1;
//byte flags2;
//NON-VOLATILE VARIABLES (WILL ONLY BE ACCESSED INSIDE AN ISR, OR OUTSIDE AN ISR, BUT NOT BOTH)
unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
};
在.cpp文件中(这里我正在创建structs的数组,这是以ISR中的固定速率通过计时器中断以固定速率调用的更新函数):
//static softPWMpin_t PWMpins[MAX_NUMBER_SOFTWARE_PWM_PINS]; //C-style, old, MAX_NUMBER_SOFTWARE_PWM_PINS = 20; static to give it file scope only
static softPWMpin PWMpins[MAX_NUMBER_SOFTWARE_PWM_PINS]; //C++-style, old, MAX_NUMBER_SOFTWARE_PWM_PINS = 20; static to give it file scope only
//This function must be placed within an ISR, to be called at a fixed interval
void eRCaGuy_SoftwarePWMupdate()
{
//Forced nonatomic block (ie: interrupts *enabled*)
byte SREG_old = SREG; //[1 clock cycle]
interrupts(); //[1 clock cycle] turn interrupts ON to allow *nested interrupts* (ex: handling of time-sensitive timing, such as reading incoming PWM signals or counting Timer2 overflows)
{
//first, increment all counters of attached pins (ie: where the value != PIN_NOT_ATTACHED)
//pinMapArray
for (byte pin=0; pin<NUM_DIGITAL_PINS; pin++)
{
byte i = pinMapArray[pin]; //[2 clock cycles: 0.125us]; No need to turn off interrupts to read this volatile variable here since reading pinMapArray[pin] is an atomic operation (since it's a single byte)
if (i != PIN_NOT_ATTACHED) //if the pin IS attached, increment counter and decide what to do with pin...
{
//Read volatile variables ONE time, all at once, to optimize code (volatile variables take more time to read [I know] since their values can't be recalled from registers [I believe]).
noInterrupts(); //[1 clock cycle] turn off interrupts to read non-atomic volatile variables that could be updated simultaneously right now in another ISR, since nested interrupts are enabled here
unsigned long resolution = PWMpins[i].resolution;
unsigned long PWMvalue = PWMpins[i].PWMvalue;
volatile byte* p_PORT_out = PWMpins[i].p_PORT_out; //[0.44us raw: 5 clock cycles, 0.3125us]
interrupts(); //[1 clock cycle]
//handle edge cases FIRST (PWMvalue==0 and PMWvalue==topValue), since if an edge case exists we should NOT do the main case handling below
if (PWMvalue==0) //the PWM command is 0% duty cycle
{
fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,LOW); //write LOW [1.19us raw: 17 clock cycles, 1.0625us]
}
else if (PWMvalue==resolution-1) //the PWM command is 100% duty cycle
{
fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,HIGH); //write HIGH [0.88us raw; 12 clock cycles, 0.75us]
}
//THEN handle main cases (PWMvalue is > 0 and < topValue)
else //(0% < PWM command < 100%)
{
PWMpins[i].counter++; //not volatile
if (PWMpins[i].counter >= resolution)
{
PWMpins[i].counter = 0; //reset
fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,HIGH);
}
else if (PWMpins[i].counter>=PWMvalue)
{
fastDigitalWrite(p_PORT_out,PWMpins[i].pinBitMask,LOW); //write LOW [1.18us raw: 17 clock cycles, 1.0625us]
}
}
}
}
}
SREG = SREG_old; //restore interrupt enable status
}
更新(5/4/2015,8:58 PM):
我尝试通过对齐属性更改对齐方式。我的编译器是GCC。
这是我修改.h文件中的结构以添加属性(在最后一行中)。请注意,我还将结构成员的顺序更改为最大:
struct softPWMpin //C++ style
{
volatile unsigned long resolution;
volatile unsigned long PWMvalue; //Note: duty cycle = PWMvalue/(resolution - 1) = PWMvalue/topValue;
//ex: if resolution is 256, topValue is 255
//if PWMvalue = 255, duty_cycle = PWMvalue/topValue = 255/255 = 1 = 100%
//if PWMvalue = 50, duty_cycle = PWMvalue/topValue = 50/255 = 0.196 = 19.6%
unsigned long counter; //incremented each time update() is called; goes back to zero after reaching topValue; does NOT need to be volatile, since only the update function updates this (it is read-to or written from nowhere else)
volatile byte* volatile p_PORT_out; //pointer to port output register; NB: the 1st "volatile" says the port itself (1 byte) is volatile, the 2nd "volatile" says the *pointer* itself (2 bytes, pointing to the port) is volatile.
volatile byte pinBitMask;
// byte flags1;
// byte flags2;
} __attribute__ ((aligned));
来源:https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/type-attributes.html
这是我迄今为止尝试过的结果:
__attribute__ ((aligned));
__attribute__ ((aligned(1)));
__attribute__ ((aligned(2)));
__attribute__ ((aligned(4)));
__attribute__ ((aligned(8)));
当我添加一个标志字节时,我似乎都没有解决我看到的问题。当离开标志字节评论2-8时,运行时的时间比133US更长,而Align 1没有区别(运行时停留133US),这意味着它已经发生了,属性未添加根本。此外,即使我使用2、4、8的对齐选项,sizeof(pwmvalue)函数仍然返回结构中的确切字节数,而没有其他填充。
...仍然不知道发生了什么...
更新,11:02 pm:
(请参见下面的评论)优化水平肯定会产生效果。例如,当我将编译器优化级别从-OS更改为-O2时,基本情况保持在133us(如前所述),Uncommunting Flags1给了我120us(vs 158us),而Uncomementing Flags1和Flags2同时给了我132us(vs 133us))。这仍然没有回答我的问题,但是我至少了解到存在优化水平以及如何更改它们。
上面段落的摘要:
Processing time of (of eRCaGuy_SoftwarePWMupdate() function)
Optimization No flags w/flags1 w/flags1+flags2
Os 133us 158us 133us
O2 132us 120us 132us
Memory Use (bytes: flash/global vars SRAM/sizeof(softPWMpin)/sizeof(PWMpins))
Optimization No flags w/flags1 w/flags1+flags2
Os 4020/591/15/300 3950/611/16/320 4020/631/17/340
O2 4154/591/15/300 4064/611/16/320 4154/631/17/340
更新(5/5/2015,4:05 PM):
- 刚刚更新了上面的表,并提供了更多详细信息。
- 在下面添加了资源。
资源:
GCC编译器优化级别的来源:
-https://gcc.gnu.org/onlinedocs/gcc/optimize-options.html
-https://gcc.gnu.org/onlinedocs/gnat_ugn/optimization-levels.html
-http://www.rapidtables.com/code/linux/gcc/gcc-o.htm
如何在Arduino IDE中更改编译器设置:
-http://www.instructables.com/id/arduino-ide-16x-compiler-optimisation-faster-faster/
结构包装上的信息:
-http://www.catb.org/esr/structure-packing/
数据对齐:
-http://www.songho.ca/misc/alignment/dataalign.html
编写8位ATMEL AVR Microcontroller
的有效C代码 -AVR035 AVR的高效C编码-DOC1497 -http://www.atmel.com/images/doc1497.pdf
-AVR4027提示和技巧,以优化8位AVR微控制器的C代码-DOC8453 -http://www.atmel.com/images/images/doc8453.pdf
其他可能对帮助我解决问题的信息:
对于没有标志(flags1和flags2评论)和操作系统优化
构建首选项(来自buildprefs.txt文件,Arduino吐出编译的代码):
对我来说:" c: users gabriel appdata local temp build8427371380606368699.tmp"
build.arch = AVR
build.board = AVR_UNO
build.core = arduino
build.core.path = C:Program Files (x86)Arduinohardwarearduinoavrcoresarduino
build.extra_flags =
build.f_cpu = 16000000L
build.mcu = atmega328p
build.path = C:UsersGabrielAppDataLocalTempbuild8427371380606368699.tmp
build.project_name = software_PWM_fade13_speed_test2.cpp
build.system.path = C:Program Files (x86)Arduinohardwarearduinoavrsystem
build.usb_flags = -DUSB_VID={build.vid} -DUSB_PID={build.pid} '-DUSB_MANUFACTURER={build.usb_manufacturer}' '-DUSB_PRODUCT={build.usb_product}'
build.usb_manufacturer =
build.variant = standard
build.variant.path = C:Program Files (x86)Arduinohardwarearduinoavrvariantsstandard
build.verbose = true
build.warn_data_percentage = 75
compiler.S.extra_flags =
compiler.S.flags = -c -g -x assembler-with-cpp
compiler.ar.cmd = avr-ar
compiler.ar.extra_flags =
compiler.ar.flags = rcs
compiler.c.cmd = avr-gcc
compiler.c.elf.cmd = avr-gcc
compiler.c.elf.extra_flags =
compiler.c.elf.flags = -w -Os -Wl,--gc-sections
compiler.c.extra_flags =
compiler.c.flags = -c -g -Os -w -ffunction-sections -fdata-sections -MMD
compiler.cpp.cmd = avr-g++
compiler.cpp.extra_flags =
compiler.cpp.flags = -c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD
compiler.elf2hex.cmd = avr-objcopy
compiler.elf2hex.extra_flags =
compiler.elf2hex.flags = -O ihex -R .eeprom
compiler.ldflags =
compiler.objcopy.cmd = avr-objcopy
compiler.objcopy.eep.extra_flags =
compiler.objcopy.eep.flags = -O ihex -j .eeprom --set-section-flags=.eeprom=alloc,load --no-change-warnings --change-section-lma .eeprom=0
compiler.path = {runtime.ide.path}/hardware/tools/avr/bin/
compiler.size.cmd = avr-size
一些大会:(OS,没有标志):
00000328 <_Z25eRCaGuy_SoftwarePWMupdatev>:
328: 8f 92 push r8
32a: 9f 92 push r9
32c: af 92 push r10
32e: bf 92 push r11
330: cf 92 push r12
332: df 92 push r13
334: ef 92 push r14
336: ff 92 push r15
338: 0f 93 push r16
33a: 1f 93 push r17
33c: cf 93 push r28
33e: df 93 push r29
340: 0f b7 in r16, 0x3f ; 63
342: 78 94 sei
344: 20 e0 ldi r18, 0x00 ; 0
346: 30 e0 ldi r19, 0x00 ; 0
348: 1f e0 ldi r17, 0x0F ; 15
34a: f9 01 movw r30, r18
34c: e8 5a subi r30, 0xA8 ; 168
34e: fe 4f sbci r31, 0xFE ; 254
350: 80 81 ld r24, Z
352: 8f 3f cpi r24, 0xFF ; 255
354: 09 f4 brne .+2 ; 0x358 <_Z25eRCaGuy_SoftwarePWMupdatev+0x30>
356: 67 c0 rjmp .+206 ; 0x426 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfe>
358: f8 94 cli
35a: 90 e0 ldi r25, 0x00 ; 0
35c: 18 9f mul r17, r24
35e: f0 01 movw r30, r0
360: 19 9f mul r17, r25
362: f0 0d add r31, r0
364: 11 24 eor r1, r1
366: e4 59 subi r30, 0x94 ; 148
368: fe 4f sbci r31, 0xFE ; 254
36a: c0 80 ld r12, Z
36c: d1 80 ldd r13, Z+1 ; 0x01
36e: e2 80 ldd r14, Z+2 ; 0x02
370: f3 80 ldd r15, Z+3 ; 0x03
372: 44 81 ldd r20, Z+4 ; 0x04
374: 55 81 ldd r21, Z+5 ; 0x05
376: 66 81 ldd r22, Z+6 ; 0x06
378: 77 81 ldd r23, Z+7 ; 0x07
37a: 04 84 ldd r0, Z+12 ; 0x0c
37c: f5 85 ldd r31, Z+13 ; 0x0d
37e: e0 2d mov r30, r0
380: 78 94 sei
382: 41 15 cp r20, r1
384: 51 05 cpc r21, r1
386: 61 05 cpc r22, r1
388: 71 05 cpc r23, r1
38a: 51 f4 brne .+20 ; 0x3a0 <_Z25eRCaGuy_SoftwarePWMupdatev+0x78>
38c: 18 9f mul r17, r24
38e: d0 01 movw r26, r0
390: 19 9f mul r17, r25
392: b0 0d add r27, r0
394: 11 24 eor r1, r1
396: a4 59 subi r26, 0x94 ; 148
398: be 4f sbci r27, 0xFE ; 254
39a: 1e 96 adiw r26, 0x0e ; 14
39c: 4c 91 ld r20, X
39e: 3b c0 rjmp .+118 ; 0x416 <_Z25eRCaGuy_SoftwarePWMupdatev+0xee>
3a0: 46 01 movw r8, r12
3a2: 57 01 movw r10, r14
3a4: a1 e0 ldi r26, 0x01 ; 1
3a6: 8a 1a sub r8, r26
3a8: 91 08 sbc r9, r1
3aa: a1 08 sbc r10, r1
3ac: b1 08 sbc r11, r1
3ae: 48 15 cp r20, r8
3b0: 59 05 cpc r21, r9
3b2: 6a 05 cpc r22, r10
3b4: 7b 05 cpc r23, r11
3b6: 51 f4 brne .+20 ; 0x3cc <_Z25eRCaGuy_SoftwarePWMupdatev+0xa4>
3b8: 18 9f mul r17, r24
3ba: d0 01 movw r26, r0
3bc: 19 9f mul r17, r25
3be: b0 0d add r27, r0
3c0: 11 24 eor r1, r1
3c2: a4 59 subi r26, 0x94 ; 148
3c4: be 4f sbci r27, 0xFE ; 254
3c6: 1e 96 adiw r26, 0x0e ; 14
3c8: 9c 91 ld r25, X
3ca: 1c c0 rjmp .+56 ; 0x404 <_Z25eRCaGuy_SoftwarePWMupdatev+0xdc>
3cc: 18 9f mul r17, r24
3ce: e0 01 movw r28, r0
3d0: 19 9f mul r17, r25
3d2: d0 0d add r29, r0
3d4: 11 24 eor r1, r1
3d6: c4 59 subi r28, 0x94 ; 148
3d8: de 4f sbci r29, 0xFE ; 254
3da: 88 85 ldd r24, Y+8 ; 0x08
3dc: 99 85 ldd r25, Y+9 ; 0x09
3de: aa 85 ldd r26, Y+10 ; 0x0a
3e0: bb 85 ldd r27, Y+11 ; 0x0b
3e2: 01 96 adiw r24, 0x01 ; 1
3e4: a1 1d adc r26, r1
3e6: b1 1d adc r27, r1
3e8: 88 87 std Y+8, r24 ; 0x08
3ea: 99 87 std Y+9, r25 ; 0x09
3ec: aa 87 std Y+10, r26 ; 0x0a
3ee: bb 87 std Y+11, r27 ; 0x0b
3f0: 8c 15 cp r24, r12
3f2: 9d 05 cpc r25, r13
3f4: ae 05 cpc r26, r14
3f6: bf 05 cpc r27, r15
3f8: 40 f0 brcs .+16 ; 0x40a <_Z25eRCaGuy_SoftwarePWMupdatev+0xe2>
3fa: 18 86 std Y+8, r1 ; 0x08
3fc: 19 86 std Y+9, r1 ; 0x09
3fe: 1a 86 std Y+10, r1 ; 0x0a
400: 1b 86 std Y+11, r1 ; 0x0b
402: 9e 85 ldd r25, Y+14 ; 0x0e
404: 80 81 ld r24, Z
406: 89 2b or r24, r25
408: 0d c0 rjmp .+26 ; 0x424 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfc>
40a: 84 17 cp r24, r20
40c: 95 07 cpc r25, r21
40e: a6 07 cpc r26, r22
410: b7 07 cpc r27, r23
412: 48 f0 brcs .+18 ; 0x426 <_Z25eRCaGuy_SoftwarePWMupdatev+0xfe>
414: 4e 85 ldd r20, Y+14 ; 0x0e
416: 80 81 ld r24, Z
418: 90 e0 ldi r25, 0x00 ; 0
41a: 50 e0 ldi r21, 0x00 ; 0
41c: 40 95 com r20
41e: 50 95 com r21
420: 84 23 and r24, r20
422: 95 23 and r25, r21
424: 80 83 st Z, r24
426: 2f 5f subi r18, 0xFF ; 255
428: 3f 4f sbci r19, 0xFF ; 255
42a: 24 31 cpi r18, 0x14 ; 20
42c: 31 05 cpc r19, r1
42e: 09 f0 breq .+2 ; 0x432 <_Z25eRCaGuy_SoftwarePWMupdatev+0x10a>
430: 8c cf rjmp .-232 ; 0x34a <_Z25eRCaGuy_SoftwarePWMupdatev+0x22>
432: 0f bf out 0x3f, r16 ; 63
434: df 91 pop r29
436: cf 91 pop r28
438: 1f 91 pop r17
43a: 0f 91 pop r16
43c: ff 90 pop r15
43e: ef 90 pop r14
440: df 90 pop r13
442: cf 90 pop r12
444: bf 90 pop r11
446: af 90 pop r10
448: 9f 90 pop r9
44a: 8f 90 pop r8
44c: 08 95 ret
这几乎可以肯定是一个对齐问题。从结构的大小来看,您的编译器似乎会自动包装它。
LDR
指令将4字节值加载到寄存器中,并在4字节边界上操作。如果它需要加载不在4字节边界上的内存地址,则实际上执行了两个负载并将它们结合在一起以在该地址获得值。
例如,如果要在0x02
上加载4字节值,则处理器将进行两个负载,因为0x02
不会落在4字节边界上。
假设我们在地址0x00
上有以下内存,我们希望将0x02
处的4字节值加载到寄存器r0
:
Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|0x08|
Value | 12 | 34 | 56 | 78 | 90 | AB | CD | EF | 12 |
------------------------------------------------------
r0: 00 00 00 00
它将首先在0x00
上加载4个字节,因为那是包含0x02
的4字节段,并将2个字节存储在0x02
和0x03
中:
Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 1 | ** ** |
------------------------------------------------------
r0: 56 78 00 00
然后,它将在0x04
上加载4个字节,即下一个4字节段,并将2个字节存储在0x04
和0x05
中。
Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 2 | ** ** |
------------------------------------------------------
r0: 56 78 90 AB
您可以看到,每次您要在0x02
上访问该值时,处理器实际上都必须将您的指令分为两个操作。但是,如果您想在0x04
上访问该值,则处理器可以在一个操作中进行:
Address |0x00|0x01|0x02|0x03|0x04|0x05|0x06|0x07|
Value | 12 | 34 | 56 | 78 | 90 | AB | CD | EF |
Load 1 | ** ** ** ** |
------------------------------------------------------
r0: 90 AB CD EF
在您的示例中,在flags1
和flags2
都注释出来,您的结构的大小为15。这意味着您数组中的每个第二个结构都会在奇数地址中,所以它都不是指针或长会员将正确对齐。
通过引入flags
变量之一,您的结构的大小增加到16,这是4个倍数。这确保了所有结构在4个字节的边界上开始,因此您可能不会遇到对齐问题。
可能有一个编译器标志可以帮助您解决这个问题,但是总的来说,很高兴知道您的结构的布局。对齐是一个棘手的问题,只有符合当前标准的编译器具有明确的行为。
- 需要将此代码更改为执行代码
- 当前不会命中断点。没有调试器目标代码类型的可执行代码与此文件关联
- 单步执行代码时重复上一行
- 如何使用介子在C++中执行代码覆盖?
- Visual Studio 2017,C++,在单步执行代码时指向错误的行
- 在 R 中执行C++代码
- 通过 dll 注入在主线程中执行代码
- 无法在 c++ 中循环后执行代码
- 执行C 代码时快速频繁的文件访问
- 执行 C++ 代码后出错
- 一个人如何从代表函数的字符串中执行运行时执行C 代码
- 是否可以在程序崩溃后执行代码?
- 第一次在 Linux 上执行 c++ 代码的时间非常慢
- 计算 JSON 中的条目数并相应地执行代码
- 将在 CATCH 块之后执行代码
- 分析执行C++代码的每一行所花费的确切时间
- 我在执行代码时不断得到"Bus Error"?
- c while()..执行代码行的条件
- 从并行线程在主 Maya 线程上执行代码
- 在调用GNUPLOT之后,如何继续执行C 代码