BlackDog All American 15654 Posts |
Quote : | "Intel Outlines New Tech to Boost Performance of Single-Threaded Software on Multi-Core Chips." |
http://www.xbitlabs.com/news/cpu/display/20100519235548_Intel_Outlines_New_Tech_to_Boost_Performance_of_Single_Threaded_Software_on_Multi_Core_Chips.html
Quote : | "Intel to Use Multi-Core Chips to Boost Performance of Single-Threaded Apps [05/19/2010 11:55 PM] by Anton Shilov
The modern trend of microprocessors’ development is focused around creation of devices with as many cores as possible. However, there are algorithms that cannot benefit from many-core architectures or multi-threading execution. In order to boost performance of single-threaded applications on multi-core microprocessors, Intel Corp. recently outlined the technology called “Anaphase”.
Researchers from Intel Labs Barcelona have developed “Anaphase” technology, which is a novel hardware/software hybrid approach to leverage multiple cores in order to improve single-thread performance on multi-core processors. This research focuses on different speculative techniques to automatically partition single thread applications to be processed on multiple cores.
The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application and they are executed under a modified multi-core system that includes: mechanisms to support multiple versions; mechanisms to detect violations among threads; mechanisms to reconstruct the original sequential order; and mechanisms to checkpoint the architectural state and recovery to handle misspeculations. On the hardware side, a new unit called “Inter-Core Memory Coherency Module” (ICMC) could be integrated into the die of future processors.
According to Intel, the proposed hardware/software scheme outperforms previous hardware-only schemes to implement the idea of combining cores for executing single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a so-called “tiny-core” (Intel did not reveal what tiny-core actually is, but it may potentially be a part of the company’s SCC 48-core processor), and up to 2.6 times for some selected applications.
At present, Anaphase is a research project and the Intel Labs Barcelona researchers are looking into ways to potentially integrate this technology into future processor designs.
Considering that Intel is working on numerous many-core designs, including the Larrabee x86 graphics processor and the 48-core supercomputer on a chip (SCC) prototype, the ICMC may indeed be a useful piece of hardware. In fact, not only for Intel. Both ATI, the graphics business unit of Advanced Micro Devices, and Nvidia Corp. are working hard to implement their many-core graphics processing units (GPUs) into various high-performance computing (HPC) segments. Although raw horsepower is more important for HPC than the performance of single-threaded apps, as general purpose processing on GPUs (GPGPU) becomes more popular on different markets, hardware/software tricks to speed up single-threaded algorithms may become necessary." |
I wonder if this will allow games with poor multi-core support, such as Crysis or GTA IV, to perform better? 5/20/2010 7:31:22 PM |
jbtilley All American 12797 Posts |
I thought GTA IV was one of the few games where you had to have a quad core processor to get any decent performance.
Edit: Oh, reading comprehension. I guess you're asking if this will help games that don't attempt to take advantage of multiple cores get the same boost in performance as games that do, like Crysis and GTA IV.
[Edited on May 21, 2010 at 9:45 AM. Reason : -] 5/21/2010 9:31:21 AM |
sirpoovey New Recruit 5 Posts |
This provides very few details about their "novel" approach. However, to me this sounds like they are just bringing Thread Level Speculation to their hardware. Do a search in Google Scholar for TLS or thread level speculation. This is an idea that's been around since about 2000 but has never reached mainstream fabrication. James Tuck in the ECE department at NCSU has done some extensive research in this domain (http://people.engr.ncsu.edu/jtuck/pub.html). Also, when he was a student at UIUC, his research group did a lot of work in TLS. (http://iacoma.cs.uiuc.edu/)
Essentially, the idea is to create threads, either at run time or via the compiler, that speculatively run loop iterations or subroutines in the hope that there will be no memory violations. Quite a bit of research has focused on handling conflicts in software and hardware using memory signatures to detect violations. It seems Intel here is opting for a hardware-based approach using the "Inter-Core Memory Coherency Module."
Whether or not it will help games is questionable. TLS is useful for very specific applications. It is extremely useful when there is a lot of parallelism available that the programmer has not exploited, or when it is too difficult to exploit because it is hard to prove there are no conflicts or races. Also, if conflicts are rare (e.g. a conditional in the code that doesn't happen often but causes a violation between loop iterations), then TLS can provide parallelism in the common case.
When it comes to game applications, a large amount of the "easy" parallelism is already exploited on the GPU, since frame rendering is a very parallel workload. I believe it potentially "could" provide some improvement for games, but this type of research mostly targets scientific workloads, and those are the benchmarks these techniques are evaluated on (e.g. SPEC 2k6). 5/21/2010 12:22:32 PM |
Prospero All American 11662 Posts |
i have no idea what you just said. 5/21/2010 12:30:59 PM |
sirpoovey New Recruit 5 Posts |
Here's a dumb, but simple example:
int a[100];
int b[100];
for (int i = 1; i < 100; i++) {
    if (i == 50) {
        b[58] = 5;
    }
    a[i] = b[i];
}
Here every iteration of that loop could be done in parallel without any conflicts, except that iteration 58 depends on iteration 50: iteration 50 writes b[58], which iteration 58 later reads. Because of that one dependence, a compiler would detect a "loop-carried" dependence, which means the loop could not be turned into a parallel loop.
However, in TLS all iterations are done in parallel, and at runtime the conflict at iteration 58 will be detected when it sees that another thread wrote to a location that it read. Iteration 58 would then roll back and re-execute its iteration using the new value. 5/21/2010 12:40:33 PM |
BlackDog All American 15654 Posts |
thanks for your input, pretty cool TLS is being studied at NCSU 5/21/2010 12:53:05 PM |
gs7 All American 2354 Posts |
Quote : | "I believe it potentially "could" provide some improvement for games, but the target for this type of research is largely more scientific based, and these are the type of benchmarks and workloads that these techniques are evaluated on. (e.g. SPEC 2k6)" |
Really good post, thanks ... and with regard to your last point, I believe the AI and physics calculations could benefit greatly from a multi-core implementation. So while it may not directly affect the performance of older games, it may give more growing room for newer games to have more complexity. Am I correct in that assertion? 5/24/2010 1:00:43 PM |
sirpoovey New Recruit 5 Posts |
I agree with you about the AI and physics engines. SPEC 2006, on which they claim an average gain of more than 10% over hardware-only schemes, includes several physics applications. Gromacs is an n-body simulation. Raytracing (povray) is also a benchmark in just about every benchmark suite imaginable in the architecture community today. So, yes, I think there is an opportunity, but from that article it's really hard to guess how much opportunity there is. (I'm curious what the "selected applications" are.)
Ref: SPEC 2006 Benchmark List
Integer - http://www.spec.org/cpu2006/CINT2006/
Floating Point - http://www.spec.org/cpu2006/CFP2006/ 5/24/2010 4:05:41 PM |