• Application: A logical group of one or more cooperating processes.
  • App Domain: A logical group of assemblies loaded within a process's address space.
  • Process: Hosts one or more app domains. From the user's point of view, this is what you see in Task Manager.
  • Threads: A process may host one or more threads. Each thread is responsible for executing one task at a time (from the programmer's point of view, a function with some code).
    • GUI thread: The application's main thread (GUI applications only). It processes all application events, such as mouse clicks. More importantly, it runs a message pump that other threads normally do not have. Long-running tasks should not be performed on the GUI thread.
    • Foreground thread: A thread that an application creates is marked as a foreground thread by default. It is mainly used for mission-critical tasks, such as an account transaction, that must not be stopped midway. The CLR waits for all foreground threads to exit before shutting down the application; however, this does not apply if the main thread throws an unhandled exception.
    • Background thread: Technically almost identical to a foreground thread, but flagged as a background thread. By creating a background thread you tell the CLR that the thread is not mission critical and can be terminated prematurely when the application shuts down; otherwise the CLR has to wait until all threads exit.
  • Fiber: A further division of a thread into units of work. In layman's terms it behaves like a child thread: the parent thread is responsible for scheduling the fibers that run in its context. A fiber must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well-designed multithreaded application. However, using fibers can make it easier to port applications that were designed to schedule their own threads. The OS scheduler knows nothing about fibers; they are a user-mode mechanism scheduled entirely by the application.
  • Thread Pool: The CLR mechanism for optimizing thread reuse. The CLR provides one thread pool per running application (process), and an application that wants to use it simply puts a job (a function with input data) in the thread pool's execution queue.
  • Context Switching: The process of shifting CPU execution from one thread to another. It is pure overhead for the underlying CPU.
  • Timer: A mechanism for queuing an executable task, once or at intervals, either on the main thread's message queue or on the thread pool.
  • Threading.Tasks: By default a task uses almost the same mechanism as the thread pool but, unlike raw thread pool work items, it offers features such as progress reporting and cancellation.
  • Job Object: A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.
  • Execution Context: A data structure associated with a thread. It flows to child threads and contains the user and security information.
  • User-Mode Synchronization Construct: Implemented with special CPU instructions and exposed by the CLR or the compiler. Usable only within a single process.
  • Kernel-Mode Synchronization Construct: Provided by the OS. Can synchronize threads across processes and app domains.
  • Primitive Construct: The simplest constructs, which are not built on top of other constructs.
  • Sync Blocks: An optional data structure that can be associated with any heap object and is mostly used for thread synchronization. It is the .NET counterpart of the Win32 CRITICAL_SECTION structure.
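
To make the foreground/background, thread pool and task entries above more concrete, here is a minimal C# sketch. The class name `ThreadKindsSketch` and its `Run` method are made up for illustration; the calls themselves (`Thread.IsBackground`, `ThreadPool.QueueUserWorkItem`, `Task.Run`, `CancellationTokenSource`) are standard .NET APIs. Treat it as one possible illustration, not the only way to do this.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of any library.
public static class ThreadKindsSketch
{
    public static (bool ForegroundByDefault, bool RanOnPoolThread, bool TaskCanceled) Run()
    {
        // A thread created with the Thread class starts out as a foreground thread.
        var worker = new Thread(() => Thread.Sleep(10));
        bool foregroundByDefault = !worker.IsBackground;

        // Flag it as background: the CLR will not wait for it at shutdown.
        worker.IsBackground = true;
        worker.Start();
        worker.Join();

        // Put a job (a function plus input data) in the thread pool queue.
        bool ranOnPoolThread = false;
        using (var done = new ManualResetEventSlim(false))
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                ranOnPoolThread = Thread.CurrentThread.IsThreadPoolThread;
                done.Set();
            }, "input data");
            done.Wait();
        }

        // A task runs on the thread pool by default and supports cancellation.
        bool taskCanceled;
        using (var cts = new CancellationTokenSource())
        using (var started = new ManualResetEventSlim(false))
        {
            CancellationToken token = cts.Token;
            Task task = Task.Run(() =>
            {
                started.Set();
                while (!token.IsCancellationRequested)
                    Thread.Sleep(1);
                // Throwing with the task's own token marks the task as canceled.
                token.ThrowIfCancellationRequested();
            }, token);

            started.Wait();
            cts.Cancel();
            try { task.Wait(); }
            catch (AggregateException) { /* expected: the task was canceled */ }
            taskCanceled = task.IsCanceled;
        }

        return (foregroundByDefault, ranOnPoolThread, taskCanceled);
    }
}
```

Calling `ThreadKindsSketch.Run()` returns three flags that should all be true: the new thread was foreground by default, the queued job ran on a pool thread, and the task ended in the canceled state.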

  • User-mode scheduling (UMS) is a lightweight mechanism that applications can use to schedule their own threads. An application can switch between UMS threads in user mode without involving the system scheduler and regain control of the processor if a UMS thread blocks in the kernel. UMS threads differ from fibers in that each UMS thread has its own thread context instead of sharing the thread context of a single thread. The ability to switch between threads in user mode makes UMS more efficient than thread pools for managing large numbers of short-duration work items that require few system calls.
  • NUMA Architecture: Non-uniform memory access (NUMA) is a system design intended to increase processor speed without increasing the load on the processor bus. In a NUMA system, CPUs are arranged in smaller systems called nodes. Each node has its own processors and memory, and is connected to the larger system through a cache-coherent interconnect bus.
  • Sleep: Stops the calling thread immediately and puts it in a wait state for the specified time.
  • Thread.Suspend and Resume: Suspend does not cause a thread to stop execution immediately. The common language runtime must wait until the thread has reached a safe point before it can suspend the thread. A thread cannot be suspended if it has not been started or if it has stopped. Suspend and Resume are now obsolete.
  • Yield: When this method is called, it tells the OS that the calling thread is willing to give up the rest of its time slice so that the CPU can start executing another thread that is ready to run. The operating system selects the thread to yield to.
  • Wait: Always used with a synchronization construct; it halts execution of the thread until the synchronization construct is signaled.
  • Cache Lines: The block of sequential memory around a requested location is called a cache line; the whole block is loaded when the CPU requests a memory address. This reduces cache misses.
When data is read from memory, the requested data as well as data around it (referred to as a cache line) is loaded from memory into the caches, then the program is served from the caches. This loading of a whole cache line rather than individual bytes can dramatically improve application performance. On our laptop the cache line size for both L1 and L2 is 64 bytes. Since applications frequently read bytes sequentially in memory (common when accessing arrays and the like), applications can avoid hitting main memory on every request by loading a series of data in a cache line, since it's likely that the data about to be read has already been loaded into the cache. However, this does mean that a developer needs to be cognizant of how the application accesses memory in order to take the greatest advantage of the cache.
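
The cache line effect described above can be sketched in C# by summing a two-dimensional array in two different orders. C# rectangular arrays are stored row by row, so the row-major loop reads adjacent memory while the column-major loop jumps a full row between reads. The class and method names here are hypothetical, chosen just for this illustration.

```csharp
// Hypothetical demo class, not part of any library.
public static class CacheLineSketch
{
    // Builds a size x size matrix where every element is 1.
    public static int[,] Ones(int size)
    {
        var m = new int[size, size];
        for (int r = 0; r < size; r++)
            for (int c = 0; c < size; c++)
                m[r, c] = 1;
        return m;
    }

    // Inner loop walks along a row: consecutive reads are adjacent in memory,
    // so most of them are served from a cache line that is already loaded.
    public static long SumRowMajor(int[,] m)
    {
        long sum = 0;
        for (int r = 0; r < m.GetLength(0); r++)
            for (int c = 0; c < m.GetLength(1); c++)
                sum += m[r, c];
        return sum;
    }

    // Inner loop walks down a column: each read is a whole row away from the
    // previous one, so for large matrices it tends to land on a different cache line.
    public static long SumColumnMajor(int[,] m)
    {
        long sum = 0;
        for (int c = 0; c < m.GetLength(1); c++)
            for (int r = 0; r < m.GetLength(0); r++)
                sum += m[r, c];
        return sum;
    }
}
```

Both methods return the same sum. To see the difference in speed, time each call with `System.Diagnostics.Stopwatch` on a large matrix (several thousand rows and columns); the actual gap depends on the machine and its cache sizes.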

  • Cache Coherency problem: In a multi-core system each core has its own cache. If two threads on different cores read and write the same data, each cache may hold its own copy of that data, and the copies must be kept in sync. A closely related performance problem is false sharing, in which threads update different variables that happen to sit on the same cache line. http://msdn.microsoft.com/en-us/magazine/cc872851.aspx
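
As a rough illustration of false sharing, the C# sketch below lets two threads increment two separate counters, first when the counters sit next to each other and then when they are pushed apart with explicit padding. The 128-byte offset assumes cache lines of 64 bytes, which is an assumption about the hardware; the class, struct and method names are hypothetical.

```csharp
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical demo class, not part of any library.
public static class FalseSharingSketch
{
    // Two counters declared side by side; they will normally share a cache line.
    private sealed class Adjacent
    {
        public long A;
        public long B;
    }

    // Explicit layout places B 128 bytes after A, so (assuming 64-byte
    // cache lines) the two counters cannot share a line.
    [StructLayout(LayoutKind.Explicit, Size = 256)]
    private struct Padded
    {
        [FieldOffset(0)] public long A;
        [FieldOffset(128)] public long B;
    }

    private sealed class PaddedHolder
    {
        public Padded Counters;
    }

    public static (long A, long B) RunAdjacent(int iterations)
    {
        var c = new Adjacent();
        // Each thread writes only its own counter, so the result is always correct;
        // the cost of sharing a cache line shows up only in the running time.
        var t1 = new Thread(() => { for (int i = 0; i < iterations; i++) c.A++; });
        var t2 = new Thread(() => { for (int i = 0; i < iterations; i++) c.B++; });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return (c.A, c.B);
    }

    public static (long A, long B) RunPadded(int iterations)
    {
        var h = new PaddedHolder();
        var t1 = new Thread(() => { for (int i = 0; i < iterations; i++) h.Counters.A++; });
        var t2 = new Thread(() => { for (int i = 0; i < iterations; i++) h.Counters.B++; });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return (h.Counters.A, h.Counters.B);
    }
}
```

Both variants produce the same counts. Any speed difference has to be measured, for example with `System.Diagnostics.Stopwatch` and a large iteration count, and it may be small or absent if the JIT keeps the counters in registers.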

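  • Volatile read/write: A way to deal with one thread not seeing another thread's writes. When a field is accessed as volatile, the compiler and JIT may not keep its value cached in a register or reorder other memory accesses across it: a volatile write has release semantics and a volatile read has acquire semantics, so a write becomes visible to other threads promptly and in order.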
  • Sync Block: A CLR data structure, equivalent to a Win32 CRITICAL_SECTION, that can be associated with any object in the managed heap. The Monitor class uses the sync block, and therefore so does the lock statement.
The basic idea is that every object in the heap has a data structure associated with it (similar to a Win32 CRITICAL_SECTION structure) that can be used as a thread synchronization lock.

Now obviously, associating a CRITICAL_SECTION field (which is approximately 24 bytes on a 32-bit system and about 40 bytes on a 64-bit system) with every object in the heap is quite wasteful, especially since most objects never require thread-safe access to them. To reduce memory usage, the CLR team uses a more efficient way to offer the functionality just described. Here's how it works: When the CLR initializes, it allocates an array of sync blocks. 

A sync block is a chunk of memory that can be associated with an object. Each sync block contains the same fields that you would find in a Win32 CRITICAL_SECTION structure.
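
From the programmer's side, the sync block is never touched directly; it is reached through the Monitor class or the lock statement. The sketch below shows the same counter protected both ways. The class and method names are hypothetical, and the `Monitor.Enter(object, ref bool)` / `Monitor.Exit` pattern is written out roughly the way the C# compiler expands a lock statement.

```csharp
using System.Threading;

// Hypothetical demo class, not part of any library.
public static class SyncBlockSketch
{
    // Any heap object can serve as the lock; the runtime associates the
    // locking state with this object when a thread enters the monitor.
    public static int CountWithLock(int threads, int perThread)
    {
        var gate = new object();
        int total = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    lock (gate) { total++; }
                }
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
        return total;
    }

    // The same logic with Monitor called explicitly.
    public static int CountWithMonitor(int threads, int perThread)
    {
        var gate = new object();
        int total = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                {
                    bool taken = false;
                    try
                    {
                        Monitor.Enter(gate, ref taken);
                        total++;
                    }
                    finally
                    {
                        // Release only if the lock was actually acquired.
                        if (taken) Monitor.Exit(gate);
                    }
                }
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
        return total;
    }
}
```

With four threads and 1,000 increments each, both methods return 4,000, because only one thread at a time can be inside the locked region.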