Silicon photonics newcomers, each with their own tricks.
With the advent of the big-data era, data transmission has become an increasingly significant challenge, and the industry's demand for high-speed, high-density, low-power, cost-effective network solutions has surged. As a breakthrough technology, silicon photonics is steadily becoming a focus of attention. It seems that every few months another startup emerges, promising high bandwidth over longer distances while using less power than copper interconnects.
According to forecasts by the market research firm LightCounting, silicon photonic technology is expected to comprehensively surpass traditional optical modules in peak transmission speed, energy consumption, and cost by 2022; and by 2024, the silicon photonic module market is projected to reach 6.5 billion US dollars, as much as 60% of the overall market. In other words, future optical modules will largely be built on silicon photonic technology.
The silicon photonic interconnect field is still maturing, yet competition has already become fierce. For the next steps on the silicon photonics roadmap, see the article "The Next Generation Technology Roadmap for Silicon Photonics." Here, we simply compile some ideas from new players in the field.
Ayar Labs TeraPHY
On December 15, 2020, the startup Ayar Labs showcased its first available TeraPHY, an optical I/O chiplet manufactured using GlobalFoundries' 45nm silicon photonic process.
In terms of applications, the TeraPHY chiplet can be co-packaged with devices such as Ethernet switch chips, general-purpose processors (CPUs), graphics processing units (GPUs), AI processors, and field-programmable gate arrays (FPGAs). Ayar Labs said it is working on integrating optics with Ethernet switch chips, the application most closely associated with co-packaged optics, but its focus is on artificial intelligence, high-performance computing, and aerospace applications.
Unlike some other companies, Ayar Labs focuses on "optically enabled computing" rather than optical computing, specifically addressing the bandwidth-distance bottlenecks faced by traditional computer architectures built around electronic host ASICs.
Ayar Labs stated that as computing demands grow rapidly, these ASICs have had to move more and more bandwidth between packages. And although much of the communication within data centers happens in the optical domain, most package-to-package communication is still carried over copper. This creates a trade-off: the farther a signal must travel over copper in the electrical domain, the less bandwidth can be carried between chips, and the worse the latency and power requirements become.
Ayar Labs hopes to break this bottleneck through "a new type of photonic integration, technology, and product," moving the transition between the electrical and optical domains as close to the main host ASIC as possible. One element of this effort is the optical I/O chiplet called TeraPHY. The chiplet is designed as a flip-chip, mounted on the same substrate as the ASIC (such as an Nvidia GPU) and placed directly next to it. Light carrying data enters (or leaves) the die over a single-mode fiber connection, where a system of micro-ring resonators converts the optical signal into an electrical one.

Intel and Ayar Labs previously detailed a Stratix 10 FPGA co-packaged with two TeraPHYs for a phased-array radar design, as part of the U.S. government-backed DARPA PIPES program and the Electronics Resurgence Initiative. Adding optical I/O chiplets to FPGAs suits a variety of aerospace applications, including avionics, satellites, and electronic warfare.
The TeraPHY demonstrated by Ayar Labs uses 8 transmitter-receiver pairs, each supporting 8 channels operating at 16, 25, or 32 gigabits per second (Gbps), for up to 2.048 terabits per second of optical I/O. The chip can use a serial electrical interface or Intel's Advanced Interface Bus (AIB), a wide-bus design built from slower 2Gbps channels. The latest TeraPHY uses a 32Gbps non-return-to-zero (NRZ) serial interface, and Ayar Labs' Hugo Saleh said the company is developing a 56Gbps version.
The company has also demonstrated 4-level pulse amplitude modulation (PAM-4), but many applications require the lowest possible latency. "PAM-4 gives you a higher data rate, but it comes with the burden of forward error correction," said Saleh. With PAM-4 and forward error correction, latency runs to hundreds of nanoseconds (ns), whereas an NRZ link has a latency of 5 nanoseconds.
Ayar Labs' next AIB-based parallel-I/O TeraPHY design will use Intel's AIB 1.0 specification, with 16 tiles of 80 2Gbps channels each, for a roughly 2.5Tbps electrical interface. By contrast, the TeraPHY used with the Stratix 10 FPGA has 24 AIB tiles of 20 2Gbps channels each, for a total electrical bandwidth of 960 Gbps, while its optical I/O is 2.56Tbps, as it uses 10 transmit-receive pairs.
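The arithmetic behind these figures is straightforward. Here is a minimal back-of-the-envelope check, assuming every lane runs at its nominal rate (the quoted 2.5Tbps appears to round 2.56Tbps down):

```python
# Sanity check of the TeraPHY bandwidth figures quoted above,
# assuming every lane runs at its nominal rate.

def link_tbps(groups, channels_per_group, gbps_per_channel):
    """Aggregate bandwidth in Tbps for a bundle of identical channels."""
    return groups * channels_per_group * gbps_per_channel / 1000

# Demonstrated TeraPHY: 8 tx/rx pairs x 8 channels x 32 Gbps
print(link_tbps(8, 8, 32))    # 2.048 Tbps of optical I/O

# Next AIB-based design: 16 tiles x 80 channels x 2 Gbps
print(link_tbps(16, 80, 2))   # 2.56 Tbps (the quoted ~2.5 Tbps)

# Stratix 10 TeraPHY, electrical side: 24 tiles x 20 channels x 2 Gbps
print(link_tbps(24, 20, 2))   # 0.96 Tbps, i.e. 960 Gbps

# Stratix 10 TeraPHY, optical side: 10 tx/rx pairs x 8 channels x 32 Gbps
print(link_tbps(10, 8, 32))   # 2.56 Tbps
```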
The optical bandwidth is intentionally higher than the electrical bandwidth. First, not all transmit-receive macros on the chip need to be used. Second, the chip has a crossbar switch that allows one-to-many connections, so electrical channels can be sent over multiple optical interfaces, and vice versa.
Ayar Labs notes that the focus of the chip is to leverage the high bandwidth of the host SoC (system on a chip) to convert it into the optical domain as quickly as possible, and then to move a large amount of bandwidth in a more scalable way over long distances with low energy.
The miniaturization and integration enabled by the micro-resonator architecture allow optical-to-electrical and electrical-to-optical conversion to occur at "a denser starting point, ultimately becoming more energy-efficient." Traditional products, such as pluggable transceivers, target a more mature optical communications market.
Another core component of Ayar Labs' system is the company's SuperNova laser light source, which is located on a different chip and can produce 16 wavelengths of light, transmitted to 16 optical fibers (each fiber itself can carry 16 wavelengths). This separates the light source from the ASIC package, and the company believes this will provide more flexible deployment across applications and easier on-site part replacement.
Lightmatter Passage
On October 27, 2020, Lightmatter announced Lightmatter Passage, a wafer-scale programmable photonic interconnect that lets heterogeneous arrays of chips (CPUs, GPUs, memory, accelerators) communicate with one another at unprecedented speed. Passage makes rack-scale interconnect on a single substrate a reality, providing fully reconfigurable connection topologies between chips and thereby reducing the cost and complexity of building heterogeneous computing systems.

Passage's design packs 40 switchable integrated photonic channels into the space that traditionally supports a single optical fiber. Passage is the first product in a multi-year interconnect roadmap of steadily improving performance, achieving 1Tbps dynamically reconfigurable interconnect across a 48-chip array measuring 8 inches by 8 inches, with a maximum communication latency of 5 nanoseconds. The result is higher-bandwidth communication at lower energy, without expensive fiber-to-chip packaging processes. This architectural approach provides a proven path to inter-chip communication at 100Tbps, 100 times the bandwidth of the most advanced photonic interconnect solutions currently available.
Before announcing Passage, Lightmatter launched its artificial intelligence (AI) photonic computing chip in August 2020: a general-purpose AI inference accelerator that uses light to compute and transmit data, cutting heat and energy consumption while improving computational performance by an order of magnitude. Passage can integrate this chip with various other chips into high-speed computing systems at the single-wafer level. Such a system directly addresses the urgent demand for faster, more energy-efficient computers and supercomputers to support the next generation of AI inference and training workloads.
Lightmatter hopes to disrupt the advanced packaging game with Passage, which connects up to 48 customer chips on an optical interposer. Passage is built on GlobalFoundries' Fotonix 45CLO process technology and is designed to connect many chips with very high bandwidth and performance. The optical interposer breaks through the bandwidth limit, providing 768 terabits per second between tiles, and can be extended across multiple interposers at 128 terabits per second, a level of capability and scale that traditional packaging cannot reach.
The pluggable optical modules that Lightmatter calls Gen 1 have been used for years to connect switches within data centers. Thanks to companies such as Intel and Ayar Labs, 2nd- and 3rd-generation optics (placed in the same package or directly attached) are beginning to enter the network switch and computing fields. With Passage, Lightmatter wants to jump straight to the 4th and 5th generations.
Standard co-packaged optics of the kind Intel and Ayar Labs build operate at a scale an order of magnitude below Lightmatter's optical interposer solution: Passage's interconnect density is 40 times higher, because only about 200 optical fibers can be attached to a single chip. And while those co-packaged interconnects are completely static, Passage has a dynamically configurable structure. The optical interposer can switch and route between chips, and the entire interconnect can be reconfigured within 1ms.
Lightmatter says it can support any topology: all-to-all, 1D ring, torus, spine-and-leaf, and so on. With Passage's switching and routing, the maximum latency between any two chips on the 48-chip array is 2ns; switching is achieved by modulating wavelength with ring resonators and steering light with Mach-Zehnder interferometers.
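To make the idea concrete, here is a toy model (ours, not Lightmatter's actual control scheme) of a wavelength-routed fabric whose topology is simply a rewritable routing table:

```python
# Toy model of a reconfigurable wavelength-routed fabric, loosely
# inspired by the description above: a (source, wavelength) pair
# determines the destination, and rewriting the table changes the
# topology. Purely illustrative; not Lightmatter's design.

class OpticalFabric:
    def __init__(self, num_sites):
        self.num_sites = num_sites
        # routing_table[(source, wavelength)] -> destination site
        self.routing_table = {}

    def reconfigure(self, routes):
        """Install a new topology; the real fabric claims ~1 ms."""
        self.routing_table = dict(routes)

    def send(self, source, wavelength, payload):
        dest = self.routing_table[(source, wavelength)]
        return dest, payload

fabric = OpticalFabric(num_sites=48)

# A 1D ring: site i talks to site (i + 1) mod 48 on wavelength 0.
fabric.reconfigure({(i, 0): (i + 1) % 48 for i in range(48)})
print(fabric.send(47, 0, b"hello"))  # (0, b'hello')

# Swap in a fan-out pattern: one wavelength per destination.
fabric.reconfigure({(0, w): w + 1 for w in range(8)})
print(fabric.send(0, 3, b"data"))    # (4, b'data')
```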
Lightmatter's photonic wafer-level interposer is at A0 silicon, and the company claims it uses less than 50 watts of power per site. Each site has 8 hybrid lasers driving 32 channels, each operating at 32Gbps NRZ.
Lightmatter's wafer-level silicon photonic chip is made mostly with standard silicon manufacturing technology, so it shares many of the same limitations, chief among them the reticle-size limit of lithography tools. GlobalFoundries and Lightmatter solved this by stitching waveguides across reticle boundaries: the inter-reticle connection of nanophotonic waveguides loses only 0.004 dB per reticle crossing. Waveguide loss is 0.5 dB/cm, each Mach-Zehnder interferometer costs 0.08 dB, and each waveguide crossing adds 0.028 dB.
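These per-component figures make it easy to sketch an optical loss budget. The example path below is hypothetical (route length and component counts are our illustrative choices); only the per-component losses come from the article:

```python
# Simple insertion-loss budget using the per-component figures above.
# The example path (length, component counts) is made up for
# illustration; only the per-component losses come from the article.

RETICLE_STITCH_DB = 0.004   # per reticle crossing
WAVEGUIDE_DB_PER_CM = 0.5   # propagation loss
MZI_DB = 0.08               # per Mach-Zehnder interferometer
CROSSING_DB = 0.028         # per waveguide crossing

def path_loss_db(length_cm, reticle_crossings, mzis, crossings):
    return (length_cm * WAVEGUIDE_DB_PER_CM
            + reticle_crossings * RETICLE_STITCH_DB
            + mzis * MZI_DB
            + crossings * CROSSING_DB)

# Hypothetical route across the 8" x 8" interposer:
# 10 cm of waveguide, 5 reticle stitches, 4 MZIs, 20 crossings.
loss = path_loss_db(length_cm=10, reticle_crossings=5, mzis=4, crossings=20)
print(f"{loss:.2f} dB")  # 5.90 dB, dominated by propagation loss
```

As the numbers suggest, waveguide propagation loss dominates; the stitching itself is almost free.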
Lightmatter says that with UCIe it can run the chiplet-to-interposer interconnect at the specification's top rate of 32Gbps; with direct SerDes, it believes it can reach 112Gbps. Customer ASICs are 3D-packaged on the interposer, and an OSAT (outsourced assembly and test) partner then assembles the final product. Multiple variants are possible, from a smaller interposer holding only 8 chips up to the full 48. The Passage package must also power the chips stacked on top; it does so by delivering up to 700W per tile through TSVs (through-silicon vias). Water cooling is required at that power level, though air cooling suffices if the customer ASICs draw less.
Lightmatter also gave the example of a disaggregated-memory design and a multi-tenant architecture. These start from an interposer that can support any protocol, including CXL. Customer ASICs on top of the interposer can be air-gapped through the reconfigurable network, making it impossible to pass data between specific chips. The biggest question is whether and when the product will appear. It may turn out to be vaporware, or it may be the future of high-end, leading-edge classified server designs. Lightmatter must persuade other companies to build chips for this platform, and those companies must entrust their expensive development to an unproven partner.

Lightelligence Hummingbird
On June 29th, Lightelligence launched Hummingbird, the world's first processor built on an optical network-on-chip (oNOC), designed for data centers and other high-performance applications. It uses advanced vertical-stacking packaging to integrate a photonic chip and an electronic chip into a single package, with the photonics serving as the processor's communication network.
Hummingbird is the second product in Lightelligence's photonic computing product suite. Its Photonic Arithmetic Computing Engine (PACE) platform was released at the end of 2021, leveraging customized 3D packaging and seamless collaborative design to fully integrate photonics and electronics in a compact form factor.
Hummingbird is the first product in the series to use Lightelligence's oNOC platform, which achieves novel interconnect topologies through silicon photonics and significantly enhances computational performance. Its waveguides propagate signals at the speed of light and broadcast data to every core of a 64-core domain-specific AI processor, giving Hummingbird a significant advantage over traditional digital interconnects in latency and power consumption.
The challenge of computational scaling inspired the creation of optical interconnect solutions. Unlike digital networks, Hummingbird's oNOC technology enhances density scaling by enabling interconnect topologies that were previously unattainable.
In oNOC, power consumption and latency are almost unaffected by distance, making the technology well suited to new, more powerful topologies that do not rely on nearest-neighbor communication. Thanks to more efficient communication, oNOC topologies like Hummingbird's achieve higher utilization of computational power even in a single-electronic-IC configuration. With oNOC, mapping workloads to hardware becomes easier, and there is greater freedom to choose the right topology for a given computational task.
In Hummingbird, Lightelligence implemented a low-latency optical full-broadcast network across 64 cores. With 64 transmitters and 512 receivers, it provides a framework for implementing various dense optical network topologies.
Hummingbird's electronic and photonic ICs are co-packaged and integrated into a PCIe form factor, which can be installed in industry-standard servers. Combined with Lightelligence's Software Development Kit (SDK), it optimizes machine learning and artificial intelligence workloads to fully leverage oNOC. The oNOC and Hummingbird IP can also be customized for other unique workloads and applications.
It is reported that future generations of Hummingbird will adopt reticle-stitching to support a chiplet architecture, enabling better scalability, improved energy efficiency, and further reduction of bottlenecks.
Celestial Photonic Fabric
After more than a year of silence, Celestial AI has made a comeback, announcing a new type of silicon photonic interconnect that covers connections from chip to chip, package to package, and node to node.
When it first emerged at the beginning of last year, Celestial AI was focused on building an optically interconnected artificial intelligence accelerator called Orion. Since then, the company's focus has shifted to licensing its Photonic Fabric to chip manufacturers.
Under the hood, Celestial's Photonic Fabric combines silicon photonics with advanced CMOS, designed in collaboration with Broadcom and built on TSMC's 4-nanometer and 5-nanometer process technologies.
The most advanced form of the interconnect stacks third-party ASICs or SoCs on an optical interposer, or uses the company's optical multi-chip interconnect bridge (OMIB) packaging technology to move data between chips. To us this sounds a lot like what Lightmatter is doing with Passage, which we saw not long ago, but Celestial AI CEO Dave Lazovsky insists that Celestial's technology is several orders of magnitude more efficient and can easily cope with chips dissipating hundreds of watts. Whether that is true remains to be seen.
For the initial design, Celestial's Photonic Fabric uses 56 Gb/s SerDes. The company says each node has four ports of four channels each and can achieve about 1.8 Tb/s per square millimeter. Lazovsky claims: "If you want to interconnect to a quad (four HBM stacks in a module), we can easily match the full HBM3 bandwidth."
For its second-generation Photonic Fabric, Celestial is moving to 112 Gb/s SerDes and doubling the channel count from 4 to 8 per port, effectively quadrupling the bandwidth to 7.2 Tb/s per mm².
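The quoted densities are consistent if we assume (our assumption, not the company's) that the per-mm² figure counts both directions of a node's four ports:

```python
# Sanity check of Celestial's quoted figures. Assumption (ours, not
# the company's): the per-mm^2 density counts both directions of a
# node's four ports.

def node_bandwidth_tbps(ports, channels_per_port, gbps_per_lane,
                        directions=2):
    return ports * channels_per_port * gbps_per_lane * directions / 1000

# Gen 1: 4 ports x 4 channels x 56 Gb/s SerDes
print(node_bandwidth_tbps(4, 4, 56))    # 1.792, i.e. "about 1.8 Tb/s"

# Gen 2: lane rate doubles to 112 Gb/s, channels double to 8
print(node_bandwidth_tbps(4, 8, 112))   # 7.168, i.e. "7.2 Tb/s" (4x gen 1)
```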
Extracting the maximum bandwidth from Celestial's Photonic Fabric means designing a chip around the company's optical interposer or OMIB. According to Lazovsky, this essentially requires replacing the existing PHY with Celestial's own. That said, the interconnect does not rely on proprietary protocols (although it can work with them); it is designed with Compute Express Link (CXL), Universal Chiplet Interconnect Express (UCIe), PCIe, and JEDEC HBM in mind.
The company acknowledges that the technology "looks very similar to Ayar Labs' TeraPHY," and the Photonic Fabric can also be deployed as a chiplet or as a PCI-Express add-in card. The PCI-Express option is arguably the most practical, because it neither requires chip manufacturers to re-architect their chips around Celestial's interposer nor relies on the still-nascent UCIe protocol for chip-to-chip communication.
The downside of PCI-Express is that it is a significant bottleneck. Although Celestial's optics can provide massive bandwidth, an x16 PCI-Express 5.0 interface tops out at about 64 GB/s in each direction. If we had to guess, this option exists as a proof of concept to familiarize customers with the technology.
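For reference, that figure follows directly from the PCIe 5.0 specification (32 GT/s per lane with 128b/130b line coding):

```python
# Where the ~64 GB/s figure for a PCIe 5.0 x16 link comes from.
LANES = 16
GT_PER_S = 32            # PCIe 5.0 per-lane signaling rate
ENCODING = 128 / 130     # 128b/130b line-coding overhead

raw_gbps = LANES * GT_PER_S            # 512 Gb/s raw
usable_gbps = raw_gbps * ENCODING      # ~504 Gb/s after encoding
print(usable_gbps / 8)                 # ~63 GB/s per direction
```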
The company claims the chiplet variant can provide higher bandwidth, though it is still subject to the roughly 14.4 Tb/s bottleneck of the UCIe interface. We would note that UCIe still has a long way to go before it is ready for prime time, but it sounds like the chiplet can also be used with chip manufacturers' proprietary fabrics.

Of course, the challenges facing such optical interconnects have not changed. Unless your need for bandwidth far exceeds what copper can deliver, there is a plethora of existing, well-tested technologies for physically stitching small chips together; TSMC's CoWoS packaging is just one example.
Over longer distances, however, even between packages, optics begin to make more sense, especially for bandwidth-hungry HPC and AI/ML-oriented workloads. This is one of the first practical use cases Celestial sees for the Photonic Fabric.
The company says that because the interconnect supports Compute Express Link (CXL), it can be used to share HBM3 memory, a concept similar to the CXL memory pooling we have discussed in detail in the past. The idea is that multiple hosts can connect to memory devices as if they were attached to a shared storage server. Normally, because of HBM's astonishing bandwidth (up to 819 GB/s per stack for HBM3), the memory can sit at most a few millimeters from the compute die.
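That 819 GB/s figure is the standard HBM3 per-stack number: a 1024-bit interface at 6.4 Gb/s per pin:

```python
# Where the 819 GB/s HBM3 figure comes from: a 1024-bit interface
# at 6.4 Gb/s per pin, divided by 8 bits per byte.
BUS_WIDTH_BITS = 1024
PIN_RATE_GBPS = 6.4

stack_gbs = BUS_WIDTH_BITS * PIN_RATE_GBPS / 8
print(stack_gbs)  # 819.2 GB/s per stack
```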
For those training large language models, this can be painful, because the memory-to-compute ratio on accelerators such as Nvidia's H100 or AMD's MI250X is fixed: getting the right amount of one resource (say, memory) may mean paying for more of the other than you actually need.
Celestial claims that if properly implemented, its Photonic Fabric can achieve enough bandwidth to not only support HBM3 over long distances but also ultimately pool memory between multiple accelerators.
Thus, perhaps this is the killer application that will not only make optical interconnects ubiquitous but also bring composable infrastructure into the mainstream.