On Thursday, an Amazon AWS blog post announced that the company has moved the majority of the cloud processing for its Alexa personal assistant off of Nvidia GPUs and onto its own Inferentia application-specific integrated circuit (ASIC). Amazon's Sébastien Stormacq describes Inferentia's hardware design as follows:
AWS Inferentia is a custom chip built by AWS to accelerate machine learning inference workloads and optimize their cost. Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, dramatically reducing latency and increasing throughput.
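For readers unfamiliar with the term, a systolic matrix multiply engine is fixed-function hardware dedicated to the blocked matrix multiplications at the heart of convolution and transformer layers. The Python sketch below is purely conceptual – the tile size, loop order, and `blocked_matmul` helper are illustrative assumptions, not Inferentia's actual design – but it shows the kind of operation a NeuronCore accelerates.

```python
# Conceptual sketch only: the kind of blocked matrix multiply that a systolic
# array engine accelerates in hardware. Tile size and loop structure here are
# illustrative assumptions, not Inferentia's actual design.
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    """Multiply a (MxK) by b (KxN) one tile at a time, accumulating partial sums."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Each tile-by-tile product maps onto a fixed-size hardware array;
                # partial results accumulate in place, much like on-chip accumulators.
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

# Quick check against NumPy's reference implementation
a = np.random.rand(64, 128).astype(np.float32)
b = np.random.rand(128, 96).astype(np.float32)
assert np.allclose(blocked_matmul(a, b), a @ b, atol=1e-4)
```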
When an Amazon customer uses the Alexa personal assistant, usually via an Echo or Echo Dot, very little of the processing is done on the device itself. The workload for a typical Alexa request looks something like this (a rough code sketch follows the list):
- A human speaks to an Amazon Echo, saying: “Alexa, what is the special ingredient in Earl Grey tea?”
- The Echo uses its own on-board processing to detect the wake word – Alexa
- The Echo streams the request to Amazon data centers
- Within the Amazon data center, the voice stream is converted into phonemes (inference AI workload)
- Still in the data center, phonemes are converted into words (inference AI workload)
- Words are assembled into phrases (inference AI workload)
- Phrases are distilled into intent (inference AI workload)
- The intent is routed to an appropriate fulfillment service, which returns its response as a JSON document
- The JSON document is parsed, including the text of Alexa’s reply
- The text form of Alexa’s reply is converted into natural-sounding speech (inference AI workload)
- Natural speech audio is streamed back to the Echo device for playback: “It’s bergamot orange oil.”
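The code sketch below is a hypothetical illustration of that flow – every function name and return value is made up for clarity, and each `*_infer` stub stands in for a neural-network inference job running in the AWS data center rather than on the Echo itself.

```python
# Hypothetical sketch of the Alexa request flow described above.
# Names and placeholder return values are illustrative only; each *_infer stub
# represents an inference workload running in the AWS data center.

def speech_to_phonemes_infer(audio: bytes) -> list:        # inference workload
    return ["W", "AH", "T", "..."]                          # placeholder phonemes

def phonemes_to_words_infer(phonemes: list) -> list:        # inference workload
    return ["what", "is", "the", "special", "ingredient", "in", "earl", "grey", "tea"]

def words_to_phrases_infer(words: list) -> str:             # inference workload
    return " ".join(words)

def phrases_to_intent_infer(phrase: str) -> dict:           # inference workload
    return {"intent": "IngredientQuery", "subject": "earl grey tea"}

def route_to_fulfillment(intent: dict) -> dict:             # conventional service call, returns JSON
    return {"response_text": "It's bergamot orange oil."}

def text_to_speech_infer(text: str) -> bytes:               # inference workload
    return text.encode("utf-8")                              # placeholder for synthesized audio

def handle_alexa_request(audio_stream: bytes) -> bytes:
    """Everything below runs in the cloud; only wake-word detection happens on the Echo."""
    phonemes = speech_to_phonemes_infer(audio_stream)
    words = phonemes_to_words_infer(phonemes)
    phrase = words_to_phrases_infer(words)
    intent = phrases_to_intent_infer(phrase)
    reply = route_to_fulfillment(intent)                     # JSON document with the reply text
    return text_to_speech_infer(reply["response_text"])      # audio streamed back to the Echo

print(handle_alexa_request(b"...audio captured after the wake word..."))
```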
As you can see, almost all of the real work done to fulfill an Alexa request happens in the cloud – not on the Echo or Echo Dot device itself. And the vast majority of that cloud work is done not by traditional if-then logic but by inference – the answer-providing side of neural network processing.
According to Stormacq, shifting this inference workload from Nvidia GPU hardware to Amazon’s own Inferentia chip resulted in 30 percent lower cost and a 25 percent improvement in end-to-end latency for Alexa’s text-to-speech workloads. Amazon isn’t the only company using the Inferentia processor – the chip powers Amazon AWS Inf1 instances, which are available to the general public and compete with Amazon’s GPU-powered G4 instances.
Amazon’s AWS Neuron software development kit allows machine-learning developers to use Inferentia as a target for popular frameworks, including TensorFlow, PyTorch, and MXNet.
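As a rough idea of what that looks like in practice, here is a minimal sketch of compiling a PyTorch model for Inferentia through the Neuron SDK's `torch_neuron` integration. Exact package names, APIs, and the example model are assumptions that may vary by SDK version, so treat this as an illustration rather than canonical AWS sample code.

```python
# Minimal sketch: compiling a PyTorch model for an Inf1 (Inferentia) target.
# Assumes the torch-neuron package from the AWS Neuron SDK is installed;
# check current AWS Neuron documentation for exact package and API versions.
import torch
import torch_neuron                     # AWS Neuron SDK integration for PyTorch
from torchvision import models

# Load a pretrained model and build an example input on CPU
model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# Trace and compile the model for Inferentia; unsupported operators
# typically fall back to running on the host CPU
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact for deployment on an Inf1 instance
model_neuron.save("resnet50_neuron.pt")
```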