Private data gone public: Razer leaks 100,000+ gamers’ personal info by COMPUTER1313 in hardware

[–]Seth0x7DD 1 point2 points  (0 children)

Yeah, that's a bummer. Just in case it wasn't 100% clear though: having the macro keys press Ctrl+Alt+Shift+A+D and letting AHK react to that might be possible. For SC you'd need some tool that tells you the scancode for that key. Something like Keyboard State View might be helpful.

Would be cool if there was less unnecessary software. What manufacturer is that keyboard from?

NVIDIA to Acquire Arm for $40 Billion, Creating World’s Premier Computing Company for the Age of AI by zyck_titan in hardware

[–]_nPCpps 72 points73 points 2 (0 children)

Considering how Nvidia handled:

  • FOSS drivers
  • Industry standards (proprietary Gsync v. Freesync, proprietary SLI technology, proprietary 30 series power delivery)
  • Segmentation of its product lines to forcefully (because of its monopoly on high perf GPUs) milk $ from customers (of particular note being the draconian licensing required to pass-through-connect your own video cards in VMs)

This is extremely bad news.

Nvidia will start jacking up $ for ARM CPUs ASAP while forcing tech bundles on ARM customers (you want the CPU => you have to also buy the GPU).

A lot of large industry companies are now faced with being vulnerable to Nvidia's Oracle-like business practices. AFAIK, even AMD has ARM cores embedded in its CPUs. Now a race will begin to drop ARM from all these places where it grew over the years to avoid exposure to Nvidia.

Finally, the focus on AI from this announcement serves only to highlight that Nvidia's top priorities for ARM are absolutely not aligned towards primarily evolving ARM as a general computing platform. Nvidia wants to embed ARM on its video cards to further push them towards being specialised computing units that provide very high performance in narrow scenarios. They want this because such products can be sold for very high markups.

This is the beginning of the end for ARM as a general computing architecture. It will be killed off over the next few years in the same way other successful European IT corporations have been killed off by US rivals via acquisitions (e.g. Nokia, Skype).

How are modern cpus and gpus designed? by CaramilkThief in hardware

[–]GTS81 431 points432 points  (0 children)

A good 18 years ago, I asked the very same question you did, probably in the same period of time when I was doing digital logic design in university. So you know those "advanced topics" in your lecture notes that your lecturer probably tells you not to worry about them except maybe 3a because hinthint, it's in the next quiz? Well, all those are the easy parts/ basics in real world CPU design.

I'll try to shine a light on your question. Here goes:

You start with a high level abstraction language like VHDL, Verilog or SystemVerilog to describe the various blocks in your design. Known as RTL (Register Transfer Level), this is what the hardware designer codes to realize functions like an adder, state machines, comparators, multiplexers, etc. Of course with the dominance of synchronous clocked design, this also means coding in the clocked storage elements known as flip flops and latches. You know those funky toggle flip flops and such they jam down your throat in school? Toss that aside. 99% of the time, it's a d-ff or d-latch. LOL. So yeah, you use the always_ff or always@ blocks to describe that stuff in RTL, and each standalone file (preferably) consists of a module. That module can then be instantiated over and over again, connecting the known interface ports to different signals at the upper level.

Once you have the RTL coded, there are several ways to get from there to a layout mask. If you look at an advanced CPU like, say, an Intel Core CPU, you will find more than one "design style" used to build the circuitry. A structured high-performance arithmetic circuit, e.g. a very wide and fast adder, would require a designer to read the RTL and draw the schematic in schematic-entry software like Cadence Virtuoso. Then a layout designer translates the schematic by placing the logic gates' layout on the floorplan and wiring them up. Ok, time to take a detour to talk about standard cells.

While it is entirely possible to create a CPU from transistors (yeah, just a bunch of channels with gates strapped over them), it is more efficient to build a layer of abstraction over individual transistors by stringing them together to form logic functions. So a team goes in and draws the layout of groups of transistors connected together to create AND/NAND/OR/DFF/DLAT functions, and then makes them into black boxes with the boundaries, internal blockages, and interface pins visible.

So back to design style. You know those L1/L2/L3 caches these companies tout with every new CPU? They are realized using SRAM and/or Register File circuits. These involve taking transistors to form bitcells which are then tiled to form bitslices, then bitvectors, then banks, and of course there's the decoders, static latches, sense amps, and the stuff to form essentially a very compact structure for on-die memory.

Then there's also RTL2GDS/synthesis/APR, which more and more blocks are turning to nowadays. Basically you run a synthesis tool like Synopsys Design Compiler/Fusion Compiler or Cadence Genus that does the following:

RTL analysis + elaboration = GTECH netlist (a gate-level representation using generic gates, with no technology mapping yet)

The synthesis person then puts in a bunch of clock definitions, interface timing constraints, timing exceptions, parasitic information, the stdcell timing library, floorplan information (boundaries, blockages, IO locations, hard macro placement) and does a compilation. The objective is to meet 3 conflicting goals: power + performance + area (PPA). This is either very quick (because the designer coded an easy block / a very smart designer thought about the corner cases) or very slow (manager breathes down your neck angrily every day).

Then at the end of synthesis, a netlist is prepared and shipped off to APR or Automated Place and Route. APR is run on tools like Synopsys IC Compiler 2 or Cadence Innovus. Here, the netlist goes through 3 main steps:

Placement - although most synthesis tools do placement, in APR the placement of the stdcells must be LEGAL. In a cell-based design, the floorplan is defined with row sites that are even multiples of the stdcell height. The stdcells must be placed on a row so that the power rails that run through them are all connected. Power/ground rails for stdcells are drawn at the top and bottom edges. So the placer does a bunch of placement, coarse grain, fine grain, repeat; all the while retiming and resizing the entire design to meet PPA goals. When it's finally satisfied, it writes you a report and a new database that goes to...

Clock Tree Synthesis - In CTS, the ideal clocks described by the synthesis person must be physically built to reach all the flops/latches, otherwise known as clock sinks. Due to electrical fanout, one will never be able to drive all 1000 clock sinks with a single clock buffer. So the CTS tool builds buffer trees, splitting and merging them, all the while minimizing skew. Skew is the difference in arrival times of clock pulses at the sinks. Newer CTS tools utilize "useful skew", where delays are purposely added to or removed from a clock tree to allow setup/hold timing to be met (more about that in the STA section below).

Routing - Finally, after the clock tree is built, it's time to wire up the design. All the placed stdcells need their pins connected according to the netlist. The auto-router uses some sort of blockage-aware algorithm like Steiner routing to connect the stdcells using the metal layers available in the process. Advanced nodes can have > 10 routing layers, usually copper. Lower routing layers have a tighter pitch, i.e. the spacing + width. This allows for more wires to be available locally, but it comes at the expense of higher resistance. Upper metal layers are much wider and can travel longer distances. The router needs to make sure every logical connection is satisfied physically, with no opens, no shorts, and as few process design rule check (DRC) violations as possible.

At the end of routing, ideally we have a block that meets all the PPA goals, and will fabricate properly because like good design citizens, we have met every DRC by the process team.

I put a short comment in someone else's comment about validation and verification. Here, I'd like to draw your attention to 2 that circuit designers usually do:

First, there's static timing analysis or STA. The layout from RTL2GDS or hand drawing is extracted with tools like StarRC. The resulting parasitics file is fed into an STA tool like PrimeTime, which then calculates the cell and net delays, strings them together along the paths they form, and checks for setup and hold. If you ever want to set foot in a design team upon graduation, please learn what setup and hold are, and be very good at it. I have interesting setup/hold questions for grads I interview. ;)
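
To make setup and hold a bit more concrete, here's a toy Python sketch of the check an STA tool performs on a single flop-to-flop path. Every number and name below is invented for illustration; real tools do this across millions of paths with extracted parasitics, derates, and multiple corners.

    # Toy static timing check for one flop-to-flop path (all numbers invented).
    clock_period = 1.0   # ns, i.e. a 1 GHz target
    launch_clk   = 0.20  # ns, clock arrival at the launching flop
    capture_clk  = 0.25  # ns, clock arrival at the capturing flop
    clk_to_q     = 0.10  # ns, launching flop clock-to-output delay
    data_path    = 0.55  # ns, combinational cell + wire delay between the flops
    t_setup      = 0.05  # ns, capture flop setup requirement
    t_hold       = 0.03  # ns, capture flop hold requirement

    # Setup: data must arrive before the NEXT capture edge, minus the setup time.
    setup_arrival  = launch_clk + clk_to_q + data_path
    setup_required = clock_period + capture_clk - t_setup
    setup_slack    = setup_required - setup_arrival   # negative = path too slow

    # Hold: data must NOT arrive before the SAME capture edge, plus the hold time.
    hold_arrival  = launch_clk + clk_to_q + data_path  # use min delays in practice
    hold_required = capture_clk + t_hold
    hold_slack    = hold_arrival - hold_required       # negative = data races through

    print(f"setup slack = {setup_slack:+.3f} ns, hold slack = {hold_slack:+.3f} ns")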

Then there are also layout checks. Basically they make sure your active devices are placed in the correct places, you don't get weird latchups, wires are spaced properly, vias aren't too near to one another... thousands of those rules. Once you're clean or have managed to get them waived (learn who your waiver czar is :)), then you can tell your manager, "I'm done!". He/she then says "good job" and half a day later asks you to pick up 2 other blocks your friend is struggling with. Welcome to chip design.

AMD to introduce Zen3 on October 8, Radeon RX 6000 series on October 28 - VideoCardz.com by bphase in hardware

[–]RampageBC 85 points86 points  (0 children)

Needless to say, the release was dumb...
Whip-lash from the cancelled sales... and the refunds did come.
They asked us *snarl* "Be you Intels?"
And we said, "Nay... We are all on Zen"
Rock!

AMD to introduce Zen3 on October 8, Radeon RX 6000 series on October 28 - VideoCardz.com by bphase in hardware

[–]got-trunks 142 points143 points  (0 children)

Look into my eyes and it's easy to see
Zen+ was zen 2, and zen 2 was zen 3,
It was destiny.
Once every hundred-thousand years or so,
When the drivers get good tho releases are slow
Then the sales doth grow

Can someone explain GPU Architecture to me? (shader cores, tensor cores, fp32, int32, etc.) by Mentalitzz in hardware

[–]Qesa 91 points92 points  (0 children)

To do any work on any sort of computer, you need a few things:

  • What instruction you're doing (am I adding two numbers? Multiplying? Calculating a cosine? Multiplying matrices?)
  • The data that you're applying these instructions to
  • Hardware to physically execute the instruction. E.g. take the instruction code for add, the binary representation of 2 and 2, and give back the binary representation of 4.

The last part is called an arithmetic logic unit, or ALU. An ALU that operates on 32-bit floating point data is what nvidia calls a CUDA core. These are not like a CPU core however; CPU cores contain the instruction scheduling too, not to mention cache and a whole lot of other stuff like reordering instructions.

Instead, the analogous unit in a GPU is a streaming multiprocessor (SM) for Nvidia, or compute unit for AMD. Like with a CPU core, an SM contains an instruction scheduler, cache, and ALUs. Unlike a CPU core, it has many ALUs for a single instruction scheduler, and none of the stuff like reordering or speculative execution intended to make a single thread go quickly. Instead they issue the same instruction to many items of data to take advantage of the parallelism inherent to graphics. This is called single instruction multiple data or SIMD. By cutting out a lot of the complex machinery in a CPU, many more cores can fit in a GPU, but they are not as flexible and are very slow at doing a single task, which is compensated by doing tens of thousands of tasks in parallel.
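
If it helps to see the SIMD idea in code, here's a rough Python/NumPy analogy: one vectorized "instruction" applied to a whole array at once versus a loop handling one element at a time. It's only an analogy for how an SM issues one instruction across many ALUs, not how GPUs are actually programmed.

    import time
    import numpy as np

    n = 10_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)

    # Scalar style: one element per step, like a single narrow ALU.
    t0 = time.perf_counter()
    scalar = [a[i] + b[i] for i in range(1000)]   # only 1000, or we'd be here all day
    t1 = time.perf_counter()

    # SIMD style: one add "instruction" applied across the whole array at once.
    t2 = time.perf_counter()
    vectorized = a + b
    t3 = time.perf_counter()

    print(f"looped 1000 adds: {t1 - t0:.4f} s, vectorized {n:,} adds: {t3 - t2:.4f} s")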

In an SM, there are a number of different types of ALUs. Turing has int32, fp32, tensor, load/store, and SFU in various ratios. Int32 do whole numbers, fp32 do decimals, tensors do matrices, load/store load and store data from memory into the SM, and SFU do certain complex calculations such as trigonometry. Note that you can still do trigonometry and matrix calculations with regular fp32 cores, but they are much slower and require many consecutive instructions to complete, so these specialised cores are present to speed up certain tasks.

As mentioned above, only the fp32 ALUs are counted as CUDA cores. In Ampere, the int32 ALUs are instead capable of both integer and fp32 execution. As such they are now counted towards the CUDA core metric and the number doubles.

If playing a game you'll be using int32, fp32, LD/st and SFU cores. Tensor cores will also be used if you're using DLSS. If streaming, there is a dedicated encoder on the GPU that doesn't use any of the regular SMs to operate.

[Rumor]RTX 3080 to score 8600 and RTX 3090 score 10 000 in 3DMark Time Spy Extreme? by goodbadidontknow in hardware

[–]ImJacksLackOfBeetus 117 points118 points  (0 children)

Machine learning (ML) is the art of teaching your computer to do things without explicitly telling it HOW to do those things.

ML is also teaching your computer to do things that would usually need a human.

ML is also teaching your computer to do things with ease that you couldn't program it to do.

ML is pattern recognition.

If there's one thing I want you to take away from this it's:

PATTERN RECOGNITION

 

Which is, kinda unrelated, also a pretty good William Gibson novel.

In classical programming you EXPLICITLY tell your computer WHAT TO DO, not really WHAT YOU WANT.

IF x happens THEN do y, ELSE do z.

In ML, it's the other way around: you tell your computer WHAT YOU WANT, or rather WHAT PATTERNS YOU WANT IT TO FIGURE OUT (learn to differentiate between cats and dogs! = figure out the rules and patterns that make a cat a cat and the rules and patterns that make a dog a dog), but you don't tell it HOW to do it. You let the ML algorithm figure that out on its own. It's kinda like telling a small child what you want it to do and then sitting back and watching it struggle until it eventually figures out HOW to do it.

Classic example, teach a computer to differentiate between cat and dog pictures. How would you do it in a regular programming language? Write a billion IF - THEN - ELSE statements to account for thousands of dog and cat breeds, photographed from thousands of different angles under thousands of different lighting conditions?

Yeah, have fun.

That's where ML comes in. Train it on a couple thousand pictures (50:50 cat/dog) and you have a piece of software that figures out the rules itself, software you can just throw any cat/dog picture at and it goes: yep, that's a cat. Heck, these days you can train an ML model in minutes to not only differentiate between cat/dog but also tell you exactly what breed it is.

The same can be applied to all kinds of A or B problems. They are called classification problems, btw.
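
To see what that train-then-predict pattern looks like in code, here's a toy classification sketch in Python with scikit-learn. The two numeric "features" (body weight, ear length) are invented stand-ins; a real cat/dog classifier would learn from image pixels with a neural network, but the workflow is the same: show it labelled examples, let it figure out the rules, then ask it about new data.

    # Toy cat/dog classifier on made-up numeric features (not real images).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Pretend feature vectors: [body weight in kg, ear length in cm]
    cats = rng.normal(loc=[4.0, 6.0],   scale=[1.0, 1.0], size=(500, 2))
    dogs = rng.normal(loc=[20.0, 10.0], scale=[8.0, 3.0], size=(500, 2))

    X = np.vstack([cats, dogs])
    y = np.array([0] * 500 + [1] * 500)   # 0 = cat, 1 = dog

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)   # "figure out the rules yourself"

    print(model.predict([[3.5, 5.5], [30.0, 12.0]]))   # most likely [0 1]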

Let's say you're a doctor sitting on thousands of X-ray pictures and you know which ones are of cancer patients and which ones aren't. Feed that into your ML algorithm, let it figure out the patterns between cancer and not-cancer. This could be a great step for developing countries that might have enough nurses to push "start" on the X-ray machine, but not enough highly-qualified doctors with enough time to sift through all those images and interpret them correctly.

ML could help with triage: run patients' pictures through the machine and put those that your machine determines to have a high cancer probability at the top of the list to see an actual doctor.

Another problem set is regression, which basically means "guess the number". Let's say you want to sell your house and you want your machine to suggest a selling price. Feed your ML algorithm a thousand houses and as many parameters as you can: Selling price, number of bedrooms, square metres etc. Let the ML algorithm chew on that and figure out the pattern of what kind of house fetches what price. Then put in YOUR house parameters and it'll tell you: Based on what I've seen, your house will probably fetch this price.
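
A minimal Python/scikit-learn sketch of that regression idea, with completely made-up house data:

    # Toy "guess the number" regression: predict price from size and bedrooms.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Features: [square metres, bedrooms]; target: selling price (all invented)
    X = np.array([[50, 1], [80, 2], [120, 3], [160, 4], [200, 5]])
    y = np.array([150_000, 230_000, 340_000, 450_000, 560_000])

    model = LinearRegression().fit(X, y)   # learn the pattern price ~ size + rooms

    my_house = [[100, 3]]
    print(f"suggested price: {model.predict(my_house)[0]:,.0f}")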

Another application is clustering. Let's say you run an online shop and you have thousands of customers, but you don't know if there are actual groups of customers that you could easily advertise to. You know what everybody bought, you know where they're from due to the shipping address, you know their age, gender, how much they spend, what items they've bought over their customer lifetime etc. Feed all that into your ML software and let it figure out demographic trends, groups and connections for you to inform your next advertising campaign.
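
And a similarly minimal clustering sketch, again with invented customer numbers; the algorithm is only told how many groups to look for, not what they mean:

    # Toy customer clustering: let k-means find the groups on its own.
    import numpy as np
    from sklearn.cluster import KMeans

    # Features per customer: [age, yearly spend]
    customers = np.array([
        [19,  200], [22,  250], [24,  180],    # young, low spend
        [35, 1500], [38, 1700], [41, 1600],    # middle-aged, high spend
        [63,  600], [66,  550], [70,  700],    # older, medium spend
    ])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
    print(kmeans.labels_)           # which group each customer landed in
    print(kmeans.cluster_centers_)  # the "typical customer" of each group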

These are all interpretative applications, but you can also throw generative problems at ML.

Feed a couple hundred or thousand classical songs into it and let it figure out what patterns and rules make a classical song, then use the resulting model to generate new, random classical music from scratch.

That's also how https://thispersondoesnotexist.com/ works. NVIDIA fed their machine a shit ton of pictures of people and told it: Figure out what is a person. What are the rules that make a face, what are the patterns.

Remember, pattern recognition on a human or, even better, beyond-human level is a huge part of ML.

And it did. And now it can generate you a photorealistic picture of a person that doesn't even exist, because it learned the rules and patterns. On its own. ML is approaching a rudimentary form of imagination at this point.

Deepfakes are another application. Basically you tell your machine "This video, this is what Steve Buscemi looks like. Figure out the rules that make 'Buscemi' what he is!" and "This is what my brother Frank looks like" and then "YOU figure out HOW to make my brother Frank's face conform to the 'Buscemi' rules you just figured out, facial expressions and mouth movements included!". And it will. It'll struggle for a while, but it will.

And that's one of the fun things about ML, the iterative nature of the process. It learns from its mistakes. You tell your machine "do it once!" and it'll do maybe ok, most likely it'll crap its pants. Then you tell it "alright, do it a hundred times. This is the target! Each time, look at how far you've missed the target. Next time, miss the target a little less!".

Congratulations, you've just learned what a loss function is.
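
Here is that "miss the target a little less each time" loop boiled down to a few lines of Python: gradient descent on a squared-error loss for one made-up parameter. Real frameworks automate exactly this, just with millions of parameters.

    # Minimal loss-function loop: nudge a single parameter towards a target.
    target = 42.0         # what we want the model to output
    guess = 0.0           # the model's one parameter, starts out clueless
    learning_rate = 0.1

    for step in range(100):
        loss = (guess - target) ** 2        # how badly did we miss?
        gradient = 2 * (guess - target)     # which way makes the loss smaller?
        guess -= learning_rate * gradient   # miss a little less next time

    print(round(guess, 3))   # converges towards 42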

Much of that ML work can be accelerated greatly by running it on GPUs because there is a lot of matrix math involved and GPUs are REALLY good at that. That's why ML practitioners are really interested in fast GPUs. Also, lots of VRAM. Having lots of VRAM means you can load a lot of your dataset that you want your machine to learn from at once onto the GPU, that's why /u/Quaxi_ was hoping for even more than 12GB.

There are all kinds of reasons to get into ML, if you just abstract the examples I've given to whatever interests YOU. Just think about it this way for a start: Do you have a lot of data and are there patterns you want to find in it? Then learning ML might be for you!

Some food for thought:

https://www.ubuntupit.com/top-20-best-machine-learning-applications-in-real-world/

https://ml-showcase.com/

And the best, most practically oriented course I've come across so far:

https://course.fast.ai/videos/?lesson=1

NVIDIA Stops Production of RTX 20 Series GPUs: Prices Expected to Increase in Coming Months by [deleted] in hardware

[–]velimak 6 points7 points  (0 children)

I've been out of the gpu market for about a decade, can you explain why the Turing cards are so disappointing and lackluster?

They aren't;

but hardware forums are filled with elitist nerds (I say that with love) that think they personally are some expert amalgamation of Manufacturer, Consumer, and Economist.

They believe they know a) how to run a business better than Nvidia b) know the chip manufacturing process and costs better than Nvidia c) know how to increase performance year-over-year better than Nvidia d) know marketing and sales better than Nvidia.

The 2000 series came with new software featuresets (Nvidia as an AI/software company is arguably overtaking the hardware aspect) such as DLSS 2.0, ray tracing, RTX Voice, DirectX 12_2, and Tensor cores for AI workloads.

The 2000 series also came with a 30-50% increase in traditional Raster performance over the 1000 series.

These 2000 cards are the pinnacle cutting edge of Graphical Processing tech development and manufacturing ever seen on Earth, but /r/hardware would tell you they are 'disappointing and lackluster'.

This reception is not driven by performance or featureset, but wholly by price. Consumers have seen GPU market prices climb continually upward. The main factors are: a) lack of competition, b) crypto mining, c) outright manufacturing and R&D costs; climbing the cutting edge doesn't come free.

But the consumer base is divided into 2 camps; a) A vocal majority comprised largely of youth who can't afford top-end GPUs who complain loudly about the price on reddit and b) A silent minority of affluent consumers who just pay any price to buy the best-of-the-best even if that means shelling out $1200 - there are enough of these consumers for Nvidia to rightly justify their prices.

At the end of the day, many consumers just want awesome shit for cheap prices. I sympathize with that feeling, but they are woefully ignorant to the reality of the situation.

Many consumers are entitled, and they also refuse to accept the notion that you aren't meant to buy every new Product release. If you buy a 1080TI in 2016, you weren't the target demographic of 2080TI sales in 2018, but maybe you will be for the 3080TI - but 2080TI owners won't be. There is 0 expectation for consumers to drop $1200 on a GPU every 2 years.

There are people out there who actively get mad at Nvidia for offering Halo products that are outside of their personal affordable range, as if premium products and consumer choice are a personal affront against them.

For every 1080TI owner that was let down by the 2000 series, there is a 980 owner who had diamond pipe in his pants when he installed his 2080TI.

WHAT DO YOU MEAN the 2080TI WAS ONLY 50% FASTER THAN THE 1080TI? AND THE PRICE WENT UP? NVIDIA HAS TO PROVIDE 100% PERFORMANCE GAINS YEAR-OVER-YEAR AT HALF THE COST - FUCKING CORPORATE GREED MAN! /s

e: My personal situation. I own a GTX 1080 I got in March 2017. It's a beast of a card. When the 2000 series came out, I was satisfied with not upgrading. The only cards that beat it measurably are 2080 and 2080TI, and the value proposition isn't there. I'll be jumping on a 3080ti. Waiting and buying what makes sense for you, what a novel idea right?

Why cant AMD reach clock speeds as high as Intel does even though they are on 7nm already? by Skankhunt-XLII in hardware

[–]TimRobSD 477 points478 points  (0 children)

In modern processes after about 28nm, the problem has become the scaling of interconnect delay. These days transistors are much smaller than the wires that interconnect them, to the degree that it’s hard to get enough metal to interconnect all the transistors you could possibly fit in a fixed area (that’s also why the total number of metal layers is going up too). The transistors have an abundance of performance - a classic measure such as Ft (analogous to max switching frequency) is literally 100s of GHz, and has been for some time now.

BUT the wires that connect them have stalled in performance terms. Yes, they get denser, which helps drive overall density & parallelism, but denser wires mean thinner wires spaced at smaller intervals, both of which degrade wire performance. Thinner wires = higher resistance wires = more wire delay (the R in RC goes up). Denser wiring means more track-to-track capacitance, so C goes up too = even more delay.

So wire delay is dominating and transistor delay is less and less significant in overall path delay. Hence absolute frequency is mostly stalling unless you want to throw power at the problem & simply blast the transistors to shrink their portion of the delay down as much as you can. This is where you hit the diminishing returns of needing ever more power to buy smaller and smaller increments in frequency. Oh, and BTW power = heat & heat = more resistance = even longer wire delay!!
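
A crude back-of-the-envelope Python sketch of that R times C argument, with purely illustrative numbers; real extraction uses full parasitic models, and wire lengths shrink with the layout too, but the trend is the point:

    # First-order toy model: wire delay ~ R * C per unit length.
    def relative_wire_delay(width, spacing, thickness=1.0, length=1.0):
        r = length / (width * thickness)             # thinner wire -> higher resistance
        c = length * (thickness / spacing + width)   # closer neighbours -> more coupling C
        return r * c

    old = relative_wire_delay(width=1.0, spacing=1.0)
    new = relative_wire_delay(width=0.7, spacing=0.7)   # ~0.7x shrink per "node"

    print(f"same-length wire delay grows by ~{new / old:.2f}x after the shrink")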

In conclusion, it’s harder & harder to reduce interconnect delay without radically new BEOL materials (cobalt wires, even lower k dielectrics/air gap dielectrics) but it’s easier to throw more transistors into more execution units running in parallel. This is where you’re getting CPU performance these days - more cores/threads, with more execution resources in parallel, with more and more cache to feed those hungry execution units with instructions & data.

Intel & AMD are both having the same problem but are attacking it in different ways. Intel is pushing very hard on clock speed because for various reasons they’re stuck on a lower density manufacturing process. AMD has added enormous amounts of cache and 2x more cores than we’ve been used to seeing but are running them a bit slower. Both end up bringing you amazing amounts of compute for your $$.

Linus reviews the Liqid Element LQD4500 SSD, with up to 29gb/sec speeds by bizude in hardware

[–]OverkillerSRB 37 points38 points  (0 children)

  1. Make clickbait titles and monetize clicks
  2. Make a video being apologetic about it and fix the titles of old videos that have already earned 90% of their earning potential, monetize clicks on that video as well
  3. Keep doing clickbait and repeat step 2 as necessary for bonus profit

What is "high static pressure" really? by lrenaud in hardware

[–]dito49 46 points47 points  (0 children)

To put numbers on it, Noctua has a comparison chart of their popular 120mm fans showcasing the pressure-airflow performance and the typical pressure 'requirements' of cooling components.

Straight From the horse's mouth: Huawei Chairman Guo Ping official speech following the recent US sanction by wengchunkn in hardware

[–]Cloud668 80 points81 points  (0 children)

I am so insanely bored in quarantine I translated the speech. Each paragraph roughly corresponds to a slide.

 

Welcome to the 2020 Huawei Global Analyst Summit, thank you for your support and attention.

 

Today, I intend to share what we’ve experienced this year and our prospects for the future. Over the past year, despite losing access to many important technologies, Huawei has maintained compliance with regulations, fulfilled our contractual obligations to both clients and suppliers, and continued to survive and progress.

 

May 16th marked a full year since Huawei was placed on the US BIS Entity List. After the initial panic, we focused on communication with clients and partners to maintain supply, and most of them have been understanding. This is still an ongoing process.

 

Last month, my colleagues also released our 2019 yearly report. Our revenue figure has reached RMB¥858.8 billion (USD$120.8 billion). You can also see our response to entering the Entity List as a massive increase in R&D spending. We also significantly increased our inventory, bringing more pressure to logistics and risk management. The good news is: we’re still alive.

 

Our theme and focus for the last year have been “patching”. From rough estimates, we have invested over 15000 man-years on ICT continuity, produced more than 60 million lines of code, designed 1800+ SBC boards, and audited sales for over 16000 part numbers. These investments allowed us to survive. Our sales, supply chains, partnerships, and customer service were not disrupted. On behalf of Huawei, I would like to sincerely thank our clients, partners, and everyone who has supported Huawei.

 

You may all know that two days ago the US Department of Commerce introduced new regulations and rules targeting Huawei. Our business will unavoidably be impacted. However, the past year of endurance has also made us resilient. We believe that we can quickly find solutions. Still, we cannot understand why the US government continues to suppress Huawei. What can this bring to the world? Huawei, since our inception, has always focused on bringing digital technology to more people, families, and organizations to foster global progress.

 

In the past 30 years, Huawei has taken digital technology out of its ivory tower and accelerated its globalization. We have set up over 1500 networks in 170+ countries and regions around the globe and provided smart devices to over 600 million consumers. We have serviced over 3 billion users worldwide.

 

Huawei has persisted in building a diverse industry ecosystem. We have treated industry growth, mutual profit, and interest sharing as tenets to produce value for the industry and society with our clients and partners. For example, we have provided remote laboratories, technical training, and startup funding for over 3 million developers. Through industry organizations and business alliances we continue to bring shared success to the entire industry.

 

For many years, Huawei has actively participated in industry standards bodies to contribute to the development of standards. In connectivity, Huawei has firmly advocated for unified global standards. In cloud and computing, we’ve supported more openness and inclusivity in global organizations. We have continued to contribute to the industry by proposing standards, open source operating systems, open source databases, etc. To date, Huawei holds more than 85000 granted patents, but we will not charge excessive fees, and we certainly will not weaponize these patents.

 

Huawei has a responsibility to create new industries through technological innovation. For example, if naked-eye 3D can be realized, it can revolutionize the visual experience in entertainment, medicine, education, etc. At the same time, naked-eye 3D can create billions of new product demands, leapfrogging industry development. We have already selected some areas to invest in heavily through cooperation with universities and research institutes for long-term basic research. We hope to solve key problems in industry development through these partnerships to bring about further innovation.

 

Over the last year, many changes have happened to the industry, showing us clearly that fragmenting standards and supply chains benefits no one. Further fragmentation will bring severe repercussions to the entire industry and to society as a whole.

 

Unifying standards is critical to industry development. Reviewing the past 20 years, since the era of wireless 2G, mainstream US suppliers and operators adopted separate standards. Because the US manufacturers had to fulfill varying requirements from different operators, their progress was stalled. Today, there are no competitors to Huawei among the US wireless manufacturers. Why?

 

In contrast, Europe standardized GSM in the 2G era and brought it to the world. From GSM, UMTS, to LTE, Europe’s mobile operators have been business leaders in the global market. All the global mobile operators you can name are European, which has also allowed Europe’s manufacturers to remain competitive. This comparison tells us that standards must be unified for industry development.

 

The things that have happened to Huawei over the past year have also led many corporations and even nations to realize the risks of relying on a single supplier. By the end of 2019, the US Entity List had more than 1000 companies on it, with many of the newly added companies being technology-based. This has led countries and businesses to have extra concerns when selecting an ICT supplier. I recall an exchange with a head of state last year. He said, “I’ll build two clouds from different countries. As long as they don’t cause trouble at the same time, we’re in good shape.” I think many customers may have the same idea. Many corporations may follow Huawei in adopting a globalized and heterogeneous supply strategy to secure business operations.

 

We have also noticed serious erosion of trust in global cooperation. We saw that the French President expressed concern in an interview regarding the security of Europe’s local data processing, affecting projects involving global partnerships. The US’s persistence in targeting technologically advanced corporations in other countries will likely also impact how much other nations trust American technology in the long term, exacerbating global industry conflicts. At the end, I believe it will harm the US’s own interests.

 

Despite setbacks, Huawei will never move towards isolationism. We will persevere in globalizing. In the past 7 years, Huawei’s annual purchase volume grew at a compound rate of 27%. We have grown rapidly along with our suppliers. Our strategy to globalize and diversify will not change. Last year, we spent over USD$18.7 billion in the US. If the US government allows, we will continue to purchase American goods. Of course, we will also assist and nurture the growth of other suppliers to continue innovating and building a more competitive supply chain.

 

Today’s world has formed a global system of cooperation. I believe this system cannot and will not be reversed. We are at the first step of the road to an intelligent world, and the ICT industry will have vastly more opportunities than competition. Huawei calls for the industry to work together to strengthen IP protection, preserve fairness in the marketplace, and ensure a supply chain with globally unified standards and cooperation.

 

We believe that in the next 30 years, mankind will enter a new era of connectivity and intelligence. ICT infrastructure is the foundation of a “smart” society. Artificial intelligence, the Internet of Things, 5G, and other new technologies will be integrated into every aspect of society. This progress will be the new source of economic growth, improving consumer experience, building smart cities, and pushing industries into digitization.

 

According to estimates, by 2025 the digital economy will reach 23 trillion. Huawei remains confident in the future of the ICT industry. Huawei will continue to invest in connectivity, computing, and terminals; we will continue to partner with and grow the supply chain, standards, and talent pool to develop the entire industry with our clients, partners, and standards bodies.

 

In the next two days, as usual, we will be discussing industry details, research trends, global partnerships, etc. in depth with our analysts and press partners.

Our CEO Ren often says Huawei is like a punctured plane. The past year of “patching” has been our theme and has made us more robust. With the support of our customers and partners, we believe this plane can continue to fly forward into the future. Thank you.

120mm AIO CPU coolers - why aren't they better? by Last_Jedi in hardware

[–]sk9592 9 points10 points  (0 children)

I don’t understand why, when AMD made the decision to break cooler mounting compatibility from AM3 to AM4, they didn’t get rid of that stupid bracket mounting system altogether.

Back in the 90s it could be forgiven because we didn’t know any better. But in 2017, we had 20 years of evidence of how terrible it was. On top of that, it was incredibly space-inefficient compared to LGA 115X mounting at a time when SFF was becoming more popular.

It kinda felt like a slap in the face. They broke compatibility by moving the holes 2mm. Either maintain backwards compatibility, or break it and replace it with a superior design. AMD did neither.

TSMC has started development on its 2nm process by What_is_a_reddot in hardware

[–]dudemanguy301 349 points350 points  (0 children)

Computer chips are made of billions of transistors. A transistor is like an electrical switch: it is ON when electricity flows through and OFF when electricity isn't flowing through.

Transistors are made of a number of pieces:

  • The Source which is where the electricity comes from
  • The Drain which is where the electricity goes to
  • The Channel which is what the electricity flows through from the Source to the Drain
  • The Gate which is what controls the flow of electricity in the Channel.

As transistors get smaller they can switch faster, take less power, and you can fit more of them in a given space, BUT it gets harder to stop electricity from flowing when the gate is supposed to be closed. If this gets bad enough, electricity can leak through a closed gate. To prevent this, the surface area where the gate makes contact with the channel needs to increase.

Years ago, transistors were planar (flat): the gate sat on top of the channel, making contact from the top.

Current transistors are called FinFETs, which, as the name suggests, have a fin that sticks up. The channel can now be covered from the top and both sides, giving much more surface for the gate to make contact with the channel and control the flow of electricity.

In the near future foundries will be moving to GAA-FETs designs. In this case GAA stands for Gate All Around, so as the name suggests the Gate will now surround the Channel from the top, bottom, and sides. The channel will be a tube that passes through the center of the gate, like putting a wire through a tunnel.

Looking for a scientific article I read about how processor performance hasn't actually improved much over the last 10 years if you consider single-core performance and how Moore's Law seems to have catch up with processors by Kelvets in hardware

[–]blandge 6 points7 points  (0 children)

The reason core performance at a given clock frequency hasn't improved much over the last 10 years or so has very little to do with Moore's law (or Dennard scaling, as somebody alluded to), but instead has to do with the limits of improving instruction-level parallelism.

There are three ways to increase core performance:

  1. Increase clock frequency
  2. Expand the instruction set
  3. Parallelize

You specifically asked about single core performance at the same clock, so let's throw out increasing clock frequency and throw out core-level parallelism, which equates to more cores.

Expanding the instruction set: One way to increase performance is by adding new instructions. An instruction set architecture (ISA) that includes only add and subtract instructions takes a long time to do complex math. Let's take multiplying 4 x 4 as an example:

An ISA consisting of only add and subtract instructions must perform the following instructions to multiply 4 x 4:

  • 4 + 4 = 8

  • 8 + 4 = 12

  • 12 + 4 = 16

You can see in this example that multiplying 4 x 4 requires three instructions (three clock cycles) to reach a result. If the CPU pipeline included an arithmetic logic unit (ALU) that included a multiplier circuit, then the CPU could provide a multiply instruction that could compute 4 x 4 = 16 in one clock cycle. In this way, we've increased the performance of the CPU for multiplication by a substantial margin.
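
A quick Python toy of the same point (real ISAs obviously aren't Python, but the instruction-count argument carries over):

    # An "add-only ISA" needs a loop of adds to multiply; a MUL instruction
    # gets the same answer in one operation.
    def multiply_with_adds(a, b):
        """Multiply via repeated addition (assumes b >= 1)."""
        result = a
        for _ in range(b - 1):   # 4 x 4 takes three adds: 4+4, 8+4, 12+4
            result += a
        return result

    print(multiply_with_adds(4, 4))  # 16, after three add operations
    print(4 * 4)                     # 16 in a single multiply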

Many of the performance improvements within the core have come from adding hardware circuits or logic blocks that enable new instructions for mathematical operations, including logical shift, multiplication, division, trigonometric instructions (sine, cosine), encryption instructions (AES), etc. We still perform a single operation per clock (although sometimes complex instructions take multiple clocks), but we do more overall work.

This ISA expansion is taken even further with very complex instructions that take many hundreds or thousands of clock cycles. This has given rise to two separate trends in computing:

  • Complex Instruction Set Computing (CISC): Intel x86, AMD x86-64
  • Reduced Instruction Set Computing (RISC): ARM, MIPS

Each has its own advantages and disadvantages. CISC tends to require more transistors (and use more power), while RISC requires more cycles to do complex instructions (and so has lower performance on them). There are many, many books and papers on this famous debate. I'd encourage you to look into it.

CISC architectures have added so many instructions to do ever more specific things that the likelihood a given workload can be improved by adding more of them gets smaller and smaller, while the added cost in die area and power remains. We're reaching diminishing returns.

Parallelism: As I mentioned before, I won't speak about core-level parallelism, so let's talk about the other kinds of parallelism.

  • Instruction-Level Parallelism: Instruction-level parallelism (ILP) is really the crux of the slowdown. There are a number of things you need to do to service a typical instruction:

  • Fetch the instruction

  • Decode the instruction

  • Perform the operation

  • Access the data being operated on

  • Write the resulting data back to memory

The first step in ILP is pipelining. A long time ago someone thought of the brilliant idea that the 5 steps outlined above don’t have to be performed in series. Instead of doing all 5 steps for each instruction (which takes a long time) before moving on to the next one, you do them in parallel.

So you fetch instruction 1 (instr1); then, while you decode instr1, you fetch instr2. Then, while you operate on instr1 and decode instr2, you fetch instr3, and so on. In this way, you are in theory doing 5 times more work at a time. Your ILP speedup is 5x. In practice it’s not really 5x, because some of the stages take longer than 1/5 the time, but you get the point.
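
A little Python sketch of that overlap, just to visualize the schedule; the stage names are the classic textbook five, not any specific CPU:

    # Print a 5-stage pipeline schedule: rows are instructions, columns are cycles.
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory, write-back

    def pipeline_schedule(num_instructions):
        for i in range(num_instructions):
            # instruction i enters fetch in cycle i, then advances one stage per cycle
            row = ["--"] * i + STAGES
            print(f"instr{i + 1}: " + " ".join(f"{s:>3}" for s in row))

    pipeline_schedule(4)  # once the pipeline is full, one instruction completes per cycle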

The problem with pipelining is two-fold: 1) there is a theoretical limit to how many instructions within a single program you can do in parallel because of something called “data hazards.” Data hazards arise when the input of one instruction depends on the output of an earlier instruction, so it has to wait until that instruction finishes all 5 stages before it can perform the operation. For example:

  • instr1: A + B = C

  • instr2: C + D = E

Because instr2 uses the result of instr1, it must wait until instr1 writes the data back to memory (which is the last stage of the pipeline). This puts an insurmountable limit on the amount of parallelism that is possible. There are ways we can mitigate data hazard penalties, but the methods we use to mitigate these dependencies (register renaming) have diminishing returns. We are reaching the point of diminishing returns.

2) Another issue that limits ILP is branching. Branch prediction comes into play when the program code reaches a “branch”, or a decision point (e.g.: if the user presses the right arrow go right, else if the user presses the left arrow go left). Using sophisticated methods, we can predict which branch the code will take. If we predict wrong, we have to pay a penalty, delaying CPU execution while we recover and take the other branch.

Our branch predictors have gotten so good that they predict right well over 99% of the time. We are reaching diminishing returns.
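
For a feel of how prediction works, here's a toy Python version of the classic 2-bit saturating-counter predictor from textbooks; real predictors are vastly more elaborate (global history, TAGE, etc.), so treat this purely as an illustration:

    # 2-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken".
    def predictor_accuracy(outcomes):
        counter = 2                      # start weakly "taken"
        correct = 0
        for taken in outcomes:
            prediction = counter >= 2
            correct += (prediction == taken)
            # nudge the counter towards what actually happened, saturating at 0 and 3
            counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        return correct / len(outcomes)

    # A loop branch: taken 9 times, then falls through once, over and over.
    pattern = ([True] * 9 + [False]) * 100
    print(f"accuracy: {predictor_accuracy(pattern):.1%}")   # ~90%, it misses the loop exits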

There are other challenges and optimizations (memory aliasing, multi-issue pipelines) for ILP, but suffice it to say, they are all reaching diminishing returns. We’ve optimized the core pipeline so much that it takes a massive investment in transistors to increase performance just a tiny bit, and even if Moore’s Law and Dennard scaling were still holding within the core, we would still be approaching a theoretical ILP limit that cannot be surpassed.

Data-level Parallelism: One area where there is plenty of potential for parallelism is highly parallel computation, where you perform the same instruction over and over on a large set of data. This is what makes up image processing workloads such as those used in video games, image detection AI, and many scientific workloads. Because these workloads are so parallelizable, you can increase performance through single-instruction multiple-data (SIMD) ISAs. This harkens back to point 2 (expand the instruction set), but it’s utilized heavily in modern CPUs and even more so in GPUs.

There is a famous limit to parallelization called Amdahl’s Law, which essentially states that you can only parallelize a workload to the extent that its operations can be performed in parallel (they don’t have data dependencies or serial sections). Because we usually offload highly parallel tasks to the GPU, this leaves CPUs with workloads that have a low theoretical speedup. This means the tasks given to the CPU have a bounded theoretical limit of data parallelization that doesn’t warrant offloading to the GPU, so in these cases too, we are reaching the point of diminishing returns.
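
Amdahl’s Law itself fits in one line; here it is as a small Python sketch showing how quickly the serial fraction caps the speedup no matter how much hardware you add:

    # Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
    # where p is the parallelizable fraction and n the number of parallel units.
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    # A workload that is only 50% parallel tops out at 2x, even with infinite hardware.
    for n in (2, 4, 16, 1_000_000):
        print(f"p=0.50, n={n:>7}: {amdahl_speedup(0.50, n):.2f}x")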

There are many other bottlenecks on a system that limit single core, fixed frequency performance. A big one that I didn’t talk about has to do with memory access latency. We’ll leave these discussions for another day.

tl;dr: The overall answer to your question is quite simple. There is a theoretical limit to how much you can improve a single-threaded workload without increasing the clock frequency, and as you approach that limit you hit diminishing returns. We’ve optimized our CPU cores so much that in almost every area we cannot improve performance without an exponential increase in power and die area, and even if we could rely on Moore and Dennard to provide continuous improvements to power and die area, we’d still hit the theoretical limit very quickly.

Intel's 65w TDP i9-10900f actually uses 224w at stock by Naekyr in hardware

[–]Democrab 110 points111 points  (0 children)

Mate, we already had one really bad bushfire this year, we're not about to start importing more of them from Intel.