>> i'm going to talk a little bit today about big data and exascale computing, two hot topics in washington. i'll start with a little bit of politics, but i promise only one slide of politics, then i'm going to talk about science, talk a little bit about systems, and hopefully get you thinking about things you might do after this class with everything you've learned in it. so last fall the white house announced something called the national strategic computing initiative. it has a whole bunch of things on here that i'm not going to read to you, but one of the things it says is that the department of energy, which runs lawrence berkeley lab, where i spend a lot of my time, on both the office of science side, like this lab, and the nnsa side, which runs labs like livermore and los alamos and sandia, will execute a joint program focused on advanced simulation through a capable exascale computing system program. the word capable is very important: it's about really useful exascale systems, as opposed to just cobbling together something to try to hit an exaflop of computing performance, and i'll say a little more about that later. the other thing it says, at least in another document, is that there are five goals, and i won't read all of them; they're all about making the us very competitive and having useful systems that are very productive. but the first one is to create systems that can apply exaflops of computing power to exabytes of data. so there's a lot of talk around this initiative that it should be about the convergence between data and hpc. now, i kind of hate the terms data and hpc in this context; it's really about the convergence of simulation and analysis. but i'm going to talk a little more about what this idea of convergence between data and hpc, or observation and analysis on one side and simulation on the other, really means. because we use computers for two different things. we use them very broadly: we use them to simulate things, theories that we're trying to understand — you have a set of equations, you discretize the equations, you run a simulation, and you say, oh, we predict that in a hundred years the climate is going to look like this. but we also use them for analysis: we take a whole bunch of data and we use computing to try to figure out what the patterns in the data are. science has traditionally focused on the simulation side of things, and the department of energy has certainly focused its computing program almost entirely on simulation.
but we have a lot of science problems where data is another thing we could look at. so what does data look like in science? this is what data means in kind of the rest of the world. as you know, data affects what we buy. it tells the people who run grocery stores to put diapers next to beer. it tells sports teams who to hire, who to fire, and how to run their teams, with very detailed data about exactly the speed and type of pitching and which pitchers match up with which batters, things like that. and it tells us how to manage our farms, what kinds of things to plant, what works best, and so on. okay, so if all these things are good for the rest of the world, what about science? i did a sort of silly experiment a couple of years ago, which was to say, well, let's say i want to find some science data. i took some data from a neutrino experiment — now, i know very little about neutrino physics, but let's say i'm just wondering, what else can i find that looks like this neutrino data? so i took my image data from the neutrino experiment, i stuck it into google image search, and of course i got a bunch of graphics that look somewhat similar, in the sense that they have more blue than other colors and kind of squiggly lines, and have absolutely nothing at all to do with neutrinos. now, i could have done a better search, right? i could have stuck the word neutrino into it and i might have gotten something that had something to do with neutrino physics. but the point is about the kinds of searches you can do: i could go into google and say i'm looking for a green shirt for a woman, size eight, and i'm going to get a whole bunch of green shirts for women in size eight that i can buy, used or new, in the area, pick up today or have shipped within 24 hours, all that kind of stuff. in science it's not like that. it's very hard for me to find the data that's behind some of the publications that people have, the experiments they run, and the simulations they run. part of this is because science communities — and it's very different across different science communities — some of them hold their data very close: the data belongs to me, i'm going to publish as many papers as i can before i retire, and nobody else gets to see my data. and some fields are very open.
the climate modeling community, for example, is a big community, and the data gets put into a big shared database, and lots and lots of people write papers on data they did not collect and did not run the simulations for; they just analyzed the data that was in the community. but part of it is that even if you get people to open up their data — and increasingly this is happening, partly because ostp, that very same organization that released the white house initiative announcement, has also told people you have to make your data public if you're funded by a federal agency such as nsf or doe — okay, so we can make the data public, but can we find anything? and it's still pretty hard. so what are the computing challenges in here? well, it would be things like: can you search for scientific data on the web the same way you can shop for things? can you automate metadata? this is something i hear a lot from people who work in the data management field; they say, oh, well, the problem is you can't find anything because nobody annotates their data. and that's absolutely true — you need people to mark their data with information about when it was collected, who collected it, what the experiment was, what instrument was used, and things like that. but it's also the case that the reason search works on the internet, on a system like google, is that some of the data gets automatically annotated, right? so the question is: if we have some amount of annotation that's done manually, can we infer annotations on other scientific data? can we identify the features and allow people to say, oh, hey, this looks like genomic data, and by the way, this looks like it's probably human genome data — and, by the way, if you listened to one of our faculty candidate talks earlier, i can tell you not only that it's human genome data but the last name of the person it came from, because i can find other data on the web that's completely public and line it up carefully enough that i can figure out who it belongs to. but anyway, for scientific data we'd like to be able to do feature identification. and this data can come from images, it can come from genomes, it can come from simulations, it can come from other experimental devices and so on. okay, so that's problem number one: transform the way we operate as scientists by making it possible to use other people's scientific data easily. the second thing is to think about the big experiment.
the department of energy is really known for running big, huge experimental facilities. one of them is the advanced light source up at the lab. how many of you have visited the lab? i don't remember, did you take a tour up there? okay, so you probably didn't make it very far up the hill, but if you look at the lab from a distance there's this big domed building, and that's the advanced light source. and there's a teeny, tiny picture of what the light source looks like: there's a ring, and photons fly around the ring, and then there are these beam lines that come off of it, and each one of those is like an experimental end station where different kinds of experiments are run. the way scientists use the advanced light source today is they book a room at the guest house at the lab, they say i'm coming this particular week, they get an allocation of time on one of the beam lines, and they fly in and bring their disk drive. if they're not running a very big experiment they bring a memory stick; if they're running something bigger they may bring a little disk array or something like that. they run their experiment, they take their data off the end of the beam line, they fly home to their home institution, analyze the data, write their publication, and the data gets thrown away. now, they may not actually throw away their data, but if you're like me — i remember years ago when i was a junior professor trying to figure out what to do with all the paper that people kept sending me (they don't send me very much paper anymore, but they used to), and it's kind of the same problem with data. a very wise senior member of the department, beresford parlett, said to me — i asked, how do you keep your office so neat? — and he said, well, i figured that with each one of these memos i get, i can either throw it away, in which case it's gone forever, or i can file it, in which case it's gone forever because i'll never remember where i put it, so you might as well throw it away. and that's really the point with scientific data: not that you intentionally throw it away, but that you can never really find it again. and there's a great story in science from a few years ago about somebody who did groundbreaking science by reconstructing data from an experiment that had been shut down ten years earlier; they had to read the data off of the graphs in the publications because they couldn't find the original data. okay, so what do we want the future of one of these light sources to look like? rather than flying in and using it,
maybe you just ship your sample to the advanced light source in some kind of packing container, and there are robots on the other end that run the experiments. it turns out that one of the beam lines at the advanced light source already uses robots and already does this; it's a beam line run for the pharmaceutical companies, and in fact it was used for the treatment that was developed for ebola: they send the sample to the advanced light source, the imaging is done there, and the results are sent back. but what we'd like is for that experiment to happen with the data automatically flowing from the experiment to, say, a supercomputing center someplace that has a whole bunch of storage and the compute necessary to analyze it, where you can do some simulations and compare them to the analysis and things like that. you have some kind of a gateway — there's something called spot suite, which was built up at lbl specifically for data coming out of a couple of the beam lines, and there was an example of one of the scientists who was on a train in europe, looking at his smart phone, saying i can see the experiment, and yes, it looks like it's a good experiment. so that's not where you do your large-scale data analysis; you're just trying to figure out whether the beam line is actually pointed at the cockroach or at some other arbitrary thing, in which case the experiment isn't going to give you anything useful. so you get some realtime feedback, and then you do some offline analysis. and because the data is stored in a place where you can serve it to the rest of the community through, say, web services, it can then be used by other people to write other papers. so you don't get just one paper out of it, you get lots and lots of papers out of it.
so here's a real example of this that we did about a year and a half ago, as a demonstration for the supercomputing conference. at the advanced light source, a group was trying to figure out how to develop organic photovoltaic materials that could be manufactured at large scale. they had this great material that was a very efficient photovoltaic, but it turned out that when they tried to produce it at large scale — think of it as paint that they're painting onto these things — it wasn't working as it scaled up, in terms of how it was drying. so literally what they're doing at the light source is watching paint dry. they take the photovoltaic printer up to the light source, put it next to the beam line, and start printing the photovoltaics, and they shoot the gisaxs beam at it — a grazing incidence small angle x-ray scattering experiment. so you're shooting this x-ray at it, it's scattering around, and you're collecting the scattering patterns. they take the data and send it out over esnet — oh, by the way, robots might be sitting here too to help with that, although in this case it was done manually — over the wide area network that connects the doe labs, to the supercomputing facility, where it was analyzed with spot suite; you can look at the images in it that i talked about earlier. there's a bunch of applied math problems that come up, and there's a center called camera which specifically looks at the applied math research questions that come up in a lot of these kinds of experimental facilities. and then they sent the data to oak ridge national lab, where they had the titan supercomputer, which is even faster and has lots of gpus on it.
of gpus on it, and the titan super computerthey ran basically a large set of simulations totry to explain what the data was that they were seeing. so the idea here is that you're-- you see this scattering pattern, you're tryingto figure out what it is, and so trying to interpretthe data. and one of the ways that you could do that is you can say, well let me run asimulation of if i had this material with this crystal structure, and i shot an x-raybeam at it, it would produce this -- this scattering pattern.so you run a simulation of what the material might look like. and you might run thousandsof those, and -- and then you try to figure
outwhich one most closely match it is data that you saw in the experiment.so this is a common theme that i'll talk about in other science areas of puttingtogether observational data with simulation data to try to help interpret the simulationdata. it turned out in that particular point theyhad a slight miscalculation. they had dedicated theentire titan system to -- for a single day, or at least for 8 hours of run time, and theywere off by a factor of fifty, so they actually neededabout an exaflop of data for that calculation. wethink they could do a much smarter optimization
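here's a minimal sketch of that "run many candidates, keep the closest match" idea (illustrative only, not the actual gisaxs workflow: the candidate names, the stand-in simulator, and the plain euclidean distance are all assumptions):

```python
# each candidate structure yields a simulated scattering pattern; score
# candidates by distance to the observed pattern and keep the best one.
import numpy as np

def simulate_pattern(structure_params, shape=(64, 64)):
    # stand-in for a real scattering simulation of one candidate structure
    rng = np.random.default_rng(abs(hash(structure_params)) % (2 ** 32))
    return rng.random(shape)

observed = np.random.rand(64, 64)     # stand-in for the measured scattering image
candidates = [("lamellar", 1.0), ("lamellar", 2.0), ("hexagonal", 1.5)]

best = min(candidates,
           key=lambda c: np.linalg.norm(simulate_pattern(c) - observed))
print("best-matching candidate structure:", best)
```

in practice each simulate_pattern call is an expensive hpc job, which is why choosing the next candidates adaptively, rather than sweeping all of them, is the optimization mentioned next.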
so this is a common theme that i'll come back to in other science areas: putting observational data together with simulation data to help interpret what you measured. it turned out that in this particular case they had a slight miscalculation: they had dedicated the entire titan system for a single day, or at least 8 hours of run time, and they were off by a factor of fifty, so they actually needed about an exaflop of computing for that calculation. we think they could do a much smarter optimization to figure out exactly which simulations to run once they've run some of them, so that's one of the interesting math problems that comes up here. in terms of computing challenges: robotics; special-purpose architectures — they're looking at things like neuromorphic chips at the beam line, because it turns out to be an image analysis problem which neuromorphic chips may be pretty good at, and fpgas, for the kind of front-end processing that's just filtering data, figuring out what to save and what to throw away, and whether you can compress it in some way. and there are, as i said, lots of mathematics and algorithmic questions, both for the realtime analysis and for the offline analysis that happens later; you might do massive numbers of simulations in this sort of inverse problem; and the network and the software infrastructure for all the data movement and management raise lots of interesting questions too. okay, the next example is from climate and biology. and i have to say, when i first heard this one i heard it two or three times before i thought i could even believe it, because it sounded so ambitious and maybe a little crazy that you could actually do it.
what they're trying to do is take information about the biological make-up of all the microbes in a community — in, say, a watershed area — and figure out how that affects the climate. so here's the kind of question you might ask: if the climate warms by two degrees, will this biological community thrive or will it die? will whatever community lives there absorb carbon or release carbon? those are the sort of biological and environmental questions you're trying to ask, but you also might want to put the information about that biology into the climate models. and these are images from collecting data in the watershed: you're trying to figure out what the genomes are in it — you may actually want to go all the way down to the genome structures, or you may use coarser information about the biological make-up of the communities that live in it — and this is a picture of a climate model, and i think you probably saw examples of this in michael wehner's talk on the climate code. so this is trying to put two fairly different fields together, to ask whether you can actually make the climate models adapt to the information you have about the biology in a particular regional community. okay, this one is a multimodal analysis problem. some of the people i've talked to about this want to look at whether you can tell, from flying over a region and taking images, something about whether that particular area has a particular kind of microbial community in it, or something else about the environment, just from the images. so you might match that up with the very fine-grained sensor data that you've collected from a particular community, and ask whether you can then infer from images exactly what's going on in the environment in an area — and then try to put this together with the high performance computing models, and actually drive the simulations with the data you collect from the environment.
the next example also comes from the advanced light source, but it ties together with a project i thought you hadn't heard about this year, the materials genome. you did hear about it? okay, so you've heard about the materials project, or the materials genome, from kristin persson, which, as you heard, was doing tens of thousands of simulations of a particular category of material to try to understand something like: what is a new material you might use for batteries? you also study things like that at the advanced light source, and in the interest of time i won't go into more detail, since i've talked about both of those and you've heard a bit about them already. but one of the interesting questions is, on this database of materials from simulations, can you actually do machine learning to figure out which functional characteristics are associated with which atomic structures in the materials, and come up with patterns you can use to then engineer materials? and there are lots of analysis problems that come up with the images and other kinds of data that come out of the light sources as well. one of the system-level problems that comes up with these sorts of experiments is that the data comes out at a certain time, when the experiment is running, and some of the computing has to happen immediately, because you have too much data to store all of it. the question is also whether you can schedule jobs in realtime on a system like nersc: assuming you can get the data across the network, can you schedule a job so that you can actually do some of the analysis on the fly? that doesn't mix very well with the traditional hpc model of scheduling these systems, which is very much a batch-oriented queue of jobs — as you know, your jobs can sit in queues for hours or even days if they're very large — and so that kind of realtime requirement isn't a very good fit for that type of computation.
the next example is from genomics. last time i think i talked a little bit about this genome assembly problem, which is trying to reconstruct genomes from a bunch of fragments that are read from the sequencers, and then there's this even harder problem, metagenome assembly, where, for example, in that scoop of water you scoop up in an environment you're looking at a whole bunch of microbial communities. in that case the problem is even harder than the normal assembly problem, because some of the microbes will occur at very high frequency and some will have very low percentages of representation in a particular sample. it's easier to assemble the ones that occur with high frequency, but how do you get at the ones that are at very low frequency, since they might just look like errors or noise in the data you collected overall? so there's a new assembly technique we're working on to actually take these things apart and even try to get at the things that have low representation in the sample. so why do you care about these metagenome samples? these are a bunch of examples from the joint genome institute of why they look at assembled metagenomes, and i'll just give maybe two. one of them is actually discovering new forms of life. it turns out there are forms of life whose existence we don't necessarily know about, because we don't really see them — they're just a microbe — and so when you put these sequences together, if you can get enough confidence that you've got a new genome sequence that is not something that has been seen before, you can actually discover new forms of life. so that's kind of a fundamental science question. but then, from more of an engineering or applications perspective, what you're trying to figure out is whether you can use a particular set of microbes to produce something you care about. for example, one of the problems we have today is antibiotics. it seems like we have a bunch of antibiotics, but it turns out the market doesn't really encourage drug manufacturers to develop more of them, because when you get a new one, doctors don't want to prescribe it — if they prescribe it too much, the infections will become resistant to it — and so it turns out you don't make very much money off of a new antibiotic. and you can make antibiotics with these different microbial communities. so one idea is to be able to do manufacturing like this by studying a whole bunch of microbes and then putting the information together to manufacture what you want.
these algorithms, as i mentioned before when i talked about upc and upc++, the pgas models, have these big distributed-memory graph structures which are represented as hash tables, so it's a fairly different kind of computational load than most of the patterns you saw in this class when you were looking at sources of parallelism. for example, having low-latency interconnects and low-overhead communication is really important for being able to walk around a very large-scale graph structure, and the algorithms i talked about specifically look at the assembly of these complex genomes or sets of genomes. and then they also want to do comparisons across databases: you're trying to figure out the functional behavior of a particular gene, and to do that you compare it against existing databases of genes. i think that julian borrill was here -- is there a question?
>> so do we have to upgrade homework three?
>> upgrade homework three. was that the genome assembly one?
>> yes, right. is it too easy?
>> you mean... i don't know. you can ask the students if it was too easy. one piece of the genome assembly problem is what you did in homework three — that's right, i forgot that you've all been, hopefully, working hard on that — so you were working on the contig generation problem, which is where you're walking along the graph to make longer and longer pieces. what you may want to do in a metagenome case is have multiple graphs being constructed at the same time; that's one way of thinking about it. or you run the algorithm over and over again with different values of k, and in addition do some other kinds of analysis. so yes, we can certainly make the problem harder any time you're ready for that.
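to make the contig idea concrete, here's a minimal sketch in plain python (not the homework or the upc++ code; the read data and cycle handling are simplified): k-mers go into a hash table that maps each k-mer to its next base, and a contig is built by repeated lookups. in the distributed version that dict becomes a distributed hash table, so each lookup may be a remote read — which is why the low-latency communication mentioned above matters.

```python
# toy contig generation via a k-mer hash table walk (illustrative only)
def build_kmer_table(reads, k):
    table = {}
    for read in reads:
        for i in range(len(read) - k):
            table[read[i:i + k]] = read[i + k]   # k-mer -> next base
    return table

def walk_contig(table, start_kmer):
    contig, kmer, seen = start_kmer, start_kmer, {start_kmer}
    while kmer in table:
        contig += table[kmer]                    # extend by one base
        kmer = contig[-len(start_kmer):]         # the new rightmost k-mer
        if kmer in seen:                         # stop if we loop back on ourselves
            break
        seen.add(kmer)
    return contig

reads = ["ACGTACGTGA", "CGTGACCT"]
table = build_kmer_table(reads, k=4)
print(walk_contig(table, "ACGT"))                # -> ACGTGACCT
```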
i know that julian borrill was here on tuesday talking about cosmology. this is a different aspect of cosmology: that's a picture of saul perlmutter, who's looking at supernovae, whereas julian was mostly talking about the cosmic microwave background. the point here is that people use this image data and have used it to make major discoveries, in this case also about the expansion of the universe. the problem — and i think julian may have mentioned that peter nugent works more on this supernova detection and analysis side — is what they call the systematic bias problem. they've gotten to the point where they use machine learning algorithms to automatically find supernovae, exploding stars, in the images from telescopes at night, and there are actually pagers, or virtual pagers, that go off if it looks like there's a supernova, because you want to redirect the telescopes to capture an image or a video of that supernova as it's exploding. it doesn't take that long to explode, so if you can catch it early on you can get really good image data and then understand more about the process by which supernovae explode. but then the other problem is the systematic bias problem, which is that, for example, the telescopes in the northern hemisphere have a certain bias in their instruments that's different from the telescopes in the southern hemisphere. they want to put the two datasets together so they can try to remove the bias. and when you're trying to look for tiny signals — like exactly how far apart these two supernovae are, because you're trying to measure the cosmological constants in the expansion of the universe — those kinds of very fine details really matter. so there are better machine learning algorithms, there's removing the systematic bias i just mentioned, and then in both of these cases what they also do is run simulations. and i think this is a difference between the big data problems that come up in a commercial setting — and to some extent medicine, although i think this is also changing a little bit in medicine — where you may not care why it is that you put beer next to diapers in the grocery store; you only care that you sell more beer if you put it near the diapers, right? you don't care about whatever process in the human's mind causes that to be the case. but if you're a scientist and we say, oh, we've realized that there's a certain pattern in this cosmological data, your next question is: well, what is the theory that explains that? and so the simulations take a theory that you've proposed and show what it does, you know, in a simulation, to try to match it up with the data.
so you can't really divorce the simulation from the experiment, because the simulation is helping you interpret the data. the brain: another national initiative, which is truly trying to understand the behavior of the brain. there's lots of really interesting data that comes up in this case. there are a couple of labs working on it — lawrence berkeley lab and argonne national lab are two of the ones involved, and i think there's a larger set as well — but one simple way of thinking about the different types of datasets is that there's the dead brain problem and the live brain problem. in the dead brain problem you're looking at the structure of the brain: you may be slicing up a brain to try to figure out what you can determine about the connectivity and the structure. in the live brain problem you're trying to figure out dynamics: they're looking at things like mouse brains, trying to use sensors to detect exactly what's going on in the neural activity of the brain, and then trying to associate regions with different functional behavior. so those are two different aspects of the problem, and there's lots of data being collected. the europeans' big push in the brain was actually in simulation, and it was somewhat criticized for doing too much simulation before you understood the data; i think the brain initiative here in the us is looking more at combining the data with simulation — as i said, once we understand more about the data, then maybe we build models of what's happening and use those for simulations. multimodal analysis is one of the big problems that comes up here: you've got different datasets from things like mris or electron microscopes or ct scans or mass spec imaging, and you're trying to somehow put this data together in a way that lets you understand something about the functional behavior of different parts of the brain. i think i'll skip the genome one — you've seen it — and i think i also mentioned before this problem of doing a sort of matrix reconstruction when you're trying to put together seismic data and compare it against, in the middle of, a simulation. so this is where some of the things that affect the way you think about programming — the language in which you're programming, the ability to read or write into a large distributed data structure — come up in these data analysis problems more than they come up, i would say, in the simulation problems.
the last class of applications i'll mention: at the lab we've mostly been thinking about the large experiments. as i said, doe runs a lot of these big experimental facilities, whether it's telescopes, or satellites with telescopes, or the genome sequencers they use for analyzing the biology in the environment and coming up with biological remediation techniques, as well as things like the advanced light source and particle accelerators at places like cern. but the kind of emerging area in energy and science is also looking at sensors out there in the wild, if you will — sensors, for example, on our smart phones that tell us how fast we're moving in traffic. this looks like a google maps image, but it's actually not from google maps; it's from alex bayen's project, connected corridors, here at uc berkeley — he's also involved with the energy technologies area at the lab — and it's looking at, for example, the energy impacts of different modes of transportation. i heard a really interesting talk by somebody from lyft saying that, i guess, eighty percent of the people who moved into san francisco within the last couple of years do not own a car at all. and, you know, this model — i heard somebody else, horst simon, whose slides i'm going to use a little bit later, saying, why would anybody give up their car? i love driving, driving is great. but i think not everybody agrees with that, and there's sort of a claim that maybe babies born today will never learn how to drive. so you imagine a world in which the cars are all sitting parked out who knows where, and every morning when you're getting ready to leave, a car comes and picks you up at your house so you don't need to worry about parking in a congested area, and they're all in fleets and things like that, so the highway systems are much more efficient. but the interesting thing is that what companies may optimize for in that kind of a model and what the government may want to optimize for — for example, reducing climate impacts — may not be the same, right? so there are commercial interests, there are vital interests, there are government policy interests, there's the cost of infrastructure, all of these different things, and so what you're looking at is using some kind of model and analysis of the transportation system. but in that case the data is not coming from an experiment, it's coming from observation. by the way, i didn't understand this until i talked to alex a bit about it: i thought that what we're seeing in things like google maps, or other models like this, waze and so on, is observational data. it's actually a simulation — of the kind of fluid dynamics, if you will, of the transportation system — that is driven by the observational data they're seeing. so it's not just reporting raw data; it's matching that data with a simulation model. because of the way traffic behaves, in terms of how congestion forms, they can get a much more accurate picture of what the traffic is going to look like if they actually fit it with a model.
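here's a toy sketch of that "simulation driven by observations" idea (illustrative only; real traffic systems use far richer flow models and estimation filters than this): a simple model predicts the next speed, and each incoming observation nudges the estimate back toward reality.

```python
# toy data assimilation: blend a model prediction with each new observation
def assimilate(model_speed, observed_speed, gain=0.3):
    return model_speed + gain * (observed_speed - model_speed)

speed = 60.0                           # model state: estimated speed, mph
for obs in [58.0, 40.0, 22.0, 25.0]:   # incoming probe-vehicle observations
    speed = 0.95 * speed               # toy model step: congestion slowly builds
    speed = assimilate(speed, obs)     # correct the model with the observation
    print(f"estimated speed: {speed:.1f} mph")
```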
the power grid is another area where this comes up, and in both of these cases one of the questions is whether you can understand, and then even influence, human behavior. so this is looking at how people use electricity under different pricing models, at different times of day and different temperatures, and then trying to figure out exactly what causes them to change their use of electricity depending on the price — how high do you have to raise the price before they notice and actually change their behavior, and things like that? okay. so that's kind of mostly the science. i'm going to switch now and mostly start talking at a lower level, on more of the computer science issues and some of the things in the underlying systems. but first, just to mention that science data is big and it's growing, and it's growing for various specific reasons that we can point to. this slide just summarizes some of the examples i've talked about, but shows that they have the four v's, if you will, of big data, which i think originally came from ibm:
velocity, volume, variety and veracity — veracity being how noisy the data is, things like that. and whether you're talking about cosmology or materials science or climate, or these light sources which are used in materials as well as biology and chemistry, or high energy physics, or biology, all of these have data rates that are growing exponentially. now, how fast are they? lots of things are growing exponentially — what's the constant? this is a graph i put together a few years ago, and everybody likes it because it's so simple, but i have to tell you what it actually is: it's really just looking at average growth rates, extrapolated over time, kind of averaged out, and then plotting those growth rates on a simple excel plot so you can see the difference between the exponents. you all know that a different exponent makes a big difference in exponential growth, but this makes it more graphical. if you look at processor speed down here — our favorite, moore's law, at least until it started slowing down, though it's all smoothed out here so you can't tell; actually, i think that is moore's law, so it's probably transistor density — the curve up at the top is sequencers. genome sequencers got another boost recently with a new sequencing technology, so they're continuing to get higher and higher rates. and then detectors — detectors are all these different kinds of devices; you could think of the ccds in your cameras, but there are other kinds of detectors that are used, and the lab has actually developed some very high-speed, very dense detectors — and the data rates coming out of detectors have been increasing faster than even the processor speed, or even the genome sequencers, at least according to the historical trend numbers. and of course you could say, well, i don't really care about processor performance, i care about the speed of data, because these are all data-intensive problems — but poor memory is down at the bottom of the growth chart.
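just to make the point about exponents concrete, here's a tiny sketch (the annual growth rates are illustrative assumptions, not the numbers on the slide): even modest differences in the exponent diverge enormously over a decade.

```python
# compare hypothetical annual growth factors over ten years
rates = {"detectors": 2.0, "sequencers": 1.8, "transistors": 1.4, "memory bw": 1.2}
years = 10
for name, r in rates.items():
    print(f"{name:12s} grows by a factor of {r ** years:>8,.0f} in {years} years")
```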
okay, so where do these kinds of big data rates come from? this is just an example of something called a stem detector, developed by peter denes, and it's producing a data rate of a hundred thousand frames per second in this pixelated detector. so what does that mean from a data velocity standpoint? this little graph over here — maybe a little small to see — shows, for different technologies, thousands of frames per second on the y-axis, so kiloframes per second, and the number of pixels you're getting out of it. the red squares give you the kiloframes per second; the blue dots tell you the number of gigabits per second of network bandwidth you need to get the data off of the detector. and with this new detector they're building, they actually went and got brocade to send them special networking gear so they could get the data off — i believe this was the cryo-em, the electron microscope facility they were building with this new detector — so they could send the data to nersc rather than figuring out how many memory sticks they would need to plug into something or other to store it. so because of both the density and the frame rate of these detectors, the network and data-rate requirements are really going up dramatically as well.
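a quick back-of-the-envelope version of that data-velocity point (the hundred thousand frames per second is the number from the talk; the frame size and bits per pixel below are illustrative assumptions, not the detector's actual specs):

```python
# rough detector output bandwidth estimate
frames_per_second = 100_000
pixels_per_frame = 1024 * 1024        # assumed one-megapixel frames
bits_per_pixel = 16                   # assumed 16-bit samples

gbps = frames_per_second * pixels_per_frame * bits_per_pixel / 1e9
print(f"raw detector output: about {gbps:,.0f} gigabits/s ({gbps / 8:,.0f} gigabytes/s)")
```

even with much smaller frames than assumed here, you end up far beyond what a commodity network link, let alone a memory stick, can absorb, which is why the special networking gear and the wide-area transfer to a center like nersc matter.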
okay, let's talk a little bit about algorithmic convergence, and i'll probably say a bit less about this than i was planning to. i gave you, in the science examples i talked about, some examples of the different algorithms that come up. but the thing i'll talk about here is that i went into this a few years ago assuming that big data algorithms were completely different from simulation algorithms, because that was really the conventional wisdom, right? oh, we need a different kind of machine, we need different kinds of algorithms, it's all about graphs, it's all about something completely different. so a couple of years ago the national academies did a report — i think it was actually co-chaired by our own mike jordan — and they looked at what computational kernels come up in big data analysis problems, and they called them the seven giants of data. that's because they knew about phil colella's seven dwarfs of simulation. so there are the original seven dwarfs; you may have heard of the 13 motifs, which came out of the parlab project and expanded the list somewhat — in fact, i think some of those 13 show up in the list on the left — but these were the original seven things that come up in simulation. these should all look familiar to you now, because i think you've heard about all of them throughout the course: particle methods, both structured and unstructured meshes, sparse and dense linear algebra, spectral methods like ffts, and so on. those are really the building blocks of a lot of simulation codes, so if we can figure out how to make all of these things go fast, we can make a lot of simulation problems go fast. so how different are the seven giants of data? well, first of all, they had to collapse linear algebra into one because they wanted a list of seven, so they collapsed dense linear algebra and sparse linear algebra — and they do use both of them, right? if you look at something like training a convolutional neural net, it looks like dense matrix-matrix multiply; i think using it afterwards is more of a dense matrix-vector multiply; but dense linear algebra comes up.
i'm working on another machine learning algorithm which uses a large sparse-dense matrix multiply, so it's a combination of sparse linear algebra and dense linear algebra. the interesting thing — and i would say the difference between a lot of the data analysis problems and the simulation problems — is that there's less structure in the sparse matrices in the analysis problems, because you don't have a physical domain that you're starting from that provides you with some structure, right? if you have a nearest-neighbor computation on a two-dimensional rectangular grid — a very simple case — you can figure out what that looks like: as a sparse matrix it looks like a kind of banded, pentadiagonal matrix. and even if you make it an unstructured grid, okay, you don't exactly have a pentadiagonal matrix, but you're going to have something with a similar sort of structure if it has a two-dimensional mesh-like structure. whereas in the case of data analysis, you may have unstructured-looking data to start out with, because your problem is trying to figure out where the structure is. and in fact, what this machine learning algorithm is doing is a thresholding operation, which i think comes up in many of these, where you're doing an operation that is close to, or even is, a dense matrix multiply, and then you're trying to figure out which values are small enough that you can treat them as zero. so computationally it's still a dense problem, and afterwards it becomes a sparse representation of the problem.
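a small sketch of that thresholding pattern (illustrative only): the product is computed as a dense matrix multiply, and only afterwards are the small entries dropped to leave a sparse representation.

```python
import numpy as np
from scipy.sparse import csr_matrix

a = np.random.rand(500, 300)
b = np.random.rand(300, 400)

c = a @ b                                    # dense compute: matrix-matrix multiply
threshold = np.percentile(np.abs(c), 95)     # keep only the largest 5% of entries
c_sparse = csr_matrix(np.where(np.abs(c) >= threshold, c, 0.0))

print(f"kept {c_sparse.nnz} of {c.size} entries "
      f"({c_sparse.nnz / c.size:.1%}) after thresholding")
```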
but what was a surprise to me in this list was actually how much similarity there was between the two. there are certainly things on each list that don't exactly correspond to something on the other side, but it turns out that a lot of these data analysis algorithms do turn into sparse and dense linear algebra — even in graph theory. i think you had aydin; did aydin come and talk about graph theory? you can think of a lot of graph algorithms as being sparse matrix algorithms as well. so there's a lot more similarity, i would say, than difference between the two sets of algorithms. okay. so the next set of questions is about software convergence: how similar are the software stacks? there's a lot of discussion about the fact that these software stacks are quite different, and in fact the national strategic computing initiative, if you go back to it, was really looking at trying to encourage some convergence between the two. why is that? because if they tell the department of energy, go off and build exascale computers, and those exascale computers are only good for the department of energy, it's not a very good investment for the government to make. they want to make sure that those computer systems would actually be good for things other than modeling and simulation — that they might be good for data analysis problems that people in industry care about. but at the next level down from the algorithms — we've said there's some algorithmic similarity — is the software different? and the answer is that the software is actually fairly different between the two. the first difference, and i think i used a version of this slide before, goes back to what i talked about earlier:
when you're really talking about analytics, where you're trying to find structure in some unstructured data, you don't have the ability to chop it up into meaningful pieces that minimize the number of edge cuts across the different partitions. what you've heard about in modeling and simulation is: you take a physical domain, you put a mesh on it, you chop it up into pieces — domain decomposition — and you give each processor, for example, one of those pieces. the name of the game in that graph partitioning problem is to give each processor an equal-sized piece while minimizing the number of edges cut between the processor boundaries, because that will determine the amount of communication you need. assuming you can do that — and we can; there are the different graph partitioning algorithms you heard about, and also different meshing algorithms for laying out the meshes — then at a high level, once you've got that partition, you can write a message passing program that says: i divide up my domain into pieces, i compute independently on each piece, i then send and receive some data between them, that all happens in one phase, so it's a bulk synchronous program, and then i repeat by computing and communicating again. and if the domain is changing, maybe every once in a while i go back, remesh, and divide it up again, but i can still live within that pretty bulk synchronous kind of model. the key was that because i had a physical structure that told me i could partition the data relatively quickly, i could then use that partition for many, many time steps.
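here's a minimal sketch of that bulk synchronous pattern (a 1-d toy, not a real application): each rank owns a chunk of the domain, does local work, and then exchanges ghost cells with its nearest neighbors once per time step. it assumes mpi4py is installed and the script is launched with mpiexec; the stencil is just a placeholder.

```python
# bulk synchronous 1-d halo exchange sketch (run with: mpiexec -n 4 python bsp.py)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                         # interior cells owned by this rank
u = np.zeros(n_local + 2)              # two extra ghost cells at the ends
u[1:-1] = np.random.rand(n_local)

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # local computation on the interior (a toy 3-point stencil)
    u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])
    # one communication phase per step: exchange boundary values with neighbors
    comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
```

the contrast coming next is that this partition-once, communicate-in-phases structure is exactly what you cannot set up when the structure of the data is itself the unknown.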
the problem that comes up if i'm looking at really unstructured data — one of these is a twitter graph and the other one is some kind of web connectivity graph, i don't even remember anymore — is that the question you're trying to answer is: what is the connectivity structure of this graph? so you can't figure out the graph partition first and then say, okay, now i can run a really fast sparse matrix-vector multiply that allows me to figure out how to partition the graph, because you had to partition the graph in order to solve that problem in the first place. so instead, what happens in some of these algorithms — which you also saw when i talked about the genomics, and which you saw in homework three — is that you just start computing, you start walking around on things, and you have to send messages whenever you need to remotely write or remotely read data. you grab whatever you need whenever you need it, and it's a fairly different high-level execution model. so that's one of the differences between the two domains — although, to be fair, some of these things, when you turn them into linear algebra, may actually be able to execute as something that is partitioned; maybe it's not an optimal partition of the sparse matrix, but you can come up with a pretty good one. okay, so an observation that other people have made is that the software stacks used in data analytics and in high performance scientific simulation are actually quite different. this is a picture put together by jack dongarra and dan reed, both of whom have been working in the hpc field for many years; dan has been doing more work in the data world — he spent a number of years at microsoft and is now at iowa state, i believe. it's an interesting picture to see what the differences are, and i'll have another slide where i go through it in a little more detail, but you can see they're pretty different stacks on the top, right? there's very little commonality between the file systems that are used, the software layers, the programming models — everything is fairly different between the two systems. and then down at the bottom, well, we've got ethernet switches, maybe infiniband switches, maybe something even faster like the switches in the cray systems, in something like the edison or cori systems you've been using. and maybe there are gpus over here; there are sometimes gpus, i think, on the other side, if you're looking at things like convolutional neural nets, but that maybe hasn't propagated quite as much into the data analytics space as it has on the scientific simulation side. so let's look at the two ecosystems and try to figure out why they look different. i flipped the picture upside down because it was easier for me to think about it this way.
let's start with the processor building block: both of them start with commodity processors. i was at a meeting earlier this week where somebody from facebook — i guess they have five big, huge data centers — talked a little bit about the kinds of systems that are in them, and she said they don't buy the latest processors. so there may be a difference in the commodity-ness of the processors, in the sense that the commercial cloud providers may still be buying sandy bridge processors for their data centers, whereas the simulation people are typically going to try to buy the latest processor when they put their hpc systems together — and there's going to be a cost to buying the most recent processor rather than something a little bit older — but basically it's the same kind of processor, right? now, on the hpc side, as i mentioned, there are accelerators, and of course the cori system — i've heard the white boxes are in, which are the little versions of the cori nodes that are coming, with the knights landing processors on them. so i would say the hpc or simulation side is pushing a little more aggressively on both gpus and things like xeon phi, and i'll say more about why that is with the march to exascale, but basically it's to try to get more computing per joule of energy you're spending, so you're trying to get to lighter-weight, simpler processors. they both use dram, though again maybe different kinds of dram. i heard a story — i don't know if it's true — about a google executive who was visiting a dram manufacturer, and they had this big, huge bin of dram parts that looked like they were being excessed, and the google person said, well, what are you doing with those dram chips? and they said, well, we're throwing them out because one of the memory banks is bad in that whole lot of chips. and he said, well, how much will you sell them to me for? because we'll just rewrite the operating system to map out that particular page — which is something i would never consider doing at a center like nersc, because the idea of trying to maintain our own version of the operating system — we don't have that many programmers around. but basically they're still using the same kind of dram, other than maybe a few of those differences. is there another question?
so there's a local disk on the cloud systems, and there's a shared file system on the hpc systems. the cloud systems are much lower density — they have space between the blades — and that allows them to be air cooled.
>> [inaudible].
>> oh, and nvram stands for non-volatile ram, so think of it as flash memory — there are a few different technologies — but it's something that's supposed to be higher bandwidth than disk; it's not a spinning disk, it's a solid state sort of ram. and cori, even the phase one system, has some non-volatile ram on it. one of the reasons i stuck it in a couple of different places here — did i put it in a couple... yes — is that there's a question about how you should think of this new type of memory. is it an extension of dram, so each of my compute nodes has a little bit of nvram on it and it just makes my dram bigger, or maybe even replaces my dram? or do i think of it as just a higher-speed disk, so if i'm an hpc systems designer i stick it out somewhere in the network? that's the way it's configured in cori: it's not on each node, it's somewhere in the network — nvram nodes, just like there are some disk i/o nodes in the system. so i think there's still some question about exactly how we're going to use it, and whether you think of it as an extension of memory or as an extension, or acceleration, of the file system. so there are different kinds of models for this.
>> [inaudible].
>> oh, does it? oh, right, the people here can only see this, is that right? okay. so there's a very different utilization metric: i think the cloud providers will run their systems at under 50 percent utilization. that's because they never want people to wait, right? especially if you're building a cloud for a company like microsoft, one of their goals is to have the fastest search engine, so you run your systems at fairly low utilization — i don't know exactly what number, but low enough that even at surge times you always have enough computing capacity to deal with the surge. whereas on the hpc systems, the reason you get queued up in the queue is because we have to run those systems, by kind of a government mandate, at about 90 percent utilization,
so we always let scientists wait, and the machine is heavily utilized — the scientist, not so heavily utilized. so, fault tolerant programming. this is the one that was maybe the biggest surprise to me when i realized what was going on, or at least this is my understanding of it right now: the cloud providers are putting local disks in their nodes. local disks, it turns out, have much higher failure rates than the dram, which i think still has higher failure rates than the processors, but it might be a few percent — one of the web pages i was looking at yesterday said something like 3 or 4 percent of the disks will die in a given year. so what this means when you're running a huge center is that the probability that a node fails is much higher if you stick a local disk on it than if you don't have a local disk on it.
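a rough back-of-the-envelope sketch of why that matters at scale (the 3 percent annual disk failure rate is from the talk; the node count is an assumption for illustration):

```python
p_disk_year = 0.03       # annual failure probability of a single local disk
nodes = 10_000           # hypothetical number of nodes, each with a local disk

# probability that a given disk fails on any given day, then that at least
# one disk somewhere in the center fails that day
p_disk_day = 1 - (1 - p_disk_year) ** (1 / 365)
p_any_failure_day = 1 - (1 - p_disk_day) ** nodes
print(f"chance of at least one disk failure per day: {p_any_failure_day:.0%}")
```

with those assumptions you lose a disk somewhere more days than not, so the programming model has to expect failure as the normal case.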
programming model. whereas the message passing programming model is based, first of all, on the idea of batch scheduling in the system: i might have to wait in the queue, but when i get through the queue i get whatever chunk of the system i asked for, and that gives you the ability to write really tightly coupled applications that communicate frequently with nearest neighbors without going through some sort of global reduction and shuffle operation. so the scheduling policies are different, the applications are different, and the coupling -- the speed of the network -- is also related to this ability to have algorithms that are tightly coupled. so you can see the programming models that come out of the bottom of that set of constraints, and each one of them is internally consistent: if i'm assuming failures, then i have to have a fault tolerant programming model. by the way, the way i deal with fault tolerance in my spark or hadoop model -- especially in hadoop, but to some extent in spark as well -- is that i write things out to local disk so they'll be persistent, or i write them out to some disk. and that creates overhead in the programming model that you don't have if you assume that the system is resilient -- is failure-free -- which is really what you're assuming on the mpi side. instead, on the mpi side, what they do for resilience is more of an after-the-fact thing that's cobbled on to the side, a checkpoint/restart model that says every once in a while i'll stop my simulation and dump out the state. but it's quite different between the two, and i think they're both internally consistent in the sense that if you're going to build a system with local disk you have to have something fault tolerant, and vice versa.
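to make that concrete, here is a minimal sketch in python of the two resilience styles. it is not real hadoop, spark, or mpi code; the failure rate, the work itself, and the checkpoint file name are all made up for illustration. the first function is a task farm that simply resends work that fails, the second is a simulation loop that assumes a failure-free run and instead checkpoints its state every so often.

import pickle
import random
from concurrent.futures import ProcessPoolExecutor, as_completed

def analyze_chunk(chunk_id):
    # stand-in for an independent map task; occasionally "fails" like a flaky node
    if random.random() < 0.1:
        raise RuntimeError("node running chunk %d died" % chunk_id)
    return chunk_id, sum(range(chunk_id * 1000, (chunk_id + 1) * 1000))

def task_farm(chunks, max_rounds=3):
    # hadoop-style resilience: independent work units, failed ones are simply resent
    results = {}
    pending = list(chunks)
    for _ in range(max_rounds):
        if not pending:
            break
        failed = []
        with ProcessPoolExecutor() as pool:
            futures = {pool.submit(analyze_chunk, c): c for c in pending}
            for fut in as_completed(futures):
                try:
                    cid, value = fut.result()
                    results[cid] = value
                except RuntimeError:
                    failed.append(futures[fut])  # resend this chunk in the next round
        pending = failed
    return sum(results.values())  # the final "reduction"

def simulate(steps, checkpoint_every=100, ckpt_path="state.ckpt"):
    # mpi-style resilience: assume the run itself is failure-free, but periodically
    # dump the state so a crashed job can be restarted from the last checkpoint
    try:
        with open(ckpt_path, "rb") as f:
            step, state = pickle.load(f)  # restarting after a crash
    except FileNotFoundError:
        step, state = 0, 0.0  # fresh start
    while step < steps:
        state += 0.001 * step  # stand-in for one timestep of the simulation
        step += 1
        if step % checkpoint_every == 0:
            with open(ckpt_path, "wb") as f:
                pickle.dump((step, state), f)
    return state

if __name__ == "__main__":
    print("task farm result:", task_farm(range(32)))
    print("simulation result:", simulate(1000))

notice where the overhead lives: the task farm pays for it on every run by keeping work independent and re-executable, while the checkpointing loop pays only at checkpoint time and on restart.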
any questions? okay. so the last thing about system convergence is really a little bit about the cost of the two different systems. this was data that we collected about five years ago comparing commercial clouds to hpc systems and the slowdown that you get for different kinds of applications. these numbers do change depending on exactly what the cloud system looks like, and to some extent what the hpc system looks like, and this was actually comparing to more of an infiniband cluster rather than a really high performance system, as i recall. but for some of the applications, like blast and gamess, there's actually not very much performance difference between the two. and that's because blast is a genome alignment algorithm: it replicates the reference that you're aligning to and then independently runs all the different jobs against it. so in that case the network doesn't matter, nor does the speed of synchronization; there's no need for batch scheduling because you're basically just using it as a big task farming system. whereas paratec, which is this tall bar right here, is doing a big global fft in the middle, a 3d fft. and as you know from studying ffts, that involves all-to-all communication, and both the speed -- the bisection bandwidth of the network -- and the ability to have all of the processors tied together
in order to do that all-to-all communication are really important to the performance. so the cost differences vary a bit across the different types of systems as well, but i think the myth is that an hpc system is much more expensive, because people will look at it and say, well, you spent a hundred million dollars on that hpc system, and i can buy time in a cloud for only a few cents per core hour. but if you actually do the math you'll find out that both of them turn into a few cents per core hour. i think it's about two and a half cents per core hour right now at nersc. so the real question is how big of an allocation you have, which really determines how much that allocation of time costs you. and it goes back to questions like how you manage the resources: do you run at high utilization or low utilization, which will affect your cost? do you collect a profit or not? so the myth here is that supercomputers are expensive and clouds are cheap, which in some cases -- especially if you're trying to get really high performance access in the cloud -- can be quite far from the truth. and you also may pay for your data coming in and out.
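to see why both come out at a few cents per core hour, here is a back-of-the-envelope sketch in python. every number in it -- system cost, lifetime, operating cost, core count -- is an illustrative assumption, not actual nersc accounting; only the hundred-million-dollar scale and the 90 percent utilization come from the discussion above.

# rough cost model for an hpc system; all inputs are assumed, illustrative values
system_cost  = 100e6    # capital cost of the system, dollars
lifetime_yrs = 5        # years the system stays in service (assumed)
annual_ops   = 20e6     # power, cooling, and staff per year (assumed)
cores        = 200_000  # total cores in the system (assumed)
utilization  = 0.90     # hpc centers run heavily utilized, per the discussion above

total_cost       = system_cost + annual_ops * lifetime_yrs
core_hours       = cores * 24 * 365 * lifetime_yrs * utilization
cost_per_core_hr = total_cost / core_hours
print("effective cost: %.1f cents per core-hour" % (100 * cost_per_core_hr))

with these assumptions the answer lands around two and a half cents per core hour, the same ballpark as listed cloud prices, and you can see how utilization, lifetime, and whether anyone collects a profit move the number around.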
okay. the last set of things to talk about is a little bit about exascale. i think you've heard a lot about hpc in the past, and i decided today to talk a little bit less about exascale. what the exascale computing initiative is about is really getting real performance on real scientific problems, things that doe cares about. so the initiative is organized around the idea that there will be a set of applications, and they will use advanced methods and techniques to get high performance out of the systems that come from that. but let's say that's a little bit hard, and let's talk about an even simpler metric. let's say all we cared about was building an exaflop system that ran the linpack benchmark. so you've seen these graphs before, i think, from the top 500, and this was the slide that was put together -- this was from horst simon, although i think it's a standard top 500 slide -- and this is circa 2007. so here's the prediction of what performance would look like on the top 500 systems back in 2007, and it said that you would hit an exaflop in about 2017, right? so there's the -- whoops, sorry, that's the sum. here's the exaflop hitting more like 2018 for the fastest system. just to remind you, in case you don't look at these all the time: the blue curve in the middle here, n = 1, is the fastest machine in the world, at least the fastest on the list. the red line here is the slowest machine on the top 500 list, so it's roughly the 500th fastest machine in the world. and then the other one up here is the sum of the speeds of all of the top 500 machines. so you're looking at when the fastest machine becomes an exaflop: well, somewhere around 2018, maybe
november of 2018 or something like that. so what do things actually look like today? well, this is what the data looks like going out through 2015. the first 2016 measurements will come in july at the international supercomputing conference in germany, so we only have the 2015 numbers so far. the three curves are different colors now, but there's the fastest machine in the world, and it's flattened out. that's the tianhe-2 machine in china. now, it's flattened out before -- this was the earth simulator in japan for a while; i think that one was... whoops, maybe it's this one right here... was the earth simulator. so there have been times before when the fastest machine in the world sat there for a couple of years, but it's now been two and a half years, and it will probably be three years, that that machine sits at the top, which i think is a record for the length of time that a system has been the fastest on the top 500 list. but the other thing is, if you look at these trend lines -- there are a couple of different dashed curves and dashed lines in here -- there's a straight line on this blue curve going from here up through the top; you can see that dashed line up here, but there's another one that's much flatter, and that's the one you get if you look at the trend since 2013: it has been much flatter than the trend up until 2013. the slowest machine on the top 500 list actually turned over earlier; that curve turned over closer to 2008, which was probably the result of 2004 or 2005, when clock speed stopped scaling. a lot of the machines at the bottom of the top 500 list are still commodity x86 processors, as opposed to having gpus or xeon phis or some other kind of accelerator on them. so the machines at the very top got extended a little
bit longer, and the sum curve extended a little bit longer, because they had these accelerators on them, which get more computing per joule of energy than the systems at the bottom, which tended to be more conventional processors. so what actually limits computing performance? the way i like to think about it is that the exascale program is just a reflection of what's happening in the broader computing community, which is that computing performance is slowing down. that is, it's not getting slower, but it's not getting faster at the rate that it has gotten faster in the past. and that will affect everything from data analytics problems, whether they're running on the cloud or on your smart phone, and it will also affect the hpc modeling and simulation problems and the fastest machines in the world that are coupled together to solve those problems. so what limits that? well, it's heat, it's the amount of energy they're using and the heat density. so this is our how-hot-is-your-laptop picture -- hot enough to fry an egg. and that's one laptop, but if you look at the fastest machines in the world, this is the number of megawatts per machine.
and this is actually missing the tianhe-2 machine, which is up there at 18 megawatts. so all of the doe centers are trying to upgrade their facilities to at least 20 megawatts. we're trying to get to 20 megawatts up the hill at nersc, at the facility you visited; we're at 12 and a half megawatts right now. other facilities are already over 20. the person from facebook wouldn't say how many megawatts are in their data center, but guessing just from the size of those buildings -- she did say something like tens of megawatts -- i have a feeling they're more like 60 or something like that. now, they're running more racks than we're running, they're less densely packed, and they're air cooled, but nevertheless the same problems exist throughout the computing community. so the goal was originally, and i think we might have to relax this a little bit, to get an exaflop computer in 20 megawatts. okay? so what does that turn into? that means 20 picojoules per operation. and that's an interesting target even if you don't care about an exaflop computer: it says something about how much power you need inside soda hall in the machine room if you're going to run a little cluster here, or how much your cloud time is going to cost you even if you buy it from amazon, because they're paying the electricity bill as well, even if they get a discounted rate as we do at the lab. okay. so if you look at the history of this performance growth, the original supercomputers down here were these cray vector supercomputers: shared memory and single processor, but really wide vectors that ran certain kinds of array operations very quickly. then the killer micros came and took over, and we started building mpp systems, massively
parallel systems. then in about 2005 everybody realized they couldn't build faster single processors anymore, so they got the parallelism, and now the whole question is about energy: how little energy can you use per unit of computation, which of course is about both data movement and the cost of computing. and i'm sure jim has said before that most of the energy goes into moving the data around in the system. we sometimes call this the attack of the killer cell phones -- not because we're going to build our supercomputers out of cell phones, but it gives you the idea that if you want to go figure out how to build an exaflop system in 20 megawatts, you'd better talk to people who really understand low power computing devices, not people who only look at the high end. so these exascale problems really affect computing at all scales, independently, as i said, of whether you care about running a whole exaflop job, because mostly it's about power, and the way to deal with power is to increase parallelism and decrease the clock rate of the system. even down at the device level there are techniques that might give us more parallelism at lower energy, but you have to have more parallelism; overall you can save energy, but you're going to have to find more parallelism at all the different levels. and unlike the move from vector supercomputers to massively parallel computers, now we're worrying mostly about parallelism inside the processor chip, because that's where the real growth in parallelism will be. i don't think we'll make our hpc systems a lot larger, just because then resilience and power and everything become a big problem. so it's about moving the data around in the system, and the algorithms to map onto those architectures, and everything. so i'll just wrap up. this is kind of one
picture to give you an idea of how much energy you save by using lightweight cores rather than heavyweight cores -- and then hopefully we're going to do a little survey after that. this is a picture of the intel nehalem processor here on the right, the big one, a four-core processor from a few years ago, and there's a picture of a cell phone processor on the left. that intel nehalem processor is a fifty-gigaflops processor. so if you're going to build a supercomputer, as i said, people tend to want to build them out of the most high performance processors, or at least that's what we've been doing in the past. so you would certainly pick a fifty-gigaflops processor rather than a four-gigaflops processor, which is what the cell phone processor was. however, if you look at the power usage, this one is a hundred watts and this one is 0.1 watts. so there's a factor of 10 in speed in favor of the big processor, but a factor of a thousand in power in favor of the small cell phone processor. so overall there's roughly a factor of a hundred difference in energy per operation.
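here is the quick arithmetic behind those numbers, using the rough wattage and flop ratings quoted in the lecture rather than datasheet values, plus the exaflop-in-20-megawatts target from a few minutes ago.

# energy per operation: watts divided by operations per second, in picojoules
def picojoules_per_flop(watts, flops_per_sec):
    return watts / flops_per_sec * 1e12

exaflop_target = picojoules_per_flop(20e6, 1e18)  # 20 megawatts for 10^18 flop/s
heavyweight    = picojoules_per_flop(100, 50e9)   # ~100 w, ~50 gflop/s (nehalem-class)
lightweight    = picojoules_per_flop(0.1, 4e9)    # ~0.1 w, ~4 gflop/s (cell-phone-class)

print("exaflop-in-20-mw target: %.0f pj/flop" % exaflop_target)   # 20
print("heavyweight core:        %.0f pj/flop" % heavyweight)      # 2000
print("lightweight core:        %.0f pj/flop" % lightweight)      # 25
print("energy advantage:        %.0fx" % (heavyweight / lightweight))

the lightweight core comes out around 80 times better in energy per operation -- the "roughly a factor of a hundred" above -- and it is already close to the 20-picojoule exascale target, which is the whole point of the attack of the killer cell phones.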
and you see this in gpus as well as things like xeon phi, which may still be trailing a little bit in energy advantage compared to the really lightweight, very simple individual cores in a gpu processor. so it means you need to think about parallelism at a much lower level. i know you wrote code in openmp. i don't think you've written gpu code in here. you did? oh, you wrote gpu codes. so, yeah, i think there will be a lot more of that kind of code, whether it's for gpus, or vectors and things like that on xeon phi, and the lightweight threading that will come up on cori phase 2. so hopefully you'll want to continue doing this in your research, whether you're in computer science or some other science discipline, and maybe you'll get access to the cori phase 2 system through your research project and can try out the xeon phi systems next summer when they come into cori. so i was just going to end with a take-home message -- a little bit from my lecture, but maybe a little bit more from the class -- the things that i think you want to remember as you go out into the world and do high performance computing. the first one is, i like the verb to
roofline your code. so you saw the roofline model in here. what does this really mean? it means always understand what the fundamental bottleneck is that you're up against on your system, and try to figure out whether you're close to it or not. if you can't figure out which bottleneck you're close to -- whether it's memory bandwidth, or network bandwidth, or compute performance, or something else -- then you're probably not at the limit of the system. and i don't mean being at 99 percent of that limit, but if you're not within a factor of two of something you can point to in the system, then you probably have not performance engineered your code. maybe you don't care, but that would be something you should look at. i have to say that one of the things i find a little frustrating in reading some of the papers on the data analysis software stacks is the lack of absolute performance numbers in those papers. there are a lot of relative performance numbers that say things like spark is a hundred times faster than hadoop, but the analysis of how close spark lets the algorithm get to what the hardware can actually do is not so prevalent in those papers.
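to make "roofline your code" concrete, here is a minimal sketch. the machine peaks and the two example kernels are made up for illustration; substitute the peak flop rate and memory bandwidth of your own system.

# roofline bound: attainable gflop/s is capped by peak compute or by
# memory bandwidth times arithmetic intensity, whichever is lower
def roofline_gflops(arithmetic_intensity, peak_gflops=2600.0, peak_gb_per_s=100.0):
    # arithmetic_intensity is flops performed per byte moved from memory
    return min(peak_gflops, arithmetic_intensity * peak_gb_per_s)

# hypothetical kernels: a low-intensity stencil and a high-intensity matrix multiply
for name, intensity, measured_gflops in [("stencil", 0.5, 40.0), ("dgemm", 30.0, 2100.0)]:
    bound = roofline_gflops(intensity)
    print("%s: roofline bound %.0f gflop/s, measured %.0f gflop/s (%.0f%% of bound)"
          % (name, bound, measured_gflops, 100 * measured_gflops / bound))

that last column -- measured performance as a fraction of an absolute bound you can point to in the system -- is exactly the kind of number that tends to be missing from the relative spark-versus-hadoop comparisons.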
so hopefully you'll go off and write those papers. there are a couple of them that have been written recently; i think the factor of 10x is still in there between a well-optimized code and what you can get from something like spark for some machine learning algorithms, although getting the data in and out of the system is still a bottleneck on the hpc systems. the second one: understand the motifs of your applications. i would say especially if you're a computer scientist and you don't plan to go off and learn about biology or chemistry or physics or one of these other fields in detail, you should at least try to figure out how to ask enough questions, if you're working on benchmarks or on applications, to understand what class of problems you are designing an architecture, or programming model, or software system for. and the third one is, question the conventional wisdom. there are a lot of people going around saying the cloud is best, or the cloud is cheapest, or my system only runs on shared memory machines -- that was the story we heard about the genomics problem -- or my machine learning algorithm requires shared memory or gpus or something like that. so question the conventional wisdom. and don't think that data is all about business applications; there are a lot of data problems that come up in science
as well. and finally, the clouds and the hpc systems are optimized for different kinds of usage models, but the fundamental building blocks, as we saw from that picture, are really very similar, and there will be more gpus in the commercial cloud if they turn out to be useful for the applications of the people they want to sell to. the business models are quite different between the two: the time at nersc is free, except that you can only get it if you've got an allocation of time, and the time in the cloud you pay for. so that means your thesis advisor can trade off your salary with compute time in the cloud, but they can't trade off your salary with compute time at nersc. that's the difference between them. all right, thanks very much.
[applause]
>> so are there any questions before eta kappa nu gets started? one quick question, which is probably a longer discussion: is it time to reengineer 267 to change the dwarfs -- or giants -- that we emphasize? so, i'm not sure.
>> well, i have to admit i was thinking about this last night as i was putting my lecture together, and i would be really interested in hearing what this class says. maybe you can put it in your comments in your hkn survey, but would it be better to have a class on high performance data analytics and a class on high performance modeling and simulation, or to have one class that tries to do both? i mean, yes, we saw there was a lot of similarity between them, but it's also a little hard to cram all of the different kinds of algorithmic ideas into one class.
>> we could just talk faster.
>> we could talk faster. i think you and i talk pretty fast as it is. [laughing]. all right, thanks again. and then you have an announcement.