More from Cremieux Recueil
What you read in an abstract, a title, or even in the body of a paper might be the opposite of what the paper actually shows, and sometimes the results are just made up.
Cognitive testing might be making a comeback. That could be very important.
Is that immigrant high-skilled or do they just have a fancy degree?
What does history tell us about how the executive branch can run the government?
More in science
It’s time for a new north star for this company
In partnership with Google, the Computer History Museum has released the source code to AlexNet, the neural network that in 2012 kickstarted today’s prevailing approach to AI. The source code is available as open source on CHM’s GitHub page.

What Is AlexNet?

AlexNet is an artificial neural network created to recognize the contents of photographic images. It was developed in 2012 by then University of Toronto graduate students Alex Krizhevsky and Ilya Sutskever and their faculty advisor, Geoffrey Hinton.

The Origins of Deep Learning

Hinton is regarded as one of the fathers of deep learning, the type of artificial intelligence that uses neural networks and is the foundation of today’s mainstream AI. Simple three-layer neural networks with only one layer of adaptive weights were first built in the late 1950s—most notably by Cornell researcher Frank Rosenblatt—but they were found to have limitations. [This explainer gives more details on how neural networks work.] In particular, researchers needed networks with more than one layer of adaptive weights, but there wasn’t a good way to train them. By the early 1970s, neural networks had been largely rejected by AI researchers.

[Photo: Frank Rosenblatt, left, shown with Charles W. Wightman, developed the first artificial neural network, the perceptron, in 1957. Credit: Division of Rare and Manuscript Collections/Cornell University Library]

In the 1980s, neural network research was revived outside the AI community by cognitive scientists at the University of California San Diego, under the new name of “connectionism.” After finishing his Ph.D. at the University of Edinburgh in 1978, Hinton had become a postdoctoral fellow at UCSD, where he collaborated with David Rumelhart and Ronald Williams. The three rediscovered the backpropagation algorithm for training neural networks, and in 1986 they published two papers showing that it enabled neural networks to learn multiple layers of features for language and vision tasks. Backpropagation, which is foundational to deep learning today, uses the difference between the current output and the desired output of the network to adjust the weights in each layer, from the output layer backward to the input layer (a minimal numerical sketch of the idea appears below).

Hinton later moved to the University of Toronto. Away from the centers of traditional AI, his work and that of his graduate students made Toronto a center of deep learning research over the coming decades. One postdoctoral student of Hinton’s was Yann LeCun, now chief scientist at Meta. While working in Toronto, LeCun showed that when backpropagation was used in “convolutional” neural networks, they became very good at recognizing handwritten numbers.
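To make the backpropagation idea concrete, here is a minimal numerical sketch in plain NumPy: a tiny two-layer network fit to made-up data, with the output error propagated backward to adjust each layer of weights. It is purely illustrative (the data, layer sizes, and learning rate are arbitrary choices) and is unrelated to the AlexNet code itself.

```python
# Minimal illustration of backpropagation (not from the AlexNet code).
# A two-layer network learns a toy mapping; the output error is propagated
# backward, adjusting the weights layer by layer from output toward input.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                       # 64 toy inputs, 3 features each
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy binary target

W1 = rng.normal(scale=0.5, size=(3, 8))  # first layer of adaptive weights
W2 = rng.normal(scale=0.5, size=(8, 1))  # second layer of adaptive weights
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # Forward pass
    h = sigmoid(X @ W1)      # hidden-layer activations
    out = sigmoid(h @ W2)    # network output

    # Backward pass: push the output error back toward the input layer
    err_out = (out - y) * out * (1 - out)      # gradient at the output layer
    err_hid = (err_out @ W2.T) * h * (1 - h)   # gradient at the hidden layer

    W2 -= lr * (h.T @ err_out) / len(X)
    W1 -= lr * (X.T @ err_hid) / len(X)

print("final mean error:", float(np.abs(out - y).mean()))
```

Networks like AlexNet rely on the same gradient bookkeeping, just across many more layers and with convolutional weight sharing.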
ImageNet and GPUs

Despite these advances, neural networks could not consistently outperform other types of machine learning algorithms. They needed two developments from outside of AI to pave the way. The first was the emergence of vastly larger amounts of data for training, made available through the Web. The second was enough computational power to perform this training, in the form of 3D graphics chips, known as GPUs. By 2012, the time was ripe for AlexNet.

[Photo: Fei-Fei Li’s ImageNet image dataset, completed in 2009, was pivotal in training AlexNet. Here, Li, right, talks with Tom Kalil at the Computer History Museum. Credit: Douglas Fairbairn/Computer History Museum]

The data needed to train AlexNet was found in ImageNet, a project started and led by Stanford professor Fei-Fei Li. Beginning in 2006, and against conventional wisdom, Li envisioned a dataset of images covering every noun in the English language. She and her graduate students began collecting images found on the Internet and classifying them using a taxonomy provided by WordNet, a database of words and their relationships to each other. Given the enormity of the task, Li and her collaborators ultimately crowdsourced the labeling of images to gig workers, using Amazon’s Mechanical Turk platform. The ImageNet team then launched a competition in 2010 to encourage research teams to improve their image recognition algorithms, but over the next two years the best systems made only marginal improvements.

NVIDIA, cofounded by CEO Jensen Huang, had led the way in the 2000s in making GPUs more generalizable and programmable for applications beyond 3D graphics, especially with the CUDA programming system released in 2007. Both ImageNet and CUDA were, like neural networks themselves, fairly niche developments that were waiting for the right circumstances to shine. In 2012, AlexNet brought together these elements—deep neural networks, big datasets, and GPUs—for the first time, with pathbreaking results. Each of these needed the other.

How AlexNet Was Created

By the late 2000s, Hinton’s grad students at the University of Toronto were beginning to use GPUs to train neural networks for both image and speech recognition. Their first successes came in speech recognition, but success in image recognition would point to deep learning as a possible general-purpose solution to AI. One student, Ilya Sutskever, believed that the performance of neural networks would scale with the amount of data available, and the arrival of ImageNet provided the opportunity. In 2011, Sutskever convinced fellow grad student Alex Krizhevsky, who had a keen ability to wring maximum performance out of GPUs, to train a convolutional neural network for ImageNet, with Hinton serving as principal investigator.

[Photo: AlexNet used NVIDIA GPUs running CUDA code trained on the ImageNet dataset. NVIDIA CEO Jensen Huang was named a 2024 CHM Fellow for his contributions to computer graphics chips and AI. Credit: Douglas Fairbairn/Computer History Museum]

Krizhevsky had already written CUDA code for a convolutional neural network using NVIDIA GPUs, called cuda-convnet, trained on the much smaller CIFAR-10 image dataset. He extended cuda-convnet with support for multiple GPUs and other features and retrained it on ImageNet. The training was done on a computer with two NVIDIA cards in Krizhevsky’s bedroom at his parents’ house. Over the course of the next year, he constantly tweaked the network’s parameters and retrained it until it achieved performance superior to its competitors. The network would ultimately be named AlexNet, after Krizhevsky. Geoff Hinton summed up the AlexNet project this way: “Ilya thought we should do it, Alex made it work, and I got the Nobel prize.”

Krizhevsky, Sutskever, and Hinton wrote a paper on AlexNet that was published in the fall of 2012 and presented by Krizhevsky at a computer vision conference in Florence, Italy, in October. Veteran computer vision researchers weren’t convinced, but LeCun, who was at the meeting, pronounced it a turning point for AI. He was right. Before AlexNet, almost none of the leading computer vision papers used neural nets; after it, almost all of them would. In the years that followed, neural networks would go on to synthesize believable human voices, beat champion Go players, and generate artwork, culminating with the release of ChatGPT in November 2022 by OpenAI, a company cofounded by Sutskever.
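For readers who want a concrete picture of the network's shape, here is a rough, modern PyTorch-style re-creation of the layer structure described in the 2012 paper: five convolutional layers followed by three fully connected ones. This is a sketch for orientation only, not the released cuda-convnet code; the exact padding, dropout placement, and channel split follow the common present-day single-GPU re-implementation rather than the original two-GPU layout.

```python
# A rough, modern re-creation of the AlexNet layer structure from the 2012 paper.
# NOT the original cuda-convnet code; details follow common re-implementations.
import torch
from torch import nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1,000 ImageNet classes
)

# One 224x224 RGB image in, 1,000 class scores out.
scores = alexnet_like(torch.randn(1, 3, 224, 224))
print(scores.shape)  # torch.Size([1, 1000])
```

The original implementation was hand-written CUDA split across Krizhevsky's two GPUs; the historically accurate version is the code CHM has now published.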
Releasing the AlexNet Source Code

In 2020, I reached out to Krizhevsky to ask about the possibility of allowing CHM to release the AlexNet source code, due to its historical significance. He connected me to Hinton, who was working at Google at the time. Google owned AlexNet, having acquired DNNresearch, the company owned by Hinton, Sutskever, and Krizhevsky. Hinton got the ball rolling by connecting CHM to the right team at Google, and CHM worked with that team for five years to negotiate the release. The team also helped us identify the specific version of the AlexNet source code to release—there have been many versions of AlexNet over the years. There are other repositories of code called AlexNet on GitHub, but many of these are re-creations based on the famous paper, not the original code. The original source code is now available on CHM’s GitHub page.

This post originally appeared on the blog of the Computer History Museum.

Acknowledgments

Special thanks to Geoffrey Hinton for providing his quote and reviewing the text, to Cade Metz and Alex Krizhevsky for additional clarifications, and to David Bieber and the rest of the team at Google for their work in securing the source code release.

References

Fei-Fei Li, The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI. First edition, Flatiron Books, New York, 2023.

Cade Metz, Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World. First edition, Penguin Random House, New York, 2022.
Language is an interesting neurological function to study. No animal other than humans has such a highly developed, dedicated language-processing area, or languages as complex and nuanced as ours. Whale communication is more complex than we previously thought, but it is still not (we don’t think) at a human level. To better understand how human language […] The post The Neuroscience of Constructed Languages first appeared on NeuroLogica Blog.
A growing body of work suggests that cell metabolism — the chemical reactions that provide energy and building materials — plays a vital, overlooked role in the first steps of life. The post How Metabolism Can Shape Cells’ Destinies first appeared on Quanta Magazine.
I saw a couple of interesting talks this morning before heading out:

Alessandro Chiesa of Parma spoke about using spin-containing molecules potentially as qubits, and about chiral-induced spin selectivity (CISS) in electron transfer. Regarding the former, here is a review. Spin-containing molecules can have interesting properties as single qubits or, for spins higher than 1/2, qudits, with unpaired electrons often confined to a transition metal or rare earth ion somewhat protected from the rest of the universe by the rest of the molecule. The result can be very long coherence times for their spins. Doing multi-qubit operations is very challenging with such building blocks, however. There are some theory proposals and attempts to couple molecular qubits to superconducting resonators, but it's tough! Regarding chiral-induced spin selectivity, he discussed recent work trying to use molecules where a donor region is linked to an acceptor region via a chiral bridge, and trying to manipulate spin centers this way. A question in all the CISS work is: how can the effects be large when spin-orbit coupling is generally very weak in light, organic molecules? He has a recent treatment of this, arguing that if one models the bridge as a chain of sites with large \(U/t\), where \(U\) is the on-site repulsion energy and \(t\) is the hopping contribution, then exchange processes between sites can effectively amplify the otherwise weak spin-orbit effects. I need to read and think more about this (a schematic of the kind of model involved is sketched at the end of this post).

Richard Schlitz of Konstanz gave a nice talk about some pretty recent research using a scanning tunneling microscope tip (with magnetic iron atoms on the end) to drive electron paramagnetic resonance in a single pentacene molecule (sitting on MgO on Ag, where it tends to grab an electron from the silver and host a spin). The experimental approach was initially explained here. The polarized tunneling current itself can drive the resonance, and exactly how depends on the bias conditions: at high bias, when there is strong resonant tunneling, the current exerts a damping-like torque, while at low bias, when tunneling is far off resonance, the current exerts a field-like torque. Neat stuff.

Leah Weiss from Chicago gave a clear presentation about not-yet-published results (based on earlier work), doing optically detected EPR of Er-containing molecules. These condense into mm-sized molecular crystals, with the molecular environment being nice and clean, leading to very little inhomogeneous broadening of the lines. There are spin-selective transitions that can be driven using near-telecom-wavelength (1.55 \(\mu m\)) light. Because the (anisotropic) \(g\)-factors of the different levels differ, there are some very promising ways to do orientation-selective and spin-selective spectroscopy. Looking forward to seeing the paper on this.

And that's it for me for the meeting. A couple of thoughts:

I'm not sold on the combined March/April meeting. Six years ago, when I was a DCMP member-at-large, the discussion was all about how the March Meeting was too big, making it hard to find and get good deals on host sites, and maybe the meeting should split. Now they've made it even bigger. Doesn't this make planning more difficult and hosting more expensive, since there are fewer options? (I'm not an economist, but....) A benefit for April meeting attendees is that grad students and postdocs get access to the career/networking events held at the March Meeting.
If you're going to do the combination, then it seems like you should have the courage of your convictions and really mingle the two, rather than keeping the March talks in the convention center and the April talks in site hotels.

I understand that van der Waals/twisted materials are great laboratories for physics, and that topological states in these are exciting. Still, by my count there were 7 invited sessions broadly about this topic, and 35 invited talks on this over four days seems a bit extreme.

By my count, there were eight dilution refrigerator vendors at the exhibition (Maybell, Bluefors, Ice, Oxford, Danaher/Leiden, Formfactor, Zero-Point Cryo, and Quantum Design, if you count their PPMS insert). Wow.

I'm sure there will be other cool results presented today and tomorrow that I am missing - feel free to mention them in the comments.
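A technical postscript, as promised above: here is a schematic of the kind of strongly interacting chain model invoked in the CISS discussion. This is a generic one-dimensional Hubbard sketch with a small spin-dependent hopping added by hand; the symbols \(\lambda\), \(\mathbf{d}_{ij}\), \(J\), and \(D\) are my notation, and this is not the specific Hamiltonian from the paper Chiesa described.

\[
H = -t\sum_{\langle ij\rangle,\sigma}\left(c^{\dagger}_{i\sigma}c_{j\sigma}+\mathrm{h.c.}\right)
+\lambda\sum_{\langle ij\rangle}\left(i\,c^{\dagger}_{i}\,(\mathbf{d}_{ij}\!\cdot\!\boldsymbol{\sigma})\,c_{j}+\mathrm{h.c.}\right)
+U\sum_i n_{i\uparrow}n_{i\downarrow},
\qquad \lambda \ll t \ll U,
\]

where \(c_i = (c_{i\uparrow}, c_{i\downarrow})^{T}\) and \(\mathbf{d}_{ij}\) is a unit vector set by the local geometry of the chiral bridge. At half filling and in the large-\(U/t\) limit, virtual hopping generates an effective spin chain with Heisenberg exchange

\[
J \sim \frac{4t^{2}}{U},
\]

and the same superexchange processes carry the weak spin-orbit hopping along with them, producing anisotropic, Dzyaloshinskii–Moriya-type couplings of order

\[
D \sim \frac{t\lambda}{U},
\]

so the spin-orbit scale enters the low-energy spin physics relative to \(J\) (roughly \(D/J \sim \lambda/t\)) rather than relative to the bare electronic scales. Whether that is really the amplification mechanism in the work he described is what I want to check when I read the paper.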