Surviving the flood

Published in Physics World, 1 Oct 2014

Planned big-science facilities are set to generate more data than all global Internet traffic combined. Jon Cartwright finds out how scientists will deal with the data deluge

When the €2bn ($2.6bn) Square Kilometre Array (SKA) sees first light in the 2020s, astronomers will have an unprecedented window into the early universe. Quite what the world’s biggest radio telescope will discover is of course an open question – but with hundreds of thousands of dishes and antennas spread across Africa and Australasia, you might think the science will be limited only by the telescope’s enormous sensitivity or its field of view.

But you would be wrong. “It’s the electricity bill,” says Tim Cornwell, the SKA’s head of computing. “While we have the capital cost to build the computer system, actually running it at full capacity is looking to be a problem.” The reason SKA bosses are concerned about electricity bills is that the telescope will require three supercomputers, each consuming up to 10 MW of electricity. And the reason the telescope needs three energy-hungry supercomputers is that it will churn out more than 250 000 petabytes of data every year – enough to fill 36 million DVDs. (One petabyte is 10^15 bytes.) When you consider that uploads to Facebook amount to 180 petabytes a year, you begin to see why handling data at the SKA could be a bottleneck.
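To put those volumes side by side, here is a rough back-of-envelope comparison in Python. It uses only the figures quoted above (the SKA’s 250 000 petabytes a year and Facebook’s 180 petabytes of annual uploads); the script is purely illustrative and is not part of any SKA software.

```python
# Back-of-envelope comparison of the annual data volumes quoted above.
# The figures are those given in the article; this sketch is illustrative only.

PB = 1e15                       # one petabyte, in bytes
ska_per_year = 250_000 * PB     # SKA output: more than 250 000 PB a year
facebook_per_year = 180 * PB    # Facebook uploads: about 180 PB a year

ratio = ska_per_year / facebook_per_year
print(f"The SKA would produce roughly {ratio:,.0f} times "
      f"the data uploaded to Facebook each year")
```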

This is the “data deluge” – and it is not confined to the SKA. The CERN particle-physics lab, for example, stores around 30 petabytes of data every year (and discards about 100 times that amount), while the European Synchrotron Radiation Facility (ESRF) has been generating upwards of one petabyte annually. Experimental physics is drowning in data, and without big changes in the way data are managed, the science could fall far short of its potential. […]

To read the rest of this article, please email for a pdf.