The Internet relies on a decentralized architecture where control of core Internet services is distributed across the ‘network of networks’. This ensures a more resilient network and avoids single points of failure or control.
In this focus area, we present data on the distribution of the market shares of core web technologies and infrastructure to see how services are concentrated among a few actors and countries or distributed among many, and to track how this changes over time.
We present two different views of Internet Centralization:
- Market Concentration: The concentration of providers in a given market
- Country Market Shares: The jurisdiction of providers in a given market.
To infer the degree to which power is concentrated or distributed, and how this is changing over time, we compute two metrics based on the underlying data of market shares:
Gini Coefficient
The Gini coefficient measures the degree of inequality in a distribution and is widely used in economics to measure wealth and income inequality. It takes a value between 0 and 1, where 1 means complete inequality (one actor owns all the shares) and 0 means perfect equality (every actor has the same share).
Herfindahl-Hirschman Index (HHI)
The Herfindahl-Hirschman Index (HHI) is a commonly accepted measure of market concentration and is calculated by squaring the market share of each firm competing in a market, and then summing the resulting numbers. HHI values are in the range 0 to 10,000. HHI values of 2,500 or greater indicate highly concentrated markets.
—- [Market Concentration]
The distribution of market shares amongst service providers, and how it is evolving over time. Switch between the two concentration metrics – the Gini Coefficient and the HHI value – and adjust the sample size to learn about the degree of concentration in the provisioning of core Internet technologies.
Data center providers supply hardware and software infrastructure to serve websites on the Internet.
Top Level Domain
Top Level Domains are the highest level in the hierarchical Domain Name System (DNS) of the Internet.
SSL certificate authorities are trusted institutions that issue SSL certificates for verifying the owner of a website and encrypting web traffic with SSL/TLS.
DNS (domain name system) servers manage mappings between Internet domain names and their associated records such as IP addresses.
Content Delivery Networks (CDNs) are geographically distributed networks of proxy servers and their data centers.
A web hosting service provides hardware and software infrastructure to enable webmasters to make their websites accessible via the Internet.
Internet Centralization History
—- [Country View (Provider View)]
The market share of core Internet technologies by jurisdiction, and how it is evolving over time. Switch between technologies, sample size, and market shares (unweighted, or weighted by a country’s population of Internet users) to see the relative influence of a given country.
—- [Framework View]
Our View on Internet Centralization
The concept of centralization is concerned with the degree to which an activity within a given set of relationships is dependent on a small set of actors or functions. In a fully centralized system, the activity is dependent on one actor or function, acting as a single point of control. In contrast, in a decentralized system any dependencies or points of control for a given activity are more evenly distributed. In other words, we can think of centralization as a concept that describes the degree to which dependencies and control are concentrated amongst a few actors or functions or distributed amongst many.
In this light, the notion of Internet centralization can mean different things, as it depends on what activity we are analyzing. For example, since the Internet is a network of networks, we could look at interconnection patterns to analyze the degree to which the activity of routing is dependent on certain networks having many interconnections with other networks. This would tell us something about the degree to which the activity of routing is more or less centralized, which in turn has implications for issues such as Internet resilience.
For Internet Society Pulse, we have chosen to look at a broader set of activities that are also important from the view of dependencies and points of control, and which arguably form part of the Internet’s core services. Specifically, we are looking at the distribution of market shares in the provisioning of core web technologies to infer the degree to which power is concentrated or distributed, and how this is changing over time. To do this, we compute two metrics based on the underlying data of market shares – the Gini coefficient and the Herfindahl-Hirschman Index (HHI). These metrics provide complementary insights for understanding trends of Internet centralization.
About the Data and the Market Shares
The centralization metrics are based on data from web technology surveys performed by our Pulse data partner W3Techs. These web surveys are conducted by downloading webpages of the top 10 million websites and then analyzing the information provided by the website servers – much like the way a search engine indexes content on the web.
As noted by W3Techs, it is impossible for their technology surveys to be 100% accurate, not least since the information provided is in many instances determined by the website owners themselves. For some websites information may not have been made available by the website owner, and for others it may simply be inaccurate. Furthermore, some web technologies may provide more means to reveal information about their usage than others, meaning that the ability to accurately identify the provider may vary between different technologies.
Finally, W3Techs’ web surveys do not encompass the full web, but constitute a large sample (10 million websites) based on Alexa and Tranco rankings. By identifying the technologies used by websites in the sample, a corresponding market share is calculated. Thus, market share in this context should be interpreted as “the share of observations in the sample that were identified to use a technology by provider X”.
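To make this interpretation of market share concrete, the following is a minimal sketch of how shares could be derived from a list of per-website observations. The function name, the provider names, and the choice to exclude websites with no identified provider from the denominator are all assumptions made for illustration; they are not a description of W3Techs' actual pipeline.

```python
from collections import Counter


def market_shares(observations):
    """Return each provider's share, in percent, of the identified observations.

    `observations` holds one entry per website in the sample: the provider
    identified for that site, or None when no provider could be identified.
    Assumption: unidentified sites are left out of the denominator.
    """
    counts = Counter(p for p in observations if p is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {provider: 100.0 * n / total for provider, n in counts.items()}


# Hypothetical sample of six websites (provider names are invented).
sample = ["ProviderA", "ProviderB", "ProviderA", None, "ProviderC", "ProviderA"]
shares = market_shares(sample)
print(shares)
```

In this toy sample, three of the five identified observations belong to one provider, so that provider's share is 60%.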
To learn more about additional considerations with regards to the web technology surveys, and issues of validity and reliability, we recommend you read more from W3Techs. They provide an overview of their methodology and a more detailed FAQ.
Two Metrics Providing Two Views on Centralization
The first of our centralization metrics is the Gini coefficient, which measures the degree of inequality in a distribution and is widely used in economics to measure wealth and income inequality. The benefit of using the Gini coefficient is that it will work on any distribution in which the baseline (ideal state) is complete equality. Thus, in terms of being an indicator of Internet (de)centralization we can think of such an ideal state as one in which providers have equal shares of the market.
The Gini coefficient takes a value between 0 and 1, where 1 means complete inequality (one actor owns all the shares) and 0 means perfect equality (every actor has exactly the same share). A higher Gini coefficient is consequently indicative of a higher degree of concentration.
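One common way to write the Gini coefficient is in terms of the mean absolute difference between all pairs of shares. The notation below is an assumption introduced for illustration (n providers with market shares s_1, ..., s_n); the text does not state which estimator is used for the Pulse figures.

```latex
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert s_i - s_j \rvert}{2 n \sum_{i=1}^{n} s_i}
```

Under this formulation, G = 0 when all shares are equal. When a single provider holds the entire market, G = (n - 1)/n, which approaches 1 as the number of providers grows.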
Our second centralization metric is the Herfindahl-Hirschman Index (HHI), which is a commonly accepted measure of market concentration and is calculated by squaring the market share of each firm competing in a market, and then summing the resulting numbers. A key benefit with HHI is that the index value also reflects information about the underlying distribution of market shares. It approaches zero when a market is occupied by a large number of actors of relatively equal size and reaches its maximum of 10,000 points when a market is controlled by a single actor. The HHI increases both as the number of firms in the market decreases and as the disparity in size between those firms increases.
A market with an HHI value of less than 1,500 is considered a competitive marketplace, an HHI value of 1,500 to 2,500 indicates a moderately concentrated marketplace, and an HHI value of 2,500 or greater indicates a highly concentrated marketplace.
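As a minimal sketch, the HHI calculation and the thresholds above can be expressed as follows. The function names and the example shares are hypothetical; shares are given in percent, so the result falls in the range 0 to 10,000.

```python
def hhi(shares_percent):
    """Sum of squared market shares, with shares expressed in percent."""
    return sum(s ** 2 for s in shares_percent)


def classify(hhi_value):
    """Map an HHI value to the concentration bands described in the text."""
    if hhi_value < 1500:
        return "competitive"
    if hhi_value < 2500:
        return "moderately concentrated"
    return "highly concentrated"


# Hypothetical market with four providers.
example = [40.0, 30.0, 20.0, 10.0]
value = hhi(example)  # 1600 + 900 + 400 + 100
print(value, classify(value))
```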
To understand the difference between the Gini coefficient and HHI it is helpful to look at two scenarios:
- Imagine a market with only two providers, each with 50% market share. In this scenario, the Gini coefficient would be 0 as the distribution is perfectly equal. In contrast, the HHI value would be 5,000, indicating a highly concentrated market controlled by a few large providers.
- In contrast, imagine a scenario of 100 providers, of which 5 providers have 16% market share each, while 95 providers only have 0.21% each. In this scenario we would get a Gini coefficient of 0.72, which indicates a highly unequal distribution, while the resulting HHI value of 1286 indicates a competitive market.
For those interested in Internet centralization it is clear that both metrics have their strengths and weaknesses. As illustrated by the first scenario, the Gini coefficient’s focus on the equality of the distribution misses the fact that the market is concentrated in the hands of two providers, which is captured by the HHI. Conversely, in the second scenario, reference to the HHI value alone would result in a reader missing the fact that five providers out of 100 control 80% of the market, while the Gini coefficient reflects the fact that the market is greatly unequal, with power concentrated among a few providers.
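The two scenarios above can be reproduced with a short sketch. It assumes the pairwise mean-absolute-difference form of the Gini coefficient and shares expressed in percent; the function names are hypothetical. A direct computation under these assumptions gives roughly 0.75 and 1,284 for the second scenario, close to but not identical to the figures quoted above, which may reflect rounding or a different Gini estimator. The qualitative conclusion is the same.

```python
def gini(shares):
    """Gini coefficient via the mean absolute difference between all pairs."""
    n = len(shares)
    total = sum(shares)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero share")
    diff_sum = sum(abs(a - b) for a in shares for b in shares)
    return diff_sum / (2 * n * total)


def hhi(shares_percent):
    """Sum of squared market shares, with shares expressed in percent."""
    return sum(s ** 2 for s in shares_percent)


# Scenario 1: two providers with 50% each.
scenario_1 = [50.0, 50.0]
# Scenario 2: five providers with 16% each, 95 providers with 0.21% each.
scenario_2 = [16.0] * 5 + [0.21] * 95

for name, shares in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
    print(name, round(gini(shares), 2), round(hhi(shares)))
```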
While it is important to look at the underlying data for each market, a shortcut for the intuitive interpretation of combinations of HHI values and Gini coefficients is provided in the following table:
(Sample) Size Matters
The Internet is big and popular, but some parts of it are more popular than others. Most users spend most of their time viewing a small set of websites that dominate their Internet experience. So, from a view of Internet centralization we want to know not only to what extent the Internet as a whole is centralized, but also to what degree there are significant dependencies and points of control in the parts of the Internet that we use the most. On the Internet Society Pulse platform, we offer the possibility to adjust the sample size to the top 1,000 and top 10,000 websites.
Adjusting the sample size provides an opportunity to learn more about concentration in the market for the most popular services on the web. For instance, Content Delivery Networks (CDNs) provide different tiers of service to suit the needs of different website owners. These services may be provided free of charge to individual hobby users, or on commercial terms for enterprise services targeted towards websites with high traffic volumes.
It may well be that a provider offering a free service has a large, or even dominant, market share when considering a large sample of millions of websites, which would have a big impact on the Gini and HHI metrics described above. However, while this tells us something about Internet centralization as seen from one perspective, the story is potentially very different when focusing on the parts of the Internet that we tend to use the most.
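The effect of sample size can be illustrated with a toy example. The ranked list, the provider names, and the function name below are all invented for illustration and do not reflect real measurements; the sketch only shows that the same metric can tell a different story for the top-ranked sites than for the full sample.

```python
from collections import Counter


def hhi_for_top(sites, top_n):
    """HHI (0 to 10,000) over the providers of the top_n highest-ranked sites.

    `sites` is a list of (rank, provider) pairs, where rank 1 is most popular.
    """
    ranked = sorted(sites, key=lambda site: site[0])[:top_n]
    counts = Counter(provider for _, provider in ranked)
    total = sum(counts.values())
    return sum((100.0 * c / total) ** 2 for c in counts.values())


# Hypothetical ranking: an enterprise CDN dominates the most popular sites,
# while a free tier dominates the long tail.
sites = [
    (1, "EnterpriseCDN"), (2, "EnterpriseCDN"), (3, "OtherCDN"),
    (4, "EnterpriseCDN"), (5, "FreeCDN"), (6, "FreeCDN"),
    (7, "FreeCDN"), (8, "FreeCDN"), (9, "OtherCDN"), (10, "FreeCDN"),
]

top4 = hhi_for_top(sites, 4)
all_sites = hhi_for_top(sites, len(sites))
print(top4, all_sites)
```

In this invented data the leading provider differs between the two samples, and the top four sites are more concentrated than the full list.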
Centralization From the View of Jurisdictions
A metric that reflects the relationship among providers constitutes one view of centralization, but a complementary view is to also account for the concentration of service providers in specific legal jurisdictions. For example, if web hosting services were completely evenly distributed among providers, but all of those providers were based in the United States, the market for web hosting would be decentralized among providers but completely centralized in one jurisdiction. Legal jurisdiction typically enables governments to license technology service providers and regulate their activities.
To provide a more rounded analysis of Internet centralization we associated a legal jurisdiction with each technology provider in our dataset. This allows us to produce a metric that describes each country’s market share in a given industry. Sometimes, jurisdiction is ambiguous. For example, GlobalSign is headquartered in Belgium but is a subsidiary of a Japanese company. In this case, we assume both Belgium and Japan have jurisdiction over GlobalSign. Such ambiguity is not common in our dataset, however, and attributing a jurisdiction to a single country is typically straightforward.
Calculating each country’s market share of core Internet services tells part of the story, but using only this metric could be misleading. It’s unrealistic to assume that Vanuatu (population 300,000) would or should have the same market share of core Internet services as the USA (population 300,000,000). A more reasonable assumption might be that each country represents a share of the Internet proportional to (or weighted by) its population of Internet users.
To calculate these weighted market shares we use the World Bank’s latest figures on individuals using the Internet as a percentage of the population to calculate each country’s value as a percentage of the world’s total number of Internet users. We then use these percentages to weight each country’s market share to produce a market share weighted by Internet-using population.
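The text does not spell out the exact weighting formula, so the following is only a sketch under a stated assumption: each country's market share is divided by its share of the world's Internet users. The function names, country labels, and all figures are hypothetical and are not World Bank data.

```python
def internet_user_shares(population, pct_online):
    """Each country's share (in percent) of the world's Internet users."""
    users = {c: population[c] * pct_online[c] / 100.0 for c in population}
    world = sum(users.values())
    return {c: 100.0 * u / world for c, u in users.items()}


def weighted_market_shares(market_share, user_share):
    """Assumed weighting: market share divided by Internet-user share."""
    return {c: market_share[c] / user_share[c] for c in market_share}


# Hypothetical two-country world.
population = {"CountryA": 1_000_000, "CountryB": 10_000_000}
pct_online = {"CountryA": 50.0, "CountryB": 45.0}  # % of population online
market_share = {"CountryA": 40.0, "CountryB": 60.0}  # % of a given market

user_share = internet_user_shares(population, pct_online)
weighted = weighted_market_shares(market_share, user_share)
print(user_share, weighted)
```

Under this assumed ratio, a value above 1 means a country holds more of the market than its share of Internet users would suggest, and a value below 1 means it holds less.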
The weights are calculated against contemporary populations. For example, for market shares collected in 2016, we weighted these market shares against the 2016 population, and so on. When data were missing, we forward-filled the data (i.e. we assumed the population remained the same as it was in the prior year).
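Forward-filling can be sketched in a few lines. The function name and the yearly figures below are hypothetical; a missing year is represented as None and takes the most recent earlier value.

```python
def forward_fill(values_by_year):
    """Replace missing (None) yearly values with the last known earlier value.

    Years before the first known value stay None, since there is nothing
    to carry forward.
    """
    filled = {}
    last = None
    for year in sorted(values_by_year):
        value = values_by_year[year]
        if value is None:
            value = last
        filled[year] = value
        last = value
    return filled


# Hypothetical Internet-user counts with one missing year.
users = {2015: 1_000_000, 2016: None, 2017: 1_100_000}
filled = forward_fill(users)
print(filled)
```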
Intuitively, a low-population country with a large market share will get a higher number; a high-population country with a small market share will get a lower number. When the corresponding Gini coefficient is calculated over these weighted market shares, it can be said to reflect the degree of inequality in the distribution of jurisdictional control over core Internet services, weighted by population size.
We are very grateful to W3Techs for partnering with us on the underlying data for the Pulse Internet Centralization focus area.
The data collection, methodology and theoretical assumptions for this focus area have been developed in collaboration with Dr. Nick Merrill of UC Berkeley’s Center for Long-Term Cybersecurity. You can learn more about his work and his methodology for data collection here.