Pablo Hoffman
May 27, 2015

Gender inequality across programming languages

Gender inequality is a hot topic in the tech industry. Over the last several years we've gathered business profiles for our clients, and we realized this data could help identify trends in how gender and employment relate to one another.

This study is based on UK profiles. We inferred the gender of each profile from its given name, which was possible for approximately 80% of users. Because we had collected data from 2010 through 2015, we were able to identify changes from year to year.

The following languages were analyzed:

  • Python
  • Ruby
  • Java
  • C#
  • C++
  • JavaScript
  • PHP

Results

Figure: Male and female percentages in the IT industry

Figure: Male and female percentages outside the IT industry

Figure: Male and female percentages by language

Figure: Female percentage by year

By a large margin, Ruby appears to have the highest percentage of women, and C++ the lowest.

Gender imbalance appears less pronounced outside the IT industry, and the percentage of women across languages appears to be increasing over time.

Methodology

The data for this study came from our Data Services team's large collection of business profiles. The gender of each profile was inferred from its given name.
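Name-based gender inference can be pictured as a lookup against a name-to-gender table, with names that cannot be resolved left unclassified; this is one way a coverage figure like the roughly 80% mentioned earlier arises. The sketch below is illustrative only: `NAME_GENDER`, `infer_gender`, and `coverage` are hypothetical names, and the post does not describe the actual name source or how ambiguous names were handled.

```python
from typing import List, Optional

# Hypothetical lookup table; a real study would rely on a large name dataset.
NAME_GENDER = {"james": "M", "oliver": "M", "emma": "F", "sophie": "F"}


def infer_gender(given_name: str) -> Optional[str]:
    """Return "F" or "M" for a known given name, or None if it cannot be resolved."""
    return NAME_GENDER.get(given_name.strip().lower())


def coverage(names: List[str]) -> float:
    """Fraction of names whose gender could be inferred."""
    resolved = sum(infer_gender(n) is not None for n in names)
    return resolved / len(names)


# "Alex" is not in the table, so 4 of 5 names are resolved.
print(coverage(["Emma", "James", "Alex", "Sophie", "Oliver"]))
```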

We determined the programming language associated with each user by inspecting the descriptions of their prior work experience, using two methods.

The first methodology, which we’ll refer to as 'Method 1', associated a user with a programming language if the language name appeared in the description.
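Method 1 amounts to keyword matching over each description. A minimal sketch, assuming case-insensitive regex matching on whole tokens (the post does not say how matches were detected): the lookarounds stop "Java" from matching inside "JavaScript" and keep "C#" and "C++" distinct. `LANGUAGES` and `method1` are hypothetical names.

```python
import re

LANGUAGES = ["Python", "Ruby", "Java", "C#", "C++", "JavaScript", "PHP"]

# Match a language name only when it is not part of a longer token such as
# "JavaScript" (for "Java"); "#" and "+" count as token characters here.
_PATTERNS = {
    lang: re.compile(r"(?<![\w#+])" + re.escape(lang) + r"(?![\w#+])", re.IGNORECASE)
    for lang in LANGUAGES
}


def method1(description: str) -> set:
    """Method 1: every analyzed language whose name appears in the description."""
    return {lang for lang, pattern in _PATTERNS.items() if pattern.search(description)}


print(sorted(method1("Built REST APIs in Java and JavaScript")))  # ['Java', 'JavaScript']
```

Because a single description can mention several languages, one profile may count toward more than one language under this method.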

Because this can assign more than one language to a user, we also used an alternative methodology, 'Method 2', which assigned a language only if it was the only language mentioned in the description.
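Method 2 keeps only unambiguous descriptions. A minimal sketch under the same assumption of whole-token, case-insensitive matching; `method2` and the pattern table are hypothetical and repeated here so the example stands alone.

```python
import re
from typing import Optional

LANGUAGES = ["Python", "Ruby", "Java", "C#", "C++", "JavaScript", "PHP"]

# Same whole-token matching rule as a keyword-based Method 1.
_PATTERNS = {
    lang: re.compile(r"(?<![\w#+])" + re.escape(lang) + r"(?![\w#+])", re.IGNORECASE)
    for lang in LANGUAGES
}


def method2(description: str) -> Optional[str]:
    """Method 2: assign a language only if it is the sole language mentioned."""
    found = [lang for lang, pattern in _PATTERNS.items() if pattern.search(description)]
    return found[0] if len(found) == 1 else None


print(method2("Python developer"))           # Python
print(method2("Python and Ruby developer"))  # None
```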

The results presented above are the average of these two methodologies.
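The post does not say exactly how the two methods were combined. A minimal sketch, assuming the female percentage is computed separately under each method and the two percentages are then averaged without weighting; `female_share` and `averaged_share` are hypothetical names.

```python
from typing import List


def female_share(genders: List[str]) -> float:
    """Percentage of "F" among profiles with an inferred gender."""
    if not genders:
        raise ValueError("no profiles to measure")
    return 100.0 * genders.count("F") / len(genders)


def averaged_share(method1_genders: List[str], method2_genders: List[str]) -> float:
    """Unweighted mean of the female percentage under Method 1 and Method 2."""
    return (female_share(method1_genders) + female_share(method2_genders)) / 2


# 25% under one method and 20% under the other average to 22.5%.
print(averaged_share(["F", "M", "M", "M"], ["F", "M", "M", "M", "M"]))  # 22.5
```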

We considered using each person's list of skills instead, but decided against it because it would have prevented us from breaking results down by year.
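Breaking results down by year requires each language match to carry a date. A minimal sketch, assuming each experience entry has already been reduced to a hypothetical `(year, language, gender)` record: it tallies female and total counts per (year, language) pair and converts them to percentages. The record shape and `female_share_by_year` are illustrative, not the study's actual pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (year of the experience entry, language, inferred gender "F"/"M").
Record = Tuple[int, str, str]


def female_share_by_year(records: Iterable[Record]) -> Dict[Tuple[int, str], float]:
    """Female percentage for each (year, language) pair."""
    counts = defaultdict(lambda: [0, 0])  # (year, language) -> [female, total]
    for year, language, gender in records:
        tally = counts[(year, language)]
        tally[1] += 1
        if gender == "F":
            tally[0] += 1
    return {key: 100.0 * female / total for key, (female, total) in counts.items()}


records = [
    (2010, "Ruby", "F"), (2010, "Ruby", "M"),
    (2011, "Ruby", "F"), (2011, "Ruby", "M"), (2011, "Ruby", "M"), (2011, "Ruby", "M"),
]
print(female_share_by_year(records))  # {(2010, 'Ruby'): 50.0, (2011, 'Ruby'): 25.0}
```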

We excluded languages such as BASIC because their names are ambiguous, and analyzed jobs from 2010 to the present. We also excluded languages that did not have enough jobs to keep the 95% confidence interval below 1%.
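The exclusion rule can be read as a sample-size check on a proportion. A minimal sketch, assuming "below 1%" means the half-width of a normal-approximation (Wald) 95% confidence interval on the female share is at most 1 percentage point; the post does not state the exact interval used, and `margin_of_error`, `min_sample_size`, and `has_enough_jobs` are hypothetical names.

```python
import math
from statistics import NormalDist


def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


def min_sample_size(max_margin: float = 0.01, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n keeping the half-width at or below max_margin (p = 0.5 is the worst case)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / max_margin ** 2)


def has_enough_jobs(n_jobs: int, female_share: float, max_margin: float = 0.01) -> bool:
    """True if a language has enough jobs to keep the interval within max_margin."""
    return margin_of_error(female_share, n_jobs) <= max_margin


print(min_sample_size())  # 9604 jobs in the worst case
```

Using the worst-case share of 0.5 gives a single threshold for every language; checking against the observed share instead, as `has_enough_jobs` does, admits smaller samples when the share is far from 50%.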

If you would like additional information about our methodology, or have suggestions for a study, please contact sales@zyte.com.