
Gender Inequality Across Programming Languages

Gender inequality is a hot topic in the tech industry. Over the last several years we’ve gathered LinkedIn data for our clients, and we realised this data would prove useful in identifying trends in how gender and employment relate to one another.

We analyzed UK public LinkedIn profiles and determined the gender of each profile from its given name, which covered approximately 80% of users. We collected data from 2010 through 2015, so we were able to track changes from year to year.

The following languages were analyzed:

  • Python
  • Ruby
  • Java
  • C#
  • C++
  • JavaScript
  • PHP

Results

[Figure: Male and female percentages in the IT industry]

[Figure: Male and female percentages outside the IT industry]

[Figure: Male and female percentages by language]

[Figure: Female percentage by year]

Ruby appears to have the highest percentage of women by a large margin, and C++ the lowest.

Gender imbalance seems to be less prominent outside the IT industry, but the percentage of women across languages seems to be increasing over time.

Methodology

As mentioned earlier, we used UK public LinkedIn profiles, determining the gender of a profile with its given name.
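
In practice this boils down to a lookup from given name to gender. The post doesn’t share the name dataset or the code it used, so the following is a purely illustrative sketch; GENDER_BY_NAME stands in for whatever names dataset was actually used:

# Hypothetical sketch: classify a profile's gender from its given name.
GENDER_BY_NAME = {"alice": "female", "claire": "female", "bob": "male", "james": "male"}

def profile_gender(full_name):
    given_name = full_name.split()[0].lower()
    # Returns None for the roughly 20% of profiles that can't be classified.
    return GENDER_BY_NAME.get(given_name)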

The programming language associated with the user was determined by inspecting the descriptions of the person’s prior experience. Two methodologies were used for this.

The first methodology, which we’ll refer to as ‘Method 1’, associated a user with a programming language if the language name appeared in the description.

As this can lead to a user being assigned more than one language, we also used an alternate methodology, ‘Method 2’, that assigned a language if that language was the only one which appeared in the description.

The results presented above are the average of these two methodologies.
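
To illustrate the difference between the two rules, here is a minimal, hypothetical sketch; the post doesn’t include its actual matching code, and real job descriptions would need more careful normalisation than this:

import re

LANGUAGES = ["Python", "Ruby", "Java", "C#", "C++", "JavaScript", "PHP"]

def languages_mentioned(description):
    # Require a boundary on either side of each name so that, for example,
    # "Java" does not match inside "JavaScript".
    found = []
    for lang in LANGUAGES:
        pattern = r"(?<![\w+#])" + re.escape(lang) + r"(?![\w+#])"
        if re.search(pattern, description, re.IGNORECASE):
            found.append(lang)
    return found

def method_1(description):
    # Method 1: assign every language that appears in the description.
    return languages_mentioned(description)

def method_2(description):
    # Method 2: assign a language only if it is the sole one mentioned.
    mentioned = languages_mentioned(description)
    return mentioned if len(mentioned) == 1 else []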

We considered using people’s list of skills, but decided against it as it would’ve prevented us from retrieving results by year.

We excluded languages such as BASIC due to ambiguity, and analyzed jobs from 2010 to the present. We also excluded languages where there weren’t enough jobs to keep our margin of error below 1% at the 95% confidence level.
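
As a rough illustration of what that threshold implies (the post doesn’t give the exact calculation), the worst-case margin of error for an estimated proportion at 95% confidence is 1.96 * sqrt(p * (1 - p) / n), which drops below 1% once a language has roughly ten thousand classified profiles:

import math

def min_sample_size(margin=0.01, z=1.96, p=0.5):
    # Worst-case (p = 0.5) sample size needed to estimate a proportion
    # with the given margin of error at ~95% confidence.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(min_sample_size())  # 9604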

If you would like additional information about our methodology or have any suggestions for a study please contact sales@scrapinghub.com.

Traveling Tips for Remote Workers

Being free to work from wherever you like, with no boundaries tying you to a specific place or country, is one of the greatest advantages of working remotely, and it’s leading many people to travel around the globe while completing their work. Today Claudio Salazar, a Scrapinghubber from Chile, is here to share his experiences and tips for those who want to work on the road.

Claudio’s traveling adventures started in September 2013, and so far he has visited 8 countries and more than 20 cities. He was never the kind of guy who loved traveling, but, motivated by the need to improve his English, he decided to buy his first flight ticket and get started.

When asked about the benefits of starting a journey like his, he points to escaping the routine as one of the most positive aspects, boosting both your morale and your open-mindedness. "I think that staying in one place for a long time makes you live in a routine, but being in constant change refills your energies, since every day you wake up with the discovery of new things in mind. This improves your motivation to work and keeps you in a good mood."

But like many trips, things are not always easy and comfortable. Claudio has faced some drawbacks since he left Chile, such as dealing with different time zones and adapting to new places and cultures. As well as planning a nice trip, you’ll also need to be flexible, adapting your working hours and sometimes attending virtual meetings at late or unusual hours. "An important thing when you try this lifestyle is the flexibility you’ll need to have. Scrapinghub gives me the freedom to manage my working hours and actually work at any time without restriction."

After traveling around the world for 16 months, Claudio is currently living the good life with his girlfriend in Paris, France. If you’d like to start a journey like Claudio’s, here’s some good advice:

Plans & Visas

First, research the countries you want to visit – check whether they are safe, their legislation, visa requirements, where you plan to live and so on. Also, always keep in mind the next country you plan to visit, especially if you travel as a tourist, because when you enter a country they will probably ask for your outbound ticket. Figure out the details before arriving and avoid unnecessary stress.

Health Insurance

Make sure you have health insurance. You never know when you might get sick, and medical services are expensive in any country, so you’d better be prepared. You can find many companies offering health insurance online, along with plenty of sites that compare them.

If you want to visit multiple countries, look for continental insurance and check whether it fits your needs. Most policies must be taken out from your home country, so sort this out beforehand. Usually you can take out a policy for three months and renew it, but if you miss the renewal deadline while traveling you can’t take it out again. There are also a few more expensive options that let you buy the insurance independently of your departure point or current location.

When you fall sick, you’ll have to call the phone number your insurance company provided and they’ll make an appointment at the hospital nearest your residence. If you need medicines, you might have to buy them yourself at a pharmacy, depending on your health insurance, and then claim a refund.

Accommodation

Make sure you rent an apartment or room before arriving in the country, because you could be asked where you are going to live during your stay. Try to get a flat with a nice desk, a comfortable chair and an internet connection so you can do your work properly. Sites like Airbnb are useful for finding shorter-term lets; more expensive than a 6-12 month lease, but cheaper than a hotel.

Credit Cards & Bank Statements

Print your bank statements before traveling, because you’ll probably be asked for them (a typical "show me the money" check on arrival). Always travel with two credit cards, and keep a debit card for emergencies.

Getting Along

As a foreigner you will learn new things every day as you meet people. A good tip is to look beforehand for popular forums and online communities, or well-known options like Facebook and Couchsurfing. Another good option is a meetup site where you can meet people and have fun. Being a foreigner, you’re likely to receive some kind of special treatment from the locals.

Getting Things Done

Since you’ll be working while traveling, one of your challenges will be to keep fulfilling your responsibilities at work while on the road. Aside from the trip preparations, you’ll have to manage any unexpected travel issue and still get things done remotely.

Make sure to always bring your gadgets (smartphone, tablet) and your notebook with you – you never know when one of them may break or malfunction, and a backup device can keep communication with your team going and buy you time until you sort out the problem. It’s also a good idea to buy a prepaid SIM card with 3G or 4G data when you arrive in a country, so you have a backup internet connection if needed.

Also, before renting a flat or choosing a new place to visit, check for nearby cafes with an internet connection, coworking spaces and wifi zones, so you have more options for keeping on working if your internet lets you down. It helps to narrow your choices to places with these resources available nearby.

In addition to Claudio, many other Scrapinghubbers have been traveling while working remotely, and you can see a few of their adventures and routes in the following map (feel free to navigate through the left menu for more):

Do you have an interesting story or tip to share about traveling while working remotely? We’d be glad to hear it! Feel free to share in the comments below. Safe travels!

A Career in Remote Working

This year I have reached a major milestone in my life, which is getting my bachelor’s degree in mathematics. When I made the decision to go back to college, it was solely because of my experience working at your company; I figured out that having a math background would be a great foundation for getting into ML-related stuff.

Now my life journey has a new beginning, and I wouldn’t be here if it weren’t for the opportunity you gave me. I will always be grateful to you.

— Rolando, our first hire

From the beginning, Scrapinghub has been a fully remote company, and it now boasts over 100 employees working from all over the world, either from their homes or from local coworking spaces. Our decision to maintain a fully remote workforce has proven to be a good one, allowing us to access a much wider range of talent than hiring locally would.

So what about the career of someone who is considering remote work? It can seem risky to leave your cushy on-site job behind in favour of working for a company located across the globe, but the risk is smaller than you would imagine.

Rolando is a great example of someone who has had a lot of success working remotely for the past 7 years. Born in Argentina and raised in Bolivia, Rolando Espinoza, now 30, has been a very important part of Scrapinghub since the very beginning. Rolando began his Scrapinghub journey at our founder Pablo’s first company, Insophia, where he started as a Python developer. By June 2010 he was already working on the first version of Scrapinghub’s dashboard.

The only time Rolando has worked on-site since then was as a software developer in Uruguay, alongside founders Pablo and Shane, at the former headquarters of Insophia, where Scrapinghub’s Uruguay office now resides.

Working at Scrapinghub allowed Rolando to pursue a degree in mathematics, as he was able to fit his work schedule around his studies. At Scrapinghub, we build teams whose members share similar time zones and give each member control over their own schedule. You can work mornings, nights, or whenever you feel like it. Day-to-day rearrangements can always be agreed within the team, and this flexibility has proven to be a win-win for all.

Rolando was very happy with his decision to complete a degree in mathematics. In his own words: “being a remote employee, rather than an independent freelancer, gives the opportunity to work in really interesting and challenging large-scale projects along with very smart people.”

Rolando worked on a number of machine learning projects here at Scrapinghub, allowing him to make use of the mathematical knowledge he was gaining at university from the outset. He is now looking forward to joining a CS/Math graduate program in the near future.

Rolando comments: "Regarding machine learning, I now have a better grasp and can understand most of the notation and mathematical terminology in the algorithms and related papers."

He has worked on everything from web development to large-scale web data mining, and from open source projects to large professional services projects such as the Memex project. He thinks "this is something very attractive, especially for those who live in cities or countries with a small and narrow software industry."

Rolando believes that working with smart, highly skilled colleagues from such a variety of countries has been an excellent opportunity to learn from them and push himself further. He’s proud of being able to keep up with the expectations of a company that strives to provide world-class services with top talent from all over the world.

We’re all incredibly grateful here at Scrapinghub for Rolando’s excellent work and for the contributions he has made to the company from the very beginning.

If you’re inspired by Rolando’s story, excited about the prospect of working remotely and looking to join a team of smart, motivated people, check out our open positions.

Frontera: The Brain Behind the Crawls

At Scrapinghub we’re always building and running large crawls; last year 11 billion requests were made on Scrapy Cloud alone. Crawling millions of pages from the internet requires more sophistication than scraping a few contacts off a list: we need to make sure we get reliable data and up-to-date lists of item pages, and that we can optimise our crawl as much as possible.

From these complex projects emerge technologies that can be used across all of our spiders, and we’re very pleased to release Frontera, a flexible frontier for web crawlers.

Frontera, formerly Crawl Frontier, is an open source framework we developed to facilitate building a crawl frontier, helping us manage our crawling logic and share it between spiders in our Scrapy projects.

What is a crawl frontier?

A crawl frontier is the system in charge of the logic and policies to follow when crawling websites, and plays a key role in more sophisticated crawling systems. It allows us to set rules about what pages should be crawled next, visiting priorities and ordering, how often pages are revisited, and any behaviour we may want to build into the crawl.

While Frontera was originally designed for use with Scrapy, it’s completely agnostic and can be used with any other crawling framework or standalone project.

In this post we’re going to demonstrate how Frontera can improve the way you crawl using Scrapy. We’ll show you how you can use Scrapy to scrape articles from Hacker News while using Frontera to ensure the same articles aren’t visited again in subsequent crawls.

The frontier needs to be initialised with a set of starting URLs (seeds), after which the crawler will ask the frontier which pages it should visit. As the crawler visits pages, it reports each page’s response and extracted URLs back to the frontier.

The frontier decides how to use this information according to the defined logic. This process continues until an end condition is reached. Some crawlers may never stop; we refer to these as continuous crawls.
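
To make that loop concrete, here is a toy, self-contained sketch of the crawler/frontier interaction. The SimpleFrontier class below is purely illustrative and is not Frontera’s actual API; it just mirrors the seed/ask/report cycle described above:

from collections import deque

class SimpleFrontier:
    """Toy in-memory frontier: FIFO ordering, never revisits a URL."""

    def __init__(self, seeds):
        self.seen = set(seeds)
        self.queue = deque(seeds)

    def get_next_requests(self, count=10):
        # The crawler asks the frontier which pages it should visit next.
        batch = []
        while self.queue and len(batch) < count:
            batch.append(self.queue.popleft())
        return batch

    def page_crawled(self, url, extracted_urls):
        # The crawler reports the visited page and the links it extracted;
        # this frontier's policy is simply "schedule anything not seen yet".
        for link in extracted_urls:
            if link not in self.seen:
                self.seen.add(link)
                self.queue.append(link)

    def finished(self):
        # End condition; a continuous crawl would never report True.
        return not self.queue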

Creating a Spider for HackerNews

Hopefully you’re now familiar with what Frontera does. If not, take a look at this textbook section for more theory on how a crawl frontier works.

You can check out the project we’ll be developing in this example from GitHub.

Let’s start by creating a new project and spider:

scrapy startproject hn_scraper
cd hn_scraper
scrapy genspider HackerNews news.ycombinator.com

You should have a directory structure similar to the following:

hn_scraper
hn_scraper/hn_scraper
hn_scraper/hn_scraper/__init__.py
hn_scraper/hn_scraper/__init__.pyc
hn_scraper/hn_scraper/items.py
hn_scraper/hn_scraper/pipelines.py
hn_scraper/hn_scraper/settings.py
hn_scraper/hn_scraper/settings.pyc
hn_scraper/hn_scraper/spiders
hn_scraper/hn_scraper/spiders/__init__.py
hn_scraper/hn_scraper/spiders/__init__.pyc
hn_scraper/hn_scraper/spiders/HackerNews.py
hn_scraper/scrapy.cfg

Due to the way the spider template is set up, your start_urls in spiders/HackerNews.py will look like this:

start_urls = (
    'http://www.news.ycombinator.com/',
)

So you will want to correct it like so:

start_urls = (
    'https://news.ycombinator.com/',
)

We also need to create an item definition for the article we’re scraping:

items.py
import scrapy

class HnArticleItem(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()
    author = scrapy.Field()
    item_id = scrapy.Field()

Here the url field will refer to the outbound URL, the title to the article’s title, the author to the submitter’s HN username, and the item_id to HN’s item ID.

We then need to define a link extractor so Scrapy will know which links to follow and extract data from.

Hacker News doesn’t make use of CSS classes for each item row, and another problem is that the article’s item URL, author and comments count are on a separate row from the article title and outbound URL. We’ll need to use XPath in this case.

First let’s gather all of the rows containing a title and outbound URL. If you inspect the DOM, you will notice these rows contain 3 cells, whereas the subtext rows contain 2 cells. So we can use something like the following:

selector = Selector(response)

rows = selector.xpath('//table[@id="hnmain"]//td[count(table) = 1]' \
                          '//table[count(tr) > 1]//tr[count(td) = 3]')

We then iterate over each row, retrieving the article URL and title, and we also need to retrieve the item URL and author from the subtext row, which we can find using the following-sibling axis. You should create a method similar to the following:

def parse_item(self, response):
    selector = Selector(response)

    rows = selector.xpath('//table[@id="hnmain"]//td[count(table) = 1]' \
                              '//table[count(tr) > 1]//tr[count(td) = 3]')
    for row in rows:
        item = HnArticleItem()

        article = row.xpath('td[@class="title" and count(a) = 1]//a')
        article_url = self.extract_one(article, './@href', '')
        article_title = self.extract_one(article, './text()', '')
        item['url'] = article_url
        item['title'] = article_title

        subtext = row.xpath(
            './following-sibling::tr[1]//td[@class="subtext" and count(a) = 3]')
        if subtext:
            item_author = self.extract_one(subtext, './/a[1]/@href', '')
            item_id = self.extract_one(subtext, './/a[2]/@href', '')
            # the hrefs look like 'user?id=<username>' and 'item?id=<number>',
            # so strip the 8-character prefix to get the values
            item['author'] = item_author[8:]
            item['item_id'] = int(item_id[8:])

        yield item

The extract_one method is a helper function to extract the first result:

def extract_one(self, selector, xpath, default=None):
    extracted = selector.xpath(xpath).extract()
    if extracted:
        return extracted[0]
    return default

There’s currently a bug with Frontera’s SQLalchemy middleware where callbacks aren’t called, so right now we need to inherit from Spider and override the parse method and make it call our parse_item function. Here’s an example of what the spider should look like:

spiders/HackerNews.py

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request
from scrapy.spider import Spider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector

from hn_scraper.items import HnArticleItem


class HackernewsSpider(Spider):
    name = "HackerNews"
    allowed_domains = ["news.ycombinator.com"]
    start_urls = ('https://news.ycombinator.com/', )

    link_extractor = SgmlLinkExtractor(
        allow=('news', ),
        restrict_xpaths=('//a[text()="More"]', ))

    def extract_one(self, selector, xpath, default=None):
        extracted = selector.xpath(xpath).extract()
        if extracted:
            return extracted[0]
        return default

    def parse(self, response):
        for link in self.link_extractor.extract_links(response):
            request = Request(url=link.url)
            request.meta.update(link_text=link.text)
            yield request

        for item in self.parse_item(response):
            yield item

    def parse_item(self, response):
        selector = Selector(response)

        rows = selector.xpath('//table[@id="hnmain"]//td[count(table) = 1]' \
                              '//table[count(tr) > 1]//tr[count(td) = 3]')
        for row in rows:
            item = HnArticleItem()

            article = row.xpath('td[@class="title" and count(a) = 1]//a')
            article_url = self.extract_one(article, './@href', '')
            article_title = self.extract_one(article, './text()', '')
            item['url'] = article_url
            item['title'] = article_title

            subtext = row.xpath(
                './following-sibling::tr[1]//td[@class="subtext" and count(a) = 3]')
            if subtext:
                item_author = self.extract_one(subtext, './/a[1]/@href', '')
                item_id = self.extract_one(subtext, './/a[2]/@href', '')
                item['author'] = item_author[8:]
                item['item_id'] = int(item_id[8:])

            yield item

Enabling Frontera in Our Project

Now all we need to do is configure the Scrapy project to use Frontera with the SQLalchemy middleware. First install Frontera:

pip install frontera

Then enable Frontera’s middlewares and scheduler by adding the following to settings.py:

SPIDER_MIDDLEWARES = {}
DOWNLOADER_MIDDLEWARES = {}
SPIDER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 999
}, )
DOWNLOADER_MIDDLEWARES.update({
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware':
    999
})
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'
FRONTERA_SETTINGS = 'hn_scraper.frontera_settings'

Next create a file named frontera_settings.py, as specified above in FRONTERA_SETTINGS, to store any settings related to the frontier:

BACKEND = 'frontera.contrib.backends.sqlalchemy.FIFO'
SQLALCHEMYBACKEND_ENGINE = 'sqlite:///hn_frontier.db'
MAX_REQUESTS = 2000
MAX_NEXT_REQUESTS = 10
DELAY_ON_EMPTY = 0.0

Here we specify hn_frontier.db as the SQLite database file, which is where Frontera will store pages it has crawled.

Running the Spider

Let’s run the spider:

scrapy crawl HackerNews -o results.csv -t csv

You can review the items being scraped in results.csv while the spider is running.

You will notice the hn_frontier.db file we specified earlier has been created. You can browse it using the sqlite3 command line tool:

sqlite> attach "hn_frontier.db" as hns;
sqlite> .tables
hns.pages
sqlite> select * from hns.pages;
https://news.ycombinator.com/|f1f3bd09de659fc955d2db1e439e3200802c4645|0|20150413231805460038|200|CRAWLED|
https://news.ycombinator.com/news?p=2|e273a7bbcf16fdcdb74191eb0e6bddf984be6487|1|20150413231809316300|200|CRAWLED|
https://news.ycombinator.com/news?p=3|f804e8cd8ff236bb0777220fb241fcbad6bf0145|2|20150413231810321708|200|CRAWLED|
https://news.ycombinator.com/news?p=4|5dfeb8168e126c5b497dfa48032760ad30189454|3|20150413231811333822|200|CRAWLED|
https://news.ycombinator.com/news?p=5|2ea8685c1863fca3075c4f5d451aa286f4af4261|4|20150413231812425024|200|CRAWLED|
https://news.ycombinator.com/news?p=6|b7ca907cc8b5d1f783325d99bc3a8d5ae7dcec58|5|20150413231813312731|200|CRAWLED|
https://news.ycombinator.com/news?p=7|81f45c4153cc8f2a291157b10bdce682563362f1|6|20150413231814324002|200|CRAWLED|
https://news.ycombinator.com/news?p=8|5fbe397d005c2f79829169f2ec7858b2a7d0097d|7|20150413231815443002|200|CRAWLED|
https://news.ycombinator.com/news?p=9|14ee3557a2920b62be3fd521893241c43864c728|8|20150413231816426616|200|CRAWLED|

As shown above, the database has one table, pages, which stores each URL, its fingerprint, the timestamp and the response code. This schema is specific to the SQLalchemy backend; different backends may use different schemas, and some don’t persist crawled pages at all.

Frontera backends aren’t limited to storing crawled pages; they’re the core component of Frontera, and hold all crawl frontier related logic you wish to make use of, so which backend you use is heavily tied to what you want to achieve with Frontera.

In many cases you will want to create your own backend. This is a lot easier than it sounds, and you can find all the information you need in the documentation.
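
As a very rough idea of what that involves, the skeleton below follows the Backend interface described in the Frontera documentation of the time (frontier_start, frontier_stop, add_seeds, page_crawled, request_error, get_next_requests) and simply orders pages last-in, first-out. Treat the method signatures as approximations and check the documentation for the release you have installed:

from collections import deque

from frontera.core.components import Backend

class LifoBackend(Backend):
    """Illustrative in-memory backend that crawls the newest links first."""

    def __init__(self, manager):
        self.queue = deque()

    @classmethod
    def from_manager(cls, manager):
        return cls(manager)

    def frontier_start(self):
        pass

    def frontier_stop(self):
        pass

    def add_seeds(self, seeds):
        self.queue.extend(seeds)

    def page_crawled(self, response, links):
        # Newly discovered links go to the end of the queue and are popped
        # first, giving a newest-first (roughly depth-first) ordering.
        self.queue.extend(links)

    def request_error(self, page, error):
        pass

    def get_next_requests(self, max_next_requests, **kwargs):
        batch = []
        while self.queue and len(batch) < max_next_requests:
            batch.append(self.queue.pop())
        return batch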

Hopefully this tutorial has given you a good insight into Frontera and how you can use it to improve the way you manage your crawling logic. Feel free to check out the code and docs, and if you run into a problem please report it at the issue tracker.

Scrape Data Visually with Portia and Scrapy Cloud

It’s been several months since we first integrated Portia into our Scrapy Cloud platform, and last week we officially began to phase out Autoscraping in favor of Portia.

In case you aren’t familiar with Portia, it’s an open source tool we developed for visually scraping websites. Portia allows you to make templates of pages you want to scrape and uses those templates to create a spider to scrape similar pages.


Autoscraping is our predecessor to Portia, and for the time being it’s still available to users who already have Autoscraping-based projects. Any new projects, as well as existing projects without Autoscraping spiders, will only be able to use Portia.

In this post we’re going to introduce Portia by creating a spider for Allrecipes. Let’s start by creating a new Portia project:

[Screenshot: creating a new Portia project]

Once the project has been created, you will be redirected to the main Portia screen:

[Screenshot: the main Portia project screen]

To create a new spider for the project, begin by entering the website URL in Portia’s address bar and clicking the ‘New Spider’ button. Portia will create a new spider and display the page:

[Screenshot: the new spider with the page loaded]

You can navigate the site like you normally would until you find a page containing data you want to scrape. Sites which require JavaScript to render aren’t currently supported.

[Screenshot: a page containing data to annotate]

Once you’ve found a page with data you’re interested in, click the ‘Annotate this page’ button at the top to create a new template.

You will notice that hovering over an element will highlight it, and clicking it will create an annotation. An annotation defines a mapping between an element’s attribute or content to a field in an item you wish to scrape.

[Screenshot: creating an annotation]

In the screenshot above we have clicked the title of a recipe. On the left of the annotation window you will see an ‘Attribute’ dropdown. This allows you to select which part of the element you wish to map. In this case we’re going to map the content, but when annotating elements like images you may want to select a different attribute such as the ‘src’ value.

The value which will be extracted for this particular page is shown in the middle of the annotation window under ‘Value’. On the right you can select the field to map the attribute to. Because new projects are created with a default item, there are already fields we can map to.

Let’s say we don’t want to use the default item. For the time being we will discard the annotation by clicking the red trash can icon at the top of the annotation window.

[Screenshot: deleting an annotation]

Move your mouse to the right to display the right-hand sidebar and expand the ‘Extracted item’ tab. You will notice the current extracted item type is ‘default’; click the ‘Edit items’ button.

[Screenshot: the ‘Edit items’ view]

Here you can edit the default item and its fields, as well as create more items if you wish. In this example we’ll simply edit the default item:

[Screenshot: the edited recipe item]

Click ‘Save changes’ and you will now be able to map elements to your new set of fields. Once you have annotated everything you wish to extract, click ‘Save template’ and you will be redirected to the spider’s start URL. You can now test your spider by visiting another page similar to the one you annotated:

[Screenshot: items extracted from a similar page]

Once you’ve tested several pages and are satisfied your spider is working, you can now deploy your project to Dash. Click the project link in the breadcrumbs (displayed top left) to leave the spider and go to the project page.

[Screenshot: the ‘Publish changes’ button]

Click the ‘Publish changes’ button on the right-hand sidebar to publish your project, and you should receive a message box asking if you want to be redirected to the schedule page. Click ‘OK’ and you will be redirected to the jobs page in Dash where you can now schedule your spider.

[Screenshot: scheduling the spider in Dash]

Click the ‘Schedule’ button on the top right and select your spider from the dropdown. Click ‘Schedule’ and Dash will start your spider. You will notice that items are being scraped just like with any standard Scrapy spider, and you can go into the job’s items page and download the scraped items as you normally would:

[Screenshot: the scraped items in Dash]

That’s all there is to it! Hopefully this demonstrates just how easy it is to create spiders using Portia without writing any code whatsoever. There are a lot of features we didn’t cover, and we recommend taking a look at the documentation in GitHub if you want to learn more. Portia is open source, so you can run your own instance if you don’t wish to use Scrapy Cloud, and we are open to pull requests!


Scrapinghub: A Remote Working Success Story

When Scrapinghub came into the world in 2010, one thing we wanted was for it to be a company which could be powered by a global workforce, each individual working remotely from anywhere in the world.
Reduced commuting time and, as a consequence, increased family time were the primary reasons for this. In Uruguay, Pablo was commuting long distances to do work which realistically could have been done just as easily from home, and Shane wanted to divide his time between Ireland, London and Japan. Having a regular office space was never going to work out for these guys.

Where we are based!

The Pitfalls of Open Plan

From the employee’s point of view, as well as eliminating the daily commute and wiping out the associated costs – fuel, parking, tax and insurance if you own a car, or bus/train fares if you rely on public transport – remote working allows you to work in an office space of your choosing. You decorate and fit out your own space to your own tastes. No more putting up with the pitfalls of open plan, with distractions and interruptions possible every minute of the day. No more complaining about air con or the lack thereof. How often have you picked up a cold or flu from a sick co-worker? The spread of colds and other illnesses is a huge disadvantage of a shared working space.

Communication

Yes, an open plan environment is good for collaboration, but with the likes of Skype and Google Hangouts we get all the benefits of face-to-face communication in an instant. All you need is a webcam, a mic and a Google+ or Skype account. Simple! We can hold meetings, conduct interviews, brainstorm and share presentations.
For real-time messaging, HipChat and Slack are the primary team communication tools used by remote companies. At Scrapinghub, we use Slack as a platform to bring all of our communication together in one place. It’s great for real-time messaging, as well as sharing and archiving documents. It encourages daily intercommunication and significantly reduces the amount of email sent.

Savings

From an employer’s point of view, a major benefit of a fully remote company is huge cost savings on office rent. This in particular is important for small start-ups who might be tight on initial cashflow. Other benefits include having wider access to talent and the fact that remote working is a huge selling point to potential hires.

Productivity

The big downside of remote working from an employer’s perspective is obviously productivity, or the worry of reduced productivity when an employee works unsupervised from home. Research by Harvard Business Review shows that productivity actually increases when people are trusted by their company to work remotely, mainly due to a quieter environment. Some employers are still very slow to move from a traditional workplace to remote working because of these productivity worries.
Whether productivity increases or decreases completely depends on the inner workings of the individual business, and if a culture of creativity, trust and motivation exists, then it’s the perfect working model.

Social Interaction

Scrapinghub regularly holds a virtual office tour day. We so often have meetings via Hangouts and get little glimpses into our colleagues’ offices, often on the other side of the world. A poster or book spine might catch the eye, and these virtual office tour days are a way of learning more about each other, stepping into a colleague’s space for just a minute and seeing what it’s like on their end.
Social interaction is also encouraged through an off-topic channel on Slack and different communities on Google+ such as Scholars, Book Club and Technology. Scrapinghubbers can discuss non-work-related topics this way, and many team members meet up with colleagues from around the world when they are travelling.

Top Tips for Remote Working

  1. Routine: If your working hours are self-monitored, try to work the same hours every day. Set your alarm clock and get up at the same time each morning. Create a work schedule and set yourself a tea break and lunch break to give your day structure.
  2. Work space: Ensure you have a defined work space in your home, preferably with a door so you can close it if there are children, guests or pets that may distract you.
  3. Health:  Sitting at a computer for hours on end isn’t healthy for the body. Try to get up from your computer every 30 minutes or so and walk around. Stretch your arms above your head and take some deep breaths.  This short time will also give your eyes a break from the screen.
  4. Social:  If you find working from home lonely, then why not look into co-working? This involves people from different organisations using a shared work environment.
  5. Focus: It’s very tempting to check your personal email and social media when working from home. This is hugely distracting so use an app like Self Control to block distracting sites for a set period of time.

 

 

Bye Bye HipChat, Hello Slack!

For many years now, we have used HipChat for our team collaboration. For the most part we were satisfied, but there were a number of pain points and over time Slack started to seem like the better option for us. Last month we decided to ditch HipChat in favour of Slack. Here’s why.

User Interface

 

Slack has a much more visual interface; avatars are shown alongside messages, and the application as a whole is much more vibrant and colorful. One thing we really like about Slack is you can see a nice summary of your recent mentions. Another cool feature is the ability to star items such as messages and files, which you can later access from the Flexpane menu in the top right corner.

Slack notification preferences

Notifications can be configured on a case-by-case basis for channels and groups, and you can even add highlight words to let you know when someone mentions a word or phrase.

Migration

 

Migrating from HipChat to Slack was a breeze. Slack allows easy importing of logs from HipChat, not to mention other chat services such as Flowdock and Campfire, as well as text files and CSV.

Slack has a great guide for importing chat logs from another service.

Teams, Groups and Channels

 

With Slack, accounts are at the team level just like HipChat; however, you can use the same email address for multiple teams, whereas in HipChat each account must have its own email address. Another benefit of Slack is that you can sign into multiple teams simultaneously. We found that our clients would sometimes run into trouble because they used HipChat for their own company as well, so this feature helps a lot in that regard.

Admittedly, the distinction between channels and private groups was a little confusing at first, as their functionality is very similar. Unlike channels, which are visible to everyone, private groups are visible only to the creator and those who were invited. Accounts can be restricted to a subset of channels, and single-channel guests can only access one channel.

Integration with Third-Party Services

 

Slack integrates with a large number of services, and that’s before counting their community-built integrations.

We created a Slack bot based on Limbo (previously called Slask) using Slack’s Real Time Messaging API which allowed us to easily port our build bot to Slack.

Our #news channel

Slack’s Twitter integration notifies you of mentions and retweets in real time, allowing us to quickly respond to relevant tweets and keep in tune with our audience. We also created a news channel and made use of Slack’s RSS integration to post the latest articles from various sources and keep everyone in the know.

Downsides of Slack

 

There’s no self-hosted version of Slack available, and while this wasn’t a feature we needed, this may be a deal breaker for some companies.

In our case, the only big negative we found was there’s no native app for Linux, but we found HipChat’s Linux client to have fallen behind the Mac and web clients, so it was easy to ignore this caveat in favor of a more consistent experience between platforms. We also noticed the mobile apps sync a lot better compared to HipChat when using multiple devices.

One minor grievance is the lack of @here in Slack. In HipChat, @here would notify all channel members who are available, however, in Slack the only alternative is @channel which addresses the whole room regardless of their availability.

Pricing and Feature Comparison

 

It’s worth pointing out that HipChat is a lot cheaper than Slack, and HipChat Basic’s message history is up to 25,000 messages compared to Slack Lite’s 10,000 message limit. Both HipChat Basic and Slack Lite are limited to 5GB file storage.

Slack

Pros:
  • Support for multiple teams
  • Much better search
  • Larger selection of integrations
  • More customizable
  • Sleeker UI
Cons:
  • No Windows or Linux client
  • No self-hosted version available
Platforms: OS X, Web, iOS, Android (Windows Desktop + Phone in development)
API: Yes
Supported integrations: 73 available
Video: No (coming soon)
Screen sharing: No (coming soon)
Pricing: Lite: Free; Standard: $6.67 per user / month; Plus: $12.50 per user / month

HipChat

Pros:
  • Video conferencing
  • Screen sharing
  • More affordable
  • Simpler interface
Cons:
  • No support for multiple groups under one account
  • Poor search functionality
Platforms: Windows, OS X, Linux, Web, iOS, Android
API: Yes
Supported integrations: 65 available
Video: Yes (Plus only)
Screen sharing: Yes (Plus only)
Pricing: Basic: Free; Plus: $2 per user / month

 

Final Thoughts

 

On the whole we’ve been really impressed by Slack, and for such a new application (its initial release was in August 2013!) it’s very slick and well polished. The lack of video and screen sharing hasn’t been a problem for us, as we use Google+ Hangouts for meetings, and Slack includes a nice ‘/hangout’ command to start a hangout with everyone in the room. For those who need these features, you will be pleased to know that earlier this year Slack acquired Screenhero with the aim of adding voice, video and screen sharing to Slack.

