(or how to tell your Mom what you do if you work in tech)

The following is a post from Benjy Weinberger, Foursquare’s West Coast Engineering Director.

If you work in the tech industry then your daily conversations are littered with tech terms. You’ll probably have at least a vague idea of what these mean, but if you’re not in a technical role it’s sometimes hard to put these concepts and buzzwords in precise context.

In this post I’ll briefly explain ten basic terms that engineers use every day. Whatever your role in the tech industry, you’ll benefit from knowing exactly what these mean.

Brevity will require me to leave many important details out. If you’d like me to elaborate further, or if there are other concepts you’d like explained, let me know! I’ll be happy to write another post in this vein in the future.

1. API

Like any other work we do, software needs to be organized. We typically organize software into distinct modules, each responsible for a different task.

These components often need to talk to each other. For example, a DISPLAY module that displays a web page can send a URL to a FETCHER module that pulls web pages from the internet and returns the contents of those pages.

An Application Programming Interface (API) is a formal specification of how one software module interacts with another. For example, the API of the FETCHER component might be something like:

content = FETCHER.fetch(URL)

Meaning, “pass a URL into my ‘fetch’ function, and I will return the content of the web page at that URL”. The DISPLAY component can use this, but so can any other module that needs to fetch web pages.

In other words, APIs offer a simple, standardized way to provide functionality, without requiring a lot of intricate coordination. 
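
Here’s a minimal sketch of what such a module might look like in real Python. The FETCHER module and its fetch function are hypothetical, just to make the idea concrete:

import urllib.request

# A hypothetical FETCHER module boiled down to one function: pass in
# a URL, get back the contents of the page at that URL.
def fetch(url):
    with urllib.request.urlopen(url) as response:
        return response.read()

# Any module that needs a web page -- DISPLAY or otherwise -- calls it
# the same way:
content = fetch("http://foursquare.com")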

2. Technology Stack

Every engineering team works on unique problems: making tweets flow, mining check-ins for recommendations, sharing photos.

However, most engineering projects also have many problems in common: how to store and retrieve data efficiently, how to serve web pages over the network, how to handle user logins, and so on.

It would be a huge waste of resources to have every engineering team solve those common problems over and over again, re-inventing the wheel each time. Instead, we rely on standard components to do this general-purpose work. The set of standard components we choose is our technology stack.

A common example of a technology stack is the LAMP stack: Linux for the operating system, Apache for the web server, MySQL for the database and PHP (or Python) for the server coding environment.

3. DNS

Computers love numbers, so servers on a network locate and identify each other using their IP addresses, which are dot-separated numbers, like 107.23.22.73. These are difficult for humans to work with, so servers are given human-readable domain names.

The Domain Name System (DNS) is a distributed global directory that converts human-readable domain names into machine-friendly IP addresses. When you type foursquare.com into your browser’s address bar, the browser contacts a DNS server to ask it to translate that name into an IP address, and then sends the original request to that IP address.
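
You can watch this translation happen yourself with a couple of lines of Python, using the standard library (the exact address you get back may vary):

import socket

# Ask a DNS resolver to translate a human-readable domain name into
# an IP address, just as a browser does before sending its request.
ip_address = socket.gethostbyname("foursquare.com")
print(ip_address)  # e.g. 107.23.22.73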

4. Open Source

Software is written by developers in a high-level programming language like C++, Java or Python, and then converted into low-level machine instructions that computers understand. Since software can be used without access to the original source code it was created from, conventional commercial software projects typically keep their source code proprietary and secret.

Open source projects, however, make their source code publicly available. This allows users to read the code in order to debug the software, to modify and improve it, and to re-use parts of the code for other purposes.

Open source software has the distinct advantage of being free, both “as in speech” and “as in beer”. It’s this second type of free-ness that is most appealing: why pay for proprietary software if there’s a free open-source alternative that’s good enough?

Prominent open source projects include the Linux operating system, the MySQL database server, the Apache web server, the Hadoop data analysis system, and more.

Open source projects also encourage sharing and inspiration: most invite people to contribute their changes back to the project, for everyone else to enjoy.

5. Machine Learning

Computers are very good at mundane tasks, like sorting and searching, and can do them much faster than humans can. Whenever there’s a straightforward algorithm for arriving at a solution, the computer’s brute force can apply that algorithm with amazing speed.

However, there are tasks at which humans are much better than computers, such as converting speech to text, or identifying visible objects. These are tasks for which there are no known straightforward algorithms.

Humans, of course, aren’t born with these abilities, but learn them early in life, doing poorly at first but then improving bit by bit. So one approach to getting computers to be able to perform these tasks is to devise algorithms that simulate learning.

Machine learning algorithms do just that: they infer general rules from a set of examples, in a manner superficially similar to human learning. They are useful for finding approximate solutions to problems for which there are no known straightforward algorithms. Siri, for example, relies on machine learning algorithms that approximate human understanding of speech.
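
To make “inferring a rule from examples” concrete, here’s a deliberately tiny sketch in Python. The data and the task are invented purely for illustration; real machine learning uses far more sophisticated models:

# Labeled examples: (height in cm, label). We "learn" a height cutoff
# that separates the two groups, then apply it to new inputs.
examples = [(150, "child"), (155, "child"), (170, "adult"), (180, "adult")]

def learn_threshold(examples):
    children = [h for h, label in examples if label == "child"]
    adults = [h for h, label in examples if label == "adult"]
    # Put the cutoff halfway between the tallest child and shortest adult.
    return (max(children) + min(adults)) / 2

def predict(height, threshold):
    return "adult" if height >= threshold else "child"

threshold = learn_threshold(examples)
print(predict(165, threshold))  # the learned rule generalizes to new inputs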

6. Version Control

In any engineering organization, multiple people collaborate on the same set of source code files. There is a constant danger of people stomping on each other’s work: if Alice and Bob edit the same file, unbeknownst to each other, and Bob saves his work right after Alice does, he’ll overwrite her changes.

Version control systems store source code and manage its versioning. They prevent conflicting updates from multiple people: when Bob tries to save his work, the system will tell him that Alice has updated the file since he last read it, and force him to read Alice’s version and re-apply his changes on top of it. The system provides file-merging tools to make this re-applying of changes easy.
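
Here’s a toy sketch in Python of that “you must merge first” rule. Real systems like Git are vastly more sophisticated, but the core check is the same idea:

class Repository:
    def __init__(self, content=""):
        self.content = content
        self.version = 0

    def read(self):
        # Return the current content along with its version number.
        return self.content, self.version

    def save(self, new_content, base_version):
        # Accept a save only if it was based on the latest version.
        if base_version != self.version:
            raise Exception("File has changed since you read it; merge first")
        self.content = new_content
        self.version += 1

repo = Repository("original text")
_, alice_base = repo.read()
_, bob_base = repo.read()

repo.save("Alice's edit", alice_base)  # succeeds
try:
    repo.save("Bob's edit", bob_base)  # rejected: Bob must merge first
except Exception as error:
    print(error)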

Version control systems also store all previous versions of each file. This allows developers to make progress while still being able to debug servers running older versions of the code. Developers can temporarily “roll back” to an older, more stable version of the code, if a new version turns out to be buggy.

Git is a commonly used version control system, and GitHub is a popular hosted service built around Git.

7. Algorithm

An algorithm is like a recipe: it’s a list of step-by-step instructions that can be unambiguously and mindlessly followed by a computer. Algorithms are implemented by writing out those instructions in a particular programming language. All software development is, at its core, implementation of algorithms.

Some algorithms produce an always-correct output, such as a sorted list of numbers. Others produce an estimated output that may only be approximately correct, such as a transcription of a voicemail message. Machine learning algorithms (see above) are examples of this class of approximating algorithms.

Some algorithms are deterministic: they produce the same result each time they are run on the same input. Others are probabilistic: they may produce a different result each time they are run on the same input. An example of a probabilistic algorithm is one that simulates a coin tossing game.
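
Both kinds are easy to see in a few lines of Python:

import random

# Deterministic: sorting the same list always produces the same result.
print(sorted([3, 1, 2]))  # always prints [1, 2, 3]

# Probabilistic: simulating ten coin tosses can differ on every run.
tosses = [random.choice(["heads", "tails"]) for _ in range(10)]
print(tosses)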

8. Client / Server

When two computers interact over a network, the client initiates the interaction by sending a request to the server. The server performs some task and returns the results to the client.

This is easiest to understand by analogy:

Say, for example, your sister phones you and asks if you can give her a ride to the airport. You say you’ll pick her up at 6. In this scenario your sister is the client, making the request, and you are the server, responding to it.

Now say you were planning to go to a movie with a friend, so you put your sister on hold while you phone the friend and ask if he minds going to the later show. He doesn’t mind, so you return to the call with your sister and tell her you’ll pick her up at 6.

In the first conversation, your sister is the client and you are the server. In the second conversation, you are the client and your friend is the server. Note that in a single “roundtrip” you were acting as the server in one interaction and the client in another.

Web browsers and mobile devices are often referred to colloquially as “the clients”, because they initiate network requests, and so are always clients, never servers.
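
You can play the client yourself from a few lines of Python. Here your program makes the request, and foursquare.com’s machines act as the server:

import urllib.request

# Send a request to the server and wait for its response.
with urllib.request.urlopen("https://foursquare.com") as response:
    print(response.status)  # 200 means the server fulfilled the request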

9. UNIX / Linux

An operating system is software that manages computer hardware and bridges between the machine and the programs that run on it. Two very familiar OSes are Microsoft Windows and Mac OS X.

Long before Bill Gates and Steve Jobs came along, however, there was UNIX.

UNIX is an operating system first developed at Bell Labs in the early 1970s. It’s notable because many of its innovations strongly influenced the design of later operating systems, all the way to the present day.

Linux is an open source UNIX-like operating system that has been in continuous development since 1991. Unlike Windows or OS X, Linux is free, so companies can install it on all their datacenter machines without paying license fees. And since companies can have hundreds, thousands or even tens of thousands of datacenter servers, this leads to huge savings, and helps explain why Linux is so phenomenally popular.

10. Distributed Systems

Large-scale services like Google and Facebook have so much data, and serve so many requests, that no single server can possibly handle it all. A distributed system is one that uses multiple computers, connected by a network, to perform a task or provide a service.

The user of such a system has no idea of the details: how many computers are involved, how they’re connected and what each one does. The system appears to the outside like one entity, even though internally it’s composed of many parts.

Distributed systems provide two advantages: they can handle far more data and far more traffic than any single machine, and they can be far more reliable: if a single machine in a distributed system fails, other machines step in to take over the broken machine’s work. A distributed system is a reliable whole built out of unreliable parts.
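
One simple flavor of this idea, sketched in Python: spread data across several servers so that no single machine has to hold it all. The server names here are made up, and real systems also handle replication and failover:

servers = ["server-a", "server-b", "server-c"]

def server_for(key):
    # Deterministically assign each key to one of the servers, so every
    # machine in the system agrees on where a given piece of data lives.
    return servers[sum(ord(c) for c in key) % len(servers)]

for user in ["alice", "bob", "carol", "dave"]:
    print(user, "->", server_for(user))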