1. Basic

Frequently throughout this document I will refer to “Emacers” or “Doomers” or “Vanilla users”. In every such case imagine a little extra “(that responded to this survey)” caveat; sampling bias is no joke.

The respondent pool looks fairly diverse though, so that’s rather nice 😃.

This is an analysis of the publicly available data from the 2020 Emacs User Survey.

This analysis was done with the intent of helping the Emacs community understand itself better, and which aspects of Emacs could benefit the most from development effort.

Feeling lazy? Jump to Conclusions.

1.1. Univariate breakdowns

A pairwise incidence matrix adds 1 to [x,y] for every response containing both x and y.

1.1.1. Survey respondents

response_origin.svg
Figure 1: Breakdown of survey respondents

1.1.2. Languages

Throughout this analysis, I will use a number of pairwise incidence matrix plots. These are useful when examining variables which may hold multiple values, as they show which pairs of values do and don’t appear together.

Each square has \(1\) added to it for each response where the values of its row and column are both present. So, the \((x, y)\) cell contains the number of instances where both \(x\) and \(y\) were present. The diagonal \((x, x)\) simply gives the number of times \(x\) appears.

In the proportional pairwise incidence matrix, the values of each row are divided by the diagonal \((x, x)\). Now the \((x, y)\) entry gives the proportion of the time that \(y\) is present when \(x\) already is.

This visualisation is also applied to correlation matrices further on.
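
As a rough illustration of the construction described above, here is a minimal base R sketch. The responses are made up for the example (they are not survey data), and the variable names are purely illustrative.

```r
# Hypothetical multi-select responses (not survey data)
responses <- list(
  c("Python", "Bash"),
  c("Python", "C"),
  c("C", "Bash", "Python"),
  c("Haskell")
)

values <- sort(unique(unlist(responses)))

# One row per response, one column per value; 1 where the value is present
indicator <- t(sapply(responses, function(r) as.integer(values %in% r)))
colnames(indicator) <- values

# Cell (x, y) counts responses containing both x and y;
# the diagonal (x, x) counts responses containing x
incidence <- crossprod(indicator)

# Proportional form: divide each row by its diagonal entry, so that
# cell (x, y) is the share of x-responses that also contain y
proportional <- incidence / diag(incidence)
```

Note that the proportional matrix is no longer symmetric: every Bash user here also uses Python, but only two thirds of Python users also use Bash.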

languages.svg
Figure 2: Pairwise incidence matrix of programming language use

The proportional pairwise incidence matrix simply divides each [x,y] entry by [x,x].

languages_proportional.svg
Figure 3: Proportional pairwise incidence matrix of programming language use

It could be interesting to consider how the Emacs community’s use of languages differs from programmers in general. I considered using the TIOBE index, but the StackOverflow developer survey better mirrors the style of question used in this survey (list languages you use vs. primary language).

so2020_relative_lang_popularity.svg
Figure 4: Language popularity relative to StackOverflow 2020 survey
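
One simple way to express "popularity relative to StackOverflow" is the ratio of the two usage shares. The sketch below assumes that definition, which may not be exactly what the figure uses, and the shares are hypothetical placeholders rather than the real survey numbers.

```r
# Hypothetical usage shares (fraction of respondents listing each language)
emacs_share <- c(Python = 0.60, Haskell = 0.15, JavaScript = 0.40)
so_share    <- c(Python = 0.44, Haskell = 0.02, JavaScript = 0.68)

# Ratio > 1: over-represented among Emacs respondents;
# ratio < 1: under-represented
relative <- emacs_share / so_share[names(emacs_share)]
```

Indexing `so_share` by name keeps the two vectors aligned even if the languages are listed in a different order.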
1.1.2.1. Observations
  • The ’usual suspects’ seem popular (Python, Bash, HTML, JavaScript, etc.)
    • Haskell seems more popular than one would expect (similar to Go)
    • Given that this is an Emacs survey, I would have expected more Lisp. Perhaps this relates to the low proportion of responses that rate their elisp proficiency above “simple functions”, or perhaps people are simply assuming elisp and only checking “lisp” when they use other lisps
  • Python, Bash, and JavaScript can be thought of as “the big three”; they’re consistently used a lot in combination with other languages
  • C, C++, and Assembly really like each other
  • The distribution of people using Haskell with other languages is unusually flat / uniform.
  • Julia users seem exceptionally monogamous, only dallying with a little Haskell and a pinch of R
  • Python is nigh-universally popular, with the two exceptions of Haskell (but consistent with Haskellers’ general use of other languages) and, oddly, TypeScript.
  • C users seem to tend to use a large number of other languages as well.

1.1.3. Packages

top_packages.svg
Figure 5: Most popular packages
packages.svg
Figure 6: Proportional pairwise incidence matrix of package usage
lsp_servers.svg
Figure 7: LSP server popularity
1.1.3.1. Observations
  • The top 4 packages are clear: Magit, Org-mode, Projectile, and LSP-mode
  • Magit seems to have uniquely broad appeal; no other package comes close in the package proportional pairwise incidence matrix

1.1.4. Emacs use cases

usecases.svg
Figure 8: Pairwise incidence matrix of Emacs usecases
usecase_proportional.svg
Figure 9: Proportional pairwise incidence matrix of Emacs usecases
1.1.4.1. Observations
  • Use cases are very mixed.
  • The vast majority of responses consisted of software development and writing
  • Software developers mix uses the most, followed by writers

1.1.5. Disabled UI elements

disabled_elements.svg
Figure 10: Pairwise incidence matrix of disabled UI elements
disabled_elements_proportional.svg
Figure 11: Proportional pairwise incidence matrix of disabled UI elements
1.1.5.1. Observations
  • Barely anyone likes the tool bar; almost everyone likes the modeline
    • What’s with these modeline people? Disable the modeline but keep everything else. They’re crazy.
  • Tool and scroll bar dislike are most closely linked
  • Splash screen haters seem to dislike everything else (except the modeline) pretty evenly

1.2. Breakdowns by category

framework_usage.svg
Figure 12: Emacs users by Framework

In the following graphics, to make it easier to compare the proportion of users who match a criterion within each framework, the user counts are normalised. To gain an intuition for the overall situation, mix the Custom column with a pinch of Doom, Vanilla, and Spacemacs.
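
A minimal sketch of this kind of per-framework normalisation, using base R's `prop.table()` on hypothetical counts (the numbers are invented for the example):

```r
# Hypothetical counts: rows are answers, columns are frameworks
counts <- matrix(
  c(300, 100,   # Custom:  GUI, TUI
     80,  20,   # Doom:    GUI, TUI
     60,  40),  # Vanilla: GUI, TUI
  nrow = 2,
  dimnames = list(interface = c("GUI", "TUI"),
                  framework = c("Custom", "Doom", "Vanilla"))
)

# Scale each framework's column to sum to 1, so frameworks with very
# different user counts can be compared side by side
normalised <- prop.table(counts, margin = 2)
```

After normalisation the far larger Custom column no longer dominates; each column simply shows the split of answers within that framework.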

use_case_by_kit.svg
Figure 13: OS proportion by starter kit, broken down by Emacs framework
doom_breakdown.svg
Figure 14: Categorical survey responses, broken down by Emacs framework

1.2.1. Observations

Oh wow, a lot to unpack. Let’s pick some highlights.

1.2.1.1. Purpose
  • Work usage high across frameworks, and all have a decent slice of hobbyists
    • Custom almost entirely work + hobby
  • Doom less popular with tinkerers, more with students and hobbyists
1.2.1.2. Use case
  • Vanilla particularly popular for writing
  • Doom is more popular for research writing, but is outdone by “Other”, which is the highest (of the frameworks) in both research writing and other uses.
1.2.1.3. Version
  • Doom users are the most up-to-date
  • Vanilla users use older versions the most
1.2.1.4. OS
  • Doom and Prelude have the fewest Windows users
  • Vanilla has the most BSD users
  • Prelude has the most Mac users
1.2.1.5. Run mode
  • Vanilla users use the daemon the least
1.2.1.6. GUI/TUI
  • TUI is massively more popular with Vanilla users
  • GUI very slightly more popular with Doom/Spacemacs than the rest
1.2.1.7. Keybindings, now + initial
  • Doom and Spacemacs
    • Starts as half Vim half Emacs
    • another third converts from Emacs to Vim bindings from “initial” to “now”
      • Vim keybindings are popular, and well-received by these users
    • CUA least popular
  • Everything else
    • around 80% Emacs, but more like 90% for Prelude
    • Not much change between “initial” and “now”
      • Custom users grab some other keybindings
1.2.1.8. Previous editor
  • Doom slightly more popular than Spacemacs for ex-Vimmers
  • Doom twice as popular as the next most (Spacemacs) for VSCode users
  • None of the others differ notably
1.2.1.9. Org usage
  • Doom users use Org the most, but not by much
    • However rate of “not using org” is the lowest by a fair bit
  • Across frameworks, around half use Org daily, and 80% use Org
1.2.1.10. Completion
  • Doomers like Ivy, Spacers like Helm
  • Half of Vanillans don’t like completion it seems
    • but those that do, use ido as much as ivy/helm
  • Other frameworks have a pretty consistent ~15% on ido
1.2.1.11. Elisp package management
  • use-package rules, and a lot of other people like package.el
  • Spacemacs does its own thing mostly
1.2.1.12. Elisp package source
  • Melpa dominates
  • Doom users grab packages from source much more than anyone else
    • Prelude and Spacemacs seem to avoid source
1.2.1.13. Theme
  • Prelude users like zenburn a fair bit
  • Doomers like doom-one
1.2.1.14. Error checking
  • Most vanilla users don’t make mistakes 😉
  • Everybody else is fairly similar (mostly flycheck, some rather confident individuals, and a small slice of flymake)
1.2.1.15. TRAMP
  • Consistently around 50/50 usage
1.2.1.16. Terminal emulator
  • Doomers love vterm
  • Eshell is generally pretty popular (quarter of users)
1.2.1.17. Mail client
  • Quarter of people do mail in Emacs it seems
  • Mu4e dominates in Doom and Custom, semi-even split between Mu4e/Notmuch elsewhere
1.2.1.18. Elisp proficiency
  • Consistently, half of people feel confident with simple functions, and most of the remainder with copy and paste
  • Custom users are the most confident about package writing by far

1.3. Breakdown by Emacs experience

1.3.1. Framework

experience_by_starter_framework.svg
Figure 15: Emacs absolute users by year, broken down by framework

Let’s now look at the distribution of years of Emacs experience, by framework, normalised by the total users of each framework.

experience_by_starter_framework_normalised.svg
Figure 16: Emacs users by year, broken down by framework, as a proportion of the total users of each framework

Now normalising by total Emacs usage,

experience_by_starter_framework_emacs_usage_normalised.svg
Figure 17: Emacs framework popularity
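
The two normalisations in this section differ only in which margin of the experience-by-framework table is scaled to sum to one. The sketch below shows one reading of them, on hypothetical counts with made-up experience bands.

```r
# Hypothetical counts: rows are experience bands, columns are frameworks
counts <- matrix(
  c(40, 30, 10,   # Custom
    45,  5,  0),  # Doom
  nrow = 3,
  dimnames = list(experience = c("0-5", "5-15", "15+"),
                  framework  = c("Custom", "Doom"))
)

# Each framework's users spread over experience bands (columns sum to 1)
by_framework <- prop.table(counts, margin = 2)

# Each experience band split between frameworks (rows sum to 1)
by_experience <- prop.table(counts, margin = 1)
```

The first form answers "how experienced are the users of this framework?", the second "which framework do users with this much experience pick?".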

1.3.2. Observations

  • Consistent preferences throughout the 10-30 year experience range
  • Only one dip as users become more recent, which is from ~2000–2005
    • dot com bubble?
  • The few 30+ year users are almost all on Custom + Vanilla
  • Spacemacs has a ~5 year wide peak of ~15% of users, centred on users with 3 years of experience
  • Prelude has a ~10 year wide peak of ~3% of users, centred on users with 15 years of experience
  • Doom’s popularity looks like a trumpet bell: almost half of new Emacs users (who are involved in the community) seem to be using Doom.

1.4. Prior Editor/IDE

experience_by_prior_editor.svg
Figure 18: Prior editor as a proportion of users for each year

1.4.1. Observations

  • Initially, the majority of users were ’fresh’ to Emacs (no prior editor/IDE)
  • Vim has semi-consistently been a source for around a quarter of new users, though that’s been increasing to almost half as of late
  • Eclipse, Notepad++, and Sublime have all ’peaked’
  • The proportion of users coming from VSCode has risen rapidly, from 5% to 30% over 5 years.

2. Text mining

Here, four techniques are applied:

  • Word clouds, for an indication of which words are most prevalent
  • Association graphs, where links are made between words that appear together a lot
  • Cluster dendrogram, a hierarchical tree of words
  • Response ’representativeness’
    • We have word frequency data
    • Each response is given points for each of the 100 most frequent words (the same words seen in the word clouds) it contains, equal to the number of times that word is seen in the corpus, to create a ’representativeness score’
    • We plot the distribution of response points, and provide the top responses to examine
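
To make the ’representativeness’ scoring concrete, here is a small base R sketch. It uses made-up responses, takes the top 3 words instead of the top 100, and assumes a word is counted once per response however often it recurs there; the tokenisation is also deliberately crude, so treat this as an illustration, not the exact procedure.

```r
# Hypothetical free-text responses (not survey data)
responses <- c(
  "notes and todo lists",
  "todo lists for work",
  "writing notes",
  "literate programming"
)

# Lower-case and split on anything that is not a letter
tokens <- strsplit(tolower(responses), "[^a-z]+")

# Corpus-wide word frequencies, most frequent first
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
top_words <- head(freq, 3)  # the analysis above uses the top 100

# A response scores the corpus frequency of every top word it contains
score <- sapply(tokens, function(words) {
  sum(top_words[intersect(names(top_words), words)])
})

# Highest-scoring responses, for manual inspection
most_representative <- responses[order(score, decreasing = TRUE)][1:2]
```

A histogram of `score` (for example via `hist(score)`) would then give the points distribution.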

2.1. Org mode purpose


Org purpose word cloud, association graph, and cluster dendrogram.
tm_org_purpose_response_representativeness.svg
Figure 19: Representativeness points distribution across Org purpose responses

2.1.1. Sentiments

  • Org is used for all kinds of writing, primarily note taking
  • A lot of people use it with task management, todo list management, … helping themselves get organised (see the L3 chunk of the dendrogram)
  • People who reference writing with Org tend to mention
    • research
    • literate programming
  • The use of the task management facilities of Org is split between personal and work settings (see association graph)

2.2. Emacs improvements


Emacs improvements word cloud, association graph, and cluster dendrogram.
tm_emacs_improvements_response_representativeness.svg
Figure 20: Representativeness points distribution across Emacs improvements responses

2.2.1. Sentiments

  • Performance, speed improvements are popular
    • Seems to be some hope that gccemacs and multithreading may be good for this
  • Talk about:
    • A more modern GUI
    • Better defaults
    • LSP support
  • New users may struggle in getting Emacs to “just work” (emacs–new–easier–make–work)

2.3. Emacs strengths


Emacs strengths word cloud, association graph, and cluster dendrogram.
tm_emacs_strengths_response_representativeness.svg
Figure 21: Representativeness points distribution across Emacs strengths responses

2.3.1. Sentiments

  • Extensibility, Extensibility, Extensibility
    • Oh, and flexibility, configuration, customisation, …
  • Emacs lisp can do anything I want
  • It’s free software
  • Great community, who have created a good package ecosystem
    • Magit and Org being standout examples, which “just work”
  • Use one editor for everything text
    • good programming language support

2.4. Emacs learning difficulties


Emacs learning difficulties word cloud, association graph, and cluster dendrogram.
tm_emacs_learning_difficulties_response_representativeness.svg
Figure 22: Representativeness points distribution across Emacs learning difficulties responses

2.4.1. Sentiments

  • Keybindings are the main stumbling block
  • Elisp is hard to get into, looks really strange at first
    • lots of people didn’t understand it, some still don’t
  • It takes a lot of time to get comfortable with the ’basics’
  • Not enough help getting started. Interested in a good tutorial.

2.5. Emacs, one thing to do differently


Emacs, do one thing differently word cloud, association graph, and cluster dendrogram.
tm_emacs_do_one_thing_differently_response_representativeness.svg
Figure 23: Representativeness points distribution across Emacs, do one thing differently responses

2.5.1. Sentiments

  • Better defaults / language support
  • Modern defaults
  • Need to “just work” better

3. Multivariate analysis

To perform multivariate analysis, I’ll examine the subset of questions and responses that I feel can be (sensibly) placed on a numeric scale.

R
os_score <- # unix-ness
  match_scorer(os_matcher,
               c("BSD"=0,
                 "Linux"=1,
                 "MacOS"=2,
                 "WSL"=3,
                 "Windows"=3,
                 "Other"=NA))

usecase_score <- # how much coding
  match_scorer(usecase_matcher,
               c("Software Development"=0,
                 "Data Science"=1,
                 "Research Writing"=2,
                 "Writing"=3,
                 "Other"=NA))

version_score <-
  match_scorer(version_matcher,
               c("25"=25,
                 "26"=26,
                 "27"=27,
                 "28"=28,
                 "gcc"=28,
                 "Other"=NA))

keybindings_score <- # how far from defaults
  match_scorer(keybindings_matcher,
               c("CUA"=0,
                 "Emacs"=1,
                 "Vim"=2,
                 "Other"=3))

usage_score <- # how frequent
  match_scorer(usage_matcher,
               c("daily"=4,
                 "weekly"=3,
                 "monthly"=2,
                 "time to time"=1,
                 "don't use"=0,
                 "no"=0,
                 "Other"=NA))

package_repo_score <- # how walled-garden
  match_scorer(package_repo_matcher,
               c("elpa"=0,
                 "melpa"=1,
                 "source"=2,
                 "Other"=NA))

elisp_skill_score <- # how proficient
  match_scorer(elisp_skill_matcher,
               c("packages"=3,
                 "simple functions"=2,
                 "copy paste"=1,
                 "none"=0,
                 "no"=0,
                 "Other"=NA))

contribution_score <- # how much contributing
  match_scorer(contribution_matcher,
               c("maintainer"=3,
                 "regularly"=2,
                 "time to time"=1,
                 "no"=0,
                 "Other"=NA))

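The `match_scorer()` helper and the `*_matcher` objects used above are not defined in this document. Purely as a hypothetical stand-in, so the idea of mapping answers onto a numeric scale can be tried out, something along these lines would behave similarly; every name ending in `_sketch` is invented for this example.

```r
# Hypothetical stand-in for match_scorer(): given a matcher (answer -> label)
# and a named score vector, return a function mapping answers to scores
match_scorer_sketch <- function(matcher, scores) {
  function(responses) unname(scores[matcher(responses)])
}

# Hypothetical matcher: label each answer by the first keyword it contains,
# falling back to "Other"
usage_matcher_sketch <- function(responses) {
  labels <- c("daily", "weekly", "monthly")
  sapply(responses, function(r) {
    hit <- labels[sapply(labels, grepl, x = tolower(r), fixed = TRUE)]
    if (length(hit) > 0) hit[1] else "Other"
  }, USE.NAMES = FALSE)
}

usage_score_sketch <-
  match_scorer_sketch(usage_matcher_sketch,
                      c("daily" = 3, "weekly" = 2, "monthly" = 1, "Other" = NA))

example_scores <- usage_score_sketch(c("Daily", "about weekly", "never"))
```

Answers that match no keyword come out as `NA`, mirroring the `"Other"=NA` entries in the score tables above.
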
3.1. Pairwise correlation

How’s a pairwise correlation matrix look?

pairwise_correlation.svg
Figure 24: Pairwise correlation matrix of numeric form survey data
pairwise_scatter_plots.png
Figure 25: Pairwise value and correlation, and univariate distributions
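
For reference, a correlation matrix like this can be computed with base R's `cor()`. Since the scores contain `NA` for "Other" answers, the sketch below assumes pairwise-complete observations are used; that choice, the column names, and the data are all assumptions for illustration.

```r
# Hypothetical numeric scores with some missing values (not survey data)
scores <- data.frame(
  usage        = c(4, 3, NA, 1, 4),
  elisp        = c(3, 2, 1, NA, 2),
  contribution = c(3, 1, 0, 0, 1)
)

# For each pair of variables, use only the rows where both are present
correlation <- cor(scores, use = "pairwise.complete.obs")
```

With `use = "complete.obs"` instead, any row with a single missing answer would be dropped from every pair, which can discard a lot of data when many questions are optional.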

3.1.1. Observations

These variables exhibit a high degree of independence, with few exceptions.

It is interesting that MELPA contribution is more strongly correlated with elisp proficiency than contribution to the Emacs core.

3.2. PCA

pca_scree_plot.svg
Figure 26: Scree plot

This decline in contribution to total variance is rather slow. Let’s look at the first few PCs.

pc_loading_matrix.svg
Figure 27: Variable loadings of each principal component
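
A minimal sketch of where the scree plot values and loadings come from, using base R's `prcomp()` on simulated data. The data and variable names are invented, and the actual analysis may have been set up differently.

```r
set.seed(1)

# Simulated stand-in for the numeric scores: three independent variables
# plus one that partly follows the first
scores <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
scores$d <- scores$a + rnorm(50, sd = 0.5)

# prcomp() does not accept missing values, so drop incomplete rows first
pca <- prcomp(na.omit(scores), center = TRUE, scale. = TRUE)

# Scree plot values: share of total variance explained by each component
variance_share <- pca$sdev^2 / sum(pca$sdev^2)

# Loadings: one column per principal component, one row per variable
loadings <- pca$rotation
```

When the variables are close to independent, `variance_share` declines slowly, as in the scree plot above; strongly correlated variables would instead concentrate the variance in the first few components.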

So far, this direction of analysis does not look very promising.

This has been good for establishing the independence between these factors, and it is interesting to see the scree plot and loadings.

4. Conclusions

This survey was, in many respects, very successful. It had 7344 respondents, from a mix of sources.

Unsurprisingly, the respondents seem to be heavily biased towards more community-involved users. For instance, scaling the total respondents by the ratio of MELPA packages (\(\sim\,\)4,800) to self-reported MELPA maintainers (394) suggests a mere 90,000 Emacs users globally. By contrast, the last StackOverflow survey that polled Development Environments indicated StackOverflow sees around 2 million Emacs users monthly.
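
The 90,000 figure can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes roughly one maintainer per MELPA package, which is one reading of the estimate and not something the survey data establishes.

```r
respondents       <- 7344
melpa_maintainers <- 394   # self-reported in the survey
melpa_packages    <- 4800  # approximate

# If each package has about one maintainer, this is the fraction of all
# maintainers (and, by extension, of all users) that the survey reached
sampling_fraction <- melpa_maintainers / melpa_packages

# Scale the respondent count up by that fraction
estimated_users <- respondents / sampling_fraction

# Compare with the ~2 million monthly users suggested by StackOverflow
share_of_so_estimate <- estimated_users / 2e6
```

The estimate comes out a little under 90,000, which is less than a twentieth of the StackOverflow-based figure.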

4.1. The Current State of Affairs

The single most apparent result of this survey is the diversity. There is no good ’average’ respondent. Emacs is used primarily for programming, however only 27% of respondents listed Software Development as their sole use of Emacs. It’s a similar story when it comes to languages, where there are half as many people using Emacs for Haskell as for C++. It is impossible to make an accurate generalisation about the nature of Emacs’ use.

However, it is possible to make generalisations about what Emacs users like. In a word: “Extensibility” (to the surprise of no one). Related terms like “Versatility”, “Flexibility”, “Customisation”, etc. come up frequently in the responses. I doubt that the apparent diversity of use cases and the headline strength of Emacs being “Extensibility” are a coincidence.

The respondents are predominantly on Linux (65%), with most of the rest on MacOS (25%), then a sliver on Windows (10%) / BSD (2%). This is a huge skew compared to the 2020 StackOverflow Survey: BSD is 20x more prevalent, Linux 2.5x, MacOS 1x, and Windows 0.15x.

4.3. Pain points (new users)

With this section it’s worth keeping in mind there is likely a strong survivor bias at play: only those that persevered through any difficulties they faced would still be using Emacs and answering this survey.

Three topics consistently appeared as off-putting factors:

  • Keybindings
    • Four decades ago the keyboard / CUA landscape was very different
    • 12% of all respondents mentioned keybindings when discussing learning difficulties
  • Lack of a good tutorial
    • Without anything, Emacs can be overwhelming
    • When completely new to Emacs, the manual can also be overwhelming
  • Elisp
    • Hard to work out where to start (see: Tutorial)
    • The non-elisp way of customising Emacs is not as obvious and smooth (to use) as it should be
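
A share like the 12% keybindings figure above can be obtained by pattern-matching the free-text answers. The sketch below shows the general idea on made-up responses; the pattern is an assumption and not necessarily the one used for the figure.

```r
# Hypothetical free-text answers about learning difficulties
responses <- c(
  "The keybindings were confusing",
  "Elisp looked strange",
  "No good tutorial",
  "Key bindings and the manual"
)

# Match "keybinding(s)" and "key binding(s)", ignoring case
mentions <- grepl("key ?binding", responses, ignore.case = TRUE)

# Fraction of responses that mention the topic
share <- mean(mentions)
```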

4.4. Desired improvements

Bearing in mind the apparent bias towards Emacs-developers discussed earlier, the three most-mentioned topics seem to be:

  • Improved performance
  • Improved threading / async / coroutines
  • Better defaults, OOTB language functionality
    • Oh, and inclusion of some generally useful tools like company/magit

4.5. Final comments

All in all, I think this paints a rather positive picture for the state of Emacs and its community. Interest in Emacs seems on the rise, likely helped by the popularisation of Emacs starter kits / frameworks — which are exploring ways to make Emacs more accessible to certain segments of the population (ex-Vimmers for instance).

Some of the lesser pain-points, and a few major desired improvements are actively being addressed as I write this (thanks to gccemacs and pgtk), and LSP is unlocking a fantastic amount of work on language-specific functionality. I am optimistic that with time other prominent concerns/desires will also be addressed, and with luck future surveys will be able to interrogate the community about their involvement with Emacs development.

To everyone that participated in the survey, thank you! It is my hope that these results, and (with luck) those of future surveys will help us better understand the Emacs community, and inform development.

Date: 2020-12-10

Author: TEC

Created: 2023-06-07 Wed 12:46