As repeatedly promised by Twitter CEO Elon Musk, Twitter has opened a portion of its source code to public inspection, including the algorithm it uses to recommend tweets in users’ timelines.
On GitHub, Twitter published two repositories containing code for many parts that make the social network tick, including the mechanism Twitter uses to control the tweets users see on the For You timeline. In a blog post, Twitter characterized the move as a “first step to be[ing] more transparent” while at the same time “[preventing] risk” to Twitter itself and people on the platform.
On a Twitter Spaces session today, Musk clarified:
“Our initial release of the so-called algorithm is going to be quite embarrassing, and people are going to find a lot of mistakes, but we’re going to fix them very quickly,” Musk said. “Even if you don’t agree with something, at least you’ll know why it’s there, and that you’re not being secretly manipulated … The analog, here, that we’re aspiring to is the great example of Linux as an open source operating system … One can, in theory, discover many exploits for Linux. In reality, what happens is the community identifies and fixes those exploits.”
As for the blog post’s second point, preventing risk: the open source releases don’t include the code that powers Twitter’s ad recommendations or the data used to train Twitter’s recommendation algorithm. Moreover, they include few instructions on how to inspect or actually use the code, reinforcing the idea that the releases are strictly developer-focused.
“[We excluded] any code that would compromise user safety and privacy or the ability to protect our platform from bad actors, including undermining our efforts at combating child sexual exploitation and manipulation,” Twitter wrote. It’s a bit of mixed messaging coming only weeks after Twitter fired much of its ethical AI and trust and safety staff, which was responsible for content moderation among other user security-related tasks. But the company nonetheless insists that it “[took] steps to ensure that user safety and privacy would be protected” with today’s code release.
Twitter says it’s working on tools to manage code suggestions from the community and sync changes to its internal repository. Presumably, those will be made available at a future date; there’s no sign of them at present.
“We’re going to look for suggestions, not just on bugs but also on how the algorithm should work,” Musk said on the Spaces session. “It’s going to be an evolving process. I wouldn’t expect it to be a nonstop upward movement… but we’re very open to what would improve the user experience.”
At first glance, the algorithm is fairly complex, though not especially surprising from a technical standpoint. It’s made up of multiple models, including models for detecting “not safe for work” or abusive content, estimating the likelihood of one Twitter user interacting with another, and calculating a Twitter user’s “reputation.” (It’s unclear exactly what “reputation” refers to; the high-level documentation doesn’t say.) Several neural networks are responsible for ranking the tweets and recommending accounts to follow, while a filtering component hides tweets to (forgive the jargon) “support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments and coarse-grained downranking.”
In an engineering blog post, Twitter reveals more about the recommendation pipeline, which it claims runs approximately five billion times per day:
“We attempt to extract the best 1,500 tweets from a pool of hundreds of millions … Today, the For You timeline consists of 50% [tweets from people you don’t follow] and 50% [tweets from people you follow] on average, though this may vary from user to user,” Twitter wrote. “Ranking [tweets] is achieved with a ~48-million-parameter neural network that is continuously trained on tweet interactions to optimize for positive engagement (e.g. likes, retweets and replies).”
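To make the ranking step concrete, here is a minimal Python sketch of what "extract the best 1,500" could look like. It is not Twitter's code: the `Candidate` class, the weights in `WEIGHTS`, and the idea of combining three predicted probabilities into one score are all assumptions made for illustration. In the real system, the predictions would presumably come from the neural network Twitter describes.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate tweet with model-predicted engagement odds."""
    tweet_id: int
    in_network: bool   # True if the viewer follows the author
    p_like: float      # predicted probability of a like
    p_retweet: float   # predicted probability of a retweet
    p_reply: float     # predicted probability of a reply


# Illustrative weights only; the article does not state real values.
WEIGHTS = {"like": 1.0, "retweet": 2.0, "reply": 3.0}


def engagement_score(c: Candidate) -> float:
    """Combine the predicted engagement odds into a single ranking score."""
    return (WEIGHTS["like"] * c.p_like
            + WEIGHTS["retweet"] * c.p_retweet
            + WEIGHTS["reply"] * c.p_reply)


def top_candidates(candidates, k=1500):
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=engagement_score)


def in_network_share(candidates) -> float:
    """Fraction of candidates whose author the viewer follows."""
    if not candidates:
        return 0.0
    return sum(c.in_network for c in candidates) / len(candidates)
```

Calling `in_network_share(top_candidates(pool))` on a candidate pool would show how close a given selection comes to the roughly 50/50 split Twitter describes.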
Twitter users don’t see the full 1,500 tweets, of course. The candidates are filtered according to content restrictions and other factors the models consider, such as whether tweets have “negative feedback,” whether they come mainly from the same Twitter user, and whether they come from users who’ve been blocked or muted.
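The filtering pass described above can be sketched in a few lines of Python. Again, this is an illustration rather than Twitter's implementation: `RankedTweet`, `filter_timeline` and the `max_per_author` cap are hypothetical names and values, chosen only to show the three checks the article mentions (blocked or muted authors, negative feedback, and too many tweets from one author).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedTweet:
    """Hypothetical ranked tweet; fields are assumptions for illustration."""
    tweet_id: int
    author_id: int
    negative_feedback: bool = False


def filter_timeline(ranked, blocked_or_muted, max_per_author=2):
    """Drop tweets that fail any check, keeping the original ranking order."""
    shown = []
    per_author = {}  # author_id -> number of tweets already kept
    for tweet in ranked:
        if tweet.author_id in blocked_or_muted:
            continue  # viewer has blocked or muted this author
        if tweet.negative_feedback:
            continue  # tweet carries negative feedback
        if per_author.get(tweet.author_id, 0) >= max_per_author:
            continue  # cap on tweets from the same author reached
        per_author[tweet.author_id] = per_author.get(tweet.author_id, 0) + 1
        shown.append(tweet)
    return shown
```

A hard cap per author is only one way to keep a single account from dominating a timeline; a real system might instead reduce the score of each additional tweet from the same author.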
Gizmodo notes that one thing that doesn’t appear to have been made public is the list of VIPs that Twitter pushes to users. This week, Platformer reported that Twitter has a rotating list of noteworthy users, including YouTuber Mr. Beast and Daily Wire founder Ben Shapiro, that it uses to monitor changes to the recommendation algorithm by increasing the visibility of these “power users” seemingly at will.
There’s more evidence that the algorithm may treat tweets differently depending on the source. Researcher Jane Manchun Wong noted that Twitter’s algorithm specifically labels whether the tweet author is Elon Musk and has other labels indicating whether the author is a “power user” as well as whether they’re a Republican or Democrat.
During the Spaces session this afternoon, a Twitter engineer said that the labels were used only for metrics. But Musk — who said he wasn’t aware of the labels prior to today — said that they shouldn’t be there.
“It definitely shouldn’t be dividing people into Republicans and Democrats, that makes no sense,” Musk said.
The release of the source code comes after several controversies involving tweaks to Twitter’s recommendation algorithm in recent months. According to Platformer, in February, Musk called on Twitter’s engineers to reconfigure the algorithm so his tweets would be more widely viewed. (Twitter later walked back this change — at least somewhat.) In November, Twitter began showing users more tweets from people they don’t follow — a move the platform attempted prior to Musk’s acquisition but later reversed after a backlash from users.