Appen

Appen · 2026-04-29T18:53:10.094Z


IT Services and IT Consulting

Kirkland, Washington · 1,067,404 followers

Appen is your trusted data partner, powering cutting-edge AI applications for the world's most innovative companies.



About us

Appen has been a leader in AI training data for over 25 years, providing high-quality, diverse datasets that power the world's leading AI models. Our end-to-end platform, deep expertise, and scalable human-in-the-loop services enable AI innovators to build and optimize cutting-edge models. We specialize in creating bespoke, human-generated data to train, fine-tune, and evaluate AI models across multiple domains, including generative AI, large language models (LLMs), computer vision, speech recognition, and more.

Our solutions support critical AI functions such as supervised fine-tuning, reinforcement learning from human feedback (RLHF), model evaluation, and bias mitigation. Our advanced AI-assisted data annotation platform, combined with a global crowd of more than 1M contributors in over 200 countries, ensures the delivery of accurate and diverse datasets. Our commitment to quality, scalability, and ethical AI practices makes Appen a trusted partner for enterprises aiming to develop and deploy effective AI solutions.

At Appen, we foster a culture of innovation, collaboration, and excellence. We value curiosity, accountability, and a commitment to delivering the highest-quality AI solutions. We support work-life balance with flexible work arrangements and a dynamic, results-driven environment. Employees have access to competitive pay, comprehensive benefits, and opportunities for continuous learning and career growth. Our team works closely with the world’s top technology companies and enterprises, tackling exciting challenges and shaping the future of artificial intelligence.

Website: http://appen.com
Industry: IT Services and IT Consulting
Company size: 501-1,000 employees
Headquarters: Kirkland, Washington
Type: Public Company
Founded: 1996
Specialties: Search, Annotation, Evaluation, Personalization, Transcription, Spam Detection, Translation and Localization, Data Collection, training data, artificial intelligence, machine learning, data preparation, model evaluation, datasets, computer vision, natural language processing, LLM, and generative AI

Locations

Primary: 12131 113th Ave NE, Suite 100, Kirkland, Washington 98034, US

Sydney, AU

9 Help Street, Level 6, Chatswood, 2067, AU

Beijing, CN

Cavite, PH

Exeter, GB


Updates

Appen · 1d
Great discussion at SlatorCon London 2026 around the evolving role of human expertise in AI development.

One theme that came up repeatedly: as models become more capable, the bottleneck is no longer just data volume. It’s evaluation quality, domain expertise, and the ability to generate meaningful feedback signals for increasingly complex systems.

A few areas Sergio Bruccoleri, VP, Delivery, Appen, touched on during the panel:
• Evaluation is shifting beyond static benchmarks toward more dynamic, environment-based assessment
• Domains like coding, STEM, legal, and finance require deeper subject matter expertise to properly evaluate reasoning quality and edge cases
• As agentic systems evolve, high-quality human feedback becomes increasingly important for alignment, reliability, and model behavior in production settings

Interesting conversations across the broader language AI ecosystem on where training, evaluation, and human-in-the-loop systems are headed next.
Appen · 1w
Appen recently completed an independent third-party evaluation of Subquadratic's SSA (Sparse Self-Attention) kernel.

The core architectural claim: replacing O(n²) full self-attention with a learned sparse formulation that routes computation only to the most relevant key-value blocks, enabling near-linear scaling as context length increases.

What we measured:

Efficiency (NVIDIA B200, bfloat16, PyTorch 2.11.0)
- 56.2× wall clock speedup vs. FlashAttention-2 at 1M tokens
- 62.8× FLOP reduction vs. dense attention at 1M tokens
- FLOP counts independently validated via torch.profiler (within 0.7–3.9% of theoretical)

Long-context retrieval: RULER at 128K tokens
- 95.6% average score across all evaluated tasks (LLM-judged via Claude Opus 4.6)
- Perfect retrieval on all single-needle tasks

Ultra-long context: MRCR at 512K–1M token context lengths
- 86.2% average score on the hardest 8-needle retrieval bucket

Coding: SWE-Bench Verified
- 81.8% resolved rate with extended thinking enabled

Evaluation was conducted independently with access scoped to API endpoints only. No model weights, training data, fine-tuning configurations, or ground-truth labels were provided in advance.

The efficiency scaling results are particularly notable.

Full report: https://lnkd.in/e-Q4FrEt

#LongContextLLM #AttentionMechanism #AIBenchmarking #SparseAttention #NLP #MachineLearning
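To make the O(n²) versus near-linear scaling claim concrete, here is a rough sketch of how theoretical attention FLOPs might be compared. It is a generic back-of-the-envelope model, not Subquadratic's SSA kernel or the method used in the report: the function names, `block_size`, and `blocks_per_query` are all hypothetical, and the cost of the routing step itself is ignored.

```python
def dense_attention_flops(n_tokens: int, head_dim: int) -> int:
    """Approximate FLOPs for one dense attention head.

    Q @ K^T costs about 2 * n * n * d and the weighted sum over V
    costs another 2 * n * n * d, so the total grows as n squared.
    """
    return 4 * n_tokens * n_tokens * head_dim


def block_sparse_attention_flops(
    n_tokens: int, head_dim: int, block_size: int, blocks_per_query: int
) -> int:
    """Approximate FLOPs when each query attends to a fixed number of
    key-value blocks (hypothetical parameters; routing cost ignored)."""
    attended = min(n_tokens, block_size * blocks_per_query)
    return 4 * n_tokens * attended * head_dim


def flop_reduction(
    n_tokens: int, head_dim: int, block_size: int, blocks_per_query: int
) -> float:
    """Ratio of dense to block-sparse FLOPs under the assumptions above."""
    dense = dense_attention_flops(n_tokens, head_dim)
    sparse = block_sparse_attention_flops(
        n_tokens, head_dim, block_size, blocks_per_query
    )
    return dense / sparse


if __name__ == "__main__":
    # Illustrative sizes only: 1,024 tokens, 2 blocks of 128 keys per query.
    print(flop_reduction(1024, 64, 128, 2))  # 4.0
```

Under this simplified model the reduction factor is n divided by the number of attended keys, so it grows linearly with context length while the attended budget stays fixed. Real kernels add routing overhead and memory effects, which is why measured wall-clock speedup and FLOP reduction can differ.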
Appen · 1w
SOTA ASR models report near-human accuracy on public test sets. So why do they still fail users in the real world?

The answer isn't the models, it's the benchmarks.

Widely used benchmarks like LibriSpeech are built on clean, scripted, accent-narrow speech. Real production speech is spontaneous, noisy, accented, and multi-speaker. That gap doesn't show up on a leaderboard. It shows up when your product ships.

There's also a compounding problem: benchmaxxing. Models optimised to climb public leaderboards don't generalise; they're tuned to the test, not the task.

Our new whitepaper lays out how to fix this. We cover:
→ Why current benchmarks systematically overstate real-world ASR performance
→ The evidence: WERs that jump from ~12% on read speech to 42% on casual conversation
→ Our 5-stage methodology for building production-representative speech benchmarks (scoping → contributor sourcing → speech design → recording → transcription)
→ How private, held-out benchmark sets resist benchmaxxing
→ Our partnership with Hugging Face to make the Open ASR Leaderboard a more trustworthy signal

If you're building or evaluating ASR systems, this is the benchmarking gap your eval stack may not be surfacing.

Read the full whitepaper [link in comments]

#SpeechRecognition #ASR #AIBenchmarking #NLP #SpeechAI #MachineLearning #DataQuality
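For readers less familiar with the WER figures quoted above, a minimal sketch of the standard word error rate calculation may help: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the simple lowercase-and-split normalization are assumptions for illustration; the whitepaper and the leaderboard may apply different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Normalization here is only lowercasing and whitespace splitting,
    which is an assumption made for this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% on very noisy output, which is one reason conversational speech can score so much worse than read speech.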
Appen reposted this
Xuedong D. Huang · 2w
Good to see Zoom Scribe API is competitive for real world speech recognition benchmarks.
Steven Zheng · 2w
Machine Learning Engineer @Hugging Face 🤗 | MVA @ENS Paris-Saclay

Big announcement for speech AI

Benchmarks get gamed. So we added a repellent.

The Open ASR Leaderboard now includes private evaluation data from Appen and DataoceanAI, making speech recognition benchmarks more robust against test-set contamination and “benchmaxxing.”

Better signal. Less overfitting. More real-world ASR.

Read more 👇 https://lnkd.in/dwTZheD2
Appen reposted this
Sergio Bruccoleri · 2w
Data makes all the difference. Especially when it reflects real production conditions.

One of the biggest challenges in AI today is what we call “benchmaxxing”: training and testing models on the same public datasets, only to see performance drop once models hit the real world.

That’s exactly why Appen partnered with Hugging Face on the #OpenASR Dashboard initiative.

As AI solutions become more productized and customer-facing, production readiness matters more than leaderboard scores. Models need to perform under shifting datasets, noisy environments, and real operational constraints, not just in controlled benchmarks.

In this article, we share:
• How Appen contributes to the OpenASR Dashboard
• Our methodology for building evaluation datasets
• Why production-grade data is critical for trustworthy AI benchmarking
• How we help ensure models are ranked based on true real-world performance

Because the future of AI evaluation isn’t just about benchmark accuracy. It’s about reliability in production.

Link to the article in the comments!

Appen reposted this
Eric Bezzam · 2w
Things are shaking up on the Open ASR Leaderboard 🪇

We added 11 dataset splits to the leaderboard, so what changed? Well if you’re just looking at the average WER: nothing. We’ve kept it as the average over standard public benchmarks (AMI, Earnings22, Librispeech, etc).

Public ASR benchmarks are incredibly valuable, but they also come with known limitations:
• transcription inconsistencies and errors
• "benchmaxxing", namely optimizing for leaderboard performance rather than real-world robustness

Some of this can even happen unintentionally, e.g. if pre-trained LLMs used benchmark transcripts or metadata in their training corpora.

Reliable ASR evaluation is hard. The truth is there’s no single dataset or metric that fully captures real-world performance, and there is no one model to rule them all.

To improve robustness and better capture these nuances in the Open ASR Leaderboard, we’ve worked with Appen and Dataocean AI to add 11 high-quality, private English datasets spanning:
• scripted + conversational speech
• multiple accents (American, Australian, British, Canadian, Indian)

We added these in a new "Private data" tab on the Open ASR Leaderboard. From the main "Leaderboard" tab, the average remains across the public datasets, and you can toggle on the private sets (or toggle off public sets) to see how it affects the average WER.

This highlights how dependent ASR rankings are on the choice of evaluation data.

Trustworthy transcription matters. ASR is often the first component in conversational systems, and transcription failures propagate downstream into LLMs and user experience, which is why evaluation choices matter so much.

These new datasets are only a step, not a final solution. Relevant and trustworthy evaluation cannot stay static. The leaderboard remains community-driven, and we’d love your feedback, suggestions for improving evaluation, and additional datasets (public or private).

📝 Blog on the new private sets: https://lnkd.in/eJCeTcsp
🧑‍💻 GitHub for contributions and suggestions: https://lnkd.in/enNrqUke
💡 Also highly recommend Dylan Fox’s piece on the limitations of widely-used datasets: https://lnkd.in/eWx3uUc2

Let’s build more reliable ASR evaluation 🤗
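The toggle behaviour described in this post can be sketched as averaging WER over a chosen subset of datasets and re-ranking. This is not the leaderboard's actual code: the model names, dataset names, and WER values below are made up purely for illustration, and `rank_models` is a hypothetical helper.

```python
from statistics import mean


def rank_models(wer_by_model: dict, selected_datasets: list) -> list:
    """Return model names sorted by average WER (lower is better)
    over the selected datasets only."""
    averages = {
        model: mean(scores[name] for name in selected_datasets)
        for model, scores in wer_by_model.items()
    }
    return sorted(averages, key=averages.get)


# Hypothetical WER percentages, invented for this sketch.
EXAMPLE_WER = {
    "model_a": {"public_1": 5.0, "public_2": 7.0, "private_1": 15.0},
    "model_b": {"public_1": 6.0, "public_2": 8.0, "private_1": 9.0},
}

if __name__ == "__main__":
    public_only = ["public_1", "public_2"]
    with_private = ["public_1", "public_2", "private_1"]
    print(rank_models(EXAMPLE_WER, public_only))
    print(rank_models(EXAMPLE_WER, with_private))
```

With these invented numbers, the model that leads on the public average falls behind once the private set is toggled on, which is the kind of shift the post describes.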

Appen · 2w
Seven private English ASR datasets by Appen are now powering a new evaluation track on the Hugging Face Open ASR Leaderboard. Better signal, harder to game. 📖 https://lnkd.in/eFibXcES https://lnkd.in/eXr54bcd
Appen · 2w · Edited
Appen partnered with Hugging Face to bring private, benchmarking-resistant ASR evaluation datasets to the Open ASR Leaderboard.

The leaderboard has been visited over 710,000 times since 2023. But public benchmarks have a problem: models can be optimized to climb rankings without actually performing better in the real world. That's benchmaxxing.

Our contribution: seven private English ASR datasets spanning scripted and conversational speech across American, Australian, Canadian, and Indian accents. Because they're kept private, they can't be gamed, making leaderboard results more trustworthy.

The data speaks for itself. When our datasets are included, model rankings shift. That's the signal a public-only benchmark can't give you.

Read the full story → https://lnkd.in/eFibXcES

#SpeechAI #ASR #AIEvaluation #HuggingFace #Benchmarking
Appen · 2w
If you’re at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) this week, join us for an evening with Appen and Hugging Face.

May 6 (Wednesday) | 6:00-9:00 PM

We’re bringing together researchers, engineers and practitioners working across speech, audio, NLP and multimodal AI for a casual happy hour. Whether you want to exchange ideas or just unwind after a full day of sessions, this is the space for it.

No pitches. No demos. Just people, drinks and real conversations.

If you're around, come by and say hi.

Register here: https://luma.com/7j9nagtx
Appen · 3w
Barcelona next week 🇪🇸 Sergio Bruccoleri, our VP of Delivery, will be at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026, connecting with researchers working on speech, audio, and multimodal systems and hearing what’s actually evolving in real-world deployments. If you’re attending, it’d be great to see you there.


Employees at Appen

Mark Eudy

Katja Tootle-Pizka

Flt Lt Bipin Chandra Dutt Pendyala

Vanessa Liu


