Engineering Principles and Practices from Other Companies

06/23/2014

Many of the resources here come from two sites. External references mainly come from infoq.com. InfoQ holds QCon conferences in Beijing, New York, London, and San Francisco, which attract presenters and attendees from the hottest IT companies. You can quickly look at the New York 2013 videos here, or search for QCon on www.infoq.com for more resources.

How other companies run their services

How Netflix runs its service

Building for the Cloud @ Netflix

Carl Quinn presents the build and deployment architecture Netflix uses to serve content from Amazon AWS.

Resiliency through Failure - Netflix's Approach to Extreme Availability in the Cloud *(hot)

Ariel Tseitlin discusses Netflix's suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment. The tools deliberately inject failures so the team can see how the system reacts to them.
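The core idea behind failure injection can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the Simian Army's actual code or API: the `Instance` and `ServiceGroup` classes and the `chaos_monkey` and `survives_failures` functions are hypothetical names invented here. A "monkey" terminates a random healthy instance, and the check is whether the service still answers requests afterwards.

```python
import random


class Instance:
    """Hypothetical service instance; 'healthy' becomes False once terminated."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class ServiceGroup:
    """Hypothetical group of redundant instances; any healthy one can serve a request."""

    def __init__(self, size):
        self.instances = [Instance(f"i-{n}") for n in range(size)]

    def handle_request(self):
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("service unavailable")
        return random.choice(healthy).name


def chaos_monkey(group, rng):
    """Terminate one randomly chosen healthy instance (the injected failure)."""
    healthy = [i for i in group.instances if i.healthy]
    if not healthy:
        return None
    victim = rng.choice(healthy)
    victim.healthy = False
    return victim.name


def survives_failures(group_size, failures, seed=0):
    """Inject `failures` terminations one at a time and verify requests still succeed."""
    rng = random.Random(seed)
    group = ServiceGroup(group_size)
    for _ in range(failures):
        chaos_monkey(group, rng)
        try:
            group.handle_request()
        except RuntimeError:
            return False
    return True
```

For example, `survives_failures(3, 2)` returns `True` because one instance is still healthy, while `survives_failures(3, 3)` returns `False`. Running this kind of experiment routinely, rather than waiting for a real outage, is what surfaces missing redundancy early.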

Machine Learning & Recommender Systems at Netflix Scale (New)

Xavier Amatriain discusses the machine learning algorithms and architecture behind Netflix's recommender systems, as well as offline experiments and online A/B testing.

From Code to Monkeys: Continuous Delivery at Netflix

Dianne Marsh presents the open source tools used by Netflix to keep the continuous delivery wheels spinning.

Big Data Platform as a Service at Netflix

Nov 18, 2013 ... Jeff Magnusson takes a deep dive into key services of Netflix's “data platform as a service” architecture, including RESTful services that: provide ...

How Netflix Architects for Survival

Nov 29, 2013 ... Jeremy Edberg discusses how Netflix designs their systems and deployment processes to help the service survive both catastrophic events like ...

Asgard, the Grails App that Deploys Netflix to the Cloud

Oct 22, 2013 ... Joe Sondow presents how Netflix uses Asgard to deploy code updates and manage resources in the Amazon cloud.

How LinkedIn runs its service

Lessons from Building and Scaling LinkedIn *(hot)

Jay Kreps discusses the evolution of LinkedIn's architecture and lessons learned scaling from a monolithic application to a distributed set of services, from one database to distributed data stores.

Data Infrastructure @ LinkedIn

Sid Anand presents the architecture set in place at LinkedIn and the data infrastructure running Java and Scala apps on top of Oracle, Voldemort, DataBus and Kafka.

Samza: Real-time Stream Processing at LinkedIn

Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap.

How Facebook ships code

Development and Deployment at Facebook (new) by Kent Beck

More than one billion users log in to Facebook at least once a month to connect and share content with each other. Among other activities, these users upload over 2.5 billion content items every day. In this article we describe the development and deployment of the software that supports all this activity, focusing on the site's primary codebase for the Web front-end.

Chuck Rossi unveils some of the tools and processes used by Facebook for pushing new updates every day.

Evolution of Code Design at Facebook

Nick Schrock presents how Facebook’s code evolved over time, explaining some new constructs – fbobjects, Preparables, Ents - introduced to address the complexities of a large social graph.

Big Data Architectures at Facebook

Ashish Thusoo presents the data scalability issues at Facebook and the data architecture evolution from EDW to Hadoop to Puma.

Facebook News Feed: Social Data at Scale

Serkan Piantino discusses news feeds at Facebook: the basics, infrastructure used, how feed data is stored, and Centrifuge – a storage solution.

How Twitter monitors its services

Lessons Learned Building Storm

Nathan Marz shares lessons learned building Storm, an open-source, distributed, real-time computation system.

Storm: Distributed and Fault-Tolerant Real-time Computation (New)

Nathan Marz introduces Twitter Storm, outlining its architecture and use cases, and takes a look at future features to be made available.

Forecasting at Twitter (new)

Arun Kejariwal, from Twitter, talked at Velocity Conf London last month about forecasting algorithms used at Twitter to proactively predict system resource needs as well as business metrics such as number of users or tweets. Given the dynamic nature of their data stream, they found that a refined ARIMA model works well once data is cleansed, including removal of outliers.
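The "cleanse first, then forecast" workflow can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not Twitter's method: it swaps their refined ARIMA model for a plain AR(1) model fitted by least squares, and it uses a median-absolute-deviation (MAD) rule for outlier removal. The function names `remove_outliers`, `fit_ar1`, and `forecast`, and the threshold `k=3.0`, are hypothetical choices made for this example.

```python
import statistics


def remove_outliers(series, k=3.0):
    """Replace points more than k scaled MADs from the median with the median (assumed rule)."""
    med = statistics.median(series)
    mad = statistics.median(abs(x - med) for x in series)
    if mad == 0:
        return list(series)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for Gaussian data
    return [med if abs(x - med) / scale > k else x for x in series]


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (a simple stand-in for ARIMA)."""
    xs, ys = series[:-1], series[1:]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    phi = sxy / sxx if sxx else 0.0
    c = my - phi * mx
    return c, phi


def forecast(series, steps):
    """Cleanse the series, fit AR(1), then roll the model forward `steps` times."""
    cleaned = remove_outliers(series)
    c, phi = fit_ar1(cleaned)
    last = cleaned[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions
```

With the input `[10, 11, 12, 13, 14, 100, 16, 17, 18, 19]`, the spike of 100 is replaced by the median (15), leaving a clean linear trend, and `forecast(series, 3)` returns `[20.0, 21.0, 22.0]`. Fitting on the raw series instead would let that single spike distort the estimated coefficients, which is the point the talk makes about cleansing data first.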

Decomposing Twitter: Adventures in Service-Oriented Architecture

Jeremy Cloud discusses SOA at Twitter, approaches taken for maintaining high levels of concurrency, and briefly touches on some functional design patterns used to manage code complexity.

Innovation at Google

Large-Scale Continuous Testing in the Cloud (new)

John Penix describes the test automation system and the supporting build system infrastructure used by Google.

Innovation at Google (new)

Patrick Copeland presents the first three principles of the eXtreme innovation approach based on the Pretotyping Manifesto: Innovators Beat Ideas, Pretotypes Beat Productypes, and Data Beats Opinion.

Agile Project Management: Lessons Learned at Google (new)

A retrospective on Google's first Scrum implementation. Jeff Sutherland visited Google to do an analysis of the first Google implementation of Scrum on one of their largest distributed projects. Their strategy for inserting Scrum step by step into the Google engineering teams showed great insight and provides helpful lessons learned for all Agile teams.

Others

Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons

Jeremy Edberg shares some of the lessons learned scaling Reddit, advising on pitfalls to avoid.

How a Small Team Scales Instagram

Mike Krieger discusses Instagram's best and worst infrastructure decisions, building and deploying scalable and extensible services.

Talks by topic

Hadoop

Building Applications using Apache Hadoop

Eli Collins overviews how to build new applications with Hadoop and how to integrate Hadoop with existing applications, providing an update on the state of Hadoop ecosystem, frameworks and APIs.

Bill Yetman and Jeremy Pollack discuss using several Agile techniques (start simple, get going, iterate) and the "measure everything" principle to create the architecture behind the Family History website.

High Speed Smart Data Ingest into Hadoop (new)

Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time high speed data ingest into Hadoop.

Leveraging Your Hadoop Cluster Better - Running Performant Code at Scale

Michael Kopp explains how to run performant code at scale with Hadoop and how to analyze and optimize Hadoop jobs.

Fault Tolerance

Anomaly Detection, Fault Tolerance and Anticipation Patterns

John Allspaw discusses fault tolerance, anomaly detection and anticipation patterns helpful to create highly available and resilient systems.

Continuous Integration and Continuous Delivery

DevOps

Call of Duty: Dev Ops by Stephen Burton, Tech Evangelist at AppDynamics

Alerts, Monitoring, Availability and Metrics

How do you measure quality of a service? by Brian Harry (new!)
