Testing in CI | App Dev Guide

Continuous Integration (CI) is the practice of merging changes frequently and running builds and tests automatically on every commit, before code reaches main or production. CI testing catches many bugs quickly: a developer typically knows within minutes whether a change broke something. Without CI, bugs stay hidden until someone tests manually or a user finds them.

Why CI Testing Matters

CI testing provides instant feedback:

Catch bugs early: A developer fixes their own code in the same session, not days later.
Prevent breaking changes from reaching main: A test failure blocks a merge, protecting the main branch.
Build confidence: Passing CI tests mean the code has been vetted by automated checks.
Document assumptions: Tests document how code should behave, visible to reviewers.
Enable refactoring: With CI tests, developers can refactor with more confidence. Passing tests show that the behavior they cover still works.
Reduce manual testing: Automated tests replace tedious manual regression testing.

Teams without CI often discover bugs late—during manual testing, UAT, or in production. CI shifts testing left, catching bugs when they're cheapest to fix.

CI/CD Platforms

Popular CI/CD platforms:

Platform	Pros	Cons
GitHub Actions	Native to GitHub, free tier, easy workflow files	Limited to GitHub, can be verbose
GitLab CI/CD	Built into GitLab, powerful, free tier	Only for GitLab
CircleCI	Great user experience, multi-platform support, free tier	Paid for advanced features
Jenkins	Open source, highly customizable, works with any git host	Self-hosted (requires maintenance), steeper learning curve
Travis CI	Simple to set up, good for open source	Paid tiers, smaller community

Many teams use GitHub Actions or GitLab CI/CD because they are built into the code host and have a free tier. If you need advanced features or support for multiple platforms, CircleCI is a strong option. Jenkins suits teams that need maximum control and can maintain their own servers.

Setting Up a CI Pipeline

A basic CI pipeline has these stages:

Trigger: On every commit (or pull request), the pipeline starts.
Checkout: CI pulls the code.
Install dependencies: npm install, pip install, etc.
Lint/format: ESLint, Prettier, etc. fail the build if code doesn't meet standards.
Type check: TypeScript, mypy, etc. catch type errors.
Unit tests: Run fast unit tests.
Integration tests: Run integration tests against a test database.
Build: Compile or bundle the code.
Security scanning: Snyk, SonarQube, dependency scanning.
E2E tests (optional): Run against a staging environment (slower, run less frequently).
Report: Generate reports and notify developers of results.

Not every pipeline includes all stages. A simple pipeline might run only install, lint, unit test, and build. A complex one might include all stages plus deployment. Tailor the pipeline to your needs.

Test Parallelization and Sharding

Test suites can grow large (thousands of tests). Running them sequentially takes too long. Parallelization runs tests in parallel:

Split tests across workers: CI system runs tests on multiple machines or processes. Worker 1 runs tests A-M, Worker 2 runs N-Z.
Sharding: Tests are sharded (divided) by category. Unit tests on one machine, integration tests on another.
Load balancing: Sophisticated systems distribute tests to balance work. Fast tests and slow tests mix so no worker is idle.

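In the ideal case, 4 workers cut a 10-minute test suite to about 2.5 minutes, and 10 workers cut it to about 1 minute. Real speedups are smaller because each worker has setup time and shards are rarely perfectly even; the run finishes only when the slowest worker does. Parallelization also requires care: tests must be independent, meaning they pass in any order and do not share mutable state such as database rows or temp files.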
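The load-balancing idea above can be sketched in a few lines. This is a minimal illustration, not how any particular CI system does it: it assumes you have per-test durations from a previous run, and the function name `shard_tests` and the test names are hypothetical. It uses a greedy longest-first assignment, which gives each test to the worker with the least work so far.

```python
import heapq


def shard_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Assign tests to workers so the total time per worker is roughly even.

    Greedy longest-first: walk tests from slowest to fastest and give each
    one to the worker with the least work assigned so far.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    shards: list[list[str]] = [[] for _ in range(workers)]
    # Heap of (seconds_assigned, worker_index); the least-loaded worker is on top.
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return shards


# Hypothetical timings (seconds) recorded from an earlier run.
timings = {
    "test_checkout": 40.0, "test_login": 30.0, "test_search": 20.0,
    "test_cart": 10.0, "test_profile": 10.0, "test_api": 10.0,
}
shards = shard_tests(timings, workers=2)
loads = [sum(timings[t] for t in shard) for shard in shards]
print(loads)  # [60.0, 60.0]
```

The wall-clock time of the parallel run is the largest entry in `loads`, not the average, which is why balancing matters more than the raw worker count.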

Failing Fast

Not all tests are equally important. Organize tests by speed and order them to fail fast:

Linting and type checking: Run first (usually seconds). Catch obvious errors.
Unit tests: Fast (seconds total). Most tests are here.
Integration tests: Moderate (tens of seconds).
E2E tests: Slow (minutes). Run last or skip for speed.
Performance tests: Very slow (minutes/hours). Run on schedule, not per-commit.

If a lint check fails, there is no point running the tests. If unit tests fail, integration tests will likely fail too. Order stages so developers get feedback fast.

Tip

Fail-fast strategy: If any fast stage fails, stop. Don't waste time running slower tests. A unit test failure probably causes integration test failures. Let developers fix the root cause first.
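As a rough sketch of the fail-fast ordering, the snippet below sorts stages by estimated cost and stops at the first failure. The `Stage` type, `run_fail_fast`, and the stand-in lambdas are all hypothetical; a real pipeline would call the linter, test runner, and so on, and most CI systems express this ordering in their own configuration rather than in code.

```python
from typing import Callable, NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    estimated_seconds: float   # rough cost, used only for ordering
    run: Callable[[], bool]    # returns True when the stage passes


def run_fail_fast(stages: list[Stage]) -> tuple[list[str], Optional[str], list[str]]:
    """Run stages cheapest-first and stop at the first failure.

    Returns (passed stage names, failed stage name or None, skipped stage names).
    """
    ordered = sorted(stages, key=lambda s: s.estimated_seconds)
    passed: list[str] = []
    for i, stage in enumerate(ordered):
        if not stage.run():
            skipped = [s.name for s in ordered[i + 1:]]
            return passed, stage.name, skipped
        passed.append(stage.name)
    return passed, None, []


# Stand-in stages for illustration only.
stages = [
    Stage("e2e", 600, lambda: True),
    Stage("unit", 30, lambda: False),      # simulate a failing unit test
    Stage("lint", 5, lambda: True),
    Stage("integration", 90, lambda: True),
]
passed, failed, skipped = run_fail_fast(stages)
print(passed, failed, skipped)  # ['lint'] unit ['integration', 'e2e']
```

Here the failing unit stage saves the time of the integration and E2E stages, which is the point of ordering by cost.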

Test Caching

CI can be slow if you rebuild and retest unchanged code. Caching speeds things up:

Dependency caching: Cache node_modules or equivalent, keyed on the lockfile (for example package-lock.json). Reuse the cached install when the lockfile hasn't changed.
Build artifacts: Cache the build output. Skip rebuild if source hasn't changed.
Test caching: Skip tests for files that haven't changed. If only the README changed, skip tests.
Docker layer caching: Docker caches layers. If base layer hasn't changed, it's reused.

Caching must be done carefully: a stale or wrongly keyed cache can make CI pass against outdated dependencies, or fail in CI while the same tests pass locally. Most modern CI systems handle caching well; configure it, but be cautious about edge cases.
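One common way to keep a dependency cache correct is to derive the cache key from the contents of the lockfile, so any dependency change produces a new key. The sketch below assumes the project commits a lockfile such as package-lock.json; the function name `dependency_cache_key` and the `deps` prefix are illustrative, and real CI systems usually provide their own built-in hashing for this.

```python
import hashlib
import tempfile
from pathlib import Path


def dependency_cache_key(lockfile: Path, prefix: str = "deps") -> str:
    """Build a cache key that changes whenever the lockfile's bytes change."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{prefix}-{digest[:16]}"


# Demonstrate with a temporary lockfile (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    lockfile = Path(tmp) / "package-lock.json"
    lockfile.write_text('{"lockfileVersion": 3}')
    key_before = dependency_cache_key(lockfile)
    lockfile.write_text('{"lockfileVersion": 3, "packages": {}}')
    key_after = dependency_cache_key(lockfile)

print(key_before != key_after)  # True: a changed lockfile gets a fresh cache
```

If the cached files also depend on the operating system or runtime version, those can be folded into the prefix so that different environments never share a cache entry.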

Branch Protection Rules

CI is only useful if you enforce its results. Branch protection rules on GitHub (or equivalent) prevent merging without passing CI:

Require status checks to pass: CI pipeline must pass before merging.
Require code review: At least one approval before merge (in addition to tests passing).
Dismiss stale PR approvals: If new commits are pushed after an approval, the approval is dismissed and a new one is required.
Require branches to be up to date: The PR branch must include the latest main (by merge or rebase) before merging. This ensures CI ran against the code as it will exist after the merge.

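With branch protection rules enforced, code that fails CI cannot be merged through the normal flow (administrators may still be able to bypass the rules, depending on settings). Developers must fix it first. This discipline keeps main clean.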

Flaky Test Detection

A flaky test passes sometimes, fails other times, without code changes. Flaky tests are poison: developers stop trusting the test suite. Detect and quarantine them:

Monitor test failures: If a test fails, then immediately passes on retry, it's probably flaky.
Quarantine flaky tests: Mark them as quarantined so they still run but don't block merges.
Investigate: Why is the test flaky? Timing issue? Race condition? Inconsistent test data?
Fix and re-enable: Once fixed, re-enable the test.

Flakiness usually comes from E2E tests (timing, network), but can happen in unit tests (randomness, mock issues). A flaky test is worse than no test.
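The "fails, then passes on retry" signal can be checked mechanically. The sketch below assumes you can export test results as (test name, commit, passed) records; that record shape and the function name `find_flaky_tests` are hypothetical. It flags a test as flaky when it has both passed and failed on the same commit, since the code did not change between those runs.

```python
from collections import defaultdict


def find_flaky_tests(runs: list[tuple[str, str, bool]]) -> set[str]:
    """Return tests that both passed and failed on the same commit.

    Each run is (test_name, commit_sha, passed).
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    return {test for (test, _sha), seen in outcomes.items() if seen == {True, False}}


# Made-up history: test_checkout fails and then passes on the same commit.
history = [
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", True),
    ("test_login", "abc123", True),
    ("test_search", "abc123", False),
    ("test_search", "def456", True),   # different commit, so not counted as flaky
]
print(sorted(find_flaky_tests(history)))  # ['test_checkout']
```

Comparing results only within one commit avoids mislabeling a test that was genuinely broken and then fixed.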

Test Reporting

CI should provide clear test reports:

Summary: X tests passed, Y failed, Z skipped.
Failed test details: Which tests failed and why? Show the assertion error.
Timing: How long did tests take? Are they getting slower?
Coverage: Code coverage percentage. Trend over time.
Artifacts: Logs, screenshots, videos from failed E2E tests.
Annotations: GitHub/GitLab show test results in the PR interface directly.

Good reports make debugging easier. A developer can see which test failed and why without diving into CI logs.
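Many test runners can emit JUnit-style XML, which makes the summary and failure details above easy to extract. The following is a small sketch using only the standard library; the sample report and the function name `summarize_junit` are made up for illustration, and real reports may carry more fields than shown here.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count passed, failed, and skipped test cases in a JUnit-style report."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "skipped": 0, "failures": []}
    for case in root.iter("testcase"):
        # A <failure> or <error> child marks the test case as failed.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            summary["failed"] += 1
            name = f'{case.get("classname", "")}.{case.get("name", "")}'
            summary["failures"].append((name, problem.get("message", "")))
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


sample_report = """
<testsuite name="example" tests="3">
  <testcase classname="cart" name="test_add"/>
  <testcase classname="cart" name="test_total">
    <failure message="expected 30, got 25"/>
  </testcase>
  <testcase classname="cart" name="test_coupon"><skipped/></testcase>
</testsuite>
""".strip()

summary = summarize_junit(sample_report)
print(summary["passed"], summary["failed"], summary["skipped"])  # 1 1 1
print(summary["failures"])  # [('cart.test_total', 'expected 30, got 25')]
```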

Different Test Suites on Different Triggers

Not every test needs to run on every trigger. Smart pipelines run different tests for different situations:

On every commit: Linting, type checking, unit tests (fast). Takes < 5 minutes.
On pull request: All of above plus integration tests. Takes < 15 minutes.
Before merging to main: All tests plus E2E tests. Takes < 30 minutes.
On schedule (nightly): Full test suite plus performance tests and security scans. Takes 1+ hours.
Before production deployment: Smoke tests against staging. Takes < 5 minutes.

This approach balances speed (developers get feedback fast) with thoroughness (critical tests run before production).
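The mapping from trigger to test suites can be written down as plain data. The trigger and suite names below are hypothetical labels for the situations listed above; in practice this mapping usually lives in the CI system's configuration file rather than in application code.

```python
# Hypothetical trigger names mapped to the suites described above.
SUITES_BY_TRIGGER: dict[str, list[str]] = {
    "commit": ["lint", "typecheck", "unit"],
    "pull_request": ["lint", "typecheck", "unit", "integration"],
    "pre_merge": ["lint", "typecheck", "unit", "integration", "e2e"],
    "nightly": ["lint", "typecheck", "unit", "integration", "e2e",
                "performance", "security"],
    "pre_deploy": ["smoke"],
}


def suites_for(trigger: str) -> list[str]:
    """Return the suites to run for a trigger, rejecting unknown triggers."""
    try:
        return SUITES_BY_TRIGGER[trigger]
    except KeyError:
        raise ValueError(f"unknown trigger: {trigger!r}") from None


print(suites_for("commit"))      # ['lint', 'typecheck', 'unit']
print(suites_for("pre_deploy"))  # ['smoke']
```

Raising an error for an unknown trigger, instead of silently running nothing, keeps a typo in the configuration from looking like a passing build.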

Secrets in CI

CI jobs often need secrets (database passwords, API keys). Never hardcode secrets in CI configuration:

Use secret management: GitHub Secrets, GitLab Variables, CircleCI Contexts store secrets encrypted.
Reference secrets in configuration: "$DATABASE_PASSWORD" is replaced with the actual password at runtime.
Don't log secrets: Make sure secrets aren't printed in logs. CI systems mask known secret values, but masking can miss encoded or transformed values, so avoid echoing secrets at all.
Rotate secrets: If a CI secret is exposed, rotate it immediately.

Secrets in CI are powerful for testing against real services, but require care to keep them safe.
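On the test-code side, the usual pattern is to read the secret from an environment variable that the CI system injects, fail loudly if it is missing, and keep it out of log output. The helpers `require_secret` and `redact` below are hypothetical names for this sketch, and the value assigned to the environment variable is a placeholder so the example runs on its own.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment, failing clearly if it is absent."""
    value = os.environ.get(name)
    if not value:
        # Report the variable name only, never a value.
        raise RuntimeError(f"missing required secret: {name}")
    return value


def redact(text: str, secrets: list[str]) -> str:
    """Replace any known secret value in text before it is logged."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "***")
    return text


# In CI the secret store sets this variable; it is set here only for the demo.
os.environ["DATABASE_PASSWORD"] = "example-only-value"

password = require_secret("DATABASE_PASSWORD")
line = redact(f"connecting with password={password}", [password])
print(line)  # connecting with password=***
```

Redaction like this is a safety net for exact matches only; it does not catch a secret that has been encoded or split across lines.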

Docker-Based Test Environments

CI environments should be consistent. Using Docker ensures tests run the same everywhere:

Container as test environment: Everything the app needs (base OS, runtime, dependencies) is in the container.
Services in containers: Database, cache, message queue all run in containers. Spin up fresh for each test.
Consistency: Developer's laptop, CI system, and production all use the same container image.

This greatly reduces the "works on my machine, fails in CI" problem, because the same image defines the environment in every place the tests run.
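To make "spin up fresh for each test" concrete, the sketch below builds a `docker run` command for a throwaway database container. It assumes the official `postgres` image (which reads `POSTGRES_PASSWORD`); the image tag, host port, function name, and container name prefix are illustrative choices. The code only builds and prints the command, so it runs without Docker installed.

```python
import shlex
import uuid


def postgres_run_command(password: str, host_port: int,
                         image: str = "postgres:16") -> list[str]:
    """Build a `docker run` command for a throwaway Postgres container."""
    # A unique name per job avoids clashes when jobs share a machine.
    name = f"ci-postgres-{uuid.uuid4().hex[:8]}"
    return [
        "docker", "run",
        "--rm",                                   # remove the container when it stops
        "--detach",                               # run in the background
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", f"{host_port}:5432",         # host port -> container port
        image,
    ]


cmd = postgres_run_command(password="example-only", host_port=55432)
print(shlex.join(cmd))
# To actually start the container: subprocess.run(cmd, check=True)
```

Because the container is removed when it stops, each job starts from an empty database and cannot be affected by data left behind by an earlier run.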

Keeping CI Fast as Test Suite Grows

As your codebase grows, so does your test suite. CI can become slow (15+ minutes per commit). Strategies to stay fast:

Parallelize: Use multiple workers. Split tests across machines.
Cache aggressively: Cache dependencies, build artifacts, test databases.
Remove slow tests: If a test is slow and doesn't add value, remove it.
Optimize slow tests: Profile tests. Why is this one slow? Can you make it faster?
Run different tests on different schedules: Quick tests on every commit, slow tests nightly.
Filter tests by impact: Only run tests affected by the change. (Tools like Nx do this.)
Hardware: Use better hardware in CI. Faster machines = faster tests.

A slow CI pipeline hurts productivity. Invest in speed. Developers should get feedback within 10 minutes.
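Filtering tests by impact can be illustrated with a simple dependency map. This is a deliberately simplified sketch: the `TEST_DEPS` map and file paths are hypothetical and hand-written here, whereas tools that do this for real derive the dependency information from the project itself.

```python
def select_tests(changed_files: set[str], test_deps: dict[str, set[str]]) -> list[str]:
    """Pick tests whose own file or any dependency is among the changed files.

    test_deps maps each test file to the source files it depends on.
    """
    selected = [
        test for test, deps in test_deps.items()
        if test in changed_files or deps & changed_files
    ]
    return sorted(selected)


# Hypothetical dependency map for a small project.
TEST_DEPS = {
    "tests/test_cart.py": {"src/cart.py", "src/pricing.py"},
    "tests/test_login.py": {"src/auth.py"},
    "tests/test_search.py": {"src/search.py"},
}

print(select_tests({"src/pricing.py"}, TEST_DEPS))  # ['tests/test_cart.py']
print(select_tests({"README.md"}, TEST_DEPS))       # []
```

A map like this is only as good as its coverage: a change to a file that no test lists (shared configuration, for example) would select nothing, so a cautious setup falls back to the full suite for unrecognized files.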

Developer Insight

CI is a living system: Monitor pipeline time. If it's growing, investigate. Slow CI tempts developers to skip tests and merge without waiting for results. Keep it fast to maintain team discipline.

Key Takeaways

CI (Continuous Integration) runs tests automatically on every commit. This catches bugs immediately and prevents broken code from reaching main. Set up a CI pipeline with stages: linting, type checking, unit tests, integration tests, build, security scanning. Parallelize tests for speed. Use branch protection rules to require passing CI before merging. Detect and quarantine flaky tests. Run different test suites on different triggers (quick tests per-commit, slow tests nightly). Monitor CI performance and optimize as the test suite grows. CI is the safety net that keeps your codebase healthy.