Clinical Data Management: The Lifecycle, Step by Step

Dejan Murko

At a glance

  • Clinical data management (CDM) is the discipline that turns messy trial data into a clean, locked, analyzable, defensible dataset. Its job is reliable data you can stand behind at inspection.
  • CDM is a lifecycle: start-up (design the forms and database), conduct (enter, validate, query, code), and close-out (review, reconcile, lock).
  • Each phase produces a named deliverable and is backed by a data-integrity control. The control (audit trails, ALCOA attributes, validated systems) is what makes the deliverable defensible.
  • CDM is not the EDC system and not the data management plan. CDM is the discipline; EDC is a tool; the DMP is the document that governs the work.
  • The through-line is data integrity. GCP and Part 11 require records that are attributable, traceable, and protected, which is why each lifecycle step exists.

Most “what is clinical data management” pages fall into one of two traps: they bury beginners in CDISC acronyms, or they pivot quickly into a software pitch. Neither connects each step of the work to the regulatory control that makes it defensible, and most never separate the CDM process from the system you run it on and the plan that governs it. This guide is the operator’s lifecycle view. It walks CDM phase by phase, names the deliverable and the data-integrity control at each step, distinguishes CDM from EDC and the DMP, and shows where inspection risk actually concentrates. The data management plan document and EDC selection have their own dedicated guides; this is the discipline.

What clinical data management is (and what it is not)

Clinical data management is the set of processes that ensure the data coming out of a trial are reliable enough to support the conclusions drawn from them. GCP frames the obligation directly: the sponsor should ensure the integrity and confidentiality of data generated and managed, and should apply quality control to the relevant stages of data handling, focusing quality assurance and quality control activities on data of higher criticality (ICH E6(R3) §3.16.1). That sentence is CDM in miniature: protect integrity, control quality, and concentrate effort where the data matters most.

CDM vs the EDC system vs the data management plan

Three things get conflated; keep them distinct:

  • CDM (the discipline) is what you do: design forms, build the database, clean data, manage queries, code terms, lock the database.
  • The EDC system (the tool) is what you capture and manage data in.
  • The data management plan (the document) is what governs the work. GCP expects the sponsor to pre-specify the data to be collected and the method of collection, with additional detail, including a data flow diagram, in a protocol-related document such as a data management plan (ICH E6(R3) §3.16.1).

Confusing these is the root of most CDM misunderstandings. The rest of this guide is about the discipline.

The CDM lifecycle, phase by phase

CDM runs in three phases. For each phase, this section names the deliverable and the control that makes it defensible.

Start-up: protocol review, CRF design, database build, edit-check specification

Before any data arrives, CDM designs how it will be captured. The deliverables are the case report form (the CRF/eCRF), the database, and the edit-check specification. The governing control is that systems must be built and validated for purpose. GCP requires that the approach to validation of computerised systems be based on a risk assessment considering the system’s intended use and the importance of the data, and that both standard functionality and protocol-specific configurations, including automated data-entry checks and calculations, be validated (ICH E6(R3) §4.3.4). A poorly designed CRF or an unvalidated edit check is where downstream query backlogs are born.
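As an illustration of what an edit-check specification turns into once it is built, here is a minimal Python sketch. It is not how any particular EDC product implements checks; the field names (`systolic_bp`, `consent_date`, `visit_date`), the range limits, and the `Query` structure are all hypothetical, chosen only to show a range check and a cross-field date check that each raise a query when they fail.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class Query:
    """A discrepancy raised against one field of one subject's record."""
    subject_id: str
    field: str
    message: str


# An edit check inspects one eCRF record and returns a message on failure, else None.
EditCheck = Callable[[dict], Optional[str]]


def range_check(field: str, low: float, high: float) -> tuple[str, EditCheck]:
    """Build a check that flags a missing or out-of-range numeric value."""
    def check(record: dict) -> Optional[str]:
        value = record.get(field)
        if value is None:
            return f"{field} is missing"
        if not low <= value <= high:
            return f"{field}={value} outside expected range {low}-{high}"
        return None
    return field, check


def date_order_check(earlier: str, later: str) -> tuple[str, EditCheck]:
    """Build a cross-field check: `earlier` must not fall after `later`."""
    def check(record: dict) -> Optional[str]:
        a, b = record.get(earlier), record.get(later)
        if a is not None and b is not None and a > b:
            return f"{earlier} ({a}) is after {later} ({b})"
        return None
    return later, check


# Hypothetical specification: the limits below are illustrative, not clinical guidance.
CHECKS = [
    range_check("systolic_bp", 60, 250),
    date_order_check("consent_date", "visit_date"),
]


def run_edit_checks(record: dict) -> list[Query]:
    """Run every check in the specification and collect the resulting queries."""
    queries = []
    for field, check in CHECKS:
        message = check(record)
        if message:
            queries.append(Query(record["subject_id"], field, message))
    return queries
```

Keeping the specification as data (the `CHECKS` list) separate from the runner makes each check individually testable, which is the part that validation of automated data-entry checks would exercise.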

Conduct: data entry, validation, query management, medical coding

As the trial runs, data is entered and validated against edit checks, and each discrepancy becomes a query to be resolved. Medical coding maps verbatim terms (adverse events, medications) to standard dictionaries such as MedDRA and WHODrug; these are cited here as industry standards, not as regulatory requirements. The deliverables here are clean data and a query log; the controls are the audit trail and traceability. GCP requires that computerised systems document the initial data entry and any subsequent changes or deletions, that audit trails be kept enabled and interpretable, and that the date and time of entries be captured unambiguously (ICH E6(R3) §4.2.2). Part 11 reinforces this for any electronic record: secure, computer-generated, time-stamped audit trails that independently record the date and time of operator entries and actions that create, modify, or delete records, without obscuring previously recorded information (21 CFR 11.10(e)). Every correction must be traceable; that traceability is the whole point.
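The audit-trail expectations above can be sketched as an append-only log that sits beside the current values. This is a toy in-memory model under stated assumptions, not a Part 11 compliant implementation: the class and field names are hypothetical, nothing here is secured or persisted, and requiring a reason for every modification is a design choice of this sketch rather than something the cited text mandates in all cases.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the trail: who did what, when, and why."""
    timestamp: datetime      # timezone-aware UTC, so date and time are unambiguous
    user: str
    action: str              # "create", "modify", or "delete"
    field: str
    old_value: Any
    new_value: Any
    reason: Optional[str]


class AuditedRecord:
    """Hypothetical record whose every change is appended to a trail."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._trail: list[AuditEntry] = []

    def _log(self, user, action, field, old, new, reason) -> None:
        # Entries are only ever appended; earlier entries are never edited.
        self._trail.append(
            AuditEntry(datetime.now(timezone.utc), user, action, field, old, new, reason)
        )

    def set(self, field: str, value: Any, user: str, reason: Optional[str] = None) -> None:
        if field in self._values:
            # Assumption of this sketch: every change to existing data needs a reason.
            if not reason:
                raise ValueError("a change to an existing value needs a reason")
            self._log(user, "modify", field, self._values[field], value, reason)
        else:
            self._log(user, "create", field, None, value, reason)
        self._values[field] = value

    def delete(self, field: str, user: str, reason: str) -> None:
        old = self._values.pop(field)
        self._log(user, "delete", field, old, None, reason)

    def get(self, field: str) -> Any:
        return self._values.get(field)

    @property
    def trail(self) -> tuple[AuditEntry, ...]:
        # Returned as a tuple so callers cannot append to or reorder the trail.
        return tuple(self._trail)
```

Because the old value is stored in the entry rather than overwritten, a correction never obscures what was originally recorded, and the sequence of entries reconstructs the full history of a field.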

Close-out: data review, reconciliation, soft lock, database lock and archive

At the end, CDM reviews the data, reconciles external datasets (labs, serious adverse events), applies a soft lock, then a hard database lock, and archives. The deliverable is the locked, analyzable dataset; the control is finalization discipline. GCP requires procedures covering the full data life cycle, including finalisation of data sets prior to analysis (ICH E6(R3) §4.2), and is strict about late changes: deviations from the planned statistical analysis and changes to data after unblinding should be clearly documented and justified and should occur only in exceptional circumstances, and data changes should be authorised by the investigator and reflected in the audit trail (ICH E6(R3) §3.16.2). A blown database-lock date is not just a schedule slip; it is where rushed, poorly documented changes creep in.
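Reconciliation of an external dataset is, at its core, a keyed comparison. The sketch below compares serious adverse events recorded in the clinical database against a separate safety database. The matching key (`subject_id`, `term`, `onset_date`), the `outcome` field, and the function name are assumptions made for illustration; a real study would define its own key and comparison fields in its reconciliation plan.

```python
from datetime import date

# Hypothetical matching key; not taken from any standard or product.
KEY_FIELDS = ("subject_id", "term", "onset_date")


def _index(rows):
    """Index rows by the matching key so the two sources can be compared."""
    return {tuple(row[f] for f in KEY_FIELDS): row for row in rows}


def reconcile_sae(clinical_rows, safety_rows):
    """Return the events found in only one source, and those that disagree."""
    clinical, safety = _index(clinical_rows), _index(safety_rows)
    shared = clinical.keys() & safety.keys()
    return {
        "missing_from_safety": sorted(clinical.keys() - safety.keys()),
        "missing_from_clinical": sorted(safety.keys() - clinical.keys()),
        # Same event in both sources, but at least one other field differs.
        "mismatched": sorted(k for k in shared if clinical[k] != safety[k]),
    }


clinical_db = [
    {"subject_id": "S-001", "term": "Pneumonia", "onset_date": date(2024, 3, 1), "outcome": "recovered"},
    {"subject_id": "S-002", "term": "Syncope", "onset_date": date(2024, 3, 9), "outcome": "recovering"},
]
safety_db = [
    {"subject_id": "S-001", "term": "Pneumonia", "onset_date": date(2024, 3, 1), "outcome": "recovered"},
    {"subject_id": "S-002", "term": "Syncope", "onset_date": date(2024, 3, 9), "outcome": "recovered"},
    {"subject_id": "S-003", "term": "Anaphylaxis", "onset_date": date(2024, 4, 2), "outcome": "recovered"},
]

report = reconcile_sae(clinical_db, safety_db)
```

Each non-empty list in `report` would typically become a query or a documented explanation before lock; the sketch only finds the discrepancies and does not decide which source is right.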

Roles on a CDM team (and how small teams compress them)

A full CDM team has data managers, database programmers, medical coders, and a data management lead. On a small team, one or two people wear all those hats. That compression is fine as long as the controls hold: GCP’s requirement that those who develop, maintain, or use computerised systems be qualified for their tasks (echoed by Part 11’s training requirement at 21 CFR 11.10(i)) does not scale down. The risk for lean teams is not having too few people; it is letting the controls lapse because the same person designs, enters, and reviews without an audit trail to keep them honest.

The clinical trial database / CDMS: what it is and what it must do

The clinical trial database is where study data lives; the CDMS is the broader system that manages it. Whatever the product, it must support the data-integrity controls. GCP requires that computerised systems be validated and kept in a validated state, with periodic review as appropriate (ICH E6(R3) §4.3.4), and Part 11 requires that closed systems limit access to authorized individuals, protect records for ready retrieval throughout retention, and generate accurate and complete copies for inspection (21 CFR 11.10). A clinical trial database is, in Part 11 terms, a store of electronic records (21 CFR 11.3(b)(6)), so those controls are not optional features; they are what makes the database defensible.

Data integrity as the through-line: ALCOA+ and why each control exists

The thread tying every phase together is data integrity, commonly summarized as ALCOA+. The core attributes are grounded in GCP: source records should be attributable, legible, contemporaneous, original, accurate, and complete, and changes should be traceable, should not obscure the original entry, and should be explained where necessary via an audit trail (ICH E6(R3) §2.12.2). The “+” extensions (consistent, enduring, available) map onto the same regulatory expectations: records must be protected for ready retrieval throughout the retention period and available for inspection (21 CFR 11.10(c)), and essential records must be retained and available to regulators and auditors (ICH E6(R3) §9.5). Read this way, ALCOA+ is not a slogan; each letter corresponds to a control the regulations actually require, which is why each lifecycle step exists.

Database lock, explained (soft lock vs hard lock)

Database lock is the moment the dataset is frozen for analysis, and it is the deliverable the whole lifecycle builds toward. In practice it happens in two steps. A soft lock is a provisional freeze: data entry and most cleaning are complete, queries are resolved, and the team does a final review, but the database can still be reopened if a genuine issue surfaces. A hard lock is the final freeze: access is restricted, no further routine changes are made, and analysis proceeds against a fixed dataset.

The discipline around lock is what GCP cares about. The data life cycle must include the finalisation of data sets prior to analysis (ICH E6(R3) §4.2), and any change to the data after the trial is unblinded must be the exception, not the rule: such changes should be clearly documented and justified, occur only in exceptional circumstances, be authorised by the investigator, and be reflected in the audit trail (ICH E6(R3) §3.16.2). That is why a rushed lock is dangerous. When teams push to hit a lock date with queries still open, the changes that follow are exactly the late, poorly documented edits that draw inspection findings. The soft-then-hard sequence exists to make the freeze deliberate, with a last chance to catch problems before the dataset is fixed for analysis.
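The soft-then-hard sequence can be modelled as a small state machine. The sketch below is a simplified illustration, not the locking logic of any real CDMS: the class, method names, and the rule that a soft lock is refused while any query is open are assumptions drawn from the description above, and the controlled procedure for exceptional post-lock changes is deliberately left out of scope.

```python
from enum import Enum


class LockState(Enum):
    OPEN = "open"
    SOFT_LOCKED = "soft_locked"
    HARD_LOCKED = "hard_locked"


class LockError(Exception):
    """Raised when an operation is not allowed in the current lock state."""


class StudyDatabase:
    """Hypothetical study database that enforces the lock sequence."""

    def __init__(self) -> None:
        self.state = LockState.OPEN
        self.open_queries: set[str] = set()
        self.data: dict[str, object] = {}
        # Each transition is recorded as (new state, user, reason).
        self.history: list[tuple[str, str, str]] = []

    def _move(self, new_state: LockState, user: str, reason: str) -> None:
        self.state = new_state
        self.history.append((new_state.value, user, reason))

    def write(self, key: str, value: object) -> None:
        if self.state is not LockState.OPEN:
            raise LockError(f"database is {self.state.value}; writes are blocked")
        self.data[key] = value

    def soft_lock(self, user: str) -> None:
        if self.state is not LockState.OPEN:
            raise LockError("soft lock is only possible from the open state")
        if self.open_queries:
            raise LockError(f"{len(self.open_queries)} queries still open")
        self._move(LockState.SOFT_LOCKED, user, "final review")

    def reopen(self, user: str, reason: str) -> None:
        # Only a soft lock can be reopened, and only with a stated reason.
        if self.state is not LockState.SOFT_LOCKED:
            raise LockError("only a soft-locked database can be reopened")
        if not reason.strip():
            raise LockError("reopening requires a documented reason")
        self._move(LockState.OPEN, user, reason)

    def hard_lock(self, user: str) -> None:
        if self.state is not LockState.SOFT_LOCKED:
            raise LockError("hard lock must follow a soft lock")
        self._move(LockState.HARD_LOCKED, user, "approved for analysis")
```

The point of the design is that a hard lock cannot be reached directly from the open state: the only path runs through a soft lock, which in turn cannot be reached with queries outstanding.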

Where small-team CDM goes wrong (common pitfalls)

  • Treating the eCRF as the only control. Without validated edit checks and a working query process, errors accumulate (ICH E6(R3) §4.3.4).
  • Letting the audit trail lapse. Disabling or ignoring audit trails breaks traceability, which GCP and Part 11 both require (ICH E6(R3) §4.2.2; 21 CFR 11.10(e)).
  • Rushing the lock. Late, undocumented changes around database lock are an inspection magnet; GCP requires post-unblinding changes to be justified and audit-trailed (ICH E6(R3) §3.16.2).
  • No data management plan. Without the DMP governing collection and data flow, the process is improvised (ICH E6(R3) §3.16.1).
  • One person, no separation. Compressing roles is fine; removing the audit trail that keeps them accountable is not.

A note on scope and tooling: TrialTrack is clinical project management software, not a CDMS. It does not capture or manage clinical study data. If you need a clinical trial database, EDC, or coding tools, those are a different category; a project tool tracks the operational work around the trial, not the data itself.

The bottom line

Clinical data management is the lifecycle that produces a clean, locked, defensible dataset, and every phase pairs a deliverable with a data-integrity control. Design and validate the system in start-up, capture and clean with audit-trailed traceability during conduct, and finalize with disciplined database lock at close-out. Keep CDM, the EDC tool, and the DMP distinct, and treat ALCOA+ not as a poster but as the set of controls GCP and Part 11 actually require. Get the controls right and the data will survive inspection; that, in the end, is the job.

Dejan Murko

Dejan is the co-founder of Mayet, building software for biotech and pharma teams.