Applications for Apple platforms typically use the CoreData framework to maintain their persistent data. This article explores the consequences for migration when CoreData object models are generated from Swift source code as described in the previous article. That article proposed that object models be generated by a Schema type parametrized by sets of managed object subclasses, that each such subclass determines an entity of the derived model, and that the managed properties of each entity are specified by applying custom macros.

We focus on the use of lightweight migration and external processes as suggested by the WWDC 2022 session "Evolve your Core Data schema". We require that persistent stores be opened with respect to both the current schema and the list of all prior schema versions. Prior versions for which lightweight migration to the next version is not possible must be accompanied by a procedure to make store content compatible with the next version; we’ll refer to such procedures as migration scripts and use the term custom migration to denote the pairing of a schema and a migration script. A migration script generally requires an intermediate object model to bridge between previous and next versions, and these are derived where necessary from the differences between successive schemas. Custom migrations are implemented as an implicit sequence of three steps: lightweight migration from source to intermediate model, script execution, and lightweight migration from intermediate to target model. The result is an incremental migration process expressed entirely in Swift.

We begin with an overview of CoreData’s migration process as it relates to the current development. The second section presents the custom data types used to specify migrations between schema versions. The third and fourth sections describe the calculation of schema differences and how these differences determine intermediate object models. The fifth section describes the management of the incremental migration process, and the final section presents some considerations for Codable attribute types.

CoreData migration

The layout of a CoreData persistent store is defined by the managed object model with which it was created or last updated. Each store maintains the identity of its defining object model as a dictionary mapping entity names to their versionHash values. Each entity’s versionHash depends on several of its own parameter values as well as on the versionHash values of its component property descriptors. CoreData will open an existing store for a given model provided that model has the same identity as the one defining the store content.

As an application’s object model evolves, a process of migration is required to make existing stores compatible with the model version used by the application. CoreData can implicitly infer and apply so-called lightweight migrations for many model changes, but in general an application must maintain a history of model versions and explicitly migrate stores from any previous version to that used by the application. If you’re working with non-trivial object models evolving over time, you’re likely to encounter the need for explicit migration.

CoreData’s migration process is performed by an instance of the NSMigrationManager class and generally maps the content of a source store into a target store. If source and target stores are the same, the migration process is called in-place or lightweight; otherwise it is called heavyweight. Heavyweight migration is expensive because both source and target stores must generally exist for the duration of the migration process; consequently Apple advocates using lightweight migration whenever possible.

A CoreData migration is defined by an instance of the NSMappingModel class. CoreData can infer mapping models for many model changes including addition, removal or renaming of entities and properties. However, lightweight migration with an inferred mapping is possible only if the store content is already consistent with the new model (modulo property and entity renaming and removal). For example, CoreData will infer a mapping model after adding a non-optional attribute to an entity, but lightweight migration will fail for stores which contain instances of the affected entity. To avoid this failure one must either create a custom mapping model (which implies a heavyweight migration), or create an intermediate model in which the added attribute is optional, perform lightweight migration to the intermediate model, execute a procedure to assign non-nil values to all instances of the new attribute, and finally perform lightweight migration to the target model.

CoreData streamlines the migration process when all model changes are compatible with lightweight migration - i.e. when there exists an inferrable lightweight migration from any previous model version to the application’s model version. Otherwise an application must explicitly manage the migration process, which means:

  • maintaining all model versions in xcdatamodeld format (i.e. a directory of archived object models);
  • determining which model version corresponds to the existing store;
  • defining all required mapping models;
  • applying the appropriate sequence of mapping models, which may involve use of temporary stores;
  • and ensuring the process is restartable.

We won’t get into the details of mapping models as our process requires only inferrable mapping models and lightweight migration.

Specifying migrations

This section introduces the elements used to specify model versions and migrations, and walks through a simple migration scenario.

In the proposed system, modifying an Entity subclass generally requires both old and new versions of the class to exist with the same entity name (as both old and new entity descriptions must exist within the respective object models). An entity description’s name is taken from the entityName property of the defining class, which returns the class name independent of defining scope; its managedObjectClassName is determined by applying NSStringFromClass to the defining class, which produces a mangling of class name and defining scope. This naturally allows old and new class versions to exist with the same names in different scopes. To enable multiple versions to exist in the same scope, we rework entityName to allow class names such as E_v1 to specify both entity name E and version number 1.

class var entityName : String
  { entityNameAndVersion.entityName }

class var entityNameAndVersion : (entityName: String, version: Int)
  { ... }

Note that an entity’s managedObjectClassName does not affect its versionHash, so we can rename an Entity subclass in subsequent schema versions provided its entityName remains intact.
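
As a rough illustration of this naming convention, the following sketch splits a class name such as E_v1 into an entity name and a version number. The function name splitEntityName, and the choice to report no version (nil) when the name carries no suffix, are assumptions made for this example; the body of entityNameAndVersion is not shown here and may well differ.

```swift
import Foundation

// Hypothetical helper (not part of the article's code): splits "E_v1" into ("E", 1).
// Names without a well-formed "_v<digits>" suffix are returned unchanged with a nil version.
func splitEntityName(_ className: String) -> (entityName: String, version: Int?) {
  // Search from the end so that only the final "_v" acts as the separator.
  guard let separator = className.range(of: "_v", options: .backwards) else {
    return (className, nil)
  }
  let prefix = className[..<separator.lowerBound]
  let suffix = className[separator.upperBound...]
  // Require a non-empty entity name and an all-digit version suffix.
  guard !prefix.isEmpty, !suffix.isEmpty,
        suffix.allSatisfy({ $0.isASCII && $0.isNumber }),
        let version = Int(suffix)
  else {
    return (className, nil)
  }
  return (String(prefix), version)
}
```

Under this sketch a name such as My_view is left intact because its suffix is not numeric, so only names that follow the convention are treated as versioned.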

We define a Migration structure to pair a source schema with an optional migration script,

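struct Migration {
  // A script is an arbitrary operation on a managed object context.
  typealias Script = (NSManagedObjectContext) throws -> Void
  // The schema version being migrated from (passed as `schema:` in the examples below).
  var source : Schema
  // The procedure to run when lightweight migration alone is insufficient.
  let script : Migration.Script?
  // Whether the script can safely be run more than once.
  let scriptIsIdempotent : Bool
}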

where a migration script is simply an operation on a managed object context. We extend the method for opening a persistent store with a list of Migration instances which specify the schema version history from newest to oldest,

func openWith(schema s: Schema, migrations ms: [Migration] = []) throws

which means the target schema of each migration is implicit in the order of the argument migrations.

We can now walk through application code for a simple migration scenario. Suppose our application defines an initial object model with a single entity E

class E : Entity {
}

and opens a DataStore instance on application launch.

class AppDelegate : NSObject, UIApplicationDelegate {
  let store = DataStore()
  func application(_: UIApplication, willFinishLaunchingWithOptions: [UIApplication.LaunchOptionsKey: Any]? = nil) -> Bool {
    do {
      try store.openWith(schema: try Schema(objectTypes: [E.self]))
    }
    catch let error {
      fatalError("\(error)")
    }
    return true
  }
}

Suppose we must now update the object model to add a non-optional attribute to entity E. As this requires a custom migration, we start by assigning a new name to a snapshot of the current entity definition,

class E_v1 : Entity {
}

modify the current definition,

class E : Entity {
  @Attribute var a : Int
}

and finally provide DataStore’s open method with a pairing of the original schema version and a migration script:

try store.openWith(schema: try Schema(objectTypes: [E.self]), migrations: [
  .init(schema: try Schema(objectTypes: [E_v1.self]), script: { context in
    for object in try context.fetch(NSFetchRequest<NSManagedObject>(entityName: "E")) {
      object.setValue(Int.random(in: 0 ... 9), forKey: "a")
    }
  }),
])

Calculating schema differences

This section describes the calculation of differences between two schema versions, which is necessary to decide whether or not the associated object models are compatible with lightweight migration.

As an object model is a mapping of entity names to their descriptors and an entity contains a mapping of its property names to their descriptors, we start with a Diffable protocol,

public protocol Diffable {
  associatedtype Difference
  func difference(from other: Self) throws -> Difference?
}

and a method for calculating dictionary difference modulo renaming.

extension Dictionary : Diffable where Value : Diffable {
  struct Difference {
    public var added : [Key] = []
    public var removed : [Key] = []
    public var modified : [Key: Value.Difference] = [:]
  }

  func difference(from old: Self) throws -> Difference?
    { try difference(from: old, moduloRenaming: {_ in nil}) }

  func difference(from old: Self, moduloRenaming rename: (Value) -> Key?) throws -> Difference?
    { ... }
}
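
To make the idea of a dictionary difference modulo renaming more concrete, here is a minimal, self-contained sketch. It is not the elided implementation above: DiffableSketch, KeyedDifference, keyedDifference and ColumnSketch are hypothetical stand-ins, and the matching rule (a renamed value is compared against the old entry it names, when that entry exists) is an assumption about how renaming might be resolved.

```swift
// Stand-in for the article's Diffable protocol.
protocol DiffableSketch {
  associatedtype Difference
  func difference(from old: Self) throws -> Difference?
}

// Free-standing analogue of Dictionary.Difference.
struct KeyedDifference<Key: Hashable, ValueDifference> {
  var added: [Key] = []
  var removed: [Key] = []
  var modified: [Key: ValueDifference] = [:]
  var isEmpty: Bool { added.isEmpty && removed.isEmpty && modified.isEmpty }
}

// Returns nil when the dictionaries are equivalent modulo renaming.
func keyedDifference<Key: Hashable & Comparable, Value: DiffableSketch>(
  of new: [Key: Value],
  from old: [Key: Value],
  moduloRenaming rename: (Value) -> Key?
) throws -> KeyedDifference<Key, Value.Difference>? {
  var result = KeyedDifference<Key, Value.Difference>()
  var matchedOldKeys = Set<Key>()
  // Keys are visited in sorted order so that the result is deterministic.
  for key in new.keys.sorted() {
    guard let value = new[key] else { continue }
    // A renamed value is compared against the old entry it names, if that entry exists.
    var oldKey = key
    if let renamed = rename(value), old[renamed] != nil { oldKey = renamed }
    guard let oldValue = old[oldKey] else {
      result.added.append(key)
      continue
    }
    matchedOldKeys.insert(oldKey)
    if let change = try value.difference(from: oldValue) {
      result.modified[key] = change
    }
  }
  // Old entries that no new entry matched have been removed.
  result.removed = old.keys.filter { !matchedOldKeys.contains($0) }.sorted()
  return result.isEmpty ? nil : result
}

// Minimal value type for demonstration: only the type name is compared.
struct ColumnSketch: DiffableSketch {
  var type: String
  var previousName: String? = nil
  func difference(from old: ColumnSketch) -> String? {
    type == old.type ? nil : "\(old.type) -> \(type)"
  }
}
```

With this sketch, a new entry c whose previousName is b is reported as a modification of b rather than as an addition of c and a removal of b, which is the behaviour the renaming closure is meant to provide.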

We extend each of our property descriptors with a renamingIdentifier, corresponding to that of NSPropertyDescription, and make them Diffable through associated Change types which enumerate the parameters contributing to their versionHash; conformance is required for attributes

extension Attribute : Diffable {
  enum Change : Hashable { case name, isOptional, type, ... }
  func difference(from old: Self) throws -> Set<Change>?
    { ... }
}

and relationships.

extension Relationship : Diffable {
  enum Change : Hashable { case name, relatedEntityName, inverseName, rangeOfCount, ... }
  func difference(from old: Self) throws -> Set<Change>?
    { ... }
}

Calculating Entity difference is more complicated in that we must maintain changes to both entity parameters and associated properties.

extension Entity : Diffable {
  enum DescriptorChange : Hashable { case name, isAbstract }
  struct Difference : Equatable {
    let descriptorChanges : Set<DescriptorChange>
    let attributesDifference : Dictionary<String, Attribute>.Difference
    let relationshipsDifference : Dictionary<String, Relationship>.Difference
  }
  public func difference(from old: Self) throws -> Difference?
    { ... }
}

Note that when an attribute changes from one Codable type to another, the generated object models have no inherent distinction because the storage type remains unchanged (i.e. as transformable). Consequently, Attribute must retain the declared value type (independent of optionality) in order to detect type changes.
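
The following sketch illustrates why the declared type has to be retained. AttributeSketch is a hypothetical, much reduced stand-in for Attribute, and the storageType string and the two point types are assumptions chosen for the example: both attributes share the same storage type, so only a comparison of declared types can reveal the change.

```swift
struct Point2d: Codable { var x, y: Float }
struct Point3d: Codable { var x, y, z: Float }

// Hypothetical, reduced stand-in for the article's Attribute descriptor.
struct AttributeSketch {
  enum Change: Hashable { case name, isOptional, type }
  var name: String
  var isOptional: Bool
  var storageType: String      // e.g. "transformable" for every Codable value type
  var declaredType: Any.Type   // declared value type, with optionality removed

  // Returns the set of changed parameters, or nil when nothing changed.
  func difference(from old: AttributeSketch) -> Set<Change>? {
    var changes = Set<Change>()
    if name != old.name { changes.insert(.name) }
    if isOptional != old.isOptional { changes.insert(.isOptional) }
    // Comparing storage types alone would miss a change between two Codable types.
    let sameDeclaredType = ObjectIdentifier(declaredType) == ObjectIdentifier(old.declaredType)
    if storageType != old.storageType || !sameDeclaredType { changes.insert(.type) }
    return changes.isEmpty ? nil : changes
  }
}
```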

Creating intermediate models

This section describes how the difference between two schema versions is used to construct a sequence of migration steps with an implicit intermediate schema when necessary.

Each migration step corresponds to either a lightweight migration to a given object model or execution of a custom script.

enum Step {
  case lightweight(NSManagedObjectModel)
  case script(Script)
}

The construction is implemented as a method on the target Schema,

func migrationSteps(
    to targetModel: NSManagedObjectModel, 
    from sourceModel: NSManagedObjectModel, 
    of sourceSchema: Schema, 
    using migrationScript: Migration.Script?
) throws -> [Migration.Step]

where the object models for the source and target schemas are provided to avoid recalculation. The construction begins by taking the intermediate schema to be a copy of the source schema. It then traverses the difference between the source and target schemas, noting when those differences require a migration script and modifying the intermediate schema where necessary. Entities and properties which exist only in the target schema are added to the intermediate schema, with addition of non-optional properties signaling the need for a migration script. The effect of changes to entity/property pairs common to both schemas is determined by the kind and relative configuration of those model components:

  • attribute optionality – the intermediate model inherits the optionality of the source model, and a script is required when the target is non-optional;
  • attribute type – the intermediate model must contain two distinctly-named versions of the affected attribute to allow the migration script to both access old values and assign new values, and the new attribute version must be marked optional; due to CoreData’s treatment of renaming, the intermediate model must rename both old and new attribute versions, which in turn requires implicit renaming in the target model to restore the original name;
  • relationship range – changing the arity of a relationship requires a migration script if the new arity does not contain the old arity; otherwise we must relax the arity specified by the intermediate model to ensure compatibility with both source and target models;
  • property transience – changing a property from transient to non-transient requires a migration script (to assign property values) if the property is non-optional in the target model.

Note that all changes to the intermediate schema imply the need for a migration script.

After defining the intermediate schema, the list of migration steps is determined according to whether or not a migration script was deemed necessary: if so, we generate an intermediate object model and return the 3-step sequence of lightweight migration to the intermediate model, script execution, and lightweight migration to the target model; otherwise a single lightweight migration to the target model is sufficient.
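
The construction can be sketched for the two simplest cases, attribute addition and a change of attribute optionality. This is a deliberately reduced model rather than the article's migrationSteps method: entities are flattened to a dictionary from attribute name to optionality, and StepSketch, PlanSketch and planMigration are hypothetical names introduced for illustration.

```swift
// Hypothetical stand-in for Migration.Step, with models identified by name only.
enum StepSketch: Equatable {
  case lightweight(model: String)
  case script
}

struct PlanSketch {
  var intermediate: [String: Bool]   // attribute name -> isOptional
  var steps: [StepSketch]
}

// Plans a migration between two attribute sets, each mapping names to optionality.
func planMigration(source: [String: Bool], target: [String: Bool]) -> PlanSketch {
  var intermediate = source   // the intermediate schema starts as a copy of the source
  var needsScript = false
  for (name, targetIsOptional) in target {
    if let sourceIsOptional = source[name] {
      // Common attribute: the intermediate keeps the source optionality, and
      // becoming non-optional requires a script to fill in missing values.
      if sourceIsOptional && !targetIsOptional { needsScript = true }
    } else {
      // Added attribute: optional in the intermediate model, and a
      // non-optional addition requires a script to assign values.
      intermediate[name] = true
      if !targetIsOptional { needsScript = true }
    }
  }
  let steps: [StepSketch] = needsScript
    ? [.lightweight(model: "intermediate"), .script, .lightweight(model: "target")]
    : [.lightweight(model: "target")]
  return PlanSketch(intermediate: intermediate, steps: steps)
}
```

Attributes that exist only in the source are simply left in the intermediate dictionary, since their removal is handled by the final lightweight migration to the target model.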

Note that renaming of entities and properties is handled by CoreData and that our difference calculation implicitly accounts for renaming; for example, in a combination of renaming and changing optionality, only the latter need be considered in constructing the intermediate model. Note also that changing the kind of a property, say from attribute to relationship, is considered as a combination of addition and removal. Finally, migration scripts associated with attribute type changes can obtain the names of the old and new attribute versions using the following extension to Schema:

static func renameOld(_ name: String) -> String
static func renameNew(_ name: String) -> String

Managing incremental migration

This section shows how incremental migration is performed using the schema version history provided on opening a persistent store. All methods presented below are members of the DataStore class.

The open method now requires both the current schema and a list pairing each previous schema with its migration script (if necessary), ordered from most to least recent. Recall that a Schema’s object model is generated via its createRuntimeInfo method. Migration is necessary if the persistent store exists and is incompatible with the current schema’s object model:

func openWith(schema s: Schema, migrations ms: [Migration] = []) throws {
  let info = try s.createRuntimeInfo()
  if FileManager.default.fileExists(atPath: storeURL.path) {
    let metadata = try getMetadata()
    if info.managedObjectModel.isConfiguration(withName: nil, compatibleWithStoreMetadata: metadata) == false {
      ...
    }
  }
  ...
}

When migration is necessary, we find the most recent previous schema whose object model is compatible with the store metadata by iteratively generating object models, and concatenate the steps required to migrate between each adjacent pair leading from the compatible version to the current version.

func migrationPath(
    from metadata: [String: Any], 
    to current: (schema: Schema, model: NSManagedObjectModel), 
    using migrations: [Migration]
) throws -> (sourceModel: NSManagedObjectModel, migrationSteps: [Migration.Step])
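
The search for a compatible version can be sketched independently of CoreData. In this hypothetical example, schema versions are reduced to plain Equatable values and compatibility with the store is reduced to equality; versionPath is not part of the article's code. The returned list runs from the stored version to the current one, and each adjacent pair corresponds to one migration whose steps would be concatenated.

```swift
// Hypothetical sketch: `history` is ordered from newest to oldest, as in openWith.
// Returns the versions to pass through, starting at the stored version and ending
// at the current one, or nil if the stored version is unknown.
func versionPath<Version: Equatable>(
  from stored: Version,
  to current: Version,
  history: [Version]
) -> [Version]? {
  // No migration is needed when the store already matches the current version.
  if stored == current { return [current] }
  // The first match is the most recent compatible version.
  guard let index = history.firstIndex(of: stored) else { return nil }
  // Walk back from the matching version towards the newest, then on to current.
  return Array(history[...index].reversed()) + [current]
}
```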

Finally, each step is applied to the store in the appropriate order.

func migrate(from storeModel: NSManagedObjectModel, using step: Migration.Step)
  throws -> NSManagedObjectModel {
  switch step {
    case .lightweight(let targetModel) :
      try migrate(from: storeModel, to: targetModel)
      return targetModel
    case .script(let script) :
      try update(as: storeModel) { context in
        if try context.tryFetchObject(makeFetchRequest(for: Migration.ScriptMarker.self)) == nil {
          try script(context)
          try context.create(Migration.ScriptMarker.self) { _ in }
          try context.save()
        }
      }
      return storeModel
  }
}

Note that the script step decides whether or not to run the script based on the existence of an instance of the ScriptMarker entity, which is added to the intermediate schema when a migration script is deemed necessary, a detail omitted from the previous section.
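
The role of the marker can be illustrated without CoreData. In the sketch below, StoreSketch and applyScriptOnce are hypothetical: the store is an in-memory value, the marker is a boolean flag, and working on a copy is a loose analogue of a context whose changes are discarded unless it is saved.

```swift
// Hypothetical in-memory stand-in for a persistent store.
struct StoreSketch {
  var values: [String: Int] = [:]
  var hasScriptMarker = false
}

// Runs the script at most once per store, recording success with a marker.
func applyScriptOnce(
  to store: inout StoreSketch,
  script: (inout StoreSketch) throws -> Void
) rethrows {
  // A marker left by an earlier, successful run means there is nothing to do.
  guard !store.hasScriptMarker else { return }
  // Work on a copy so that a throwing script leaves the store unchanged.
  var working = store
  try script(&working)
  // The marker is committed together with the script's changes.
  working.hasScriptMarker = true
  store = working
}
```

Because the marker is written together with the script's changes, restarting an interrupted migration either reruns the script from a clean state or skips it entirely.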

The method for performing lightweight migration uses the inferred mapping model between store and target object models,

func migrate(from storeModel: NSManagedObjectModel, to targetModel: NSManagedObjectModel) throws {
  precondition(isOpen == false && isCompatible(with: storeModel))
  let mapping = try NSMappingModel.inferredMappingModel(forSourceModel: storeModel, destinationModel: targetModel)
  let manager = NSMigrationManager(sourceModel: storeModel, destinationModel: targetModel)
  try manager.migrateStore(from: storeURL, type: storeType, mapping: mapping, to: storeURL, type: storeType)
}

and the method for script application simply envelops script execution between opening and saving/closing the store.

public func update(as storeModel: NSManagedObjectModel, using script: (NSManagedObjectContext) throws -> Void) throws {
  precondition(isOpen == false && isCompatible(with: storeModel))
  let coordinator = NSPersistentStoreCoordinator(managedObjectModel: storeModel)
  let store = try coordinator.addPersistentStore(type: storeType, at: storeURL)
  let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
  context.persistentStoreCoordinator = coordinator
  try script(context)
  try context.save()
  try coordinator.remove(store)
}

Note that there is no means to ensure migration scripts perform all necessary changes for compatibility with the target model for the subsequent step. Failure to do so results in an error being thrown in the subsequent lightweight migration step. Recovering from this issue in production would require shipping an update in which the script has been corrected and made idempotent; so migration scripts must be thoroughly tested.

In order for incremental migration to be reliable, all object model versions must have distinct identities. This is enforced by adding an implicit entity to each model with a distinct version hash modifier taken from the versionId property of the schema. That property is assigned either explicitly on initialization or implicitly from each schema’s position in the version history list upon opening the store. Implicit identifiers are increasing natural numbers assigned from oldest to newest.
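
A small sketch of implicit numbering follows. It assumes that numbering starts at 1 for the oldest version and that the current schema receives the next number after the newest entry in the history; neither detail is stated above, and implicitVersionIds is a hypothetical name.

```swift
// Hypothetical sketch: assigns implicit identifiers to a history of `historyCount`
// versions ordered newest to oldest, plus the current schema.
// Assumption: the oldest version is numbered 1.
func implicitVersionIds(historyCount: Int) -> (current: Int, history: [Int]) {
  // Counting down from historyCount to 1 matches the newest-to-oldest ordering;
  // the stride is empty when there is no history.
  let history = Array(stride(from: historyCount, through: 1, by: -1))
  return (current: historyCount + 1, history: history)
}
```

Numbering from the oldest version means that shipping a new schema leaves the identifiers of all earlier versions unchanged, provided entries are never removed from or reordered within the history.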

It is important to note that once a schema has been released in production, one must not change either its version identifier or its definition (in a way which affects the identity of the generated model).

Codable attribute types

This section discusses issues involved in the use of Codable attribute types.

While custom implementations of Codable can be quite flexible, one will generally require a migration when changing the representation of such a type used in an attribute declaration. Recalling that attribute values of Codable type T are stored as instances of Boxed<T>, convenience methods for getting and setting those values within migration scripts are provided by an extension on NSManagedObject.

func unboxedValue<T: Codable>(of t: T.Type = T.self, forKey key: String) throws -> T
  { ... }
func setBoxedValue<T: Codable>(_ value: T, forKey key: String)
  { ... }

The retrieval function throws if the stored value is not a Boxed<T>, which would indicate the property was uninitialized or the type system was subverted through explicit use of setValue(_:forKey:).
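
The failure mode can be sketched with a dictionary-backed stand-in for a managed object. BoxedSketch, ObjectSketch and UnboxingError are hypothetical and are not the article's Boxed<T> or its NSManagedObject extension; the sketch only illustrates that retrieval succeeds when the stored box has exactly the requested type and throws otherwise.

```swift
// Hypothetical stand-in for the article's Boxed<T>.
final class BoxedSketch<T: Codable> {
  let value: T
  init(_ value: T) { self.value = value }
}

struct UnboxingError: Error { let key: String }

// Example value types for demonstration.
struct FlatPoint: Codable, Equatable { var x, y: Float }
struct DeepPoint: Codable { var x, y, z: Float }

// Dictionary-backed stand-in for key-value access on a managed object.
struct ObjectSketch {
  var storage: [String: Any] = [:]

  // Throws when the key is unset or holds a box of a different type.
  func unboxedValue<T: Codable>(of type: T.Type = T.self, forKey key: String) throws -> T {
    guard let box = storage[key] as? BoxedSketch<T> else {
      throw UnboxingError(key: key)
    }
    return box.value
  }

  mutating func setBoxedValue<T: Codable>(_ value: T, forKey key: String) {
    storage[key] = BoxedSketch(value)
  }
}
```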

Although the version hashes of generated NSAttributeDescriptions are unaffected by change of declared type in this case (all codable types map to storage type transformable), script execution is ensured by the versionId of the enclosing schema.

As a simple example, we can define two versions of entity E which differ in the types of their point attribute:

struct Point2d : Codable { var x, y: Float }
class E_v1 : Entity {
  @Attribute("point") var point : Point2d
}

struct Point3d : Codable { var x, y, z: Float }
class E : Entity {
  @Attribute("point") var point : Point3d
}

The required migration script sets each instance of the new attribute according to the value of the old attribute, distinguishing old and new attributes via Schema’s renaming methods.

try store.openWith(schema: try Schema(objectTypes: [E.self]), migrations: [
  Migration(schema: try Schema(objectTypes: [E_v1.self]), script: { context in
    for object in try context.fetch(NSFetchRequest<NSManagedObject>(entityName: "E")) {
      let p = try object.unboxedValue(of: Point2d.self, forKey: Schema.renameOld("point"))
      object.setBoxedValue(Point3d(x: p.x, y: p.y, z: 0), forKey: Schema.renameNew("point"))
    }
  }),
])

Note that change of a Codable type name does not impact the ability to decode a previously encoded value: in the previous example we could have started with a type named Point, then extended its structure while preserving the original definition with an alternate name such as Point_v1.
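
The point about type names can be checked with a small round trip. The sketch assumes JSON encoding purely for illustration (the encoding actually used for boxed values is not specified here), and the Release1 and Release2 namespaces are hypothetical devices for holding two definitions that share a name. Synthesized Codable conformance keys values by property name, so the name of the type does not appear in the encoded data.

```swift
import Foundation

// Definitions as they might look in the first release.
enum Release1 {
  struct Point: Codable { var x, y: Float }
}

// Definitions after the type is extended: the original shape is kept under a new name.
enum Release2 {
  struct Point_v1: Codable, Equatable { var x, y: Float }
  struct Point: Codable { var x, y, z: Float }
}

// Encodes a value with the original type and decodes it with the renamed type.
func decodeUnderNewName() throws -> Release2.Point_v1 {
  let stored = try JSONEncoder().encode(Release1.Point(x: 1, y: 2))
  return try JSONDecoder().decode(Release2.Point_v1.self, from: stored)
}
```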

Summary

The intent of this article was to answer the question of how to perform migration when object models are generated from Swift source code. The attempt to answer that question became an exploration of what can be accomplished with the combination of lightweight migration and script execution. The result is a cohesive approach to incremental migration.

This development resulted in a number of refinements to the system presented in the previous article. Most notably:

  • attribute descriptors must retain their declaration types in order to detect type changes;
  • separation between Storable and Nullable types was necessary to distinguish optionality and type changes.