Introducing EPUB CFI

CFI, or Canonical Fragment Identifier, is a scheme designed by the IDPF for referring to any location, whether an element or a point in the content, within an EPUB publication. Its structure resembles that of an HTTP address, and it was inspired by the power of the Web’s hyperlinks.
At Anfengde, we were using a percentage (%) to indicate reading progress while developing AnReader. Although we had spent a lot of time on the EPUB 3 specifications, even contributing a fairly formal translation to the IDPF, we had skipped the chapters about CFI. It caught our attention when Ric Wright mentioned it after pulling our code and running our SDK. Ric serves part time as technical director of the Readium SDK project and is the president of Geo F/X. So we went back to study this part and discussed the IDPF’s rules within our team. As we understand it, CFI is still a tentative measure, since the IDPF labels it only as recommended.
Below is a brief introduction; for the full documentation, please see the IDPF’s site.

The special characters that the IDPF uses to construct identifiers include “/”, “[”, “]”, “:”, “~”, “!”, “@” and “^”, of which “^” escapes other characters to avoid conflicts. An identifier is a sequence of steps enclosed in parentheses and prefixed with “epubcfi”. It is then appended to the end of an IRI after “#”, e.g.,
book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10).
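
To make that structure concrete, here is a minimal Python sketch that splits a CFI like the one above into its steps. The name parse_cfi and the regular expression are our own illustration, not part of any IDPF tool, and the sketch assumes a simple CFI: no “^” escapes, no ranges, and at most one trailing character offset.

```python
import re

# One step such as "/4" or "/4[body01]"; the bracketed assertion is optional.
# Simplification: "^"-escaped characters inside an assertion are not handled.
STEP = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")


def parse_cfi(fragment):
    """Split 'epubcfi(...)' into local paths and an optional character offset.

    Returns (paths, offset). Each path is a list of (index, assertion) steps;
    a new path starts after every "!" (indirection into another document).
    """
    if not (fragment.startswith("epubcfi(") and fragment.endswith(")")):
        raise ValueError("not an EPUB CFI fragment")
    body = fragment[len("epubcfi("):-1]

    # A trailing ":N" is a character offset into the final text chunk.
    offset = None
    match = re.search(r":(\d+)$", body)
    if match:
        offset = int(match.group(1))
        body = body[:match.start()]

    paths = [
        [(int(index), assertion or None) for index, assertion in STEP.findall(part)]
        for part in body.split("!")
    ]
    return paths, offset


iri = "book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)"
paths, offset = parse_cfi(iri.split("#", 1)[1])
for path in paths:
    print(path)
print("offset:", offset)
```

The first path printed belongs to the Package Document and the second to the content document reached through “!”.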

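In the example above, an even number after “/”, such as 6 or 4, selects a child element: /2 is the first child element, /4 the second, and so on. An odd number after “/” selects the text between child elements, so /3 is the text that follows the first child element. The number after “:” is a character offset into that text. “!” marks an indirection: the path continues in the document that the preceding element refers to. Content inside square brackets ([…]) is an assertion, typically an ID, used to verify that a step landed on the intended node. The specification defines each special character in detail, so we reproduce just one of its examples here.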
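
As a rough illustration of the even/odd numbering, the Python sketch below lists the step number of every direct child of a small fragment. The helper name list_steps is hypothetical, and the sketch relies on an assumption about the tooling: Python’s xml.etree.ElementTree does not expose text as separate nodes, so an element’s .text and each child’s .tail stand in for the text chunks.

```python
import xml.etree.ElementTree as ET


def list_steps(parent):
    """Return (step, kind, value) for each direct child of parent.

    Odd steps are text chunks, even steps are child elements.
    """
    # Step 1 is the text before the first child element (may be empty).
    steps = [(1, "text", parent.text or "")]
    step = 1
    for child in parent:
        steps.append((step + 1, "element", child.tag))
        # The text following this child, up to the next element.
        steps.append((step + 2, "text", child.tail or ""))
        step += 2
    return steps


fragment = ET.fromstring("<p>xxx<em>yyy</em>0123456789</p>")
for step, kind, value in list_steps(fragment):
    print(f"/{step}", kind, repr(value))
# /1 text 'xxx'
# /2 element 'em'
# /3 text '0123456789'
```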
Given the following Package Document:

<?xml version="1.0"?>

<package version="2.0" 
         unique-identifier="bookid" 
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/" 
         xmlns:opf="http://www.idpf.org/2007/opf">

    <metadata>
        <dc:title>...</dc:title>
        <dc:identifier id="bookid">...</dc:identifier>
        <dc:creator>...</dc:creator>
        <dc:language>en</dc:language>
    </metadata>
    <manifest>
        <item id="toc"
              properties="nav"
              href="toc.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="titlepage" 
              href="titlepage.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter01" 
              href="chapter01.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter02" 
              href="chapter02.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter03" 
              href="chapter03.xhtml" 
              media-type="application/xhtml+xml"/>
        <item id="chapter04" 
              href="chapter04.xhtml" 
              media-type="application/xhtml+xml"/>
    </manifest>
    
    <spine>
        <itemref id="titleref"  idref="titlepage"/>
        <itemref id="chap01ref" idref="chapter01"/>
        <itemref id="chap02ref" idref="chapter02"/>
        <itemref id="chap03ref" idref="chapter03"/>
        <itemref id="chap04ref" idref="chapter04"/>
    </spine>
    
</package>

and the XHTML Content Document chapter01.xhtml:

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>...</title>
    </head>
    
    <body id="body01">
        <p>...</p>
        <p>...</p>
        <p>...</p>
        <p>...</p>
        <p id="para05">xxx<em>yyy</em>0123456789</p>
        <p>...</p>
        <p>...</p>
        <img id="svgimg" src="foo.svg" alt="..."/>
        <p>...</p>
        <p>...</p>
    </body>
</html>

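Then epubcfi(/6) points to the spine, the 3rd child element of package, and epubcfi(/6/4) points to the 2nd itemref in the spine, which is chap01ref; the assertion [chap01ref] allows this to be checked. After “!”, the path continues in chapter01.xhtml: /4 is body (the 2nd child element of html), /10 is the 5th p (para05), and /3 is the text that follows the em element, 0123456789. Thus epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:2) refers to the position two characters into that text, that is, just after the “1”.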
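
The walk-through above can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the two documents are trimmed copies of the listings above (fewer manifest and spine entries, same positions for the nodes on the path), the helper names child_element, text_chunk and resolve_example are hypothetical, and ElementTree’s .text and .tail attributes stand in for text nodes.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Trimmed copies of the Package Document and chapter01.xhtml shown above.
PACKAGE = """<package version="2.0" unique-identifier="bookid"
         xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata><dc:title>...</dc:title></metadata>
  <manifest>
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref id="titleref" idref="titlepage"/>
    <itemref id="chap01ref" idref="chapter01"/>
  </spine>
</package>"""

CHAPTER01 = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>...</title></head>
  <body id="body01">
    <p>...</p><p>...</p><p>...</p><p>...</p>
    <p id="para05">xxx<em>yyy</em>0123456789</p>
  </body>
</html>"""


def child_element(parent, step, expected_id=None):
    """Follow an even step: /2 is the 1st child element, /4 the 2nd, ..."""
    if step % 2 != 0:
        raise ValueError("element steps must be even")
    node = list(parent)[step // 2 - 1]
    # An assertion in [...] is checked against the element's id attribute.
    if expected_id is not None and node.get("id") != expected_id:
        raise ValueError(f"assertion [{expected_id}] failed")
    return node


def text_chunk(parent, step):
    """Follow an odd step: /1 is the text before the 1st child element,
    /3 the text right after the 1st child element, and so on."""
    if step % 2 != 1:
        raise ValueError("text steps must be odd")
    if step == 1:
        return parent.text or ""
    return list(parent)[step // 2 - 1].tail or ""


def resolve_example():
    """Resolve epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:2) by hand."""
    package = ET.fromstring(PACKAGE)
    spine = child_element(package, 6)                # /6
    itemref = child_element(spine, 4, "chap01ref")   # /4[chap01ref]

    # "!": follow the itemref to the manifest item it names.
    manifest = package.find(OPF_NS + "manifest")
    item = next(i for i in manifest if i.get("id") == itemref.get("idref"))
    href = item.get("href")

    html = ET.fromstring(CHAPTER01)                  # stands in for loading href
    body = child_element(html, 4, "body01")          # /4[body01]
    para = child_element(body, 10, "para05")         # /10[para05]
    text = text_chunk(para, 3)                       # /3
    offset = 2                                       # :2
    return href, text[:offset], text[offset:]


href, before, after = resolve_example()
print(href, repr(before), "|", repr(after))
# chapter01.xhtml '01' | '23456789'
```

The “|” in the output marks the position that the CFI identifies inside the text 0123456789.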