# Android Java PDF: Porting Guide for Replacing Characters Not Covered by the Noto Fonts with `？`

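# 1. Purpose

Restrict the fonts used for drawing in the Android Java PDF output to the following three files only.

- NotoSansJP-Regular.ttf
- NotoSansJP-Bold.ttf
- NotoEmoji-Regular.ttf

Characters not covered by these fonts must not fall back to a system font. They are replaced with the fullwidth question mark ？ (U+FF1F).

Compound emoji are also replaced with ？, as a conservative measure.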

- Emoji containing ZWJ: 😶‍🌫️
- Emoji containing a variation selector: ✈️
- Flags: 🇯🇵
- Keycaps: 1️⃣

An emoji that consists of a single code point and is covered by NotoEmoji-Regular.ttf is drawn as-is.

Examples:

| Input | Display in the PDF |
| --- | --- |
| `ABC日本語` | `ABC日本語` |
| `العربية` | `？？？？？？？` |
| `🙂` | `🙂` |
| `😶‍🌫️` | `？` |
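The difference between a single code point emoji and a compound emoji is easier to see when the code points are written out. The following is a minimal desk-check sketch in plain Java, not part of the port. The class name `CodePointDump` and both method names are hypothetical, and the code point values are assumed to match the examples above.

```java
import java.util.Locale;

/** Desk-check helper (hypothetical, not part of the port): lists the code points of a string. */
final class CodePointDump {
    private CodePointDump() {
    }

    /** Returns the code points of the text in "U+XXXX" form, separated by spaces. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        for (int index = 0; index < text.length(); ) {
            int codePoint = text.codePointAt(index);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format(Locale.ROOT, "U+%04X", codePoint));
            // Advance by one or two UTF-16 chars depending on the code point.
            index += Character.charCount(codePoint);
        }
        return out.toString();
    }

    /** Builds a string from code points, so emoji need not be pasted into source files. */
    static String of(int... codePoints) {
        StringBuilder builder = new StringBuilder();
        for (int codePoint : codePoints) {
            builder.appendCodePoint(codePoint);
        }
        return builder.toString();
    }
}
```

For instance, `CodePointDump.describe(CodePointDump.of(0x2708, 0xFE0F))` lists two code points for the airplane with a variation selector, while `CodePointDump.of(0x1F642)` (🙂) is a single code point stored as two UTF-16 chars.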

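# 2. Prerequisites

- The project is an Android Java project.
- minSdk is assumed to be 23 or higher.
- PDF output uses android.graphics.pdf.PdfDocument, Canvas, and Paint.
- The existing PDF layout, page breaks, cell widths, and alignment are kept unchanged.
- Paint.hasGlyph() is not used as the primary test for accepting a character. Its result takes the system font fallback chain into account, so it cannot enforce this requirement.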

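# 3. Font placement

Add the following files to app/src/main/res/font/ in the destination project. Android resource file names may contain only lowercase letters, digits, and underscores, so the files are renamed from the names listed in section 1.

- app/src/main/res/font/notosansjp_regular.ttf
- app/src/main/res/font/notosansjp_bold.ttf
- app/src/main/res/font/notoemoji_regular.ttf

The Android resource IDs then have the following names.

- R.font.notosansjp_regular
- R.font.notosansjp_bold
- R.font.notoemoji_regular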

# 4. Adding the helper classes

Add the following three files to the same package as the PDF generation class.

Example:

- app/src/main/java/com/example/yourapp/NotoFontRegistry.java
- app/src/main/java/com/example/yourapp/NotoTextSanitizer.java
- app/src/main/java/com/example/yourapp/TtfCmapParser.java

Change the package declaration at the top of each file to the destination package.

# 4.1 `NotoFontRegistry.java`

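package com.example.yourapp; // change to the destination package

import android.content.Context;
import android.graphics.Typeface;
import android.os.Build;

import androidx.annotation.RequiresApi;
import androidx.core.content.res.ResourcesCompat;

import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Set;

// If this package differs from the app namespace, also import that namespace's R class.
final class NotoFontRegistry {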
    private final Typeface regularTypeface;
    private final Typeface boldTypeface;
    private final Typeface emojiTypeface;
    private final NotoTextSanitizer sanitizer;

    private NotoFontRegistry(
            Typeface regularTypeface,
            Typeface boldTypeface,
            Typeface emojiTypeface,
            NotoTextSanitizer sanitizer
    ) {
        this.regularTypeface = regularTypeface;
        this.boldTypeface = boldTypeface;
        this.emojiTypeface = emojiTypeface;
        this.sanitizer = sanitizer;
    }

    static NotoFontRegistry load(Context context) throws IOException {
        Typeface regularTypeface = requireTypeface(context, R.font.notosansjp_regular);
        Typeface boldTypeface = requireTypeface(context, R.font.notosansjp_bold);
        Typeface emojiTypeface = requireTypeface(context, R.font.notoemoji_regular);

        Set<Integer> regularCodePoints = parseCmap(context, R.font.notosansjp_regular);
        Set<Integer> boldCodePoints = parseCmap(context, R.font.notosansjp_bold);
        Set<Integer> emojiCodePoints = parseCmap(context, R.font.notoemoji_regular);

        if (!regularCodePoints.contains(0xFF1F) || !boldCodePoints.contains(0xFF1F)) {
            throw new IOException("replacement glyph is missing");
        }

        NotoTextSanitizer.ClusterSplitter clusterSplitter;
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
            clusterSplitter = NotoFontRegistry::splitWithIcu;
        } else {
            clusterSplitter = NotoTextSanitizer::splitConservatively;
        }

        return new NotoFontRegistry(
                regularTypeface,
                boldTypeface,
                emojiTypeface,
                new NotoTextSanitizer(
                        regularCodePoints,
                        boldCodePoints,
                        emojiCodePoints,
                        clusterSplitter
                )
        );
    }

    Typeface typefaceFor(NotoTextSanitizer.FontRole fontRole) {
        switch (fontRole) {
            case BOLD:
                return boldTypeface;
            case EMOJI:
                return emojiTypeface;
            case REGULAR:
            default:
                return regularTypeface;
        }
    }

    NotoTextSanitizer sanitizer() {
        return sanitizer;
    }

    private static Typeface requireTypeface(Context context, int fontResId) throws IOException {
        Typeface typeface = ResourcesCompat.getFont(context, fontResId);
        if (typeface == null) {
            throw new IOException("font resource could not be loaded");
        }
        return typeface;
    }

    private static Set<Integer> parseCmap(Context context, int fontResId) throws IOException {
        try (InputStream inputStream = context.getResources().openRawResource(fontResId)) {
            return TtfCmapParser.parse(inputStream);
        }
    }

    @RequiresApi(Build.VERSION_CODES.N)
    private static List<String> splitWithIcu(String text) {
        android.icu.text.BreakIterator iterator =
                android.icu.text.BreakIterator.getCharacterInstance(java.util.Locale.ROOT);
        iterator.setText(text);

        java.util.ArrayList<String> clusters = new java.util.ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != android.icu.text.BreakIterator.DONE;
             start = end, end = iterator.next()) {
            clusters.add(text.substring(start, end));
        }
        return clusters;
    }
}

# 4.2 `NotoTextSanitizer.java`

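package com.example.yourapp; // change to the destination package

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class NotoTextSanitizer {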
    static final String REPLACEMENT = "？";

    enum FontRole {
        REGULAR,
        BOLD,
        EMOJI
    }

    interface ClusterSplitter {
        List<String> split(String text);
    }

    static final class Run {
        final String text;
        final FontRole fontRole;

        Run(String text, FontRole fontRole) {
            this.text = text;
            this.fontRole = fontRole;
        }
    }

    private final Set<Integer> regularCodePoints;
    private final Set<Integer> boldCodePoints;
    private final Set<Integer> emojiCodePoints;
    private final ClusterSplitter clusterSplitter;

    NotoTextSanitizer(
            Set<Integer> regularCodePoints,
            Set<Integer> boldCodePoints,
            Set<Integer> emojiCodePoints,
            ClusterSplitter clusterSplitter
    ) {
        this.regularCodePoints = regularCodePoints;
        this.boldCodePoints = boldCodePoints;
        this.emojiCodePoints = emojiCodePoints;
        this.clusterSplitter = clusterSplitter;
    }

    List<Run> sanitize(String text, FontRole baseFontRole) {
        if (text == null || text.isEmpty()) {
            return Collections.emptyList();
        }
        if (baseFontRole == FontRole.EMOJI) {
            throw new IllegalArgumentException("base font must be Noto Sans JP");
        }

        Set<Integer> baseCodePoints = codePointsFor(baseFontRole);
        List<Run> runs = new ArrayList<>();

        for (String cluster : clusterSplitter.split(text)) {
            if (containsRestrictedSequenceMarker(cluster)
                    || isMultiCodePointEmojiCluster(cluster)) {
                appendRun(runs, REPLACEMENT, baseFontRole);
            } else if (containsAll(baseCodePoints, cluster)) {
                appendRun(runs, cluster, baseFontRole);
            } else if (isSingleCodePoint(cluster) && containsAll(emojiCodePoints, cluster)) {
                appendRun(runs, cluster, FontRole.EMOJI);
            } else {
                appendRun(runs, REPLACEMENT, baseFontRole);
            }
        }

        return runs;
    }

    String sanitizeToString(String text, FontRole baseFontRole) {
        StringBuilder sanitized = new StringBuilder();
        for (Run run : sanitize(text, baseFontRole)) {
            sanitized.append(run.text);
        }
        return sanitized.toString();
    }

    private Set<Integer> codePointsFor(FontRole fontRole) {
        return fontRole == FontRole.BOLD ? boldCodePoints : regularCodePoints;
    }

    private static boolean containsAll(Set<Integer> codePoints, String text) {
        for (int index = 0; index < text.length(); ) {
            int codePoint = Character.codePointAt(text, index);
            if (!codePoints.contains(codePoint)) {
                return false;
            }
            index += Character.charCount(codePoint);
        }
        return true;
    }

    private static boolean isSingleCodePoint(String text) {
        return text.codePointCount(0, text.length()) == 1;
    }

    private boolean isMultiCodePointEmojiCluster(String text) {
        return !isSingleCodePoint(text) && containsAny(emojiCodePoints, text);
    }

    private static boolean containsAny(Set<Integer> codePoints, String text) {
        for (int index = 0; index < text.length(); ) {
            int codePoint = Character.codePointAt(text, index);
            if (codePoints.contains(codePoint)) {
                return true;
            }
            index += Character.charCount(codePoint);
        }
        return false;
    }

    private static boolean containsRestrictedSequenceMarker(String text) {
        for (int index = 0; index < text.length(); ) {
            int codePoint = Character.codePointAt(text, index);
            if (codePoint == 0x200D || isVariationSelector(codePoint)) {
                return true;
            }
            index += Character.charCount(codePoint);
        }
        return false;
    }

    private static void appendRun(List<Run> runs, String text, FontRole fontRole) {
        if (!runs.isEmpty()) {
            Run lastRun = runs.get(runs.size() - 1);
            if (lastRun.fontRole == fontRole) {
                runs.set(runs.size() - 1, new Run(lastRun.text + text, fontRole));
                return;
            }
        }
        runs.add(new Run(text, fontRole));
    }

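    static List<String> splitConservatively(String text) {
        List<String> clusters = new ArrayList<>();
        int index = 0;

        while (index < text.length()) {
            int start = index;
            int codePoint = Character.codePointAt(text, index);
            index += Character.charCount(codePoint);

            // A flag is a pair of regional indicators. Pair them here so that a
            // regional indicator is never glued onto preceding ordinary text
            // (otherwise "A" followed by a flag would collapse into one "？").
            if (isRegionalIndicator(codePoint) && index < text.length()) {
                int pairedCodePoint = Character.codePointAt(text, index);
                if (isRegionalIndicator(pairedCodePoint)) {
                    index += Character.charCount(pairedCodePoint);
                }
            }

            // Absorb trailing variation selectors, combining marks, and ZWJ joins.
            while (index < text.length()) {
                int nextCodePoint = Character.codePointAt(text, index);
                if (isVariationSelector(nextCodePoint) || isCombiningMark(nextCodePoint)) {
                    index += Character.charCount(nextCodePoint);
                    continue;
                }
                if (nextCodePoint == 0x200D) {
                    // ZWJ joins the following code point into the same cluster.
                    index += Character.charCount(nextCodePoint);
                    if (index < text.length()) {
                        int joinedCodePoint = Character.codePointAt(text, index);
                        index += Character.charCount(joinedCodePoint);
                    }
                    continue;
                }
                break;
            }

            clusters.add(text.substring(start, index));
        }

        return clusters;
    }

    private static boolean isVariationSelector(int codePoint) {
        return codePoint == 0xFE0E
                || codePoint == 0xFE0F
                || (codePoint >= 0xE0100 && codePoint <= 0xE01EF);
    }

    private static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                // Emoji skin tone modifiers attach to the preceding emoji.
                || (codePoint >= 0x1F3FB && codePoint <= 0x1F3FF);
    }

    private static boolean isRegionalIndicator(int codePoint) {
        return codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF;
    }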
}
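The order of the checks in sanitize() matters: sequence markers and multi-code-point emoji are rejected before either font is consulted. The sketch below restates that order for one cluster in plain Java so it can be desk-checked off-device. `ClusterDecisionSketch`, `Decision`, and `decide` are hypothetical names, and the code point sets in the usage note are made-up stand-ins for the parsed cmap tables.

```java
import java.util.Set;

/** Desk-check sketch of the per-cluster decision order; not the production class. */
final class ClusterDecisionSketch {
    enum Decision { BASE, EMOJI, REPLACE }

    private ClusterDecisionSketch() {
    }

    static Decision decide(String cluster, Set<Integer> base, Set<Integer> emoji) {
        boolean single = cluster.codePointCount(0, cluster.length()) == 1;
        boolean allInBase = true;
        boolean anyInEmoji = false;
        boolean hasMarker = false;

        for (int index = 0; index < cluster.length(); ) {
            int cp = cluster.codePointAt(index);
            allInBase &= base.contains(cp);
            anyInEmoji |= emoji.contains(cp);
            // ZWJ or any variation selector marks a restricted sequence.
            hasMarker |= cp == 0x200D || cp == 0xFE0E || cp == 0xFE0F
                    || (cp >= 0xE0100 && cp <= 0xE01EF);
            index += Character.charCount(cp);
        }

        // 1. Restricted marker, or a multi-code-point cluster touching the emoji font.
        if (hasMarker || (!single && anyInEmoji)) {
            return Decision.REPLACE;
        }
        // 2. Every code point is covered by the base font (regular or bold).
        if (allInBase) {
            return Decision.BASE;
        }
        // 3. A single code point covered by the emoji font.
        if (single && anyInEmoji) {
            return Decision.EMOJI;
        }
        // 4. Anything else is replaced.
        return Decision.REPLACE;
    }
}
```

With a base set containing U+0041 and an emoji set containing U+1F642 and U+2708, `decide` yields BASE for `"A"`, EMOJI for 🙂, and REPLACE for both an Arabic letter and `"\u2708\uFE0F"`, even though U+2708 itself is in the emoji set.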

# 4.3 `TtfCmapParser.java`

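package com.example.yourapp; // change to the destination package

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

final class TtfCmapParser {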
    private static final int CMAP_TAG = 0x636D6170;

    private TtfCmapParser() {
    }

    static Set<Integer> parse(InputStream inputStream) throws IOException {
        byte[] fontData = readAllBytes(inputStream);
        int tableCount = readUnsignedShort(fontData, 4);
        int cmapOffset = -1;

        for (int i = 0; i < tableCount; i++) {
            int recordOffset = 12 + i * 16;
            if (readInt(fontData, recordOffset) == CMAP_TAG) {
                cmapOffset = readInt(fontData, recordOffset + 8);
                break;
            }
        }

        if (cmapOffset < 0) {
            throw new IOException("cmap table not found");
        }

        int subtableCount = readUnsignedShort(fontData, cmapOffset + 2);
        Set<Integer> codePoints = new HashSet<>();
        boolean parsedSupportedFormat = false;

        for (int i = 0; i < subtableCount; i++) {
            int recordOffset = cmapOffset + 4 + i * 8;
            int subtableOffset = cmapOffset + readInt(fontData, recordOffset + 4);
            int format = readUnsignedShort(fontData, subtableOffset);

            if (format == 4) {
                parseFormat4(fontData, subtableOffset, codePoints);
                parsedSupportedFormat = true;
            } else if (format == 12) {
                parseFormat12(fontData, subtableOffset, codePoints);
                parsedSupportedFormat = true;
            }
        }

        if (!parsedSupportedFormat) {
            throw new IOException("supported cmap subtable not found");
        }

        return codePoints;
    }

    private static void parseFormat4(
            byte[] fontData,
            int offset,
            Set<Integer> codePoints
    ) throws IOException {
        int length = readUnsignedShort(fontData, offset + 2);
        ensureAvailable(fontData, offset, length);

        int segmentCount = readUnsignedShort(fontData, offset + 6) / 2;
        int endCodesOffset = offset + 14;
        int startCodesOffset = endCodesOffset + segmentCount * 2 + 2;
        int idDeltasOffset = startCodesOffset + segmentCount * 2;
        int idRangeOffsetsOffset = idDeltasOffset + segmentCount * 2;

        for (int i = 0; i < segmentCount; i++) {
            int startCode = readUnsignedShort(fontData, startCodesOffset + i * 2);
            int endCode = readUnsignedShort(fontData, endCodesOffset + i * 2);
            int delta = readUnsignedShort(fontData, idDeltasOffset + i * 2);
            int rangeOffset = readUnsignedShort(fontData, idRangeOffsetsOffset + i * 2);

            if (startCode > endCode) {
                throw new IOException("invalid cmap format 4 segment");
            }

            for (int codePoint = startCode; codePoint <= endCode; codePoint++) {
                if (codePoint == 0xFFFF) {
                    continue;
                }

                int glyphId;
                if (rangeOffset == 0) {
                    glyphId = (codePoint + delta) & 0xFFFF;
                } else {
                    int glyphOffset = idRangeOffsetsOffset
                            + i * 2
                            + rangeOffset
                            + (codePoint - startCode) * 2;
                    glyphId = readUnsignedShort(fontData, glyphOffset);
                    if (glyphId != 0) {
                        glyphId = (glyphId + delta) & 0xFFFF;
                    }
                }

                if (glyphId != 0) {
                    codePoints.add(codePoint);
                }
            }
        }
    }

    private static void parseFormat12(
            byte[] fontData,
            int offset,
            Set<Integer> codePoints
    ) throws IOException {
        long length = readUnsignedInt(fontData, offset + 4);
        if (length > Integer.MAX_VALUE) {
            throw new IOException("cmap format 12 is too large");
        }
        ensureAvailable(fontData, offset, (int) length);

        long groupCount = readUnsignedInt(fontData, offset + 12);
        if (groupCount > Integer.MAX_VALUE) {
            throw new IOException("too many cmap format 12 groups");
        }

        for (int i = 0; i < (int) groupCount; i++) {
            int groupOffset = offset + 16 + i * 12;
            long startCodePoint = readUnsignedInt(fontData, groupOffset);
            long endCodePoint = readUnsignedInt(fontData, groupOffset + 4);
            long startGlyphId = readUnsignedInt(fontData, groupOffset + 8);

            if (startCodePoint > endCodePoint || endCodePoint > Character.MAX_CODE_POINT) {
                throw new IOException("invalid cmap format 12 group");
            }

            for (long codePoint = startCodePoint; codePoint <= endCodePoint; codePoint++) {
                if (startGlyphId + codePoint - startCodePoint != 0) {
                    codePoints.add((int) codePoint);
                }
            }
        }
    }

    private static byte[] readAllBytes(InputStream inputStream) throws IOException {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = inputStream.read(buffer)) != -1) {
            outputStream.write(buffer, 0, read);
        }
        return outputStream.toByteArray();
    }

    private static int readUnsignedShort(byte[] data, int offset) throws IOException {
        ensureAvailable(data, offset, 2);
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    private static int readInt(byte[] data, int offset) throws IOException {
        ensureAvailable(data, offset, 4);
        return ((data[offset] & 0xFF) << 24)
                | ((data[offset + 1] & 0xFF) << 16)
                | ((data[offset + 2] & 0xFF) << 8)
                | (data[offset + 3] & 0xFF);
    }

    private static long readUnsignedInt(byte[] data, int offset) throws IOException {
        return readInt(data, offset) & 0xFFFFFFFFL;
    }

    private static void ensureAvailable(byte[] data, int offset, int length) throws IOException {
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IOException("invalid TTF offset");
        }
    }
}
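The format 4 branch reads idDelta as an unsigned value and masks the sum with 0xFFFF. That is equivalent to treating idDelta as the signed 16-bit field it is, because the addition is modulo 65536. The following worked example is a sketch with made-up values; `Format4DeltaSketch` and its method names are hypothetical.

```java
/** Worked example of the cmap format 4 delta arithmetic; the values are made up. */
final class Format4DeltaSketch {
    private Format4DeltaSketch() {
    }

    /** Reads a big-endian unsigned 16-bit value, the byte order used by TrueType tables. */
    static int readUnsignedShort(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    /** Glyph ID for a code point in a segment whose idRangeOffset is 0. */
    static int glyphIdWithDelta(int codePoint, int idDelta) {
        // Adding modulo 65536 gives the same glyph ID whether idDelta was
        // read as signed (-64) or unsigned (0xFFC0).
        return (codePoint + idDelta) & 0xFFFF;
    }
}
```

For a segment that starts at U+0041 with idDelta bytes `FF C0`, `glyphIdWithDelta(0x41, 0xFFC0)` and `glyphIdWithDelta(0x41, -64)` both give glyph ID 1. A result of 0 means the code point maps to the missing glyph, which is why the parser skips it.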

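# 5. Integrating into the PDF generation class

The following assumes that the class containing the PDF generation code is MainActivity. If the PDF code lives in another class, make the same changes there.

# 5.1 Add a registry field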

private NotoFontRegistry notoFontRegistry;

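NotoFontRegistry parses the cmap tables of the TTF files, so it must not be created for each PDF cell. Load it once, on the first PDF export, and keep it in the field.

# 5.2 Load at the start of PDF output

Add the following before any PDF text is drawn.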

if (notoFontRegistry == null) {
    notoFontRegistry = NotoFontRegistry.load(this);
}

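When loading outside an Activity, pass a Context instead of this.

If loading any of the three TTF files, parsing a cmap table, or verifying the replacement character ？ fails, an exception is thrown. Do not switch to a system font and continue.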
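If PDF export can be triggered from more than one thread, the null check above can run twice at once. One way to keep the load-once rule in that case is sketched below in plain Java. This is an assumption about the destination project, not something the steps above require. `LoadOnce` and `Loader` are hypothetical names, and the loader stands in for `NotoFontRegistry.load(context)`.

```java
import java.io.IOException;

/** Loads a value once and reuses it; a failed load is not cached. */
final class LoadOnce<T> {
    /** Hypothetical loader interface; stands in for NotoFontRegistry.load(context). */
    interface Loader<T> {
        T load() throws IOException;
    }

    private final Loader<T> loader;
    private T value;

    LoadOnce(Loader<T> loader) {
        this.loader = loader;
    }

    /** Synchronized so that two callers cannot run the loader at the same time. */
    synchronized T get() throws IOException {
        if (value == null) {
            // An IOException propagates to the caller and value stays null,
            // so there is no silent fallback and the next call retries.
            value = loader.load();
        }
        return value;
    }
}
```

In the Activity this would be used as `new LoadOnce<>(() -> NotoFontRegistry.load(context))`, with `get()` called at the start of each export.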

# 5.3 Add a width measurement method

To preserve right and center alignment, sum the widths run by run after the sanitizer has been applied.

private float measureSanitizedTextWidth(
        Paint paint,
        String text,
        NotoTextSanitizer.FontRole baseFontRole
) {
    if (TextUtils.isEmpty(text)) {
        return 0.0f;
    }

    float width = 0.0f;
    for (NotoTextSanitizer.Run run :
            notoFontRegistry.sanitizer().sanitize(text, baseFontRole)) {
        paint.setTypeface(notoFontRegistry.typefaceFor(run.fontRole));
        width += paint.measureText(run.text);
    }
    return width;
}

Required imports:

import android.graphics.Paint;
import android.text.TextUtils;

# 5.4 Add a drawing method

Apply the sanitizer immediately before passing a string to the PDF, and draw each run with its own Typeface.

private void drawSanitizedText(
        Canvas canvas,
        Paint paint,
        String text,
        float x,
        float y,
        NotoTextSanitizer.FontRole baseFontRole
) {
    if (TextUtils.isEmpty(text)) {
        return;
    }

    float currentX = x;
    for (NotoTextSanitizer.Run run :
            notoFontRegistry.sanitizer().sanitize(text, baseFontRole)) {
        Typeface typeface = notoFontRegistry.typefaceFor(run.fontRole);
        paint.setTypeface(typeface);
        canvas.drawText(run.text, currentX, y, paint);
        currentX += paint.measureText(run.text);
    }
}

Required imports:

import android.graphics.Canvas;
import android.graphics.Typeface;

# 5.5 Replace existing `Canvas.drawText()` calls

Regular body text:

drawSanitizedText(
        canvas,
        paint,
        text,
        x,
        baselineY,
        NotoTextSanitizer.FontRole.REGULAR
);

Bold heading:

drawSanitizedText(
        canvas,
        paint,
        title,
        x,
        baselineY,
        NotoTextSanitizer.FontRole.BOLD
);

Center alignment:

float textWidth = measureSanitizedTextWidth(
        paint,
        title,
        NotoTextSanitizer.FontRole.BOLD
);

drawSanitizedText(
        canvas,
        paint,
        title,
        centerX - textWidth / 2.0f,
        baselineY,
        NotoTextSanitizer.FontRole.BOLD
);

Right alignment:

float textWidth = measureSanitizedTextWidth(
        paint,
        amountText,
        NotoTextSanitizer.FontRole.REGULAR
);

drawSanitizedText(
        canvas,
        paint,
        amountText,
        rightX - textWidth,
        baselineY,
        NotoTextSanitizer.FontRole.REGULAR
);

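Set the base font immediately before computing the vertical position of a table cell. The measurement and drawing methods leave the Paint set to the Typeface of the last run, which may be the emoji font.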

paint.setTypeface(notoFontRegistry.typefaceFor(baseFontRole));
Paint.FontMetrics metrics = paint.getFontMetrics();

If the existing implementation clips text to the cell, keep its canvas.save(), canvas.clipRect(), and canvas.restore() calls.
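The alignment arithmetic above depends only on the sum of the run widths, so it can be checked without a Canvas. The following is a minimal sketch in plain Java. `RunAlignmentSketch` and its method names are hypothetical, and the widths stand in for the values that `paint.measureText()` would return for each run.

```java
/** Alignment arithmetic for text drawn as several runs; the widths are made-up inputs. */
final class RunAlignmentSketch {
    private RunAlignmentSketch() {
    }

    /** Sum of the run widths, as measureSanitizedTextWidth() would return. */
    static float totalWidth(float... runWidths) {
        float total = 0.0f;
        for (float width : runWidths) {
            total += width;
        }
        return total;
    }

    /** Left edge for text centered on centerX. */
    static float startXCentered(float centerX, float... runWidths) {
        return centerX - totalWidth(runWidths) / 2.0f;
    }

    /** Left edge for text whose right edge sits at rightX. */
    static float startXRightAligned(float rightX, float... runWidths) {
        return rightX - totalWidth(runWidths);
    }
}
```

With two runs of width 10 and 5.5, text centered on x = 100 starts at 92.25, and text right-aligned to x = 200 starts at 184.5.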

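# 6. Existing code to remove

Remove the following if they are used for PDF drawing.

- PDF drawing via EmojiCompat
- PDF drawing via EmojiSpan.draw()
- Fallback to Typeface.DEFAULT, Typeface.SANS_SERIF, and similar typefaces
- Code that passes uncovered characters to Canvas.drawText() unchanged
- Code that accepts characters based on Paint.hasGlyph() alone

If EmojiCompat is used for purposes other than PDF output, such as on-screen display, that code does not need to be removed.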

# 7. Android 6.0 support

NotoFontRegistry switches the grapheme cluster splitting method according to the OS version.

| OS | Splitting method |
| --- | --- |
| Android 7.0 or later (API 24+) | `android.icu.text.BreakIterator` |
| Android 6.0 (API 23) | `NotoTextSanitizer.splitConservatively()` |

splitWithIcu() requires the following annotation.

@RequiresApi(Build.VERSION_CODES.N)

android.icu.text.BreakIterator must not be called directly on any code path that runs on API 23.
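The boundary loop in splitWithIcu() can be tried off-device by using `java.text.BreakIterator` as a stand-in, since both classes expose the same first()/next()/DONE pattern. This is only a sketch of the loop. `ClusterLoopSketch` is a hypothetical name, and the JDK class and ICU may place emoji boundaries differently, so it does not predict on-device results for emoji sequences.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Off-device stand-in for the ICU loop, using java.text.BreakIterator. */
final class ClusterLoopSketch {
    private ClusterLoopSketch() {
    }

    static List<String> split(String text) {
        BreakIterator iterator = BreakIterator.getCharacterInstance(Locale.ROOT);
        iterator.setText(text);

        List<String> clusters = new ArrayList<>();
        int start = iterator.first();
        // Each [start, end) range between two boundaries is one user-perceived character.
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            clusters.add(text.substring(start, end));
        }
        return clusters;
    }
}
```

For example, `split("e\u0301a")` keeps the combining acute accent with its base letter and returns two clusters.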

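# 8. Sample for manual verification

Temporarily set the following string in any field that is output to the PDF. The Japanese labels in the string mean, in order: normal, not covered, Arabic, simple emoji, ZWJ, VS, flag, and key.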

String verificationText =
        "通常: ABC日本語 / 未収録: 𠀀 / アラビア語: العربية / "
                + "単純絵文字: 🙂 / ZWJ: 😶‍🌫️ / VS: ✈️ / 国旗: 🇯🇵 / キー: 1️⃣";

Expected display:

通常: ABC日本語 / 未収録: ？ / アラビア語: ？？？？？？？ /
単純絵文字: 🙂 / ZWJ: ？ / VS: ？ / 国旗: ？ / キー: ？

To verify in the remarks column of a table:

if (rowIndex == 0) {
    memo = "العربية";
}