Document scanner apps are one of the more popular tools that make smart use of mobile device features, like the built-in camera and touch-screen, to make scanning both convenient and practical for virtually anyone.

For example, Google Drive’s document scanning feature lets you photograph items such as receipts, letters, and billing statements and save them as PDFs on your Drive. But the resulting PDF contains only static images, with no searchable or selectable text.

In this post, we recreate this feature and take it a step further, by building an Android document scanner app using PDFTron’s OCR module. This makes text in your scanned documents searchable and selectable. And since we’re using PDFTron to view the resulting PDF file, we can also annotate and edit the document!

Sample code for this post can be found on GitHub, and you can try our sample by installing the APK here.

To keep things simple, the OCR portion uses Firebase Cloud Functions and Firebase Cloud Storage, and the client scanner app will be based on our fork of a third-party library, AndroidScannerDemo.

Server Setup

  1. Create a new Firebase Project.
  2. Set up Firebase Cloud Storage by selecting a host region as described here.
  3. Clone our GitHub repository. For this part of the guide, we will primarily be working in the server folder.
  4. Follow the steps in the README to set up Firebase Cloud Functions in your Firebase Project.

Client Setup

  1. Create a new Android project using Android Studio.
  2. Download the following AAR file and add the AAR as a new module dependency in your project.
  3. Integrate the PDFTron library via Gradle, as described here.
  4. Also, add the following dependencies to your app's build.gradle file.

    dependencies {
        // Firebase Storage required to upload our image for processing
        implementation 'com.google.firebase:firebase-storage:19.2.0'
    
        // OkHttp and RxJava are used to call our cloud function off the main thread
        implementation 'com.squareup.okhttp3:okhttp:4.9.0'
        implementation 'io.reactivex.rxjava3:rxandroid:3.0.0'
        implementation 'io.reactivex.rxjava3:rxjava:3.0.0'
    
        // Any other dependencies
        // ...
    }
  5. Next, as mentioned previously, the Android app will use our fork of a third-party scanner library, found here. We'll use this library to capture, crop, and filter images using the built-in camera.

    You can launch the scanner and handle the returned image by calling the following in your MainActivity. (Note: the uploadFile method will be implemented later in the guide.)

    // Add callback to handle returned image from scanner
    val scannerLauncher = registerForActivityResult(ScannerContract()) { uri ->
        if (uri != null) {
            // Obtain the bitmap, then delete the scanner's temporary image
            val bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)
            contentResolver.delete(uri, null, null)
    
            // Save bitmap to local cache as image then upload for processing
            val localJpeg = Utils.saveBitmapAsJpeg(bitmap)
    
            // Process image on server
            uploadFile(localJpeg)
        }
    }
    
    ...
    
    // Launch the scanner activity
    scannerLauncher.launch(ScanConstants.OPEN_CAMERA)
  6. In your MainActivity, add the following fields. Replace the placeholder strings with your Firebase Storage bucket and Cloud Function URL. These will be referenced later when processing the image.

    private val bucket = "FIREBASE_STORAGE_BUCKET"
    private val cloudFunctionUrl: String = "CLOUD_FUNCTION_URL"
    
    private val storage: FirebaseStorage = FirebaseStorage.getInstance(bucket)
  7. Now let's add code that will help us with the OCR portion, which creates searchable and selectable text from static images. There are three steps to this process: uploading the image to Firebase Cloud Storage, running our Firebase Cloud Function, and downloading the processed PDF document from Cloud Storage.

    In your MainActivity, add the following methods:

    private fun uploadFile(localFile: File) {
        val reference = storage.reference
        val fileName = localFile.name
        val fileReference = reference.child(fileName)
        val uploadTask = fileReference.putFile(Uri.fromFile(localFile))
        // Register listeners for when the upload completes or fails
        uploadTask.addOnSuccessListener {
            // If successful, we run our cloud function with the given file
            runCloudFunction(fileName)
        }.addOnFailureListener { e ->
            Log.e("MainActivity", "Upload failed", e)
        }
    }
    
    private fun runCloudFunction(fileName: String) {
        // Call the cloud function over HTTP with OkHttp, off the main thread via RxJava
        Single.create<String> { emitter ->
            // Create HTTP request to trigger cloud function
            val url = cloudFunctionUrl.toHttpUrlOrNull()!!.newBuilder()
                .addQueryParameter("file", fileName)
                .build()
            val request = Request.Builder().url(url).build()
            val client = OkHttpClient.Builder()
                .readTimeout(60, TimeUnit.SECONDS)
                .writeTimeout(60, TimeUnit.SECONDS)
                .callTimeout(60, TimeUnit.SECONDS)
                .build()
            // Close the response when done so the connection is not leaked
            client.newCall(request).execute().use { response ->
                if (response.isSuccessful) {
                    emitter.onSuccess(response.body!!.string())
                } else {
                    emitter.onError(IOException(response.message))
                }
            }
        }
            .subscribeOn(Schedulers.io())
            .observeOn(AndroidSchedulers.mainThread())
            .subscribe({ result ->
                // If processing is successful, download processed file from Firebase Storage
                downloadStorageFile(result.replace("\"", "")) // strip quotes from the returned file name
            }, { error ->
                // Without an error handler, a failed request would crash the app
                Log.e("MainActivity", "OCR processing failed", error)
            })
    }
    
    private fun downloadStorageFile(fileName: String) {
        val reference = storage.reference
        val fileReference = reference.child(fileName)
        val localFile = File(cacheDir, fileName)
    
        fileReference.getFile(localFile).addOnSuccessListener {
            // Open processed document in PDF viewer
            val config = ViewerConfig.Builder().openUrlCachePath(cacheDir.absolutePath).build()
            DocumentActivity.openDocument(this@MainActivity, Uri.fromFile(localFile), config)
        }
    }
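
The `runCloudFunction` step above strips double quotes from the response body before using it as a Storage path. A slightly more defensive version is sketched below in plain Kotlin. It assumes the cloud function returns the processed file's name as a bare or JSON-quoted string (for example `"scan.pdf"`) and that the name ends in `.pdf`. The helper name `parseProcessedFileName` is hypothetical and not part of the sample project.

```kotlin
/**
 * Hypothetical helper (not part of the sample project): extracts the file name
 * from the cloud function's response body, assuming the body is a bare or
 * JSON-quoted string such as "scan.pdf".
 * Returns null when the body does not look like a PDF file name.
 */
fun parseProcessedFileName(responseBody: String): String? {
    // Drop surrounding whitespace, then one pair of surrounding double quotes
    val name = responseBody.trim().removeSurrounding("\"")
    // Assumption: the processed document is always a PDF
    return name.takeIf { it.isNotEmpty() && it.endsWith(".pdf", ignoreCase = true) }
}

fun main() {
    println(parseProcessedFileName("\"scan.pdf\"")) // quoted name
    println(parseProcessedFileName("  scan.pdf\n")) // bare name with whitespace
    println(parseProcessedFileName("\"\""))         // empty name is rejected
}
```

Returning `null` instead of passing an unexpected body straight to `downloadStorageFile` lets the caller log the problem rather than request a Storage object that does not exist.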

Now you can capture a physical document, upload it to Firebase for OCR processing, and open the text-searchable and -selectable PDF document in the PDFTron viewer.
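
For debugging, it can help to trigger the cloud function outside the app. The request that `runCloudFunction` sends is a plain GET with the uploaded file's name in a `file` query parameter. The sketch below builds an equivalent URL using only the JDK. The helper name `buildFunctionUrl` and the `example.com` address are placeholders, and it assumes the base URL has no query string of its own.

```kotlin
import java.net.URLEncoder

/**
 * Hypothetical helper: builds the GET URL that triggers the cloud function,
 * passing the uploaded file's name in the "file" query parameter.
 * Assumes baseUrl does not already contain a query string.
 */
fun buildFunctionUrl(baseUrl: String, fileName: String): String {
    // URLEncoder targets form encoding and writes spaces as "+";
    // convert them to "%20" so the value is unambiguous in a URL query.
    val encoded = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20")
    return "$baseUrl?file=$encoded"
}

fun main() {
    // Placeholder URL; substitute your own CLOUD_FUNCTION_URL
    println(buildFunctionUrl("https://example.com/ocr", "scan 1.jpg"))
}
```

Pasting the printed URL into a browser or an HTTP client should return the processed file's name, provided the image has already been uploaded to your Storage bucket.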

Figure: PDFTron Scanner with OCR

More Features, Next Steps

By following the steps above, you’ve created an Android scanner app for your invoices, bills, letters, and other paper statements. And by using the PDFTron document viewer, you can then mark up those scanned documents by adding annotations, signatures, stamps, and much more! You can also drop in other PDFTron SDK capabilities, such as redaction and page manipulation, to edit your scanned documents.

Download our free trial and explore our guides & documentation for our Android PDF library to see the possibilities for yourself.

And if you have any questions, please feel free to get in touch!

You can find the source code for this blog post on GitHub.