SumatraPDF 2.5.2 released

from Krzysztof Kowalczyk blog in programming

We, the SumatraPDF developers, have released version 2.5.2 of Sumatra, a PDF and ebook reader for Windows.

Changes in this release:

- 2-page view for ebooks
- new keybindings:
  - Ctrl+PgDn, Ctrl+Right: go to next page
  - Ctrl+PgUp, Ctrl+Left: go to previous page
- 10x faster ebook layout
- support for JP2 images
- new advanced settings: ShowMenuBar, ReloadModifiedDocuments, CustomScreenDPI
- left/right clicking no longer changes pages in fullscreen mode (use Presentation mode if you rely on this feature)
- fixed multiple crashes
- continuous improvements to PDF rendering

You can download Sumatra from www.sumatrapdfreader.org

over a year ago

More from Krzysztof Kowalczyk blog

Desktop UI frameworks written by a single person

Less known desktop UI frameworks

Writing desktop software is hard. The UI technologies of Windows or macOS are awful compared to web technology. What can trivially be done with HTML/CSS/JavaScript in a few minutes can take hours using Windows’s win32 APIs or Mac’s Cocoa.

That’s why the default technology for desktop apps, especially cross-platform ones, is Electron: a Chrome browser combined with the Node runtime. The problem is that it’s bloated: each app is a unique build of Chrome with a little bit of application code. Chrome is over 100MB, so many apps ship less than 1MB of code in a 100MB wrapper.

People tried to address the problem of poor OS APIs by writing UI frameworks, often meant to be cross-platform. You’ve heard about Qt, GTK, wxWidgets. The problem with those is that they are also old, their APIs are not the greatest either, and they are bloated as well.

There just doesn’t seem to be a good option. Writing your own framework seems impossible due to the size of the task. But is it? I’ll show a couple of less-known UI frameworks written mostly by a single person, often simply to enable writing an application.

SWELL in WDL

WDL is interesting. Justin Frankel, the guy who created Winamp, has a repository of C++ code he uses in different projects. After selling Winamp to AOL, a side quest of writing a file-sharing application, and getting fired from AOL for writing that file-sharing application, he started a company building Reaper, digital audio workstation software for Windows.

Winamp is a win32 API program and so is Reaper. At some point Justin decided to make a Mac version, but by then he had a lot of code heavily using win32 APIs. So he did what anyone in his position would: he implemented win32 APIs for macOS and Linux and called it SWELL - Simple Windows Emulation Layer. Ok, actually no-one else would do it. It was an insane idea but it worked.

It’s important to not overstate SWELL’s capabilities. It’s not Wine. You can’t take any win32 program and recompile it for Mac with SWELL. Frankel is insanely pragmatic and so is his code. SWELL only implements the subset of APIs he uses in Reaper. At the same time, Reaper is a big app, so if SWELL works for Reaper, it could work for your app.

WDL is open-source, using the permissive MIT license.

Sublime Text

For a few years Sublime Text was THE programmer’s editor. It was written by a single developer in C++ and he wrote a custom UI toolkit for it. It’s not open source, but its existence shows it can be done.

RAD Debugger

RAD Debugger is an open-source Windows debugger for C/C++ apps, written in C mostly by a single person. It implements a custom UI framework based on a 3D renderer. The UI is an integral part of the app, but the code is well structured, so you can probably take just their UI / render code and use it in your own C / C++ app.

Currently the app / UI is only for Windows, but it’s designed to be cross-platform and they are working on porting the renderer to macOS / Linux. They use the permissive MIT license and everything is written in C.

Dear ImGui

Dear ImGui is a newer cross-platform UI framework in C++. Open source, permissive MIT license. Written mostly by a single person.

Ghostty

Ghostty is a cross-platform terminal emulator. It’s written in Zig mostly by a single person and uses its own low-level GPU renderer for the UI.

You too can write your own UI framework

At first the idea of writing your own UI framework seems impossibly daunting. What I’m hoping to show is that if you’re ambitious enough, it’s possible to build cross-platform desktop apps that are not just bloated 100MB Chrome wrappers around a few kilobytes of custom code.

I’m not saying it’s a simple thing, just that enough people did it that it’s possible.

It shouldn’t be necessary, but both Microsoft and Apple have tragically dropped the ball on providing decent, high-performance UI libraries for their OS. Microsoft even writes their own apps, like Teams, in web technologies.

Thanks to open source you’re not at the starting line. You can just use Dear ImGui or WDL’s SWELL. Or you can extract the UI code from RAD Debugger or Ghostty (if you write in Zig). Or you can look at their implementations to speed up your own design and implementation.

yesterday

Evolving Edna Ask AI UI

This is a real-life example of tweaking UI in Edna, my note-taking application with super powers. Ask AI is a simple AI chat: you write a question, send it to an LLM and get a response.

Here’s my first version of the UI:

What is good and bad about this version?

Good: there’s a learn more link. This is not an obvious feature because you need to pick a model and provide an API key. Easy access to an explanation is good.

Bad: it’s not clear enough that you need to provide an API key. To make things even more confusing, for models from OpenAI or xAI you can use either an OpenAI/xAI key or an OpenRouter key. Hence the Use OpenRouter checkbox.

I decided to make things more explicit:

I think it’s clearer, especially for someone new to API keys. The checkbox is gone. Now I just ask for one or the other and use whichever is given.

The learn more link goes to a section of the docs explaining how to get an API key and answering basic questions someone new might have. Visually I think it would look better if the link was on the right, but then it would be easier to miss. Functionality over aesthetics.

Another subtle touch: an explicit Enter OpenAI API key placeholder text in the input field.

yesterday

Implementing UI translation in SumatraPDF, a C++ Windows application

Translating user interface of SumatraPDF

SumatraPDF is the best PDF/eBook/Comic Book viewer for Windows. It’s small, fast, full of features, free and open-source.

It became popular enough that it made sense to translate the UI for non-English users. Currently we support 72 languages.

This article describes how I designed and implemented a translation system in SumatraPDF, a native win32 C++ Windows application.

Hard things about translating the UI

There are 2 hard things about translating an application:

- writing code for the translation system (extracting strings to translate, translating strings from English to the user’s language)
- translating them into many languages

Extracting strings to translate from source code

Currently there are 381 strings in SumatraPDF subject to translation. It’s important that the system requires the least amount of effort when adding new strings to translate.

Every string that needs to be translated is marked in a .cpp or .h file with one of two macros:

    _TRA("Rename")
    _TRN("Open")

I have a script that extracts those strings from source files. Mine is written in Go, but it could just as well be Python or JavaScript. It’s a simple regex job.

_TR stands for “translation”. _TRA(s) expands into a call to the const char* trans::GetTranslation(const char* str) function, which returns str translated to the current UI language. We auto-detect the language at startup based on Windows settings and allow the user to explicitly set the UI language. For English we just return the original string.

If a string to be translated is e.g. a part of a const char* array[], we can’t use trans::GetTranslation(). For cases like that we have _TRN(), which expands to the English string. We then have to write code that translates it later, at the point where it is used.

Adding new strings is therefore as simple as wrapping them in _TRA() or _TRN() macros.

Translating strings into many languages

Now that we’ve extracted strings to be translated, we need to translate them into 72 languages.

SumatraPDF is a free, open-source program.
I don’t have a budget to hire translators. I don’t have a budget, period. The only option was to get help from SumatraPDF users.

It was vital to make it very easy for users to send me translations. I didn’t want to ask them, for example, to download some translation software.

Design and implementation of AppTranslator web app

I couldn’t find really simple software for crowdsourcing translations, so I wrote my own: https://github.com/kjk/apptranslator

You can see it in action: https://www.apptranslator.org/app/SumatraPDF

I designed it to be generic, but I don’t think anyone else is using it.

AppTranslator is simple. Per https://tools.arslexis.io/wc/:

- 4k lines of Go server code
- 451 lines of HTML code
- a single dependency: the Bootstrap CSS framework (the project is old)

It’s simple because I don’t want to spend a lot of time writing translation software. It’s just a side project in service of the goal of translating SumatraPDF.

Login is exclusively via GitHub.

It doesn’t even use a database. Like in Redis, changes are stored as a series of operations in an append-only log. We keep the whole state in memory and re-create it from the log at startup.

The main operation is translating a string from English to language X, represented as [kOpTranslation, english string, language, translation, user who provided translation].

When a user provides a translation in the web UI, we send an API call to the server, which appends the translation operation to the log. Simple and reliable.

Because the code is written in Go, it’s very fast and memory efficient. When running, it uses mere megabytes of RAM. It can comfortably run on the smallest 256 MB VPS server.

I back up the log to S3, so if the server ever fails, I can re-install the program on a new server and re-download the translations from S3.

I provide an RSS feed for each language so that people who provide translations can monitor for new strings to be translated.
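The append-only log described above can be sketched in C++ (the real AppTranslator server is written in Go; C++ is used here only to match SumatraPDF). This is a minimal sketch under stated assumptions: the only operation is kOpTranslation, the in-memory state is a map keyed by (English string, language), and a later translation of the same string replaces an earlier one. The names TranslationLog, OpKind and State are hypothetical, and writing the log to disk and backing it up to S3 are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operation kinds; only kOpTranslation is named in the post.
enum class OpKind { kOpTranslation };

// One log entry: [kOpTranslation, english string, language, translation, user]
struct Op {
    OpKind kind;
    std::string english;
    std::string lang;
    std::string translation;
    std::string user;
};

// In-memory state: (english, lang) -> latest translation.
using State = std::map<std::pair<std::string, std::string>, std::string>;

class TranslationLog {
  public:
    // Operations are only ever appended; existing entries are never modified.
    void Append(const Op& op) {
        log_.push_back(op);
        Apply(state_, op);
    }

    // Rebuild the whole state by replaying every operation in order,
    // as a server would do at startup after reading the log from disk.
    static State Replay(const std::vector<Op>& log) {
        State state;
        for (const Op& op : log) Apply(state, op);
        return state;
    }

    const std::vector<Op>& Ops() const { return log_; }
    const State& Current() const { return state_; }

  private:
    static void Apply(State& state, const Op& op) {
        if (op.kind == OpKind::kOpTranslation) {
            // Assumption: a later translation of the same string wins.
            state[{op.english, op.lang}] = op.translation;
        }
    }

    std::vector<Op> log_;
    State state_;
};
```

Because the log is the single source of truth, replaying it always reproduces the state that was built incrementally; history (who translated what, and when) is kept for free.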
Sending strings for translation and receiving translations

So I have a web app for collecting translations and a script that extracts strings to be translated from source code. How do they connect?

AppTranslator has an API for submitting the current set of strings to be translated in the simplest possible format: a line for each string (I ensure there are no newlines in the string itself by escaping them as \n).

The API is password-protected because only I can submit the strings.

The server compares the strings sent with the current set and records the difference in the log.

It also sends a response with translations. Again, the simplest possible format:

    AppTranslator: SumatraPDF 651b739d7fa110911f25563c933f42b1d37590f8
    :%s annotation. Ctrl+click to edit.
    am:%s մեկնաբանություն: Ctrl+քլիք՝ խմբագրելու համար:
    ar:ملاحظة %s. اضغط Ctrl للتحرير.
    az:Qeyd %s. Düzəliş etmək üçün Ctrl+düyməyə basın.

As you can see, a string to translate is on a line starting with :, followed by translations of that string in the format ${lang}:${translation}.

An optimization: 651b739d7fa110911f25563c933f42b1d37590f8 is a hash of this response. If I submit this hash with my request and translations didn’t change on the server, the response is empty.

Implementing C++ part of translation system

So now I have a text file with translations downloaded from the server. How do I get a translation in my C++ code?

As with everything in SumatraPDF, I try to do things in a simple and efficient way. The whole Translation.cpp is only 239 lines of code.

The core of the translation system is the const char* trans::GetTranslation(const char* s); function.

I embed the translations, in exactly the same format as received from AppTranslator, in the executable as a data file in resources.

If the UI language is English, we do nothing: trans::GetTranslation() returns its argument.
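A minimal C++ sketch of the lookup side, assuming the AppTranslator response format shown above: it builds an index of two parallel arrays (English strings and their translations) for one language and answers GetTranslation() with a linear scan. This is not the actual Translation.cpp: std::vector<std::string> stands in for SumatraPDF's StrVec, the language code "en" meaning English is an assumption, and BuildIndex, SetLanguage and Unescape are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

namespace trans {

// Two parallel arrays: translations[i] is the translation of english[i].
struct Index {
    std::vector<std::string> english;
    std::vector<std::string> translations;
};

// Index for the current UI language; stays empty for English.
static Index gIndex;

// Undo the "\n" escaping that keeps every string on a single line.
static std::string Unescape(std::string_view s) {
    std::string out;
    for (size_t i = 0; i < s.size(); i++) {
        if (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == 'n') {
            out += '\n';
            i++;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Parse an AppTranslator response, keeping only translations for `lang`.
static Index BuildIndex(std::string_view data, std::string_view lang) {
    Index idx;
    std::string current;  // English string of the current group
    bool haveCurrent = false;
    size_t pos = 0;
    while (pos < data.size()) {
        size_t end = data.find('\n', pos);
        if (end == std::string_view::npos) end = data.size();
        std::string_view line = data.substr(pos, end - pos);
        pos = end + 1;
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
        // Skip empty lines and the "AppTranslator: <app> <hash>" header.
        if (line.empty() || line.rfind("AppTranslator:", 0) == 0) continue;
        if (line[0] == ':') {  // ":<english string>"
            current = Unescape(line.substr(1));
            haveCurrent = true;
            continue;
        }
        // "<lang>:<translation>": split on the first ':' only, because
        // translations may themselves contain ':'.
        size_t colon = line.find(':');
        if (!haveCurrent || colon == std::string_view::npos) continue;
        if (line.substr(0, colon) == lang) {
            idx.english.push_back(current);
            idx.translations.push_back(Unescape(line.substr(colon + 1)));
        }
    }
    return idx;
}

void SetLanguage(std::string_view data, std::string_view lang) {
    gIndex = (lang == "en") ? Index{} : BuildIndex(data, lang);
}

// Linear scan over English strings. Returns the argument unchanged for
// English or a missing translation. Returned pointers stay valid until
// the next SetLanguage() call.
const char* GetTranslation(const char* s) {
    for (size_t i = 0; i < gIndex.english.size(); i++) {
        if (gIndex.english[i] == s) return gIndex.translations[i].c_str();
    }
    return s;
}

}  // namespace trans
```

Falling back to the English argument means a string added since the last translation round still shows up, just untranslated.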
When we switch the language, we load the translations from resources and build an index:

- an array of English strings
- an array of corresponding translations

Both arrays use my own StrVec class, optimized for storing an array of strings.

To find a translation, we scan the first array to find the index of the string and return the translation from the second array, at the same index. A linear scan seems like it would be slow, but it isn’t.

Resizing dialogs

I have a few dialogs defined in the SumatraPDF.rc file. The problem with dialogs is that the position of UI elements is fixed. A translated string will almost certainly have a different size than the English string, which will mess up the fixed layout.

Thankfully someone wrote DialogSizer, which smartly resizes dialogs and solves this problem.

The evolution of a solution

No AppTranslator

My initial implementation was simpler. I didn’t yet have AppTranslator, so I stored the strings in a text file in the repository, in the same format as described above. People would download it, make changes using a text editor and send me the file via email, which I would then check in.

It worked for a while, but it became worse over time. More strings and more languages created more work for me to manually manage e-mail submissions. I decided to automate the process.

Code generation

My first implementation of the C++ side used code generation instead of embedding the text file in resources. My Go script would generate C++ source files with static const char* [] arrays.

This worked well, but I decided to improve it further by making the code use the text file with translations embedded in the app.

The main motivation for the change was to open the possibility of downloading the latest translations from the server, to fix the problem of translations not all being ready when I build the release executable.
I haven’t done that yet, but it’s now easier to implement, given that the format of strings embedded in the exe is the same as the one I can download from AppTranslator.

Only UTF-8

SumatraPDF started by using both WCHAR* Unicode strings and char* UTF-8 strings. For that reason the translation system had to support returning translations in both WCHAR* and char* versions.

Over time I refactored the code to use mostly UTF-8, and at some point I no longer needed to support the WCHAR* version. That made the code even smaller and reduced memory usage.

The experience

I’m happy with how things turned out.

AppTranslator proved to be reliable and hassle-free. It has been running for many years now and has collected 35440 string translations from users.

I automated everything, so all I need to do is periodically re-run the script that extracts strings from source code, uploads them to AppTranslator and downloads the latest translations.

One problem is that translations are not always ready in time for a release, so I make a release and then people start translating strings added since the last release. I’ve considered downloading the latest translations from the server, in addition to embedding them in the executable at build time.

Would I do the same today?

While AppTranslator is reliable and doesn’t require ongoing work, it would be better to not have to run a server at all.

The world has changed since I started SumatraPDF. Namely: people are comfortable using GitHub, and you can edit files directly in the GitHub UI. It’s not a great experience, but it works.

One option would be to generate a translation text file for each language, in this format:

    :first untranslated string
    :second untranslated string
    :first translated string
    translation of first string
    :second translated string
    translation of second string

Untranslated strings are listed at the top, to make them easier to find.

A link would send a translator directly to edit this file in the GitHub UI.
When the translator saves the translations, it creates a PR for me to review and merge.

The roads not taken

But why did you re-invent everything? You should do X instead.

All other X that I know about suck.

Using per-language .rc resource files

The traditional way of localizing / translating Windows GUI apps is to store all strings and dialog definitions in an .rc file.

Each language gets its own .rc file (or files) and the program picks the right resource based on the language.

This doesn’t solve the 2 hard problems:

- having an easy way to add strings for translation
- having an easy way for users to provide translations

XML horror show

There was a dark time when the world was under the iron grip of XML fanaticism. Everything had to be an XML file even when it was the worst possible solution for the problem.

XML doesn’t solve the 2 hard problems, and as a string storage format it’s an absolute nightmare for human editing.

GNU gettext

There’s a C library, gettext, that uses .po files. This is a much saner solution than the XML horror show. .po files are a relatively simple text format. The code is already written.

Warning: tooting my own horn. My format is better. It’s easier for people to edit, and it’s easier to write code to parse it.

The gettext code looks like many times more than 239 lines of code. Ok, gettext probably does a bit more than my code, but clearly nothing that I need.

It also doesn’t solve the 2 hard problems. I would still have to write code to extract strings from source code and build a way to allow users to translate them easily.
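The extraction step that the post calls “a simple regex job” might look like the C++ sketch below. The author’s actual script is written in Go, and the regex here is an assumption: it accepts a single "..." literal inside _TRA() or _TRN(), allows escaped quotes, and keeps escapes such as \n as written, which fits the one-line-per-string upload format. It does not handle literals split across lines, adjacent-literal concatenation, or calls inside comments. ExtractTranslatable is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Collect the string literals wrapped in _TRA("...") or _TRN("...").
// The set removes duplicates: the same string may be marked in many places.
std::set<std::string> ExtractTranslatable(const std::string& source) {
    // _TR, then A or N, then ( "literal" ). The literal body is any run of
    // characters other than '"' and '\', or a backslash escape like \" or \n.
    static const std::regex re(R"re(_TR[AN]\(\s*"((?:[^"\\]|\\.)*)"\s*\))re");
    std::set<std::string> found;
    for (auto it = std::sregex_iterator(source.begin(), source.end(), re);
         it != std::sregex_iterator(); ++it) {
        found.insert((*it)[1].str());  // capture group 1: literal without quotes
    }
    return found;
}
```

Running this over every .cpp and .h file and writing one string per line would produce the kind of list that gets submitted to AppTranslator.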

2 days ago

Calling Grok, OpenAI, Anthropic, Google, OpenRouter API from the browser

Here’s what I learned about calling LLM APIs from the browser when building AI chat functionality in my note-taking app Edna.

The API I care about is getting an LLM response to a question in a streaming way.

OpenAI pioneered this and created the https://api.openai.com/v1/chat/completions POST API.

Others created a compatible API for their LLMs to make it easy for programmers to migrate. xAI has https://api.x.ai/v1/chat/completions for Grok and OpenRouter has https://openrouter.ai/api/v1/chat/completions.

Google and Anthropic have similar APIs, but they use CORS to disallow calling them from the browser. Baffling restriction. For now I decided not to support them directly. I could route the requests via the server, but I can use OpenRouter instead.

I’ve seen TypingMind call the Google API from the browser, but using a different API endpoint. Again, for now I decided not to support Google directly.

OpenRouter is an interesting service and business. They provide a unified API for lots of different models, so I can use Google or Anthropic models via OpenRouter, along with lots of other models.

They charge 5% on top of what they pay the providers, which is reasonable if you consider that they probably pay ~3% in credit card processing fees.

For now I support OpenAI and Grok directly and everyone else via OpenRouter.
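A sketch of the endpoint choice described above, in C++ to match the rest of this page (Edna itself runs in the browser): use OpenAI or xAI directly when the user supplied that provider’s key, otherwise fall back to OpenRouter. The Model struct with separate direct and OpenRouter model ids, and all function names, are hypothetical. The request body uses the model, messages and stream fields of the OpenAI-compatible chat completions API; the HTTP call itself and parsing of the streamed response are left out.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Chat completions endpoints mentioned in the post.
constexpr const char* kOpenAIURL = "https://api.openai.com/v1/chat/completions";
constexpr const char* kXaiURL = "https://api.x.ai/v1/chat/completions";
constexpr const char* kOpenRouterURL = "https://openrouter.ai/api/v1/chat/completions";

enum class Provider { OpenAI, XAI, Other };

// Hypothetical model description: ids differ between direct and OpenRouter APIs.
struct Model {
    Provider provider;
    std::string directId;      // empty if the provider can't be called directly
    std::string openRouterId;  // empty if not available via OpenRouter
};

struct Request {
    std::string url;
    std::string authHeader;  // "Authorization: Bearer <key>"
    std::string body;        // JSON
};

// Minimal JSON string escaping for user-provided text.
std::string JsonEscape(const std::string& s) {
    static const char* hex = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"': out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // other control characters as \u00XX
                    out += "\\u00";
                    out += hex[c >> 4];
                    out += hex[c & 0xF];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Prefer the provider's own endpoint when its key is given; otherwise use OpenRouter.
std::optional<Request> BuildChatRequest(const Model& m, const std::string& question,
                                        const std::string& providerKey,
                                        const std::string& openRouterKey) {
    Request r;
    std::string modelId;
    bool direct = !providerKey.empty() && !m.directId.empty() &&
                  (m.provider == Provider::OpenAI || m.provider == Provider::XAI);
    if (direct) {
        r.url = m.provider == Provider::OpenAI ? kOpenAIURL : kXaiURL;
        r.authHeader = "Authorization: Bearer " + providerKey;
        modelId = m.directId;
    } else if (!openRouterKey.empty() && !m.openRouterId.empty()) {
        r.url = kOpenRouterURL;
        r.authHeader = "Authorization: Bearer " + openRouterKey;
        modelId = m.openRouterId;
    } else {
        return std::nullopt;  // no usable key: the UI should ask for one
    }
    r.body = "{\"model\":\"" + JsonEscape(modelId) +
             "\",\"messages\":[{\"role\":\"user\",\"content\":\"" + JsonEscape(question) +
             "\"}],\"stream\":true}";
    return r;
}
```

Since all three endpoints accept the same request shape, only the URL, key and model id change; that is what lets a single code path cover “OpenAI and Grok directly, everyone else via OpenRouter”.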

2 days ago

Case study of over-engineered C++ code

You’ve heard of over-engineered, unnecessarily complex code, but what exactly is it? I believe it’s best to show by example.

While it’s not my intention to criticize other people’s code, I think it’s better to show code and how to improve it rather than vaguely talk about principles.

The code that I consider over-engineered:

- xbinary.h
- xbmp.h
- xbmp.cpp

The problem it’s trying to solve

This code detects file format from file content. Are the bytes an audio wav file? A jpeg image?

In addition, it extracts some properties of the file format, e.g. the dimensions of an image file.

Over-engineered implementation

If you’re a C++ programmer it’s natural to use object-oriented design. You create an XBinary base class with some virtual methods and then write a class for each format that inherits from XBinary.

Here’s just a small part:

    class XBinary {
        virtual FT getFileType();
        virtual ENDIAN getEndian();
        // ... much more stuff
    };

And implementation for detecting BMP images:

    class XBMP : public XBinary {
        // ...
    };

    XBinary::FT XBMP::getFileType() {
        return FT_BMP;
    }

    XBinary::MODE XBMP::getMode() {
        return XBinary::MODE_DATA;
    }

Over-engineering in the small: unnecessary virtual functions

This code is over-engineered in the small. Notice that file type or mode are static per file format. They most likely do not depend on the data.

We can easily eliminate virtual calls by storing this info as class members.

    class XBinary {
      protected:  // protected, so derived classes can set them
        FT _fileType;
        MODE _mode;

      public:
        FT getFileType() { return _fileType; }
        MODE getMode() { return _mode; }
    };

    class XBMP : public XBinary {
      public:
        XBMP() {
            _fileType = FT_BMP;
            _mode = XBinary::MODE_DATA;
        }
    };

We replaced multiple virtual overrides with a single non-virtual method in the base class. That saves both code size and execution time. Parsing and compiling more code also requires the compiler to do more work, which takes more time.

Removing getter methods

We don’t need a method that just returns the value of a variable.
A function is necessary if it does some computation on data. That doesn’t happen here, so it’s just a bad habit of mindlessly adding unnecessary code.

We can simplify by making class members public:

    class XBinary {
      public:
        FT fileType;
        MODE mode;
    };

Over-engineering in the big: design

XBinary defines 9 virtual functions, each to get a small bit of information about the format:

    class XBMP {
        virtual FT getFileType();
        virtual QString getMIMEString();
        virtual QString getArch();
        virtual MODE getMode();
        virtual ENDIAN getEndian();
        virtual QString getFileFormatExt();
        virtual QString getFileFormatExtsString();
        virtual _MEMORY_MAP getMemoryMap(MAPMODE mapMode = MAPMODE_UNKNOWN, PDSTRUCT *pPdStruct = nullptr);
        virtual QString getVersion();
    };

An observation: they are all cheap to compute. Typically reading a few bytes of memory.

Instead of providing a function for each piece of information, we could have one function that returns everything.

    struct FormatInfo {
        FT fileType;
        QString mimeString;
        // ... the other 7 values
    };

    class XBMP {
        virtual FormatInfo getFormatInfo();
    };

We save lots of code by consolidating 9 different functions into one.

We also simplified the API, and I believe someone new to the library would figure out how to use a smaller interface faster.

Each format also provides per-format data. XBMP has:

    class XBMP {
        BMPFILEHEADER getFileHeader();
        BMPINFOHEADER getInfoHeader();
    };

Again, we could combine this into one function:

    struct XbmpInfo {
        BMPFILEHEADER fileHeader;
        BMPINFOHEADER infoHeader;
    };

    class XBMP {
        XbmpInfo getXbmpInfo();
        virtual FormatInfo getFormatInfo();
    };

But wait, we can combine this into a single function:

    struct XbmpInfo {
        FormatInfo formatInfo;
        BMPFILEHEADER fileHeader;
        BMPINFOHEADER infoHeader;
    };

    class XBMP {
        XbmpInfo getXbmpInfo();
    };

In fairness, we’ve lost the ability to use the API in a certain way. Before, if code was only interested in common properties in FormatInfo, it could use the same code for every class representing a format.
Now it has to know it’s calling e.g. XBMP and get FormatInfo out of XbmpInfo.

We don’t need classes at all

The best part is no part.

If we had to write this code in C, how would we do it?

We could implement each format detector as a single function.

    struct XbmpInfo {
        bool isValid;
        FormatInfo formatInfo;
        BMPFILEHEADER fileHeader;
        BMPINFOHEADER infoHeader;
    };

    XbmpInfo maybeDetectXbmp(const char* data, size_t dataSize);

We could remove all the classes and all their methods. It’s a massive saving in code. You save binary size, you make the code faster, but most importantly you save yourself time because this is code you don’t have to write.

You can focus on writing the logic for parsing file formats, not on the incidental complexity of typing up class names and methods.

Why do people over-complicate?

Because simplicity is hard.

Because when you have a hammer, everything looks like a nail. After you’ve read a book on C++ teaching you about inheritance, virtual functions etc., it’s only natural to start modeling all problems with class inheritance and virtual functions.

To then come up with a solution that reduces all that to a function requires thinking outside the C++ box.

Thinking outside the box is hard.
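To make the “one function per format” idea concrete, here is a minimal, runnable C++ sketch of maybeDetectXbmp. It is not the library’s code: std::string stands in for QString, and FT, FormatInfo, BmpFileHeader and BmpInfoHeader are simplified stand-ins that hold only the fields this sketch reads. Only BMP files whose info header is 40 bytes or larger (BITMAPINFOHEADER and its successors) are recognized. The offsets follow the standard little-endian BMP layout: a 14-byte file header starting with "BM", followed by the info header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Simplified stand-ins for the library's types.
enum class FT { Unknown, BMP };

struct FormatInfo {
    FT fileType = FT::Unknown;
    std::string mimeString;
    std::string fileFormatExt;
};

// Only the header fields this sketch reads.
struct BmpFileHeader { uint32_t fileSize; uint32_t pixelDataOffset; };
struct BmpInfoHeader { uint32_t headerSize; int32_t width; int32_t height; uint16_t bitsPerPixel; };

struct XbmpInfo {
    bool isValid = false;
    FormatInfo formatInfo;
    BmpFileHeader fileHeader{};
    BmpInfoHeader infoHeader{};
};

static uint32_t ReadU32LE(const unsigned char* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}

static uint16_t ReadU16LE(const unsigned char* p) {
    return uint16_t(p[0] | p[1] << 8);
}

// One plain function per format: no base class, no virtual methods.
XbmpInfo maybeDetectXbmp(const char* data, size_t dataSize) {
    XbmpInfo res;
    // 14-byte file header + at least a 40-byte info header.
    if (data == nullptr || dataSize < 14 + 40) return res;
    auto p = reinterpret_cast<const unsigned char*>(data);
    if (p[0] != 'B' || p[1] != 'M') return res;
    res.fileHeader.fileSize = ReadU32LE(p + 2);
    res.fileHeader.pixelDataOffset = ReadU32LE(p + 10);
    res.infoHeader.headerSize = ReadU32LE(p + 14);
    if (res.infoHeader.headerSize < 40) return res;  // older header: not handled here
    res.infoHeader.width = int32_t(ReadU32LE(p + 18));
    res.infoHeader.height = int32_t(ReadU32LE(p + 22));  // negative means top-down
    res.infoHeader.bitsPerPixel = ReadU16LE(p + 28);
    res.formatInfo = {FT::BMP, "image/bmp", "bmp"};
    res.isValid = true;
    return res;
}
```

Supporting another format means writing another function like this one; a caller that only cares about FormatInfo can try the detectors one after another and use the first result with isValid set.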

3 days ago

More in programming

Thoughts on Motivation and My 40-Year Career

I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption. I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. […]

10 hours ago

Single-Use Disposable Applications

As search gets worse and “working code” gets cheaper, apps get easier to make from scratch than to find.

15 hours ago

Logic for Programmers Turns One

I released Logic for Programmers exactly one year ago today. It feels weird to celebrate the anniversary of something that isn't 1.0 yet, but software projects have a proud tradition of celebrating a dozen anniversaries before 1.0. I wanted to share what's changed in the past year and the work for the next six+ months.

The Road to 0.1

I had been noodling on the idea of a logic book since the pandemic. The first time I wrote about it on the newsletter was in 2021! Then I said that it would be done by June and would be "under 50 pages". The idea was to cover logic as a "soft skill" that helped you think about things like requirements and stuff.

That version sucked. If you want to see how much it sucked, I put it up on Patreon. Then I slept on the next draft for three years.

Then in 2024 a lot of business fell through and I had a lot of free time, so with the help of Saul Pwanson I rewrote the book. This time I emphasized breadth over depth, trying to cover a lot more techniques.

I also decided to self-publish it instead of pitching it to a publisher. Not going the traditional route would mean I would be responsible for paying for editing, advertising, graphic design etc., but I hoped that would be compensated by much higher royalties. It also meant I could release the book in early access and use early sales to fund further improvements.

So I wrote up a draft in Sphinx, compiled it to LaTeX, and uploaded the PDF to Leanpub. That was in June 2024.

Since then I have kept to a monthly cadence of updates, missing once in November (short-notice contract) and once last month (Systems Distributed). The book's now on v0.10.

What's changed?

A LOT. v0.1 was very obviously an alpha, and I have made a lot of improvements since then. For one, the book no longer looks like a Sphinx manual. Compare!

Also, the content is very, very different. v0.1 was 19,000 words, v0.10 is 31,000.[1] This comes from new chapters on TLA+, constraint/SMT solving, logic programming, and major expansions to the existing chapters. Originally, "Simplifying Conditionals" was 600 words. Six hundred words! It almost fit in two pages! The chapter is now 2600 words, covering condition lifting, quantifier manipulation, helper predicates, and set optimizations. All the other chapters have either gotten similar facelifts or are scheduled to get facelifts.

The last big change is the addition of book assets. Originally you had to manually copy over all of the code to try it out, which is a problem when there are samples in eight distinct languages! Now there are ready-to-go examples for each chapter, with instructions on how to set up each programming environment. This is also nice because it gives me breaks from writing to code instead.

How did the book do?

Leanpub's all-time visualizations are terrible, so I'll just give the summary: 1180 copies sold, $18,241 in royalties. That's a lot of money for something that isn't fully out yet! By comparison, Practical TLA+ has made me less than half of that, despite selling over 5x as many books. Self-publishing was the right choice!

In that time I've paid about $400 for the book cover (worth it) and maybe $800 in Leanpub's advertising service (probably not worth it).

Right now that doesn't come close to making back the time investment, but I think it can get there post-release. I believe there are a lot more potential customers via marketing. I think 10k copies sold post-release is within reach.

Where is the book going?

The main content work is rewrites: many of the chapters have not meaningfully changed since 0.1, so I am going through and rewriting them from scratch. So far four of the ten chapters have been rewritten. My (admittedly ambitious) goal is to rewrite three of them by the end of this month and another three by the end of next.

I also want to do final passes on the rewritten chapters, as most of them have a few TODOs left lying around.

(Also, somehow in starting this newsletter and publishing it I realized that one of the chapters might be better split into two chapters, so there could well be a tenth technique in v0.11 or v0.12!)

After that, I will pass it to a copy editor while I work on improving the layout, making images, and indexing. I want to have something worthy of printing on a dead tree by 1.0.

In terms of timelines, I am very roughly estimating something like this:

- Summer: final big changes and rewrites
- Early Autumn: graphic design and copy editing
- Late Autumn: proofing, figuring out printing stuff
- Winter: final ebook and initial print releases of 1.0

(If you know a service that helps get self-published books "past the finish line", I'd love to hear about it! Preferably something that works for a fee, not part of royalties.)

This timeline may be disrupted by official client work, like a new TLA+ contract or a conference invitation.

Needless to say, I am incredibly excited to complete this book and share the final version with you all. This is a book I wished for years ago, a book I wrote because nobody else would. It fills a critical gap in software educational material, and someday soon I'll be able to put a copy on my bookshelf. It's exhilarating and terrifying and above all, satisfying.

[1] It's also 150 pages vs 50 pages, but admittedly this is partially because I made the book smaller with a larger font.

2 days ago
