How to study while walking in nature

Note: This post is simply a collection of my ideas for this concept, with varying levels of completion. The code isn't public yet because I can't be bothered to remove the hardcoded API keys from the repos.

AR-based approach (11-3-2024)

Sitting at a desk for many hours every day isn't good for me. Instead, I want to learn/study while walking in nature.

Hardware

- Xreal Air AR glasses ($200)
- Mini Bluetooth ring remote ($40) Edit: Sadly this product turned out to be terrible, the 8BitDo Micro is probably a better option
- Phone (for mobile hotspot)
- Laptop (sends video from the web app to the glasses; my phone doesn't support it)
- Backpack (to hold the laptop)
- Drawing tablet w/ pen (optional)

The glasses provide a 1080p overlaid screen for visual aid. They're wired to my laptop, which runs my custom study software. They also include 2 speakers with decent quality and 2 microphones for speech input. The Bluetooth ring controls simple parts of the application.

Software (WIP)

I want to be able to upload a source of content (video, document, audio) and learn from that content while I walk. These sources are uploaded and prepared *before* the walk.

Video Playback

Videos simply play in an embed on the web interface. This could use the Jumpcutter extension to remove silences in lectures. It's super nice! The Bluetooth remote would play, pause, and skip forward/back.

PDF reading/listening

When PDFs are uploaded, they go through a preparation pipeline:

PDF -> Mathpix OCR -> LaTeX -> LaTeX AST

The AST is useful for removing less relevant content from the PDF (i.e. page numbers, large gaps). The PDF would be shown on the glasses' display. For long paragraphs, you can enable speed-reader mode, which flashes the words in place, allowing most users to read much more quickly.

For PDF listening, relevant parts of the document would be sent to ElevenLabs for TTS. Using NLP (spaCy), conceptual terms, proper nouns, and LaTeX are extracted from the text. After scraping the Wikipedia image for each term, the images appear in the listener's view as the terms are mentioned. This is possible thanks to the token-level timestamps ElevenLabs returns with the TTS. (A rough sketch of the term-extraction step appears at the end of this section.) I also have plans for even better automatic accompanying visuals, like a 3D globe zooming into a country/region, or a 3D model of human anatomical structures.

How would the LLM speak LaTeX correctly?

When LaTeX is detected in a piece of content, it is sent to an LLM, asking for a conversion to its spoken form (a minimal sketch of this call is included at the end of this section). For example:

$\text{proj}_{\vec{b}} \vec{a} = \frac{\vec{a} \cdot \vec{b}}{\vec{b} \cdot \vec{b}} \vec{b}$

becomes...

"The projection of vector a onto vector b is equal to the dot product of vector a with vector b, divided by the dot product of vector b with itself, multiplied by vector b."

Chat/Input Mode

If I get stuck on a topic from the content, I click one of the little buttons on the Bluetooth remote and switch to Chat Mode. For video lectures, all of the words last spoken (subtitles) are added to an LLM's context window. PDFs follow a similar pattern. After entering Chat Mode, the microphone on the glasses picks up my voice.

Note: before my speech is transcribed, gaps of silence are removed, reducing STT costs (see the silence-trimming sketch at the end of this section). I can speak for any duration of time. I will not be annoyingly interrupted. I can pause and think for minutes if I wish, articulating my question.

Canvas input

I like the idea of sometimes bringing my drawing tablet with me on study walks. Voice isn't the only possible input method. In canvas mode, the screen is taken up by a canvas, which lets me work through workbook problems and ask for feedback/guidance from the LLM by sending the canvas to the LLM's vision API (sketched at the end of this section).
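
To give a feel for the term-extraction step in the PDF listening pipeline above, here's a rough sketch, assuming spaCy's small English model and Wikipedia's public REST summary endpoint. The helper names are mine, not from the actual code.

    # Sketch: pull proper nouns / noun-phrase terms out of a text chunk with
    # spaCy and look up a Wikipedia thumbnail for each one.
    # Requires: python -m spacy download en_core_web_sm
    import requests
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def extract_terms(text: str) -> set[str]:
        doc = nlp(text)
        # Named entities plus longer noun chunks give a rough "conceptual term" list.
        terms = {ent.text for ent in doc.ents}
        terms |= {chunk.text for chunk in doc.noun_chunks if len(chunk.text) > 3}
        return terms

    def wikipedia_thumbnail(term: str) -> str | None:
        # Wikipedia's REST summary endpoint includes a thumbnail URL when one exists.
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{term.replace(' ', '_')}"
        resp = requests.get(url, timeout=5)
        if resp.ok:
            return resp.json().get("thumbnail", {}).get("source")
        return None

    if __name__ == "__main__":
        for term in extract_terms("The Krebs cycle occurs in the mitochondria."):
            print(term, wikipedia_thumbnail(term))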
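
The LaTeX-to-spoken-text conversion above is a single LLM call. A minimal sketch using the OpenAI chat completions client; the model name and prompt wording are placeholders, not what I've settled on.

    # Sketch: ask an LLM to rewrite a LaTeX snippet as natural spoken English.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def latex_to_spoken(latex: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Rewrite the given LaTeX as plain spoken English, "
                            "suitable for text-to-speech. Return only the sentence."},
                {"role": "user", "content": latex},
            ],
        )
        return response.choices[0].message.content.strip()

    print(latex_to_spoken(
        r"$\text{proj}_{\vec{b}} \vec{a} = \frac{\vec{a} \cdot \vec{b}}{\vec{b} \cdot \vec{b}} \vec{b}$"
    ))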
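
The silence trimming before STT in Chat Mode could be done with pydub's silence utilities. A rough sketch; the thresholds are guesses and would need tuning for outdoor recordings.

    # Sketch: strip long silent gaps from a recording before sending it to STT.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    def trim_silence(path: str, out_path: str) -> None:
        audio = AudioSegment.from_file(path)
        chunks = split_on_silence(
            audio,
            min_silence_len=1500,              # only cut gaps longer than 1.5 s
            silence_thresh=audio.dBFS - 16,    # relative to the clip's average loudness
            keep_silence=300,                  # keep a little padding around speech
        )
        trimmed = sum(chunks, AudioSegment.empty())
        trimmed.export(out_path, format="wav")

    trim_silence("question.wav", "question_trimmed.wav")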
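
Canvas mode boils down to sending a snapshot of the canvas to a vision-capable model as a base64 image. A sketch along those lines; again, the model name and prompt are placeholders.

    # Sketch: send a snapshot of the handwriting canvas to a vision-capable LLM
    # and ask for feedback on the working.
    import base64
    from openai import OpenAI

    client = OpenAI()

    def feedback_on_canvas(png_path: str, question: str) -> str:
        with open(png_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return response.choices[0].message.content

    print(feedback_on_canvas("canvas.png", "Is my working for this integral correct?"))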

LLM Output Mode

When I'm ready for the LLM to respond, I manually toggle to LLM Output Mode. Similar to the PDF system, LLM Output Mode supports both reading the text and listening to it. This would be toggled using the Bluetooth remote.

Automatic Anki flashcards

This feature is essentially a GPT function call:


    {
        "name": "generate_flashcard",
        "description": "Generates a basic flashcard with a front and a back when the user asks to make a flashcard.",
        "parameters": {
            "type": "object",
            "properties": {
                "front": {
                    "type": "string",
                    "description": "Content displayed on the front of the flashcard."
                },
                "back": {
                    "type": "string",
                    "description": "Content displayed on the back of the flashcard."
                }
            },
            "required": ["front", "back"]
        }
    }

The resulting flashcard can be sent to Anki through AnkiConnect. I've also imagined a similar function called "Save for later". This can be used whenever a topic is better suited for another time or environment (not everything is best learned from a pair of glasses, I think).
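
Pushing the resulting card into Anki via AnkiConnect is a single HTTP request to its local server. A minimal sketch; the deck name is a placeholder.

    # Sketch: add the generated flashcard to Anki through the AnkiConnect add-on,
    # which listens on localhost:8765.
    import requests

    def add_flashcard(front: str, back: str, deck: str = "Study Walks") -> int:
        payload = {
            "action": "addNote",
            "version": 6,
            "params": {
                "note": {
                    "deckName": deck,
                    "modelName": "Basic",
                    "fields": {"Front": front, "Back": back},
                    "options": {"allowDuplicate": False},
                }
            },
        }
        result = requests.post("http://localhost:8765", json=payload).json()
        if result.get("error"):
            raise RuntimeError(result["error"])
        return result["result"]  # the new note's id

    add_flashcard("What does the Bluetooth ring control?",
                  "Play/pause, skipping, and mode switching.")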

Older notes (origin of the idea)

Audio-based approach (8-14-2024)

Sitting in a chair for hours upon hours per day can (at times) be an inherently depressing experience for a human. For most of our species' history, we spent most of our time walking in nature. Whether light exercise boosts cognitive ability is an interesting question, but that's not the problem I'm trying to solve. The main focus of this idea, and of the technologies behind it, is ergonomics rather than cognitive enhancement.

- The user of this tech could take audio notes: they pause the video or TTS and speak their own thoughts on it.
- The LLM-powered system could create flashcards, or quiz them on the topic for active recall.

I want to be able to go on walks and interact with my phone completely verbally. I'm willing to develop an app to do this. However, I want to have a small amount of input via buttons. This will probably require an external device, like some kind of remote.

I'll include a few examples of the things I want to do. For one, I want to listen to audiobooks with interspersed audio active-recall recordings. Sometimes I want to take notes on a book I'm listening to by recording my voice. The way I would like this to work is the following: while I'm holding down a button on my remote, the audio plays. When I let go, the audio stops and my phone immediately starts recording my voice. When I press the button again, the audio skips back about 3 seconds and then continues. This way, the barrier to recalling what I just listened to is very low. (A small sketch of this button logic follows below.)

Another idea I have is interacting with an LLM while walking and listening. The LLM would have context about the thing I'm listening to. It should also be able to run functions like making flashcards about the subject at hand, or adding notes to my database. It could also just simplify the subject or supply additional information about it.

The next part to consider for this project is the formats that can be imported. This should definitely include at least the following:

- audiobooks
- PDFs with TTS
- YouTube videos (podcasts/lectures)

The particular problem I'm encountering is the hardware setup. Here are my current ideas:

- Phone as the main computing device
- Cloud services for AI models or complex computation
- Bluetooth remote of some kind for controlling the mode that the phone is in
- Some kind of microphone
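
The button behavior described above is really a tiny state machine. Here's a sketch of the logic, where Player and Recorder are stand-ins for whatever audio playback/recording APIs the app would actually use.

    # Sketch of the hold-to-play / release-to-record button behavior.
    SKIP_BACK_SECONDS = 3

    class Player:
        """Stub playback object; a real app would wrap the platform media player."""
        def __init__(self):
            self.pos = 0.0
        def play(self):
            print("playing from", self.pos)
        def pause(self):
            print("paused at", self.pos)
        def seek(self, t):
            self.pos = max(0.0, t)
        def position(self):
            return self.pos

    class Recorder:
        """Stub voice-note recorder."""
        def start(self):
            print("recording voice note...")
        def stop(self):
            print("voice note saved")

    class WalkNotesController:
        def __init__(self, player, recorder):
            self.player, self.recorder = player, recorder
            self.recording = False

        def on_button_down(self):
            if self.recording:
                # Pressing again: save the note, rewind ~3 s, resume the audio.
                self.recorder.stop()
                self.recording = False
                self.player.seek(self.player.position() - SKIP_BACK_SECONDS)
            self.player.play()  # holding the button plays the audio

        def on_button_up(self):
            # Releasing the button pauses the audio and starts a voice note.
            self.player.pause()
            self.recorder.start()
            self.recording = True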