Personal News Feed

If you just want to check daily Astro and AI news, head to the Latest News page on my blog. The post below describes my journey of getting there.

(I say journey, but it really took a few days – I'd call it a weekend project, and it could have been shorter if my secrets had worked as expected! I was also managing two remotes through the same code because they had slight differences, which got a bit tricky. Why I did that, I don't know – but now I'm an expert.)

Honest Disclaimer – This blog was written with the help of ChatGPT, with detailed prompting and some editing and enhancements by me 🙂

Keeping up with the latest in AI and astrophysics is overwhelming. ArXiv gets 300+ astrophysics papers every single day. I was juggling full-time school, a full-time job, and independent research, so staying up to date felt harder and harder, and I was getting a strong sense of FOMO.

That’s why I built Personal News Feed: an AI-powered daily summarization pipeline that condenses selected RSS feeds into digestible summaries. Think of it as your personal AI research assistant. It tracks astro research, AI breakthroughs, and eventually whatever you want it to.

Its first public iteration is live, and you can try it here. Scroll through summaries, view top articles, or dig into full content.


✨ Why I Built This

I’m a physicist in training and a data scientist by profession. My dream has always been to merge deep learning with satellite data, AI with space, knowledge with impact. But the volume of updates in AI and astrophysics was unbearable. I needed a tool that filtered out the noise and gave me the context I wanted.

This project gave me the perfect playground to combine:

  • My love for research – because there was no way I was going through the abstracts of 300+ papers daily
  • My full-stack and engineering skills – trust me, I thought I knew this; after all, I had deployed React frontends on CloudFront and FastAPIs on Beanstalk and API Gateway, and I use Airflow crons daily. But setting up these so-called simple managed things like HF Spaces, a Gradio UI, and GitHub Actions was kind of new and a tad frustrating in the beginning
  • My curiosity to build tools that actually help people – charity begins at home, so "people" as in me – but I did take extra time to figure out how to set up a proxied public UI in HF, and it's kinda cool
  • And a workflow I can run from a cozy cafe someday – life is busy when I wake up early in the morning, point my telescope right, go for a run, walk my dog, bake amazing stuff for my cafe, and then look at everything the smart people of the world have discovered and shared, and see how I can use it to improve my research or whatever I'd be working on. Lol, I don't know why my custom GPT felt the need to include this point, but I kept it – after all, this is the ultimate dream 😛

πŸ› οΈ What It Does

  • Ingests custom-selected RSS feeds (ArXiv, science news, tech blogs)
  • Categorizes them by themes
  • Summarizes articles using OpenAI + prompt engineering (TODO: needs improvement)
  • Top Entries – lists the entries most worth exploring/reading further (available through both the UI and the API)
  • Raw Entries – lists all fetched entries, for browsing if desired (the UI also allows an export to CSV)
  • Select Entry – lets the user concisely summarize any entry from the RSS feed (available through both the UI and the API)
  • Schedules Daily Summarization – uses the API endpoints deployed on a Hugging Face Space via Docker, hit daily by a cron job in GitHub Actions
  • Publishes daily summaries on GitHub Pages for AI and Astro – embedded in my WordPress blog using iframes 🙂 and accessible here (I could add an action that writes to WP directly, but that requires WP credentials, and I was too lazy to set up new app credentials)
  • Provides an interactive UI via Gradio on Hugging Face – a public Hugging Face repo calls the Gradio UI of a private repo, where the code is hosted with my custom secret-sauce prompts (these need improvement though)

There’s also a historical archive you can browse.
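To give a feel for the ingestion step, here's a minimal sketch of parsing an RSS feed with Python's standard library. The feed XML is inlined so the snippet runs offline; in the real pipeline the feeds (e.g. ArXiv's) are fetched over HTTP, and the project's actual parsing logic may differ:

```python
# Minimal RSS ingestion sketch using only the standard library.
# The sample XML stands in for a real feed.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample Astro Feed</title>
    <item>
      <title>New exoplanet candidate found</title>
      <link>https://example.org/exoplanet</link>
      <description>A transit survey reports a candidate...</description>
    </item>
    <item>
      <title>Transformer models for spectra</title>
      <link>https://example.org/spectra</link>
      <description>Deep learning applied to stellar spectra...</description>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text: str) -> list[dict]:
    """Return one dict per <item> with title, link, and description."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return entries

entries = parse_rss(SAMPLE_FEED)
```

From here, each entry's title and description feed into the categorization and summarization steps.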


🧱 Architecture Breakdown

Here’s how it all fits together:

1. Notebook to App

It started as a Jupyter notebook for summarizing ArXiv feeds. I refined it into a modular app:

/
├── src/
│   ├── __init__.py
│   ├── main.py              # CLI entry point for running summaries locally
│   ├── config.py            # App config: which feeds to use, which prompts, audience type, etc.
│   ├── llm_client.py        # Wrapper functions to communicate with OpenAI (or other LLMs)
│   ├── prompt_templates.py  # JSON templates for prompt logic (left empty in public repo)
│   ├── response_parser.py   # Post-processing logic to clean and structure LLM outputs
│   ├── token_utils.py       # Token + cost estimation for budget-aware usage
│   ├── summarizer.py        # Core logic for constructing prompts, querying LLMs, and returning summaries
│   ├── summary_manager.py   # Controller layer that connects summarizer logic to UI/API/CLI layers
│   ├── logger.py            # Logging setup for monitoring and debugging
│   └── rss_utils.py         # Utilities for fetching, parsing, and filtering RSS feeds
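As an example of the kind of helper that lives in token_utils.py, here's a rough budget estimator. The ~4-characters-per-token heuristic and the per-1K-token price are illustrative assumptions, not the repo's actual numbers:

```python
# Rough token/cost estimation for budget-aware LLM usage.
# The 4-chars-per-token rule and the price are placeholder assumptions.

def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters/token rule of thumb."""
    return max(1, len(text) // 4)

def estimate_cost(text: str, usd_per_1k_tokens: float = 0.0005) -> float:
    """Estimated cost of sending `text` as prompt input."""
    return estimate_tokens(text) / 1000 * usd_per_1k_tokens
```

Running these before a batch of 300+ abstracts makes it obvious whether a day's run stays within budget.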

This separation of concerns means:

  • I can update prompts without touching code
  • I can add new feed types or models without breaking the pipeline
  • I can run it as a CLI tool, a UI app, or an API – all using the same core logic (it's not that complicated, so why not? It also helps with HF deployment, since I use only one Space and call everything from a single app.py)
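To show what I mean by keeping feeds and prompts in config rather than code, here's a sketch of what config.py could hold. The names, feed URL, and defaults are illustrative, not the repo's actual contents:

```python
# Illustrative config layer: feeds, audience, and prompt selection live in
# data, so prompts and feed lists can change without touching pipeline code.
from dataclasses import dataclass, field

@dataclass
class FeedConfig:
    name: str
    url: str
    category: str  # e.g. "astro" or "ai"

@dataclass
class AppConfig:
    audience: str = "researcher"
    prompt_template: str = "daily_digest"
    feeds: list[FeedConfig] = field(default_factory=list)

DEFAULT_CONFIG = AppConfig(feeds=[
    FeedConfig("ArXiv astro-ph", "https://rss.arxiv.org/rss/astro-ph", "astro"),
])
```

Swapping models, audiences, or feed sets then becomes a one-line config change rather than a code edit.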

2. Separation of Private and Public

  • Private repo contains the core logic, prompts, API keys
  • Public repo just hosts summaries via GitHub Pages
  • Public HF Space proxies the private Gradio UI using a read token

This separation gives me full control over the logic + privacy, while letting others interact with the tool.

It took me some time to figure out which URL to use and how to set the token. You can refer to my public repo if you want to do something similar, or just ping me 🙂
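For anyone attempting the same trick, a minimal sketch of what the public Space's app.py could look like is below. The Space id and environment-variable names are placeholders (not my actual ones); the key piece is Gradio's `gr.load` with `src="spaces"` and a read-scoped token:

```python
# Sketch of a public-Space app.py that proxies a private Space's Gradio UI.
# PRIVATE_SPACE_ID and HF_READ_TOKEN are placeholder names you would set
# as Space secrets/variables.
import os

PRIVATE_SPACE = os.environ.get("PRIVATE_SPACE_ID", "username/private-news-feed")

def load_private_ui():
    # Imported lazily so the module stays importable without gradio installed.
    import gradio as gr
    # gr.load mirrors the remote Space's interface; the read token grants
    # access to the private Space without exposing its code or prompts.
    return gr.load(
        name=PRIVATE_SPACE,
        src="spaces",
        hf_token=os.environ["HF_READ_TOKEN"],
    )

if __name__ == "__main__":
    load_private_ui().launch()
```

The public Space then renders the private UI for visitors while the code and prompts stay private.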

3. Deployment

  • Backend API (FastAPI + Uvicorn) lets me trigger summaries via HTTP
  • Frontend uses Gradio, hosted on a private Hugging Face Space (because WP doesn't allow Python app hosting that easily)
  • Public Proxy UI relays the private Gradio interface via token
  • GitHub Actions calls the HF API daily β†’ saves results β†’ publishes to GitHub Pages
  • WordPress Integration is just simple iFrames of those GitHub Pages

So when I deploy (as long as I don't change any keys, output formats, or endpoint URLs), I just need to commit my Python code to HF Spaces and everything else is taken care of 🙂
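The daily trigger can be as simple as a scheduled workflow. Here's a sketch, with a placeholder endpoint URL, file paths, and secret names (mine differ):

```yaml
# .github/workflows/daily-summary.yml (sketch; names and URL are placeholders)
name: Daily summaries
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
  workflow_dispatch:        # allow manual runs too
jobs:
  summarize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Call the HF Space API and save the result
        run: |
          curl -sf -H "Authorization: Bearer ${{ secrets.HF_API_TOKEN }}" \
            "https://username-space.hf.space/summaries/daily" -o docs/latest.json
      - name: Commit the summary for GitHub Pages
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add docs/latest.json
          git commit -m "Daily summary update" || echo "No changes"
          git push
```

With GitHub Pages serving from the committed directory, the iframe on the blog picks up each day's output automatically.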

4. Security & Secrets

  • Secrets (OpenAI key, HF tokens) are managed via GitHub Actions repository secrets and Hugging Face private environment variables (for the Gradio UI)
  • Actions and endpoints are protected; nothing is exposed directly
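On the code side, those secrets are read from the environment. A tiny fail-fast helper (illustrative; the repo's actual handling and variable names may differ):

```python
# Fail fast on missing secrets instead of passing None downstream.
import os

def require_secret(name: str) -> str:
    """Read a required secret from the environment or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value

# Usage (env-var name is illustrative):
# openai_key = require_secret("OPENAI_API_KEY")
```

Failing at startup with a named variable beats debugging an opaque 401 from the API three calls later.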

🧪 Try It

📝 Daily News summaries
Astro
AI

🧪 Play with the UI
Public Demo Space

📁 Browse the Code / Run it locally
Public GitHub

🗃️ Archived Summaries
Historical Feeds


🔧 What’s Next?

  • Add custom RSS support in the Gradio UI (it already works via a config file)
  • Allow semantic search – search by topic and surface the best content (needs some functional changes, UI changes, and new API fields)
  • Expose top research papers on the blog as Latest Papers, not just news (already working in the API and UI; only needs GitHub Actions and a Pages display)
  • Highlight source reliability, author credentials, and research impact scores – this wasn't in my plan, but why not? I'll have to think about it; probably reputed-journal and institute lookups instead of just arXiv

🌱 Final Thoughts

Well, I’ll be using it and improving it – let me know if you have any suggestions 🙂

