03-14 Update: GPT-4's Here [ OpenAI's GPT-4 to Launch Next Week as "Multimodal" Network ]

Microsoft Said GPT-4 Allows Processing & Generation of Text, Images, Audio & Videos

"Microsoft explained that GPT-4 would be 'multimodal'. Holger Kenn, Director of Business Strategy at Microsoft Germany, explained that this would allow the company's AI to translate a user's text into images, music, and video."
source: https://www.digitaltrends.com/comput...eek-ai-videos/

What is "Multimodal" in Deep Learning Context?

"multimodality" refers to the ability of a Deep Learning model to process different types of digital content as inputs, to generate outputs also as different digital media types, or both -- Be it during model training and / or during production deployment.
  • Microsoft's Kosmos-1 MLLM (Multimodal Large Language Model)
    PAPER (Technical Deep Learning Model Architecture, Training, Validation & Testing Details): https://arxiv.org/pdf/2302.14045.pdf

    QUOTE:

    The latest milestone in OpenAI's effort in scaling up deep learning.
    GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.

    /QUOTE

    • Paper: https://cdn.openai.com/papers/gpt-4.pdf
    • Watch the Developer Demo Livestream Here (04:00, March 15, Philippine Standard Time): https://youtube.com/live/outcGtbnMuQ?feature=share

    source: https://openai.com/research/gpt-4
  • Very much excited to use the GPT-4 multimodal model.
  • DWolfe
    Marx, how did you like the demonstration? Do you see any specific things that would help a new member with their marketing?
    • @DWolfe,

      Originally Posted by DWolfe

      Marx, how did you like the demonstration? Do you see any specific things that would help a new member with their marketing?
      I'm still on OpenAI's waiting list for GPT-4 API access.
      But three things look promising (based solely on what was presented at their dev demo):

      1) Improved factual correctness;
      2) Fewer "hallucinations"; and
      3) Bigger context limit per single API call ...

      Notes

      The third one can also make up for the other two, in case they're just hyping up "improved factual correctness" and "fewer 'hallucinations'", since a bigger context window lets us supply trusted reference material directly in the prompt.

      ** For example, in one API call we can ask GPT-4 to analyze a relevant page (in any language) with suitable niche content depth from a trusted source we supply, such as a whitepaper or case study from a research group at a reputable university, a government site, or Google Trends for a certain niche topic and keyword, and then create a report from it. We can even supply multiple pages / content sources to analyze (see the sketch after the next note); and

      ** So there are already lots of ideas here for programmatic content development, real-time translation, and assisted data analytics ...
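
      ** A minimal sketch of that single-call flow, using the OpenAI Python library's chat completions endpoint and assuming GPT-4 API access; the URL, the fetch_page_text() helper, and the prompts are my own placeholders, not anything from OpenAI's demo:

      CODE

      # Minimal sketch: one GPT-4 chat completion call that analyzes supplied
      # source content and drafts a report from it.
      import openai
      import requests
      from bs4 import BeautifulSoup

      openai.api_key = "YOUR_API_KEY"  # placeholder

      def fetch_page_text(url: str) -> str:
          """Grab the visible text of a trusted source page (whitepaper, gov site, etc.)."""
          html = requests.get(url, timeout=30).text
          return BeautifulSoup(html, "html.parser").get_text(separator="\n")

      source_text = fetch_page_text("https://example.edu/whitepaper.html")  # placeholder URL

      response = openai.ChatCompletion.create(
          model="gpt-4",  # or "gpt-4-32k" for the larger context option
          messages=[
              {"role": "system", "content": "You are a research analyst writing niche reports."},
              {"role": "user", "content": (
                  "Analyze the following source and write a report in English, "
                  "with key findings and recommendations:\n\n" + source_text
              )},
          ],
      )
      print(response["choices"][0]["message"]["content"])

      /CODE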

      Because the previous 2 to 3 prompts may not consume the expanded token limit of a succeeding API call, we devs can implement contextual memory, i.e. programmatically feed GPT-4 our previous input prompts so it remembers them and continues the task while following succeeding prompts (sketched after the next note) ...

      ** This can be useful for internal tooling in content development, real-time translation, and assisted data analytics, as well as for customer-facing tools like virtual agents for customer support, content moderation, and content management ...
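
      ** As a minimal sketch (again assuming GPT-4 chat API access via the OpenAI Python library), contextual memory can be as simple as appending each prompt and reply to a shared messages list so that every succeeding call sees the earlier context; the example prompts are hypothetical:

      CODE

      # Minimal sketch of "contextual memory": carry prior prompts and replies
      # into each succeeding GPT-4 call so the model remembers the task so far.
      import openai

      openai.api_key = "YOUR_API_KEY"  # placeholder

      history = [{"role": "system", "content": "You are a marketing content assistant."}]

      def ask(prompt: str) -> str:
          history.append({"role": "user", "content": prompt})
          response = openai.ChatCompletion.create(model="gpt-4", messages=history)
          reply = response["choices"][0]["message"]["content"]
          history.append({"role": "assistant", "content": reply})  # remembered for the next call
          return reply

      # Succeeding prompts can now build on earlier ones (within the token limit):
      # ask("Outline a blog post about multimodal AI for small businesses.")
      # ask("Now expand section 2 of that outline into around 300 words.")

      /CODE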

      And I'm hoping they release their image processing features soon, which they presented in their dev demo livestream. For now, these are only available through their partner (Be My Eyes, an app for the blind) ...

      ** This can be quite useful, e.g. "As an expert data analyst in the field of [enter field here], analyze this graph. Convert the data into a format that provides granular control over data points, such as a spreadsheet. Also provide recommendations and notes about the data, which can be helpful for [enter your objective here]."
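
      ** For programmatic use, that prompt can become a simple template whose bracketed fields are filled in per task; build_graph_prompt() below is just my own illustrative helper (plain string formatting, since GPT-4's image input isn't publicly available yet):

      CODE

      # Minimal sketch: the graph-analysis prompt as a reusable template. The image
      # itself would be attached once GPT-4's image input becomes generally available.
      def build_graph_prompt(field: str, objective: str) -> str:
          return (
              f"As an expert data analyst in the field of {field}, analyze this graph. "
              "Convert the data into a format that provides granular control over data "
              "points, such as a spreadsheet. Also provide recommendations and notes "
              f"about the data, which can be helpful for {objective}."
          )

      # Usage:
      # prompt = build_graph_prompt("e-commerce SEO", "planning next quarter's content")

      /CODE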

      P.S. OpenAI says there are two larger token limit options for the GPT-4 API. The first is 8K, which, considering average system and user prompts totaling 300 to 400++ words, comes out to around 1 to 1.2K++ words (based on my tests with multiple GPT Davinci v3.5 API calls just to hit this content depth). The second is 32K tokens (for a limited group on the waiting list), which is around 4 to 4.8K+ words with the same average total words for system and user prompts ...
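
      P.P.S. A quick way to sanity-check those limits is to count prompt tokens with OpenAI's tiktoken library (cl100k_base is the encoding the chat models use) and see what's left for the reply; the 8192 / 32768 figures below are just the advertised 8K / 32K options, and word estimates will vary:

      CODE

      # Minimal sketch: estimate how many completion tokens remain after the
      # system and user prompts, for the 8K and 32K GPT-4 context options.
      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")

      def remaining_tokens(system_prompt: str, user_prompt: str, context_limit: int = 8192) -> int:
          prompt_tokens = len(enc.encode(system_prompt)) + len(enc.encode(user_prompt))
          return context_limit - prompt_tokens  # budget left for the model's reply

      # Usage:
      # remaining_tokens(my_system_prompt, my_user_prompt, context_limit=8192)   # 8K option
      # remaining_tokens(my_system_prompt, my_user_prompt, context_limit=32768)  # 32K option

      /CODE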